Beware of Dumb Objects

I often see systems which look like this:

class InvoiceFactory
{
    public function __construct($service);
    public function fetchInvoices($id);
}

class Invoice
{
    public function __construct($array);
    public function getProducts();
}

class InvoiceProducts
{
    public function __construct($list);
    public function fetchProducts();
}

class WarehouseInvoice
{
    public function __construct($invoice, $products);
    public function listProducts();
}

// Classes are used like this:
$factory = new InvoiceFactory($database);
// Load database rows by id.
$array = $factory->fetchInvoices($someId);
// Create usable objects from those rows.
$invoice = new Invoice($array);
// List of products.
$productList = $invoice->getProducts();
// Usable list of products.
$products = new InvoiceProducts($productList);
// Convert those rows into a usable form.
$objectsDto = $products->fetchProducts();
// Convert one usable form into the one that we actually need.
$warehouseInvoice = new WarehouseInvoice($invoice, $objectsDto);
// Convert the list of products to one we can use in a template.
$yetAnotherDto = $warehouseInvoice->listProducts();
// Actually use the object.
doSomething($yetAnotherDto[0]->actualProperty);

Here we’ve gone from a factory to an object to a list of objects and back to a single object again several times. This can go on and on through several iterations of tiny one-function stateless objects that just convert data.

In this example, the code never actually does anything until the objects are converted into primitive data, because all the models only accept primitive data. It seems attractive at first that all the objects are free of dependencies and behave as simple services; however, there are several problems with this:

  • usage code is long
  • all objects involved need to be understood in terms of how they accept and produce data in order to know what’s going on
  • difficult to include data from other services without performing more data wrangling
  • difficult to change the behaviour of primitive data since it’s all pre-computed
  • no common API for access to data derived from the service, every client has to build their own conversion
  • difficult to unit test correct usage because it all relies on integration

There seems to be a natural tendency to obsessively break objects down to their smallest possible form, especially around a single database table or similar technical object.

I think the example above would have been just as good as a single object exposing these conversions as methods.

When you find yourself implementing this kind of pattern, consider using just one root service model and smarter objects to access the data.

// Persistence and external service access.
class ObjectService
{
    public function __construct($service);
    public function fetchObjectById($id);
    public function fetchExtraObjectData($id);
}

// Containing value object.
class ObjectList
{
    public function __construct($service, $id);
    public function getActualObjects();
}

// Element value object.  This can still use the ObjectService.
class ActualObject 
{
    public function __construct($objectList, $id);
    public function getActualProperty();
}

// Declare dependencies.
$service = new ObjectService($database);
$objects = new ObjectList($service, $id);
// Load usable data.
$objects->getActualObjects()[0]->getActualProperty();

This gives you a more flexible implementation where data access concerns are dealt with by the smart value objects and service concerns are dealt with by the root object service. The maintainer does not need to know anything about the interactions between these objects; they only need to know the dependencies (which should be defined by the constructor interface).
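To make the shape concrete, here is a minimal runnable sketch of the same idea. It is in Python rather than PHP purely so it can be executed end to end; the dict-backed “database” and all names are illustrative stand-ins mirroring the pseudocode above, not a real implementation.

```python
# Root service: owns persistence / external access (here a plain dict).
class ObjectService:
    def __init__(self, database):
        self._database = database

    def fetch_object_by_id(self, id):
        return self._database[id]


# Containing value object: knows how to get its data via the service.
class ObjectList:
    def __init__(self, service, id):
        self._service = service
        self._id = id

    def get_actual_objects(self):
        # Data access happens lazily, behind a common API.
        rows = self._service.fetch_object_by_id(self._id)
        return [ActualObject(row) for row in rows]


# Element value object (simplified here: it holds its row directly).
class ActualObject:
    def __init__(self, row):
        self._row = row

    def get_actual_property(self):
        return self._row["actual_property"]


# Declare dependencies, then load usable data in one expression.
database = {42: [{"actual_property": "widget"}]}
service = ObjectService(database)
objects = ObjectList(service, 42)
print(objects.get_actual_objects()[0].get_actual_property())  # widget
```

Client code only touches the constructors and the two getters; how the list talks to the service is invisible to it.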

The conclusions I draw are:

  1. don’t be scared of high-level value objects
  2. dependencies are fine if they mean client code needs to know less
  3. create new abstractions only when they remove work from client code

W3af Quickstart

Sometimes a blog post is better than memory.

mkdir -p src/w3af
cd src/w3af
git clone --depth 1 https://github.com/andresriancho/w3af.git .
virtualenv venv
# It does seem to be necessary to source this rather than just using the bins.
. ./venv/bin/activate
./w3af_console
# This is generated by the console script:
. /tmp/w3af_dependency_install.sh

Next we’ll create a usable profile to spider a website and try to xss it.

./venv/bin/python ./w3af_console
w3af>>> plugins
w3af/plugins>>> list crawl
(snip)
w3af/plugins>>> crawl web_spider
w3af/plugins>>> audit xss
w3af>>> profiles
w3af/profiles>>> save_as spider-xss
Profile saved.

(To disable a plugin later, prefix it with !, e.g. audit !xss.)

When you next use w3af you can run profiles use spider-xss.

Now exploit some things:

w3af>>> target set target https://site-you-own.example.com
The configuration has been saved.
w3af>>> 
w3af>>> start
New URL found by web_spider plugin: "https://site-you-own.example.com/"
(...)

Pressing enter will give you a quick status on what it’s doing.

Note that in my version the console will hang if you start a scan without having set a target, which will mean you have to reconfigure everything, so take care to avoid this.

Wait a very long time and you’ll get a report. Any found exploits will be printed out in real time.

It’s a good idea to test your profile like this:

w3af>>> plugins audit xss
w3af>>> plugins crawl web_spider
w3af>>> target set target http://localhost/inject-me.php
w3af>>> plugins crawl config web_spider 
w3af/plugins/crawl/config:web_spider>>> set only_forward True 
w3af/plugins/crawl/config:web_spider>>> back
The configuration has been saved.
w3af>>> start

Here’s the exploitable script:

<html>
<head>
</head>
<body>

<?php echo @$_GET['text']; ?>

<form method="GET">
<input type="text" name="text">
<input type="submit" value="go">
</form>

</body>
</html>

The result:

w3af>>> start
New URL found by web_spider plugin: "http://localhost/inject-me.php"
A Cross Site Scripting vulnerability was found at: "http://localhost/inject-me.php", using HTTP method GET. The sent data was: "text=" The modified parameter was "text". This vulnerability was found in the request with id 37.
Found 1 URLs and 2 different injections points.
The URL list is:
- http://localhost/inject-me.php
The list of fuzzable requests is:
- Method: GET | http://localhost/inject-me.php/inject-me.php
- Method: GET | http://localhost/inject-me.php/inject-me.php | URL encoded form: (text)
Scan finished in 1 second.
Stopping the core...

The output plugins are also useful.

plugins
  output text_file
  output config text_file
    set output_file output-w3af.txt
    set verbose True
    back

The above will produce more greppable output. Useful if you’re spidering a whole site with thousands of pages.

$ grep ' vulnerability\]' output-w3af.txt
[Wed Aug 10 14:09:36 2016 - vulnerability] A Cross Site Scripting vulnerability was found at: "http://localhost/inject-me.php", using HTTP method GET. The sent data was: "text=" The modified parameter was "text". This vulnerability was found in the request with id 37.

Not sure why it tells you that the sent data was “text=”. It looks from the request log like it actually sent something else entirely. I think the problem may be that it’s looking for the string =" in the document.

GET /inject-me.php?text=shlj2%3C%2F-%3Eshlj2%2F%2Ashlj2%22shlj2shlj2%27shlj2shlj2%60shlj2shlj2%20%3D

So overall I’m not too impressed with the XSS detection. There are far too many false positives, the UI is rather buggy, and it’s difficult to understand exactly why it reports empty strings as injections. Hopefully someone will find this braindump useful though.


Event Based Actions to Remove Boilerplate Code

In my application I tend to have a large number of actions that fall into a set of very similar boilerplate structures. They are never close enough to use a standardised model framework and there are too many different structures to use fancy controller configuration. A useful pattern to deal with this is to use events to remove the boilerplate.

public function createSomeObjectAction() 
{
    $form = new SomeForm();
    $editor = new SomeEditor();
    $action = FormAction::create($this, $form);

    $action->onValid(function($context) use($editor) {
       $id = $editor->createSomeObject($context->getValidData());
       return $this->redirect()->toRoute('EditObject', array('id' => $id));
    });

    $action->onOther(function($context) {
       $view = new \Zend\View\Model\ViewModel();
       $view->setVariable('form', $context->getForm());
       $view->setTemplate('app/generic/bootstrap-form');
       return $view;
    });

    return $action->run();
}

It’s useful to see if you can write controller actions this way without using any if statements. This makes it easy to establish integration tests which just trigger each of the events. You also don’t need to do any deep Zend-specific configuration like you would with rendering strategies or event listeners; it’s “just PHP”.

The alternative here is to implement some interface which has its own idea of the onValid and onOther functions and to pass this to some strategy.

public function createSomeObjectAction() 
{
    $editor = new SomeEditor();
    $model = new SomeObjectActionModel();
    $model->setEditor($editor);
    return ActionModelRunner::create($model)->run();
}

While this makes the action a lot shorter, it means that the maintainer now has to understand more APIs: the controller action, the action model interface, the action model, and the action model runner. With the event based system you only really need to understand the controller action and the FormAction. The functionality is right in front of you.

I find in general that reducing the number of APIs that the maintainer has to deal with is usually a good idea up to the point where the implementation complexity is too hard to understand in one go. These actions are basically just chaining function calls together — no if statements — so abstracting things does not have any benefit.

Another use case is to deal with errors as a structure:

public function createSomeObjectAction() 
{
    return $this->handleErrors(function() {
        $model = new SomeModel();
        $model->doSomethingFailable();
        return new ViewModel();
    });
}

private function handleErrors($action) 
{
    $handler = new ApiErrorHandlerAction($action);
    $handler->on('SpecificException', function($handler, $ex) {
        return $handler->respondWithTemporaryError($ex->getMessage());
    });
    return $handler->run();
}

This is very useful when dealing with APIs which can throw errors at any point and reduces your need for the same boilerplate try-catch in every single action. Instead your intention is shown by the structure of the action.
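The handler object here is application code rather than anything from Zend, and the dispatch mechanism itself is tiny. A minimal sketch of the same idea in Python (class and function names are invented for illustration):

```python
# Register one handler per exception type instead of repeating
# try/catch boilerplate in every action.
class ErrorHandlerAction:
    def __init__(self, action):
        self._action = action
        self._handlers = {}

    def on(self, exc_type, handler):
        self._handlers[exc_type] = handler
        return self

    def run(self):
        try:
            return self._action()
        except Exception as ex:
            handler = self._handlers.get(type(ex))
            if handler is None:
                raise  # unregistered exception types still propagate
            return handler(ex)


class SpecificException(Exception):
    pass


def failing_action():
    raise SpecificException("service unavailable")


handler = ErrorHandlerAction(failing_action)
handler.on(SpecificException, lambda ex: "temporary error: %s" % ex)
print(handler.run())  # temporary error: service unavailable
```

Note that this sketch only matches exact exception types; a real version would probably walk the exception's class hierarchy.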

The drawback is that your generic action needs to be somewhat understood by the reader. This is not too bad in the case of the FormAction example, but it’s more of a problem when you write something like a PaginatedListingAction or a CreateOrEditAction which abstracts dealing with your model code. It’s important to avoid making things so abstract and deeply layered that the maintainer can’t understand how to alter the behaviour of the system.

That said, if you have a lot of actions with the same structure, this technique of declaring behaviour as events can be more understandable and more flexible than a highly engineered framework based on model interfaces.


Getting Started With SciPy

[image: user-agent-graph]

SciPy is the name for a collection of python packages used for data analysis and similar scientific pursuits. This document is a rough braindump of what it takes to get it installed in a virtual environment on Linux since I have to do it fairly frequently and usually forget some of it.

Basic Installation

First install some OS packages.

# It's way too hard to get these working in a virtualenv
sudo aptitude install python-gtk2 python-qt4
# Dependencies of some things we're going to build
sudo aptitude install libblas-dev liblapack-dev
# Many things will want to compile fortran.
sudo aptitude install gfortran

Next build the virtual environment.

Before you do this, make sure to configure the pip cache. This saves you sloooowly redownloading everything if the build doesn’t work first time.

$ cat ~/.pip/pip.conf
[global]
download_cache = ~/var/cache/pip

Then you can install.

virtualenv ~/var/venvs/pylab
~/var/venvs/pylab/bin/pip install numpy 
~/var/venvs/pylab/bin/pip install scipy
~/var/venvs/pylab/bin/pip install pandas

If you don’t get it working then google the error and you’ll almost certainly be
able to find a missing dependency.

Here is what is in my venv. (I could not get pip freeze to work while global site-packages were enabled, but you can just touch the no-global-site-packages.txt file mentioned below and it will work.)

bunker@normandie:~/var/venv/pylab$ ./bin/pip freeze -v
Jinja2==2.8
MarkupSafe==0.23
Pygments==2.0.2
argparse==1.2.1
backports.ssl-match-hostname==3.4.0.2
certifi==2015.9.6.2
decorator==4.0.4
funcsigs==0.4
functools32==3.2.3.post2
ipykernel==4.1.1
ipython==4.0.0
ipython-genutils==0.1.0
jsonschema==2.5.1
jupyter-client==4.1.1
jupyter-core==4.0.6
matplotlib==1.4.3
mistune==0.7.1
mock==1.3.0
nbconvert==4.0.0
nbformat==4.0.1
nose==1.3.7
notebook==4.0.6
numpy==1.10.1
pandas==0.17.0
path.py==8.1.2
pbr==1.8.1
pexpect==4.0.1
pickleshare==0.5
ptyprocess==0.5
pyparsing==2.0.3
python-dateutil==2.4.2
pytz==2015.6
pyzmq==14.7.0
qtconsole==4.1.0
scipy==0.16.0
simplegeneric==0.8.1
six==1.10.0
terminado==0.5
tornado==4.2.1
traitlets==4.0.0
wsgiref==0.1.2

Using OS Python for Some Things

I briefly mentioned using the OS packages for Python’s Qt and GTK bindings. If you forgot to do this then you can easily turn your virtualenv into a non-isolated one by removing the following empty file in your virtualenv:

rm ~/var/venvs/pylab/lib/python2.7/no-global-site-packages.txt

You can put it back again to reverse the operation.

If you don’t mind messing with symlinks then you can simply use ln -s for those directories in the global site-packages that you want.

Using Graphical Consoles

Install some more things:

~/var/venvs/pylab/bin/pip install ipython
~/var/venvs/pylab/bin/pip install notebook
~/var/venvs/pylab/bin/pip install qtconsole

You can pretty much explore these and see what works best for you.

Ipython is the standard command-line shell with colours and completion and similar. If you have no graphical environment then you can still write graphs to file. An image viewer like feh is useful for these.

If you install pylab then using --pylab inline will give you a bunch of matlab-like imports.

Notebook gives you an html browser and shareable notebooks.

Qtconsole is the same as ipython but has graphical output. This is particularly useful since when you generate graphs they will show up inline.

Missing Things

I did not install pylab since I didn’t think I needed it so the above pip freeze does not include this. Pylab gives you a matlab-like environment and since I’m not familiar with matlab (and merely want to write scripts that produce pretty graphs) I didn’t see a need for this. The --pylab argument for ipython needs pylab.

I found when installing it later that it was necessary to configure my PATH using the activate script from virtualenv or it wouldn’t find certain dependencies. Also cython was necessary.

~/var/venvs/pylab/bin/pip install pylab 

Example Script

Here’s the very quickly thrown together script which produced the attached image:

import numpy as np
# If no X11 DISPLAY then we need to set a backend or it won't generate the png
# import matplotlib
# matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pandas as pd

import re
import woothee
import datetime
from dateutil.parser import parse as parse_date

parts = [
r'(?P<host>\S+)', # host %h
r'\S+', # indent %l (unused)
r'(?P<user>\S+)', # user %u
r'\[(?P<time>.+)\]', # time %t
r'"(?P<request>.*)"', # request "%r"
r'(?P<status>[0-9]+)', # status %>s
r'(?P<size>\S+)', # size %b (careful, can be '-')
r'"(?P<referrer>.*)"', # referrer "%{Referer}i"
r'"(?P<agent>.*)"', # user agent "%{User-agent}i"
]
pattern = re.compile(r'\s+'.join(parts)+r'\s*\Z')

count = 0
agents = {}
times = {}

# I guess the problem here is you have to re-parse this massive set of data
# rather than just filtering it down in memory. I could probably have stored
# records in the data frame.
with open("raw") as io:
    for line in io:
        count += 1
        hit = pattern.match(line).groupdict()

        time, zone = hit['time'].split(" ")
        time = datetime.datetime.strptime(time, "%d/%b/%Y:%H:%M:%S")

        if hit['request'].startswith("GET /images/"):
            continue

        if hit['request'].startswith("GET /product_thumb.php"):
            continue

        if hit['request'].startswith("GET /js/"):
            continue

        if hit['request'].startswith("GET /css/"):
            continue

        if hit['request'].startswith("GET /scripts/"):
            continue

        agent = hit['agent']
        agent_details = woothee.parse(agent)

        agent_name = agent_details['name']
        if agent_name == "UNKNOWN":
            if agent.startswith("curl/") and 'criteo' in agent:
                agent_name = "Criteo Curl"
            elif agent.startswith("python-request/"):
                agent_name = "Probably ESCIA"
            else:
                agent_name = agent

        times.setdefault(time, {})
        times[time].setdefault(agent_name, 0)
        times[time][agent_name] += 1

        agents.setdefault(agent_name, {})
        agents[agent_name].setdefault(time, 0)
        agents[agent_name][time] += 1

        # if count > 1000:
        #     break

start_date = min(times.keys())
end_date = max(times.keys())

date = start_date.date()

start_date = datetime.datetime.combine(date, datetime.time(12, 43, 0))
end_date = datetime.datetime.combine(date, datetime.time(13, 23, 0))
dates = pd.date_range(start_date, end_date, freq='S')

tses = {}
sums = []

# There may be easier ways to do this but it's not obvious from a quick scan
# of the docs.
for name in agents:
    ser = pd.Series(agents[name], index=dates)
    sums.append((name, ser.sum()))
    tses[name] = ser

sums.sort(key=lambda elt: elt[1], reverse=True)
biggest = [name for (name, _) in sums[0:10]]

# or tses.keys() for all.
columns = biggest

# Label is so huge it's impossible to read.
df = pd.DataFrame(tses, index=dates, columns=columns)

# This does return a value, not change state. Don't get confused!
df = df.fillna(0)

# It does look somewhat crappy... but there are many other functions to help.
df.plot(figsize=(24, 8), stacked=True)

# Uses confusing global state but it works.
plt.legend(loc=2, prop={'size':6})
plt.savefig("out.png")
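As an aside, the nested setdefault bookkeeping in the loop above can be compressed with collections.defaultdict and Counter; a small sketch with made-up sample data standing in for the parsed log records:

```python
from collections import Counter, defaultdict

# Stand-in for the (time, agent_name) pairs produced by the log loop.
hits = [
    ("12:43:01", "Googlebot"),
    ("12:43:01", "Googlebot"),
    ("12:43:01", "Firefox"),
    ("12:43:02", "Firefox"),
]

# One line of bookkeeping per hit instead of three setdefault calls.
times = defaultdict(Counter)
for time, agent_name in hits:
    times[time][agent_name] += 1

print(times["12:43:01"]["Googlebot"])  # 2
print(times["12:43:02"]["Firefox"])    # 1
```

Missing keys simply count as zero, which also removes the need for the fillna(0) dance later.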

Meaning of Special Characters in Vim

You will often see characters like “^@” and “^M” in files opened by vim, but it’s not well documented how to find what they actually mean.

These are called “digraphs” and, fortunately, the full list of these is available in the vim documentation.

http://vimdoc.sourceforge.net/htmldoc/digraph.html#digraph-table

To insert digraphs, you can use CTRL+K in insert mode followed by the two-letter alias for the digraph. CTRL+V will insert the next keystroke literally (e.g. a real tab even if you have tabs expanded), and it will also insert a character by numeric value: for example CTRL+V x41 will insert a capital ‘A’.

These commands can be particularly useful in substitutions for junk characters, but if you need to type foreign characters it can be easier to set digraph and use backspace to add the accent to a character you just typed.


Merging Separate Git Repositories

On occasion I need to merge two git repositories together. Fortunately git has a solution which I usually have to google every time I do it.

Firstly, you really need an empty commit at the root; otherwise you get a strange commit in the repository where files are still in their original location. That can cause you problems later, since this commit will conflict with basically everything.

git checkout --orphan newroot
git rm -rf .

git commit --allow-empty -m "Root commit"
git rebase --onto newroot --root master
git branch -d newroot

Source: http://stackoverflow.com/questions/645450/insert-a-commit-before-the-root-commit-in-git

Moving the files themselves is also easy by using a tree-filter.

git filter-branch --prune-empty \
  --tree-filter 'mkdir -p new-subdir; \
  git ls-tree --name-only $GIT_COMMIT | \
  xargs -I FILES mv FILES new-subdir'

Note that this is a tree-filter which involves checking out each revision in order to work on it and is therefore quite slow for some repositories. You can probably optimise this into an index-filter which would not do that.

Since the repositories won’t conflict now (I assume that new-subdir does not exist in the repository you’re merging into), a simple git pull will merge the filtered repository into the other repository.
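One wrinkle: newer versions of git refuse to join unrelated histories by default, so the pull may need --allow-unrelated-histories. Here is a self-contained sketch of the merge step using two throwaway repositories (all paths and names are invented for the demo):

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Two independent repositories with one commit each.
for repo in alpha beta; do
    mkdir "$repo"
    ( cd "$repo" \
      && git init -q \
      && echo "$repo" > "$repo.txt" \
      && git add . \
      && git -c user.name=demo -c user.email=demo@example.com \
             commit -qm "initial $repo" )
done

# Merge beta's history into alpha.
cd alpha
git fetch -q ../beta HEAD
git -c user.name=demo -c user.email=demo@example.com \
    merge -q --allow-unrelated-histories -m "Merge beta" FETCH_HEAD

ls  # alpha.txt and beta.txt now live side by side, with full history
```

In a real merge you would point git at the filtered repository instead of the throwaway beta directory.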

The commit history will look a bit weird because the dates will be out of order but you will still have the full history.


Testing Upload Files In Zend Framework 2

I ran into a problem trying to write controller tests for views which do file uploads. Unfortunately validation seems to fail every time for mocked requests, with the unhelpful message “File was illegally uploaded. This could be a possible attack.”

Best I could find on the Internet:

if (IN_UNIT_TESTS) {
  $form->getInputFilter()->remove('name-of-upload-field');
}

Unfortunately this appears to break collections.

The problem is that ZF2 is unconditionally using is_uploaded_file in the UploadFile validator. (This is really a non-unit dependency so it should be mocked by default but it isn’t.) The FileInput input filter prepends that validator to the form without any option to override it. The File form element itself asks for the FileInput input filter by using its filter specification.

A solution is tricky because Zend Form’s heavily abstracted design means that the call to “is_uploaded_file” is buried many levels deep in Zend code. I think this kind of design is really only acceptable when you own the code and can get rid of these flaws fairly quickly. With Zend Form it means you effectively end up with your own form library on top of Zend’s, which rather defeats the object after a while.

The best workaround I found (once I worked out what was going on) was to replace the upload field entirely:

namespace FixedForms;

class FixedUploadFilter extends \Zend\InputFilter\FileInput {
  public function __construct() {
    $unhack 
      = defined('FIXEDFORMS_ENABLE_TEST_HACKS') 
      && FIXEDFORMS_ENABLE_TEST_HACKS;
    $this->setAutoPrependUploadValidator(! $unhack);
  }
}

class FixedUploadField extends \Zend\Form\Element\File {
  public function getInputSpecification() {
    return array(
      'type' => 'FixedForms\FixedUploadFilter',
      'name' => $this->getName(),
      'required' => false,
    );
  }
}

There may be a way to set the “auto prepend” property from the filter specification, meaning you don’t need the extra InputFilter class, but I lost patience before finding it.

This is an extra intrusive layer, but hopefully it will save you a few hours trying to hack around this problem.
