python AST vs XML

One of the things that I really like in python is its introspection capabilities. It goes as far as exposing its own syntax tree with the ast module.

AST

The ast module is usually used for programmatic analysis, generation and refactoring/transformation of python code. But is it the right tool for the job?

API

The ast module provides a very simple API: you subclass NodeVisitor and its visit methods are called for every node in the tree. It also provides a similar NodeTransformer to modify the tree. This API feels a lot like SAX (Simple API for XML), an event-based, sequential-access parser API.
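To make the SAX comparison concrete, here is a minimal sketch of a NodeVisitor. The class name FunctionLister and the sample source are made up for illustration; only ast.parse, ast.NodeVisitor and generic_visit come from the stdlib.

```python
import ast


class FunctionLister(ast.NodeVisitor):
    """Collect the name of every function defined in a module."""

    def __init__(self):
        self.names = []

    def visit_FunctionDef(self, node):
        # called once for each "def" found while walking the tree
        self.names.append(node.name)
        # keep walking so nested functions are visited too
        self.generic_visit(node)


source = """
def outer():
    def inner():
        pass
"""

lister = FunctionLister()
lister.visit(ast.parse(source))
print(lister.names)
# ['outer', 'inner']
```

Like a SAX handler, the visitor only reacts to nodes as they come by. Any context (for example "am I inside a class?") has to be tracked by hand.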

SAX style is simple, efficient and very useful in some situations, but not very powerful compared to DOM or other APIs that support XPath. XPath is a very expressive query language for selecting nodes. In python there are several libraries that support it (ElementTree in the stdlib supports a subset, lxml, ...)
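For comparison, a small sketch of XPath-style queries with the stdlib ElementTree. The XML layout below (module/class/function elements with a name attribute) is invented for this example and is not the format py2xml produces.

```python
import xml.etree.ElementTree as ET

# hypothetical XML description of a python module
doc = ET.fromstring("""
<module>
  <function name="outer">
    <function name="inner"/>
  </function>
  <class name="Foo">
    <function name="method"/>
  </class>
</module>
""")

# every <function> element anywhere below the root
all_functions = [f.get('name') for f in doc.findall('.//function')]

# only functions that are direct children of the class named Foo
foo_methods = [f.get('name')
               for f in doc.findall(".//class[@name='Foo']/function")]

print(all_functions)
# ['outer', 'inner', 'method']
print(foo_methods)
# ['method']
```

The second query expresses "inside class Foo" in a single line, where a visitor would need extra state to know which class it is currently in.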

Comments and Formatting

The ast throws away all information that is not important for the compiler, like code comments and formatting (white-space, new-lines, etc). This makes it very hard to modify existing code without messing up other parts of the original source code.
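A quick way to see this, as a sketch: parse a line that has a comment and odd spacing, then look at what is left in the tree. The ast.unparse part is guarded because that function only exists on python 3.9 and later.

```python
import ast

source = "x   =   1  # the answer\n"
tree = ast.parse(source)

# the dump of the tree has no trace of the comment text
dumped = ast.dump(tree)
print("the answer" in dumped)
# False

# ast.unparse was added in python 3.9
if hasattr(ast, "unparse"):
    regenerated = ast.unparse(tree)
    # comment and original spacing are both gone
    print(regenerated)
    # x = 1
```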

The node tree

The AST tree is designed to be used by the compiler. It might not be optimal for other uses...

py2xml

py2xml is a tool to convert python code into XML. It is already good enough to do lossless (preserving formatting and comments) round-trip conversion of python to XML and back to python.

py2xml is still WIP (work in progress). The next step is to define an XML structure that makes it easy to query and transform source code. Development is going on github.

Merging dicts using singledispatch

I am working on some code that reads config values from different sources. I would like to merge them in a way that values that are themselves dicts get the items from both sources, something like:

source_1 = {'foo': 10, 'options': {'a': 1, 'b': 1}}
source_2 = {'foo': 20, 'options': {'a': 2, 'c': 3}}

# merging source_1 and source_2 would give me:
{'foo': 20, 'options': {'a': 2, 'b': 1, 'c': 3}}

Notice how this is different from a simple dict.update(): the merged options still contains the item 'b' from source_1. Apart from that I would like lists to be merged, that is merging [1, 2] and [3, 4] would result in [1, 2, 3, 4].
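For reference, this is what a plain dict.update() does with the same data. The nested options dict from the second source replaces the first one wholesale, so 'b' is lost:

```python
source_1 = {'foo': 10, 'options': {'a': 1, 'b': 1}}
source_2 = {'foo': 20, 'options': {'a': 2, 'c': 3}}

# copy first so source_1 itself is not modified
updated = dict(source_1)
updated.update(source_2)

print(updated)
# {'foo': 20, 'options': {'a': 2, 'c': 3}}
```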

I found some implementations but it seems that everybody has a different use-case for what "merge" should exactly do. So I decided to create a MergeDict that can be easily extended/configured.

singledispatch

The merge operation depends on the type of the item being merged. That makes a perfect opportunity to use singledispatch. Python 3.4 added support for single dispatch (see PEP 443). I am not using python 3.4 yet but there is also a backport package on pypi.

Single dispatch is a form of generic programming where you register different functions to be executed depending on the type of the first argument.

Here is a simple example:

from singledispatch import singledispatch

# the function to be executed by default gets a singledispatch decorator
@singledispatch
def fun(arg):
    print("default: {}".format(arg))

# register alternate function when argument is an int
@fun.register(int)
def _(arg):
    print("int: {}".format(arg))

# register alternate function when argument is a list
@fun.register(list)
def _(arg):
    print("list: {}".format(" ".join(str(a) for a in arg)))


fun('hi')
# default: hi

fun(1)
# int: 1

fun([1,2,3])
# list: 1 2 3

MergeDict

A MergeDict defines a merge() method. Each item is merged using the merge_value() function. By default it works the same as update(), just replacing the value...

class MergeDict(dict):

    def merge(self, other):
        """merge other dict into self"""
        class Sentinel: pass
        for key, other_value in other.items():
            this_value = self.get(key, Sentinel)
            if this_value is Sentinel:
                self[key] = other_value
            else:
                self[key] = self.merge_value(this_value, other_value)

    @staticmethod
    def merge_value(this, other):
        """default merge operation, just replace the value"""
        return other

Applying singledispatch to methods is a bit tricky, but ideally users would sub-class MergeDict and define new merge functions for specific types with a decorator similar to singledispatch.

The idea is to just annotate the methods when the class is defined, and really register the dispatch only on object initialization.

class MergeDict(dict):

    def __init__(self, *args, **kwargs):
        super(MergeDict, self).__init__(*args, **kwargs)
        # register single dispatch methods
        self.merge_value = singledispatch(self.merge_value)
        for val in self.__class__.__dict__.values():
            _type = getattr(val, 'merge_dispatch', None)
            if _type:
                self.merge_value.register(_type, val)


    class dispatch:
        """decorator to mark methods as single dispatch functions."""
        def __init__(self, _type):
            self._type = _type
        def __call__(self, func):
            func.merge_dispatch = self._type
            return func

An example of a merge that sums up int values:

from mergedict import MergeDict

class SumDict(MergeDict):

    @MergeDict.dispatch(int)
    def merge_int(this, other):
        return this + other

a = SumDict({'a':2, 'b':3})
a.merge({'a':2, 'c':5})
print(a)
# {'a': 4, 'b': 3, 'c': 5}
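Putting the pieces together for the config use-case from the beginning of the post. This is a self-contained sketch, not necessarily how the mergedict package does it: it uses functools.singledispatch (python 3.4+) instead of the pypi package, merge() is condensed to a key check instead of a sentinel, and ConfigDict, its method names and the 'paths' key are made up for illustration.

```python
from functools import singledispatch


class MergeDict(dict):
    """Condensed version of the MergeDict described in this post."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # each instance gets its own dispatcher, default is merge_value
        self.merge_value = singledispatch(self.merge_value)
        # only methods defined directly on the instance's class are scanned
        for val in self.__class__.__dict__.values():
            _type = getattr(val, 'merge_dispatch', None)
            if _type:
                self.merge_value.register(_type, val)

    def merge(self, other):
        """merge other dict into self"""
        for key, other_value in other.items():
            if key in self:
                self[key] = self.merge_value(self[key], other_value)
            else:
                self[key] = other_value

    @staticmethod
    def merge_value(this, other):
        """default merge operation, just replace the value"""
        return other

    class dispatch:
        """decorator to mark methods as single dispatch functions."""
        def __init__(self, _type):
            self._type = _type

        def __call__(self, func):
            func.merge_dispatch = self._type
            return func


class ConfigDict(MergeDict):
    """Hypothetical subclass: concatenate lists, merge nested dicts."""

    @MergeDict.dispatch(list)
    def merge_list(this, other):
        return this + other

    @MergeDict.dispatch(dict)
    def merge_dict(this, other):
        # recurse by wrapping the nested dict in a new ConfigDict
        merged = ConfigDict(this)
        merged.merge(other)
        return merged


config = ConfigDict({'foo': 10, 'options': {'a': 1, 'b': 1}, 'paths': [1, 2]})
config.merge({'foo': 20, 'options': {'a': 2, 'c': 3}, 'paths': [3, 4]})
print(config)
# {'foo': 20, 'options': {'a': 2, 'b': 1, 'c': 3}, 'paths': [1, 2, 3, 4]}
```

Note that the dispatch happens on the type of the existing value (the first argument), so a list is only concatenated when the key already holds a list.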

TL;DR;

Check mergedict on pypi.

github pull request workflow

A quick reference so I don't need to re-learn these steps every time I do a pull-request...

  • fork the repo on github

  • clone your fork on your machine

git clone xxx

  • create a branch

git branch fix-something

  • use the branch

git checkout fix-something

  • hack something then commit

git commit -a

  • push new branch

git push origin fix-something

  • on github click create pull request (or something like that)