python AST vs XML

schettino72

2014-09-16 09:46

One of the things that I really like in python is its introspection capabilities. It goes as far as exposing its own syntax tree with the ast module.

AST

The ast module is usually used for programmatic analysis, generation, and refactoring/transformation of python code. But is it the right tool for the job?

API

The ast module provides a very simple API: NodeVisitor subclasses get a method called for every node in the tree, and a similar NodeTransformer class lets you modify nodes in place. This API feels a lot like SAX (Simple API for XML), an event-based, sequential-access parser API.
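A minimal sketch of both classes, using only the stdlib ast module. The class names FunctionCollector and RenameName and the sample source are made up for illustration. ast.unparse, used here to turn the modified tree back into code, needs Python 3.9+.

```python
import ast

SOURCE = """
def greet(name):
    return "hello " + name

def shout(name):
    return greet(name).upper()
"""


class FunctionCollector(ast.NodeVisitor):
    """Collect the name of every function definition."""

    def __init__(self):
        self.names = []

    # visit_<NodeClass> is called for each node of that class
    def visit_FunctionDef(self, node):
        self.names.append(node.name)
        self.generic_visit(node)  # keep walking into nested nodes


class RenameName(ast.NodeTransformer):
    """Rename one identifier (parameters and uses); returning the node keeps it."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_arg(self, node):  # function parameters
        if node.arg == self.old:
            node.arg = self.new
        return node

    def visit_Name(self, node):  # uses of the name in expressions
        if node.id == self.old:
            node.id = self.new
        return node


tree = ast.parse(SOURCE)
collector = FunctionCollector()
collector.visit(tree)
print(collector.names)  # ['greet', 'shout']

tree = RenameName("name", "who").visit(tree)
print(ast.unparse(tree))
```

Note that the visitor is driven entirely by node type, just like SAX callbacks: there is no way to say "give me the Name nodes inside functions that call upper()" without writing that bookkeeping by hand.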

SAX style is simple, efficient and very useful in some situations, but not very powerful compared to DOM or other APIs that support XPath. XPath is a very expressive query language for selecting nodes. In python there are several libraries that support it (a limited subset in the stdlib's xml.etree.ElementTree, full support in lxml, ...).
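To get a feel for what XPath-style querying of code could look like, here is a hypothetical, lossy conversion of an ast tree to XML (this is not py2xml). The function name ast_to_xml and the element layout (one element per node, named after the node class, plain str/int/float fields stored as attributes) are assumptions for illustration. Queries use the limited XPath subset supported by ElementTree's findall()/find().

```python
import ast
import xml.etree.ElementTree as ET


def ast_to_xml(node):
    """Hypothetical, lossy conversion: one element per AST node, named after
    the node class; plain str/int/float fields become attributes."""
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, (str, int, float)):
            elem.set(field, str(value))
        elif isinstance(value, ast.AST):
            elem.append(ast_to_xml(value))
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, ast.AST):
                    elem.append(ast_to_xml(item))
    return elem


source = '''
def load(path):
    return open(path).read()

def greet(name):
    return "hello " + name
'''
root = ast_to_xml(ast.parse(source))

# every function definition, at any depth
names = [f.get("name") for f in root.findall(".//FunctionDef")]
print(names)  # ['load', 'greet']

# functions containing a call whose callee is the plain name `open`
calls_open = [f.get("name") for f in root.findall(".//FunctionDef")
              if f.find(".//Call/Name[@id='open']") is not None]
print(calls_open)  # ['load']
```

The second query is a one-line path expression; the equivalent NodeVisitor needs state to remember which function it is currently inside.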

Comments and Formatting

The ast throws away all information that is not important for the compiler, like code comments and formatting (whitespace, newlines, etc.). This makes it very hard to modify existing code without messing up other parts of the original source code.
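A quick way to see this with the stdlib: parse a snippet and turn the tree back into source with ast.unparse (Python 3.9+). The comment and the extra spaces are gone. The tokenize module, which works one level below the parser, still reports the comment as a COMMENT token.

```python
import ast
import io
import tokenize

source = "x  =  1  # the answer, almost\n"

# round-trip through the AST: only what the compiler needs survives
regenerated = ast.unparse(ast.parse(source))
print(repr(regenerated))  # 'x = 1'

# the tokenizer still sees the comment as its own token
comments = [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type == tokenize.COMMENT]
print(comments)  # ['# the answer, almost']
```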

The node tree

The AST is designed to be used by the compiler, so its node tree might not be optimal for other uses...

py2xml

py2xml is a tool to convert python code into XML. It is already good enough to do a lossless (preserving formatting and comments) round-trip conversion of python to XML and back to python.
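py2xml's own XML format is not shown here. As a rough sketch of why a lossless round-trip is possible at all, the hypothetical functions py_to_xml and xml_to_py below work at the token level with the stdlib tokenize module: every token is stored together with the exact text that preceded it (whitespace, line continuations), so concatenating the pieces in document order gives back the original source. This only illustrates the lossless idea; a flat token list is not the query-friendly structure the post is aiming for.

```python
import io
import tokenize
import xml.etree.ElementTree as ET


def py_to_xml(source):
    """Hypothetical: one <tok> per token, each with a <ws> child holding the
    text between the previous token and this one, and a <text> child."""
    # absolute offset of the start of each line, as tokenize counts lines
    line_starts = [0]
    for line in io.StringIO(source):
        line_starts.append(line_starts[-1] + len(line))

    def offset(pos):
        row, col = pos
        return line_starts[row - 1] + col

    root = ET.Element("module")
    prev_end = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        start, end = offset(tok.start), offset(tok.end)
        el = ET.SubElement(root, "tok", type=tokenize.tok_name[tok.type])
        ET.SubElement(el, "ws").text = source[prev_end:start]
        ET.SubElement(el, "text").text = source[start:end]
        prev_end = max(prev_end, end)
    ET.SubElement(root, "tail").text = source[prev_end:]
    return ET.tostring(root, encoding="unicode")


def xml_to_py(xml_text):
    """Concatenate every ws/text/tail element in document order."""
    root = ET.fromstring(xml_text)
    return "".join(el.text or "" for el in root.iter()
                   if el.tag in ("ws", "text", "tail"))


source = "def f(x):\n    # keep me\n    return x  +  1\n"
xml_text = py_to_xml(source)
print(xml_to_py(xml_text) == source)  # True
```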

py2xml is still WIP (work in progress). The next step is to define an XML structure that makes it easy to query and transform source code. Development happens on GitHub.

Merging dicts using singledispatch

schettino72

2013-12-15 00:00


I am working on some code that reads config values from different sources. I would like to merge them in a way that values that are themselves dicts get the items from both sources, something like:

source_1 = {'foo': 10, 'options': {'a': 1, 'b': 1}}
source_2 = {'foo': 20, 'options': {'a': 2, 'c': 3}}

# merging source_1 and source_2 would give me:
# {'foo': 20, 'options': {'a': 2, 'b': 1, 'c': 3}}

Notice how this is different from a simple dict.update(): the merged options still contains the item 'b' from source_1. Apart from that, I would like lists to be merged, that is, merging [1, 2] and [3, 4] would result in [1, 2, 3, 4].

I found some implementations, but it seems that everybody has a different use case for what exactly "merge" should do. So I decided to create a MergeDict that can be easily extended and configured.

singledispatch

The merge operation depends on the type of the item, which makes it a perfect opportunity to use singledispatch. Python 3.4 added functools.singledispatch (see PEP 443). I am not using Python 3.4 yet, but there is also a backport package on PyPI.

Single dispatch is a form of generic programming where you register different functions to be executed depending on the type of the first argument.

Here is a simple example:

try:
    from functools import singledispatch  # Python 3.4+
except ImportError:
    from singledispatch import singledispatch  # backport from PyPI

# the function to be executed by default gets a singledispatch decorator
@singledispatch
def fun(arg):
    print("default: {}".format(arg))

# register alternate function when argument is an int
@fun.register(int)
def _(arg):
    print("int: {}".format(arg))

# register alternate function when argument is a list
@fun.register(list)
def _(arg):
    print("list: {}".format(" ".join(str(a) for a in arg)))


fun('hi')
# default: hi

fun(1)
# int: 1

fun([1,2,3])
# list: 1 2 3

MergeDict

A MergeDict defines a merge() method. Each item is merged using the merge_value() method; by default it works the same as update(), just replacing the value...

class MergeDict(dict):

    def merge(self, other):
        """merge other dict into self"""
        class Sentinel: pass  # unique marker: tells a missing key apart from a None value
        for key, other_value in other.items():
            this_value = self.get(key, Sentinel)
            if this_value is Sentinel:
                self[key] = other_value
            else:
                self[key] = self.merge_value(this_value, other_value)

    @staticmethod
    def merge_value(this, other):
        """default merge operation, just replace the value"""
        return other

Applying singledispatch to methods is a bit tricky, because it dispatches on the type of the first argument, which for a method is always self. Ideally, users would sub-class MergeDict and define new merge methods with a decorator similar to singledispatch.
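A small demonstration of the problem, with made-up class names Naive and Fixed. Newer Python versions (3.8+) also ship functools.singledispatchmethod, which skips self and dispatches on the next argument; it did not exist when mergedict was written, and it is shown here only for comparison.

```python
from functools import singledispatch, singledispatchmethod


class Naive:
    # plain singledispatch dispatches on the type of the first argument,
    # which for a method is always `self`, so the int version never runs
    @singledispatch
    def describe(self, value):
        return "default"

    @describe.register(int)
    def _(self, value):
        return "int"


class Fixed:
    # singledispatchmethod (Python 3.8+) skips `self` and dispatches on `value`
    @singledispatchmethod
    def describe(self, value):
        return "default"

    @describe.register(int)
    def _(self, value):
        return "int"


print(Naive().describe(1))    # default
print(Fixed().describe(1))    # int
print(Fixed().describe("x"))  # default
```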

The idea is to only annotate the methods in the class body, and do the actual dispatch registration on object initialization.

class MergeDict(dict):
    # merge() and merge_value() as defined above, plus:

    def __init__(self, *args, **kwargs):
        super(MergeDict, self).__init__(*args, **kwargs)
        # wrap merge_value() with single dispatch, then register every
        # method of the class marked with @MergeDict.dispatch(type)
        self.merge_value = singledispatch(self.merge_value)
        for val in self.__class__.__dict__.values():
            _type = getattr(val, 'merge_dispatch', None)
            if _type:
                self.merge_value.register(_type, val)


    class dispatch:
        """decorator to mark methods as single dispatch functions."""
        def __init__(self, _type):
            self._type = _type
        def __call__(self, func):
            # only remember the type here; registration happens in __init__
            func.merge_dispatch = self._type
            return func

An example of a merge that sums up int values:

from mergedict import MergeDict

class SumDict(MergeDict):

    @MergeDict.dispatch(int)
    def merge_int(this, other):
        return this + other

a = SumDict({'a':2, 'b':3})
a.merge({'a':2, 'c':5})
print(a)
# {'a': 4, 'b': 3, 'c': 5}
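To tie this back to the merge described at the start of the post, here is a sketch of a subclass that merges nested dicts recursively and concatenates lists. To keep the block self-contained it repeats a condensed MergeDict from above, using functools.singledispatch (Python 3.4+) instead of the backport. ConfigDict, merge_dict and merge_list are hypothetical names, not part of the mergedict package.

```python
from functools import singledispatch


class MergeDict(dict):
    """Condensed copy of the MergeDict built in this post."""

    def __init__(self, *args, **kwargs):
        super(MergeDict, self).__init__(*args, **kwargs)
        # wrap merge_value() and register the methods marked with @dispatch
        self.merge_value = singledispatch(self.merge_value)
        for val in self.__class__.__dict__.values():
            _type = getattr(val, 'merge_dispatch', None)
            if _type:
                self.merge_value.register(_type, val)

    class dispatch:
        """decorator to mark methods as single dispatch functions."""
        def __init__(self, _type):
            self._type = _type

        def __call__(self, func):
            func.merge_dispatch = self._type
            return func

    def merge(self, other):
        """merge other dict into self"""
        class Sentinel: pass
        for key, other_value in other.items():
            this_value = self.get(key, Sentinel)
            if this_value is Sentinel:
                self[key] = other_value
            else:
                self[key] = self.merge_value(this_value, other_value)

    @staticmethod
    def merge_value(this, other):
        """default merge operation, just replace the value"""
        return other


class ConfigDict(MergeDict):
    """Hypothetical subclass: nested dicts are merged recursively and lists
    are concatenated; everything else is replaced. Assumes both values for a
    key have compatible types (a dict is only merged with a dict)."""

    @MergeDict.dispatch(dict)
    def merge_dict(this, other):
        merged = ConfigDict(this)  # work on a copy, leave the original untouched
        merged.merge(other)
        return dict(merged)

    @MergeDict.dispatch(list)
    def merge_list(this, other):
        return this + other


source_1 = {'foo': 10, 'options': {'a': 1, 'b': 1}, 'paths': [1, 2]}
source_2 = {'foo': 20, 'options': {'a': 2, 'c': 3}, 'paths': [3, 4]}

config = ConfigDict(source_1)
config.merge(source_2)
print(config)
# {'foo': 20, 'options': {'a': 2, 'b': 1, 'c': 3}, 'paths': [1, 2, 3, 4]}
```

Each new merge rule is just one decorated method, which is the kind of easy extension the MergeDict design is after.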

TL;DR

Check out mergedict on PyPI.

github pull request workflow

schettino72

2013-07-22 10:00


A quick reference so I don't need to re-learn these steps every time I do a pull request...

fork the repo on GitHub
clone the repo on your machine

git clone xxx

create a branch

git branch fix-something

use the branch

git checkout fix-something

hack something then commit

git commit -a

push new branch

git push origin fix-something

on GitHub, click create pull request (or something like that)