power up your tools

schettino72

2012-12-06 16:36

Goal

pyflakes is a static checker for python. It is great but it lacks a few features:

no parallel multi-processing execution
no cache to avoid checking files that were not modified
no option to have a long running process that watches the file system and automatically re-executes the checker when files are modified

These features are not specific to a static checker and would be nice to have in many other tools. This post shows how to create an application that adds those features. It uses pyflakes as an example, but the same approach could be applied to other tools.

static checker tasks

doit is an "automation tool" that can execute tasks and provides the features we are interested in. So the first step is to transform pyflakes operations into doit tasks.

In doit a task is composed of actions, and some other metadata.

actions

The action describes what the task does (some code to be executed). It can be a python function...

For pyflakes the function pyflakes.scripts.pyflakes.checkPath checks a file and returns the number of flakes or warnings found, so it is successful when zero flakes were found.

The action's return value indicates whether the execution was successful, so the action must return True when zero flakes are found.

pyflakes checker action:

from pyflakes.scripts.pyflakes import checkPath

def check_path(filename):
    return not bool(checkPath(filename))

dependencies

A dependency indicates something that is required for a task execution.

For the pyflakes task, the file being checked is a file dependency for the task. That is, the task uses the file as input for its execution.

Explicit information about task dependencies is important for two reasons:

1) cache results: if the dependencies were not modified since the last time the task was executed, doit can use the results saved in a cache instead of re-executing the task.

2) execution order: when several tasks are executed, the dependencies contain information on the order in which the tasks should be executed and whether they can be executed simultaneously (in parallel).
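
The caching idea in point 1 can be illustrated with a minimal sketch. This is not doit's actual implementation; the helper names (file_signature, is_up_to_date, record_success) and the in-memory dict used as a cache are assumptions made only for illustration. A task counts as up-to-date when the content signature of every file dependency matches what was recorded after its last successful run.

```python
import hashlib
import os
import tempfile


def file_signature(path):
    """Return a SHA-256 hex digest of the file content."""
    with open(path, 'rb') as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def signatures(file_deps):
    """Map every file dependency to its current signature."""
    return {path: file_signature(path) for path in file_deps}


def is_up_to_date(cache, task_name, file_deps):
    """True if the task succeeded before and no dependency changed since."""
    return cache.get(task_name) == signatures(file_deps)


def record_success(cache, task_name, file_deps):
    """Remember the dependency signatures of a successful run."""
    cache[task_name] = signatures(file_deps)


def demo():
    """Check, record, modify, check again; return the three answers."""
    cache = {}  # hypothetical stand-in for a persistent store
    name = 'pyflakes:module.py'
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'module.py')
        with open(path, 'w') as handle:
            handle.write('import os\n')
        before = is_up_to_date(cache, name, [path])      # never executed
        record_success(cache, name, [path])
        after_run = is_up_to_date(cache, name, [path])   # nothing changed
        with open(path, 'a') as handle:
            handle.write('import sys\n')
        after_edit = is_up_to_date(cache, name, [path])  # file was modified
    return before, after_run, after_edit


print(demo())  # (False, True, False)
```

Only the second check reports the task as up-to-date, so only in that case could the task be skipped.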

task creation

In doit, task creation is usually done in a python module. This module contains functions that return/yield new tasks as a dict with metadata. It can also contain some extra configuration.

Something like:

import os
import glob

from pyflakes.scripts.pyflakes import checkPath


DOIT_CONFIG = {
    # output from actions should be sent to the terminal/console
    'verbosity': 2,
    # does not stop execution on first task failure
    'continue': True,
    # doit itself should not produce any output (use only actions output)
    'reporter': 'zero',
    # use multi-processing / parallel execution
    'num_process': 2,
    }


def check_path(filename):
    """execute pyflakes checker"""
    return not bool(checkPath(filename))

def task_pyflakes():
    """generate task for each file to be checked"""
    for filename in glob.glob('*.py'):
        path = os.path.abspath(filename)
        yield {
            'name': path, # name required to identify tasks
            'file_dep': [path], # file dependency
            'actions': [(check_path, (filename,)) ],
            }

If you drop the content above into a file called dodo.py in a folder and execute doit from the command line in that folder, pyflakes is executed on all python modules in it.

It already has multi-process support, and it executes only tasks that failed on the previous run or whose checked module file has changed.

To run it as a long-running process that watches the file system and re-executes tasks when files are modified, execute it as doit auto.
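
To give an idea of what "watching the file system" involves, here is a minimal polling sketch. It is only an illustration: doit auto may use a different mechanism (such as operating-system file notifications), and the names snapshot and changed_since are hypothetical. The sketch compares modification times between two looks at the same files.

```python
import os
import tempfile


def snapshot(paths):
    """Map each existing path to its last-modification time."""
    return {path: os.path.getmtime(path)
            for path in paths if os.path.exists(path)}


def changed_since(previous, paths):
    """Return (changed_paths, new_snapshot) compared to a previous snapshot."""
    current = snapshot(paths)
    changed = sorted(path for path in current
                     if previous.get(path) != current[path])
    return changed, current


def demo():
    """Look at a file twice: once untouched, once after a simulated edit."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'module.py')
        with open(path, 'w') as handle:
            handle.write('import os\n')
        seen = snapshot([path])
        unchanged, seen = changed_since(seen, [path])
        # simulate an edit by moving the modification time forward
        stamp = os.path.getmtime(path) + 10
        os.utime(path, (stamp, stamp))
        changed, seen = changed_since(seen, [path])
        return unchanged, [os.path.basename(p) for p in changed]


print(demo())  # ([], ['module.py'])
```

A real watcher would call changed_since in a loop, sleeping between iterations and re-executing the tasks that depend on the changed files.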

creating a new tool

Creating tasks in dodo.py works fine if you are working on your own project. But sometimes you just want to create a new application that you can easily distribute to other users without requiring them to add any special file into their project.

The latest doit release (0.18) exposes some of its internal API, so you can create new applications and still use its task execution model.

The idea is that instead of loading tasks from a dodo.py module, the application itself creates the tasks and executes doit.

custom task loader

To create a custom task loader you should subclass doit.cmd_base.TaskLoader and implement the method load_tasks.

The pyflakes command line is extremely simple. It doesn't even support a --help option. It only takes positional parameters that specify python modules or folders containing python modules.

The first change from the previous example using dodo.py is to get a list of files in the same way pyflakes does instead of just getting all modules from a folder...

So on __init__ the loader will find the list of files to be checked.

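import os
import sys

from doit.cmd_base import TaskLoader
from doit.loader import generate_tasks
from pyflakes.scripts.pyflakes import checkPath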

class FlakeTaskLoader(TaskLoader):
    """create pyflakes tasks on the fly based on cmd-line arguments"""

    def __init__(self, args):
        """set list of files to be checked
        @param args (list - str) file/folder path to apply pyflakes
        """
        self.files = []
        for arg in args:
            if os.path.isdir(arg):
                for dirpath, dirnames, filenames in os.walk(arg):
                    for filename in filenames:
                        if filename.endswith('.py'):
                            self.files.append(os.path.join(dirpath, filename))
            else:
                self.files.append(arg)

        for path in self.files:
            if not os.path.exists(path):
                sys.stderr.write('%s: No such file or directory\n' % path)

Then we need to change the function generating tasks to use the computed files instead of using glob.

    @staticmethod
    def check_path(filename):
        """execute pyflakes checker"""
        return not bool(checkPath(filename))

    def _gen_tasks(self):
        """generate doit tasks for each file to be checked"""
        for filename in self.files:
            path = os.path.abspath(filename)
            yield {
                'name': path,
                'file_dep': [path],
                'actions': [(self.check_path, (filename,))],
                }

Usually doit creates a file named .doit.db to store information used to determine which tasks are up-to-date. This file is created in the same location as the dodo.py file. Since our new application works independently of the current path, we specify that the store file (dep_file) is saved in the user's home directory.

    DOIT_CONFIG = {
        'verbosity': 2,
        'continue': True,
        'reporter': 'zero',
        'dep_file': os.path.join(os.path.expanduser("~"), '.doflakes'),
        'num_process': 2,
        }

And finally we implement the TaskLoader interface:

    def load_tasks(self, cmd, params, args):
        """implements loader interface, return (tasks, config)"""
        return generate_tasks('pyflakes', self._gen_tasks()), self.DOIT_CONFIG

For the command line implementation we just need to pass the positional parameters to the custom task loader. We also add a -w command line option to use the watch mode, where the process keeps waiting for modifications in the file system.

import sys

from doit.doit_cmd import DoitMain

if __name__ == "__main__":
    cmd = 'run' # default doit command
    args = [] # list of positional args from cmd line
    for arg in sys.argv[1:]:
        if arg == '-w': # watch for changes
            cmd = 'auto'
        else:
            args.append(arg)
    doit_main = DoitMain(FlakeTaskLoader(args))
    sys.exit(doit_main.run([cmd]))
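
The argument handling can also be isolated in a small function, which makes it possible to check it without starting doit. This is only a sketch of the same loop; the function name split_args is hypothetical and not part of doit or pyflakes.

```python
def split_args(argv):
    """Separate the -w flag from the positional paths.

    Returns (cmd, paths): cmd is 'auto' when -w is present, 'run' otherwise.
    """
    cmd = 'run'  # default doit command
    paths = []   # positional args: files/folders to be checked
    for arg in argv:
        if arg == '-w':  # watch for changes
            cmd = 'auto'
        else:
            paths.append(arg)
    return cmd, paths


print(split_args(['-w', 'pkg', 'setup.py']))  # ('auto', ['pkg', 'setup.py'])
```

The main block would then call split_args(sys.argv[1:]) and pass the resulting paths to the task loader and the command to doit.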

The full code can be found here: doflakes.py.