
mongodb setup for deployd on heroku

I am using deployd to prototype an application. It's been great and has really helped me focus on what matters for the prototype.

Today I reached the first milestone and decided to deploy it on Heroku.

I had a small problem because deployd uses an object to configure MongoDB, while Heroku provides only a URL for the MongoDB server... So here is the script I am using to run deployd on Heroku:

var deployd = require('deployd');

// on heroku must use port from env
var port = process.env.PORT || 3000;

var url = require('url');
// MONGOHQ_URL is set by the Heroku MongoHQ add-on; the local fallback
// keeps the ":@" part so db_url.auth is never null below
var db_url = url.parse(
    process.env.MONGOHQ_URL || "mongodb://:@localhost:27017/my_db_name");

var options = {
    port: port,
    db: {
        "host": db_url.hostname,
        "port": parseInt(db_url.port, 10),
        "name": db_url.pathname.slice(1), // drop the leading "/"
        "credentials": {
            "username": db_url.auth.split(':')[0],
            "password": db_url.auth.split(':')[1]
        }
    }
};

var server = deployd(options);
server.listen();

server.on('listening', function() {
  console.log("Server is listening on " + port);
});

server.on('error', function(err) {
  console.error(err);
  process.nextTick(function() { // Give the server a chance to return an error
    process.exit();
  });
});
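
To deploy this you also need to tell Heroku how to start the server. Assuming the script above is saved as server.js (the filename is just my choice here), a one-line Procfile is enough:

web: node server.js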

strace & build-tools

strace is a utility to monitor system calls. For example, it can report all files opened by a process.
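
For instance, this logs every open() call made by a program (and, because of -f, by its child processes) to a file:

$ strace -f -e trace=open -o trace.log ./my_program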

fabricate.py is a build-tool based on strace. The idea is pretty cool: it executes every command through strace, then automatically figures out the dependencies and targets by looking at the strace output. So you don't need to specify the dependencies and targets explicitly.
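
A rough sketch of the technique (just an illustration of the idea, not fabricate.py's actual code):

import re
import subprocess

def trace_files(cmd):
    # run the command under strace, logging open() calls of all children
    subprocess.call(['strace', '-f', '-e', 'trace=open',
                     '-o', 'trace.log'] + cmd)
    deps, targets = set(), set()
    for line in open('trace.log'):
        match = re.search(r'open\("([^"]+)", ([^)]+)\)', line)
        if not match:
            continue
        path, flags = match.groups()
        if 'O_WRONLY' in flags or 'O_RDWR' in flags:
            targets.add(path)  # opened for writing => target
        else:
            deps.add(path)     # opened read-only => dependency
    return deps, targets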

But this approach has a few problems:

  • strace will slow down the execution

(not only because of strace itself but also because the build-tool has to parse strace's output)

  • it is not always correct, e.g. run('cp', '-r', 'src', 'out')

If you just add a file to src, it cannot figure out that a "dependency" was added.

  • it can confuse targets & dependencies

It checks the mode in which files were opened: a file opened in write mode is taken as a target. The problem is that some programs have their own cache system, so a target might be taken as a dependency.

Whether these limitations are a problem depends on your use-case...

doit & strace

doit takes a very different approach to dependency handling. All dependencies must be explicitly defined.

doit doesn't support any kind of implicit dependency, but it now comes with a strace command that lets you easily check which files are being used.

The idea is that you can use this feature while developing your tasks and make sure you are setting the dependencies correctly.

Example:

def task_o():
    return {'actions': ['cp abc abc2', 'touch xyz']}
$ doit strace -f traceme.py o
.  o
.  strace_report
R /xxx/abc2
R /xxx/abc
W /xxx/abc2
W /xxx/xyz
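
Once strace has shown which files the task touches, the same task can declare them explicitly (file_dep and targets are doit's regular task attributes):

def task_o():
    return {
        'actions': ['cp abc abc2', 'touch xyz'],
        'file_dep': ['abc'],
        'targets': ['abc2', 'xyz'],
    }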

For more details check the docs.

python code coverage & multiprocessing

I wanted to get code coverage for my python code that uses the multiprocessing module.

The standard way of measuring coverage in a subprocess is to set up the python interpreter to turn on code coverage right on start-up. But this technique won't work if the process being measured forks (as multiprocessing does). See issue.
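
For reference, that start-up hook is coverage's process_startup() function, typically installed from a sitecustomize.py (a sketch of the standard technique):

# sitecustomize.py -- executed by the interpreter on every start-up
import coverage
# only actually starts coverage if the COVERAGE_PROCESS_START
# environment variable points to a coverage config file
coverage.process_startup()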

I solved this by monkey-patching multiprocessing to start coverage programmatically. Only the method Process._bootstrap needs to be patched: it is wrapped with some code that starts coverage and saves the data when the process is done.

from multiprocessing import Process

def coverage_multiprocessing_process(): # pragma: no cover
    try:
        import coverage
    except ImportError:
        # give up monkey-patching if coverage is not installed
        return

    from coverage.collector import Collector
    from coverage.control import coverage
    # detect if coverage was running in forked process
    if Collector._collectors:
        class Process_WithCoverage(Process):
            def _bootstrap(self):
                cov = coverage(data_suffix=True) # each process gets its own data file
                cov.start()
                try:
                    return Process._bootstrap(self)
                finally:
                    cov.stop()
                    cov.save()
        return Process_WithCoverage

ProcessCoverage = coverage_multiprocessing_process()
if ProcessCoverage:
    Process = ProcessCoverage

Note that the monkey-patch is only applied when the original process was being covered. This is detected by checking whether there are any active Collector instances.

When running coverage you need to use parallel mode and then combine the results before creating the report:

$ coverage run --parallel-mode my_program.py
$ coverage combine
$ coverage report