24/06: Python tools for SCons

I've started using scons to manage my data analysis workflows. A lot of the work is done by Python scripts that read in a bunch of data from one or more sources, crunch it, and spit out a new table. So each output depends not only on the input data but also on the script and any modules it imports. You can get scons to run a script using a simple Command builder. For instance, say you have some "script.py" that expects a command line like "script.py input1 input2 output".


env.Command('output.tbl', ['script.py', 'input1.tbl', 'input2.tbl'], 'python $SOURCES $TARGETS')


This works fairly well, but if script.py imports another module (e.g. with functions common to a bunch of scripts) you have to manually specify that dependency. But with a little extra code you can get scons to automatically scan python scripts for import statements and include any imports from the local directory as dependencies. I also like to use an Emitter that will move the script to the front of the list of dependencies, so I don't have to worry about what order I specify them in.


import os
import re

# Match "from X import ..." and "import X, Y" statements
import1_re = re.compile(r'^from\s+(\S+)\s+import', re.M)
import2_re = re.compile(r'import\s+(.+)$', re.M)

def pyfile_scan(node, env, path):
    """ returns the local .py files imported by the script in node """
    imports = []
    search_path = os.path.dirname(str(node))
    text = node.get_text_contents()  # text rather than bytes, so the str patterns can match
    for item in (import1_re.findall(text) + import2_re.findall(text)):
        for x in item.split(','):
            test_file = x.strip() + '.py'
            if os.path.exists(os.path.join(search_path, test_file)):
                imports.append(test_file)
    return imports

def py_targets(target, source, env):
    """ moves any python script to the front of the source list so it comes first on the command line """
    out = []
    for x in source:
        if str(x).endswith('.py'):
            out.insert(0, x)
        else:
            out.append(x)
    return target, out

pybuild = Builder(action='python $SOURCES $TARGETS $SCRIPTOPTS',
                  emitter=py_targets)
pyscan = Scanner(function=pyfile_scan,
                 skeys=['.py'])
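
To see what the scanner will pick up without running scons, the same matching logic can be tried on a plain string. This is only a sketch: local_imports is a hypothetical helper (not part of scons or of the code above) that takes the script text and a directory instead of a scons node, and it additionally skips duplicate hits.

```python
import os
import re
import tempfile

# Same patterns as the scanner above.
import1_re = re.compile(r'^from\s+(\S+)\s+import', re.M)
import2_re = re.compile(r'import\s+(.+)$', re.M)

def local_imports(text, search_path):
    """Hypothetical helper: names imported in text that exist as .py files in search_path."""
    found = []
    for item in import1_re.findall(text) + import2_re.findall(text):
        for x in item.split(','):
            test_file = x.strip() + '.py'
            if os.path.exists(os.path.join(search_path, test_file)) and test_file not in found:
                found.append(test_file)
    return found

# Throwaway directory containing a single local module, common.py.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'common.py'), 'w') as fh:
    fh.write('def helper():\n    return 1\n')

script_text = 'import os, re\nfrom common import helper\nimport numpy\n'
print(local_imports(script_text, demo_dir))
```

Only names that resolve to a file next to the script are kept, so standard library and installed modules drop out. Forms like "import common as c" or dotted package imports are not matched by this simple approach.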



Now you can use the custom builder as follows, and scons will recognize any modules script.py depends on.


# add the python builder to the environment
env = Environment()
env.Append(BUILDERS = {'PyBuild' : pybuild})
env.Append(SCANNERS = pyscan)

env.PyBuild('output.tbl',['script.py','input1.tbl','input2.tbl'])
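
Because of the emitter, the script does not have to come first in the source list. The reordering can be checked outside scons as a rough sketch, since the emitter only calls str() on each source; here plain strings stand in for scons nodes and env is passed as None, both of which are assumptions made for the demonstration.

```python
def py_targets(target, source, env):
    """Move any .py source to the front of the source list."""
    out = []
    for x in source:
        if str(x).endswith('.py'):
            out.insert(0, x)
        else:
            out.append(x)
    return target, out

# Plain strings stand in for scons nodes; env is unused by the emitter.
targets, sources = py_targets(['output.tbl'],
                              ['input1.tbl', 'script.py', 'input2.tbl'],
                              None)
print(sources)
```

The data files keep their relative order. If more than one .py file is listed, each is inserted at the front in turn, so the last one listed ends up first on the command line.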

Comments

S. Joshua Swamidass wrote:

Thanks for the post. It inspired a package I wrote to handle this exact problem, but do it with recursive imports, and appropriate handling of module scripts.

http://pypi.python.org/pypi...
23/09 07:34:01