2013/06/29

Software Rants 13: Python Build Systems

So after delving into pcsx2 for a week and having the wild ride of a mid-sized CMake project, I can officially say that any language that makes conditionals require a repetition of the initial statement is dumb as hell. But CMake demonstrates a more substantial problem - domain languages that leak, a lot.

Building software is a complex task. You want to call external programs, perform a wide variety of repetitious tasks, do checking and verification, and on top of that you need to keep track of changes to minimize time to build.

Interestingly, that last point leads me to a tangent - there are 3 technologies that are treated pretty much independently of one another but overlap a lot here. Source control, build management, and packaging all involve manipulating a code base and its outputs. Source control does a good job managing changes, build systems produce artifacts tailored to the circumstances, and packagers prepare the software for deployment.

I think it would be interesting if a build system took advantage of the presence of the other two dependencies of a useful large software project - maybe using git staging to track changes in the build repository. Maybe the build system can prepare packages directly, rather than having an independent packaging framework - after all, you need to recompile most of the time anyway.
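
Just to sketch the git idea (none of this is a real tool - the helper name and the choice to treat anything git reports as modified as dirty are my own assumptions), a builder could shell out to git and rebuild only what the repository says has changed:

# Hypothetical sketch: use git itself to decide which sources are dirty.
# Assumes git is on PATH and the build runs inside a work tree.
import subprocess

def changed_files(repo_root='.'):
    """Return the set of tracked files that differ from HEAD (staged or not)."""
    out = subprocess.check_output(
        ['git', 'status', '--porcelain'],
        cwd=repo_root, universal_newlines=True)
    changed = set()
    for line in out.splitlines():
        # Each line looks like 'XY path'; X/Y are the staged/unstaged status codes.
        status, path = line[:2], line[3:]
        if status.strip():
            changed.add(path)
    return changed

# A builder could then rebuild only targets whose sources appear in this set.
dirty = changed_files()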

But that is beside the point. The topic is build systems - in particular, waf. Qmake is too domain specific and has the exact same issues as make, cmake, autotools, etc - they all start out as domain languages that mutate into borderline Turing-complete languages because their domain is hugely broad and complex, and it has only grown more complex over time. This is why I love the idea of Python-based build systems - though at the same time, it occurs to me that most Python features go unused in a build system and just waste processor cycles too.


But I think building is the perfect domain for scripting languages - Python might be slow, but I couldn't care less considering how pretty it is. However, my engagements with waf have made me ask some questions - why does it break traditional Pythonic software development wholesale, from bundling the library with the source distribution to expecting fixed-name wscript files that provide functions taking some wildcard argument that acts really magical?

What you really want is to write proj.py and use traditional Pythonic coding practices with a build system library, probably from PyPI. You download the library and do an import buildsystem, or from buildsystem import builder, or something, rather than pigeonholing a two-decade-old philosophy of extensionless, fixed-name files in every directory.

Here is an example I'd like to be able to write in this theoretical build system, covering pretty much every aspect off the top of my head:

# You can play it like waf and just ship builder.py alongside the project,
# without any of the extensionless fixed-name nonsense.
import builder
from builder import find_packages, gcc
from sys import platform

subdirs = ('sources', 'include', ('subproj', 'subbuilder.py'))
name = 'superproj'
version = '1.0.0'
args = (('install', {'ret': 'inst'}),)
pkg_names = ('sdl', 'qt5', 'cpack')

builder.lib_search_path += ('/lib','/usr/lib','/usr/local/lib', '~/.lib', '/usr/lib32', '/usr/lib64', './lib')

# Start here: parse the arguments (including the optional specifiers in args). A lot of the builder
# global members can be initialized through this function via default arguments.
todo = builder.init('.', opt=args, build_dir='../build')

if todo == 'configure':
  # builder packages are an internal class, providing libraries, versioning, descriptions, and headers.
  # When you call your compiler, you can supply packages to compile with.
  pkgs = builder.find_packages(pkg_names)
  pkgs.update(find_packages('kde4'))
  utils = builder.find_progs('gcc', 'ld', 'cpp', 'moc')
  # Find a library by name; this does a case-insensitive search for any library file matching the
  # platform's naming convention, like libpulseaudio.so.0.6 or pulseaudio.dll. Found libraries would
  # be cached so the search isn't repeated on subsequent builds.
  libs = [builder.find_lib('pulseaudio')]
  otherFunction() # any other plain Python you need
  builder.recurse(subdirs)
elif todo == 'build':
  # You can get environments for various languages from the builder, supplying them with
  # sources, packages, and compiler settings.
  cpp = builder.env.cpp
  py = builder.env.py
  qt = builder.env.qt # for moc support
  # Configure results like pkgs would be cached by the builder and restored here.
  pkgs = builder.pkgs

  # You can set build dependencies on targets, so if the builder can find these in the project tree
  # it builds them first.
  builder.depends('subproj', 'libproj')

  # builder would be aware of sys.platform
  if platform == 'linux': # linux building
    qt.srcs('main.cpp', 'main.moc')
    qt.include('global.hpp')
    qt.pkgs = pkgs['qt5']
    qt.jobs = 8 # or pass it to .compile instead
    qt.cc = gcc # set the compiler
    qt.args = ('-wstring',)
    # qt.compile would always run the MOC
    qt.compile(jobs=8, cc=gcc, args=qt.args + ('-O2', '-pthread'),
               warn=gcc.warn.all, out='verbose')
    # At this point, your .o files are generated and dropped in builder.build_dir.
    builder.recurse(subdirs, 'build')
  elif platform == 'darwin': # osx building
    pass
  elif platform == 'win32': # windows building
    pass
elif todo == 'link':
  pass # do linking
elif todo == 'install':
  pass # install locally
elif todo == 'pack':
  pass # package for installation, maybe using cpack

Basically, you have a library that enables building locally, and you use it as a procedural order of operations, rather than defining black-box functions you want some builder program to run. There could also be prepared build objects you could get from such a library - say, builder.preprocess(builder.defaults.qt) would supply an object that handles whatever operation is being invoked (so you would use it regardless of which stage your script is running) and does the boilerplate for your chosen platform.

I imagine it could go as far as including anything from defaults.vsp to defaults.django, defaults.cpp, or defaults.android. It would search on configure, include on build, and package on pack all the peripheral libraries complementing the chosen development platform, in one entry line.
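
To make the defaults idea a bit more concrete, here's a minimal sketch - the Defaults bundle, the stage names, and the qt entries are all invented for illustration, not any existing API:

# Hypothetical sketch of a prepared build object bundling per-platform boilerplate.
from collections import namedtuple

Defaults = namedtuple('Defaults', ['pkgs', 'progs', 'cc_args'])

qt_defaults = Defaults(
    pkgs=('qt5',),                # searched during configure, linked during build
    progs=('moc', 'uic', 'rcc'),  # helper programs the platform needs
    cc_args=('-fPIC',))           # boilerplate compiler flags

def preprocess(defaults, stage):
    """Run the boilerplate for whichever stage the script was invoked for."""
    if stage == 'configure':
        return {'pkgs': defaults.pkgs, 'progs': defaults.progs}
    if stage == 'build':
        return {'extra_args': defaults.cc_args}
    if stage == 'pack':
        return {'bundle': defaults.pkgs}
    return {}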

The principal concern with such a scheme is performance. You want a dependency graph in place so you know what you can build in parallel (besides inherently forking nprocs programs to parse each directory independently; the root script starts the process, so you need builder.init() in any script that is meant to start a project build, but if you recurse into a subproject that also calls that function, the second call does nothing).
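
Something like this is what I mean by the graph part - a toy sketch with made-up target names and a placeholder standing in for the actual compile step, where anything whose dependencies are done gets handed to a worker pool:

# Hypothetical sketch: schedule targets whose dependencies are already built.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# target -> set of targets it depends on (a toy graph for illustration)
graph = {'libproj': set(), 'subproj': {'libproj'}, 'superproj': {'libproj', 'subproj'}}

def compile_target(name):
    print('building', name)  # placeholder for the real compile step
    return name

def build_all(graph, jobs=8):
    done, running = set(), {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while len(done) < len(graph):
            # submit every target whose dependencies are satisfied and isn't queued yet
            for target, deps in graph.items():
                if target not in done and target not in running.values() and deps <= done:
                    running[pool.submit(compile_target, target)] = target
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in finished:
                done.add(running.pop(fut))

build_all(graph)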

You would want to support several ways of deducing changes: besides hashes, you could use filesystem modification dates, or maybe even git staging and version differences (i.e., a file that doesn't match the current commit's version is assumed changed), caching the results afterwards. The default would probably be to use all available means, and the user could turn some of them off for speedups at the cost of potential redundant recompilation (e.g., if you move a file, its modification date changes and the old cache entry is invalidated, but if the contents hash the same it is assumed to be the same file moved and isn't recompiled).
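
A rough sketch of that layered check, assuming a little on-disk cache keyed by path (the cache format and function names are made up for illustration):

# Hypothetical sketch: decide whether a source file changed, cheapest check first.
import hashlib, json, os

CACHE = '.buildcache.json'

def load_cache():
    try:
        with open(CACHE) as f:
            return json.load(f)
    except (IOError, ValueError):
        return {}

def file_hash(path):
    with open(path, 'rb') as f:
        return hashlib.sha1(f.read()).hexdigest()

def is_dirty(path, cache):
    entry = cache.get(path)
    if entry is None:
        return True                          # never seen before
    if os.path.getmtime(path) == entry['mtime']:
        return False                         # cheap check: untouched since last build
    return file_hash(path) != entry['sha1']  # mtime changed, fall back to content hash

def remember(path, cache):
    cache[path] = {'mtime': os.path.getmtime(path), 'sha1': file_hash(path)}
    with open(CACHE, 'w') as f:
        json.dump(cache, f)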

If you support build environments, you can support radically different languages. I just think there are some shortcomings in both SCons and waf that prevent them from truly taking advantage of their Pythonic nature, and using all the paradigms available to Python is one of them, I feel.
