November 04, 2007


So, on the near-anniversary of the last post, I've been thinking about configuration systems. What they are, what they do, and how you make 'em work.

They're one of those things most people never think about, or if they do it's at best as a user. You probably ran the autoconf-generated shell script, or did whatever else the package you installed needed. It did some strange magic, built some files, and you were done. Easy, right?

Well, okay, maybe not so much. If you've ever poked around inside the guts of an autoconf-generated script you're probably aware there's more concentrated evil in there than you'll find outside the source for Clippy. It's just nasty.

That's not really a surprise; it is a multi-unix shell script designed to probe the environment of a system it knows so little about it can't even really count on a properly functioning shell. And yeah, I know, it's a lot better than it was, but still... Ewww. Just because the insides are justified (and possibly obligatory) doesn't make it any less Lovecraftian. (So I'm not knocking autoconf, it just scares me. A lot)

As a developer, configuration systems are a pain, for a few reasons. The first is portability -- by their very nature, configuration systems go look for things that aren't on the system you're developing on. (Because if they were you wouldn't need to go look to see what they are) Writing portable code is annoying, because it requires making good assumptions from the beginning, and that's tough. Especially because, if you're at all comfortable with the system you're working with, it means making assumptions that are different than your default ones.

The second big pain with config systems is that it's a hassle to track down bugs, again because it involves systems different than your own. How can you tell that the code you built on your 32-bit x86 Linux system doesn't work when configured and run on a 64-bit AIX box with more than 4G of memory if you don't actually have a 64-bit AIX box with more than 4G of memory?

And the final big annoying thing about configuration systems: like test frameworks, they're probably completely different from the app you're actually writing. That is, there's likely little or no overlap between the mindset needed to do the thing you like to do (writing the game or chat client or whatever it is) and the thing you need to do (writing the configuration system). Unless you're writing a configuration system, in which case it gets all meta.

Then there's the whole not knowing what you actually need to go look for. Most of the annoyingly quirky things (like the PDP-11's wacky 32-bit integer format, which is neither big nor little endian, but rather middle endian) you had to deal with in the past have thankfully died enough that you don't have to care, but if you've never had to deal with the vagaries of how shared objects are built and work on a half dozen systems you'd never even think to go probe for them.
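To make "probe" a bit more concrete, here's a rough sketch of what a byte-order check might look like. The function names (classify_byte_order, probe_byte_order) are made up for illustration, and it assumes one of unsigned int or unsigned long is exactly four bytes wide. It sticks to C89 and recognizes the PDP-11's middle-endian layout along with the usual two.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: given the in-memory bytes of the 32-bit value
 * 0x01020304, name the layout. The names are illustrative only. */
const char *classify_byte_order(const unsigned char *b)
{
    assert(b != NULL);
    if (b[0] == 4 && b[1] == 3 && b[2] == 2 && b[3] == 1)
        return "little";
    if (b[0] == 1 && b[1] == 2 && b[2] == 3 && b[3] == 4)
        return "big";
    /* PDP-11 style: 16-bit words in big-endian order,
     * bytes within each word in little-endian order. */
    if (b[0] == 2 && b[1] == 1 && b[2] == 4 && b[3] == 3)
        return "middle";
    return "unknown";
}

/* Probe the machine we are running on. Assumes (and checks) that
 * unsigned int or unsigned long is exactly 4 bytes; otherwise it
 * gives up and reports "unknown". */
const char *probe_byte_order(void)
{
    unsigned char b[4];

    if (sizeof(unsigned int) == 4) {
        unsigned int v = (unsigned int)0x01020304UL;
        memcpy(b, &v, 4);
    } else if (sizeof(unsigned long) == 4) {
        unsigned long v = 0x01020304UL;
        memcpy(b, &v, 4);
    } else {
        return "unknown";
    }
    return classify_byte_order(b);
}
```

Splitting the classification from the probe means the odd cases can be checked on a machine that doesn't have them, which is about the only way most of us will ever exercise the middle-endian branch.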

All of which is why people generally reach for autoconf, and you really can't blame them. But autoconf is evil, and for a complex system it's inadequate as well, since there's a damn sight more than just a makefile and config.h that you need to build up when you're configuring.

So anyway, configuration systems. What the heck are they, anyway?

If you think about it, there are four bits to configuration systems:

  1. Rules
  2. Probes
  3. Template instantiation
  4. Seed data

Seed data gives you sane, functional (though possibly only barely functional) defaults. Probes gather information from the environment to override the defaults, rules decide what probes need to be triggered and what values are produced, and instantiated templates are what you end up with. Possibly in an iterative way, since a lot of environment probing involves instantiating templates (that is, little C programs) which are compiled and possibly run.
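As a minimal sketch of how seed data, probe results, and templates could fit together (not a real engine, just the shape of one): a small key/value table gets loaded with defaults, probes overwrite entries, and a template is expanded against the table. The @key@ marker syntax, the struct layout, and every function name here are assumptions made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SETTINGS 32

struct setting { const char *key; const char *value; };

struct config {
    struct setting items[MAX_SETTINGS];
    int count;                      /* caller sets this to 0 before use */
};

/* Look up a key; returns NULL when it has never been set. */
const char *config_get(const struct config *cfg, const char *key)
{
    int i;
    for (i = 0; i < cfg->count; i++)
        if (strcmp(cfg->items[i].key, key) == 0)
            return cfg->items[i].value;
    return NULL;
}

/* Set a key. Seed data calls this first; a probe calling it later
 * overrides the default. Returns 0 on success, -1 if the table is full. */
int config_set(struct config *cfg, const char *key, const char *value)
{
    int i;
    for (i = 0; i < cfg->count; i++) {
        if (strcmp(cfg->items[i].key, key) == 0) {
            cfg->items[i].value = value;
            return 0;
        }
    }
    if (cfg->count == MAX_SETTINGS)
        return -1;
    cfg->items[cfg->count].key = key;
    cfg->items[cfg->count].value = value;
    cfg->count++;
    return 0;
}

/* Expand @key@ markers in tmpl into out. Returns 0 on success, -1 on
 * an unknown key, an unterminated marker, or an output overflow. */
int instantiate(const struct config *cfg, const char *tmpl,
                char *out, size_t outsize)
{
    size_t used = 0;

    assert(outsize > 0);
    while (*tmpl != '\0') {
        if (*tmpl == '@') {
            const char *end = strchr(tmpl + 1, '@');
            const char *value;
            char key[64];
            size_t keylen, vlen;

            if (end == NULL)
                return -1;
            keylen = (size_t)(end - (tmpl + 1));
            if (keylen >= sizeof key)
                return -1;
            memcpy(key, tmpl + 1, keylen);
            key[keylen] = '\0';
            value = config_get(cfg, key);
            if (value == NULL)
                return -1;
            vlen = strlen(value);
            if (used + vlen >= outsize)
                return -1;
            memcpy(out + used, value, vlen);
            used += vlen;
            tmpl = end + 1;
        } else {
            if (used + 1 >= outsize)
                return -1;
            out[used++] = *tmpl++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

Seeding int_size with "2", having a probe set it to "4", and then instantiating the template "#define INT_SIZE @int_size@" would leave "#define INT_SIZE 4" in the output buffer. Failing on an unknown key rather than substituting an empty string is a deliberate choice in this sketch: a missing input is exactly the thing the rules are supposed to notice.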

It's all very dependency based; to instantiate a template you need to have all the input values, which different rules produce, and some template instantiations depend on other instantiations, which depend on yet other instantiations, and so on. Not all that much different than what make does, only with built-in actions (really built in, into the executable, rather than predefined ways to invoke programs) and generic dependencies rather than time-based file dependencies.
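The dependency part can be sketched without much machinery. What follows is one simple way to do it, assuming each rule produces a single named value and lists the values it needs: keep sweeping over the rules and run any rule whose inputs have all been produced, until a sweep makes no progress. The struct, the MAX_NEEDS limit, and the function names are hypothetical, and a real engine would want something smarter than this quadratic loop.

```c
#include <assert.h>
#include <string.h>

#define MAX_NEEDS 4

struct rule {
    const char *produces;           /* the value this rule yields */
    const char *needs[MAX_NEEDS];   /* inputs; unused slots are NULL */
    int done;                       /* scratch flag used while ordering */
};

/* Has some already-ordered rule produced the named value? */
static int is_available(const struct rule *rules, int n, const char *name)
{
    int i;
    for (i = 0; i < n; i++)
        if (rules[i].done && strcmp(rules[i].produces, name) == 0)
            return 1;
    return 0;
}

/* Fill order[] with rule indices in an order that respects the
 * dependencies. Returns how many rules were placed; anything less
 * than n means a cycle or an input that no rule produces. */
int order_rules(struct rule *rules, int n, int *order)
{
    int placed = 0;
    int progress = 1;
    int i, j;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        rules[i].done = 0;

    while (progress) {
        progress = 0;
        for (i = 0; i < n; i++) {
            int ready = 1;
            if (rules[i].done)
                continue;
            for (j = 0; j < MAX_NEEDS && rules[i].needs[j] != NULL; j++)
                if (!is_available(rules, n, rules[i].needs[j]))
                    ready = 0;
            if (ready) {
                rules[i].done = 1;
                order[placed++] = i;
                progress = 1;
            }
        }
    }
    return placed;
}
```

Unlike make, nothing here looks at file timestamps: a rule is ready when its inputs exist, full stop, which is the "generic dependencies" idea in its most bare-bones form.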

Or, if you like, the mutant bastard child of Prolog and Template::Toolkit.

More interestingly, if you're going to build a configuration system, you can actually manage to do so from scratch, with a bare-bones shell script or batch file, a C compiler, and the linker, which is a pleasant change from the past, where you couldn't. What I mean by this is that, if you assume a C89 standard C compiler (which, given that it's been 18 years since that standard was made seems safe), you can manage to get everything you need to probe the environment.

Think about it. What do you need to build our configuration system? You need to read in and parse rules data and templates, you need to do dependency ordering, you need to instantiate templates, you need to spawn off subprocesses to check individual bits and pieces, and you need to read in the results of those probes. That is, you need fopen, fread, fwrite, fclose, system, and a boatload of templates. Everything else is built into the configuration program.
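Here's a rough sketch of a single probe built from nothing but those calls: write out a little C program, compile it with system, run it with its output redirected to a file, and read the answer back. A fair amount is assumed rather than known: that the compiler accepts "-o", that the shell understands "./name" and ">" redirection (so this is the Unix flavor), and the scratch file names are invented for the example. It also only cleans up its scratch files when everything succeeds.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write text to path. Returns 0 on success, -1 on failure. */
int write_file(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    size_t len = strlen(text);
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fwrite(text, 1, len, fp) == len);
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the first line of path into buf, without the line ending.
 * Returns 0 on success, -1 if the file cannot be opened. */
int read_first_line(const char *path, char *buf, size_t bufsize)
{
    FILE *fp;
    size_t n;

    assert(bufsize > 0);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    n = fread(buf, 1, bufsize - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    buf[strcspn(buf, "\r\n")] = '\0';
    return 0;
}

/* Probe sizeof(int) by compiling and running a tiny program.
 * cc is the compiler command to try (an assumption, e.g. "cc").
 * Returns the size in bytes, or -1 if any step fails. */
int probe_int_size(const char *cc)
{
    static const char src[] =
        "#include <stdio.h>\n"
        "int main(void) { printf(\"%d\\n\", (int)sizeof(int)); return 0; }\n";
    char cmd[256];
    char line[32];

    if (strlen(cc) > 100)           /* keep sprintf within cmd */
        return -1;
    if (write_file("probe_int.c", src) != 0)
        return -1;
    sprintf(cmd, "%s -o probe_int probe_int.c", cc);
    if (system(cmd) != 0)
        return -1;
    if (system("./probe_int > probe_int.out") != 0)
        return -1;
    if (read_first_line("probe_int.out", line, sizeof line) != 0)
        return -1;

    remove("probe_int.c");
    remove("probe_int");
    remove("probe_int.out");
    return atoi(line);
}
```

Calling probe_int_size("cc") on a box with a working compiler should hand back whatever sizeof(int) is there, and -1 anywhere the compile or run falls over, which is itself useful information for a rule to act on.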

All of those things are guaranteed in C89. (And yeah, I know, there may be systems that don't handle them all right, but at this point it's pretty safe to assume they all do) Putting the code together so it compiles and links with close to no knowledge of the system is pretty straightforward too. On Unix and its like, that's a series of cc commands followed by ld. On Windows you use whatever the most common compiler is, or you let some environment variable choose (and have a prompt at the beginning of the batch file for the C compiler and linker -- you can be pretty sure that if someone's running a .bat file it's Windows, after all)

Yes, that does mean that maybe you're not using the compiler and linker options the user really wants, but that's fine, because once you've built yourself you just go ask and do it again. Or provide a way to set some environment variables or command line parameters if that's your preference. Worst case there is you need to compile everything twice, but you're probably going to do that anyway, since a full-fledged configuration engine needs to be able to do system-specific stuff, which means probes and rebuilding. The good bit about all that is that you don't need anything to configure your configuration system. Which is good, because otherwise you end up chasing your own tail, and that's annoying. (not to mention hell on your back)

And, of course, once the engine is already built, anything else that uses the engine for configuration already has a prebuilt set of values that can be pretty easily filled in to skip most of the probes. (Or you can package the config system with whatever needs it and just build it, since a half dozen or so extra compiles and links aren't going to noticeably increase the length of time it takes to probe the system for settings)

There's more, but this is enough for now.

(This is all Jim Keenan's fault, I should mention that, in case anyone's curious)

Posted by Dan at November 4, 2007 04:03 PM

happy to see you back!

i hope we'll get more posts soon!

Posted by: anon at November 6, 2007 12:35 PM