Friday, March 06, 2015

Configuration Files Are Just Another Form of Message Passing (or Maybe Vice Versa)

In 1989, when I was at the National Center for Atmospheric Research or NCAR in Boulder, Colorado, I was working on a spooling system for our output production system. When you have supercomputers (and NCAR had a lot of them, then and now), everything else has to be super too: storage systems, networking, and even your print system. Way back then, NCAR had two gigantic Xerox laser printers which would fill a normal-sized room. It also had several automated film cameras to which we wanted to spool graphical output.

This being the time before digital images could be rendered on demand on a laptop and projected on a screen using equipment you could buy at your local office supply store, scientists wanted to capture the high resolution scientific visualizations that they generated on the big Cray supercomputers on some persistent medium. A Dicomed automated film camera would render the visualizations to a CRT screen and the camera would take a photograph of it, all under software control. The system could produce still photographs, movies, and microfiche, in color or black and white. (If you don't recognize any of these technologies, don't sweat it, just go with me here.) So a small team of us set about to develop a distributed graphical spooling system in C to run under SunOS. The result was the Text And Graphics System or TAGS.

The TAGS Development Team (ironic that this old photograph is out of focus).

I was tasked with developing a message passing system for TAGS. After a lot of thought and experimentation, I invented parcels. (Parcels with TAGS, get it?) Parcels were a text-based free-format mechanism that passed messages over stream sockets as a sequence of keyword-value pairs, what we might now refer to as properties. The message payload of a parcel might look something like this.

user="Chip Overlock"
parameters="align=left title=\"Model A Run 14\""

When a parcel was received, it was parsed and loaded into a binary tree that could be searched by keyword, so it was a kind of associative memory. As with the parameters property above, parcels could themselves be values of properties that could be subsequently parsed and loaded into yet another binary tree, so parcels could be nested.
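
The parse-and-nest idea can be sketched in a few lines of C. This is a minimal illustration, not the original TAGS parcel code: the function names `parcel_next` and `parcel_find` are hypothetical, the grammar (bare or double-quoted values, with a backslash escaping the next character) is an assumption inferred from the example above, and a simple linear scan stands in for the binary tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the TAGS code): extract the next keyword=value
 * pair starting at *cursor and advance *cursor past it. Values are either
 * bare words or double-quoted strings in which a backslash escapes the next
 * character. Returns 1 if a pair was stored, 0 at end of input or on
 * malformed input. Overlong keywords and values are silently truncated. */
int parcel_next(const char **cursor, char *key, size_t keysz,
                char *val, size_t valsz)
{
    const char *p = *cursor;
    size_t k = 0, v = 0;

    assert(keysz > 0 && valsz > 0);

    while (isspace((unsigned char)*p)) p++;        /* skip separators */
    if (*p == '\0') return 0;

    while (*p != '\0' && *p != '=' && !isspace((unsigned char)*p)) {
        if (k + 1 < keysz) key[k++] = *p;
        p++;
    }
    key[k] = '\0';
    if (*p != '=' || k == 0) return 0;             /* keyword must precede '=' */
    p++;

    if (*p == '"') {                               /* quoted value */
        p++;
        while (*p != '\0' && *p != '"') {
            if (*p == '\\' && p[1] != '\0') p++;   /* drop the escape character */
            if (v + 1 < valsz) val[v++] = *p;
            p++;
        }
        if (*p != '"') return 0;                   /* unterminated string */
        p++;
    } else {                                       /* bare value */
        while (*p != '\0' && !isspace((unsigned char)*p)) {
            if (v + 1 < valsz) val[v++] = *p;
            p++;
        }
    }
    val[v] = '\0';
    *cursor = p;
    return 1;
}

/* Hypothetical helper: scan text for keyword and copy its value into val.
 * Returns 1 if found, 0 otherwise. val must not overlap text. */
int parcel_find(const char *text, const char *keyword, char *val, size_t valsz)
{
    char key[64];

    while (parcel_next(&text, key, sizeof key, val, valsz)) {
        if (strcmp(key, keyword) == 0) return 1;
    }
    return 0;
}
```

Calling `parcel_find` on the payload above with the keyword `parameters` yields `align=left title="Model A Run 14"`, and handing that value back to `parcel_find` (with a second buffer) pulls out `title`. That second pass is all that nesting amounts to.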

Because parcels were text, we could generate messages for testing or other special purposes with our favorite text editor. Parcels could be written to and read from files; that's how TAGS maintained job state persistently across reboots. And best of all, if another developer added some properties to the message that his software sent my software, I didn't have to coordinate with him at all. My software didn't care that there were extra properties in the parcel, as long as the properties my software did care about hadn't changed. It hugely simplified development and deployment.

Yeah, seems kinda quaint now, doesn't it? But parcels predate the ubiquitous use of Extensible Markup Language or XML as a text-based message passing format. Using parcels was a revelation: it was a message passing format, it was a configuration file format, it was a floor polish and a dessert topping.

TAGS ran in production for about eight years until both it and the hardware it served and on which it ran were obsolete. At its peak, TAGS produced a half a million frames of film per month. Output media included 35mm color slides, 105mm microfiche, and ultimately even video tape and monochrome and color paper. My parcel code ended up having more longevity than TAGS itself, going on to be used in other NCAR systems after I left the organization.

Years later, I became quite conversant with the Document Object Model (DOM) and the facilities in Java to parse XML, as well as C and C++ based XML parsing libraries, as I worked on subsequent projects in other organizations that took a similar approach to message passing and configuration files. You probably won't be surprised to learn that XML made me positively nostalgic for my NCAR days; XML had all the good stuff that I liked about parcels.

<user>Chip Overlock</user>
 <title>Model A Run 14</title>

A couple of years ago I had a gig with Aircell (now Gogo Business Aviation) to help a team develop a satellite communications system for business aircraft. We used JavaScript Object Notation or JSON for many of our configuration files and much of our message passing. It was parcels all over again.

{
 "job": "1234",
 "user": "Chip Overlock",
 "timestamp": "2014-01-28T14:15:04",
 "color": "yes",
 "parameters": {
  "align": "left",
  "title": "Model A Run 14"
 }
}
We also ended up making some serious use of Asterisk, the open source PBX. The final product, which today you might find in a Gulfstream or Lear business jet, allows a traveller to use a VoIP app on their smart phone to wirelessly place a SIP call that goes to space over the Iridium satellite network, down to a ground station into the telephone system, or even to another plane equipped with the same product. It was international dialing to anywhere in the world, any time, from anywhere, at any altitude. It worked. You could even send a FAX. (Remarkably, billionaires flying in seventy million dollar business jets still insist on using facsimile machines. Go figure.)

My Pelican cases and the flight test aircraft (photo by Doug Young).

Yeah, I was impressed, too. But it's funny what you carry away from projects like that. What I really wanted to use in other projects in the future was Asterisk's configuration file support. Asterisk, and lots of other systems, use a really flexible configuration file format that is sometimes referred to as the INI (for initialization) format. INI files are similar to parcels in that they have properties that are keyword-value pairs, but they allow you to group collections of properties into sections using the square bracket notation as shown below.

user=Chip Overlock
title=Model A Run 14
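
To make the section and property distinction concrete, here is a minimal sketch of a line classifier in C. It assumes a deliberately simplified dialect (a `[name]` section header, a `keyword=value` property, and `;` starting a comment) and is not the grammar that Asterisk actually accepts; the names `ini_kind_t`, `ini_copy_trimmed`, and `ini_classify` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { INI_BLANK, INI_SECTION, INI_PROPERTY, INI_ERROR } ini_kind_t;

/* Copy the range [begin, end) into out, trimming surrounding whitespace
 * and truncating to fit. */
static void ini_copy_trimmed(const char *begin, const char *end,
                             char *out, size_t outsz)
{
    size_t n;

    while (begin < end && isspace((unsigned char)*begin)) begin++;
    while (end > begin && isspace((unsigned char)end[-1])) end--;
    n = (size_t)(end - begin);
    if (n >= outsz) n = outsz - 1;
    memcpy(out, begin, n);
    out[n] = '\0';
}

/* Classify one line of a simplified INI dialect (an assumption, see above).
 * For INI_SECTION, name receives the section name. For INI_PROPERTY, name
 * receives the keyword and value receives the value. */
ini_kind_t ini_classify(const char *line, char *name, size_t namesz,
                        char *value, size_t valuesz)
{
    const char *end, *sep;

    assert(namesz > 0 && valuesz > 0);
    name[0] = value[0] = '\0';

    end = strchr(line, ';');                 /* assumed: ';' starts a comment */
    if (end == NULL) end = line + strlen(line);
    while (line < end && isspace((unsigned char)*line)) line++;
    while (end > line && isspace((unsigned char)end[-1])) end--;
    if (line == end) return INI_BLANK;

    if (*line == '[') {                      /* section header */
        if (end[-1] != ']' || end - line < 3) return INI_ERROR;
        ini_copy_trimmed(line + 1, end - 1, name, namesz);
        return name[0] != '\0' ? INI_SECTION : INI_ERROR;
    }

    sep = memchr(line, '=', (size_t)(end - line));
    if (sep == NULL || sep == line) return INI_ERROR;   /* no keyword */
    ini_copy_trimmed(line, sep, name, namesz);
    ini_copy_trimmed(sep + 1, end, value, valuesz);
    return INI_PROPERTY;
}
```

Fed a header such as `[job]` (a made-up section name) followed by the two property lines above, the classifier reports one section and two properties. A caller only has to remember the most recent section to know where each property belongs.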

Recently I found myself (somewhat deliberately, it must be admitted) between gigs with some time on my hands and a lot of unused computer equipment sitting around, so I perused my old-school index card file for project ideas.

I had always wanted to learn how to use Flex and Bison. Flex is an open source reimplementation of Lex, a UNIX tool that generates a lexical scanner based on regular expressions. Bison is similarly an open source reimplementation of Yacc or Yet Another Compiler Compiler, a UNIX tool that generates a shift-reduce parser from an LALR(1) grammar. Over the years I had written a lot of recursive descent parsers and table-driven parsers based on pushdown automata (PDA), but had never used Lex or Yacc, although I had seen other developers make good use of Flex and Bison (including one on the SATCOM project). I had written my share of grammars that were more or less LL(1), but had only played with LALR(1) grammars as an academic exercise when I was in graduate school.

During another interstitial period, I had added an implementation of red-black balanced binary trees to my Diminuto library, a collection of C code for systems programming under Linux/GNU, portions of which have found their way into many commercial products (including, again, the SATCOM project). I wanted a non-trivial application for the red-black trees in Diminuto beyond the unit tests that I had written.

And I really wanted a C-based parser for INI files.

Assay is my C-based parser for INI files. The lexical scanner and shift-reduce parser used by Assay are generated by Flex and Bison from an LALR(1) grammar that I developed that describes, as near as I could determine anyway, the INI file format used by Asterisk. You can trivially import the sections and properties from an INI file into a red-black tree which can be efficiently searched. And you can just as trivially export the tree (what Assay refers to as a configuration) into a file. Or any other thing that can be represented as a standard I/O stream (FILE *).

Here's a code snippet from an Assay unit test that, given a path name, creates a new configuration object, imports the contents of an INI file into it, exports the resulting configuration in INI file format to standard output for inspection, and then interrogates the imported configuration. (I've removed all the unit test framework for readability.)

assay_config_t * cfp;
const char * value;
cfp = assay_config_export_stream(assay_config_import_file(assay_config_create(), PATH1), stdout);
value = assay_config_read_string(cfp, "section1", "keyword1");
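
For readers who want a feel for the keyed lookup behind a call like the one above, here is a rough sketch that stores properties in a search tree ordered by section and then keyword. It is not Assay's implementation: it swaps in the POSIX `tsearch` and `tfind` functions from `<search.h>` in place of Diminuto's red-black trees (POSIX does not specify how the tree is balanced), and `prop_t`, `config_write`, and `config_read` are hypothetical names.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical property record: the tree is keyed on (section, keyword). */
typedef struct {
    char *section;
    char *keyword;
    char *value;
} prop_t;

/* Order properties by section name first, then by keyword. */
static int prop_compare(const void *a, const void *b)
{
    const prop_t *pa = a, *pb = b;
    int rc = strcmp(pa->section, pb->section);

    return rc != 0 ? rc : strcmp(pa->keyword, pb->keyword);
}

/* Return a heap-allocated copy of s, or NULL if allocation fails. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}

/* Insert a property, or overwrite the value of an existing one.
 * Returns 0 on success, -1 on allocation failure. */
int config_write(void **rootp, const char *section,
                 const char *keyword, const char *value)
{
    prop_t *pp, **found;

    pp = malloc(sizeof *pp);
    if (pp == NULL) return -1;
    pp->section = dup_string(section);
    pp->keyword = dup_string(keyword);
    pp->value = dup_string(value);
    if (pp->section == NULL || pp->keyword == NULL || pp->value == NULL) goto fail;

    found = tsearch(pp, rootp, prop_compare);   /* inserts only if the key is absent */
    if (found == NULL) goto fail;
    if (*found != pp) {                         /* key was present: swap in new value */
        free((*found)->value);
        (*found)->value = pp->value;
        free(pp->section);
        free(pp->keyword);
        free(pp);
    }
    return 0;

fail:
    free(pp->section);
    free(pp->keyword);
    free(pp->value);
    free(pp);
    return -1;
}

/* Look up a value by section and keyword. Returns NULL if absent. */
const char *config_read(void *const *rootp, const char *section,
                        const char *keyword)
{
    prop_t key, **found;

    key.section = (char *)section;              /* only read by prop_compare */
    key.keyword = (char *)keyword;
    key.value = NULL;
    found = tfind(&key, rootp, prop_compare);
    return found != NULL ? (*found)->value : NULL;
}
```

A caller starts with `void *root = NULL;`, passes `&root` to `config_write` once per property, and then queries with `config_read(&root, "section1", "keyword1")`. Cleanup is omitted from the sketch to keep it short.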

You don't have to import an INI file to use Assay. You can create configurations programmatically (which is just what the Bison-generated Assay parser is doing as it reads the INI file).

assay_config_t * cfp;
assay_section_t * scp;
assay_property_t * prp;
cfp = assay_config_create();
scp = assay_section_create(cfp, "ALFA");
prp = assay_property_create(scp, "ONE");
assay_property_value_set(prp, "1", strlen("1") + 1);

Here's the funny thing: it wasn't until I was wrapping up the project, slamming what I thought were the last of the unit tests down to the Assay repo on GitHub, and fixing some typos in the Doxygen comments, that it occurred to me, maybe I could use Assay for message passing too, by using a FIFO or a socket that could be represented by a standard I/O stream object.

A half a day later I had new unit tests doing just that.

Here are two code snippets from an Assay unit test: one from a producer process that creates a configuration, loads it from an INI file, sends it across a socket, and then destroys the configuration, and another from a consumer process that creates a configuration and receives the INI message into it.

assay_config_destroy(assay_config_export_stream_send(assay_config_import_file(assay_config_create(), PATH1), stream = fdopen(pipeline[1], "w")));

cfp = assay_config_import_stream(assay_config_create(), stream = fdopen(pipeline[0], "r"));
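
The same pattern can be sketched without Assay at all, which shows how little separates a configuration file from a message. The code below is an illustration under stated assumptions, not Assay's API: `message_channel`, `message_send`, and `message_receive` are hypothetical names, the message is a single INI section with a single property, and a POSIX pipe wrapped by `fdopen` stands in for the transport.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes fdopen() under strict -std=c11 */
#endif

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create a pipe and wrap each end in a standard I/O stream.
 * Returns 0 on success, -1 on failure. */
int message_channel(FILE **reader, FILE **writer)
{
    int fds[2];

    if (pipe(fds) != 0) return -1;
    *reader = fdopen(fds[0], "r");
    *writer = fdopen(fds[1], "w");
    if (*reader == NULL || *writer == NULL) {
        if (*reader != NULL) fclose(*reader); else close(fds[0]);
        if (*writer != NULL) fclose(*writer); else close(fds[1]);
        return -1;
    }
    return 0;
}

/* Producer side: write one section holding one property as INI text. */
int message_send(FILE *stream, const char *section,
                 const char *keyword, const char *value)
{
    if (fprintf(stream, "[%s]\n%s=%s\n", section, keyword, value) < 0) return -1;
    return fflush(stream) == 0 ? 0 : -1;
}

/* Consumer side: read INI text until keyword turns up inside section.
 * Returns 0 and fills value on success, -1 if the stream ends first. */
int message_receive(FILE *stream, const char *section, const char *keyword,
                    char *value, size_t valuesz)
{
    char line[256], current[256] = "";

    assert(valuesz > 0);
    while (fgets(line, sizeof line, stream) != NULL) {
        size_t n = strcspn(line, "\r\n");

        line[n] = '\0';                               /* strip the newline */
        if (n >= 2 && line[0] == '[' && line[n - 1] == ']') {
            line[n - 1] = '\0';                       /* entering a new section */
            snprintf(current, sizeof current, "%s", line + 1);
        } else {
            char *sep = strchr(line, '=');

            if (sep == NULL) continue;                /* not a property line */
            *sep = '\0';
            if (strcmp(current, section) == 0 && strcmp(line, keyword) == 0) {
                snprintf(value, valuesz, "%s", sep + 1);
                return 0;
            }
        }
    }
    return -1;
}
```

Pointing `message_send` at a stream opened with `fopen` instead of a pipe produces a configuration file, and `message_receive` reads either one without knowing the difference.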

So, after more than a quarter of a century of implementing and using stuff just like this, over and over, it finally occurred to me that maybe configuration files are just a special case of message passing. Or possibly vice versa. Whenever I am faced with implementing either a configuration file handler or a message passing sub-system, maybe it would be a really good idea to consider whether those two tasks are really the same problem.


J. L. Sloan, Parcels with TAGS, NCAR Technical Note, NCAR/TN-377+IA, National Center for Atmospheric Research, 1992-10

Wikipedia, "INI file"

Asterisk Project, "Asterisk Configuration Files"

VOIP Info, "Asterisk config files"

John R. Levine, Tony Mason, Doug Brown, lex & yacc, 2nd ed., O'Reilly, 1995

John R. Levine, flex & bison, O'Reilly, 2009

Tom Niemann, "LEX & YACC TUTORIAL"

Saumya K. Debray, "Lex and Yacc: A Brisk Tutorial"

Assay, Digital Aggregate Corporation, 2015


Craig Ruff said...

Heh, didn't remember I had a beard in those days.

I still use the C++ variant based on the libtools parcel code you wrote. We used it for both configuration files and message passing in the later versions of the NCAR Mass Storage Server (MSS) user interface code and servers.

Chip Overclock said...

That may be the longest surviving code I've ever written -- about a quarter of a century -- although possibly not. But even if the code doesn't last, good ideas do. Just recently I reimplemented from scratch the buffer management code from libtools that I also wrote and is used by my parcel code.

Always good to hear from you, Craig.