Saturday, May 09, 2015

Time Flies

From The Guardian, "US aviation authority: Boeing 787 bug could cause 'loss of control'" [May 1st 2015], describing a bug in the Dreamliner aircraft's electrical power generators that resulted in a U.S. Federal Aviation Administration (FAA) Airworthiness Directive (AD):
The plane’s electrical generators fall into a failsafe mode if kept continuously powered on for 248 days. The 787 has four such main generator-control units that, if powered on at the same time, could fail simultaneously and cause a complete electrical shutdown.
And from the FAA AD [2015-NM-058-AD May 1st 2015]:
We are adopting a new airworthiness directive (AD) for all The Boeing Company Model 787 airplanes. This AD requires a repetitive maintenance task for electrical power deactivation on Model 787 airplanes. This AD was prompted by the determination that a Model 787 airplane that has been powered continuously for 248 days can lose all alternating current (AC) electrical power due to the generator control units (GCUs) simultaneously going into failsafe mode. This condition is caused by a software counter internal to the GCUs that will overflow after 248 days of continuous power. We are issuing this AD to prevent loss of all AC electrical power, which could result in loss of control of the airplane.
Knowing nothing more than its periodicity, I'll make a prediction about the bug. I'm going to do this by applying a series of successive transformations to the period until I find a number that I like. What "like" means in this context will soon become obvious to the reader.

248 days times 24 hours in a day is 5,952 hours.

5,952 hours times 60 minutes in an hour is 357,120 minutes.

357,120 minutes times 60 seconds in a minute is 21,427,200 seconds. Okay, I'm starting to like this number.

21,427,200 seconds times 10 is 214,272,000 1/10s of a second. Not quite there yet.

214,272,000 1/10s of a second times 10 is 2,142,720,000 1/100s of a second.

2,142,720,000 1/100s of a second. I bet you see it now, don't you? This value is perilously close to the maximum positive number in a signed 32-bit integer variable, which is 2,147,483,647 or 0x7fffffff in hexadecimal.

Suppose, just suppose, you were writing software to manage a power generator in a Boeing 787 Dreamliner. You needed to keep track of time, and you chose 1/100s of a second as your resolution, so that adding 1 to your variable indicated 1/100s of a second. Maybe that was the resolution of your hardware real-time clock. Maybe your RTOS had a time function that returned a value with that resolution. Or maybe - and this is the most common reason in my experience - you decided that was a convenient resolution because it was fine grained enough for your application while still leaving a dynamic range - the difference between the possible maximum and minimum values - so large that you could not imagine ever exceeding the capacity of the signed 32-bit integer variable into which you stored it. 248 days is a long time. What are the chances of the system running for that long?

So, if we chose a signed 32-bit integer variable, and 1/100s of a second as the resolution of our "tick", when would the value in our variable roll over into the negative range? Because if it rolled over into the negative range - which would happen in the transition from 2,147,483,647 to 2,147,483,648, or 0x80000000 hexadecimal, which in a signed variable would actually be interpreted as -2,147,483,648 - wackiness would ensue. Time would appear to be running backwards, comparisons between timestamps would not be correct, human sacrifice, cats and dogs living together, a disaster of biblical proportions.

That sounds pretty dire. Maybe we better check our math by working backwards.

So 2,147,483,648 1/100s of a second (the count at which the variable rolls over, one more than its maximum positive value) divided by 100 is 21,474,836 seconds.

21,474,836 seconds divided by 60 is 357,913 minutes.

357,913 minutes is 5,965 hours.

5,965 hours is 248 days.

I feel pretty confident that, knowing nothing other than the failure's periodicity, we should be looking for some place in the software where time is being kept with a resolution of 1/100s of a second in a signed 32-bit integer.

When this failure was first brought to my attention - which mostly happened because one of my clients is a company that develops avionic hardware and software for commercial and business aircraft - I did this calculation in just a few seconds and was pleased to discover once again that such a simple prediction was trivially easy to make.

I became really good at this kind of thing during my time working in a group at Bell Labs in the late 1990s developing software and firmware for ATM switches and network interface cards. We called this class of bug counter rollover because it manifests in all sorts of contexts, not just time, in any system which is expected to run 24x7 for as long as... well, in the telecommunications world, pretty much forever. I fixed this bug in the protocol stack in the ATM NIC we were developing, and then later proceeded to discover it in pretty much every ATM switch by every vendor our customers hooked our NIC up to.

It became a pretty regular occurrence that I'd be on a conference call, sometimes with our own field support organization, sometimes with developers working for other ATM switch vendors, be told that the system was failing - disconnecting, or rebooting, or some such thing - with some periodicity like "once a day", and within a minute or two of noodling around in my lab notebook I'd astound everyone (except of course members of our own group) by announcing "You are maintaining your T125 timer in a 16-bit signed variable with a resolution of 5 seconds per tick. You have never tested your switch with a connection lasting longer than 22 hours."

Once, when I said something like this, there was a long pause on the line, and then an awestruck developer on the other end agreed that that was the case. But once you see the arithmetic - it's so simple you can't really even call it math - it's like the magician showing you how the trick is done. It sorta takes the magic out of it.

There are a few techniques that can be used to help address counter rollover.

Unsigned variables: When an unsigned variable rolls over, subtracting a prior value from the current value using unsigned arithmetic still yields the correct difference, provided the variable has rolled over at most once between the two samples. You just have to understand that the value in your variable isn't absolute, but is only useful when subtracted from some other unsigned value with the same units.

Wider variables or coarser resolution: Modern compilers and processors often offer 64-bit variables. This may increase the dynamic range to the point where a rollover won't occur until the Sun goes nova. Choosing a coarser resolution - a larger tick value - can have the same effect.

Detect the imminent counter rollover so that you can compensate for it: Instead of computing prime = value + increment, you first compute temporary = maximum - increment and then compare temporary < value; if that is true, adding the increment would exceed the maximum, and your level of entropy is about to increase. You need to do something about it in the code.

As developers of systems that are intended to run a long time - longer maybe than we typically even test them for - and especially for systems whose failure may have safety consequences - and that includes avionics systems, and telecommunications systems, and lots of other kinds of systems - we have to be hyperaware of the consequences of our decisions. This is why, whether organizations think of it this way or not, developers are making critical systems engineering decisions on a daily basis just in the choice of measurement units and data types.

Update (2015-05-10)

Coincidentally, I recently took a two day class on DO-178, an FAA guideline on processes to develop avionics software at various levels of safety criticality. At the higher levels of criticality, code test coverage of 100% is required. Surely the software in the Boeing 787 GCUs had to meet the higher Design Assurance Levels (DAL) specified in DO-178, like maybe B or even A, which require extremely stringent processes. I've written before about how unit test code coverage is necessary but not sufficient. In that article I was talking about the need for range coverage. This is yet another example of the same thing, where the range is a duration of time. Maybe we can call this temporal coverage.

Friday, April 03, 2015

Altered Finite States

Folks who have spent time in a flotation tank report that their first experience consisted mainly of figuring out just how the heck to use a flotation tank. Having now had two one-hour sessions in a flotation tank myself, I can confirm that this is accurate.

I-sopod Flotation Tank
Photograph: Wikimedia

Flotation tanks, also known as isolation tanks, and occasionally as sensory deprivation tanks, are these days large horizontal enclosed tubs filled with a few inches of water containing a very high concentration of dissolved epsom salts (magnesium sulfate). The resulting solution is so dense that you can’t help but float when you lie down in it. The solution in the flotation tank is heated to body temperature, sanitized, and continuously filtered. The solution does not taste salty; it tastes vile (take my word on this).



Flotation therapy is used for a variety of purposes ranging from relaxation, meditation, or (if you’re like me) because you’ve seen the movie Altered States or the television show Fringe. I am happy to report that I neither regressed to a neanderthal, nor did I emerge from the tank in a parallel universe. As far as I can tell, anyway. I did warn Mrs. Overclock ahead of time that she might get a phone call in the event I woke up naked in the Denver zoo.

March: The First Session

For my first session, I attended a local spa called A New Spirit. Yeah, I know. It was full of New Age music, gently falling water, and the smell of incense, just as you might imagine. Being an engineer, I would have been happier if it had been all stainless steel and the employees wore lab coats and carried clipboards. But what really surprised me was that it wasn’t until I completed my session and was ready to leave that I saw a woman who wasn’t an employee; up until then, all of the customers I saw sitting around wearing the terry cloth robes provided by the spa were men, some of whom were waiting their turn in one of the three flotation tanks.

Each tank was in a private room. I showered in one of the two bathrooms to remove surface dirt and oil from my skin. I stored all of my personal effects in the provided tote bag, put on the bath robe, entered the tank room that had been assigned to me, and closed the door. I hung up my robe, inserted the disposable foam earplugs provided by the spa, and, naked as the day I was born (but a lot fatter and hairier), I opened the hatch at the front of the tank and climbed in.

All three of the flotation tanks at this spa were of similar size, but this one had a hinged hatch at one end, while the other two tanks (which I peeked at when they weren’t being used) had sliding hatches in their mid-sections. The tank appeared to be fiberglass, in top and bottom halves. The bottom half had a watertight liner in it. Both the solution and the air inside the tank were heated to body temperature. I knelt down into the solution, closed the hatch, and lay down.

I spent the next hour trying, mostly unsuccessfully, to relax. At first I couldn’t see or hear anything. But gradually I became dimly aware of the New Age music, which didn’t seem that loud even outside of the tank room, but managed to permeate both the earplugs and my tinnitus. Eventually the music became dominated by the sound of my pulse in my ears, and, after a bit, the noise of water trickling past the earplug in my left ear. This latter sound came to seem as loud as the rushing torrent of a stream in the mountains in the spring.

As my eyes became adjusted to the dark, I became aware of a ring of light around the seam between the top and bottom halves of the tank. That’s kind of remarkable, since the tank room itself was very dimly lit. But unless I turned my head to either side, I couldn’t really see this band of light, nor did it illuminate the interior of the tank enough to actually make out any detail.

I went through the (apparently common) issue of where to put my arms: down at my sides, up above my head, somewhere in between. I felt compelled to explore my surroundings. I’m of average height and I found that if I stretched my legs down and my arms up above my head, I could just barely touch the far ends of the tank. The tank was proportionately narrower, so that I could touch either side without extending my arms very much at all. As I tried to relax and float, I drifted around in the tank, occasionally bumping into the sides.

The spa provided one of those foam "swim noodle" tubular pool toys to use as a sort of pillow, and I had to experiment with that as well: use it, don’t use it, put it under my neck, beneath my head, etc. The spa web site mentioned that some people find sitting up in the tank more comfortable. I tried that, only to discover that the heating elements for the water were below the floor, making my bum uncomfortably warm. Being an old guy, my arms and legs would stiffen up and become uncomfortable, so I had to flex them from time to time, which caused the water to slosh around, which caused me to repeatedly bump into the sides of the tank.

After a loooooooooong subjective period of time, an attendant entered the room, knocked on the outside of the tank to let me know my hour was up, and then left. (They ask that you knock back so that they know you aren’t asleep, or dead, or in a parallel universe, and I did that.) I got out of the tank, dried off, and put my robe on. I left the room, took another shower to wash the epsom salts off, got dressed, paid, and left.

First impression: it was interesting. But I spent most of my hour flailing around. I really needed to try it again.

April: The Second Session

I changed a few things for my second flotation session at A New Spirit. I bought some disposable silicone swimmer’s earplugs that I hoped would do a better job of keeping the solution out of my ears. (They did, but didn’t do any better in blocking out the New Age music.) I took a shower before I left the house to save a little time at the spa. Other than that, this visit started out much the same as the first one. I was even assigned the same tank room.

But this time I spent less time figuring out what to do and just tried to relax. I did better this time. I still had to flex my stiffened arms and legs occasionally. I found that I was the most comfortable with my arms in a “hands up, don’t shoot” posture, below the foam float that I still used behind my head, and sometimes with my hands lightly clasped. I was able to concentrate more on my breathing and less on my no longer so strange environment. I had less tension in my neck and back. All things considered, the second session went a lot better. I had some quality thinking time.

What did I think about? Being a typical male, sex, every few minutes. Sometimes with Blair Brown. But mostly I thought about socket multiplexing.

No, really.

I’ve been working on some open source Linux/GNU-based socket handling code based on work I did under SunOS way back around 1989. (This is part of my Diminuto C-based library you can find on GitHub.)  I had wanted to do some refactoring of the socket multiplexing unit test, which is a simple running example of a single-threaded service provider. I worked out all the details while in the tank. Once I got back to my office at home, I completed what I had originally estimated to be a day’s work in a couple of hours. I also fit in lunch at the local deli on my way home from the spa, and a trip to the gym late in the afternoon to lift weights. I call that a good day.

Future Sessions

I’m going to keep up monthly sessions in the flotation tanks at A New Spirit for a while. It’s part of my effort to learn to relax, to try to reduce my addiction to constant mental stimulation, to increase my attention span, and improve my ability to get into the zone. I’m also going to work on learning to meditate. All of this in an effort to think deeper thoughts, to gain better insights, and to enhance my ability to think longer term, by at least temporarily removing the more or less trivial distractions that are ubiquitous in today's world.

Or maybe I just like lying around buck naked.

Monday, March 16, 2015

Being Evidence-Based Using the sizeof Operator

How long is an int in C?

How does the length of an int compare to that of a long in C++?

These and related questions crop up, I dunno, seems like weekly, in discussion forums related to embedded development in social media sites like LinkedIn.com. What then follows is a lengthy meandering comment thread full of misinformation or at least many answers that are only applicable to a very specific toolchain or hardware target, probably the only target the commenter has ever used.

Folks, you don't have to guess at this. You can be evidence-based and find out exactly what the answer is for your particular combination of toolchain and target.

The sizeof operator in C and C++ determines the size in bytes of the variable, array, type, or structure to which it is applied, at compile time (the one exception being C99 variable-length arrays, whose size is evaluated at run time). It can't tell you anything about the dynamic run-time behavior of how your application uses memory. Nor will it give you the results you might expect if the compiler cannot know the size of the argument. But it can tell you all sorts of other useful stuff.

For example, given

struct Header {
 struct Header * next;
 size_t size;
};

typedef struct Header header_t;

header_t datum;

header_t data[10];

header_t * that;

you can use expressions like

sizeof(struct Header)

sizeof(header_t)

sizeof(datum)

sizeof(data)

sizeof(header_t*)

sizeof(that)

sizeof(*that)

and even

(sizeof(data)/sizeof(data[0]))

in your C and C++ code to automatically insert appropriate constant values of type size_t instead of coding those values as, say, preprocessor symbols, or hard coding them as integer constants.

You don't have to take my word for it. You can code it up, compile it, and run it yourself, and see what happens.

Below is a little C program that I compile and run every time I find myself using an unfamiliar compiler or processor. And sometimes even a familiar compiler or processor, because I am old and have senior moments. This is from my Diminuto library of stuff I have found useful for doing systems programming under Linux/GNU. But you can just write your own and omit all the Diminuto-specific types.

As long-time readers of this blog know, I have run similar code on non-Linux systems like VxWorks-based PowerPCs; on Harvard architectures like Arduino and PIC micro-controllers; on chips that had twenty-four-bit integers; on an architecture for which a signed integer type was not the same length as an unsigned of the same integer type. It pays to check and not guess. There's a lot of crazy stuff out there.

(Note that the blogger editor may wrap some of the source lines. You can always get the original source code from GitHub.)

/* vi: set ts=4 expandtab shiftwidth=4: */
/**
 * @file
 *
 * Copyright 2014 Digital Aggregates Corporation, Colorado, USA
 * Licensed under the terms in README.h
 * Chip Overclock
 * http://www.diag.com/navigation/downloads/Diminuto.html
 *
 * There's a lot of duplication here, but I'm paranoid that way. Remarkably,
 * I once worked on an embedded project using a proprietary non-GNU C compiler
 * in which the sizeof of signed type was not the same as the sizeof of the
 * unsigned of the same type. I was a little astounded by that. Also, note
 * that with GCC you can take the sizeof a void and of a function type (as
 * opposed to a void or function pointer); both are GNU C extensions that
 * yield 1, not standard C. It's good to know these things.
 */

#include <stdio.h>
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include "com/diag/diminuto/diminuto_types.h"

#define printsizeof(_TYPE_) printf("sizeof(%s)=%zu\nsizeof(%s*)=%zu\n", #_TYPE_, sizeof(_TYPE_), #_TYPE_, sizeof(_TYPE_*))

typedef enum Enum { ENUM = 0 } enum_t;

typedef void (function_t)(void);

int main(void)
{
    printsizeof(char);
    printsizeof(signed char);
    printsizeof(unsigned char);
    printsizeof(short);
    printsizeof(signed short);
    printsizeof(unsigned short);
    printsizeof(int);
    printsizeof(signed int);
    printsizeof(unsigned int);
    printsizeof(long);
    printsizeof(signed long);
    printsizeof(unsigned long);
    printsizeof(long long);
    printsizeof(signed long long);
    printsizeof(unsigned long long);
    printsizeof(float);
    printsizeof(double);
    printsizeof(void);
    printsizeof(enum_t);
    printsizeof(function_t);
    printsizeof(int8_t);
    printsizeof(uint8_t);
    printsizeof(int16_t);
    printsizeof(uint16_t);
    printsizeof(int32_t);
    printsizeof(uint32_t);
    printsizeof(int64_t);
    printsizeof(uint64_t);
    printsizeof(intptr_t);
    printsizeof(uintptr_t);
    printsizeof(size_t);
    printsizeof(ssize_t);
    printsizeof(off_t);
    printsizeof(pid_t);
    printsizeof(pthread_t);
    printsizeof(pthread_mutex_t);
    printsizeof(pthread_cond_t);
    printsizeof(diminuto_ticks_t);
    printsizeof(diminuto_ipv4_t);
    printsizeof(diminuto_port_t);
    printsizeof(diminuto_unsigned_t);
    printsizeof(diminuto_signed_t);
    return 0;

}

My current build system is a Dell PC with a four-core 2.4GHz Pentium processor. It's running Ubuntu 14.04, Linux 3.13.0, and gcc 4.8.2. Here is what I get when I run the program there.

sizeof(char)=1
sizeof(char*)=8
sizeof(signed char)=1
sizeof(signed char*)=8
sizeof(unsigned char)=1
sizeof(unsigned char*)=8
sizeof(short)=2
sizeof(short*)=8
sizeof(signed short)=2
sizeof(signed short*)=8
sizeof(unsigned short)=2
sizeof(unsigned short*)=8
sizeof(int)=4
sizeof(int*)=8
sizeof(signed int)=4
sizeof(signed int*)=8
sizeof(unsigned int)=4
sizeof(unsigned int*)=8
sizeof(long)=8
sizeof(long*)=8
sizeof(signed long)=8
sizeof(signed long*)=8
sizeof(unsigned long)=8
sizeof(unsigned long*)=8
sizeof(long long)=8
sizeof(long long*)=8
sizeof(signed long long)=8
sizeof(signed long long*)=8
sizeof(unsigned long long)=8
sizeof(unsigned long long*)=8
sizeof(float)=4
sizeof(float*)=8
sizeof(double)=8
sizeof(double*)=8
sizeof(void)=1
sizeof(void*)=8
sizeof(enum_t)=4
sizeof(enum_t*)=8
sizeof(function_t)=1
sizeof(function_t*)=8
sizeof(int8_t)=1
sizeof(int8_t*)=8
sizeof(uint8_t)=1
sizeof(uint8_t*)=8
sizeof(int16_t)=2
sizeof(int16_t*)=8
sizeof(uint16_t)=2
sizeof(uint16_t*)=8
sizeof(int32_t)=4
sizeof(int32_t*)=8
sizeof(uint32_t)=4
sizeof(uint32_t*)=8
sizeof(int64_t)=8
sizeof(int64_t*)=8
sizeof(uint64_t)=8
sizeof(uint64_t*)=8
sizeof(intptr_t)=8
sizeof(intptr_t*)=8
sizeof(uintptr_t)=8
sizeof(uintptr_t*)=8
sizeof(size_t)=8
sizeof(size_t*)=8
sizeof(ssize_t)=8
sizeof(ssize_t*)=8
sizeof(off_t)=8
sizeof(off_t*)=8
sizeof(pid_t)=4
sizeof(pid_t*)=8
sizeof(pthread_t)=8
sizeof(pthread_t*)=8
sizeof(pthread_mutex_t)=40
sizeof(pthread_mutex_t*)=8
sizeof(pthread_cond_t)=48
sizeof(pthread_cond_t*)=8
sizeof(diminuto_ticks_t)=8
sizeof(diminuto_ticks_t*)=8
sizeof(diminuto_ipv4_t)=4
sizeof(diminuto_ipv4_t*)=8
sizeof(diminuto_port_t)=2
sizeof(diminuto_port_t*)=8
sizeof(diminuto_unsigned_t)=8
sizeof(diminuto_unsigned_t*)=8
sizeof(diminuto_signed_t)=8
sizeof(diminuto_signed_t*)=8

My current reference target is an Nvidia Jetson board with a TK1 four-core ARMv7 processor. It's running Ubuntu 14.04, Linux 3.10.24, and gcc 4.8.2. Here is what I get when I run the program there.

sizeof(char)=1
sizeof(char*)=4
sizeof(signed char)=1
sizeof(signed char*)=4
sizeof(unsigned char)=1
sizeof(unsigned char*)=4
sizeof(short)=2
sizeof(short*)=4
sizeof(signed short)=2
sizeof(signed short*)=4
sizeof(unsigned short)=2
sizeof(unsigned short*)=4
sizeof(int)=4
sizeof(int*)=4
sizeof(signed int)=4
sizeof(signed int*)=4
sizeof(unsigned int)=4
sizeof(unsigned int*)=4
sizeof(long)=4
sizeof(long*)=4
sizeof(signed long)=4
sizeof(signed long*)=4
sizeof(unsigned long)=4
sizeof(unsigned long*)=4
sizeof(long long)=8
sizeof(long long*)=4
sizeof(signed long long)=8
sizeof(signed long long*)=4
sizeof(unsigned long long)=8
sizeof(unsigned long long*)=4
sizeof(float)=4
sizeof(float*)=4
sizeof(double)=8
sizeof(double*)=4
sizeof(void)=1
sizeof(void*)=4
sizeof(enum_t)=4
sizeof(enum_t*)=4
sizeof(function_t)=1
sizeof(function_t*)=4
sizeof(int8_t)=1
sizeof(int8_t*)=4
sizeof(uint8_t)=1
sizeof(uint8_t*)=4
sizeof(int16_t)=2
sizeof(int16_t*)=4
sizeof(uint16_t)=2
sizeof(uint16_t*)=4
sizeof(int32_t)=4
sizeof(int32_t*)=4
sizeof(uint32_t)=4
sizeof(uint32_t*)=4
sizeof(int64_t)=8
sizeof(int64_t*)=4
sizeof(uint64_t)=8
sizeof(uint64_t*)=4
sizeof(intptr_t)=4
sizeof(intptr_t*)=4
sizeof(uintptr_t)=4
sizeof(uintptr_t*)=4
sizeof(size_t)=4
sizeof(size_t*)=4
sizeof(ssize_t)=4
sizeof(ssize_t*)=4
sizeof(off_t)=4
sizeof(off_t*)=4
sizeof(pid_t)=4
sizeof(pid_t*)=4
sizeof(pthread_t)=4
sizeof(pthread_t*)=4
sizeof(pthread_mutex_t)=24
sizeof(pthread_mutex_t*)=4
sizeof(pthread_cond_t)=48
sizeof(pthread_cond_t*)=4
sizeof(diminuto_ticks_t)=8
sizeof(diminuto_ticks_t*)=4
sizeof(diminuto_ipv4_t)=4
sizeof(diminuto_ipv4_t*)=4
sizeof(diminuto_port_t)=2
sizeof(diminuto_port_t*)=4
sizeof(diminuto_unsigned_t)=8
sizeof(diminuto_unsigned_t*)=4
sizeof(diminuto_signed_t)=8
sizeof(diminuto_signed_t*)=4

See? Now was that so hard?

Friday, March 13, 2015

Finding the Door into Summer

Not being much of a fantasy reader, I wasn't a fan of Terry Pratchett's fiction, although I did develop a lot of affection for the man from the times I've seen him on panels and such at science fiction conventions (SF fans being like the ancient Greeks in that our gods walk among us and occasionally meddle in our affairs). Nor am I a huge fan of the late Robert Heinlein, although unlike Pratchett I did read a lot of his books in my youth. But the recent death of the former -- at an age not all that much greater than mine -- got me thinking about the latter.

In Heinlein's 1956 novel The Door Into Summer, the book begins with the protagonist renting a rural farmhouse and living with a cat named Petronius, indeed the very same after which my own feline overlord Petronius is named.

Petronius Sloan

Typical of many rural buildings, the house had been added to over the years such that it had an unusual number of doors leading outside. And like many cats, Petronius was constantly demanding to go outside, even in the dead of winter. The cat insisted on exploring all the doors, all the time, and the hero of the story remarked that Petronius was looking for the "door into summer", the door that led to the grassy meadow full of butterflies that it remembered from the prior season. (SPOILER ALERT) So at the end of the book when Petronius dies of old age after a long and happy life, the hero says that his beloved cat finally found that door he had been looking for.

I liked that metaphor very much. When it is my time to have that final stroll with Death, I hope my friends say "Chip finally found the door into summer".

Friday, March 06, 2015

Configuration Files Are Just Another Form of Message Passing (or Maybe Vice Versa)

In 1989, when I was at the National Center for Atmospheric Research or NCAR in Boulder, Colorado, I was working on a spooling system for our output production system. When you have supercomputers (and NCAR had a lot of them, then and now), everything else has to be super too: storage systems, networking, and even your print system. Way back then, NCAR had two gigantic Xerox laser printers which would fill a normal sized room. It also had several automated film cameras to which we wanted to spool graphical output.

This being the time before digital images could be rendered on demand on a laptop and projected on a screen using equipment you could buy at your local office supply store, scientists wanted to preserve the high-resolution visualizations they generated on the big Cray supercomputers on some persistent medium. A Dicomed automated film camera would render a visualization on a CRT screen and the camera would take a photograph of it, all under software control. The system could produce still photographs, movies, and microfiche, in color or black and white. (If you don't recognize any of these technologies, don't sweat it, just go with me here.) So a small team of us set about to develop a distributed graphical spooling system in C to run under SunOS. The result was the Text And Graphics System or TAGS.

TAGS Development Team
The TAGS Development Team (ironic that this old photograph is out of focus).

I was tasked with developing a message passing system for TAGS. After a lot of thought and experimentation, I invented parcels. (Parcels with TAGS, get it?) Parcels were a text-based free-format mechanism that passed messages over stream sockets as a sequence of keyword-value pairs, what we might now refer to as a property. The message payload of a parcel might look something like this.

job=1234
user="Chip Overlock"
parameters="align=left title=\"Model A Run 14\""
timestamp=1989-12-01T18:04:22
color=yes

When a parcel was received, it was parsed and loaded into a binary tree that could be searched by keyword, so it was a kind of associative memory. As with the parameters property above, parcels could themselves be values of properties that could be subsequently parsed and loaded into yet another binary tree, so parcels could be nested.

Because parcels were text, we could generate messages for testing or other special purposes with our favorite text editor. Parcels could be written to and read from files; that's how TAGS maintained job state persistently across reboots. And best of all, if another developer added some properties to the message that his software sent my software, I didn't have to coordinate with him at all. My software didn't care that there were extra properties in the parcel, as long as the properties my software did care about hadn't changed. It hugely simplified development and deployment.

Yeah, seems kinda quaint now, doesn't it? But parcels predate the ubiquitous use of Extensible Markup Language or XML as a text-based message passing format. Using parcels was a revelation: it was a message passing format, it was a configuration file format, it was a floor polish and a dessert topping.

TAGS ran in production for about eight years until both it and the hardware it served and on which it ran were obsolete. At its peak, TAGS produced a half a million frames of film per month. Output media included 35mm color slides, 105mm microfiche, and ultimately even video tape and monochrome and color paper. My parcel code ended up having more longevity than TAGS itself, going on to be used in other NCAR systems after I left the organization.

Years later,  I became quite conversant with the Document Object Model (DOM) and the facilities in Java to parse XML, as well as C and C++ based XML parsing libraries, as  I worked on subsequent projects in other organizations that took a similar approach to message passing and configuration files. You probably won't be surprised to learn that XML made me positively nostalgic for my NCAR days; XML had all the good stuff that I liked about parcels.

<job>1234</job>
<user>Chip Overlock</user>
<parameters>
 <align>left</align>
 <title>Model A Run 14</title>
</parameters>
<timestamp>2006-06-15T08:24:00</timestamp>
<color>yes</color>

A couple of years ago I had a gig with Aircell (now Gogo Business Aviation) to help a team develop a satellite communications system for business aircraft. We used JavaScript Object Notation or JSON for much of our configuration files and message passing. It was parcels all over again.

{
 "job": "1234",
 "user": "Chip Overlock",
 "timestamp": "2014-01-28T14:15:04",
 "color": "yes",
 "parameters": {
  "align":  "left",
  "title": "Model A Run 14"
 }
}

We also ended up making some serious use of Asterisk, the open source PBX. The final product, which today you might find in a Gulfstream or Lear business jet, allows a traveller to use a VoIP app on their smartphone to wirelessly place a SIP call that goes to space over the Iridium satellite network and back down to a ground station, into the telephone system, or even to another plane equipped with the same product. It was international dialing to anywhere in the world, any time, from anywhere, at any altitude. It worked. You could even send a FAX. (Remarkably, billionaires flying in seventy-million-dollar business jets still insist on using facsimile machines. Go figure.)

My Pelican cases and the flight test aircraft (photo by Doug Young).

Yeah, I was impressed, too. But it's funny what you carry away from projects like that. What I really wanted to use in other projects in the future was Asterisk's configuration file support. Asterisk, and lots of other systems, use a really flexible configuration file format that is sometimes referred to as the INI (for initialization) format. INI files are similar to parcels in that they have properties that are keyword-value pairs, but they allow you to group collections of properties into sections using the square bracket notation as shown below.

job=1234
user=Chip Overlock
timestamp=2015-03-06T11:08:30
color=yes
[parameters]
align=left
title=Model A Run 14
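To make the format concrete, here is a minimal hand-written sketch in C that classifies one INI line as a section header, a keyword-value property, or a blank/comment line. It is not Assay's Flex and Bison grammar; the names `ini_parse_line` and `ini_kind_t` are hypothetical, and it assumes `;` starts a comment while ignoring features such as continuation lines or quoting that real INI dialects, Asterisk's included, may support.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Kinds of line this sketch recognizes. */
typedef enum { INI_BLANK, INI_SECTION, INI_PROPERTY, INI_ERROR } ini_kind_t;

/* Trim leading and trailing whitespace in place; returns the trimmed start. */
static char *ini_trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s)) { ++s; }
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) { --end; }
    *end = '\0';
    return s;
}

/*
 * Classify one line, modifying it in place. For a section header, *name
 * points at the section name; for a property, *name and *value point at
 * the keyword and the value. Lines starting with ';' are treated as comments.
 */
static ini_kind_t ini_parse_line(char *line, char **name, char **value)
{
    char *s;
    char *eq;

    assert(line != NULL && name != NULL && value != NULL);
    s = ini_trim(line);
    if (*s == '\0' || *s == ';') {
        return INI_BLANK;
    }
    if (*s == '[') {
        char *close = strchr(s, ']');
        if (close == NULL) { return INI_ERROR; }
        *close = '\0';
        *name = ini_trim(s + 1);
        return INI_SECTION;
    }
    eq = strchr(s, '=');
    if (eq == NULL) { return INI_ERROR; }
    *eq = '\0';
    *name = ini_trim(s);
    *value = ini_trim(eq + 1);
    return INI_PROPERTY;
}
```

Like a parcel reader, a consumer built on something like this can simply skip keywords it doesn't care about, so a producer can add properties without coordinating with it.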

Recently I found myself (somewhat deliberately, it must be admitted) between gigs with some time on my hands and a lot of unused computer equipment sitting around, so I perused my old-school index card file for project ideas.

I had always wanted to learn how to use Flex and Bison. Flex is an open source reimplementation of Lex, a UNIX tool that generates a lexical scanner based on regular expressions. Bison is similarly an open source reimplementation of YACC, or Yet Another Compiler Compiler, a UNIX tool that generates a shift-reduce parser from an LALR(1) grammar. Over the years I had written a lot of recursive descent parsers and table-driven parsers based on pushdown automata (PDA), but had never used Lex or YACC, although I had seen other developers make good use of Flex and Bison (including one on the SATCOM project). I had written my share of grammars that were more or less LL(1), but had only played with LALR(1) grammars as an academic exercise when I was in graduate school.

During another interstitial period, I had added an implementation of red-black balanced binary trees to my Diminuto library, a collection of C code for systems programming under Linux/GNU, portions of which have found their way into many commercial products (including, again, the SATCOM project). I wanted a non-trivial application for the red-black trees in Diminuto beyond the unit tests that I had written.

And I really wanted a C-based parser for INI files.

Assay is my C-based parser for INI files. The lexical scanner and shift-reduce parser used by Assay are generated by Flex and Bison from an LALR(1) grammar I developed that describes, as near as I could determine anyway, the INI file format used by Asterisk. You can trivially import the sections and properties from an INI file into a red-black tree, which can be searched efficiently. And you can just as trivially export the tree (what Assay refers to as a configuration) into a file, or into anything else that can be represented as a standard I/O stream (FILE *).
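Assay uses the red-black tree from Diminuto. As a rough stand-in using only the standard library, the sketch below keeps properties in a search tree keyed by section and then keyword, using POSIX `tsearch(3)` and `tfind(3)`. POSIX doesn't mandate a balancing scheme, though glibc happens to implement these as a red-black tree. The `prop_t`, `prop_put`, and `prop_get` names are hypothetical, and using the empty string as the section for properties that precede any section header is an assumption, not necessarily what Assay does.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <search.h>
#include <string.h>

/* One property: a (section, keyword) key and its value. Strings are borrowed, not copied. */
typedef struct {
    const char *section;  /* hypothetical: "" for properties outside any section */
    const char *keyword;
    const char *value;
} prop_t;

/* Order properties by section first, then by keyword. */
static int prop_compare(const void *a, const void *b)
{
    const prop_t *pa = a;
    const prop_t *pb = b;
    int rc = strcmp(pa->section, pb->section);
    return (rc != 0) ? rc : strcmp(pa->keyword, pb->keyword);
}

/*
 * Insert a property, or find the existing one with the same key.
 * Returns the stored property, or NULL if tsearch() could not allocate a node.
 * Tree nodes are not freed in this sketch.
 */
static prop_t *prop_put(void **root, prop_t *prop)
{
    void *node;
    assert(root != NULL && prop != NULL);
    node = tsearch(prop, root, prop_compare);
    return (node == NULL) ? NULL : *(prop_t **)node;
}

/* Look up the value for (section, keyword); NULL if it is absent. */
static const char *prop_get(void *const *root, const char *section, const char *keyword)
{
    prop_t key = { section, keyword, NULL };
    void *node = tfind(&key, root, prop_compare);
    return (node == NULL) ? NULL : (*(prop_t **)node)->value;
}
```

Because the comparison orders by section first, all the properties of one section sit next to each other in an in-order walk (for example with `twalk(3)`), which is handy when exporting the tree back out as INI text.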

Here's a code snippet from an Assay unit test that, given a path name, creates a new configuration object, imports the contents of an INI file into it, exports the resulting configuration in INI file format to standard output for inspection, and then interrogates the imported configuration. (I've removed all the unit test framework for readability.)

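assay_config_t * cfp;
const char * value;
/* Create an empty configuration, import the INI file at PATH1 into it, and export it to stdout for inspection. */
cfp = assay_config_export_stream(assay_config_import_file(assay_config_create(), PATH1), stdout);
/* Look up the value of keyword1 in section1. */
value = assay_config_read_string(cfp, "section1", "keyword1");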

You don't have to import an INI file to use Assay. You can create configurations programmatically (which is just what the Bison-generated Assay parser is doing as it reads the INI file).

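assay_config_t * cfp;
assay_section_t * scp;
assay_property_t * prp;
/* Build the equivalent of the INI text "[ALFA]" followed by "ONE=1" by hand. */
cfp = assay_config_create();
scp = assay_section_create(cfp, "ALFA");
prp = assay_property_create(scp, "ONE");
/* The length passed includes the terminating NUL. */
assay_property_value_set(prp, "1", strlen("1") + 1);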

Here's the funny thing: it wasn't until I was wrapping up the project, slamming what I thought were the last of the unit tests into the Assay repo on GitHub and fixing some typos in the Doxygen comments, that it occurred to me that maybe I could use Assay for message passing too, by using a FIFO or a socket that could be represented by a standard I/O stream object.

A half a day later I had new unit tests doing just that.

Here are two code snippets from an Assay unit test: one from a producer process that creates a configuration, loads it from an INI file, sends it across a socket, and then destroys the configuration; and another from a consumer process that creates a configuration and receives the INI message into it.

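/* Producer: create a configuration, import PATH1 into it, send it over the write end of the pipe, then destroy it. */
assay_config_destroy(assay_config_export_stream_send(assay_config_import_file(assay_config_create(), PATH1), stream = fdopen(pipeline[1], "w")));

/* Consumer: create a configuration and import the INI message from the read end of the pipe. */
cfp = assay_config_import_stream(assay_config_create(), stream = fdopen(pipeline[0], "r"));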
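For readers without Assay at hand, here is a minimal sketch of the underlying trick using only POSIX `pipe(2)` and `fdopen(3)`: both ends of a pipe are wrapped as `FILE *` streams, INI text is written into one and read back out of the other. The function name `ini_send_receive` is hypothetical. For brevity the producer and consumer share one process, which assumes the message fits in the pipe's buffer; separate processes after `fork()`, or a socket, would lift that limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Send INI-format text through a pipe wrapped in standard I/O streams and
 * read it back into `received`, returning the number of lines read (or -1
 * on error). Lines are assumed shorter than the 256-byte line buffer, and
 * the whole message is assumed to fit in the pipe buffer (single process).
 */
static int ini_send_receive(const char *text, char *received, size_t size)
{
    int pipeline[2];
    FILE *producer;
    FILE *consumer;
    char line[256];
    size_t used = 0;
    int lines = 0;

    assert(text != NULL && received != NULL && size > 0);
    if (pipe(pipeline) < 0) { return -1; }

    /* Producer side: the write end of the pipe as a FILE *. */
    producer = fdopen(pipeline[1], "w");
    if (producer == NULL) { close(pipeline[0]); close(pipeline[1]); return -1; }
    fputs(text, producer);
    fclose(producer); /* flushes and closes pipeline[1], so the reader sees EOF */

    /* Consumer side: the read end of the pipe as a FILE *. */
    consumer = fdopen(pipeline[0], "r");
    if (consumer == NULL) { close(pipeline[0]); return -1; }
    received[0] = '\0';
    while (fgets(line, sizeof(line), consumer) != NULL) {
        size_t length = strlen(line);
        if (used + length < size) { /* append only if it fits, keeping NUL termination */
            memcpy(received + used, line, length + 1);
            used += length;
        }
        ++lines;
    }
    fclose(consumer);
    return lines;
}
```

Swap the `FILE *` from `fdopen` for one from `fopen` and the same reading and writing code handles a configuration file instead of a message.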

So finally, after more than a quarter of a century of implementing and using stuff just like this, over and over, it occurred to me that maybe configuration files are just a special case of message passing. Or possibly vice versa. Whenever I am faced with implementing either a configuration file handler or a message passing subsystem, maybe it would be a really good idea to consider whether those two tasks are really the same problem.

References

J. L. Sloan, Parcels with TAGS, NCAR Technical Note, NCAR/TN-377+IA, National Center for Atmospheric Research, 1992-10, http://opensky.library.ucar.edu/collections/TECH-NOTE-000-000-000-188

Wikipedia, "INI file", http://en.wikipedia.org/wiki/INI_file

Asterisk Project, "Asterisk Configuration Files", https://wiki.asterisk.org/wiki/display/AST/Asterisk+Configuration+Files

VOIP Info, "Asterisk config files", http://www.voip-info.org/wiki/view/Asterisk+config+files

John R. Levine, Tony Mason, Doug Brown, lex & yacc, 2nd ed., O'Reilly, 1995

John R. Levine, flex & bison, O'Reilly, 2009

Tom Niemann, "LEX & YACC TUTORIAL", http://epaperpress.com/lexandyacc/download/LexAndYaccTutorial.pdf

Saumya K. Debray, "Lex and Yacc: A Brisk Tutorial", http://www.cs.arizona.edu/~debray/Teaching/CSc453/DOCS/tutorial-large.pdf

Assay, Digital Aggregate Corporation, 2015, http://www.diag.com/navigation/downloads/Assay.html

Tuesday, February 03, 2015

The More Things Change, The More They Change

Stick a fork in RadioShack because it's done, reports Bloomberg. 


How a chain founded on selling parts from the then-newfangled electronics industry can up and die in the midst of the biggest DIY high-tech "maker" movement in history is almost beyond me. But besides competition from the Internet and the difficulty of stocking inventory for the "long tail", I have a sneaking suspicion this is yet another example of short-term optimization -- "Let's sell phones, it'll be a lot cheaper and easier, and everybody loves phones!" -- winning out over long-term optimization -- "Let's sell complex electronic components and kits that have low margins and require knowledgeable salespeople".

I used to routinely buy electronic parts at the Shack at our local outlet mall -- "I need some 120 ohm resistors to terminate a CAN bus." -- where I dealt with an old guy (older than me!) who was clearly retired from one of the Denver area's high-tech manufacturers and who was always interested in what I was working on. "Bring it in, I'd like to see it!" he'd tell me. Wish I knew where he moved on to. But I'm part of the problem, not the solution. Me, I'm the guy that bought an oscilloscope off Amazon.com.

On the plus side, at the coffee shop yesterday morning a college-age guy sitting at the same communal table out of the blue asked me if I was familiar with the programming language Haskell. "Is that the one that's purely functional?" I asked, which was all that was necessary for us to nerd bond. In his defense, I was reading an IEEE book by a capital theorist, an economic analysis of the software development process. So he might have had a clue that we were of the same tribe.

It is both the end, and the beginning, of an era. Just like always.

Monday, January 19, 2015

Exceeding the Speed of Obsolescence

I drive a 1998 Subaru Legacy GT sedan. I bought it way back in December 1997. It's been a great car, but as much as I love it, I've been looking into replacing it. Along the way, I stumbled into an unexpected connection with my professional life.

Back when I worked for the Scientific Computing Division at the National Center for Atmospheric Research in Boulder, Colorado, I spent some time looking at the rate at which supercomputer horsepower increased over the years. When you get a supercomputer, everything else has to be super too (networking, storage, I/O); otherwise you are just wasting your money heating the data center. The same is true for computing at all scales, but with supercomputers (or, today, cloud data centers), you're talking real money. I came up with this logarithmic chart that may seem familiar to long-time readers. It shows the rate of growth of a number of computing technologies over time, normalized to unity. For example, if you bought a mass storage device of size n this year, in about eleven years for the same money you will be able to buy a mass storage device of size 10n.

Power Curves

The data on which this chart is based is probably obsolete, but that's not the point. The point is that different technologies grow at different rates. Achieving a balanced architecture is a moving target. When you upgrade one component or subsystem, perhaps because the newer technology is more cost-effective or performs better, or maybe because the manufacturer has discontinued the old one, the newer component or subsystem is so much more capable that it stresses the rest of the architecture. The memory bus can't keep up with the new CPU cores. The disk farm can't keep up with the higher rate of data generation. You end up wasting a lot of money, unable to take advantage of the full capability of the new hardware.
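A deliberately simplified model of that imbalance: if every unit of work has to pass through each subsystem in turn, overall throughput is capped by the slowest one. The stage rates in the note below are hypothetical illustrative numbers, not NCAR data, and `pipeline_throughput` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Throughput of a chain of stages that every unit of work must pass
 * through, in this simplified model: the slowest stage sets the pace.
 */
static double pipeline_throughput(const double *stage_rate, size_t count)
{
    double slowest;
    size_t i;

    assert(stage_rate != NULL && count > 0);
    slowest = stage_rate[0];
    for (i = 1; i < count; ++i) {
        if (stage_rate[i] < slowest) { slowest = stage_rate[i]; }
    }
    return slowest;
}
```

With hypothetical rates of 100 (CPU), 80 (memory), and 50 (disk) units per second, the system delivers 50. Upgrading the CPU tenfold to 1000 still yields 50, while upgrading the disk to 200 only lifts it to 80, where memory becomes the new bottleneck.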

The folks at NCAR rightfully worried about this. And it's one of the reasons, I believe, that laptops have replaced desktop computers. It used to seem that desktops, like the old component stereo systems, offered the capability of incremental upgrade. But in the long run, it made a lot more sense to replace the entire system, under the assumption that the new system -- a laptop -- would have a balanced set of components chosen by designers and architects that knew a lot more about it than you did.

This insight came back to haunt me years later when I left NCAR to work at a Bell Labs facility near Denver, Colorado. The Labs had a long history of producing large distributed telecommunications systems, either for the public switched telephone network or for large enterprises, as well as lots of other pretty useful things like C, C++, and UNIX.

NCAR was an organization that never saw a high performance computer it didn't like, and seemed to have one of everything. I had become accustomed to equipment becoming obsolete in just a short few years. Sometimes it seemed like the doors into the NCAR data center should have been revolving, with new computers coming and going all the time. I routinely walked out onto the floor of the main computer room at NCAR's Mesa Laboratory to find some new computer system I didn't recognize being installed.

But organizations that bought large telecommunications systems thought differently about obsolescence. They expected to amortize the expense of their new telecom equipment over a much longer period of time, typically a decade. That placed interesting design constraints on the hardware and software that we developed. We all knew stuff would come and go, because that's the nature of high technology. So the entire system had to be built around the assumption that individual components and subsystems were easily replaceable. Making it more complicated was the assumption -- and sometimes the regulatory requirement -- that systems have five nines reliability: that is, the system was up and available 99.999% of the time. This was the first place I ever worked that built products that had to be capable of having their software patched while running, not to mention almost all of the hardware being hot-swappable.
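For a sense of how strict five nines is, here is the arithmetic as a small C sketch, assuming an average year of 365.25 days; `downtime_minutes_per_year` is a hypothetical helper.

```c
#include <assert.h>

/* Minutes in an average year of 365.25 days. */
#define MINUTES_PER_YEAR (365.25 * 24.0 * 60.0)

/* Maximum downtime per year, in minutes, for an availability between 0 and 1. */
static double downtime_minutes_per_year(double availability)
{
    assert(availability >= 0.0 && availability <= 1.0);
    return (1.0 - availability) * MINUTES_PER_YEAR;
}
```

At 99.999% availability that is about 5.26 minutes of downtime per year, all causes included; at 99.9% it is about 526 minutes, nearly nine hours. That gap is why patching a running system and hot-swapping hardware were requirements rather than luxuries.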

Just as at NCAR, the disparate rates of growth of different high technologies drove a lot of the design and architecting done by the folks at Bell Labs, but in a completely different way.

The other day I read an article that advised folks thinking of buying a new automobile not to purchase the in-dash navigation system. This made a lot of sense to me. Whether I use a navigation application on my iPhone 5 or the Garmin nüvi GPS unit that Mrs. Overlock and I refer to as our "robotic overlord" on road trips, such devices are on a faster high technology track to obsolescence than most other technology in my car.

That's when it struck me that the future of vehicle infotainment systems isn't to put more and more capability into the automobile dashboard. It's to make your automobile a peripheral of your mobile device. Because while I may still drive the Subaru I bought seventeen years ago, Americans replace their mobile phones every couple of years. Although it has been argued that this rate of replacement is environmentally unsustainable, it still means that my new vehicle purchase has to be considered in the context of the high technology growth curves that so affected my work at both NCAR and Bell Labs.

While many automobile manufacturers provide a mechanism to upgrade the software in their vehicle telematics systems, replacing all that obsolete hardware is a big ticket item. It's bad enough that my old Subaru doesn't have traction control, or continuously variable transmission, or LED headlights; replacing its ancient head unit, the in-dash component that not only controls the FM radio and the CD changer but is so old it actually has a cassette deck, costs more than a thousand bucks. That's a chunk of change for a car as old as mine.

What I really want is a set of peripherals -- display, microphone, amplifier and speakers, maybe some buttons on the steering wheel -- that can be slaved to my iPhone while it is plugged into a USB port to power it. And I want it all to work with my new iPhone or Android when I replace my mobile device. The less the car implements itself, the better. Investing in a high-zoot in-dash infotainment system just doesn't make sense, no matter what value proposition the auto manufacturers believe it has.

The broader lesson here: beware of coupling technologies in your product that have very different growth rates. If you must, make sure you can replace components incrementally. If that's infeasible, be prepared for a forklift upgrade. Even so, few devices these days operate standalone; what seems like an independent device is probably just a component in a larger ecosystem.

Coincidentally, I generated this particular version of my chart of technology growth curves the same year that I bought my Subaru. Both continue to serve me well.

But I could totally see myself in a new Subaru WRX sedan.

Update (2016-04-05)

And so it came to pass. But regardless of what I wrote here, I got the Limited trim level of the WRX that has virtually every infotainment gizmo known to, well, at least Subaru.

Untitled