Friday, December 16, 2011

The Cost of Experimenting with the Cloud

So after two months of my part-time experimentation with the Simple Storage Service (S3) from Amazon Web Services (AWS) using Hayloft, my company finally got charged for its usage. Three cents. It will be interesting to see if it actually shows up on my company credit card bill, or if defers it until the amount gets a little larger.

It's going to take a while. Typically I run the automated test suite many times a day, and each execution issues dozens of transactions to S3, but each transaction involves a tiny amount of data. Still, considering all I've learned while working on this project, this is a bargain.

Update 2011-12-18

I just counted: 198 individual S3 operations in the Hayloft test suite as of today, each one involving at least one request and response with S3.

Friday, December 09, 2011

Fibonacci Scaling

Circa 1996 I found myself in an ideal situation: spending long hours working on a very technically challenging project with a multi-disciplinary bunch of engineers who were all smarter than I was. Good times.

During that project one of my esteemed colleagues, John Meiners, needed a way to scale up the bandwidth on communications channels allocated in real-time on an as-needed basis. We were using Asynchronous Transfer Mode (ATM), a technology in which endpoints have to negotiate with the network for the amount of bandwidth they needed. Asking for more bandwidth than we needed meant more cost for our customers. Asking for less meant data loss. Tearing down an existing channel to allocate a bigger one would be viewed as a catastrophic failure. Due to the hard real-time nature of the application, doing everything via best effort was not even on the table.

So we started small, then as load increased we allocated additional channels. But some back of the envelope calculations suggested that if we chose a fixed bandwidth increment, things could get out of hand quickly with the number of channels we had to manage. We needed to start small, but allocate larger channels as the need arose. What algorithm should we use?

John looked at various exponential algorithms, like doubling the bandwidth request of each subsequent channel. But that got really big (and hence really expensive) really quickly, and the scaling didn't match our expected use cases. What he finally settled on was Fibonacci numbers:

Fn = Fn-1 + Fn-2


F0 = 0
F1 = 1

Starting with F1 just as Fibonacci himself did in 1202 (since using zero as a bandwidth multiplier doesn't make much sense) produces a sequence that looks like this:

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...

This turned out to be pretty much ideal. It starts small and conservative, expands but not too quickly, and is trivially computed since you just have to keep track of the last two numbers in the sequence and merely do an addition.

Fibonacci numbers have been around a long time. Although Leonardo of Pisa (a.k.a. Fibonacci) gets the credit, the sequence was written down earlier by Indian mathematicians. It occurs frequently in nature, most noticeably in spirals such as those found in pine cones and shells. John's idea seems pretty innovative in 1996, but lots of high-technology applications of the Fibonacci sequence exist today, including network back off algorithms, lossy data compression, pseudo-random number generators, and tree and heap data structures.

I'm thinking about this because I am once again using this sequence as a way to scale up back off time delays when recovering from network errors in Hayloft, my little C++ research project using Amazon Web Services Simple Storage Service (S3). I have no doubt that Fibonacci's simple little recursive algorithm will continue to serve me well.

Thursday, December 08, 2011

Venkatesh Rao on Developernomics

Venkatesh Rao had an article on the Forbes web site on the valuation of software developers: "The Rise of Developeronomics". He comes closer than anyone else so far (including me) to describing how I see things going career wise for software developers, although it's written from the perspective of someone desperately trying to hire competent developers to compete in a world in which every company has to become a software company, explicitly or implicitly. It's worth a read.

[I]f you don’t have a skill, like baking, which the developer-centric economy can actually use, you are in deep trouble. One reason the Occupy Wall Street movement is not having the impact it seems like it should, based on sheer numbers of people involved, is that many participants simply have no cards left to play in this national game of economic poker.

Investing in good developers is such a good bet at the moment, that if you have money and you happen to find a talented developer who seems to like you and wants to work with you, you should give him/her your money to build something, anything, even if you have no really good product ideas (those are cheap; I’ll sell you a dozen for a dollar). If you don’t have money, you should offer up whatever other kind of surplus you do have. The NPV on a strong and positive relationship with a talented developer today is ridiculously high. If you gain the trust of a talented developer to the point that they are likely to drop any active gig in the future in favor of joining one of your projects, the value is through the roof. The world is your oyster, which you with your developer will open.

The idea of using Net Present Value (NPV) to compare the value of a software developer with alternative investments is one of those things that for me is initially surprising yet obvious in hindsight. In today's economy, where instead of owning a factory and a warehouse of parts all of a company's worth is in intellectual capital (which really means, in the brains of its employees), it makes sense.

Rao writes the blog where he ponders, among other things, organizational issues in high-technology firms, which is probably how he appeared on my radar screen. I don't always agree with him. Sometimes I'm not sure I'm smart enough to agree or disagree with him. But he's got some interesting ideas.

Tuesday, December 06, 2011

Eventual Consistency and Hayloft

Given my thirty plus years of experience in distributed computing, it's a given that I've had to more or less continuously grapple with the Fallacies of Distributed Computing as laid out by L Peter Deutsch, James Gosling, and others:

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn't change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.

The implications of these myths led Eric Brewer to make Brewer's Conjecture, which became the CAP Theorem once it was proven by Seth Gilbert and Nancy Lynch.

A distributed system can guarantee at most two of the three following qualities:

  1. all nodes can see the same data at the same time (consistency);
  2. every request receives a response as to whether it succeeded or failed (availability);
  3. the system continues to operate despite message loss (partition tolerance).

I've been thinking a lot about the implications of this for storage systems ever since attending the 27th IEEE Symposium on Massive Storage Systems and Technologies, a.k.a. MSST 2011. One of the big take aways I had from that conference was that distributed storage systems cannot reliably support POSIX semantics, in particular the POSIX features of file locking and consistency in file modification. Thanks to the CAP Theorem, POSIX semantics don't scale.

(I find that all of the really interesting problems are scalability problems, regardless of the problem domain. If solving a problem at scale n seems easy, trying applying the same solution at scale n x 1000. Not so much. That's why domains like cloud computing and exascale high performance systems have caught my interest. You can't just take current architectures and multiple by one thousand.)

This is one of the reasons that cloud storage has become popular: the various web-based mechanism for data transfer used by the cloud, like the RESTful HTTP GET and PUT, and XML-based SOAP, have semantics that have proven to be highly scalable. Hence they map well to large distributed system architectures.

It was this very issue that lead to adopt a strategy for its globally distributed Amazon Web Services (AWS), including S3, its non-POSIX-compliant Simple Storage Service, which its CTO Werner Vogels refers to as eventually consistent, choosing availability and partition tolerance over consistency:

Data inconsistency in large-scale reliable distributed systems has to be tolerated for two reasons: improving read and write performance under highly concurrent conditions; and handling partition cases where a majority model would render part of the system unavailable even though the nodes are up and running.

Indeed, the well publicized failures in AWS have been not in S3, but in its Elastic Block Store (EBS), which tries to support POSIX semantics on top of a globally distributed system.

I can see the Fallacies, and the eventual consistency architecture, at work when I play with Hayloft, my little C++ research project that uses S3. Any realistic application using S3 has to deal with network failures, where simply retrying the operation may be sufficient. But it also has to deal with the issues of consistency convergence: the fact that every S3 action may take place on a different AWS server that doesn't yet know what actions have taken place on other AWS servers, even though those actions may have been performed on the same storage objects.

Hayloft features both a synchronous and an asynchronous interface. The former makes it easy to play with. The latter is more likely to be what a production application would use. The asynchronous interface features a mechanism to block until an S3 action completes, or to execute multiple S3 actions incrementally and in parallel. Unlike my prior article on Hayloft, I'll use the asynchronous interface here, but use the blocking completion mechanism because it's simpler. (As before, all of this code is taken from working unit tests in the Hayloft distribution, but edited for readability. Apologies for any typos.)

Here is a code snippet that creates a bucket to hold objects. It starts the action and blocks until it completes.

Multiplex multiplex;
BucketCreate bucket("Bucket", multiplex);

But that's not sufficient. Sometimes (rarely), the action will return a status indicating a temporary failure, like "connection failed", "name lookup error", or "request timed out". The network isn't reliable, and the failure more likely to be on my end, or somewhere along the convoluted path between my system and S3, than in S3 itself. Simply retrying the action after a one second delay is usually sufficient unless the outage is severe. (Note that "success" is not a retryable condition.)

Multiplex multiplex;
BucketCreate bucket("Bucket", multiplex);
for (int ii = 0; ii < 10; ++ii) {
if (!bucket.isRetryable()) { break; }

Once the bucket is created, we can store an object in it. Here's a code snippet similar to the one above except it has to deal with rewinding the input data source if we need to retry the action. You can see that this might get a smidge complicated depending on where your data is coming from.

PathInput * source = new PathInput("./file.txt");
Size bytes = size(*input);
ObjectPut put("InputFile.txt", bucket, multiplex, source, bytes);
for (int ii = 0; ii < 10; ++ii) {
if (!put.isRetryable()) { break; }
source = new PathInput("./file.txt");
bytes = size(*input);
put.reset(source, bytes);

But network outages aren't the only failures we have to worry about. The following code snippet retrieves the metadata for the object we just created.

ObjectHead head(put, multiplex);
for (int ii = 0; ii < 10; ++ii) {
if (!head.isRetryable()) { break; }

This usually works. Except when it doesn't. Sometimes (and much more frequently than I see retries due to temporary network failures) the action will return a non-retryable status indicating the object wasn't found. WTF?

The object head action was serviced by a different AWS server than that of the object put action, one that hadn't yet been notified of the object put. This is eventual consistency in action. The unit test tries a code snippet like the following, treating the non-existence of the object as a retryable error, since it knows darn well the object put was successful.

ObjectHead head(put, multiplex);
for (int ii = 0; ii < 10; ++ii) {
if (head.isRetryable()) {
// Keep trying.
} else if (head.isNonexistent()) {
// Keep trying.
} else {

Eventually this works. But it's even more complicated than that. If you ran this exact same code snippet again, it might still fail at first, because it was run on yet another server which had not yet been updated to be consistent with the others. In fact, its success on a subsequent attempt may be because it just happened to be executed on the original server, not because all the servers had been updated. Applications have to be designed around this. Amazon's S3 best practices document recommends not using code snippets like this to verify the successful put of an object.

The unit tests log a message every time they retry. I can actually watch how long it takes for the AWS servers to become consistent, and how this consistency convergence changes with the load on the system. Typically it converges so quickly that the unit test doesn't have to retry. Rarely, it takes a second or two. Or sometimes, not so rarely. I was testing a lot of this code on the day after the Thanksgiving holiday in the United States, also known as Black Friday, typically the busiest shopping day of the Christmas season. It frequently took two or three seconds for the system to converge to a consistent state. I haven't seen this kind of latency before or since. But, to its credit, S3 always converged and the unit tests all eventually ran successfully to completion.

Whether or not this is an issue for you depends on your S3 application. For a lot of applications, it won't matter. For applications where read-after-write consistency is important, AWS offers a higher (more expensive) tier of service where objects are co-located in one of their big data centers instead of possibly being spread out among multiple centers; this tier offers read-after-write consistency. If immediate consistency is an issue for you, S3 may not be right platform for your application. However, I would argue that distributed computing isn't the right paradigm for you.

The cloud is not the place for everyone.

Thursday, December 01, 2011's Simple Storage Service and Hayloft

Information technology is like fashion: you live long enough, you see it change many times. And sometimes the new stuff looks kinda like the old stuff. But that doesn't mean you don't get excited about it. Change is inevitable; you might as well learn to enjoy it.

I'm watching the IT world evolve to a model that is handheld battery powered consumer devices, which these days we're calling smartphones and tablets, communicating wirelessly with enormous distributed computing systems, which we now call cloud computing. I'm all excited. For lots of reasons. Not the least of which is the fact that my professional career over the past thirty-five years as wobbled back and forth between developing software for small embedded systems and for high performance computers, gigantic distributed systems, and server farms. Turns out, the skill sets are the same. I am not the first person to have noticed this.

My most recent project is to develop Hayloft, a C++ object oriented interface to Amazon Web Services (AWS) Simple Storage Service (S3). My goal is for Hayloft to make it easy to use S3 from embedded devices. S3 is's web-based storage service that can (if you do it right) reliably and inexpensively store blobs of arbitrary data. How much data? A single blob, or object, can be up to five terabytes in size. And you can store a lot of objects. Objects are organized into buckets. Each bucket can be thought of as a web site with its own domain name, which means bucket names have to conform to the internet's Domain Name System (DNS) syntax. Each object is independently addressable via its own URL. To S3, these objects are simply opaque data files in the classic mass storage system reference model sense: they can range from HTML pages to scientific databases to virtual machine images to what have you.

(Indeed, I strongly suspect you could build a distributed mass storage system from S3 just as Cycle Computing built a 10,000 core distributed supercomputer from's Elastic Computing Cloud (EC2) for one of their customers. I'm pretty excited about that, too.)

Why would you you want to access S3 from an embedded device? Can you think of any applications for an infinitely large, internet-accessible, web-addressable, reliable, secure, storage system? Whether you are building clever consumer applications for the commercial space, developing sensor platforms for the defense and intelligence domain, or information processing applications for the industrial, financial, or medical markets, if the thought of this doesn't make your hands shake, you aren't thinking clearly. Seriously. Terabytes of storage are just a wireless connection away from your handheld and embedded applications.

Here's a little taste of what Hayloft is like. These examples are taken directly from the unit test suite included with Hayloft with all the EXPECT and ASSERT statements removed. I've also removed all of the error recovery and consistency convergence code which in any practical circumstances are very necessary; I'll talk more about that in later articles. Hayloft presents both a synchronous and an asynchronous interface. These examples use the simpler synchronous interface: when the C++ constructor completes, all the S3 work is done.

Here's a code snippet that creates a new bucket, and writes a local file into an object in it. The user key id and secret access key, which are sort of like your login and password for AWS, and which are provided by, are in environmental variables. Also in an environmental variable is a bucket suffix appended to all bucket names to make them globally unique; I use my internet domain name. All of the other S3 parameters, like location constraint, endpoint, access control list, and so forth can be omitted, because for experimenting, the defaults are reasonable. Input and output is handled using Desperado I/O functors.

BucketCreate create("bucket");
PathInput input("./oldfile.txt");
Size bytes = size(input);
ObjectPut put("Object.txt", create, input, bytes);

By default, buckets and objects are created by Hayloft with private access. With just a few more lines you can specify public read access so that you can (and I have) retrieve this object using your web browser.

AccessPublicRead access;
Context context;
BucketCreate create("bucket", context);
Properties properties;
PathInput input("./oldfile.txt");
Size bytes = size(input);
ObjectPut put("Object.txt", create, input, bytes, properties);

This object can now be accessed using the following URL.

This is a code snippet that reads the object into a local file, then deletes the object.

PathOutput output("./newfile.txt");
ObjectGet get("Object.txt", create, output);
ObjectDelete delete("Object.txt", create);

Other than some #include statements for header files, and some curly brackets and what not, that's about all there is to it, if you ignore (at your peril) error recovery. Here's a snippet that gets a table of contents from an existing bucket, then uses the C++ Standard Template Library (STL) to iterate through it and print the object names.

BucketManifest manifest("bucket");
BucketManifest::Manifest::const_iterator here = manifest.getManifest().begin();
BucketManifest::Manifest::const_iterator there = manifest.getManifest().end();
while (here != there) {
const char * key = here->first.c_str();
printf("%s\n", key);

While using Hayloft is easy, installing it may take a few minutes. Hayloft is built for GNU and Linux on top of Desperadito, my C++ systems programming library (a subset of the much larger Desperado), and libs3, Bryan Ischo's excellent C-based S3 library. libs3 is built on top of CURL, Open SSL, and XML2, all of which can be probably acquired through the standard package manager on your Linux system. The unit tests for Hayloft are built with Google Test (or Google Mock, which includes Google Test), Google's outstanding C++ unit testing framework, and Lariat, my thin layer over Google Test.

But once you get past the installation, starting to play with S3 in C++ applications is as simple as pulling out your credit card, signing up for an AWS account, and writing a few lines of code.

Saturday, November 26, 2011

Making Both Archives and Shared Objects

It's not uncommon to want to support both static linking, with a library archive, and dynamic linking, with a shared object. I almost always choose the latter, but it is not uncommon for small embedded applications to be statically linked. This can actually reduce the memory footprint of the entire system when not many object files are shared. Or sometimes it's necessary for esoteric reasons of shared memory or what not. Google Test, my favorite C++ unit testing framework, actually recommends static linking against its libgtest.a archive.

When supporting both static and dynamic linking, I always generate the library archive, for example libhayloft.a, then automate the generation of the shared object, Here's a Makefile snippet that does that. It simply unloads the entire archive into a temporary directory, creates a shared object, then removes the directory. (I've resolved all the make variable names to make this a little more comprehensible.) libhayloft.a
HERE="`pwd`"; \
THERE="`mktemp -d /tmp/hayloft.XXXXXXXXXX`"; \
( cd $$THERE; ar xv $$HERE/libhayloft.a ); \
gcc -shared -Wl,-soname, -o \ $$THERE/*.o; \
rm -rf $$THERE

Thursday, November 24, 2011

Dependency Generation with Subdirectories using gcc

Here I am on Thanksgiving morning working on my latest project, Hayloft. As is not uncommon, Mrs. Overclock (a.k.a. Dr. Overclock, Medicine Woman) has to work today, leaving your fearless leader to finish up the dinner preparations later in the day. So I thought I'd debug an issue that's been driving me to distraction: automatic dependency generation using the GNU gcc/g++ compilers.

In Hayloft, I have translation units (that's what the standards call C and C++ source files) organized into subdirectories, for example hayloft/Logger.cpp and s3/BucketCreate.cpp, but using just one Makefile. This makes it easier to manage the code base, with absolutely no additional effort on my part, since the make command's pattern matching causes a rule like

%.o: %.cpp
g++ -o $@ -c $<

to do exactly what I want: The % in the target %.o matches the entire file name path, for example s3/BucketCreate, and is propagated into the prerequisite %.cpp. The object file ends up in the same subdirectory. Life is good.

Alas, if I do the usual command to generate dependencies

g++ -MM -MG s3/BucketCreate.cpp

it creates a rule not with the s3/BucketCreate.o target (which is what I want) but with BucketCreate.o. Since I'm running the Makefile from the project's root directory, there will never be a BucketCreate.o nor a requirement for it. If I edit a prerequisite for s3/BucketCreate.cpp (that is, any header file that it includes), s3/BucketCreate.o will never be automatically regenerated.

So I had to write a little rule to go through the source code base, generate the dependencies for each source file individually, and prepend the directory path for that source file onto the make target. Here's what this looks like. (Apologies as usual for any Blogger editor weirdness.)

DEPENDS:=${shell find . -type f \( -name '*.c' \
-o -name '*.cpp' \) -print}

cp /dev/null
for F in $(DEPENDS); do \
D=`dirname $$F | sed "s/^\.\///"`; \
echo -n "$$D/" >>; \
$(CXX) $(CPPFLAGS) -MM -MG $$F \
>>; \


It's a bit of a tribute to make and the shell that this is even possible.

Update (2012-03-14)

I think this rule might be a little simpler, having just now discovered the -MT option, although I had to separate the C and C++ files to handle their differing suffixes.

CFILES:=$(shell find . -type f -name '*.c' \
CXXFILES:=$(shell find . -type f -name '*.cpp' \

cp /dev/null
for F in $(CFILES); do \
D=`dirname $$F`; \
B=`basename -s .c $$F`; \
$(CC) $(CPPFLAGS) -MM -MT $$D/$$B.o -MG $$F \
>>; \

for F in $(CXXFILES); do \
D=`dirname $$F`; \
B=`basename -s .cpp $$F`; \
$(CXX) $(CPPFLAGS) -MM -MT $$D/$$B.o -MG $$F \
>>; \

Update (2013-11-06)

I upgraded my server to a later version of Ubuntu which doesn't appear to have the -s flag on the basename command. So now I'm back to something more like the prior approach. 

Monday, November 14, 2011

Abstraction in C++ using I/O Functors

In Abstraction in C Using Source and Sinks, I wrote about how useful I've found it to abstract out I/O interfaces so that I could write software that didn't have to know from where its input was coming or to where its output was going: sockets, standard I/O library FILE pointers, memory buffers, the system log, etc. That C-based project, Concha, was a clean room reimplementation of work I had done for a client back in 2007. What I didn't mention was that it was in turn inspired by work I had done in 2006 for a C++-based project, Desperado. Now that I'm building yet another project, Hayloft, on top of Desperado, I'm reminded how much I like the Desperado I/O abstraction.

The Desperado I/O abstraction defines two interfaces, Input and Output. These interfaces make use of the ability in C++ to overload the parentheses operators to create functors (a.k.a. function objects): objects that can be manipulated through a function call interface. Looking at code, you will think you are looking at function calls. What you are really seeing are instance method calls against an object that overrides the parentheses operators.

The Input interface defines the following four operations.

int operator() ();

This returns the next character as an integer, or EOF (end of file) to indicate that no more input is available.

int operator() (int ch);

This pushes a single character back into the input stream. (This vastly simplifies the implementation of many common parsing algorithms that require one-token look ahead.) The interface guarantees that at least one character can be pushed back. It need not be the most recent character read. How the push back is actually done is up to the underlying implementation; buffering it inside the object in the derived class implementation is acceptable. If the push back is successful, the character is returned, otherwise EOF is returned.

ssize_t operator() (char * buffer, size_t size);

This returns a line terminated by a newline character (/n) or EOF. The optional size parameter allows the caller to place a limit on the number of characters returned. The result is guaranteed to be NUL terminated as long as the buffer is at least one byte in length. The actual number of characters input is returned, or EOF if none.

ssize_t operator() (void * buffer, size_t minimum, size_t maximum);

This is typically used for unformatted data. The minimum parameter indicates the minimum number of characters to be returned. The implementation blocks until that many characters are available or EOF is reached. The maximum parameter indicates the maximum number of characters to be returned if it can be done without blocking. Specifying a minimum of zero is a common way to implement non-blocking I/O using polling. Specifying the same value for minimum and maximum simply blocks for a fixed amount of data. Using the value one for minimum results in something similar to the POSIX read system call. The actual number of characters input is returned, or EOF if none.

That's it. No open or close: those are the job of either the caller, or of the implementation's constructor and destructor. The Input base class isn't pure: it actually implements all of these operators, returning EOF for all operations. That makes the base class the equivalent of /dev/null.

Since all implementations derive from the Input base class, you can pass a pointer or reference to any implementation to a function expecting an Input pointer or reference and it will read its data from that object without having any idea what the actual underlying data source is. Desperado implements a variety of derived classes, such as DescriptorInput (its constructor takes a file descriptor as an argument), FileInput (a FILE pointer), BufferInput (a read/write memory buffer), DataInput (a read-only memory buffer), and PathInput (a path name in the file system). All of these derived classes implement the full Input interface.

The Hayloft project leverages this in its Parameter class. Parameter takes an Input reference or pointer, uses the line input functor, and reads a parameter value into a C++ std::string that is a instance variable named parameter. It has no idea where this value is coming from: a file, a socket, a memory location, what have you. Here's the complete implementation of the method in Parameter that does this.

(I apologize in advance for any violence done to this and other code snippets by the Blogger formatter - which truly sucks at code examples even when editing raw HTML - and for any typos I make when transcribing this from actual working source code.)

void Parameter::source(Input & input, size_t maximum) {
int ch;
while (maximum > 0) {
ch = input();
if ((ch == EOF) || (ch == '\0') || (ch == '\n')) { break; }
parameter += ch;

You can see the Input functor called on the fourth line: the input object reference is used just as if it were a function.

The Output interface defines the following five operations.

int operator() (int c);

A single character in an integer is emitted. The character is returned or EOF if unsuccessful.

ssize_t operator() (const char * s, size_t size);

A null terminated string is emitted. The optional size parameter places a limit on the number of characters emitted. The actual number of characters emitted is returned or EOF if none.

ssize_t operator() (const char * format, va_list ap);

A variable length argument list is emitted according to the printf-style format string. The actual number of characters emitted is returned or EOF if none.

ssize_t operator() (const void * buffer, size_t minimum, size_t maximum);

At least a minimum and no more than a maximum number of bytes are emitted. As before, more than the minimum is emitted if it can be done without blocking. The actual number of characters emitted is returned or EOF if none.

int operator() ();

Any data buffered in the underlying implementation are flushed to the output stream. A non-negative number is returned for success, EOF for failure.

Similar to the Input interface, the Output base class is not pure: it implements all of these functors, each of which throws the data away and returns success. The Output base class is also the equivalent of /dev/null.

As you might expect, Desperado implements a variety of derived classes: DescriptorOutput, FileOutput, BufferOutput, PathOutput, and SyslogOutput (which writes all of its output to the system log).

The Desperado Print class makes effective of Output functors. Its constructor takes a single Output reference as its only parameter.

Print:Print(Output & output);

Print defines its own functor which takes a variable length argument list. Here is its entire implementation.

ssize_t Print::operator() (const char * format ...) {
va_list ap;
ssize_t rc;
va_start(ap, format);
rc = output(format, ap);
return rc;

You can see the output constructor reference used just like a function call on the fifth line.

In an application, you can now write code like this, which I do all the time. (Warning: head explosion may be imminent.)

extern Output * myoutputp;
Print printf(*myoutputp);

printf("An error occurred!\n");
printf("errno=%d\n", errno);

This illustrates one of the greatest strengths and greatest weaknesses of C++: if you encountered this while reading code, you would likely to assume that the printf was the standard I/O statement you know and love. But it isn't at all; it's a Print object. And that Print object has no idea where it's output is going. Depending on actual type of its constructor argument, it could be a socket, a file, or even a memory buffer.

It is possible to have a class that offers both an Input and an Output interface. Hayloft does this for its Packet class. Packet implements an infinite (as long as memory holds out anyway) bi-directional memory buffer. Although it offers specialized methods to prepend and append data, it also exposes both an Input and Output interface so that a Packet can be used as a data sink (by passing its Output interface) or as a data source (through its Input interface). A Packet can be used in this manner as a ring buffer.

This pattern is so common that Desperado defines an InputOutput interface for such implementations. This interface defines two methods that each return a reference to the appropriate interface.

Input & input();

Output & output();

This allows you to do things like

extern Packet * mypacketp;
Print printf(mypacketp->output());

to collect printed output into a Packet.

I/O functors illustrate one of those capabilities that makes C++ both extremely powerful and often hard to understand, because it allows one to use C++ not just as a programming language but as a meta-language, effectively creating a new domain specific language with its own operations, one that just happens to look vaguely like C++. It also means that while reading C++ code, especially in large code bases, it can be very difficult to make any assumptions about what is going on without both a broad and a deep understanding of both C++ and the underlying code.

Monday, October 31, 2011

Up the Source Code Organization

I've spent decades working in humongous C and C++ source code bases. Humongous means many millions of lines of code. Some of it I even wrote. Most of it I didn't. Most of it didn't even originate in the organization in which I found myself working. Integrating large source bases in which major components weren't necessarily designed to work together can be a challenge.

With the plethora of open source software now available, it's not unusual that to use that new software stack that could save you months of development time, you discover you need to install a handful of other software stacks on which it depends. And those stacks have their own dependencies. And so on. Gone are the days in which you could just get by with the standard C library.

These days at Digital Aggregates I've been working on a project that incorporates not only several third-party software stacks written in C or C++, but some of my own software libraries, some of which I wrote years ago. My integration experience has caused me to revisit how I put those libraries together. Laziness being a virtue among software developers, I've come up with some organizational techniques to make my life easier at least when using my own software. These techniques exploit two things: the Digital Aggregates domain name,, and a project name unique within the company, and borrows from similar techniques used in the Java world.

Every Digital Aggregates project gets a project name. The name itself doesn't have to have any significance to the project, although it usually does in at least a pun-ish way that may only have meaning to me. The name is not an acronym, nor is it a name already used at the time by some well known (to me, anyway) software package

For example, today I wrote code using Desperado, a collection of C++ classes that implement design patterns I've found useful in embedded software development; Diminuto, a collection of C functions that implement design patterns I've found useful in systems programming on Linux and GNU based systems; Lariat, a small software wrapper around Google Test, Goggle's excellent C++ unit testing framework, that allows you to set resource limits like real-time execution duration on unit tests from the command line; and Hayloft, a work in progress. The project names help me keep everything straight.

As mundane as it sounds, the project name starts life on a manilla file folder. I keep at least some paper documentation around, either temporarily or permanently. The manilla file folder keeps all of it together and it can be easily identified as it lays on my desk or is filed in the file cabinet. On my desk in my home office right now I have file folders labelled Biscuit, Lariat, and Hayloft. Just minutes ago I consulted the file folder labelled Desperado in the file cabinet.

I keep lab notebooks that are organized not by project but chronologically and by client. I use the project name to identify notes I make in the notebook.

Obviously, I use the project name when I write about my work here in my blog, as in Automating Maintenance on Embedded Systems with Biscuits where Biscuit is the project name.

I also use it as part of the URL for the project page on the Digital Aggregates web site, for example

which of course incorporates the Digital Aggregates domain name as well. Furthermore, the project name becomes part of the tar ball name. For example

is a compressed tar ball for the Zinc distribution of Desperado.

I use the project name as the repository name in Subversion, my current source code control system of choice. My Subversion layout for every project follows the pattern I used for Desperado, and is more or less right out of the Subversion documentation.


desperado is the Subversion repository name. The directory trunk contains the main development branch, tag contains a check point of each major release, and branches contains any temporary or ancillary development branches.

Directory names on disk incorporate the project name. For example, the implementation files for Desperado are in


where this is just a check out of the main development branch from Subversion.

I'm a big fan of Eclipse, the open source GUI-based IDE, for C, C++, and Java development. The project name becomes the Eclipse project name, so the name Desperado shows up in the Project Explorer or C/C++ views in Eclipse, with all the source files underneath it.

Organizing header files for projects which use multiple components can be especially challenging, and I frequently see it done especially poorly. For my C and C++ projects that result in libraries intended to be used in other work, I've borrowed from some Java best practices and use both the domain name and project name as part of the header file path name. For example, a source file I was edited today had the following #include statements.

#include "com/diag/hayloft/Packet.h"
#include "com/diag/desperado/Platform.h"
#include "com/diag/desperado/Print.h"
#include "com/diag/desperado/Dump.h"

The domain name becomes part of the path used to include the header file for a specific project. This approach is used not just by source files using the libraries, but also in the source files that implement the library. This makes it perfectly clear from which project a header file is being included, and prevents any header file name collisions. For example, the Desperado header files are under


while the Hayloft header files are under


and the GNU g++ or gcc command line options




are used to point the compiler to the right places. (You might choose to use the -iquote option instead.) Although these point to the source code directories where I do development, they could just as easily point to /usr/include or maybe /usr/local/include and the same naming system would prevent any conflicts with other header files.

I'm also a fan of namespaces in C++, where I use a similar approach. All of the Desperado C++ symbols are in the namespace


where as all of the Hayloft C++ symbols are in the namespace


which results in code snippets that look like the one below.

namespace com {
namespace diag {
namespace hayloft {

* Ctor.
explicit Logger(::com::diag::desperado::Output &ro)
: ::com::diag::desperado::Logger(ro)
, mask(0)


This is a constructor for the Hayloft class Logger that derives from the Desperado class Logger and uses a reference to an object of the Desperado Output class as an argument.
Furthermore, some projects require more complex namespace organizations, and this is reflected in both the implementation and header file directory hierarchies. For example, C++ symbols in the namespace


find their header files in


and their implementation files in

${HOME}/src/Hayloft/s3 .

This may sound complex, but it is in practice easily done.

I predict now you are wondering what I do for C functions, where namespaces aren't an option. Using the Digital Aggregates domain name as part of C functions violates my laziness rule. It just seems to be more work than it's worth. But I do include the project name and a name roughly equivalent to a C++ class name into the function name. For example

void * diminuto_map_map(
  uintptr_t start,
  size_t length,
  void ** startp,
  size_t * lengthp

is a prototype for a C function in the Diminuto library that is part of the map feature for a function that maps a physical memory address to a virtual memory address.

I use a similar approach when defining macros that are expanded by the C preprocessor, resulting in names like




which can sometimes seem a little cumbersome.

I try not to use preprocessor macros at all when writing C++. But I do use the expanded naming system in both C and C++ when defining preprocessor guard symbols that prevent header files from being included more than once. This results, for example, in the guard preprocessor symbol


for a C++ header file in the s3 sub-directory and sub-namespace.

I do use the domain name in the name of any environmental variables. For example, you can set the log level in Hayloft by setting the value of an environmental variable


which incorporates the domain name, the project name, and even the class name. Since environmental variables are in a global namespace that is be shared among pretty much all of the software being used by a particular user, and because you typically don't have to type the environmental variable name very often, this seems the safest approach.
In a future article I'll be talking about how I apply Google Test in projects like Hayloft, where I use a similar naming scheme that differs slightly to prevent collisions between the header files and classes under test and the test and mock classes themselves.

Update (2016-07-22)

Some time ago I converted from Subversion to Git, and started hosting my source code repositories on GitHub. I follow a similar convention as described above by giving my repositories names like com-diag-diminuto and com-diag-scattergun. This makes it really easy to keep track of stuff when I clone a repo in my src directory on my build server. When I use Eclipse, I give my projects the same name as the repo name. After using these scheme for almost five years now, I like it better and better.

Friday, October 14, 2011

Automating Maintenance on Embedded Systems with Biscuits

If you make a living, as I have from time to time, developing embedded systems, it's not unusual to find your work day interrupted on a regular basis by application developers that want their own development systems updated to the latest platform release, or firmware installed to support a new version of an FPGA, or some other routine system maintenance chore. Even embedded systems based on Linux and GNU are frequently too small to implement the tools used by big iron, graphical interfaces like the Synaptic Package Manager for Ubuntu, or even command line tools like yum, apt, or dpkg. Sometimes the chore is as mundane as creating a tar ball of the system logs to ship back to you so you can debug a field problem. But you still don't want to burden a non-geek user with all the skills, and maybe even the root password, necessary to do that. What would be great is if you could automate that system maintenance chore in a secure fashion and get it done by just handing the user a USB thumb drive and telling them where to stick it. Biscuits do that.

A biscuit is a cpio archive that has been compressed with bzip2 and then encrypted using gpg (GNU Privacy Guard a.k.a. GNUPG) into a binary file. The biscuit command decrypts and decompresses the file, by default named biscuit.bin in the current working directory, and extracts its contents into a temporary directory. It then looks for an unencrypted executable file named biscuit, script or binary, in that directory. If it finds it, it modifies the PATH and LD_LIBRARY_PATH environmental variables to include the temporary directory, changes its current working directory to the original delivery media (for example, a USB drive), and then runs the executable. What happens next is up to the executable. When the executable exits, the temporary directory is cleaned up and and deleted. While it runs, the executable has access to both the temporary directory, where any other collateral material from inside the biscuit can be found, and the working directory, where results may be placed.

Usage: biscuit [ -C working_directory ] [ -f biscuit_file ]

The encryption defaults to EIGamal (ELG-E), an asymmetric key encryption algorithm based on Diffie-Hellman key exchange, using 1024 bit keys. EIGamal is not patent encumbered, and although longer keys are possible, 1024 bits strikes a good balance between speed and security for slower embedded processors. The server where biscuits are packaged maintains a key ring containing the public (encrypting) keys for all the embedded systems on which biscuits may be deployed. Each embedded system has a key ring containing its own secret (decrypting) keys.

Different unique public/secret key pairs can be created, allowing you to distinguish between product lines, architectures, even individual systems, whatever makes sense for your business. This allows you to create biscuits for product flavor A that cannot be executed on product flavor B: the wrong biscuits simply won't successfully decrypt. The encryption serves as both an authentication mechanism (the biscuit comes from a reputable source) and an authorization mechanism (the biscuit can be executed using root privileges). Since users can't crack open biscuits, they not only can't create their own biscuits, they can't even see what your biscuits do. Biscuits are opaque binary files. They cannot be easily reverse engineered without cracking the encryption. They can be delivered on removable media such as USB thumb drives, CD-ROMs, or DVDs, or they can be downloaded across a network via ftp, tftp,or scp.

I've been using this or similar approaches to automating system maintenance of embedded systems for several years now. A link to a clean room implementation of biscuits can be found on the Biscuit project web page. It's mostly just a Makefile to build GNUPG for multiple architectures, generate and manage the keys for both the build server and the embedded hosts, create the biscuit command, and package up biscuit binary files, plus a couple of example biscuit scripts. But the distribution tar ball also contains some useful stuff for automating the use of biscuits in your embedded system, depending on your needs and comfort level.

Users can just run the biscuit command manually and point it at a biscuit.bin binary file. You can set your embedded host up so that only root can do so, so that users must use sudo to run biscuit.

If your embedded Linux system uses udev, like big iron distributions like Ubuntu, you can install the 99-com-diag-biscuit.rules file into /etc/udev/rules.d/99-com-diag-biscuit.rules . That will cause your system to temporarily mount any block device that is inserted, look for a biscuit.bin file, run the biscuit command against it, and unmount the device when it completes. The udev rules I wrote for my use only apply to removable storage devices by excluding the device names used for permanently attached block storage. You should customize these rules to fit your own platform.

If your system uses a simpler version of udev, like embedded distributions like Angström, you can install the script into /etc/udev/scripts/ to do the same thing. Or just look at the one line I inserted into Angström's that runs biscuit against the auto-mounted media, and do something similar on your system.

If you have an older embedded system that doesn't use udev but instead uses hotplug, you can install the script into /etc/hotplug.d/block/biscuit.hotplug to accomplish the same thing.

If your kernel was configured to support hotplug, but you don't have any of the infrastructure, you can still do the above, plus you can install into /sbin/hotplug. Look for /proc/sys/kernel/hotplug on your embedded host. If it exists, your kernel is built to support it. If its contents are blank, then you need to tell the kernel to invoke /sbin/hotplug by echo-ing that string into /proc/sys/kernel/hotplug every time you boot your system, or rebuild your kernel configured to do this automatically.

There are some caveats you would be wise to heed.

Once biscuits are released into the wild, you must assume that they will go feral. People will dig up a USB drive you gave them years before, and insert it into a system. They may do so out of desperation, remembering that this did something useful long ago. Or they may simply have forgotten where the drive came from and don't even know that there is a biscuit on it. This is the dark side of biscuits: they are effectively viruses.

Biscuit scripts deal with the former by being paranoid: for example, before loading new firmware on a system, the script makes sure that the system isn't already running an even newer version of the firmware. I have also written scripts that deleted the original encrypted file from the USB drive, making it a use once biscuit. Or the script dropped an identifying file on the USB drive (I like using a MAC address from the host embedded system as a file name), and checked for the presence of this breadcrumb first whenever it runs to check whether it has passed this way before. This approach works well because the user can delete the breadcrumb manually to force the biscuit to rerun.

If you include collateral material on the USB drive, for example scripts, shared libraries, binaries, data, try hard to put it inside the biscuit. I have seen scripts that invoked another script in cleartext on the USB drive outside of the encrypted file. This allows a villain to simply rewrite the cleartext script to do whatever they want, like starting a root shell on the console, subverting the authentication and authorization provided by the encryption. Sometimes the collateral is simply too large to put inside the biscuit because of limits in available temporary storage, which on embedded systems is often RAM disk, into which the biscuit binary file is unpacked. If that's the case, put checksums or hashes (I like MD5 or SHA-1) of the collateral inside the biscuit and have the script verify the likely integrity of the collateral before using it.

Similarly, use some care when the biscuit script executes commands that are on the embedded host. This is of course typically done, for example running fundamental stuff like bash. But don't run scripts or binaries that can be altered by unprivileged users. Your security is only as good as that of your root password.

It is important that you build both the build server gpg and the host embedded system gpg from the same GNUPG source tree. You must not assume that different versions of GNUPG are interoperable. That may be the case, but if it isn't you'll end up creating biscuits on your build server that your embedded hosts can't use. If you do upgrade to a new version of GNUPG on your build server while you have deployed embedded hosts using the older version, be very thorough in your interoperability testing.

Once you build keys and have installed them on embedded hosts, check the keys into source code control or otherwise archive them. You will not get the same keys if you generate them again, even if you use exactly the same parameters. Again, you can render biscuits unusable if you are not careful. However, with some not too mad skills, you can recover lost keys from your deployed embedded systems themselves should a crisis occur.

Be paranoid about your keys, not just on the build server, but on your embedded hosts too. On the embedded hosts, which may be deployed into the field and hence presumably beyond your physical control, the keys and the directory that holds them should be accessible only by root. Unprivileged users should use sudo to run the biscuit command manually, if they can do so at all.

But with some care, biscuits are a means of safely automating a variety of system maintenance functions without giving away the keys to the kingdom or forcing end users or even application developers to become embedded systems programmers. For complex tasks like software update, they are a way of implementing complex logic and making it a part of the software distribution package, not part of a command in the embedded host itself.

Wednesday, July 27, 2011

Hidden Variables

Ever wonder what preprocessor symbols your GNU C and C++ compiler automatically includes in your program? I'm talking about symbols above and beyond those specified in the ANSI C and ISO C++ standards, like __FILE__, __LINE__, __func__, and __cplusplus. Having gotten bit by this once or twice, I've learned it pays to know these things. Try

gcc -dM -E - < /dev/null


g++ -dM -E - < /dev/null

which produce the same set of #define lines for me.

#define __DBL_MIN_EXP__ (-1021)
#define __FLT_MIN__ 1.17549435e-38F
#define __CHAR_BIT__ 8
#define __WCHAR_MAX__ 2147483647
#define __DBL_DENORM_MIN__ 4.9406564584124654e-324
#define __FLT_EVAL_METHOD__ 2
#define __unix__ 1
#define __DBL_MIN_10_EXP__ (-307)
#define __FINITE_MATH_ONLY__ 0
#define __GNUC_PATCHLEVEL__ 3
#define __DEC64_MAX_EXP__ 385
#define __SHRT_MAX__ 32767
#define __LDBL_MAX__ 1.18973149535723176502e+4932L
#define __UINTMAX_TYPE__ long long unsigned int
#define __linux 1
#define __DEC32_EPSILON__ 1E-6DF
#define __unix 1
#define __LDBL_MAX_EXP__ 16384
#define __linux__ 1
#define __SCHAR_MAX__ 127
#define __DBL_DIG__ 15
#define __SIZEOF_INT__ 4
#define __SIZEOF_POINTER__ 4
#define __STDC_HOSTED__ 1
#define __LDBL_HAS_INFINITY__ 1
#define __FLT_EPSILON__ 1.19209290e-7F
#define __LDBL_MIN__ 3.36210314311209350626e-4932L
#define __DEC32_MAX__ 9.999999E96DF
#define __SIZEOF_LONG__ 4
#define __DECIMAL_DIG__ 21
#define __gnu_linux__ 1
#define __LDBL_HAS_QUIET_NAN__ 1
#define __GNUC__ 4
#define __FLT_HAS_DENORM__ 1
#define __SIZEOF_LONG_DOUBLE__ 12
#define __BIGGEST_ALIGNMENT__ 16
#define __DBL_MAX__ 1.7976931348623157e+308
#define __DBL_HAS_INFINITY__ 1
#define __DEC32_MIN_EXP__ (-94)
#define __LDBL_HAS_DENORM__ 1
#define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL
#define __DEC32_MIN__ 1E-95DF
#define __DBL_MAX_EXP__ 1024
#define __DEC128_EPSILON__ 1E-33DL
#define __LONG_LONG_MAX__ 9223372036854775807LL
#define __SIZEOF_SIZE_T__ 4
#define __SIZEOF_WINT_T__ 4
#define __GXX_ABI_VERSION 1002
#define __FLT_MIN_EXP__ (-125)
#define __DBL_MIN__ 2.2250738585072014e-308
#define __DECIMAL_BID_FORMAT__ 1
#define __DEC128_MIN__ 1E-6143DL
#define __DBL_HAS_DENORM__ 1
#define __NO_INLINE__ 1
#define __i386 1
#define __FLT_MANT_DIG__ 24
#define __VERSION__ "4.4.3"
#define __DEC64_EPSILON__ 1E-15DD
#define __DEC128_MIN_EXP__ (-6142)
#define __i486__ 1
#define unix 1
#define __i386__ 1
#define __SIZE_TYPE__ unsigned int
#define __ELF__ 1
#define __FLT_RADIX__ 2
#define __LDBL_EPSILON__ 1.08420217248550443401e-19L
#define __SIZEOF_PTRDIFF_T__ 4
#define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MAX_10_EXP__ 38
#define __LONG_MAX__ 2147483647L
#define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL
#define __FLT_HAS_INFINITY__ 1
#define __DEC64_MAX__ 9.999999999999999E384DD
#define __CHAR16_TYPE__ short unsigned int
#define __DEC64_MANT_DIG__ 16
#define __DEC32_MAX_EXP__ 97
#define linux 1
#define __LDBL_MANT_DIG__ 64
#define __DBL_HAS_QUIET_NAN__ 1
#define __WCHAR_TYPE__ int
#define __SIZEOF_FLOAT__ 4
#define __DEC64_MIN_EXP__ (-382)
#define __FLT_DIG__ 6
#define __INT_MAX__ 2147483647
#define __i486 1
#define __FLT_MAX_EXP__ 128
#define __DBL_MANT_DIG__ 53
#define __DEC64_MIN__ 1E-383DD
#define __WINT_TYPE__ unsigned int
#define __SIZEOF_SHORT__ 2
#define __LDBL_MIN_EXP__ (-16381)
#define __SSP__ 1
#define __LDBL_MAX_10_EXP__ 4932
#define __DBL_EPSILON__ 2.2204460492503131e-16
#define __SIZEOF_WCHAR_T__ 4
#define __DEC_EVAL_METHOD__ 2
#define __INTMAX_MAX__ 9223372036854775807LL
#define __FLT_DENORM_MIN__ 1.40129846e-45F
#define __CHAR32_TYPE__ unsigned int
#define __FLT_MAX__ 3.40282347e+38F
#define __SIZEOF_DOUBLE__ 8
#define __FLT_MIN_10_EXP__ (-37)
#define __INTMAX_TYPE__ long long int
#define i386 1
#define __DEC128_MAX_EXP__ 6145
#define __GNUC_MINOR__ 4
#define __DEC32_MANT_DIG__ 7
#define __DBL_MAX_10_EXP__ 308
#define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L
#define __STDC__ 1
#define __PTRDIFF_TYPE__ int
#define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD
#define __DEC128_MANT_DIG__ 34
#define __LDBL_MIN_10_EXP__ (-4931)
#define __SIZEOF_LONG_LONG__ 8
#define __LDBL_DIG__ 18
#define __GNUC_GNU_INLINE__ 1