Tuesday, January 30, 2007

Is C no longer a subset of C++?

In his C++ column over on informit.com, Danny Kalev has written a couple of articles about how C is diverging from C++. As C compilers become fully compliant with C99, the latest C standard, and (more seriously) as C standard library header files and application code evolve to adopt its features, C++ will no longer be able to compile them.

"A Tour of C99" gives a brief synopsis of the new C99 features. "Incompatibilities between C99 and C++09" points to specific areas which are likely to cause grief. He also mentions that there is (understandably) some effort at reconciling these differences. Stroustrup has called for a merging of the two languages. Unless this is resolved, those who use C++ as a "better C" are going to have some heartache. Not to mention we may have two different sets of equivalent header files to keep track of. Hopefully the implications of this are enough to bring some sanity to the situation.

One of the C99 features that particularly caught my eye was the "restrict" keyword, which is used in a function prototype parameter list to guarantee to the compiler that two pointers in the list do not point to the same object. If you've done any reading on how processor memory models have evolved to break commonly used idioms like the double-checked locking pattern for implementing singletons, you can see where this is going. It sounds like an optimization enabler, semantically something along the lines of a hypothetical "guaranteed non-volatile" keyword.

Kalev was on the C++ standards committee for several years, and has a lot of interesting and sometimes scary stuff to say on this topic and other C++ issues.

Monday, January 29, 2007

Generic Embedded Programming Using Templates

A couple of years ago a senior technologist for the CTO organization of the company I was working for at the time remarked that real-time applications could not be written in C++. He said this in the presence of myself and another developer who had spent the past decade in product development for this same company writing real-time embedded code in C++. She and I just looked at each other and shrugged.

Developing embedded systems in C++ is not new. Embedded pundits like Jack Ganssle and Dan Saks have been writing about it for years in venues like Embedded Systems Design. But some of the C++ features that sound like old news to developers working in the mainstream still sound exotic to many embedded developers, who by nature are a conservative lot. And rightly so. Embedded systems frequently find themselves in mission-critical applications that are intended to run 24x7. And embedded requirements often impose severe real-time and memory constraints on their implementations. No matter what language you use and what features of it you choose to exploit, writing embedded code requires knowledge of how the language works under the hood and a discipline regarding which features to use and which to stay away from.

One of the features of C++ that still seems exotic to many embedded developers is its support for generic programming. Generic programming is where the developer writes an algorithm in a data-type-independent way, and then tells the compiler to generate an implementation for a specific data type. Generic programming allows you to debug an algorithm once and then apply it to a much broader variety of circumstances. It reduces the amount of code that must be maintained, and since code maintenance is as much as two-thirds of the total software lifecycle cost, this can directly affect the bottom line of product development. Generic programming abstracts out implementation details, which increases the reusability of code.

Most people associate generic programming in C++ with templates, and in this article so shall we. Templates directly support generic programming in exactly the way described above, by allowing you to specify an algorithm then apply the algorithm to different data types, leaving it up to the compiler to generate the appropriate code. As we shall see in later articles, there are other features of C++ (and even of C) that also support generic programming. But since templates are the first thing that folks think about when they think about generic programming, we’ll look at examples of using templates from the Desperado open source C++ library.

(Disclaimer: my goal here is not to teach you C++ or how to use templates. There are lots of books that can do that. I’m going to try to show you some examples of how templates, and generic programming in general, can be used to solve common problems in embedded applications. Rather than go into a lot of detail, I’m just trying to give you a flavor of how generic programming might be applied in an embedded context. Downloading and perusing the Desperado source code is a good idea if you really want to get into the nitty-gritty of the implementation using templates.)

Example: Memory Testing

Michael Barr wrote an article in Embedded Systems Programming (now Embedded Systems Design) on how to do efficient testing of memory in software. Barr implemented a suite of three tests: a data bus test, an address bus test, and a memory device test. The code for these tests is dependent on the width of the address and data busses. Depending on the design of the embedded memory system, these widths (which may not be the same) may be 16, 32, or 64 bits.

Desperado implements Barr’s memory test suite as a template, a model algorithm coded without exactly defining the data type, which it stores in the header file <Ram.h>. The definition of the Ram template in <Ram.h> starts out looking something like


template <typename TYPE>
class Ram { ... };


where TYPE is a data type passed as a compile-time parameter to the template when the C++ compiler generates the resulting type-specific code.
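
To give a flavor of what the body of such a template might look like, here is a minimal sketch of Barr's walking-ones data bus test written as a TYPE-parameterized member function. This is my own illustration, not the actual Desperado implementation; the other two tests are only hinted at in comments. Because the test pattern is itself of type TYPE, the compiler generates a version of this loop for whatever bus width you instantiate the template with.


template <typename TYPE>
class Ram {
public:
    // Walking-ones data bus test: write each single-bit pattern to one
    // location and read it back, returning zero on success or the first
    // failing pattern on error.
    TYPE data(volatile TYPE* address) const {
        for (TYPE pattern = 1; pattern != 0; pattern <<= 1) {
            *address = pattern;
            if (*address != pattern) {
                return pattern;
            }
        }
        return 0;
    }
    // The address bus and memory device tests would be declared here in
    // the same TYPE-parameterized way, for example:
    // TYPE* address(volatile TYPE* base, size_t size) const;
    // TYPE* device(volatile TYPE* base, size_t size) const;
};
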

An application may ask the C++ compiler to generate the code for a 32-bit wide memory test with the following code snippet, using the fixed-width uint32_t 32-bit integer data type defined in the standard C99 header file <stdint.h>, and the implementation-dependent size_t data type defined in the standard C header file <stddef.h>. (These system header files are included automatically by the header file <Ram.h>.)


#include <Ram.h>
...
Ram<uint32_t> test32;


The application then may run all three tests of the suite against the memory at location 0x80000000 that is 4KB in length:


uint32_t* address = (uint32_t*)0x80000000;
size_t size = 4096;
...
test32.address(address, size);
test32.data(address);
test32.device(address, size);


This assumes that the address and data buses are the same width. If they are not, the application can ask C++ to generate a second instance of the code with a different width and use both tests appropriately.


Ram<uint64_t> test64;
...
uint64_t* address = (uint64_t*)0x80000000;
...
test64.data(address);


Desperado makes this process even simpler by pre-instantiating code for 8, 16, 32 and 64-bit bus widths, which can be used by an application by just linking them in.


#include <uint32_Ram.h>
...
uint32_t* address = (uint32_t *)0x80000000;
size_t size = 4096;
...
uint32_Ram.address(address, size);
uint32_Ram.data(address);
uint32_Ram.device(address, size);


Example: Cyclic Redundancy Checks

Ross Williams wrote a great paper, A Painless Guide to CRC Error Detection Algorithms, which describes a cyclic redundancy check (CRC) algorithm which can be used to implement any of a wide variety of standard CRCs. CRCs are used to detect data corruption in just about every telecommunications and networking protocol, as well as in many data storage systems. The CRC algorithm he provides is specific to the data type of the resulting CRC value that it generates. Commonly used CRC algorithms produce either 16-bit or 32-bit values. Williams’ algorithm also takes four parameters, the CRC polynomial (which describes how the CRC is computed, encoded as a bit pattern the same width as the resulting CRC value), a boolean indicating whether or not the CRC is reflected (which changes how the value is computed), an initial value of the CRC before any data is passed through the algorithm, and a value with which the CRC partial result is exclusive-or’ed to get the final CRC value.

Desperado implements Williams’ CRC algorithm as a template, with the resulting CRC data type as its parameter, which it defines in the header file <Crc.h>. The resulting class generated by the C++ compiler has a constructor that takes the four parameters of Williams’ CRC algorithm. The definition of the Crc template in <Crc.h> starts out looking something like


template <typename TYPE>
class Crc { ... };


where TYPE is a data type passed as a compile-time parameter to the template when the C++ compiler generates the resulting type-specific code. The generic constructor has a prototype that looks something like


explicit Crc(TYPE poly, bool refl, TYPE init, TYPE xoro);


which declares the four parameters needed by Williams’ CRC algorithm. Note that the parameters poly (the polynomial), init (the initial value), and xoro (the xor bit mask) are all of type TYPE, which is not specified until the template is instantiated by the application. The code for the Crc algorithm is written without knowing exactly what that type is.

The application can ask the C++ compiler to generate the industry standard CRC32 algorithm like so


#include <Crc.h>
...
Crc<uint32_t> crc32(0x04c11db7, true, 0xffffffff, 0xffffffff);


which also passes the four parameters of the CRC32 algorithm to the constructor when it instantiates an object of type Crc<uint32_t>. (Note that the value 0x04c11db7 is a common bit-encoding used to represent the polynomial used to compute the standard CRC32 value.) The application then can use this CRC32 algorithm to compute a cyclic redundancy check across a buffer of data, first initializing the crc32 object. (The Desperado Crc template defines operators for the resulting class which allow an object of that type to be used as a functor, that is, an object which can be called like a function. Hence crc32 is not a function, but rather an object, and the following code is calling methods defined for that object. Williams’ CRC algorithm also requires a state variable, which Desperado chooses not to keep inside the Crc object so that the same Crc object may be used in reentrant code.)


uint32_t state;
...
crc32(state);
uint32_t crc = crc32(state, buffer, size);
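

Putting the constructor and the functor together, the public interface of such a template might look something like the following sketch. The member names here are hypothetical and are meant only to illustrate the shape of the thing; they are not Desperado's actual declarations. The two operator() overloads are what allow the crc32 object above to be called like a function, once to initialize the caller-supplied state variable and again for each buffer of data.


#include <stddef.h>

template <typename TYPE>
class Crc {
public:
    // Capture the four Williams parameters; a table-driven implementation
    // would also build its 256-entry lookup table here.
    explicit Crc(TYPE poly, bool refl, TYPE init, TYPE xoro);

    // Functor call that initializes the caller-supplied state variable.
    TYPE operator()(TYPE& state) const;

    // Functor call that runs a buffer of data through the algorithm,
    // updating the state and returning the CRC computed so far.
    TYPE operator()(TYPE& state, const void* buffer, size_t size) const;

private:
    TYPE table[256];
    TYPE initial;
    TYPE xorout;
    bool reflected;
};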


To make the power of templates for embedded systems a little more obvious, here are the complete definitions for the industry standard CRC16, CRC32 and ATM AAL5 cyclic redundancy check implementations using the Desperado Crc template.


#include <Crc.h>
...
Crc<uint16_t> crc16(0x8005, true, 0, 0);
Crc<uint32_t> crc32(0x04c11db7, true, 0xffffffff, 0xffffffff);
Crc<int32_t> aal5crc(0x04c11db7, false, 0xffffffff, 0xffffffff);


As before, Desperado pre-instantiates the commonly used CRC16 and CRC32 algorithms so that they can be used by an application by just linking them in.


#include <crc32_Crc.h>
...
uint32_t state;
crc32_Crc(state);
uint32_t crc = crc32_Crc(state, buffer, size);


Example: Variable-sized Objects on the Stack

Most people think of compile-time template parameters as always being data types, but they can also be values, such as integer constants (what C++ calls non-type template parameters). This can be used to work within a constraint common to many embedded systems: the desire not to allocate and free memory from the heap except at boot time.

Memory allocation can be really problematic for systems with real-time constraints. The time required for heap management is non-deterministic, dependent on the current state of the heap and what has been allocated and freed in the past. This makes it difficult or impossible to predict how long it will take to allocate and free memory in a time-constrained section of code.

Embedded systems very commonly address this issue by pre-allocating all objects at boot time and placing them in pools, which are often implemented as linked lists. When you need a buffer, you do not malloc or new memory from the heap, you merely unlink a buffer from the buffer pool. When you are done with the buffer, you do not free or delete it, you link it back onto the same buffer pool. If you need a buffer and the pool is empty, your choices are [1] break down and allocate a new buffer from the heap, making your pool a little larger when you return it, [2] try again later, or [3] signal an alarm, which might be as simple as lighting a little red LED on the front panel. Choice #3 is more common than you might think. Running out of a resource in an embedded application often indicates a serious failure in traffic engineering of the system in which it is embedded.
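
Here is a minimal sketch of the kind of boot-time pool I am describing, just to make the idea concrete. The names are hypothetical, it is not taken from Desperado, and a production version would also need whatever mutual exclusion your threading model requires.


#include <stddef.h>

// The pool element: the link field is only meaningful while the buffer is
// sitting in the pool.
struct Buffer {
    Buffer* next;
    unsigned char payload[256];
};

class BufferPool {
public:
    // Populate the pool once at boot time from a statically allocated array.
    BufferPool(Buffer* buffers, size_t count) : head(0) {
        for (size_t ii = 0; ii < count; ++ii) {
            put(&buffers[ii]);
        }
    }
    // Unlink a buffer, or return null if the pool is empty, leaving choice
    // [1], [2], or [3] above up to the caller.
    Buffer* get() {
        Buffer* buffer = head;
        if (buffer != 0) {
            head = buffer->next;
        }
        return buffer;
    }
    // Link a buffer back onto the pool when the caller is done with it.
    void put(Buffer* buffer) {
        buffer->next = head;
        head = buffer;
    }
private:
    Buffer* head;
};
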

Unlike Java (prior to version 6 anyway), C++ also gives you the option of allocating an object temporarily on the stack. Stack variables exist as long as they are in scope, which means within the function or execution block in which they are defined. Often this is perfect for allocating an object whose lifespan is only as long as the current function. (Under no circumstances, however, should you allocate an object on the stack, pass a pointer to it to another functional unit, and then exit the function. The other functional unit holding the pointer now has the address of a section of the stack which has been deallocated and which will eventually be reused for some other purpose. Wackiness will ensue.) This is done as simply as


#include <Transliterator.h>
...
{
    uint8_t stack[16];
    Transliterator transliterator(stack, 16);
    ...
}


where the object transliterator of type Transliterator will be destroyed as soon as the thread of control passes the closing curly brace. In the example above, Transliterator is a simple general-purpose parser implemented in Desperado as a table-driven push-down automaton (the original meaning of PDA) and defined in the header file <Transliterator.h>. You can imagine an embedded application defining a simple language for a command line interpreter available via a console port for purposes of field troubleshooting. The application defines the language in the form of a table, which the Transliterator interprets to parse the language.

PDAs require a stack in which to keep track of their current context when the language that they are parsing contains nested elements, for example nested parentheses for a parser that interprets arithmetic expressions. This is what the array is used for in the example above. How big a context stack does a Desperado Transliterator need? It depends entirely on the language it is parsing, which is defined not by the Transliterator itself, but by the application that is using it.

If allocating objects on the heap were not a problem during run-time, the size of the necessary context stack could just be passed as a parameter to the Transliterator constructor and it would just allocate an array of that size to use during parsing. But since we want to avoid allocating and freeing memory after boot time, we instead allocate a context stack on the C++ stack, as shown above.

Desperado streamlines this process by defining a second class, TransliteratorType, defined in the header file <TransliteratorType.h>, that extends Transliterator by taking the desired size of the context stack as a template parameter. The definition of TransliteratorType in <TransliteratorType.h> looks something like this


template <size_t CONTEXTS>
class TransliteratorType : public Transliterator { ... };


where CONTEXTS is the size of the context stack. As part of the definition of the TransliteratorType class there is a field


uint8_t array[CONTEXTS];


that defines and allocates the context stack as part of the object.
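
Here is a sketch of how such a derived template might be put together, assuming a base class constructor that takes the address and size of the context stack, just like the one used in the stack-allocation example earlier. This is illustrative only, not the actual Desperado definition.


#include <stddef.h>
#include <stdint.h>
#include <Transliterator.h>

template <size_t CONTEXTS>
class TransliteratorType : public Transliterator {
public:
    // Hand the embedded array and its size to the base class, which (we
    // assume here) merely records the pointer and length, so that the
    // context stack lives inside the derived object itself.
    TransliteratorType() : Transliterator(array, CONTEXTS) {}
private:
    uint8_t array[CONTEXTS];
};
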

When the application creates a TransliteratorType object by doing the following


#include <TransliteratorType.h>
...
{
    TransliteratorType<16> transliterator;
    ...
}


then the TransliteratorType object, the Transliterator object that is its base class object, and the context stack of size 16 are all allocated on the C++ stack as part of the TransliteratorType object itself, to be automatically freed when the thread of control exits the code block.

I admit that this example is not a great one, since the context stack can be just as easily allocated on the C++ stack explicitly without resorting to the use of a template. But the same technique can be used to define variable-length objects that are allocated on the stack for use within a code block and automatically deallocated afterwards, without resorting to new and delete. And it does serve as an example of how to bundle such a variable-length allocation into the class definition.

What’s the fine print?

There can be some significant compile-time overhead for using templates, particularly when using very complex template definitions like those found in the Standard Template Library (STL), which is now part of the C++ standard library. However, if you are writing your own templates like the ones I described above that are part of Desperado, you are unlikely to notice much difference in compile time. Even when using the STL (which I recommend), compile-time performance differences are typically negligible. (I have had the 3.3.2 version of the GNU C++ compiler go into what appeared to be an infinite loop when using the template-related -frepo compiler option, so your mileage may vary.)

There is no more run-time execution overhead to using templates than there would be if your application used the equivalent non-template C++ code with virtual methods, where all method calls are dispatched through a virtual table or v-table. I used to worry about this, but I got over it, since it only adds a very few machine instructions to every method call. Still, if your real-time constraints are that tight, you should worry not just about templates, but about any C++ virtual function. If you are not using virtual functions, your C++ code will be at least as efficient as the equivalent C code, if not better, thanks to the many improved optimization capabilities of C++.

Templates can cause code bloat, and this must be managed carefully in an embedded application. Instantiating a template at compile-time causes the C++ compiler to generate the code to implement the class given the template parameters. This can cause multiple instantiations of the same code to be generated by the compiler. In the best case this may be detected at link time, but in the worst case the resulting executable may be a lot larger than it needs to be.

Section 6.5, “Where’s the Template?” of the GCC manual describes your options for controlling when and where the G++ compiler generates code for templates. Desperado’s makefile uses the G++ command line parameter


-fno-implicit-templates


which forces the application to explicitly define when and where code for templates is generated. For any template instantiations that Desperado uses itself or provides by default, those instantiations are generated once and only once and are stored as object files in the Desperado library, where they may be incorporated into an application at link time. For example, this is done in Desperado as simply as having a translation unit containing just these two lines


#include <Crc.h>
template class Crc<uint32_t>;


which causes the compiler to generate and compile the code to implement a Crc object for an unsigned 32-bit value. Any other translation unit, in Desperado or an application using it, that uses the Crc template in a manner like


Crc<uint32_t> crc32(0x04c11db7, true, 0xffffffff, 0xffffffff);


would just link in this pre-defined and pre-compiled code. I recommend this explicit approach for using templates, although Section 6.5 offers other alternatives.

Templates are just one way for writing generic code. In future articles I will describe other techniques, some of which you may be already using without realizing it.

Update 2009-11-22

Nearly three years later, here I am doing an application with the Standard Template Library and still trying to use the -fno-implicit-templates compiler option. While I could probably eventually get it to compile by explicitly instantiating the requisite STL templated classes and functions, I have convinced myself that it's not worth it for my current project. The option might make sense if you (like me in 2007) have complete control over your template classes. But using the STL, or other complex templated frameworks, makes this fine control over template instantiation much harder. I say: drop the -fno-implicit-templates compiler option if the resulting compiled code fits your memory footprint.

Update 2012-04-30

But if you are really resource-constrained, you probably aren't using the STL anyway, and the -fno-implicit-templates option is still a very good idea. For example, when using templates on tiny eight-bit microcontrollers, memory is so precious that I think explicitly instantiating templates is a very good idea. That's what I've done for my Amigo project, yet another three years later.

Sources

Michael Barr, “Software-Based Memory Testing”, Embedded Systems Programming, July 2000, pp. 28-40

Richard M. Stallman et al., Using the GNU Compiler Collection (GCC), 4.1.1, 2005

Ross Williams, A Painless Guide to CRC Error Detection Algorithms, Rocksoft Pty Ltd., Adelaide, Australia, 1993

Desperado, Digital Aggregates Corp., 2007

Amigo, Digital Aggregates Corp., 2012

Saturday, January 27, 2007

Pair Programming is Bad and You Should Never Do It

In a review of the book Dreaming In Code, Joel Spolsky recently remarked in his blog:

"I can’t tell you how many times I’ve been in a meeting with even one or two other programmers, trying to figure out how something should work, and we’re just not getting anywhere. So I go off in my office and take out a piece of paper and figure it out. The very act of interacting with a second person was keeping me from concentrating enough to design the dang feature."

Finally someone has hit the nail on the head about what has been bothering me about pair programming. Here is what I think about it:

If a problem is so trivial that two people sitting in front of a computer chattering away at each other can solve it, then it should be the part of the project that is outsourced, preferably offshore where the cost of doing trivial work is lower.

For sure I have done pair debugging, sometimes even trio debugging when it's an especially thorny problem involving multiple functional units, race conditions, and hardware. And, yes, I have done pair programming when trying to do maintenance on some code in a language I don't know (typically some form of assembly), on a processor I've never used before (typically a DSP), where the original developers left the company years ago, and a vice president is calling every half hour to helpfully ask how that escalated field problem is coming along. And sometimes getting a bunch of folks in the same room in front of a whiteboard is a very good thing indeed, particularly in those cases where the Engineering staff is pretty much convinced that what Marketing has promised is pretty much impossible (although to be fair, it never was).

So I understand there are circumstances where working closely in pairs or even in larger groups is not only a good idea, it is downright mandatory (and sometimes it can even be fun). But the way pair programming is usually described sounds more distracting than anything else. When having to interact with another person while trying to figure out something difficult, my interior monologue runs to "STFU!"

Having said that, I am very much in favor of 100% coverage by code inspections. The thought that I might be the only engineer to see my code terrifies me, and I consider myself a pretty good developer, with a long track record of successful product development. But I'm far from perfect. I would not have thought this to be controversial, but apparently it is, if my own recent personal experience is any indication. But I have never been in a code inspection in which I didn't learn something useful. Sometimes it's "Hey, that's a great idea, I'm going to steal that!" Or "Oh boy, I'm going to make sure I never make this mistake." Or maybe "Man, what was I thinking?"

What I am not sure of is when to do the code inspection. In his seminar "How to develop better firmware faster", Jack Ganssle cites studies that indicate the sooner the better. Like, right after you have written the code and before you've even compiled it. This is an economic argument: inspections are a more effective use of engineers' time than debugging, presumably because an inspection that eliminates a bug on average takes less total engineer-time than debugging to remove the bug.

This was likely true back when we had long modify-compile-test cycles, particularly in the dark ages when such cycles involved keypunches and computer operators. But with modern tool chains and disciplines like test driven development (TDD), I'm not convinced. When I'm inspecting code, I like to be able to make basic assumptions, like [1] it compiles, and [2] at least the constructors and getters work. Otherwise I'm too paranoid and spend way too much bandwidth looking at the nesting of parentheses instead of concentrating on the overall design intent.

Since I mentioned TDD, I'd better plug that too. Like a lot of old guys, I stumbled into TDD myself before I ever heard the term "unit test", just because I have the virtue of all good developers: I'm lazy. I stumbled into structured programming, and later into object oriented design, the same way. Not the easy way, of course, by learning about it from a class or a book. But by doing software development the hard way and finding I was spending so much time at work that I wasn't keeping up with Battlestar Galactica. (That would be the original BSG.) The good part about this was when I was finally introduced to these concepts, there was zero resistance on my part. It was more like I took serious offense that no one had explained this to me earlier, saving me a lot of time and effort.

So if there's anything you know that will help me keep up with the new BSG, why don't you just let me know now, instead of forcing me to painfully learn it myself? I appreciate it.

Thursday, January 25, 2007

If Java is the new COBOL, is C++ the new assembly?

The phrase “Java is the new COBOL” is so widely used now that I can’t pin down its origins. When you do a web search on it, it is interesting to see the different interpretations.

Clearly some folks mean “Java is obsolete, just like COBOL”, and maybe in some problem domains it is. Domain-specific frameworks like Ruby on Rails are making inroads in a lot of roles that were once nearly exclusively Java-based, like web commerce.

But many seem to mean “Java is very broadly deployed and very stable, and is widely used and trusted in enterprise applications, just like COBOL”. Of course, the subtext here is that Java is no longer cool because it’s mainstream.

Being compared to COBOL doesn’t really seem that bad of a thing. I am a little embarrassed to admit that very early in my career I was paid to write a little COBOL. I even got to meet Grace Murray Hopper. But I hadn’t really given COBOL much serious thought for years until in the span of a few days I read some pretty startling (and somewhat contradictory) statistics about COBOL.

In his column “Bar Bets” in a recent edition of Dr. Dobb’s Journal, Jonathan Erickson quotes that there are “250 billion lines of COBOL source code in use”, and “1.5 billion new lines of COBOL are written every year”.

In “How to Interpret COBOL Statistics” in Bob Lewis’ Advice Line column, a reader quotes “various Gartner reports”: “Five billion lines of new COBOL are being written every year” and “Thirty-four percent of all coding activities are in COBOL”.

No matter which set of statistics is correct, this is not bad for a forty-year-old programming language. If Java reached the level of acceptance and use within the enterprise that COBOL has obviously achieved, I would count that as an incredible success. Considering that most information technologies have a half-life of about five years, technologies like FORTRAN, COBOL, C, and TCP/IP, which have done substantially better than that, have to be admired.

Bob Lewis’ reader also cites “There are 310 billion lines of legacy code operating in the world (65% of all software)”. This jibes with my visualization of the total amount of source code in the world as an expanding sphere. The labor spent maintaining legacy code is represented by the volume of the interior of the sphere, whereas the labor spent writing new code is represented by the surface of the sphere. They are both expanding, but while the surface of the sphere expands proportional to the square of the diameter, the volume expands proportional to the cube. Once it ships or goes into production, all code becomes legacy code. If you drilled a core down through the sphere, you would find some Ruby scattered around on the surface, a thick crust of Java, a churning molten layer of C++ and C, and a solid core of FORTRAN, COBOL and assembly.

So there will always be a lot of legacy code, and in greater amounts than there is freshly minted code. As long as it is in use, all of that code has to be maintained. If you asked the COBOL developers of the 1960s what the life span of their creations was likely to be, could they possibly have predicted that their work would still be running in the twenty-first century? Yet Y2K showed us that this is indeed the case. I suspect Java developers will have a lot of work ahead of them maintaining the ever-growing legacy Java base.

One thing I find particularly interesting about the success Java has had in the enterprise is the relative lack of success it has had in the embedded arena, at least in the United States. Someone remarked recently that in Europe, Java is considered an embedded programming language, while in the United States it is seen as an enterprise programming language. Witness the fact that the new Apple iPhone does not include a Java Virtual Machine (JVM), and apparently, if Steve Jobs has any say (and one must assume he does), it never will. Yet dozens of other mobile phones and wireless devices have a JVM nestled inside to run Java applications. This is a little ironic since Java was first conceived at Sun Microsystems as an embedded language for applications like set-top boxes, where it is in fact used. I suspect most of Java's success came from its application in the enterprise domain, the embedded domain as usual getting no respect.

I have a long background in developing real-time applications. This has understandably manifested in my working on a lot of embedded products, which tend to be heavy on concurrency, synchronization, and message passing, stuff I have seared in my brain if not tattooed on my body. I have watched, and participated, as embedded development moved from assembly language to C and then to C++. Way back in 1999, I worked on a project that developed an embedded product in Java, with the low-level device code in C. I thought that Java was a great embedded language back then.

I still think so today, tens of thousands of lines of both C++ and Java later. It has all the stuff you want in an embedded language, mostly stuff that C++ lacks, like built-in concurrency and synchronization, not to mention a simpler syntax than C++. Yes, it lacks some of the features you need to easily do C-ish things like access memory mapped registers as variables. The JVM and its bytecode can be troublesome (although I believe they solve far more problems than they cause). The way memory deallocation is hidden from view means that garbage collection may have to be managed. And the fact that you can’t swing a cat in Java without allocating and deallocating a bunch of objects, if not in your code then in the dozens of classes that constitute the Java utility classes, or in the hundreds of frameworks that have served to complicate Java development, causes some headaches.

But this shouldn’t disqualify Java as an embedded language, any more than any of the quirks in C++ disqualified it. It took discipline and a thorough understanding of how C++ works to use it effectively in embedded applications, but doing so yielded economic benefits like greater productivity, simplified maintenance, and ultimately reduced time-to-market. I was lucky enough to have mentors like Tamarra Noirot, Randy Billinger, and Doug Gibbons who carefully tutored me in the design patterns I needed in my transition from C to C++ for embedded development. Yet even as effective as C++ is for embedded applications, there are still always portions of the application that have to be written in C, or even assembly, to accommodate the hardware and the tougher real-time portions of the system, just as there were in that embedded Java project back in 1999.

It will take a similar discipline and understanding of Java to use it for embedded development, but I believe that that must happen. I recently spent more than a year working on yet another Java-based project, this one a non-embedded product to expose feature-rich multi-modal communication capabilities to business process automation via web services. I was reminded how much more productive I am in Java than in C++ or C. It's not just that I'm more productive writing code in Java (although I am), but it is the amazing tool chain that supports Java. Tools like Eclipse (an IDE which works for C++ too but is much better with Java), Ant (build), JUnit (unit testing), Cobertura (code coverage), and JProfiler (memory usage). Not to mention the rich set of utility classes and frameworks available for Java. Programmer productivity is one of the keys to time-to-market, and time-to-market is one of the keys to success in today’s competitive environment. True, maybe you can’t write 100% of your embedded application in Java. But writing 100% of it in C++ will have similar economics as twenty-five years ago when I remember embedded developers arguing against writing in C instead of assembly, or ten years ago arguing against C++ instead of C.

So C++ is the new assembly. And that’s not a bad thing. It will always have its niche in embedded development. But for purely economic reasons, in the coming years you must minimize the amount of embedded code you write in C++, just as you must minimize the amount you write in C, or in assembly. You will do it because you cannot be competitive otherwise.

When I was discussing this with my friend and occasional colleague Demian Neidetcher, he asked “If Java is the new COBOL, and C++ is the new assembly, what does that make Visual Basic .NET?”

You already know the answer, Demian. It’s the new Basic.

Sources

Jonathan Erickson, “Bar Bets”, Dr. Dobb’s Journal, #392, January 2007

Bob Lewis, “How to Interpret COBOL Statistics”, Advice Line, January 17, 2007

Bill Venners, "The Art of Computer Programming: Conversation with James Gosling", Artima Developer, March 25, 2002

Wednesday, January 24, 2007

Job Lock and Universal Health Care

There has been an interesting discussion about health care in a discussion group on Joel on Software, motivated by a piece on NPR. This coincides with an article in the January 29th issue of BusinessWeek, "Held Hostage By Health Care". The gist of the BusinessWeek article is that many employees are held captive by their health insurance, the so-called "job lock". This results in two things: from a personal perspective, the employee is not doing what they really want to do, and from an economic perspective, the employee is not being utilized most effectively.

I am a 50-year-old self-employed software engineer who unashamedly sponges off his wife's benefits. When I quit my day job, which provided full benefits, you can bet I would not have done so had I not had her benefits as a safety net. The risk for me was small: no kids, house almost paid off, no other large debt, large sums already in our 401(k)s, stockpile of firearms in the gun safe in the basement for the coming end-times. Given our personal situation and the fact that she's a physician with a large not-for-profit HMO, you can imagine that national health care issues have been a big topic of discussion during our 23 years of marriage.

Something will have to be done about this as the baby boomers start retiring in bulk. There will be a kind of domino effect as there aren't enough young people domestically to replace all those folks in their jobs, and so those positions will have to be automated, off-shored, or just eliminated, and employers can no longer afford to continue to subsidize the growing health care burden of their retirees. The effect of this is that offshore employees will be paying for the health care of an America that is largely populated by retirees. This will not go over well. If the baby boomers become a majority voting bloc, they will simply keep voting elected officials out of office until someone solves the problem with some kind of socialized medicine. This is going to have to be done even though a big percentage of tax-paying citizens are themselves retired, and a huge portion of the national wealth (and, hence, the ownership of publicly traded companies) is tied up in their retirement funds. The math alone makes my brain hurt.

This will also require a lot of rethinking, and some hard talk, about health care on each of our parts, given some of the stories my spousal unit tells me. No, you can't have a CAT scan just because you think you need one. Socialized medicine will pay for your kidney transplant three years from now, if you're still alive. Your alternative is to pay for it yourself. You're still drinking after your first liver transplant? Sorry, no third liver for you.

I'm amazed big corporations aren't all over this like white on rice. (The BusinessWeek article suggests some are.) Is employee mobility that scary? Have they mistreated their employees so badly that they think they need the health care lasso to keep them on the job? Is the cost of employee turnover in the short run so much scarier to management than the cost of retiree health care in the long run? Are we really talking about a fundamental change in the balance of power between employers and employees? Surely getting, and keeping, the right employee in the right job is the right thing for both the employer and the employee. Otherwise we're making it sound like some kind of indentured servitude.

There is some evidence to suggest that high-tech will be hit worse than other job sectors by this effect. Another article from BusinessWeek, October 16, 2006, "Still Working and Loving It", cites the following statistics: "From 1980 to 2000 the total number of science and engineering degrees earned grew at an annual rate of 1.4%, far below the 4.2% growth of science and engineering jobs" and "nearly 30% of all science and engineering degree holders currently in the labor force are age 50 and older". Of course, these are just measures of science and engineering degree holders; some of the best developers I've ever worked with had degrees in music, some from very prestigious conservatories. Still, it does give one pause to think about where the intellectual property of the future is going to be developed, and by whom.

Taking a purely global view, maybe it is okay if globalization moves all manufacturing (which in an information economy is the development of intellectual property) offshore, but only if the offshore work force also willingly participates in supporting the global collection of retirees. Isn't this fair? Otherwise you have a disconnect between business responsibility and social responsibility. If you're going to take the jobs, you have to take the responsibility that goes with them.

This suggests a shifting of the burden of the funding of socialized medicine from personal income taxes to corporate income taxes. I think it's going to get ugly.

Tuesday, January 23, 2007

Lies, Damn Lies, and Net Present Value

When you make a proposal for an investment in the development of something new, you are competing against all of the other potential ways that your funding source might invest their money. This is true whether you are a start-up pitching a new business plan to a venture capital firm, a market manager trying to sell new product or feature development to your upper management, or even a researcher making a proposal for a new research project to a funding agency.

Some of the competitors are obvious: other start-ups, other new products or features under consideration by your company, and other research proposals. But what isn't so obvious to most technologists, but is burned into the mind of every M.B.A., is that you are also competing against the stock market. You have to convince your funding source (or, as I like to think of them, money guys) that they would be better off giving you the money than simply investing it in stocks or other financial instruments.

The bad news is that if the stock market is doing pretty well, or if the perceived risk of investment in stocks is a lot lower than the risk of investing in your project, you will be fighting an uphill battle. The good news is that there is a way to calculate how you stand compared to your money guys just calling their broker. This calculation is called net present value (NPV). It accounts for the time value of money by calculating an equivalent present value of an investment whose return will be made over a span of time. It is a way of comparing apples to apples when evaluating several investment options.

Since most technologists are hard-wired for math, this is quite likely going to make a lot of sense. While I am most definitely a technologist (I just took business classes as electives as an undergraduate for fun), I found that understanding just a little bit about NPV gave me a lot of insight into how and why new product decisions are made the way they are, and why certain projections seemed to be unrealistically inflated.

NPV has just three variables. It is the choice of the values for these variables around which all discussion with your funding source will revolve. It is pointless to argue about the use of NPV, or how NPV is calculated, because the money guys will know that this is a standard business tool, and they will likely have been using it for years. But battles will be won or lost by the choices made, and the justifications offered, for these variables. The reliability of the value that NPV computes will be no better than the reliability of the values chosen for these variables.

Duration: this is the total life span of your project and the product that it creates, from the first year you start spending money to develop it until you finally decide to discontinue the product. Don't kid yourself: your money guys know that all products have a life span. High-tech products tend to have shorter life spans than most others, more like fresh produce than building materials. The duration is in units of years (although I can imagine some high-tech projects in units of months). Let's call this variable N.

Annual Net Income: this is the amount of income that would be realized by your product year by year over its entire duration after all costs are subtracted out. It is not likely to be the same for every year. For example, if it takes two years for your product to hit the market, for the first two years this number will be negative, representing the initial development costs with no income. In subsequent years in which this number is positive, it will represent the income generated by your product minus any costs associated with manufacturing, support or maintenance. And even if your product is successful, its annual net income will typically rise until it saturates the market and then fall. If all of your annual net income numbers are zero or negative, you are already screwed. We'll call this variable A[i], where i is the number of elapsed years and runs from 0 to N-1.

Cost of Capital: this is effectively an interest rate that captures what interest could be made year by year by investing the funding you are asking for in a financial instrument with a similar risk profile as your proposal and over the same duration. Note that this captures not just how well alternative investments might do, but how their risk compares to yours. You could argue that this may change from year to year, although in all the NPV calculations I have ever seen it is assumed to be constant. Unless you have the amazing secret to wealth, it will be a lot less than one, like 0.1. We'll call this variable C.

Here is the formula for NPV. Note that the caret "^" is a "raised to the power of" operator, as in 3^2 equals 9. (I've seen slightly different formulations of NPV from different sources, so your mileage may vary a little, but they are all equivalent.)

NPV = sum (i = 0 to N-1) ( A[i] / ((1 + C) ^ i) )

Let's look at an example.

You have a great idea for a new product, the flux capacitor. You think it will cost one million dollars spread across two years to develop and productize it.

Since you have some equipment you have to buy and some licenses you have to purchase, you have some up front costs the first year that you don't expect to have in the second year, so the expenditures for those two years of development are a little front loaded.

You expect the life span of the flux capacitor to be eight years before it becomes obsolete. That gives you an N of ten years: two years of development plus eight years of sales. You expect to hit your sales peak around the fourth year after introduction.

Here is your projected net income, or A[i], year by year for the total duration of the flux capacitor product, based on how many units you think you can sell, at what price, and what the unit cost of each flux capacitor is likely to be.

A[0] = -$700,000
A[1] = -$300,000
A[2] = $50,000
A[3] = $100,000
A[4] = $250,000
A[5] = $500,000
A[6] = $250,000
A[7] = $100,000
A[8] = $70,000
A[9] = $20,000

So far this doesn't look too bad. For an outlay of $1,000,000 over two years, you make back a total of $1,340,000. That's a net income of $340,000 or 34% of your initial investment.

You still need a value to use for C, the cost of capital. Here's the thing: you don't provide that number. Your money guys are going to have someone with a degree in accounting or business administration who is going to provide that number. They are going to base it on lots of factors, like the Dow Jones Industrial Average, the return on investment (ROI) of their existing investments, maybe the ROI of one or more stock index funds, or what they read in the Wall Street Journal this morning. The WSJ is actually the most likely source: it regularly publishes numbers just for this purpose. No way are you going to have more credibility than the WSJ. You have no control over the value used for C. Let's say your money guys use a cost of capital of ten percent or 0.1.

Here is our calculation for NPV. (Note that I'm just showing two decimal places, but my calculator keeps more.)

NPV = -700000/(1.1^0) - 300000/(1.1^1) + 50000/(1.1^2) + 100000/(1.1^3) + 250000/(1.1^4) + 500000/(1.1^5) + 250000/(1.1^6) + 100000/(1.1^7) + 70000/(1.1^8) + 20000/(1.1^9)

NPV = -700000/1.00 - 300000/1.10 + 50000/1.21 + 100000/1.33 + 250000/1.46 + 500000/1.61 + 250000/1.77 + 100000/1.95 + 70000/2.14 + 20000/2.36

NPV = -700000 - 272727 + 41322 + 75131 + 170753 + 310460 + 141118 + 51316 + 32656 + 8482

NPV = -141489

Yeah, you are interpreting that number correctly. Even though you think your project will make $340,000 over its ten year run, your money guys would have been far better off just investing the money in the stock market. If everything goes your way, they stand to lose $141,489 by betting $1,000,000 on you. You are in a battle against compound interest, and that is a hard battle to win. Just be glad your 401(k) is subject to the same math.
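
If you want to check the arithmetic, here is a small C++ sketch that recomputes the flux capacitor NPV from the numbers above. It comes out within a dollar of the hand calculation, the difference being that the individual terms above were rounded to whole dollars.


#include <cmath>
#include <cstdio>

int main() {
    // Projected annual net income for the flux capacitor, years 0 through 9.
    const double annual[] = {
        -700000.0, -300000.0, 50000.0, 100000.0, 250000.0,
         500000.0,  250000.0, 100000.0, 70000.0,  20000.0
    };
    const double cost_of_capital = 0.1;
    double npv = 0.0;
    for (int ii = 0; ii < 10; ++ii) {
        // Discount each year's net income back to its present value.
        npv += annual[ii] / std::pow(1.0 + cost_of_capital, ii);
    }
    std::printf("NPV = %.0f\n", npv); // prints NPV = -141488
    return 0;
}
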

If you have any brains at all, you will have done this calculation long before you ever appear in front of the money guys, and you already know you are in trouble. So what can you do? You only have a small number of variables that you can affect. See if any of these strategies sound familiar based on your own experience in product development.

You can ask for a smaller initial investment. We can hire just half as many developers, and have them work twice as many hours. We can buy half as many workstations and tell our developers: it's pair programming! We can ask for less money up front in the hopes that once the money guys commit, we can get more funding later.

You can get the project done in less time so that you start making money sooner and for a longer period of time. We can hire twice as many developers to get it done in half the time. We can skip those time-consuming requirements and design phases. We can claim that we can get it done in half the time in the hopes that once the money guys commit, we can get more funding later.

You can extend the sales life span of the flux capacitor. We're pretty sure that even after the oil economy collapses, there will still be a market for flux capacitors. It's unlikely that our flux capacitor design will become obsolete in less than fifteen years; after all, we're designing for expandability!

You can reduce the on-going cost associated with each flux capacitor you sell. This on-going cost is known as the cost of sales (COS) or sometimes cost of goods sold (COGS). We won't need technical support. We can offshore technical support. We don't need a marketing or sales department, these babies will sell themselves! We can use cheaper materials; after all, if it breaks, then maybe we'll sell another one.

You can increase the sales price of each flux capacitor. Of course, this is tricky. There's this little issue of reducing sales by increasing the price.

You can decrease the sales price of each flux capacitor. Everyone loves a sale! We can offer a rebate. How do we do it? Volume!

You can increase the projected annual sales. Once the oil economy collapses, we're positive that everyone is going to need two things: a 9mm Glock and a flux capacitor! We'll be lucky if we can keep up with demand.

And probably some other strategies I haven't thought of, or haven't experienced first hand.

I can see the light bulb going on over your head. This explains a lot. Once I started really thinking about NPV, I began to wonder how any new products were ever developed at all. Of course, if no new products were ever developed, there would be a lot fewer stocks competing for investment dollars, because all the manufacturing companies would eventually go out of business. So there is a kind of dynamic balance going on that impacts the entire global economy. A Nobel prize or two has probably been awarded over the analysis of this very thing.

I've heard engineers claim that an analysis like NPV is short-sighted. That's easy to say when it's not their money. They might feel differently if they realized that they might be talking about the money in their own 401(k)s. At least for publicly traded companies, I also think it's important to remember that the money guys have a fiduciary duty to their shareholders, a legal responsibility to make the best use of the shareholders' money. The shareholders are literally the owners of the company. They would much rather get that money themselves than have the officers to whom they entrusted their company fritter it away making flux capacitors. Failure to live up to fiduciary duty can result in, best case, shareholder lawsuits, or, worst case, in FBI and SEC investigators pulling up in black SUVs. Somewhere in the middle is where the officers of the company are fired.

As a young technologist, I found it hard to understand the business process. It was a lot easier, and more fun, to read about that newfangled language, C. But as I got more and more experience, and probably as I got more and more money in my 401(k), I began to appreciate the kinds of tough decisions managers of companies have to make on a daily basis about what to make, how to make it, and why.

Sources

Daniel P. Golman, et al., "Calculating Net Present Value", Business, Perseus Publishing, 2002

Barry Karafin, "Business Management for Technologists", (course slides), September 2005

Randy C. Perry, David W. Bacon, "The Business Case for Design for Six Sigma", (e-book), Prentice-Hall, 2007

Separate But Equal II

Long-time friend and colleague David Hemmendinger made the following remarks via e-mail about my article "Separate But Equal", but was perhaps too shy (that's hard to believe, David) to post them as comments:

"I think of IBM's channel architecture as another example of the control/data separation, similar to what you describe in the MSS. As to other examples -- the idea is fundamental to hardware design; standard digital design rules keep control logic and datapaths separate."

Dr. Hemmendinger might be more insightful than he knows here. The NCAR MSS that I described did in fact, at least at the time, make use of IBM I/O channels; the high-end storage devices it managed were controlled by an IBM mainframe. But the mainframe accessed those devices primarily for purposes of data migration and management of meta-data. The NCAR MSS switched I/O architecture (which it was using at least a decade before switched I/O became widespread) allowed its supercomputer clients to access the I/O devices directly.

Apart from the NCAR MSS, IBM's I/O channel architecture has been around at least since the late 1960s, and so may be one of the earliest examples of this pattern. I remember writing simple channel programs for tape drives in the mid-1970s during my IBM 360/370 assembler period. IBM's I/O Subsystem (IOS) was another first for me as well: it was the earliest example I can recall of using object oriented design and implementation for a real-time application, although the terms "object oriented" and "real-time" didn't really exist way back then. The example of the IOS has stuck with me for decades.

His second comment is spot on too: the separation of control and data is a standard pattern in digital hardware design.

Thanks, David!

Sources

Chip Overclock, "Separate But Equal", January 2007

Monday, January 22, 2007

Separate But Equal

The separation of control messaging and data streams is an old pattern that keeps popping up, to be rediscovered and declared a new invention in new problem domains.

The first time I ran across it was when I was writing device drivers for real-time applications on PDP-11s (and later, LSI-11s) starting back in 1976. I had written drivers for things like A/D converters, parallel and serial I/O boards, real-time clocks, and punch-card readers. Those devices all followed a common pattern: you had a memory-mapped status register and a memory-mapped data register, and it was the responsibility of the driver to handle the movement of data to and from the device and memory.

It wasn't until I started writing drivers for direct-access storage devices like hard disks and floppy drives that the young Mr. Overclock first encountered direct memory access (DMA). DMA was great: you just programmed in some parameters in some status registers, and the hardware itself took care of moving data to and from memory. The only interrupt you received was when the entire I/O transfer was complete.

I know y'all laughed when I said I wrote device drivers for card readers. But see, I've written a lot of device drivers for a lot of platforms since then, and I can tell you this: those card readers are still the most challenging devices I've ever worked with, before or since. You set the read bit in the status register, the blowers and motors on the reader spun up, and a punch card flew through the read head. You got a fast stream of interrupts, one for every one of eighty columns, and maybe or maybe not another interrupt for end-of-card, depending on the timing. You either serviced every one of those interrupts, or you lost the data. There was no backing up and trying again. If your goal was, as mine was, to run the card reader at its full rated speed, this took some clever programming, not to mention some careful planning of interrupt priorities for the other devices in the system. (It wasn't until years later that I wrote drivers to handle analog trunks and stations, in Java of all things, for a telephony product, that I encountered devices with these kinds of no-compromise hard-real-time constraints.)

So I was in a good position to really appreciate the whole idea of DMA, where I just handled the control and something smarter, or at least a whole lot faster, than my code took care of the data.

Through my wanderings I somehow ended up at the National Center for Atmospheric Research, a national lab in Boulder, Colorado. NCAR had a floor full of supercomputers (mostly Cray Research machines at the time). It was there I learned that when you have a lot of supercomputers, you have a lot of data. Petabytes of data. Having a lot of MIPS and FLOPS lying around isn't useful unless you have a lot of MBPS available too. This is where the NCAR Mass Storage System (MSS) comes in. The NCAR MSS is a hierarchical storage system that manages a huge disk farm, several robotic tape silos, and (at the time) a vast backing store of offline tape cartridges.

The NCAR MSS Group is credited with inventing the idea of the separation of control and data as it applies to mass storage. Control messaging was carried among a distributed farm of servers and adapters and their client supercomputers over LAN technologies that most of us would recognize, typically Fiber-Distributed Data Interface (FDDI) or some form of Ethernet. The ancillary equipment handled the I/O setup, including any necessary staging of data and mounting of tapes, and it wasn't until the gigabytes were ready to fly that the supercomputers were re-engaged. Then those gigabytes streamed over specialized data paths using (again, at the time) more exotic and specialized I/O-centric technologies like High Performance Parallel Interface (HIPPI) and Fibre Channel (FC).

There were a lot of good reasons for this separation. The LAN technologies at the time tended to be broadcast media. The I/O technologies were switched, point-to-point channels. The LAN technologies had small physical-layer data segments, which were not optimal for streaming a lot of data really quickly but were good for moving small payloads between widely distributed endpoints. The I/O technologies had high initial connection setup latencies, but were very high performance once the data stream began. The LAN technologies were cheap. The I/O technologies were expensive. The separation of control and data was a physical-layer artifact. (I've been told that technologies like Gigabit Ethernet have muddied the waters in this respect since I worked at NCAR.)

The NCAR MSS was using the available technologies in the way in which they were intended, and reaping the performance and economic benefits. It wasn't a great leap to notice the common pattern between this architecture at NCAR, where it was writ large, and my earlier work developing tiny little device drivers in assembly language.

It wasn't until I joined Bell Laboratories and entered the domain of telecommunications that I discovered that the telephony folks had also faced, and solved, this same problem. Digital telephony trunks were bundles of sixty-four kilobit per second bearer channels controlled by a single signaling channel of the same bandwidth. Protocols like Integrated Services Digital Network (ISDN) controlled the setup and release of telephone calls over the signaling channel, called the Delta-channel or more commonly the D-channel, while the eight-bit voice samples of each telephone call were transferred over one of many Bearer- or B-channels.

In ISDN, the D-channel and its associated B-channels are typically carried over the same physical path, such as bundles of copper wires in a single T-1 trunk, or a single Synchronous Optical Network (SONET) optical fiber, although I have seen cases where they were split off onto separate physical paths. But generally, the separation of control and data in ISDN is a transport-layer artifact.

There were very good reasons for this as well. Separating control (signaling) and data (bearer) made economic sense since the messaging bandwidth of a single D-channel was adequate to control tens of B-channels. The data streams from the D-channel and its B-channels were routed to very different endpoints: the D-channel messages to a control element that would be a conventional processor, the B-channel voice samples to a digital signal processor (DSP) or a time division multiplexed (TDM) circuit switch or backplane. And the fact that the asynchronous signaling messages were carried out-of-band, instead of in-band along with the synchronous voice traffic, decoupled these two streams so that the voice streams could be controlled more efficiently, and it reduced jitter in the voice streams as well.

So it is no surprise that when Voice over IP (VoIP) came along to replace traditional TDM-based telephony, the IP world adopted the same pattern. The Session Initiation Protocol (SIP) is a commonly used application-layer protocol for setting up and releasing VoIP calls. But stations and trunks that speak SIP pass their application data (for example, voice samples) using some other application-layer protocol specialized for that purpose, typically the Real-time Transport Protocol (RTP).

In SIP, the separation of control and data is mostly an application-layer artifact. The SIP and RTP streams use the same network protocol, IP, and the same physical layer, such as Ethernet, although the transport-layer TCP or UDP packet streams may be handled quite differently in intermediate routers because of their very different quality of service requirements.

The separation of control and data is a common architectural pattern that seems to be reinvented in every problem domain in which I find myself working. In a recent project implementing business process automation of high-end communications capabilities, the control messages were carried over an enterprise service bus, but the data (which from the application's point of view was both the SIP traffic and the RTP traffic) was carried over the LAN.

What other examples of this architectural pattern can we find if we look hard enough? And where should we be applying it that we are not?

Important Safety Tip: Enable Ping with Comcast

Installing a broadband internet connection and a WiFi router is the kind of no-brainer that every techie loves. It's simple, it's easy, it's the greatest thing since OO, and it makes us look like frackin' geniuses to our significant others.

It took the Overclock household a while to get to this point: the palatial Overclock estate is far enough out in the boonies that Qwest didn't offer DSL, and our development is old enough that the Comcast cable plant wasn't quite up to snuff. But a couple of years ago Comcast upgraded their facilities, which motivated your correspondent to trundle down to the local COMP USA (pronounced, I am told by my friend Tom O'Dell, "com-pu-sa") for some LinkSys gear. Since my SO and I already had notebooks with WiFi radios, in a couple of hours Mrs. Overclock (a.k.a. Dr. Overclock, Medicine Woman) was looking at me in a way you could only appreciate if like us you have been married for over twenty years.

(About a year later she trumped me by buying a TiVo. Without TiVo, life as we know it could not exist. But that's another story.)

For about a year we lived with the internet connection going down about once a month. Just rebooting the router fixed it. The frequency was just below the irritation threshold required for me to take the time to actually figure out what was going on.

But apparently my subconscious was not willing to let it rest. It suddenly occurred to me: I wonder if Comcast is expiring the DHCP lease if the endpoint does not respond to ping? Being the paranoid sort, I had disabled ping in the router configuration, figuring that some camouflage in the dimly lit back alleys of the information superhighway might be a good idea.

The DHCP status page on the router didn't seem to indicate a problem, but I took a chance and enabled ping anyway. Our Comcast connection has worked flawlessly for months since then.

WiFi and TiVo. LiFe is GoOd.

Friday, January 19, 2007

Fun Facts to Know And Tell: Java Business Integration

For over a year I worked with a team of developers on a project to expose feature-rich multi-modal communications capabilities to business process automation. The systems engineers probably had their own description of what the product did, but that's how I thought of it, and it was a darn good idea. One of the frameworks we built on top of was ServiceMix, Apache's open source enterprise service bus (ESB) based on the Java Business Integration (JBI) 1.0 specification.

There are times I'd like to claim to be a specialist, but my resume suggests otherwise. For sure, I am not a specialist in web services. So when duty called for me to become the local ServiceMix and JBI expert, I approached it more from a real-time messaging perspective than a Web Services perspective. This article describes some of the surprises I had in store.

Most of these fun facts to know and tell are about JBI and would probably apply to any ESB that was JBI-compliant. A few of them are about ServiceMix specifically. None of them have anything to do with the actual product that we developed.

Our development began with ServiceMix 2.0, progressed to ServiceMix 3.0 for our commercial release, and proceeded to test with ServiceMix 3.0.1. The core portion of ServiceMix that implements the JBI spec was quite stable and usable in the releases we chose to use, particularly considering the relative immaturity of the product. Like all active open source projects, ServiceMix is a moving target. Depending on what specific Subversion trunk of ServiceMix you decide to use, your mileage may vary.

In the sections below I cite the appropriate portions of the JBI spec for your reference.

Service Engines vs. Binding Components [4.3, p. 14]

The JBI spec, and most JBI presentations, seem to make a big distinction between a JBI bus component which is a Service Engine (SE) versus a component which is a Binding Component (BC). I find this to be mostly a conceptual distinction, depending on how you choose to think about the components that you develop for your application. The JBI API is exactly the same no matter what you call your component. (Former colleague Rick Block suggested that early in the development of the JBI spec this was not the case.)

I think of an SE as a component that encapsulates the implementation of some service and accepts requests and generates responses for that service only over the ESB, using JBI normalized messages inside JBI message exchanges. It may maintain internal state about the service it provides. An SE might front-end a database used to maintain inventory. It might be a JBI wrapper for a legacy service like weather information. It might be an abstract interface to a vastly complex system sitting outside the ESB. It might do something dirt simple like return the time of day from a central clock.

I think of BCs as purely protocol converters that serve as gatekeepers between the outside world and the ESB. A BC serves as a proxy between an external client that sits outside of the ESB and an SE that sits on the ESB. If the BC maintains state, it is only about the transactions that are in progress between the outside client and the SE, or, in JBI-speak, about message exchanges that are in the active state. Once a message exchange reaches a terminal state like done or error, the job of the BC for that particular transaction is completed. A BC might convert between external clients using the Java Message Service (JMS) and internal services using JBI messaging. A Simple Object Access Protocol (SOAP) router on the ESB that receives SOAP requests from external web services clients (like your web browser), turns them into JBI message exchanges, and routes them to an SE on the ESB, is also a kind of BC.

This is fuzzier than you might think, and discussions over whether a component is an SE or a BC can quickly devolve to how many angels can dance on the head of a pin. For example, suppose you had a component that incorporated a SIP protocol stack to implement complex VOIP control capabilities. You can think of this component as a container of all the SIP endpoints with which it will ever communicate, and the service it provides is a set of high-level communications capabilities. In this sense, it is an SE. You can also think of it as a complicated protocol converter between SIP endpoints that speak only the SIP protocol and SEs on the ESB which speak only JBI messages. In this sense it is a BC.

In general, don't agonize too much over whether your component is a Service Engine or a Binding Component. In practice, the distinction isn't terribly important from a purely JBI point of view, although it may be important from an architectural point of view for your application. Separating BC and SE functionality makes your application more flexible if you can design it such that just by adding a new BC you can expose your SEs to a whole new domain of external clients, like SNA systems.

Services vs. Endpoints [5.1.1, p. 21]

I think of JBI as a fractal collection of containers: the JBI container (in our case, ServiceMix) contains JBI components. Each JBI component contains zero or more services. Each service contains one or more endpoints. Messages on the JBI ESB may be addressed to a service or to an endpoint within a service.

For example, a time service on the ESB might be a single service with a single qualified name (a service name within a specific namespace). But that time service might expose multiple endpoints, one for each of several NTP servers, each with an endpoint name unique within that service. Requests for time could be addressed to the time service itself, in which case the selection of which NTP server to use is left up to some other mechanism, or they could be addressed to a specific endpoint, so that the time would be provided by a specific NTP server. The JBI spec says that if you address a request to a service, the choice of endpoint is implementation dependent; in this case, it really means your implementation when you code the service.
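As a concrete (if hedged) illustration, here is a minimal sketch in Java, against the javax.jbi APIs, of how such a time service component might activate several endpoints during startup. The namespace, service name, and endpoint names are all invented for this example.

import java.util.ArrayList;
import java.util.List;

import javax.jbi.JBIException;
import javax.jbi.component.ComponentContext;
import javax.jbi.servicedesc.ServiceEndpoint;
import javax.xml.namespace.QName;

public class TimeServiceEndpoints {

    // Hypothetical qualified service name: a local name within a namespace.
    private static final QName TIME_SERVICE =
        new QName("http://example.com/time", "TimeService");

    // Hypothetical endpoint names, one per NTP server behind the service.
    private static final String[] ENDPOINT_NAMES = { "ntp0", "ntp1", "ntp2" };

    private final List<ServiceEndpoint> activated = new ArrayList<ServiceEndpoint>();

    // Called during component startup with the context the JBI container provides.
    public void activateAll(ComponentContext context) throws JBIException {
        for (String name : ENDPOINT_NAMES) {
            // Each activation exposes one more addressable endpoint on the ESB.
            activated.add(context.activateEndpoint(TIME_SERVICE, name));
        }
    }

    // Called during component shutdown.
    public void deactivateAll(ComponentContext context) throws JBIException {
        for (ServiceEndpoint endpoint : activated) {
            context.deactivateEndpoint(endpoint);
        }
        activated.clear();
    }
}

Each call to activateEndpoint exposes one more addressable endpoint for the same service; deactivateEndpoint takes the same ServiceEndpoint object back when the component shuts down.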

Because every service must have at least one endpoint, I sometimes use the terms service and endpoint interchangeably, but as we will find out when we discuss routing, the distinction is important.

Providers vs. Consumers [5.1.1, p. 21]

Given my background in real-time messaging, I really struggled with this JBI nomenclature, and it turns out to be vitally important to understand the distinction. I am used to thinking of messaging components as having the roles of producer and consumer, as in a message producer is that which sends the message, and a message consumer is that which receives the message. These are strictly roles played during a specific message exchange, so a single messaging component may be both a producer and a consumer at different times.

In JBI, provider and consumer are also roles played during a specific message exchange, as in a service provider provides a service, and a service consumer makes a request of the service provider. In this context, the service consumer may never receive a message at all (other than, in JBI, getting a done back from the service provider to indicate the request was successfully received) and a service provider may never send a message (other than sending the done, which carries no payload). As above, a single component may be both a service provider and a service consumer. In receiving a request from a consumer it may act as a provider, but in servicing that request it may act as a consumer by making requests of other providers. Once I got my head around this fact, things got a little easier.

A service provider must activate an endpoint on the ESB. The endpoint is one means through which message exchanges are addressed to the service provider. (More on this later.) Once it does this, it has exposed its endpoint to receive message exchanges from service consumers. If a component does not activate at least one endpoint, it cannot act as a service provider, because service consumers have no way of addressing message exchanges to it. If a component never acts as a service provider, it need never activate an endpoint on the ESB. It can still address message exchanges to service providers, send those exchanges on the ESB, and receive responses. But no other component on the ESB can originate a new message exchange and send it to that consumer, because the consumer is effectively invisible on the ESB.

The pipe through which components on the ESB send and receive (or accept, in JBI-speak) message exchanges on the ESB is the delivery channel. Think of the delivery channel as a socket and you won't be too far off. The JBI container gives each component on the ESB a delivery channel. When a consumer sends a message exchange to a provider over the delivery channel, the message exchange contains only the address of the service and endpoint exposed by the provider that the consumer used to address it. There is no standard JBI mechanism for the provider to query which consumer sent the message, and in fact it isn't even meaningful to ask the question. Since the consumer isn't required to activate an endpoint on the ESB, it has no address to which new message exchanges may be directed.
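To make the socket analogy a little more concrete, here is a hedged sketch of the provider side of that pipe: an accept loop pulling message exchanges off the delivery channel. The class and variable names are invented, error handling is minimal, and only in-only exchanges are handled.

import javax.jbi.component.ComponentContext;
import javax.jbi.messaging.DeliveryChannel;
import javax.jbi.messaging.ExchangeStatus;
import javax.jbi.messaging.InOnly;
import javax.jbi.messaging.MessageExchange;
import javax.jbi.messaging.MessagingException;

public class ProviderAcceptLoop implements Runnable {

    private final DeliveryChannel channel;

    public ProviderAcceptLoop(ComponentContext context) throws MessagingException {
        // The container hands each component exactly one delivery channel.
        this.channel = context.getDeliveryChannel();
    }

    public void run() {
        try {
            while (true) {
                // Blocks until the container delivers a message exchange.
                MessageExchange exchange = channel.accept();
                if (exchange.getStatus() != ExchangeStatus.ACTIVE) {
                    // A done or error coming back for one of our own sends.
                    continue;
                }
                // The exchange identifies which of our services and endpoints
                // the consumer addressed (exchange.getService(), getEndpoint());
                // it does not identify the consumer.
                if (exchange instanceof InOnly) {
                    // ... process ((InOnly) exchange).getInMessage() here ...
                    exchange.setStatus(ExchangeStatus.DONE);
                    channel.send(exchange);
                }
            }
        } catch (MessagingException e) {
            // Channel closed, typically because the container is shutting down.
        }
    }
}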

Of course, the JBI container knows to whose delivery channel to send any response the provider may generate, but this is hidden inside the implementation. And of course, the consumer may also play the role of provider, expose an endpoint on the ESB, and send the address of this endpoint as part of the payload in its request. But from the point of view of the JBI container, this is the address of a provider on the ESB, not of the consumer that sent the request.

Because there is one delivery channel per component, but a component may expose many services and endpoints to which message exchanges may be addressed, I like to think of components as sending requests, and of the endpoints inside the components as receiving them. But this is just my way of thinking.

Routing [5.1.7, p. 26]

Message exchanges may be addressed three ways: to a service (implicit routing), to an endpoint within a service (explicit routing), and to an endpoint reference (dynamic routing). When I described my example of a time service fronting several NTP servers, it probably occurred to you that there might be perfectly legitimate reasons to do either implicit or explicit routing, and you would be quite right.

Suppose your application was a hybrid of a Service Oriented Architecture (SOA) and an Event Driven Architecture (EDA). A client component acting as a consumer could make requests of a server component acting as a provider, and those requests would be addressed using implicit routing, that is, with just the service name of the provider. If the request was for an event stream, the client component could provide a service and a specific endpoint to which those future events would be addressed by the server component. When the server component sends an event it would be acting as a consumer, sending an event to the client who was now acting as a provider, and those events would be addressed using explicit routing, that is, using both the service and the endpoint. (I never said this was simple, but it does illustrate how important it might be to get your nomenclature about roles straight in your mind.)

For example, a client component might send a request to the server component "Lassie" to "look after Timmie", and pass along the service "Mom" and the endpoint "in the kitchen". In the future, should circumstances warrant, "Lassie" would send events like "Timmie fell in the well" to "Mom" "in the kitchen". The events could have been addressed just to "Mom", but this would require the underlying implementation to resolve the specific endpoint, searching "in the barn", "in the kitchen", and "in the bedroom", and it might pick the wrong one. (For purposes of compliance with any hypothetical non-disclosures, any resemblance of "Mom" and "Lassie" to an actual product is purely coincidental.)
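Sticking with the invented "Mom" example, here is a hedged sketch of the two addressing styles from the consumer side; only the setService and setEndpoint lines really matter.

import java.io.StringReader;

import javax.jbi.messaging.DeliveryChannel;
import javax.jbi.messaging.InOnly;
import javax.jbi.messaging.MessageExchangeFactory;
import javax.jbi.messaging.MessagingException;
import javax.jbi.messaging.NormalizedMessage;
import javax.jbi.servicedesc.ServiceEndpoint;
import javax.xml.namespace.QName;
import javax.xml.transform.stream.StreamSource;

public class Addressing {

    private static final QName MOM =
        new QName("http://farm.martin.us/jbi", "Mom");

    // Implicit routing: address the service only; the choice of endpoint
    // is left to the container or to the provider itself.
    static void sendImplicit(DeliveryChannel channel) throws MessagingException {
        MessageExchangeFactory factory = channel.createExchangeFactory();
        InOnly exchange = factory.createInOnlyExchange();
        exchange.setService(MOM);
        NormalizedMessage in = exchange.createMessage();
        in.setContent(new StreamSource(new StringReader("<event>Timmie fell in the well</event>")));
        exchange.setInMessage(in);
        channel.send(exchange);
    }

    // Explicit routing: address a specific endpoint ("in the kitchen")
    // that was resolved earlier or passed along by the client.
    static void sendExplicit(DeliveryChannel channel, ServiceEndpoint inTheKitchen)
            throws MessagingException {
        MessageExchangeFactory factory = channel.createExchangeFactory();
        InOnly exchange = factory.createInOnlyExchange();
        exchange.setEndpoint(inTheKitchen);
        NormalizedMessage in = exchange.createMessage();
        in.setContent(new StreamSource(new StringReader("<event>Timmie fell in the well</event>")));
        exchange.setInMessage(in);
        channel.send(exchange);
    }
}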

Dynamic routing is a completely different animal, which is discussed in the following section.

Dynamic Endpoint Reference [5.4.4, p. 36] [5.5.4.1, p. 50, Listing 4]

Dynamic routing allows a component to be addressed using a fragment of XML that complies with a schema documented in the JBI spec. The schema is specified using Relax NG. This allows components to specify "call back" addresses for future message exchanges as part of, for example, their XML payloads inside of a JBI normalized message. This could have been used in the example above for "Mom" to specify the service and endpoint to which "Lassie" would send future events. Such addresses in XML form are referred to as endpoint references (EPR), and the mechanism through which such EPRs are resolved is called dynamic endpoint reference.

Here is an example of an XML fragment that might be an EPR.

<jbi:end-point-reference
xmlns:jbi="http://java.sun.com/jbi/end-point-reference"
xmlns:martin="http://farm.martin.us/jbi"
service-name="martin:Mom"
endpoint-name="InTheKitchen"
/>

In this example, the service name is "Mom", the namespace is "http://farm.martin.us/jbi" which is associated purely within the context of this EPR with the XML tag "martin", and the endpoint name is "InTheKitchen".

If your application supports dynamic routing and EPRs, it is up to you to write the code to parse this XML and determine what service and endpoint, if any, it identifies. Dynamic endpoint reference resolution isn't rocket science. A component gives an XML fragment to the JBI container and asks what service and endpoint it identifies, expecting to get either null or a service endpoint object in return. ServiceMix simply queries every component on the bus that has activated endpoints, passing each the same XML fragment, in effect asking "Is this yours?", and expecting either a null or a service endpoint in return. I wrote a simple XML parser using the Java Document Object Model (DOM) framework to do just this.
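Here is a hedged sketch along the lines of that parser, handling only the spec-compliant form of EPR shown above; the table of endpoints the component has activated is assumed to be maintained elsewhere, and all of the names here are invented.

import java.util.Map;

import javax.jbi.servicedesc.ServiceEndpoint;
import javax.xml.namespace.QName;

import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class EprResolver {

    // Endpoints this component has activated, keyed by qualified service
    // name, then by endpoint name (maintained elsewhere in the component).
    private final Map<QName, Map<String, ServiceEndpoint>> activated;

    public EprResolver(Map<QName, Map<String, ServiceEndpoint>> activated) {
        this.activated = activated;
    }

    // Answers "Is this yours?": returns our ServiceEndpoint if the EPR
    // identifies one of our endpoints, or null so the container can move on.
    public ServiceEndpoint resolveEndpointReference(DocumentFragment epr) {
        NodeList children = epr.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element element = (Element) node;
            String serviceName = element.getAttribute("service-name");
            String endpointName = element.getAttribute("endpoint-name");
            if ((serviceName.length() == 0) || (endpointName.length() == 0)) {
                continue;
            }
            // The spec-compliant form qualifies the service name with an XML
            // namespace prefix, e.g. service-name="martin:Mom".
            int colon = serviceName.indexOf(':');
            String prefix = (colon < 0) ? null : serviceName.substring(0, colon);
            String localPart = (colon < 0) ? serviceName : serviceName.substring(colon + 1);
            String namespace = element.lookupNamespaceURI(prefix);
            QName service = new QName(namespace, localPart);
            Map<String, ServiceEndpoint> endpoints = activated.get(service);
            if (endpoints != null) {
                return endpoints.get(endpointName);
            }
        }
        return null;
    }
}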

You might think that would be the end of it. But you would be wrong.

It turns out there are a lot of third-party JBI components out there that expect to use dynamic routing and EPRs, but whose XML fragments do not conform to the JBI spec. One such component is Intalio's implementation of the Business Process Execution Language (BPEL). (We reported this as a bug, so in all fairness this may have been fixed by now.) Intalio chose to format their service name and namespace using a very commonly used but non-standard form known as the James Clark format. Here is an example.

<jbi:end-point-reference
xmlns:jbi="http://java.sun.com/jbi/end-point-reference"
service-name="{http://farm.martin.us/jbi}Mom"
endpoint-name="InTheKitchen"
/>

Note that the namespace is part of the value of the service name attribute and not an XML namespace at all. So I modified my DOM parser to handle this format as well.

And you would still be wrong.

It turns out that ServiceMix itself exposes a managed bean (mbean) for each component on the ESB through which a query can be made and an EPR string identifying that component returned. And indeed, this EPR does not conform to the JBI spec either. Here is an example.

<jbi:end-point-reference
xmlns:jbi="http://java.sun.com/jbi/end-point-reference"
xmlns:martin="http://farm.martin.us/jbi"
jbi:service-name="martin:Mom"
jbi:endpoint-name="InTheKitchen"
/>

Note that the "jbi" namespace XML tag is prepended to the service name and endpoint name attributes. Maybe this is a valid variation (my XML books suggest it is not), but the DOM framework does not recognize it. So once again I modified my DOM parser to handle this variation as well.

Thus far things are quiet on the EPR front.

send vs. sendSync [5.5.2.1.3, p. 43]

Initially, the messaging operations send versus sendSync seemed pretty straightforward. The former is asynchronous: a component could send a message exchange and go on its merry way doing other work, and any further response regarding that message exchange would be handled in the future when the exchange arrived on the component's delivery channel. The latter is synchronous: the component is blocked on the sendSync until the far end responds in whatever way is appropriate for that message exchange. These were familiar messaging patterns to the real-time developer in me.
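For comparison, here is a hedged sketch of the two calls from the consumer side of an in-out exchange; the service name is invented and error handling is elided.

import java.io.StringReader;

import javax.jbi.messaging.DeliveryChannel;
import javax.jbi.messaging.ExchangeStatus;
import javax.jbi.messaging.InOut;
import javax.jbi.messaging.MessageExchangeFactory;
import javax.jbi.messaging.MessagingException;
import javax.jbi.messaging.NormalizedMessage;
import javax.xml.namespace.QName;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

public class SendVersusSendSync {

    private static final QName TIME_SERVICE =
        new QName("http://example.com/time", "TimeService");

    // Asynchronous: send() returns immediately; the response (or a done or
    // error) shows up later on this component's delivery channel via accept().
    static void asynchronous(DeliveryChannel channel) throws MessagingException {
        MessageExchangeFactory factory = channel.createExchangeFactory();
        InOut exchange = factory.createInOutExchange();
        exchange.setService(TIME_SERVICE);
        NormalizedMessage in = exchange.createMessage();
        in.setContent(new StreamSource(new StringReader("<what-time-is-it/>")));
        exchange.setInMessage(in);
        channel.send(exchange);
    }

    // Synchronous: sendSync() blocks until the provider answers (or the
    // exchange otherwise terminates), which also serializes the exchanges
    // issued from this thread.
    static Source synchronous(DeliveryChannel channel) throws MessagingException {
        MessageExchangeFactory factory = channel.createExchangeFactory();
        InOut exchange = factory.createInOutExchange();
        exchange.setService(TIME_SERVICE);
        NormalizedMessage in = exchange.createMessage();
        in.setContent(new StreamSource(new StringReader("<what-time-is-it/>")));
        exchange.setInMessage(in);
        if (channel.sendSync(exchange) && (exchange.getStatus() == ExchangeStatus.ACTIVE)) {
            Source out = exchange.getOutMessage().getContent();
            exchange.setStatus(ExchangeStatus.DONE);
            channel.send(exchange); // acknowledge the response
            return out;
        }
        return null;
    }
}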

I have to admit, as a long-time real-time developer, I don't believe in synchronous message passing, even though it is the backbone of a lot of web services messaging frameworks like Axis. In thirty years of building message-based systems, I haven't found synchronous messaging to result in scalable, high-volume, robust systems, although it sure does simplify the coding. Having a really cheap, scalable thread implementation makes it more palatable. Several Java-based frameworks and containers help out in this regard, too. But synchronous messaging makes me nervous in a deadlock-is-really-possible-unless-we-are-very-very-careful kind of way. Since I have a track record for producing successful, mission critical, 24x7 products, the thought that one design mistake might cause the whole shebang to go toes up gives me pause.

But with JBI, I discovered one really good reason to use synchronous messaging. When using asynchronous messaging, JBI does not require that order is preserved, and ServiceMix certainly does not guarantee it. That is, when using asynchronous messaging, the order in which one component sends message exchanges may not be the order in which the provider receives them.

The UDP datagram socket developer in me said "Oh, yeah, sure". The real-time embedded messaging developer in me clutched his chest, seized, and has not been heard from since. I had made the mistake of thinking of JBI as a real-time messaging system, like any number of others I had used over the years. But while JBI looks like a duck, it sure as heck doesn't quack like a duck. No doubt about it, this is my fault completely, but I'm not exaggerating when I tell you that when I figured out what was happening in my extensive ServiceMix JUnit regression test suite, I broke out in a cold sweat and probably turned pale.

Some digging into the JBI spec, more digging into the ServiceMix source code, and a brief e-mail discussion with Guillaume Nodet, one of the lead ServiceMix developers, verified that [1] there is no requirement in JBI that order be preserved, and [2] ServiceMix's use of a pool of threads for handling asynchronous delivery pretty much ensures that order is a function of mostly non-deterministic thread scheduling. I found this ultimately understandable, but very counter-intuitive.

WSDL Binding [5.5.6.1, p. 57]

Service providers on the ESB may choose to describe their messaging interface in the form of a service description. A service description is an XML document written using the Web Services Description Language (WSDL).

WSDLs, as such service descriptions are casually called, have an abstract part and a concrete part. The abstract part describes both the syntax and the context of any messages used by the service provider. The concrete part describes the nuts and bolts of where to find the service provider and what protocol to use to talk to it.

Service providers may actually have more than one interface that they expose. They may expose one interface to clients external to the ESB. This interface describes a traditional web service and is only reachable through a BC. Service providers may expose another interface to service consumers on the ESB. This interface is available only on the ESB and may expose a much richer set of capabilities available only to other components on the ESB.

The neat thing is, both interfaces can be described in a single WSDL. Such a WSDL would have a single abstract part, but two (or more) concrete parts. One concrete part might specify a SOAP binding so that a SOAP router on the ESB could serve as a proxy for the service provider and route SOAP requests from external web services clients across the ESB to the service provider and handle sending responses back. A second concrete part could specify a JBI binding which would tell internal JBI components, including the SOAP router, how to use the internal interface.

When the JBI container parses the service description to extract the interfaces and JBI services and endpoints exposed by the service provider, it should (as ServiceMix does) only extract those interfaces, services and endpoints which are associated with the JBI binding. Any interfaces, services and endpoints associated with other bindings are purely a private matter between consenting components such as the service provider (an SE) and the SOAP router (a BC).

I didn't personally need to make use of this feature, but I did write code using the wsdl4j tool kit to extract the JBI-bound interfaces from the service descriptions that other developers provided for their components. So it is likely that others may have more to say on this topic.
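For what it's worth, here is a hedged sketch of the wsdl4j approach: read the service description and keep only the ports whose binding carries the JBI binding extension element. The namespace URI I test against is the one I recall from the JBI spec's WSDL binding section; treat it as an assumption and verify it against your copy of the spec.

import java.util.Iterator;
import java.util.List;
import java.util.Map;

import javax.wsdl.Binding;
import javax.wsdl.Definition;
import javax.wsdl.Port;
import javax.wsdl.Service;
import javax.wsdl.WSDLException;
import javax.wsdl.extensions.ExtensibilityElement;
import javax.wsdl.factory.WSDLFactory;
import javax.wsdl.xml.WSDLReader;
import javax.xml.namespace.QName;

public class JbiBindingExtractor {

    // Namespace of the JBI binding extension element, as I recall it from
    // the spec's WSDL binding section; verify this against the spec.
    private static final String JBI_BINDING_NS =
        "http://java.sun.com/xml/ns/jbi/binding/service+engine";

    public static void listJbiBoundPorts(String wsdlUri) throws WSDLException {
        WSDLReader reader = WSDLFactory.newInstance().newWSDLReader();
        Definition definition = reader.readWSDL(wsdlUri);
        Map services = definition.getServices(); // wsdl4j predates generics
        for (Iterator s = services.values().iterator(); s.hasNext();) {
            Service service = (Service) s.next();
            Map ports = service.getPorts();
            for (Iterator p = ports.values().iterator(); p.hasNext();) {
                Port port = (Port) p.next();
                Binding binding = port.getBinding();
                List extensions = binding.getExtensibilityElements();
                for (Iterator e = extensions.iterator(); e.hasNext();) {
                    ExtensibilityElement element = (ExtensibilityElement) e.next();
                    QName type = element.getElementType();
                    if (JBI_BINDING_NS.equals(type.getNamespaceURI())) {
                        // This port's concrete part is the internal JBI binding:
                        // its service QName and port name are the JBI service
                        // and endpoint exposed on the ESB.
                        System.out.println(service.getQName() + " / " + port.getName());
                    }
                }
            }
        }
    }
}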

Service Endpoints [ServiceEndpoint, p. 192]

A service endpoint is an object which contains, among other things, the qualified name (service name plus namespace) of a service, and the endpoint name of an endpoint within that service, for an endpoint which has been activated on the ESB. It is not unusual for a component to have to keep track of the namespace, service name, and endpoint name of an endpoint which has not yet been activated, for example, based on configuration information gleaned from a properties file. You may be tempted to create your own class that conforms to the JBI service endpoint interface just for the purpose of managing such information. But do not be deceived into thinking that just because your class conforms to the service endpoint interface you can pass instances of your class to the JBI container as service endpoints.

The spec is quite clear on this: the JBI container is the only source of genuine service endpoints, and they are only doled out when an endpoint is newly activated by the container on behalf of the component. Just as only God can make a tree, only the JBI container may make a service endpoint. The underlying reason is of course that the JBI container implementation (ServiceMix) maintains private data inside its own implementation of the service endpoint interface to which you are not privy.

I did in fact make a class just to keep track of endpoints that were defined but not yet activated, but I deliberately did not implement the JBI service endpoint interface, even though doing so would have been very convenient, just so I couldn't accidentally try to pass an instance of this class into one of the JBI APIs.
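For illustration, that holder class amounted to little more than this (the name is invented), pointedly not implementing javax.jbi.servicedesc.ServiceEndpoint:

import javax.xml.namespace.QName;

// Tracks the coordinates of an endpoint we intend to activate later.
// Deliberately does NOT implement javax.jbi.servicedesc.ServiceEndpoint,
// so an instance can never be passed into the JBI APIs by mistake.
public class PlannedEndpoint {

    private final QName serviceName;   // service name qualified by namespace
    private final String endpointName; // endpoint name unique within the service

    public PlannedEndpoint(QName serviceName, String endpointName) {
        this.serviceName = serviceName;
        this.endpointName = endpointName;
    }

    public QName getServiceName() { return serviceName; }

    public String getEndpointName() { return endpointName; }
}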

That's the current brain dump on JBI. More later as it occurs to me.

Sources

Ron Ten-Hove, Peter Walker, Java Business Integration (JBI) 1.0, JSR-208, Sun Microsystems Inc., August 2005

Peter Walker, private communication, 2006

Guillaume Nodet, private communication, 2006

Thursday, January 18, 2007

Traffic Management and System Scalability

If I have any readers left, they are surely tired of hearing me talk lately about the importance of traffic management and rate control. I thought a good way to wrap up the topic (for a while anyway) would be to provide a metaphor and an actual example, in the spirit of going from the abstract to the concrete.

The "Big Dig" was hardly Boston's first controversial road construction project. That historic city's transportation and urban planners have for decades had to deal with the fact that many of the downtown streets are barely wide enough for two lanes. This makes building exit ramps from interstates and other high volume highways into the oldest portions of Boston really problematic. How to you handle the traffic volume of Interstate 95 funnelling down to a cobblestone street that is two hundred years old? How do you widen that street when the real estate on either side of it is going for seven figures? It's not just a traffic management problem, it's an economic issue.

Yet we take exactly the same approach to designing large distributed systems that have longish lifespans. We try hard to make our systems modular so that they can be easily upgraded. We try to make them scalable so that they can be easily expanded. We try to make it possible to incorporate new technology as it becomes available. This is particularly true with systems which have high capital costs and for which customers expect long term amortization. Traditional telephone systems come to mind, but this applies equally well to things like power plants, air traffic control systems, and just about anything built for the U.S. Department of Defense, like aircraft carriers.

Traditional telephone systems are among the most complex distributed systems in the world. A typical design (where "typical" here means "one that I've worked on") has call control centralized in a single processor, a network of communication links going to remote cabinets, and each cabinet containing tens of boards with embedded processors that control everything from digital trunks to the telephone on your desk. The individual embedded processors not only talk to the central processor, but to other embedded processors in the same cabinet, to embedded processors in different cabinets, and to embedded processors in other systems owned by other people. There is a whole lotta messaging going on. The mere act of picking up your telephone handset results in dozens of messages being interchanged between at least a handful of processors across several communications links. As my friend and former colleague Ron Phillips says, "Every time I pick up a phone and get dial tone, I know a little miracle has occurred."

So sometime in the 1980s you start out with central processors capable of running at a speed of a few million instructions per second. You have communication links capable of bandwidths of tens of kilobits per second. You have embedded processors at the ends of those links that run a few tens of thousands of instructions per second. And there is a strong motivation to use cheap processors in the embedded portions of the systems because that's what the customers have to buy in quantity. There are at least tens of such processors in each cabinet, and at least tens of such cabinets in a system. One really big traditional telephone system may support tens of thousands of telephones, have cabinets spread across state lines, and effectively have hundreds, if not thousands, of processing elements.

Things work pretty well, because you architect the system initially with a good balance of processor speed and bandwidth. Because the central processor is so slow, there is a built-in natural limit to how many messages it can pump out. Because the communications links are so slow, there is a natural limit to how many messages from various sources can be funnelled down to a single embedded board.

Time passes. You upgrade the central processor with one that is a hundred times faster than its predecessor. You upgrade some of the communication links (but not all of them) to technologies like Ethernet that have a thousand times more bandwidth. You install some (but not all) new embedded boards that have faster processors on them.

A few more years go by. You now have a central processor that can execute a billion instructions per second, thousands of times faster than what you had twenty years ago, in the same system. You replace some (but not all) of the 10 Mb/s Ethernet with 100 Mb/s Ethernet. Maybe you didn't even want or need to replace the network technology, but that's what everyone is using now, including your customers. You upgrade a few of those embedded processors (but not all of them) with faster boards, but it is simply not cost effective to re-engineer all of those legacy boards with newer technology, and even if you did, your customers would rebel at the thought of a forklift upgrade.

It's today. Your central processor runs several billion instructions per second on each of several cores. No one even makes your original processor anymore; the closest thing to it is in your wristwatch. Your customers want to use gigabit Ethernet, because that's all they run anymore. You start using VoIP, which requires a volume of control messaging like no one has ever seen in the history of telephony. You've expanded your maximum available configuration to many many more cabinets, and customers have systems that span national boundaries.

And you've still got a lot of those original embedded boards, still struggling to deliver dial tone, while getting fire hosed with messages at a rate they never imagined they would ever have to deal with. Even if they don't receive more messages, they receive the same messages but a lot faster, in a burst. The natural throttles on the rate of messaging, the limit on the speed of the central processor, the limit on the speed of their peers, the limit on the speed of the communication channels, are all gone.

And even if those processors can handle the higher rates, they start to behave in funny ways. Before, they could do all the necessary message processing in their spare time between doing actual work, like making phone calls. Now, when a burst of messages comes through, they have to drop what they are doing and deal with the burst, or else risk losing a crucial control message when the next burst comes in.

In many ways, your system is a victim of its own success: of its longevity in the marketplace, its architecture that allowed customers to expand it incrementally, and the fact that, gosh darn it, people just like it.

Back in May, I talked about how different technologies were on very different growth curves. Microprocessor speeds double about every two years. Memory density doubles about every year and a half. Network bandwidth increases by a factor of ten every decade.

When you upgrade your system, you use what's commercially available, but what's commercially available isn't all growing at the same rate. And some of your system, like those embedded processors, isn't growing at all. What started out as a well behaved system with a balanced architecture becomes unbalanced and starts to, well, get weird. The pattern of behavior of the real-time system as a whole is an emergent behavior, not one that you deliberately architected. And worse, your management starts to wonder what the heck is going on, because, well, how could those old embedded boards be a problem when you haven't changed anything about them?

No, Bucky, you've changed everything.

This is not a niche issue. Back in May, I talked about the total cost of code ownership. In that article I cited studies that have shown that 67%, a full two thirds, of the total life-cycle cost of software development is code maintenance, and that half of that 67% is either perfective (improving correctly working code, not fixing bugs) or adaptive (changing to meet new requirements). A huge portion of the cost of software development is the modification and testing necessary to adapt to a changing environment. This has been referred to as software aging: software gets old and cranky and quits working, even though it hasn't been changed. But the context in which it runs has changed.

Traffic management and rate control are tools in the quest for a balanced, scalable, deterministic system. By controlling the rate at which you throw events around, you maintain control over the engineering of your system. Not your customer by deciding what kind of Ethernet to use. Not your server vendor by deciding what kind of Pentium processor to ship you. Not your service provider by moving your communications link to a faster technology. Traffic management and rate control give you the capability to decide how your system is going to behave, by letting you determine when you turn up the dial, and by how much. They allow you to take control over the emergent behavior of your real-time system.
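To make that a little more concrete, here is a minimal sketch of the sort of throttle I have in mind: a simple token bucket that admits events at a sustained rate with a bounded burst. It is illustrative only; it is not the Generic Cell Rate Algorithm from the earlier articles, and the rate and burst size are whatever you engineer them to be.

// A minimal token-bucket throttle: admit an event only if a token is
// available, where tokens accumulate at a configured sustained rate up
// to a configured burst size. Illustrative only.
public class TokenBucketThrottle {

    private final double tokensPerMillisecond; // sustained event rate
    private final double burstSize;            // maximum accumulated tokens
    private double tokens;                     // tokens available right now
    private long lastMillis;                   // time of the last update

    public TokenBucketThrottle(double eventsPerSecond, int burstSize) {
        this.tokensPerMillisecond = eventsPerSecond / 1000.0;
        this.burstSize = burstSize;
        this.tokens = burstSize;
        this.lastMillis = System.currentTimeMillis();
    }

    // Returns true and consumes one token if the event is admissible now;
    // returns false if the caller should delay, queue, or shed the event.
    public synchronized boolean admit() {
        long now = System.currentTimeMillis();
        tokens = Math.min(burstSize, tokens + ((now - lastMillis) * tokensPerMillisecond));
        lastMillis = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}

What the caller does with a rejected event, delay it, queue it, or shed it, is exactly the engineering decision this approach puts back in your hands.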

I've used large scale telephone systems as an example, but that's just an obvious one. I've also worked on distributed systems that controlled real-time devices to shoot a half million frames of photographic film a month. On mass storage systems that delivered gigabytes of data on a daily basis to a room full of supercomputers. On embedded systems where a single board has thirty different processing elements and dozens of communicating real-time tasks. On service oriented architectures that placed a dozen interacting components on a single enterprise service bus. All of these systems face these kinds of issues.

I'm betting yours do too.

Sources

D. Parnas, "Software Aging", Proceedings of the 16th International Conference on Software Engineering, IEEE, May 1994

Chip Overclock, "Traffic Contracts", January 2007

Chip Overclock, "The Generic Cell Rate Algorithm", January 2007

Chip Overclock, "Rate Control Using Throttles", January 2007

Chip Overclock, "Traffic Management", December 2006

Chip Overclock, "It's Not Just About Moore's Law", May 2006

Chip Overclock, "The Total Cost of Code Ownership", May 2006