Tuesday, January 30, 2007
Is C no longer a subset of C++?
"A Tour of C99" gives a brief synopsis of the new C99 features. "Incompatibilities between C99 and C++09" points to specific areas which are likely to cause grief. He also mentions that there is (understandably) some effort at reconciling these differences. Stroustrup has called for a merging of the two languages. Unless this is resolved, those who use C++ as a "better C" are going to have some heartache. Not to mention we may have two different sets of equivalent header files to keep track of. Hopefully the implications of this are enough to bring some sanity to the situation.
One of the C99 features that caught my eye particularly was the "restrict" keyword that is used in a function prototype parameter list to offer a guarantee to the compiler that two pointers in the list cannot point to the same object. If you've done any reading on how processor memory models have evolved to break commonly used idioms like the double-checked locking pattern for implementing singletons, you can see where this is going. It sounds like an optimization enabler, something along the lines semantically of a hypothetical "guaranteed non-volatile" keyword.
Kaley was on the C++ standards committee for several years, and has a lot of interesting and sometimes scary stuff to say on this topic and other C++ issues.
Monday, January 29, 2007
Generic Embedded Programming Using Templates
Developing embedded systems in C++ is not new. Embedded pundits like Jack Ganssle and Dan Saks have been writing about it for years in venues like Embedded Systems Design. But some of the C++ features that sound like old news to developers working in the mainstream still sound exotic to many embedded developers, who by nature are a conservative lot. And rightly so. Embedded systems frequently find themselves in mission critical applications that are intended to run 24x7. And embedded requirements often impose severe real-time and memory constraints on their implementations. No matter what language you use and what features of it you choose to exploit, writing embedded code requires knowledge of how the language works under the hood and a discipline regarding what features to use and what ones to stay away from.
One of the features of C++ that still seems exotic to many embedded developers is its ability for generic programming. Generic programming is where the developer writes an algorithm in a data-type independent way, and then tells the compiler to generate an implementation for a specific data type. Generic programming allows you to debug an algorithm once then apply it to a much broader variety of circumstances. It reduces the amount of code that must be maintained, and since code maintenance is as much as two-thirds of the total software lifecycle cost, this can directly affect the bottom line of product development. Generic programming abstracts out implementation details which increases the reusability of code.
Most people associate generic programming in C++ with templates, and in this article so shall we. Templates directly support generic programming in exactly the way described above, by allowing you to specify an algorithm then apply the algorithm to different data types, leaving it up to the compiler to generate the appropriate code. As we shall see in later articles, there are other features of C++ (and even of C) that also support generic programming. But since templates are the first thing that folks think about when they think about generic programming, we’ll look at examples of using templates from the Desperado open source C++ library.
(Disclaimer: my goal here is not to teach you C++ or how to use templates. There are lots of books that can do that. I’m going to try to show you some examples of how templates, and generic programming in general, can be used to solve common problems in embedded applications. Rather than go into a lot of detail, I’m just trying to give you a flavor of how generic programming might be applied in an embedded context. Downloading and perusing the Desperado source code is a good idea if you really want to get into the nitty-gritty of the implementation using templates)
Example: Memory Testing
Michael Barr wrote an article in Embedded Systems Programming (now Embedded Systems Design) on how to do efficient testing of memory in software. Barr implemented a suite of three tests: a data bus test, an address bus test, and a memory device test. The code for these tests is dependent on the width of the address and data busses. Depending on the design of the embedded memory system, these widths (which may not be the same) may be 16, 32, or 64 bits.
Desperado implements Barr’s memory test suite as a template, a model algorithm coded without exactly defining the data type, which it stores in the header file <Ram.h>. The definition of the Ram template in <Ram.h> starts out looking something like
template <typename TYPE>
class Ram { ... }
where TYPE is a data type passed as a compile-time parameter to the template when the C++ compiler generates the resulting type-specific code.
An application may ask the C++ compiler to generate the code for a 32-bit wide memory test with the following code snippet, using the uint32_t hardware-dependent 32-bit integer data type defined in the standard POSIX header file <stdint.h>, and the size_t operating-system-dependent data type defined in the standard C header file <stddef.h>. (These system header files are included automatically by the header file <Ram.h>.)
#include <Ram.h>
...
Ram<uint32_t> test32;
The application then may run all three tests of the suite against the memory at location 0x80000000 that is 4KB in length;
uint32_t* address = (uint32_t*)0x80000000;
size_t size = 4096;
...
test32.address(address, size);
test32.data(address);
test32.device(address, size);
This assumes that the address and data buses are the same width. If they are not, the application can ask C++ to generate a second instance of the code with a different width and use both tests appropriately.
Ram<uint64_t> test64;
...
uint64_t* address = (uint64_t*)0x80000000;
...
test64.data(address);
Desperado makes this process even simpler by pre-instantiating code for 8, 16, 32 and 64-bit bus widths, which can be used by an application by just linking them in.
#include <uint32_Ram.h>
...
uint32_t* address = (uint32_t *)0x80000000;
size_t size = 4096;
...
uint32_Ram.address(address, size);
uint32_Ram.data(address);
uint32_Ram.device(address, size);
Example: Cyclic Redundancy Checks
Ross Williams wrote a great paper, A Painless Guide to CRC Error Detection Algorithms, which describes a cyclic redundancy check (CRC) algorithm which can be used to implement any of a wide variety of standard CRCs. CRCs are used to detect data corruption in just about every telecommunications and networking protocol, as well as in many data storage systems. The CRC algorithm he provides is specific to the data type of the resulting CRC value that it generates. Commonly used CRC algorithms produce either 16-bit or 32-bit values. Williams’ algorithm also takes four parameters, the CRC polynomial (which describes how the CRC is computed, encoded as a bit pattern the same width as the resulting CRC value), a boolean indicating whether or not the CRC is reflected (which changes how the value is computed), an initial value of the CRC before any data is passed through the algorithm, and a value with which the CRC partial result is exclusive-or’ed to get the final CRC value.
Desperado implements Williams’ CRC algorithm as a template, with the resulting CRC data type as its parameter, which it defines in the header file <Crc.h>. The resulting class generated by the C++ compiler has a constructor that takes the four parameters of Williams’ CRC algorithm. The definition of the Crc template in <Crc.h> starts out looking something like
template <typename TYPE>
class Crc { ... }
where TYPE is a data type passed as a compile-time parameter to the template when the C++ compiler generates the resulting type-specific code. The generic constructor has a prototype that looks something like
explicit Crc(TYPE poly, bool refl, TYPE init, TYPE xoro);
which declares the four parameters needed by Williams’ CRC algorithm. Note that the parameters poly (the polynomial), init (the initial value), and xoro (the xor bit mask) are all of type TYPE, which is not specified until the template is instantiated by the application. The code for the Crc algorithm is written without knowing exactly what that type is.
The application can ask the C++ compiler to generate the industry standard CRC32 algorithm like so
#include <Crc.h>
...
Crc<uint32_t> crc32(0x04c11db7, true, 0xffffffff, 0xffffffff);
which also passes the four parameters of the CRC32 algorithm to the constructor when it instantiates an object of type Crc<uint32_t>. (Note that the value 0x04c11db7 is a common bit-encoding used to represent the polynomial used to compute the standard CRC32 value.) The application then can use this CRC32 algorithm to compute a cyclic redundancy check across a buffer of data, first initializing the crc32 object. (The Desperado Crc template defines operators for the resulting class which allow an object of that type to be used as a functor, that is, an object which can be called like a function. Hence crc32 is not a function, but rather an object, and the following code is calling methods defined for that object. Williams’ CRC algorithm also requires a state variable, which Desperado chooses not to keep inside the Crc object so that the same Crc object may be used in reentrant code.)
uint32_t state;
...
crc32(state);
uint32_t crc = crc32(state, buffer, size);
To make the power of templates for embedded systems a little more obvious, here are the complete definitions for the industry standard CRC16, CRC32 and ATM AAL5 cyclic redundancy check implementations using the Desperado Crc template.
#include <Crc.h>
...
Crc<uint16_t> crc16(0x8005, true, 0, 0);
Crc<uint32_t> crc32(0x04c11db7, true, 0xffffffff, 0xffffffff);
Crc<int32_t> aal5crc(0x04c11db7, false, 0xffffffff, 0xffffffff);
As before, Desperado pre-instantiates the commonly used CRC16 and CRC32 algorithms so that they can be used by an application by just linking them in.
#include <crc32_Crc.h>
...
uint32_t state;
crc32_Crc(state);
uint32_t crc = crc32_Crc(state, buffer, size);
Example: Variable-sized Objects on the Stack
Most people think of the compile-time template parameters as always being data types, but they can also be primitive types, like integers. This can be used to get around a limitation of many embedded systems: the desire not to allocate and free memory from the heap except at boot time.
Memory allocation can be really problematic for systems with real-time constraints. The time required for heap management is non-deterministic, dependent on the current state of the heap and what has been allocated and freed in the past. This makes it difficult or impossible to predict how long it will take to allocate and free memory in a time-constrained section of code.
Embedded systems very commonly address this issue by pre-allocating all objects at boot time and placing them in pools, which are often implemented as linked lists. When you need a buffer, you do not malloc or new memory from the heap, you merely unlink a buffer from the buffer pool. When you are done with the buffer, you do not free or delete it, you link it back onto the same buffer pool. If you need a buffer and the pool is empty, you choices are [1] break down and allocate a new buffer from the heap, making your pool a little larger when you return it, [2] try again later, or [3] signal an alarm, which might be as simple as lighting a little red LED on the front panel. Choice #3 is more common than you might think. Running out of a resource in an embedded application often indicates a serious failure in traffic engineering of the system in which it is embedded.
Unlike Java (prior to version 6 anyway), C++ also gives you the option of allocating an object temporarily on the stack. Stack variables exist as long as they are in scope, which means within the function or execution block in which they are defined. Often this is perfect for allocating an object whose lifespan is only as long as the current function. (Under no circumstances however should you allocate an object on the stack, pass a pointer to it to another functional unit, and then exit the function. The other functional unit holding the pointer now has the address of a section of the stack which has been de-allocated and which will eventually be reused for some other purpose. Wackiness will ensue.) This is done as simply as
#include <Transliterator.h>
...
{
uint8_t stack[16];
Transliterator transliterator(stack, 16);
...
}
where the object transliterator of type Transliterator will be deleted as soon as the thread of control passes the closing curley brace. In the example above, Transliterator is a simple general-purpose parser implemented in Desperado as a table-driven push-down automata (the original meaning of PDA) and defined in the header file <Transliterator.h>. You can imagine an embedded application defining a simple language for a command line interpreter available via a console port for purposes of field troubleshooting. The application defines the language in the form of a table, which the Transliterator interprets to parse the language.
PDAs require a stack in which to keep track of their current context when the language that they are parsing contains nested elements, for example nested parenthesis for a parser that interprets arithmetic expressions. This is what the array is used for in the example above. How big a context stack does a Desperado Transliterator need? It depends entirely on the language it is parsing, which is defined not by the Transliterator itself, but by the application that is using it.
If allocating objects on the heap were not a problem during run-time, the size of the necessary context stack could just be passed as a parameter to the Transliterator constructor and it would just allocate an array of that size to use during parsing. But since we want to avoid allocating and freeing memory after boot time, we instead allocate a context stack on the C++ stack, as shown above.
Desperado streamlines this process by defining a second class, TransliteratorType, defined in the header file <TransliteratorType.h>, that extends Transliterator by taking the desired size of the context stack as a template parameter. The definition of TransliteratorType in <TransliteratorType.h> looks something like this
template <size_t CONTEXTS>
class TransliteratorType : public Transliterator { ... }
where CONTEXTS is the size of the context stack. As part of the definition of the TransliteratorType class there is a field
uint8_t array[CONTEXTS];
that defines and allocates the context stack as part of the object.
When the application creates a TransliteratorType object by doing the following
#include <TransliteratorType.h>
...
{
TransliteratorType<16> transliterator;
...
}
then the TransliteratorType object, the Transliterator object that is its base class object, and the context stack in a size 16, are all allocated on the C++ stack as part of the TransliteratorType object itself, to be automatically freed when the thread of control exists the code block.
I admit that this example is not a great one, since the context stack can be just as easily allocated on the C++ stack explicitly without resorting to the use of a template. But this technique can also be used in a similar fashion to define variable length objects to be allocated on the stack and later deallocated for use within a code block, without resorting to new and delete. And it does serve as an example of how to bundle such a variable length allocation into the class definition.
What’s the fine print?
There can be some significant compile-time overhead for using templates, particularly when using very complex template definitions like those found in the Standard Template Library (STL), which is now part of the C++ standard library. However, if you are writing your own templates like the ones I described above that are part of Desperado, you are unlikely to notice much difference in compile time. Even when using the STL (which I recommend), compile time performance differences are typically negligible. (I have had the 3.3.2 version of the GNU C++ compiler go into what appeared to be an infinite loop when using the template-related –frepo compiler option, so your mileage may vary.)
There is no more run-time execution overhead to using templates than if your application used the equivalent non-template C++ code using virtual methods, where all method calls are mapped through a virtual table or v-table. I used to worry about this, but I got over it, since it only adds a very few machine instruction to every method call. Still, if your real-time constraints are that tight, you should worry not just about templates, but about any C++ virtual function. If you are not using virtual functions, your C++ code will be at least as efficient as the equivalent C code, if not better, thanks to the many improved optimization capabilities of C++.
Templates can cause code bloat, and this must be managed carefully in an embedded application. Instantiating a template at compile-time causes the C++ compiler to generate the code to implement the class given the template parameters. This can cause multiple instantiations of the same code to be generated by the compiler. In the best case this may be detected at link time, but in the worst case the resulting executable may be a lot larger than it needs to be.
Section 6.5, “Where’s the Template?” of the GCC manual describes your options for controlling when and where the G++ compiler generates codes for templates. Desperado’s makefile uses the G++ command line parameter
–fno-implicit-templates
which forces the application to explicitly define when and where code for templates is generated. For any template instantiations that Desperado uses itself or provides by default, those instantiations are generated once and only once and are stored as object files in the Desperado library, where they may be incorporated into an application at link time. For example, this is done in Desperado as simply as having a translation unit having just these two lines
#include <Crc.h>
template class Crc<uint32_t>;
which causes the compiler to generate and compile the code to implement a Crc object for an unsigned 32-bit value. Any other translation unit, in Desperado or an application using it, that uses the Crc template in a manner like
Crc<uint32_t> crc32(0x04c11db7, true, 0xffffffff, 0xffffffff);
would just link in this pre-defined and pre-compiled code. I recommend this explicit approach for using templates, although the Section 6.5 offers other alternatives.
Templates are just one way for writing generic code. In future articles I will describe other techniques, some of which you may be already using without realizing it.
Sources
Michael Barr, “Software-Based Memory Testing”, Embedded Systems Programming, July 2000, pp. 28-40
Richard M. Stallman et al., Using the GNU Compiler Collection (GCC), 4.1.1, 2005
Ross Williams, A Painless Guide to CRC Error Detection Algorithms, Rocksoft Pty Ltd., Adelaide, Australia, 1993
Desperado, Digital Aggregates Corp., 2007
Saturday, January 27, 2007
Pair Programming is Bad and You Should Never Do It
"I can’t tell you how many times I’ve been in a meeting with even one or two other programmers, trying to figure out how something should work, and we’re just not getting anywhere. So I go off in my office and take out a piece of paper and figure it out. The very act of interacting with a second person was keeping me from concentrating enough to design the dang feature."
Finally someone has hit the nail on the head about what has been bothering me about pair programming. Here is what I think about it:
If a problem is so trivial that two people sitting in front of a computer chattering away at each other can solve it, then it should be the part of the project that is outsourced, preferably offshore where the cost of doing trivial work is lower.
For sure I have done pair debugging, sometimes even trio debugging when it's an especially thorny problem involving multiple functional units, race conditions, and hardware. And, yes, I have done pair programming when trying to do maintenance on some code in a language I don't know (typically some form of assembly), on a processor I've never used before (typically a DSP), where the original developers left the company years ago, and a vice president is calling every half hour to helpfully ask how that escalated field problem is coming along. And sometimes getting a bunch of folks in the same room in front of a whiteboard is a very good thing indeed, particularly in those cases where the Engineering staff is pretty much convinced that what Marketing has promised is pretty much impossible (although to be fair, it never was).
So I understand there are circumstances where working closely in pairs or even in larger groups is not only a good idea, it is downright mandatory (and sometimes it can even be fun). But the way pair programming is usually described sounds more distracting than anything else. When having to interact with another person while trying to figure out something difficult, my interior monologue runs to "STFU!"
Having said that, I am very much in favor of 100% coverage by code inspections. The thought that I might be the only engineer to see my code terrifies me, and I consider myself a pretty good developer, with a long track record of successful product development. But I'm far from perfect. I would not have thought this to be controversial, but apparently it is, if my own recent personal experience is any indication. But I have never been in a code inspection in which I didn't learn something useful. Sometimes it's "Hey, that's a great idea, I'm going to steal that!" Or "Oh boy, I'm going to make sure I never make this mistake." Or maybe "Man, what was I thinking?"
What I am not sure of is when to do the code inspection. In his seminar "How to develop better firmware faster", Jack Ganssle cites studies that indicate the sooner the better. Like, right after you have written the code and before you've even compiled it. This is an economic argument: inspections are a more effective use of engineers' time than debugging, presumably because an inspection that eliminates a bug on average takes less total engineer-time than debugging to remove the bug.
This was likely true back when we had long modify-compile-test cycles, particularly in the dark ages when such cycles involved keypunches and computer operators. But with modern tool chains and disciplines like test driven development (TDD), I'm not convinced. When I'm inspecting code, I like to be able to make basic assumptions, like [1] it compiles, and [2] at least the constructors and getters work. Otherwise I'm too paranoid and spend way too much bandwidth looking at the nesting of parentheses instead of concentrating on the overall design intent.
Since I mentioned TDD, I'd better plug that too. Like a lot of old guys, I stumbled into TDD myself before I ever heard the term "unit test", just because I have the virtue of all good developers: I'm lazy. I stumbled into structured programming, and later into object oriented design, the same way. Not the easy way, of course, by learning about it from a class or a book. But by doing software development the hard way and finding I was spending so much time at work that I wasn't keeping up with Battlestar Galactica. (That would be the original BSG.) The good part about this was when I was finally introduced to these concepts, there was zero resistance on my part. It was more like I took serious offense that no one had explained this to me earlier, saving me a lot of time and effort.
So if there's anything you know that will help me keep up with the new BSG, why don't you just let me know now, instead of forcing me to painfully learn it myself? I appreciate it.
Thursday, January 25, 2007
If Java is the new COBOL, is C++ the new assembly?
Clearly some folks mean “Java is obsolete, just like COBOL”, and maybe in some problem domains it is. Domain-specific languages like Ruby on Rails are making in-roads in a lot of roles that were once nearly exclusively Java-based, like web commerce.
But many seem to mean “Java is very broadly deployed and very stable, and is widely used and trusted in enterprise applications, just like COBOL”. Of course, the subtext here is that Java is no longer cool because it’s mainstream.
Being compared to COBOL doesn’t really seem that bad of a thing. I am a little embarrassed to admit that very early in my career I was paid to write a little COBOL. I even got to meet Grace Murray Hopper. But I hadn’t really given COBOL much serious thought for years until in the span of a few days I read some pretty startling (and somewhat contradictory) statistics about COBOL.
In his column “Bar Bets” in a recent edition of Dr. Dobb’s Journal, Jonathan Erickson quotes that there are “250 billion lines of COBOL source code in use”, and “1.5 billion new lines of COBOL are written every year”.
In “How to Interpret COBOL Statistics” in Bob Lewis’ Advice Line column, a reader quotes “various Gartner reports”: “Five billion lines of new COBOL are being written every year” and “Thirty-four percent of all coding activities are in COBOL”.
No matter which set of statistics are correct, this is not bad for a forty-year-old programming language. If Java reached the level of acceptance and use within the enterprise that COBOL has obviously achieved, I would count that as an incredible success. Considering that most information technologies have a half-life of about five years, technologies like FORTRAN, COBOL, C, and TCP/IP which have done substantially better than that have to be admired.
Bob Lewis’ reader also cites “There are 310 billion lines of legacy code operating in the world (65% of all software)”. This jives with my visualization of the total amount of source code in the world as an expanding sphere. The labor spent maintaining legacy code is represented by the volume of the interior of the sphere, whereas the labor spent writing new code is represented by the surface of the sphere. They are both expanding, but while the surface of the sphere expands proportional to the square of the diameter, the volume expands proportional to the cube. Once it ships or goes production, all code becomes legacy code. If you drilled a core down through the sphere, you would find some Ruby scattered around on the surface, a thick crust of Java, a churning molten layer of C++ and C, and a solid core of FORTRAN, COBOL and assembly.
So there will always be a lot of legacy code, and in greater amounts than there is freshly minted code. As long as it is in use, all of that code has to be maintained. If you asked the COBOL developers of the 1960s what the life span of their creations were likely to be, could they possibly have predicted that their work would still have been running in the twenty-first century? Yet Y2K showed us that this is indeed the case. I suspect Java developers will have a lot of work ahead of them maintaining the ever growing legacy Java base.
One thing I find particularly interesting about the success Java has had in the enterprise is the relative lack of success it has had in the embedded arena, at least in the United States. Someone remarked recently that in Europe, Java is considered an embedded programming language, while in the United States it is seen as an enterprise programming language. Witness the fact the new Apple iPhone does not include a Java Virtual Machine (JVM), and apparently, if Steve Jobs has any say (and one must assume he does), it never will. Yet dozens of other mobile phones and wireless devices have a JVM nestled inside to run Java applications. This is a little ironic since Java was first conceived at Sun Microsystems as an embedded language for applications like set top boxes, where it is in fact used. I suspect most of Java's success came from its application in the enterprise domain, the embedded domain as usual getting no respect.
I have a long background in developing real-time applications. This has understandably manifested in my working on a lot of embedded products, which tend to be heavy on concurrency, synchronization, and message passing, stuff I have seared in my brain if not tatooed on my body. I have watched, and participated, as embedded development moved from assembly language to C and then to C++. Way back in 1999, I worked on a project that developed an embedded product in Java, with the low level device code in C. I thought that Java was a great embedded language back then.
I still think so today, tens of thousands of lines of both C++ and Java later. It has all the stuff you want in an embedded language, mostly stuff that C++ lacks, like built-in concurrency and synchronization, not to mention a simpler syntax than C++. Yes, it lacks some of the features you need to easily do C-ish things like access memory mapped registers as variables. The JVM and its bytecode can be troublesome (although I believe they solve far more problems than they cause). The way memory deallocation is hidden from view means that garbage collection may have to be managed. And the fact that you can’t swing a cat in Java without allocating and deallocating a bunch of objects, if not in your code then in the dozens of classes that constitute the Java utility classes, or in the hundreds of frameworks that have served to complicate Java development, causes some headaches.
But this shouldn’t disqualify Java as an embedded language, any more than any of the quirks in C++ disqualified it. It took discipline and a thorough understanding of how C++ works to use it effectively in embedded applications, but doing so yielded economic benefits like greater productivity, simplified maintenance, and ultimately reduced time-to-market. I was lucky enough to have mentors like Tamarra Noirot, Randy Billinger, and Doug Gibbons who carefully tutored me in the design patterns I needed in my transition from C to C++ for embedded development. Yet even as effective as C++ is for embedded applications, there are still always portions of the application that have to be written in C, or even assembly, to accommodate the hardware and the tougher real-time portions of the system, just as there were in that embedded Java project back in 1999.
It will take a similar discipline and understanding of Java to use it for embedded development, but I believe that that must happen. I recently spent more than a year working on yet another Java-based project, this one a non-embedded product to expose feature-rich multi-modal communication capabilities to business process automation via web services. I was reminded how much more productive I am in Java than in C++ or C. It's not just that I'm more productive writing code in Java (although I am), but it is the amazing tool chain that supports Java. Tools like Eclipse (an IDE which works for C++ too but is much better with Java), Ant (build), JUnit (unit testing), Cobertura (code coverage), and JProfiler (memory usage). Not to mention the rich set of utility classes and frameworks available for Java. Programmer productivity is one of the keys to time-to-market, and time-to-market is one of the keys to success in today’s competitive environment. True, maybe you can’t write 100% of your embedded application in Java. But writing 100% of it in C++ will have similar economics as twenty-five years ago when I remember embedded developers arguing against writing in C instead of assembly, or ten years ago arguing against C++ instead of C.
So C++ is the new assembly. And that’s not a bad thing. It will always have its niche in embedded development. But for purely economic reasons, in the coming years you must minimize the amount of embedded code you write in C++, just as you must minimize the amount you write in C, or in assembly. You will do it because you cannot be competitive otherwise.
When I was discussing this with my friend and occasional colleague Demian Neidetcher, he asked “If Java is the new COBOL, and C++ is the new assembly, what does that make Visual Basic .NET?”
You already know the answer, Demian. It’s the new Basic.
Sources
Jonathan Erickson, “Bar Bets”, Dr. Dobb’s Journal, #392, January 2007
Bob Lewis, “How to Interpret COBOL Statistics”, Advice Line, January 17, 2007
Bill Venners, "The Art of Computer Programming: Conversation with James Gosling", Artima Developer, March 25, 2002
Wednesday, January 24, 2007
Job Lock and Universal Health Care
I am a 50-year-old self-employed software engineer who unashamedly sponges off his wife's benefits. When I quit my day job which provided full benefits, you can bet I would not have done so had I not had her benefits as a safety net. The risk for me was small: no kids, house almost paid off, no other large debt, large sums already in our 401(k)s, stockpile of firearms in the gun safe in the basement for the coming end-times. Given our personal situation and the fact that she's a physician with a large not-for-profit HMO, you can imagine that national health care issues have been a big topic of discussion during our 23-years of marriage.
Something will have to be done about this as the baby boomers start retiring in bulk. There will be a kind of domino effect as there aren't enough young people domestically to replace all those folks in their jobs, and so those positions will have to automated, off-shored, or just eliminated, and employers can no longer afford to continue to subsidize the growing health care burden of their retirees. The effect of this is that offshore employees will be paying for the health care of an America that is largely populated by retirees. This will not go over well. If the baby boomers become a majority voting block, they will simply keep voting elected officials out of office until someone solves the problem with some kind of socialized medicine. This is going to have to be done even though a big percentage of tax paying citizens are themselves retired, and a huge portion of the national wealth (and, hence, the ownership of publicly traded companies) is tied up in their retirement funds. The math alone makes by brain hurt.
This will also require a lot of rethinking, and some hard talk, about health care on each of our parts, given some of the stories my spousal unit tells me. No, you can't have a CAT scan just because you think you need one. Socialized medicine will pay for your kidney transplant three years from now, if you're still alive. Your alternative is to pay for it yourself. You're still drinking after your first liver transplant? Sorry, no third liver for you.
I'm amazed big corporations aren't all over this like white on rice. (The BusinessWeek article suggests some are.). Is employee mobility that scary? Have they mistreated their employees so badly that they think they need the health care lasso to keep them on the job? Is the cost of employee turnover in the short run so much scarier to management than the cost of retiree health care in the long run? Are we really talking about a fundamental change in the balance of power between employers versus employees? Surely getting, and keeping, the right employee in the right job is the right thing for both the employer and the employee. Otherwise we're making it sound like some kind of indentured servitude.
There is some evidence to suggest that high-tech will be hit worse than other job sectors by this effect. Another article from BusinessWeek, October 16, 2006, "Still Working and Loving It", cites the following statistics: "From 1980 to 2000 the total number of science and engineering degrees earned grew at an annual rate of 1.4%, far below the 4.2% growth of science and engineering jobs" and "nearly 30% of all science and engineering degree holders currently in the labor force are age 50 and older". Of course, these are just measures of science and engineering degree holders; some of the best developers I've ever worked with had degrees in music, some from very prestigious conservatories. Still, it does give one pause to think about where the intellectual property of the future is going to be developed, and by who.
Taking a purely global view, maybe it is okay if globalization moves all manufacturing (which in an information economy is the development of intellectual property) offshore, but only if the offshore work force also willingly participates in supporting the global collection of retirees. Isn't this fair? Otherwise you have a disconnect between business responsibility and social responsibility. If you're going to take the jobs, you have to take the responsibility that goes with them.
This suggests a shifting of the burden of the funding of socialized medicine from personal income taxes to corporate income taxes. I think it's going to get ugly.
Tuesday, January 23, 2007
Lies, Damn Lies, and Net Present Value
Some of the competitors are obvious: other start-ups, other new products or features under consideration by your company, and other research proposals. But what isn't so obvious to most technologists, but is burned into the mind of every M.B.A., is that you are also competing against the stock market. You have to convince your funding source (or, as I like to think of them, money guys) that they would be better off giving you the money than simply investing it in the stocks or other financial instruments.
The bad news is that if the stock market is doing pretty well, or if the perceived risk of investment in stocks is a lot lower than the risk of investing in your project, you will be fighting an uphill battle. The good news is that there is a way to calculate how you stand compared to your money guys just calling their broker. This calculation is called net present value (NPV). It accounts for the time value of money by calculating an equivalent present value of an investment whose return will be made over a span of of time. It is a way of comparing apples to apples when evaluating several investment options.
Since most technologists are hard-wired for math, this is quite likely going to make a lot of sense. While I am most definitely a technologist (I just took business classes as electives as an undergraduate for fun), I found that understanding just a little bit about NPV gave me a lot of insight into how and why new product decisions are made the way they are, and why certain projections seemed to be unrealistically inflated.
NPV has just three variables. It is the choice of the values for these variables around which all discussion with your funding source will revolve. It is pointless to argue about the use of NPV, or how NPV is calculated, because the money guys will know that this is a standard business tool, and they will likely have been using it for years. But battles will be won or lost by the choices made, and the justifications offered, for these variables. The reliability of the value that NPV computes will be no better than the reliability of the values chosen for these variables.
Duration: this is the total life span of your project and the product that it creates, from the first year you start spending money to develop it until when you finally decide to discontinue the product. Don't kid yourself: your money guys know that all products have a life span. High-tech products tend to have shorter life spans than most others, more like fresh produce than building materials. The duration is in units of years (although I can imagine some high-tech projects in units of months). Let's call this this variable N.
Annual Net Income: this is the amount of income that would be realized by your product year by year over its entire duration after all costs are subtracted out. It is likely not be the same for every year. For example, if it takes two years for your product to hit the market, for the first two years this number will be negative, representing the initial development costs with no income. In subsequent years in which this number is positive, it will represent the income generated by your product minus any costs associated with manufacturing, support or maintenance. And even if your product is successful, its annual net income will typically rise until it saturates the market and than fall. If all of your annual net income numbers are zero or negative, you are already screwed. We'll call this variable A[i] where i is the elapsed years that runs from 0 to N-1.
Cost of Capital: this is effectively an interest rate that captures what interest could be made year by year by investing the funding you are asking for in a financial instrument with a similar risk profile as your proposal and over the same duration. Note that this captures not just how well alternative investments might do, but how their risk compares to yours. You could argue that this may change from year to year, although in all the NPV calculations I have ever seen it is assumed to be constant. Unless you have the amazing secret to wealth, it will be a lot less than one, like 0.1. We'll call this variable C.
Here is the formula for NPV. Note the the caret "^" is a "raised to the power of" operator, as in 3^2 equals 9. (I've seen slightly different formulations of NPV from different sources, so your mileage may vary a little, but they are all equivalent.)
NPV = sum (i = 0 to N-1) ( A[i] / ((1 + C) ^ i) )
Let's look at an example.
You have a great idea for a new product, the flux capacitor. You think it will cost one million dollars spread across two years to develop and productize it.
Since you have some equipment you have to buy and some licenses you have to purchase, you have some up front costs the first year that you don't expect to have in the second year, so the expenditures for those two years of development are a little front loaded.
You expect the life span of the flux capacitor to be eight years before it becomes obsolete. That gives you an N of ten years: two years of development plus eight years of sales. You expect to hit your sales peak around the fourth year after introduction.
Here is your projected net income, or A[i], year by year for the total duration of the flux capacitor product, based on how many units you think you can sell, at what price, and what the unit cost of each flux capacitor is likely to be.
A[0] = -$700,000
A[1] = -$300,000
A[2] = $50,000
A[3] = $100,000
A[4] = $250,000
A[5] = $500,000
A[6] = $250,000
A[7] = $100,000
A[8] = $70,000
A[9] = $20,000
So far this doesn't look too bad. For an outlay of $1,000,000 over two years, you make back a total of $1,340,000. That's a net income of $340,000 or 34% of your initial investment.
You still need a value to use for C, the cost of capital. Here's the thing: you don't provide that number. Your money guys are going to have someone with a degree in accounting or business administration who is going to provide that number. They are going to base it on lots of factors, like the Dow Jones Industrial Average, the return on investment (ROI) of their existing investments, maybe the ROI of one or more stock index funds, or what they read in the Wall Street Journal this morning. The WSJ is actually the most likely source: it regularly publishes numbers just for this purpose. No way are you going to have more credibility than the WSJ. You have no control over the value used for C. Let's say your money guys use a cost of capital of ten percent or 0.1.
Here is our calculation for NPV. (Note that I'm just showing two decimal places,but my calculator keeps more.)
NPV = -700000/(1.1^0) - 300000/(1.1^1) + 50000/(1.1^2) + 100000/(1.1^3) + 250000/(1.1^4) + 500000/(1.1^5) + 250000/(1.1^6) + 100000/(1.1^7) + 70000/(1.1^8) + 20000/(1.1^9)
NPV = -700000/1.00 - 300000/1.10 + 50000/1.21 + 100000/1.33 + 250000/1.46 + 500000/1.61 + 250000/1.77 + 100000/1.95 + 70000/2.14 + 20000/2.36
NPV = -700000 - 272727 + 41322 + 75131 + 170753 + 310460 + 141118 + 51316 + 32656 + 8482
NPV = -141489
Yeah, you are interpreting that number correctly. Even though you think your project will make $340,000 over its ten year run, your money guys would have been far better off just investing the money in the stock market. If everything goes your way, they stand to lose $141,489 by betting $1,000,000 on you. You are in a battle against compound interest, and that is a hard battle to win. Just be glad your 401(k) is subject to the same math.
If you have any brains at all, you will have done this calculation long before you ever appear in front of the money guys, and you already know you are in trouble. So what can you do? You only have a small number of variables that you can affect. See if any of these strategies sound familiar based on your own experience in product development.
You can ask for a smaller initial investment. We can hire just half as many developers, and have them work twice as many hours. We can buy half as many workstations and tell our developers: it's pair programming! We can ask for less money up front in the hopes that once the money guys commit, we can get more funding later.
You can get the project done in less time so that you start making money sooner and for a longer period of time. We can hire twice as many developers to get it done in half the time. We can skip those time-consuming requirements and design phases. We can claim that we can get it done in half the time in the hopes that once the money guys commit, we can get more funding later.
You can extend the sales life span of the flux capacitor. We're pretty sure that even after the oil economy collapses, there will still be a market for flux capacitors. It's unlikely that our flux capacitor design will become obsolete in less than fifteen years; after all, we're designing for expandability!
You can reduce the on-going cost associated with each flux capacitor you sell. This on-going cost is known as the cost of sales (COS) or sometimes cost of good sold (COGS). We won't need technical support. We can offshore technical support. We don't need a marketing or sales department, these babies will sell themselves! We can use cheaper materials; after all, if it breaks, then maybe we'll sell another one.
You can increase the sales price of each flux capacitor. Of course, this is tricky. There's this little issue of reducing sales by increasing the price.
You can decrease the sales price of each flux capacitor. Everyone loves a sale! We can offer a rebate. How do we do it? Volume!
You can increase the projected annual sales. Once the oil economy collapses, we're positive that everyone is going to need two things: a 9mm Glock and a flux capacitor! We'll be lucky if we can keep up with demand.
And probably some other strategies I haven't thought of, or haven't experienced first hand.
I can see the light bulb going on over your head. This explains a lot. Once I started really thinking about NPV, I began to wonder how any new products were ever developed at all. Of course, if no new products were ever developed, there would be a lot fewer stocks competing for investment dollars, because all the manufacturing companies would eventually go out of business. So there is a kind of dynamic balance going on that impacts the entire global economy. A Nobel prize or two has probably been awarded over the analysis of this very thing.
I've heard engineers claim that an analysis like NPV is short sighted. That's easy to say when it's not their money. They might feel differently if they realized that they might be talking about the money in their own 401(k)s. At least for publicly traded companies, I also think it's important to remember that the money guys have a fiduciary duty to their shareholders, a legal responsibility to make the best use of the shareholders money. The shareholders are literally the owners of the company. They would much rather get that money themselves than have the officers with whom they entrusted their company fritter it away making flux capacitors. Failure to live up to fiduciary duty can result in, best case, shareholder lawsuits, or, worst case, in FBI and SEC investigators pulling up in black SUVs. Somewhere in the middle is where the officers of the company are fired.
As a young technologist, I found it hard to understand the business process. It was a lot easier, and more fun, to read about that new fangled language, C. But as I got more and more experience, and probably as I got more and more money in my 401(k), I began to appreciate the kinds of tough decisions managers of companies have to make on a daily basis about what to make, how to make it, and why.
Sources
Daniel P. Golman, et al., "Calculating Net Present Value", Business, Perseus Publishing, 2002
Barry Karafin, "Business Management for Technologists", (course slides), September 2005
Randy C. Perry, David W. Bacon, "The Business Case for Design for Six Sigma", (e-book), Prentice-Hall, 2007
Separate But Equal II
"I think of IBM's channel architecture as another example of the control/data separation, similar to what you describe in the MSS. As to other examples -- the idea is fundamental to hardware design; standard digital design rules keep control logic and datapaths separate."
Dr. Hemmendinger might be more insightful than he knows here. The NCAR MSS that I described did in fact, at least at the time, make use of IBM I/O channels; the high-end storage devices it managed were controlled by an IBM mainframe. But the mainframe accessed those devices primarly for purposes of data migration and management of meta-data. The NCAR MSS switched I/O architecture (which it was using at least a decade before switched I/O became wide-spread) allowed its supercomputer clients to access the I/O devices directly.
Apart from the NCAR MSS, IBM's I/O channel architecture has been around at least since the late 1960s, and so may be one of the earliest examples of this pattern. I remember writing simple channel programs for tape drives in the mid-1970s during my IBM 360/370 assembler period. IBM's I/O Subsystem (IOS) was another first for me as well: it was the earliest example I can recall of using object oriented design and implementation for a real-time application, although the terms "object oriented" and "real-time" didn't really exist way back then. The example of the IOS has stuck with me for decades.
His second comment is spot on too: the separation of control and data is a standard pattern in digital hardware design.
Thanks, David!
Sources
Chip Overclock, "Separate But Equal", January 2007
Monday, January 22, 2007
Separate But Equal
The first time I ran across it was when I was writing device drivers for real-time applications on PDP-11s (and later, LSI-11s) starting back in 1976. I had written drivers for things like A/D converters, parallel and serial I/O boards, real-time clocks, and punch-card readers. Those devices all followed a common pattern: you had a memory-mapped status register and a memory-mapped data register, and it was the responsibility of the driver to handle the movement of data to and from the device and memory.
It wasn't until I started writing drivers for direct-access storage devices like hard disks and floppy drives that the young Mr. Overclock first encountered direct memory access (DMA). DMA was great: you just programmed in some parameters in some status registers, and the hardware itself took care of moving data to and from memory. The only interrupt you received was when the entire I/O transfer was complete.
I know y'all laughed when I said I wrote device drivers for cards readers. But see, I've written a lot of device drivers for a lot of platforms since then, and I can tell you this: those card readers are still the most challenging devices I've ever worked with, before or since. You set the read bit in the status register, the blowers and motors on the reader spun up, and a punch card flew through the read head. You got a fast stream of interrupts, one for every one of eighty columns, and maybe or maybe not another interrupt for end-of-card, depending on the timing. You either serviced every one of those interrupts, or you lost the data. There was no backing up and trying again. If you're goal was, as mine was, to run the card reader at its full rated speed, this took some clever programming, not to mention some careful planning of interrupt priorities for the other devices in the system. (It wasn't until years later that I wrote drivers to handle analog trunks and stations, in Java of all things, for a telephony product, that I encountered devices with these kind of no-compromise hard-real-time constraints.)
So I was in a good position to really appreciate the whole idea of DMA, where I just handled the control and something smarter, or at least a whole lot faster, then my code took care of the data.
Through my wanderings I somehow ended up at the National Center for Atmospheric Research, a national lab in Boulder Colorado. NCAR had a floor full of supercomputers (mostly Cray Research machines at the time). It was there I learned that when you have a lot of supercomputers, you have a lot of data. Petabytes of data. Having a lot of MIPS and FLOPS lying around isn't useful unless you have a lot of MBPS available too. This is where the NCAR Mass Storage System (MSS) comes in. The NCAR MSS is hierarchical storage system that manages a huge disk farm, several robotic tape silos, and (at the time) a vast backing store of offline tape cartridges.
The NCAR MSS Group is credited with inventing the idea of the separation of control and data as it applies to mass storage. Control messaging was carried among a distributed farm of servers and adapters and their client supercomputers over LAN technologies that most of us would recognize, typically Fiber-Distributed Data Interface (FDDI) or some form of Ethernet. The ancillary equipment handled the I/O setup, including any necessary staging of data and mounting of tapes, and it wasn't until the gigabytes were ready to fly that the supercomputers were re-engaged. Then those gigabytes streamed over specialized data paths using (again, at the time) more exotic and specialized I/O-centric technologies like High Performance Parallel Interface (HIPPI) and Fibre Channel (FC).
There were a lot of good reasons for this separation. The LAN technologies at the time tended to be broadcast media. The I/O technologies were switched, point-to-point channels. The LAN technologies had small physical layer data segments which were not optimal for streaming a lot of data really quickly, but was good for moving small payloads between widely distributed endpoints. The I/O technologies had high initial connection setup latencies, but were very high performance once the data stream began. The LAN technologies were cheap. The I/O technologies were expensive. The separation of control and data was a physical-layer artifact. (I've been told that technologies like Gigabit Ethernet have muddied the waters in this respect since I worked at NCAR.)
The NCAR MSS was using the available technologies in the way in which they were intended, and reaping the performance and economic benefits. It wasn't a great leap to notice the common pattern between this architecture at NCAR, where it was writ large, and my earlier work developing tiny little device drivers in assembly language.
It wasn't until I joined Bell Laboratories and entered the domain of telecommunications that I discovered that the telephony folks had also faced, and solved, this same problem. Digital telephony trunks were bundles of sixty-four kilobit per second bearer channels controlled by a single signaling channel of the same bandwidth. Protocols like Integrated Services Digital Network (ISDN) controlled the setup and release of telephone calls over the signaling channel, called the Delta-channel or more commonly the D-channel, while the eight-bit voice samples of each telephone call were transferred over one of many Bearer- or B-channels.
In ISDN, the D-channel and its associated B-channels are typically carried over the same physical path, such as bundles of copper wires in a single T-1 trunk, or a single Synchronous Optical Network (SONET) optical fiber, although I have also seen cases where they were split off into separate physical paths as well. But generally, the separation of control and data in ISDN is a transport-layer artifact.
There were very good reasons for this as well. Separating control (signaling) and data (bearer) made economic sense since the messaging bandwidth of a single D-channel was adequate to control tens of B-channels. The data streams from the D-channel and its B-channels were routed to very different endpoints, the D-channel messages to a control element that would be a conventional processor, the B-channel voice samples to a digital signal processor (DSP) or a time division multiplexed (TDM) circuit switch or backplane. And that fact that the asynchronous signaling messages were carried out-of-band, instead of in-band along with the synchronous voice traffic, decoupled these two streams so that the voice streams could be controlled more efficiently, and it reduced jitter in the voice streams as well.
So it is no surprise that when Voice over IP (VoIP) came along to replace traditional TDM-based telephony, the IP world adopted the same pattern. The Session Initiation Protocol (SIP) is a commonly used application-layer protocol for setting up and releasing VOIP calls. But stations and trunks that speak SIP pass their application data (for example, voice samples) using some other application-layer protocol specialized for that purpose, typically the Real-Time Transport (RTP) protocol.
In SIP, the separation of control and data is mostly an application-layer artifact. The SIP and RTP streams use the same network protocol, IP, and the same physical layer, such as Ethernet, although the transport-layer TCP or UDP packet streams may be handled quite differently in intermediate routers because of their very different quality of service requirements.
The separation of control and data is a common architectural pattern that seems to be reinvented in every problem domain in which I find myself working. In a recent project implementing business process automation of high-end communications capabilities, the control messages were carried over an enterprise service bus, but the data (which from the application's point of view was both the SIP traffic and the RTP traffic) was carried over the LAN.
What other examples of this architectural pattern can we find if we look hard enough? And where should we be applying it that we are not?
Important Safety Tip: Enable Ping with Comcast
It took the Overclock household a while to get to this point: the palatial Overclock estate is far enough out in the boonies that Qwest didn't offer DSL, and our development is old enough that the Comcast cable plant wasn't quite up to snuff. But a couple of years ago Comcast upgraded their facilities, which motivated your correspondent to trundle down to the local COMP USA (pronounced, I am told by my friend Tom O'Dell, "com-pu-sa") for some LinkSys gear. Since my SO and I already had notebooks with WiFi radios, in a couple of hours Mrs. Overclock (a.k.a. Dr. Overclock, Medicine Woman) was looking at me in a way you could only appreciate if like us you have been married for over twenty years.
(About a year later she trumped me by buying a TiVo. Without TiVo, life as we know it could not exist. But that's another story.)
For about a year we lived with the internet connection going down about once a month. Just rebooting the router fixed it. The frequency was just below the irritation threshold required for me to take the time to actually figure out what was going on.
But apparently my subconscious was not willing to let it rest. It suddenly occurred to me: I wonder if Comcast is expiring the DHCP lease if the endpoint does not respond to ping? Being the paranoid sort, I had disabled ping in the router configuration, figuring that some camouflage in the dimly lit back allies of the information superhighway might be a good idea.
The DHCP status page on the router didn't seem to indicate a problem, but I took a chance and enabled ping anyway. Our Comcast connection has worked flawlessly for months since then.
WiFi and TiVo. LiFe is GoOd.
Friday, January 19, 2007
Fun Facts to Know And Tell: Java Business Integration
There are times I'd like to claim to be a specialist, but my resume suggests otherwise. For sure, I am not a specialist in web services. So when duty called for me to become the local ServiceMix and JBI expert, I approached it more from a real-time messaging perspective than a Web Services perspective. This article describes some of the surprises I had in store.
Most of these fun facts to know and tell are about JBI and would probably apply to any ESB that was JBI-compliant. A few of them are about ServiceMix specifically. None of them have anything to do with the actual product that we developed.
Our development began with ServiceMix 2.0, progressed to ServiceMix 3.0 for our commercial release, and proceeded to test with ServiceMix 3.0.1. The core portion of ServiceMix that implements the JBI spec was quite stable and usable in the releases we chose to use, particularly considering the relative immaturity of the product. Like all active open source projects, ServiceMix is a moving target. Depending on what specific Subversion trunk of ServiceMix you decide to use, your mileage may vary.
In the sections below I cite the appropriate portions of the JBI spec for your reference.
Service Engines vs. Binding Components [4.3, p. 14]
The JBI spec, and most JBI presentations, seem to make a big distinction between a JBI bus component which is a Service Engine (SE) versus a component which is a Binding Component (BC). I find this to be mostly a conceptual distinction, depending on how you choose to think about the components that you develop for your application. The JBI API is exactly the same no matter what you call your component. (Former colleague Rick Block suggested that early in the development of the JBI spec this was not the case.)
I think of an SE as a component that encapsulates the implementation of some service and accepts requests and generates responses for that service only over the ESB, using JBI normalized messages inside JBI message exchanges. It may maintain internal state about the service it provides. An SE might front-end a database used to maintain inventory. It might be a JBI wrapper for a legacy service like weather information. It might be an abstract interface to a vastly complex system sitting outside the ESB. It might do something dirt simple like return the time of day from a central clock.
I think of BCs as purely protocol converters that serve as gatekeepers between the outside world and the ESB. A BC serves as a proxy between an external client that sits outside of the ESB and an SE that sits on the the ESB. If the BC maintains state, it is only about the transactions that are in progress between the outside client and the SE, or, in JBI-speak, about message exchanges that are in the active state. Once a Message Exchange reaches a terminal state like done or error, the job of the BC for that particular transaction is completed. A BC might convert between external clients using the Java Message Service (JMS) and internal services using JBI messaging. A Simple Object Access Protocol (SOAP) router on the ESB that receives SOAP requests from external web services clients (like your web browser), turns them into JBI message exchanges, and routes them to an SE on the ESB, is also a kind of BC.
This is fuzzier than you might think, and discussions over whether a component is an SE or a BC can quickly devolve to how many angels can dance on the head of a pin. For example, suppose you had a component that incorporated a SIP protocol stack to implement complex VOIP control capabilities. You can think of this component as a container of all the SIP endpoints with which it will ever communicate, and the service it provides is a set of high-level communications capabilities. In this sense, it is an SE. You can also think of it as a complicated protocol converter between SIP endpoints that speak only the SIP protocol and SEs on the ESB which speak only JBI messages. In this sense it is a BC.
In general, don't agonize too much over whether your component is a Service Engine or a Binding Component. In practice, the distinction isn't terribly important from a purely JBI point of view, although it may be important from an architectural point of view for your application. Separating BC and SE functionality makes your application more flexible if you can design it such that just by adding a new BC you can expose your SEs to a whole new domain of external clients, like SNA systems.
Services vs. Endpoints [5.1.1, p. 21]
I think of JBI as a fractal collection of containers: the JBI container (in our case, ServiceMix) contains JBI components. Each JBI component contains zero or more services. Each service contains one or more endpoints. Messages on the JBI ESB may be addressed to a service or to an endpoint within a service.
For example, a time service on the ESB might be a single service with a single qualified name (a service name within a specific namespace). But that time service might expose multiple endpoints, one endpoint for any of several NTP servers, and each with an endpoint name unique within that service. Requests for time could be addressed to the time service itself, in which case the selection of which NTP server to use is left up to some other mechanism, or it could be addressed to a specific endpoint, so that the time would be provided by a specific NTP server. The JBI spec says that if you addresses a request to a service, the choice of endpoint is implementation dependent; in this case, it really means your implementation when you code the service.
Because every service must have at least one endpoint, I sometimes use the terms service and endpoint interchangeably, but as we will find out when we discuss routing, the distinction is important.
Providers vs. Consumers [5.1.1, p. 21]
Given my background in real-time messaging, I really struggled with this JBI nomenclature, and it turns out to be vitally important to understand the distinction. I am used to thinking of messaging components has having the roles of producer and consumer, as in a message producer is that which sends the message, and a message consumer is that which receives the message. These are strictly roles played during a specific message exchange, so a single messaging component may be both a producer and a consumer at different times.
In JBI, provider and consumer are also roles played during a specific message exchange, as in a service provider provides a service, and a service consumer makes a request of the service provider. In this context, the service consumer may never receive a message at all (other than, in JBI, getting a done back from the service provider to indicate the request was successfully received) and a service provider may never send a message (other than sending the done, which carries no payload). As above, a single component may be both a service provider and a service consumer. In receiving a request from a consumer it may act as a provider, but in servicing that request it may act as a consumer by making requests of other providers. Once I got my head around this fact, things got a little easier.
A service provider must activate an endpoint on the ESB. The endpoint is one means through which messages exchanges are addressed to the service provider. (More on this later.) Once it does this, it has exposed its endpoint to receive message exchanges from service consumers. If a component does not activate at least one endpoint, it cannot act as a service provider, because service consumers have no way of addressing message exchanges to it. If a component never acts as a service provider, it need never activate an endpoint on the ESB. It can still address message exchanges to service providers, send those exchanges on the ESB, and receive responses. But no other component on the ESB can originate a new message exchange and send it to that consumer because the consumer is effectively invisible on the ESB.
The pipe through which components on the ESB send and receive (or accept, in JBI-speak) message exchanges on the ESB is the delivery channel. Think of the delivery channel as a socket and you won't be too far off. The JBI container gives each component on the ESB a delivery channel. When a consumer sends a message exchange to a provider over the delivery channel, the message exchange only contains the address of the service and endpoint exposed by the provider that were used by the consumer to address the message exchange. There is no standard JBI mechanism for the provider to query what consumer sent the message, and in fact it isn't even meaningful to ask the question. Since the consumer isn't required to activate an endpoint on the ESB, it has no address to which new message exchanges to it may be addressed.
Of course, the JBI container knows to whose delivery channel to send any response the provider may generate, but this is hidden inside the implementation. And of course, the consumer may also play the role of provider, expose an endpoint on the ESB, and send the address of this endpoint as part of the payload it sends to in its request. But from the point of view of the JBI container, this is the address of a provider on the ESB, not of the consumer that send request.
Because there is one delivery channel per component, but a component may expose many services and endpoints to which message exchanges may be addressed, I like to think of components as sending requests, and as the endpoints inside the components as receiving requests. But this is just my way of thinking.
Routing [5.1.7, p. 26]
Message exchanges may be addressed three ways: to a service (implicit routing), to an endpoint within a service (explicit routing), and to an endpoint reference (dynamic routing). When I described my example of a time service fronting several NTP servers, it probably occurred to you that there might be perfectly legitimate reasons to do either implicit or explicit routing, and you would be quite right.
Suppose your application was a hybrid of a Service Oriented Architecture (SOA) and an Event Driven Architecture (EDA). A client component acting as a consumer could make requests of a server component acting as a provider, and those requests would be addressed using implicit routing, that is, with just the service name of the provider. If the request for was for an event stream, the client component could provide a service and a specific endpoint to which those future events would be addressed by the server component. When the server component sends an event it would be acting as a consumer, sending an event to the client who was now acting as a provider, and those events would be addressed using explicit routing, that is, using both the service and the endpoint. (I never said this was simple, but it does illustrates how important it might be to get your nomenclature about roles straight in your mind.)
For example, a client component might sent a request to the server component "Lassie" to "look after Timmie", and pass along the service "Mom" and the endpoint "in the kitchen". In the future, should circumstances warrant, "Lassie" would send events like "Timmie fell in the well" to "Mom" "in the kitchen". The events could have been addressed just to "Mom", but this would require the underlying implementation to resolve the specific endpoint, searching "in the barn", "in the kitchen", and "in the bedroom", and it might pick the wrong one. (For purposes of compliance with any hypothetical non-disclosures, any resemblance of "Mom" and "Lassie" to an actual product is purely coincidental.)
Dynamic routing is a completely different animal, which is discussed in the following section.
Dynamic Endpoint Reference [5.4.4, p. 36] [5.5.4.1, p. 50, Listing 4]
Dynamic routing allows a component to be addressed using a fragment of XML that complies with a schema documented in the JBI spec. The schema is specified using Relax NG. This allows components to specify "call back" addresses for future message exchanges as part of, for example, their XML payloads inside of a JBI normalized message. This could have been used in the example above with for "Mom" to have specified the service and endpoint to which "Lassie" would send future events. Such addresses in XML form are referred to as endpoint references (EPR), and the mechanism through which such EPRs are resolved is called dynamic endpoint reference.
Here is an example of an XML fragment that might be an EPR.
<jbi:end-point-reference
xmlns:jbi="http://java.sun.com/jbi/end-point-reference"
xmlns:martin="http://farm.martin.us/jbi"
service-name="martin:Mom"
endpoint-name="InTheKitchen"
>
In this example, the service name is "Mom", the namespace is "http://farm.martin.us/jbi" which is associated purely within the context of this EPR with the XML tag "martin", and the endpoint name is "InTheKitchen".
If your application supports dynamic routing and EPRs, it is up to you to write the code to parse this XML and determine what service and endpoint, if any, it identifies. Dynamic endpoint reference resolution isn't rocket science. A component gives an XML fragment to the JBI container and asks what service and endpoint it identifies, expecting to get either null or a service endpoint object in return. ServiceMix simply queries every component on the bus that has activated endpoints, passing it the same XML fragment, in effect asking "Is this yours?", and expecting either a null or a service endpoint in return. I wrote a simple XML parser using the Java Document Object Model (DOM) framework to do just this.
You might think that would be the end of it. But you would be wrong.
It turns out there are a lot of third-party JBI components out there that expect to use dynamic routing and EPRs, but whose XML fragments do not conform to the JBI spec. One such component is Intalio's implementation of the Business Process Execution Language (BPEL). (We reported this as a bug, so in all fairness this may have been fixed by now.) Intalio chose to format their service name and namespace using a very commonly used but non-standard form known as the James Clark format. Here is an example.
<jbi:end-point-reference
xmlns:jbi="http://java.sun.com/jbi/end-point-reference"
service-name="{http://farm.martin.us/jbi}Mom"
endpoint-name="InTheKitchen"
>
Note that the namespace is part of the value of the service name attribute and not an XML namespace at all. So I modified my DOM parser to handle this format as well.
And you would still be wrong.
It turns out that ServiceMix itself exposes a managed bean (mbean) for each component on the ESB through which a query can be made and a EPR string identifying that component returned. And indeed, this EPR does not conform to the JBI spec either. Here is an example.
<jbi:end-point-reference
xmlns:jbi="http://java.sun.com/jbi/end-point-reference"
xmlns:martin="http://farm.martin.us/jbi"
jbi:service-name="martin:Mom"
jbi:endpoint-name="InTheKitchen"
>
Note that the "jbi" namespace XML tag is prepended to the service name and endpoint name attributes. Maybe this is a valid variation (my XML books suggest it is not), but the DOM framework does not recognize it. So once again I modified my DOM parser to handle this variation as well.
Thus far things are quiet on the EPR front.
send vs. sendSync [5.5.2.1.3, p. 43]
Initially, the messaging operations send versus sendSync seemed pretty straightforward. The former is asynchronous: a component could send a message exchange and go on its merry way doing other work, and any further response regarding that message exchange would be handled in the future when the exchange arrived on the component's delivery channel. The latter is synchronous: the component is blocked on the sendSync until the far end responds in whatever way is appropriate for that message exchange. These were familiar messaging patterns to the real-time developer in me.
I have to admit, as a long-time real-time developer, I don't believe in synchronous message passing, even though it is the backbone of a lot of web services messaging frameworks like Axis. In thirty years of building message-based systems, I haven't found synchronous messaging to result in scalable, high-volume, robust systems, although it sure does simplify the coding. Having a really cheap, scalable thread implementation makes it more palatable. Several Java-based frameworks and containers help out in this regard, too. But synchronous messaging makes me nervous in a deadlock is really possible unless we are very very careful kind of way. Since I have a track record for producing successful, mission critical, 24x7 products, the thought that one design mistake might cause the whole shebang to go toes up makes me a little nervous.
But with JBI, I discovered one really good reason to use synchronous messaging. When using asynchronous messaging, JBI does not require that order is preserved, and ServiceMix certainly does not guarantee it. That is, when using asynchronous messaging, the order in which one component sends message exchanges may not be the order in which the provider receives them.
The UDP datagram socket developer in me said "Oh, yeah, sure". The real-time embedded messaging developer in me clutched his chest, seized, and has not been heard from since. I had made the mistake of thinking of JBI as a real-time messaging system, like any number of others I had used over the years. But while JBI looks like a duck, it sure as heck doesn't quack like a duck. No doubt about it, this is my fault completely, but I'm not exaggerating when I tell you that when I figured out what was happening in my extensive ServiceMix junit regression test suite, I broke out in a cold sweat and probably turned pale.
Some digging into the JBI spec, more digging into the ServiceMix source code, and a brief e-mail discussion with Guillaume Nodet, one of the lead ServiceMix developers, verified that [1] there is no requirement in JBI that order is preserved, and [2] ServiceMix's use of a pool of threads for handling asynchronous delivery pretty much insures that order is a function of mostly-non-deterministic thread scheduling. I found this ultimately understandable, but very counter-intuitive.
WSDL Binding [5.5.6.1, p. 57]
Service providers on the ESB may choose to describe their messaging interface in the form of a service description. A service description is an XML document written using the Web Services Description Language (WSDL).
WSDLs, as such service descriptions are casually referred, have an abstract part and a concrete part. The abstract part describes both the syntax and the context of any messages used by the service provider. The concrete part describes the nuts and bolts of where to find the service provider and what protocol to use to talk to it.
Service providers may actually have more than one interface that they expose. They may expose one interface to clients external to the ESB. This interface describes a traditional web service and is only reachable through a BC. Service providers may expose another interface to service consumers on the ESB. This interface is available only on the ESB and may expose a much richer set of capabilities available only to other components on the ESB.
The neat thing is, both interfaces can be described in a single WSDL. Such a WSDL would have a single abstract part, but two (or more) concrete parts. One concrete part might specify a SOAP binding so that a SOAP router on the ESB could serve as a proxy for the service provider and route SOAP requests from external web services clients across the ESB to the service provider and handle sending responses back. A second concrete part could specify a JBI binding which would tell internal JBI components, including the SOAP router, how to use the internal interface.
When the JBI container parses the service description to extract the interfaces and JBI services and endpoints exposed by the service provider, it should (as ServiceMix does) only extract those interfaces, services and endpoints which are associated with the JBI binding. Any interfaces, services and endpoints associated with other bindings are purely a private matter between consenting components such as the service provider (an SE) and the SOAP router (a BC).
I didn't personally need to make use of this feature, but I did write code using the wsdl4j tool kit to extract the JBI-bound interfaces from the service descriptions that other developers provided for their components. So it is likely that others may have more to say on this topic.
Service Endpoints [ServiceEndpoint, p. 192]
A service endpoint is an object which contains, among other things, the qualified name (service name plus namespace) of a service, and the endpoint name of an endpoint within that service, for an endpoint which has been activated on the ESB. It is not unusual for a component to have to keep track of the namespace, service name, and endpoint name of an endpoint which has not yet been activated, for example, based on configuration information gleaned from a properties file. You may be tempted to create your own class that conforms to the JBI service endpoint interface just for the purpose of managing such information. But do not be deceived into thinking that just because your class conforms to the service endpoint interface that you can pass instances of your class with the JBI container as service endpoints.
The spec is quite clear on this: the JBI container is the only source of genuine service endpoints, and they are only doled out when an endpoint is newly activated by the container on behalf of the component. Just as only God can make a tree, only the JBI container may make a service endpoint. The underlying reason is of course because the JBI container implementation (ServiceMix) maintains private data inside its own implementation of the service endpoint interface to which you are not privvy.
I did in fact make a class just to keep track of endpoints that were defined but not yet activated, but I deliberately did not implement the JBI service endpoint interface, even though doing so would have been very convenient, just so I couldn't accidentally try to pass an instance of this class into one of the JBI APIs.
That's the current brain dump on JBI. More later as it occurs to me.
Sources
Ron Ten-Hove, Peter Walker, Java Business Integration (JBI) 1.0, JSR-208, Sun Microsystems Inc., August 2005
Peter Walker, private communication, 2006
Guillaume Nodet, private communication, 2006
Thursday, January 18, 2007
Traffic Management and System Scalability
The "Big Dig" was hardly Boston's first controversial road construction project. That historic city's transportation and urban planners have for decades had to deal with the fact that many of the downtown streets are barely wide enough for two lanes. This makes building exit ramps from interstates and other high volume highways into the oldest portions of Boston really problematic. How to you handle the traffic volume of Interstate 95 funnelling down to a cobblestone street that is two hundred years old? How do you widen that street when the real estate on either side of it is going for seven figures? It's not just a traffic management problem, it's an economic issue.
Yet we take exactly the same approach to designing large distributed systems that have longish lifespans. We try hard to make our systems modular so that they can be easily upgraded. We try to make them scalable so that they can be easily expanded. We try to make it possible to incorporate new technology as it becomes available. This is particularly true with systems which have high capital costs and for which customers expect long term amortization. Traditional telephone systems come to mind, but this applies equally well to things like power plants, air traffic control systems, and just about anything built for the U.S. Department of Defense, like aircraft carriers.
Traditional telephone systems are among the most complex distributed systems in the world. A typical design (where "typical" here means "one that I've worked on") has call control centralized in a single processor, a network of communication links going to remote cabinets, and each cabinet containing tens of boards with embedded processors that control everything from digital trunks to the telephone on your desk. The individual embedded processors not only talk to the central processor, but to other embedded processors in the same cabinet, to embedded processors in different cabinets, and to embedded processors in other systems owned by other people. There is a whole lotta messaging going on. The mere act of picking up your telephone handset results in dozens of messages being interchanged between at least a handful of processors across several communications links. As my friend and former colleague Ron Phillips says, "Every time I pick up a phone and get dial tone, I know a little miracle has occurred."
So sometime in the 1980s you start out with central processors capable of running at a speed of a few million instructions per second. You have communication links capable of bandwidths of tens of kilobits per second. You have embedded processors at the ends of those links that run a few tens of thousands of instructions per second. And there is a strong motivation to use cheap processors in the embedded portions of the systems because that's what the customers have to buy in quantity. There are at least tens of such processors in each cabinet, and at least tens of such cabinets in a system. One really big traditional telephone system may support tens of thousands of telephones, have cabinets spread across state lines, and effectively have hundreds, if not thousands, of processing elements.
Things work pretty well, because you architect the system initially with a good balance of processor speed and bandwidth. Because the central processor was so slow, there was a built-in natural limit to how many messages it could pump out. Because the communications links were so slow, there was a natural limit to how many messages from various sources could be funnelled down to a single embedded board.
Time passes. You upgrade the central processor with one that is a hundred times faster than its predecessor. You upgrade some of the communication links (but not all of them) to technologies like Ethernet that have a thousand times more bandwidth. You install some (but not all) new embedded boards that have faster processors on them.
A few more years go by. You now have a central processor that can execute a billion instructions per second, thousands of times faster than what you had twenty years ago, in the same system. You replace some (but not all) of the 10mb/s Ethernet with 100mb/s Ethernet. Maybe you didn't even want or need to replace the network technology, but that's what everyone is using now, including your customers. You upgrade a few of those embedded processors (but not all of them) with faster boards, but it is simply not cost effective to re-engineer all of those legacy boards with newer technology, and even if you did, your customers would rebel at the thought of a forklift upgrade.
It's today. Your central processor runs several billion instructions per second on each of several cores. No one even makes your original processor anymore; the closest thing to it is in your wristwatch. You customers want to use gigabit Ethernet, because that's all they run anymore. You start using VOIP, which requires a volume of control messaging like no one has ever seen in the history of telephony. You've expanded your maximum available configuration to many many more cabinets, and customers have systems that span national boundaries.
And you've still got a lot of those original embedded boards, still struggling to deliver dial tone, while getting fire hosed with messages at a rate they never imagined they would ever have to deal with. Even if they don't receive more messages, they receive the same messages but a lot faster, in a burst. The natural throttles on the rate of messaging, the limit on the speed of the central processor, the limit on the speed of their peers, the limit on the speed of the communication channels, are all gone.
And even if those processors can handle the higher rates, they start to behave in funny ways. Before they could do all necessary message processing in their spare time between doing actual work, like making phone calls. Now, when a burst of messages comes through, they have to drop what they are doing and deal with the burst, or else risk losing a crucial control message when the next burst comes in.
In many ways, your system is a victim of its own success: of its longevity in the marketplace, its architecture that allowed customers to expand it incrementally, and the fact that, gosh darn it, people just like it.
Back in May, I talked about how different technologies were on very different growth curves. Microprocessor speeds double about every two years. Memory density doubles about a year and a half. Network bandwidth increases by a factor of ten every decade.
When you upgrade your system, you use what's commercially available, but what's commercially available isn't all growing at the same rate. And some of your system, like those embedded processors, aren't growing at all. What started out as a well behaved system with a balanced architecture becomes unbalanced and starts to, well, get weird. The pattern of behavior of the real-time system as a whole is an emergent behavior, not one that you deliberately architected. And worse, your management starts to wonder what the heck is going on, because, well, how could those old embedded boards be a problem, because with them you haven't changed anything.
No, Bucky, you've changed everything.
This is not a niche issue. Back in May, I talked about the total cost of code ownership. In that article I cited studies that have shown that 67%, a full two thirds, of the total life-cycle cost of software development is code maintenance, and of that 67%, half of that is either perfective (improving correctly working code, not fixing bugs) or adaptive (changing to meet to new requirements). A huge portion of the cost of software development is the modifications and testing necessary to adapt to a changing environment. This has been referred to as software aging: software gets old and cranky and quits working, even though it hasn't been changed. But the context in which it runs has changed.
Traffic management and rate control are tools in the quest for a balanced, scalable, deterministic system. By controlling the rate at which you throw events around, you maintain control over the engineering of your system. Not your customer by deciding what kind of Ethernet to use. Not your server vendor by deciding what kind of Pentium processor to ship you. Not your service provider by moving your communications link to a faster technology. Traffic management and rate control gives you the capability to decide how your system is going to behave, by letting you determine when you turn up the dial, and by how much. It allows you to take control over the emergent behavior of your real-time system.
I've used large scale telephone systems as an example, but that's just an obvious one. I've also worked on distributed systems that controlled real-time devices to shoot a half million frames of photographic film a month. On mass storage systems that delivered gigabytes of data on a daily basis to a room full of supercomputers. On embedded systems where a single board has thirty different processing elements and dozens of communicating real-time tasks. On service oriented architectures that placed a dozen interacting components on a single enterprise service bus. All of these systems face these kinds of issues.
I'm betting yours do too.
Sources
D. Parnas, "Software Aging", Proceedings of the 16th International Conference on Software Engineering, IEEE, May 1994
Chip Overclock, "Traffic Contracts", January 2007
Chip Overclock, "The Generic Cell Rate Algorithm", January 2007
Chip Overclock, "Rate Control Using Throttles", January 2007
Chip Overclock, "Traffic Management", December 2006
Chip Overclock, "It's Not Just About Moore's Law", May 2006
Chip Overclock, "The Total Cost of Code Ownership", May 2006
Wednesday, January 17, 2007
Traffic Contracts
ATM offers several classes of service, but the three that I dealt with exclusively in my work were unspecified bit rate (UBR), constant bit rate (CBR), and variable bit rate (VBR). Each class of service has its own particular form of traffic contract.
The UBR contract is really no contract at all. The event stream makes no promises about how it will behave, and the network makes no promises that any of your events will get to their destination. If the network gets congested, the events in the UBR streams are the first to go. UBR event streams get no respect, but they are also the cheapest to lease from the network service provider, since everything is strictly best effort. Most IP traffic tunnelled over an ATM network is sent UBR. UBR is effectively what you are using if you are not policing or shaping traffic at all in your application.
The CBR contract is the simplest real traffic contract. The event stream guarantees that it will never send events above a specified rate, more or less, and the network guarantees that it will not drop the events in the stream as long as the stream abides by its contract. The problem with CBR contracts is that they are expensive. In order to meet the traffic contract, the network has to reserve the peak bandwidth specified in the traffic contract at all times. Customers paying for CBR service pay for the full bandwidth, 24x7, whether they use it or not. The only applications which really make sense for CBR service are highly time-synchronized event streams of a constant rate, like uncompressed voice, which is exactly what we used it for in the applications on which I worked. Voice encodings like G.711 send an eight-bit voice sample eight thousand times a second whether or not you are speaking. In ATM, because CBR traffic is often very jitter sensitive it typically is transmitted through the network at a very high priority, something like a Presidential motorcade with police escort running all the red lights.
Somewhere in between UBR and CBR is VBR. VBR is a complex traffic contract in which the event stream guarantees that it will never send events above a specified rate, that it will on average send at a typically much lower rate, and if it does send at the higher rate, it will limit the number of events it will send before returning to sending at a lower rate. VBR event streams get higher priority than UBR streams, cost less than CBR streams, and are allowed to occasionally burst at a higher rate when the situation demands it. In the application I worked on, we sent control messages through VBR streams. The control messages got priority over any UBR (typically IP) traffic, yet were guaranteed not to interfere with the CBR voice traffic. If some heavy-duty messaging was required, the VBR stream could briefly emit bursts of messages at a higher rate to clear its buffers. A typical application for the VBR class of service is compressed video.
The parameters of both the CBR and VBR traffic contracts are mapped into the parameters increment (i) and limit (l) of the Generic Cell Rate Algorithm (GCRA), which I described in such detail in my article by the same name that the reader's eyes probably bled. The literature uses the notation
GCRA(i,l)
when defining this mapping.
The traffic contract associated with the CBR class of service has two parameters: the peak cell rate (PCR) in cells per second, and the cell delay variation tolerance (CDVT) in microseconds. Of course, when you adapt the CBR traffic contract to your own application, you'll use whatever units you find convenient. The PCR places an upper bound on the bandwidth used by the event stream. The CDVT provides some slack in the contract to accommodate jitter.
The CBR traffic contract is simply mapped by computing the increment as
1/PCR
because the inverse of the rate is the inter-arrival time, which is what the increment is, and using the CDVT directly as the limit. This is represented as
GCRA(1/PCR, CDVT)
in our notation.
Of course we don't literally compute the reciprocal of the PCR, because that gives us a result in units of a fraction of a second. We divide it into the number of ticks per second in whatever time units we are using. If we are using microseconds in our implementation of the GCRA, then we compute
1 000 000/PCR
to get the increment in microseconds. The choice of time unit for a clock tick, or equivalently the choice of frequency, of our GCRA implementation depends on several factors, for example, the dynamic range of the data type we are using to store time durations in ticks, the maximum rates we need to support, the granularity of the underlying clock in our application, and the maximum time intervals we need to compute. I have implemented GCRA and other rate control algorithms using milliseconds, microseconds, nanoseconds, and CPU clock ticks, and used both 32-bit and 64-bit arithmetic. There is no right answer for all applications, and frequently the correct choice is a compromise that may limit the rate control algorithm in terms of its precision, accuracy, or performance.
The traffic contract associated with the VBR class of service has four parameters: the same PCR and CDVT as for CBR, plus the sustained cell rate (SCR) in cells per second, and the maximum burst size (MBS) in cells. The SCR is the long term average rate of the event stream. The MBS is the largest number of events the event stream can burst at PCR. The VBR contract uses two GCRAs, one that enforces the PCR and the CDVT, and one that enforces the SCR and the MBS. The VBR event stream must conform to both contracts simultaneously.
The first GCRA
GCRA(1/PCR, CDVT)
is the same as the contract used for the CBR class of service. The increment for the second contract is easy: it's the recipricol of the SCR, or
1/SCR
just as with the PCR. The event stream isn't restricted to just emitting at either PCR or SCR. It can emit events at any rate from zero to PCR. This is accommodated by computing a burst tolerance (BT). The BT is computed as
(MBS - 1) * ((1/SCR) - (1/PCR))
where
((1/SCR) - (1/PCR))
is the difference in inter-arrival times between events emitted at SCR and events emitted at PCR, and
(MBS - 1)
is the number of intervals between MBS cells. (There's actually a more complicated explanation for this formula that has to do with the fact that BT is not computed from MBS but rather vice versa, but let's stick with this simpler version for now.) The VBR contract is represented as
GCRA(1/SCR, CDVT + BT)
or equivalently
GCRA(1/SCR, CDVT + ((MBS-1) * ((1/SCR) - (1/PCR))))
where the quantity
CDVT + BT
is the limit. Why do we add CDVT? Because it is perfectly valid for SCR to equal PCR. (I remember vividly having to explain this at length to an engineer for another ATM switch vendor, and why this wasn't equivalent to the CBR class of service in typical ATM chip sets. I may have climbed on my soapbox and waved the TM 4.0 spec around. I apologize if you are reading this.) If this is the case, then unless we add CDVT, the second GCRA becomes
GCRA(1/SCR, 0)
or, because PCR==SCR, equivalently
GCRA(1/PCR, 0)
which is a more restrictive traffic contract than the first GCRA, which is clearly not our intent. What we really want is for the PCR==SCR case to degenerate to
GCRA(1/PCR, CDVT)
and adding the CDVT to the BT accomplishes this. (I didn't find the TM 4.0 specification to be terribly clear on this. It wasn't until I was debugging my traffic shaping firmware with a broadband network analyzer that cost more than I made in a year that I figured this out.)
The Digital Aggregates Buckaroo open source library of Java classes implements the GCRA as a Throttle in the class GenericCellRateAlgorithm with a tick of one microsecond, and that class can be used directly if you care to figure out your own increment and limit parameters. The class CellRateThrottle uses one or two GenericCellRateAlgorithm objects to provide either a CBR or a VBR traffic contract, depending on what constructor you use. These classes are useful for rate controlling in terms of events per second, for example messages per second, packets per second, and the like.
But if you want to emit variable length packets and rate control in terms of bytes per second, calling the methods in these classes for every byte in a packet is not efficient. The classes BandwidthAlgorithm and BandwidthThrottle extend the prior classes by allowing you to specify a traffic contract in terms of bytes per second, using a tick of one nanosecond, and indicate how many bytes you are going to emit in each packet. Because you are presumably handing your packet to some underlying operating system, or at least device driver, to emit at its own rate, these Throttles exhibit burstier behavior than their more ATM-centric counterparts, but the long term effective emission rate is correct.
The Digital Aggregates Desperado open source library of C++ classes has similar tools, but the Java versions represent a couple more years of pondering on my part. I consider the C++ classes to be deprecated, and will sooner or later port the Java implementations into the C++ library.
I believe that rate control becomes more and more crucial the bigger, more complex, and the more distributed your real-time system becomes. Even if you are implementing, say, an enterprise service bus (ESB) in a single JVM, applying traffic contracts to the various components that may interact with one another is an important step in delivering a scalable, reliable, and predictable system that can be debugged in the lab, troubleshot at a customer site, and understood by others. It is even more important if you are building a true distributed system where many traffic sources may be aggregated into a single traffic sink.
Sources
Buckaroo, Digital Aggregates Corp., 2007
Desperado, Digital Aggregates Corp., 2007
Chip Overclock, "The Generic Cell Rate Algorithm", January 2007
Chip Overclock, "Rate Control Using Throttles", January 2007
Chip Overclock, "Traffic Management", December 2006
N. Giroux, et al., Traffic Management 4.0, tm-0056.000, ATM Forum, April 1996, pp. 62-67
J. L. Sloan, "ATM Traffic Management", Digital Aggregates Corp., August 2005
Tuesday, January 16, 2007
The Generic Cell Rate Algorithm
TM 4.0 offers two different explanations of how the GCRA works: the leaky bucket and the virtual scheduler. The two explanations are exactly equivalent, but to this day I really do not like term "leaky bucket", because for lots of folks it seems to suggest that there is some kind of data buffering going on when there isn't (or if there is, it isn't the GCRA doing the buffering). So I prefer to use the term "virtual scheduler", even though when I implement the GCRA, I tend to use the leaky bucket flowchart and terminology from TM 4.0.
As I mentioned briefly in my article "Rate Control Using Throttles", the GCRA has but two parameters: the increment and the limit. The increment is the expected inter-arrival time (IAT) between two successive events in an event stream. If you ideally want to transmit one hundred events evenly dispersed over every second, then you are describing a 100Hz event source whose emissions have an IAT of 1/100th of a second. Your increment would be 0.01 seconds. TM 4.0 specifies a time granularity of one microsecond as a standard for ATM traffic contracts, so in that case you would represent that increment as (1 000 000 / 100) or 10 000 microseconds. Of course, you can implement the GCRA in whatever time units you find appropriate for your application: nanoseconds, microseconds, milliseconds, or even some convenient time interval based on the frequency of your CPU clock. For this reason, I like to talk about time in the GCRA in units of ticks. So given an increment of i, the GCRA expects an ideal event source to emit an event every i ticks.
The increment assumes a smooth flow of data. Life is seldom that simple. Neither software, nor firmware, nor even hardware is capable of emitting events with perfect precision (although the old TDM-based land line telephone system sure tried). And once events are handled by other components, for example, nodes on a network, additional variation is inevitably inserted. This is especially true when multiple event sources are aggregated together. The aggregator (for example, a network router) can only emit one event at a time onto the aggregated output channel. So it selects an event from one input channel, hence delaying all the pending events on all the other input channels. It may select the next event to emit based on many factors, such as the quality of service parameters of the traffic contracts of the individual event streams, service level agreements with the event stream owners (a.k.a. paying customers), or simply round-robin.
Because small variations in the expected IAT of events, also known as jitter, are inevitable, a traffic policer implemented using the GCRA must provide for some slack in its traffic contract. This slack is the second parameter, the limit. The limit is the total cumulative time that successive events in an event stream may deviate from the expected IAT. The limit has a number of applications. When the GCRA is used for traffic policing incoming event streams, the limit prevents the policer from dropping events which have been jittered but are still within the tolerance of the limit. This allows, for example, a policer to pass otherwise well behaved event streams just because they got a little rowdy once in a while ("Events will be events" as my sainted mother never said). When used for traffic shaping, the limit allows the shaper to emit events in short bursts where the IAT between a small number of successive events is smaller than the specified increment. This allows, for example, a shaper to drain a perilously full memory buffer onto the network instead of dropping events on the floor.
I should mention here that jitter is quite different from drift, which is a more or less consistent, or at least larger, variation in the IAT than is associated with jitter. A frequent source of drift is a lack of a common time synchronization source between network nodes. This means each node has a different sense of the passage of time, kind of like that episode of Star Trek. Or maybe it was Twilight Zone. Anyway, because the nodes are timed to different clocks, the event source node believes it is emitting events more or less on time, while the event sink node may believe that every single event is arriving a little early. As we shall see, a GCRA used for traffic policing events streams on the sink node has limited patience with drift. Limited in fact by the limit. The limit is a time budget, and every single early event uses up more of that budget. Once the limit is exhausted, the traffic police will start dropping events, or at least marking them for possible deletion downstream, which is a kind of Scarlet Letter in the networking world.
Although it may sound complicated, the typical implementation of the GCRA is simple and efficient. This will be a little more obvious when we see how the increment and the limit play together, and see how little state the GCRA actually has to maintain and what simple calculations it has to do. The discussion below refers to the following set of diagrams, which show various scenarios of an event stream and a GCRA with an increment i and a limit l. (Click on the thumbnail to see a full size version.) Such a GCRA is typically represented in the literature using the notation GCRA(i, l). Yeah, the variables kinda suck, but I'm trying to stick with the TM 4.0 terminology.

Figure [1] shows a time line as the GCRA sees it. It expects to see events E0, E1, etc. arrive evenly spread exactly i ticks apart, where i is the increment.
Figure [2] shows just such a well behaved event stream, with the actual events A0, A1, etc. all arriving exactly on time. These are the kind of events that got beat up in school and called "mama's boys".
Figure [3] shows event A2 arriving j ticks late. A2 is always arriving late, and always has some good excuse: its car broke down, it had to drop the kids of at school, whatever. You would think the GCRA would give the event stream some credit by maybe increasing its limit by j ticks. Or maybe it would penalize event A2. But instead, when an event arrives late, the GCRA shifts the entire time line down so that the IAT of all subsequent events will be measured relative to the new time line established by the late arriving A2.
Figure [4] shows that event A2 has been somehow dropped by the network and was never seen by the GCRA. The GCRA doesn't care one whit about the fact that A2 was dropped. From its perspective, A3 is the next event, and it arrived i ticks late. So once again the time line is shifted down so that the IAT of all subsequent events will be measured relative to the new time line established by the arrival of A3.
Figure [5] shows that although A1 and A3 arrived right on schedule, A2 was jittered in its passage though the network. Because A2 arrived late, the time line is shifted down just as before. Now, even though A3 arrived right on time as far as the original time line was concerned, the GCRA sees it as arriving early from the perspective of the new time line, just k ticks after A2. So while A2 arrived j ticks late, it is A3, which arrived (i-k) ticks early, which may be penalized. The GCRA compares (i-k) to the limit l and if it is less, A3 is not dropped and the event stream goes on its merry way. However, something interesting happens to all subsequent events A4, A5 and on to infinity. Even through they are arriving spot on in terms of the original time line, they are all (i-k) ticks early from the perspective of the new time line. The limit l has been reduced by (i-k) ticks, and will remain so until an event arrives on or after its expected time on the new time line.
Because jitter is a more or less random mechanism, and it is about as likely that events in the same event stream arrive late as they do early, assuming they were emitted with the correct timing from their original event source, this works. When an event arrives early, part of the limit budget is consumed. But when a subsequent event in the same event stream arrives on time or late, the time line is shifted down and all prior sins are forgiven.
This is also why drift is a problem. If an event source is emitting events too quickly for any reason (drift being just a common one), each early event uses up part of the limit budget until it is exhausted. This can happen because a single event arrived very very early. But it can also happen gradually, as each successive event uses up a tiny fraction of the limit. This can result in network links that appear to work just fine for days, until they suddenly and mysteriously quit working altogether. This is one of the reasons that telephony networks carrying real-time voice samples are carefully time synchronized to each other and to common timing sources of very high precision, typically atomic clocks. Timing synchronization can be a big deal.
Figure [6] shows event A3 arriving early but within the limit of the GCRA. But A4 arrives equally early, and its expected arrival time E4 is beyond the limit of the GCRA. A4 is dropped, as would be all subsequent events, until one arrives at or after its expected time on the current time line. In this example, A5 arrives at its expected time E5, and the limit budget is restored.
Finally, it is instructive to show some code snippets for a GCRA implementation just to show how simple it is. Below are some Java snippets (with some distracting details removed) borrowed from the GenericCellRateAlgorithm class in the Digital Aggregates Buckaroo library. First, there is the state maintained by the GCRA. The Java class maintains this state in long variables because they contain time durations in ticks. Because ticks are typically in units of microseconds or nanoseconds, the large range of the 64-bit integer is necessary. Since most modern processors implement 64-bit integers directly, this isn't typically a problem (although I have implemented 32-bit versions of the GCRA on embedded systems).
long increment;
long limit;
long now; // time of this event
long then; // time of prior event
long x = 0; // expected ticks
long x1 = 0; // actual ticks
This next snippet computes the delay (if any) for an event when the current time is now, and the prior event was emitted then. The difference between now and then is the IAT if the current event were to be emitted.
long delay = 0;
long iat = now - then;
if (x <= iat) {
x1 = 0;
} else {
x1 = x - iat;
if (x1 > limit) {
delay = x1 - limit;
}
}
if (delay == 0) {
then = now;
x = x1 + increment;
}
If the delay is zero, the event conforms to the traffic contract implemented by the GCRA and may be emitted, in which case the state of the GCRA has to be updated with the expected IAT of the next event. If the delay is greater than zero, the emission of the event must be delayed (traffic shaping) or the event violates the traffic contract of its event stream and may be dropped (traffic policing).
That's it! No loops. No complicated math. No floating point.
This has been a whirlwind tour of the Generic Cell Rate Algorithm. But the GCRA is seldom used directly. One or more GCRAs are typically combined, with their increment and limit parameters computed from traffic contract parameters in more useful forms, like events per second. We will talk about how traffic contracts are mapped into one or more GCRAs in a later article.
Sources
Buckaroo, Digital Aggregates Corp., 2007
Chip Overclock, "Rate Control Using Throttles", January 2007
N. Giroux, et al., Traffic Management 4.0, tm-0056.000, ATM Forum, April 1996, pp. 62-67
J. L. Sloan, "ATM Traffic Management", Digital Aggregates Corp., August 2005
Friday, January 12, 2007
Rate Control Using Throttles
They are also simple enough to be implemented efficiently in firmware or software and incorporated into applications ranging from embedded systems to enterprise Service Oriented Architectures. Several commercial telecommunications products contain proprietary rate control software that I developed. Digital Aggregates provides clean-room implementations of several rate control algorithms based on public standards. These are available in C++ in the open source (LPGL) Desperado library, and in Java in the open source (Apache 2.0) Buckaroo library.
Historically - in this case meaning in my personal history - rate control has been cast in terms of the emission of ATM cells. Much of the rate control software that I have developed has been based on the virtual scheduler described in the ATM Forum Traffic Management 4.0 specification. This is because I spent several years at Bell Laboratories immersed in the development of ATM devices, including an ATM network interface card that supported Voice and Telephony Over ATM (VTOA) with traffic shaping, and which to this day is still arguably the most complex single board ever produced by that organization, and an ATM network switch with a sophisticated Connection Admission Control (CAC) algorithm. Alas, ATM became the Sony Betamax of network technologies, along with all that implies. However, replace the word cell with buffer, packet, datagram, log message, event, or foo, and the same code works equally well in non-ATM applications. (You can replace cell with byte too, but calling this software for every single byte you want to transmit is not cost effective. Stay tuned for future developments.)
The basic abstraction for the Desperado and Buckaroo rate control classes is the Throttle. Throttles are objects that maintain internal state about the event stream they are rate controlling. Before emitting an event (whatever the words emit and event mean in context), the application calls the Throttle to ask if the event is admissible. If the Throttle says that it is, the application emits the event and commits the Throttle state. If it is not, the application does not emit the event and instead rolls back the Throttle state. What happens next depends on what kind of Throttle you have.
Throttles may be time-based or event-based. Time-based throttles base admission decisions on temporal factors such as the inter-arrival time between successive events. They are initialized with a specific traffic contract that formally describes the expected (traffic policing) or desired (traffic shaping) emission behavior of the event stream that they are rate controlling. Event-based throttles base admission decisions on something other than time, such as the number of events already admitted. Throttles may be compounded together to form composite Throttles that exhibit behavior that may be quite complex. Think of it as adding simple periodic waveforms together to get speech or music.
When using an event-based Throttle and the event is not admissible, the event is not emitted, and the application may try again later at some indeterminate time. The Geometric Throttle admits events following a geometric progression: it admits the first event, the second event, the fourth event, the eighth event, up to and including the thirtieth event, after which no more events are admitted until the Throttle is reset. This simple Throttle is very useful in embedded systems for controlling error messages to a log file or a console when a persistent failure has occurred. The application only emits the error message if it is admissible, and it resets the Throttle once the error clears, placing the Throttle back in its initial state. Several error messages will be emitted initially for a particular failure, then with ever decreasing frequency, until finally no more error messages appear. This is a good strategy whether the log is being watched in real-time or being written to a file, because it produces the occasional reminder that the error still exists without resulting in a fire hose of error messages.
When using a time-based Throttle and the event is not admissible, the event is not emitted, and the Throttle gives the application a delay interval, in units like microseconds, that indicates how far in the future the event will become admissible. The application can delay emitting the event until this interval has passed, at which point it asks again if the event is admissible to update the Throttle state, commits the state, and emits the event. Time-based throttles are ideal for shaping outgoing data streams to make a message producer a good citizen that plays well with others, and for policing incoming data streams to prevent a misbehaving message producer from fire hosing a message consumer.
Below is a Java code snippet (with some distracting details removed) that shows a trivial application initializing and then using a Throttle to control the rate at which events are emitted by doing a simple Thread sleep. Without going into too much explanation, this gives you the flavor of how a Throttle might be used.
int pcr = 10; // Peak Cell Rate
int cdvt = 250; // Cell Delay Variation Tolerance
int scr = 1; // Sustained Cell Rate
int mbs = 10; // Maximum Burst Size
Throttle th =
new CellRateThrottle(pcr, cdvt, scr, mbs);
while (true) {
long delay = th.admissible();
while (delay > 0) {
th.rollback();
Thread.sleep(delay);
delay = th.admissible();
}
th.commit();
emit(event());
}
The Generic Cell Rate Algorithm (GCRA) is a time-based Throttle which is based on the virtual scheduler. Its traffic contract consists of just two parameters, the increment, which is the expected or desired inter-arrival time between events, and the limit, which is the total variation from the inter-arrival time that the event stream may exhibit before being in violation of the traffic contract. Telecommunications folk will recognize this slack in the traffic contract as an accommodation for jitter, small variations in the inter-arrival time of events. It is impossible for firmware or even hardware to nail the emission of events exactly on the dot time-wise. And when aggregating multiple event streams into a single event stream, one event has to be emitted before another, regardless of their respective traffic contracts. Jitter happens. As someone once said, "perfection is the enemy of good enough", and rate control mechanisms have to take this into account. If you find it impossible to emit event streams in conformance with their individual traffic contracts, either your traffic contracts are unrealistic, or you have a network engineering problem that you need to solve.
However, there may be legitimate reasons to commit the Throttle state and emit an event even if the event is not admissible. It may well be better to emit an event at a time that violates its traffic contract and let it take its chances being dropped due to congestion by a downstream node in the network, than to guarantee that it is dropped due to, for example, a full buffer in the consumer, before it is ever emitted into the network. (It also may well be that this is not the case; your mileage may vary.) If the application commits the Throttle for an event which is not admissible, the Throttle becomes alarmed. The Throttle remains in the alarmed state until the event stream once again conforms to its traffic contract.
The Cell Rate Throttle is a composite Throttle made up of two GCRA Throttles. It implements a complex traffic contract based on parameters that may sound a little more approachable than increment and limit. The Peak Cell Rate (PCR) describes the maximum expected or desired rate of event emission in cells per second. The Cell Delay Variation Tolerance (CDVT) is the limit for jitter in the PCR, typically in microseconds. The Sustained Cell Rate (SCR) is the long term average rate of event emission in cells per second. The Maximum Burst Size (MBS) is the number of cells that maybe emitted at PCR (which is presumably greater than SCR) without violating the traffic contract. The math behind the GCRA and the virtual scheduler calculates the equivalent burst size for any emission rate between PCR and SCR. The Cell Rate Throttle allows an event stream to burst for short periods without violating its long term average rate. This may sound like rocket science, but once again, the implementation of this mechanism is actually very simple and efficient.
Each time-based Throttle implementation has a time granularity known as a tick. A tick may be a millisecond, microsecond, nanosecond, or even some arbitrary unit that may not even be representable as a whole number and is perhaps based on the underlying CPU clock. The inverse of this granularity, that is, the number of ticks per second, is known as the Throttle's frequency. A time-based Throttle may be queried as to its frequency, and each time-based Throttle provides a method that returns the current time in units appropriate to its frequency. The current time may be equivalent to wall clock time relative to some epoch, or merely a monotonically increasing counter with a consistent interval of change suitable for measuring arbitrary time durations. This flexibility in regards to how time is measured makes Throttles easily adaptable to many operating system platforms and hardware targets.
A few more minor details: Throttles may be reset at any time to return them to their initial state, causing them to forget any past sins of the event stream they are rate controlling. The first event for any Throttle is guaranteed to be admissible. Throttles are guaranteed not to alarm as long as their event streams conform to their traffic contracts.
This has been a brief introduction to Throttles. I believe that rate control is crucial for any real-time system that is to be robust, scalable, and supportable, and most especially one in which event streams must be aggregated. Using Throttles and other rate control mechanisms for traffic shaping and traffic policing make producers and consumers well behaved, and allows you to make useful assumptions about traffic, load, and quality of service when engineering a network, configuring a large distributed application, or troubleshooting a customer system. In the future I will discuss traffic contracts in more detail, and the theoretical underpinnings of the Generic Cell Rate Algorithm and the virtual scheduler.
Sources
Buckaroo, Digital Aggregates Corp., 2007
Desperado, Digital Aggregates Corp., 2007
Chip Overclock, "Traffic Management", December 2006
N. Giroux, et al., Traffic Management 4.0, tm-0056.000, ATM Forum, April 1996, pp. 62-67
J. L. Sloan, "ATM Traffic Management", Digital Aggregates Corp., August 2005
Tuesday, January 09, 2007
The One Percent Inspiration
When I am working on a problem I never think about beauty. I only think about how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong.
But this is a phenomena well known to software developers too, and probably designers in all domains. I suggest that successful designers in any domain require a high degree of left-brain/right-brain integration. Studies have shown that in general the left half of the brain handles the concrete (arithmetic), while the right half handles the abstract (philosophy). For the problem domain I know something about, software design, the detail-oriented job of writing good code surely must be a left-brain thing, but the esthetic from which good design emerges rests in the right-brain.
I need all the help I can get for the right half of my brain. That's my rationale as to why I can often be caught talking aloud to myself: it has been suggested that this is one way to increase the bandwidth between the halves of the brain. (My spousal unit would counter that it is because I like the sound of my own voice.) Lately I've been evaluating another tool to aid creativity and inspiration: design cards. These are decks of cards, reminiscent of the flash cards that some of us may be old enough to remember, that are intended to spur discussion, jog memories, suggest alternatives, and foster innovation.
I have four different decks of design cards that I've been comparing. They differ greatly in terms of execution and intent, and range from being very general (Oblique Strategies: "Remove ambiguities and convert to specifics") to quite specific (Drivers of Change 2006: "What will we do in a global knowledge economy when we have outsourced everything?"). While they are in no way specific to software development, it is my belief that they could be useful in software design or elsewhere in the product development process.
Here are capsule reviews of all four decks in order from general to specific. (In all cases you can click on the image of the card to see a larger version.)
Oblique Strategies (5th edition)
103 cards, 9.5cm x 7cm each
Brian Eno and Peter Schmid
http://www.rtqe.net/ObliqueStrategies/
The Oblique Strategies cards are the oldest of the design card decks, and perhaps the most generally well known, the first edition having been published around 1975. Designed by Brian Eno (yes, that Brian Eno) and Peter Schmidt, the cards are about the size of playing cards and are one-sided (the backs of the cards are uniformly black). Each card offers a zen-like phrase, like "Gardening, not architecture", intended to foster out-of-the-box thinking.
The typical response of the engineer in me (meaning: my left-brain) is WTF? But the aesthete in me (right-brain) likes these cards a lot. I think their strength is in their ambiguity. The halves of your brain struggle to agree on an interpretation, and after a few seconds you suddenly realize that "Remove ambiguities and convert to specifics" might actually be pretty darn good advice when grappling with an issue of abstraction versus implementation in software design.
The Oblique Strategies cards are worth a look if nothing else because they have a time-proven track record. I find them useful to sort through and ponder when I get stuck on a design problem or even a difficult bug.
Design and Beauty Cards
61 cards, 15cm x 10cm each
tompeters!
http://www.tompeters.com/
(These appear to be out of print already.)
Author and management consultant Tom Peters has become controversial in recent years when fans of evidence-based management started asking what happened to those great companies, like Data General, DEC, and Wang, he described in his book In Search of Excellence, many of whom did not survive the dot.com crash. Then there was the 2001 Fast Company interview in which he suggested he might have cooked the data. Well, Peters would hardly be the first guru I've followed that had feet of clay, and I still find much wisdom in his work.
His Design and Beauty cards are flash-card-sized, and have a front side with a basic principle ("design for error") with an appropriate graphic design, and a back side with more detail ("Big idea: to err is human. And it's usually the designer's fault. Don't think 'pilot error,' think 'designer error.' So trouble shoot in advance. Examine all the possible ways someone naive might screw up your project's deliverables. Know where those errors might occur and compensate in the design").

The layout of the Design and Beauty cards is clever: the front side appeals to the right-brain, the back side to the left. They lack the appealing ambiguity of the Oblique Strategies cards, and because they are more specific, you might have to go through quite a bit of the deck to find a card that you think applies to the quandry you are trying to solve. They have the look of a set of Peters' PowerPoint slides, and I'm sure that's how they started out. But they offer rules of thumb that I think would be useful whether you are designing software, brainstorming a new product, or planning a marketing campaign.
The instructions suggest going through the entire deck and sorting the cards into doesn't apply, might apply, and definitely apply. Used in this way, as memory joggers, issue reminders, and discussion motivators, they could be quite useful, particularly in a group setting.
IDEO Method Cards
50 cards, 14cm x 9cm each
IDEO
http://www.ideo.com/methodcards/MethodDeck/index.html
Palo Alto design house IDEO is known for winning more BusinessWeek Industrial Design Excellence Awards than any other firm. They have designed products ranging from personal digital assistants to office chairs. Their Method Cards on the surface are reminiscent of Peters' Design and Beauty cards: clever appropriate graphic on the front, detail on the back.

They differ in that the detail on the back of each card is more focused on specific strategies to be used in product development. For example, under "Scenarios": "HOW: Illustrate a character-rich story line describing the context of use for a product or service. WHY: This process helps to communciate and test the essence of a design idea within its probable context of use. It is especially useful for the evaluation of service concepts." That's virtually word for word out of the software developer's "user story" and "use case" playbook.
The cards are organized into four categories: learn, look, ask, and try, encouraging the user to be more end-user facing and more evidence-based. Echoing IDEO's human factors approach to design, the cards are very human-centric in their suggestions. I think they would be especially useful in early stages of product development, including software products, where requirements for the user interface (whether it be a GUI or a door handle) are being discussed.
Drivers of Change 2006
50 cards, 16.5cm x 11.5cm each
ARUP
http://2006.driversofchange.com/
The Drivers of Change (2006 edition) cards, from the U.K.-based design and engineering firm ARUP, are by far the most detailed and specific of those discussed here. Each largish card has the format of a card-catalog card, with an index tab identifying its driver as social, technology, environment, economic, and political. The front of each card has the usual graphic and general topic, but also includes a paragraph with more detail. The back of each card has even more data, frequently including charts and graphs to draw in those folks who scored a high Sensing in the Myers-Briggs Type Indicator.

If anything, the Drivers of Change cards are too dense with information. They tend to distract me from the problem at hand as I stop for several minutes to read all the detail present on both sides of the card. The timely data they offer will appeal to the left-brain-dominants among us, but will make the cards less useful over time. I can definitely see ordering a new set of Drivers of Change as later editions become available.
While I am not convinced of their usefulness for design, I think they would be very apropos early in the product development lifecycle, perhaps when brainstorming with a group of right-brainers who need a reality check. For example, if designing a product for deployment into third-world countries, the card "Literacy - who taught you to read? 18% of the world's adult population cannot read this card (whatever language it is printed in)." might give you pause. (And that's just the front side.) Even better, the deck includes a set of footnotes leading you to the raw data, which is frequently available on the web.
I find the Drivers of Change cards fascinating in and of themselves, and sometimes a little scary. I suspect I will eventually read all of them, front to back.
The cost of each of these decks of cards is on the order of tens of dollars, and often I found better prices at web sites selling architectural books. These card decks could be a valuable tool for any group or individual who needs a little help with left-brain/right-brain integration.
Sunday, January 07, 2007
The Wisdom of Amazon.com
Not too long ago, I was cruising around on Amazon.com looking for any classical music CDs that featured the tango Por Una Cabeza, which was featured in the movie True Lies. Thirty seconds of Googling for "true lies tango" had gotten me the title. When I typed that into Amazon's search engine, I got back several classical music CDs, and the soundtrack to True Lies.
On the surface, this might not seem unusual. Except that Por Una Cabeza does not appear on the soundtrack by Brad Fiedel. I knew this already because I owned soundtrack CD. There was absolutely no reference to the tango on the CD, in the liner notes, etc.
How did Amazon.com know that that tango was used in the movie? I don't think it did. I think the search engine merely noticed that people who searched for "Por Una Cabeza" also purchased the soundtrack to True Lies. So in a marketing move that seemed almost omniscient, and without really knowing why, it suggested that CD too. This is not the first time Amazon.com has startled me. (Actually, it startles me every month when my Visa bill arrives. One-click-ordering maybe works a little too well.)
This is a great example of what James Surowiecki talks about in his book The Wisdom of Crowds, where you leverage the combined knowledge of a very large sample of people, including both experts and amateurs. While a particular individual's knowledge may not add up to much, you get enough data points, start doing some correlations, or apply it all in a market-based system, and suddenly you not only have a lot of data, but some real information as well.
If you want some examples of the really counter-intuitive results you may get when you start looking at statistical correlations in large data sets, see Steven Levitt and Stephen Dubner's book Freakonomics. The controversial result that is mentioned in all the reviews is that the reduction in the crime rate correlates to the legalization of abortion. Unwanted babies that are not born do not grow up to become criminals. But this is merely one among many compelling chapters.
Peter Morville is a librarian, a librarian's librarian. He's the kind of librarian that might be (and probably has been) hired as a consultant by the Library of Congress or maybe Google. Morville invented the field of information architecture. His book Ambient Findability is about life in a world in which virtually any fact might be at your fingertips, if you only knew where to look for it, how to ask for it, and if it were organized to maximize its findability.
One of the topics Morville covers is the difference between organizing information in formal taxonomies versus the emerging folksonomies. Formal taxonomies are very expensive to create because they are labor intensive, but you are leveraging the wisdom, experience, and knowledge of highly trained experts in a field. The Library of Congress system of classifying books is a formal taxonomy. All those Latin words you had to learn in high-school biology were probably part of a formal taxonomy as well. Folksonomies may be very expensive to create as well, but the cost is widely distributed and the labor frequently donated, but you're leveraging a lot of people whose credentials are more or less unknown. Folksonomies abound on the web, ranging from del.icio.us (which is completely driven by users adding their own bookmarks categorized by tags that they themselves apply), to SlashDot (whose value-add is its editorial moderation as applied to the firehose of contributed news items), to the kind of information organization done by Amazon.com (in which software extracts patterns of behavior from a huge sample size with little or no human intervention).
It is this difference in economics between the formal taxonomy and the folksonomy that killed the original Yahoo, which the old coots in the audience may recall was once not a search engine but a formal taxonomy of web sites. It is also what made a star of Google, which is likely to become the emerging artificial intelligence that eventually conquers the planet.
I loved Malcolm Gladwell's book The Tipping Point, which applied theories of how an outbreak of disease becomes an epidemic to everything from fashion to open source software development. His more recent book Blink is about how sometimes snap judgements are more accurate than informed opinions formed by long studious research. The weakness in Blink is that Gladwell does not stress nearly enough that in all of his case studies of amazingly successful snap judgements, they were all made by the kinds of people you would want drawing up your formal taxonomies: experts in the field with decades of experience, whose brains contain finely honed neural networks capable of delivering useful snap judgements.
When the software refactoring folks talk about code smells, this is exactly what they are talking about: the sensation you get when you look at a piece of code, and something in the back of your mind tells you this looks funny. This is not an intuition that occurs to the average joe off the street. It is the result of many many hours spent finding that frackin' bug, and swearing, as God is my witness, I will never make this mistake again! There is nothing like a weekend spent at work to train your neural network to be more careful next time.
As useful as folksonomies obviously are, we need to keep a few experts around too for the value-add that their neural networks bring to the table. On the flip side, Surowiecki would tell us that it is just as bad having nothing but experts. True wisdom comes from having a wealth of experience, a breath of knowledge, and a diversity of opinions.
Sources
Malcolm Gladwell, Blink, Little, Brown, and Company, 2005
Malcolm Gladwell, The Tipping Point, Little, Brown, and Company, 2002
Steven Levitt, Stephen Dubner, Freakonomics, William Morrow, 2006
Peter Morville, Ambient Findability, O'Reilly, 2005
James Surowiecki, The Wisdom of Crowds, Doubleday, 2004
Wednesday, January 03, 2007
Version Control as a Victim of the Times
I can't install, use, and manage a source code control system without knowing at least a little but about how it works under the hood. Looking back on my varied and hard-won experience in information technology and product development, I have come to the following conclusion: from strictly a user point of view, all of the version control systems I've used are pretty much the same. Yes, I can already hear the mob marching on the castle with flaming torches. But the fact is, there are a handful of basic things you always have to do when juggling multiple versions of digital stuff, and every version control system has to implement, with more or less success, the user's Platonic ideal of what such a system should do. Where they really differ is in how they are implemented, and what ancillary features they offer that take advantage of new enabling technologies.
Back in May I wrote about how architecture and design was driven not just by Moore's Law, which, broadly speaking, addresses just CPU horsepower, but also by similar exponential growth curves for things like network bandwidth, network connectivity, secondary storage density, and even bus width and speed. I argued that since these metrics were growing on very different curves, the architecture of today might not be the architecture of tomorrow, because what it took to design a balanced system would be so different in just a few years.
Version control systems have been victims of these different curves in a big way. Looking at how SCCS works versus how Subversion works under the hood, you can tell that the authors of both were looking at very different design points: "storage is expensive" versus "storage is cheap", or "development is economical because it is centralized" versus "development is expensive because it is distributed", "legacy code is relatively stable" versus "refactoring is a way of life" to name just a few. The legacy version control systems have not been able to keep up with the changing times.
And so, I am sure, in just a few years I will once again be installing and learning to manage and use yet another version control system. It seems unlikely, from a historical perspective, that Subversion will be the end-all and be-all.
The King is dead! Long live the King!
Sources
Ben Collins-Sussman et al., Version Control with Subversion, Subversion 1.2
Chip Overclock, "It's Not Just About Moore's Law", May 2006


