Wednesday, December 23, 2009

Product Development as Warfighting

Pop quiz. Closed book. Multiple choice. Complete the following.

John Boyd (1927-1997) was:

A. in the U.S. Army Air Corps and later the U.S. Air Force, and served in three wars.

B. the chief instructor at the Fighter Weapons School at Nellis Air Force Base in Nevada. He was known as Forty-Second Boyd for his boast that he could win any dogfight in forty seconds. He was undefeated.

C. the author of Aerial Attack Study, the first textbook on air combat. The book, in either its original or pirated form, became the principal textbook on air combat for every air force in the world.

D. the inventor of Energy-Maneuverability Theory, which applied thermodynamics to aircraft performance. E-M Theory became an invaluable evidence-based tool in the design, specification, evaluation, and comparison of aircraft.

E. a key player in the design of the F-15, F-16, and A-10 aircraft, and a fierce opponent of the initial flawed designs of the M1 Abrams tank and the Bradley Fighting Vehicle.

F. the thought leader of the Defense Reform Movement at the Pentagon, applying an evidence-based approach to military procurement. He required for the first time in decades that competing defense contractors build working prototypes of weapons systems so that they could be directly compared.

G. a retired Air Force officer who synthesized centuries of strategic thinking to codify Maneuver Warfare. He lectured at Quantico and his work was eventually adopted by the U.S. Marine Corps as official doctrine. A statue of Boyd, in his Air Force flight suit, can be found at the Marine Corps Research Center at Quantico.

Of course it's a trick question. Would you expect less from me? If you or I had achieved any one of these accomplishments, we would have considered it the capstone of a brilliant career. Boyd achieved all of them.

That's not all. He desegregated the Las Vegas casinos in the late 1950s by insisting that his black officers be allowed to dine with the rest of his men in the restaurants. And for a time he commanded what was probably the most secret base in Southeast Asia, responsible for monitoring the Ho Chi Minh trail.

But his achievements did not come without a price. He barely spoke to his wife, although he apparently spent enough time with her to have five children, whom he also ignored. His career stalled at the rank of full colonel because he had become a thorn in the side of too many senior officers in all branches of service. In retirement, he refused to double dip, and his pension was just enough to house his large family in a tiny basement apartment in a bad part of town. If you saw him on the street, you might conclude from the way he was dressed and acted that he was among the ranks of the homeless instead of one of the greatest strategic thinkers of all time.

And although nothing I have read has ever suggested it, all of the descriptions of Boyd's behavior and mannerisms point to the symptoms of Asperger Syndrome. It's like checking off items on a checklist. Economist Tyler Cowen has written that for some professions, Asperger's or related traits on the autism spectrum are not necessarily dysfunctions but can actually be a competitive advantage. Those with Asperger's are frequently able to bring an extreme mental focus, an obsessive attention to detail, and other cognitive abilities to bear on highly complex problems. And the social skills that come naturally to others can be learned.

One thing is for sure: if Boyd did have Asperger's, he was probably in good company. It has been speculated that luminaries ranging from Thomas Jefferson to Albert Einstein to Alan Turing also lived on the autism spectrum.

There are several recurring themes in Boyd's work, briefings, and teachings.

People. Ideas. Things.

It took me decades to figure this out on my own. In any endeavor, people are more important than ideas. Ideas are more important than things. Things (or as Boyd put it, hardware) are the least important of all.

Boyd, who felt that the Air Force was too much a technocracy enamored with the latest new hardware, preached this time and time again to anyone that would listen. It is people who make the difference. It is the idea that can be more broadly applied. The hardware is just a passing fad.

This is the mistake that many technologists make in their personal lives. When given the choice between going out with your friends and sitting at home alone cruising the web, generally the best long term choice is to go with your friends. If trying to decide between spending money on travel with your spouse or buying a new home computer, take the trip. Learning a new concept is more important than learning how to use the latest gadget.

It's also the mistake made by many hiring managers, architects, and technical leads. A really good developer can more than make up for marginal or legacy technology. A developer who is good at synthesizing and applying new ideas is better than a developer who merely has the latest new technology on their resume.

This is also why design patterns are more important than specific tools, because the pattern is more broadly applicable. A developer who can intuitively apply design patterns is even better.

To be or to do.

Boyd saw that a lot of the officers at the Pentagon were careerists, more interested in getting promoted than doing the right thing. He also realized that in a bureaucracy it was frequently impossible to do both. Advancement in a bureaucracy typically depends on defending the status quo, not on shaking things up. Boyd lectured his followers on the need to decide whether they were going to be something or to do something.

George Bernard Shaw once said "The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." The fact that Boyd and many of his followers were unreasonable men is why most of them saw their careers stall in mid-flight: they chose to accomplish something great rather than to become something great.

I should be the last one to give you career advice. But I will tell you that if you have any ambition at all, you will face this choice, probably many times, during your career in high-technology. I personally have always chosen to do rather than to be. But then again, I am living proof that you can spend a decade in an organization without getting promoted or accomplishing anything significant. If you find yourself in that situation, then both of your engines have flamed out and your plane is on fire. Consider bailing out.

Fast Transients.

It may seem strange, particularly to those who have served in the military, that a self-described "dumb fighter jock" would have revolutionized both air combat and ground warfare. But Boyd found many similarities. All of his work revealed the importance of fast transients: the ability, and the willingness, to change course quickly. It was this critical aspect of his concept of air combat that directly led to the design of the small, nimble, and very successful F-16 fighter, when all Air Force doctrine at the time called for higher, further, and faster in a straight line. It was also the main idea behind maneuver warfare: move quickly, probe for weaknesses, don't get bogged down, don't worry about your flanks, let the enemy worry about his flanks. This can be applied to business, to development processes, and to architecture and design.

If your adversary is on your tail, not changing anything will not improve your situation. Changing too slowly will just give him time to adapt. You have to commit to change, to changing quickly, and to do so in a way that your adversary cannot anticipate. You must exploit what Sun Tzu referred to as cheng and ch'i, the expected and the unexpected. This is the core of innovation: doing the unexpected (and is what Apple, in my opinion, is very good at).

Be prepared to toss out a business plan or a technical design when new evidence comes to light that makes the current path unworkable. For sure, fix it if it can be fixed. But if it can't, don't spend endless meetings debating or fretting about it or moaning about the compromise of your beautiful aesthetic design. You're a craftsman, not an artist. Take the control stick in your hands and change course.

If during development you become blocked on a problem, don't sit around waiting for things to change. Find an area to work where you are not blocked. Make progress where you can. You will accomplish something while waiting for the blockage to resolve itself, and you may find that the progress that you made actually led to its resolution. This is the essence of blitzkrieg.

When designing complex real-time systems, expect emergent behavior. Be prepared to deal with it. Expect failures to cascade quickly through the layers of your architecture. Don't get so overwhelmed with a fire hose of log messages that you can no longer tell what is going on. More is not necessarily better. Eliminate detail. Reduce the logging level until the chaff and smoke and clutter are reduced so that you can clearly see your target. Achieve Fingerspitzengefühl, or finger-tip feel, an intuitive mental model of how your system works and reacts.

Keep the end goal in sight. It is easy for technologists to become sidetracked with the cool details of the latest new thing, to become tied up in dealing with a stack of non-critical bug reports, or to continue to refine and refactor working code until it meets some arbitrary personal design aesthetic. Don't get bogged down in the thousands of tiny technical details so that you can no longer see the big picture. Make all decisions based on what will help you achieve your end goal, which is always shipping a product. Organizational guru Stephen Covey says "begin with the end in mind", but among warfighters this is what is known as schwerpunkt: maintaining a focus on the objective.

Observe, Orient, Decide, Act.

Boyd's most famous idea (and probably the most misunderstood, including by me) is the OODA Loop. Boyd described how people process information to deal with the outside world. In his briefings he applied it to conflict, but it's much more broadly applicable than that. You observe what is happening. You orient, by which he meant you synthesize and interpret what is happening based on your own experience and cognitive biases. You decide on a course of action. You act on that decision. Because your actions change the circumstances in which you find yourself, you iterate and do it all over again.

When in a conflict with an adversary, both parties are iterating in their own OODA Loops. The key to success is to be able to iterate faster than your adversary. Boyd called this getting inside your adversary's decision loop. When applied on the battlefield, Marine Corps commanders remarked that it was like you were running both sides of the battle. The enemy spends all their cognitive bandwidth trying to figure out what the heck you are doing and reacting to old news.

Of all of Boyd's ideas, this is the one that has gotten the most press, mostly in the business world, for obvious reasons. It is the idea of fast transients writ large, taken from the realm of air combat and applied to conflict in general. Success in competition relies on not just making the correct decision based on the evidence at hand, but making that decision faster than the other guy.

What makes Boyd's OODA Loops go beyond just fast transients is the fact that you can negatively impact your adversary by controlling how and what he observes, and by exploiting his cognitive biases in how he interprets it. This is the role of most security and intelligence organizations (including business security and intelligence), but is crucial even among field commanders. You can force your adversary into making decisions based on the wrong information. Because you can do this to your adversary, you must assume that he is doing it to you.

Sometimes your adversary is an architecture, a design, or a legacy code base. You have to be aware of what your cognitive biases are, and whether they are blinding you to what can be done, or what needs to be done. This is one of the reasons I believe developers should learn new programming languages, particularly languages that are very different from what they use everyday, even if they never intend to use those different languages in production.

For example, if you spend most of your time using procedural languages like C, you should learn object oriented languages like C++. If you use an unmanaged language like C++, you should learn Java. If you are a C, C++ and Java expert, you should learn Prolog, Snobol, Lisp, Haskell, Smalltalk, or any number of other non-traditional languages, just to learn a radically different mindset and approach to solving problems. You may find idioms and patterns in those languages that you can productively apply to your everyday work.

Since ideas are more important than things, learning concepts like finite state machines, push-down automata, and formal grammars, which can be applied in any language, is really useful too. I had several courses in graduate school on automata and formal languages that were ostensibly theory. Yet they completely changed, for the better, the way I view many programming issues I encounter in my daily work.
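To make that concrete, here is a tiny finite state machine in C. It is my own illustrative example, not from any course: a three-state recognizer that decides whether an input string contains the substring "ab".

```c
#include <assert.h>

/* A tiny finite state machine (illustrative example): decides whether
 * the input string contains the substring "ab". */
typedef enum { START, SAW_A, SAW_AB } state_t;

int contains_ab(const char * input) {
    state_t state = START;
    for (const char * pp = input; *pp != '\0'; ++pp) {
        switch (state) {
        case START:
            state = (*pp == 'a') ? SAW_A : START;
            break;
        case SAW_A:
            state = (*pp == 'b') ? SAW_AB : ((*pp == 'a') ? SAW_A : START);
            break;
        case SAW_AB:
            break; /* the accepting state is absorbing */
        }
    }
    return (state == SAW_AB);
}
```

The same three states and transitions could be written in any language, which is exactly the point: the concept transfers even when the syntax does not.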

A point about OODA Loops that is frequently ignored, but is clear from Boyd's briefing slides, is that this iterative process is fractal: there are loops within loops at all different levels of scale. This fractally iterative process describes how I see the development process in general: you constantly iterate at all levels of scale: implementation, design, architecture, and requirements, with feedback from all levels constantly changing other levels as you decide what can and cannot be done, what must be done and what is optional, given the technology, resources, and schedule. This is really what Agile Development at its core is all about: fast yet effective iteration.

John Boyd is my hero.

I find the story of John Boyd to not only be inspirational, but also a cautionary tale regarding work-life balance and the fine line between genius and dysfunction. Because Boyd chose to deliver his message primarily through classified briefings, and seldom if ever published any of his ideas, he is largely unknown today. When his work is used, he typically goes uncredited. This is even though his ideas revolutionized much thinking in the military, in business, and for me, in high-technology development.

John Boyd died of cancer in 1997. He is buried in Arlington National Cemetery.


Robert Coram, Boyd: The Fighter Pilot Who Changed the Art of War, Little, Brown and Company, 2002 (This nearly 500 page book is surely the definitive biography of John Boyd. It is also one of the best non-fiction books I have ever read. Coram makes Boyd's story inspirational, compelling, and cautionary.)

James Fallows, National Defense, Random House, 1981 (Fallows frequently interviewed and wrote about Boyd and national security issues for the Atlantic while Boyd was at the Pentagon. This National Book Award winner covers a lot more territory than just John Boyd. Fallows' coverage leans neither left nor right, and is still worth a read today, although his remarks on nuclear policy makes me nostalgic for the Cold War when nuclear weapons were in the hands of professionals instead of radicals, thugs, and lunatics.)

Grant T. Hammond, The Mind of War: John Boyd and National Security, Smithsonian Institution Press, 2001 (This book was researched and largely written while Boyd was still alive. Coram states that Boyd prohibited Hammond from including much in the way of personal detail. It is an excellent introduction to Boyd's ideas, but Coram's book gives a picture of Boyd the man.)

U. S. Marine Corps Staff, Warfighting, MCDP 1, 1997

Saturday, December 12, 2009

In Praise of do while (false)

By now I would have thought that everyone knew the joys of the C, C++, and Java language construct do while (false). You can find articles written about it on the web from as far back as 1994, which might as well be Neolithic cave drawings. Yet I continue to have questions about do while (false) (or do while (0) in C if you haven't defined false) from code inspectors who should know better. (You know who you are.)

There is nothing magic about do while (false). It does exactly what you think it does, which is to say, very little. In fact, it does so little, your typical optimizing compiler won’t generate any code for it. Yet, it is a really handy tool to keep in your toolbox, right next to your duct tape and vice grips.

Common Exit Flow of Control

All C++ functions have a common entry point. It is frequently desirable for functions to have a common exit point as well. There are all sorts of reasons for this. The most pragmatic reason is that having a common entry and exit point makes it easy to add debugging statements that log the arguments being passed into the function, and the results generated by the function. If the flow of control needs to return prematurely, it can do so without bypassing the logging statement at the common exit, just by doing a break.

int function1(int argument1) {
    int rc = 0;
    printf("%s[%d]: function1(%d)\n",
        __FILE__, __LINE__, argument1);
    do {
        // Some really complicated code.
        if (bogus) {
            rc = -1;
            break;
        }
        // Some more really complicated code.
        if (giveup) {
            rc = -2;
            break;
        }
        // Yet more really complicated code.
        if (error) {
            rc = -3;
            break;
        }
    } while (false);
    printf("%s[%d]: function1=%d\n",
        __FILE__, __LINE__, rc);
    return rc;
}

The inner logic uses the break statement to drop out the bottom of the do while (false). No need for labels and goto statements. No need for maintaining and checking flags. And if the inner logic completes and the flow of control finds itself at the while (false), it simply drops through. No harm, no foul, no iteration.

You can imagine replacing the logging statement with a close() system call, a free() call, a delete operator, or anything else you need to make sure you do to clean up after yourself. I routinely use this construct to eliminate any possibility of resource leaks in code that uses temporarily allocated resources like file descriptors, sockets, and dynamically acquired memory. I also routinely use it to fix resource leaks in legacy code, which seem to be surprisingly common in my experience, although using wonderful tools like valgrind during unit testing have helped a lot in that regard.
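Here is a minimal, self-contained sketch of that resource-cleanup application. The function and file names are illustrative, not from any real code base: the do while (false) guarantees that the FILE pointer is closed on every path out of the function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative sketch: do while (false) guarantees that the open file
 * is closed on every path out of the function. */
int copy_first_line(const char * path, char * line, int size) {
    int rc = -1;
    FILE * fp = (FILE *)0;
    do {
        fp = fopen(path, "r");
        if (fp == (FILE *)0) {
            break;
        }
        if (fgets(line, size, fp) == (char *)0) {
            break;
        }
        rc = 0;
    } while (false);
    if (fp != (FILE *)0) {
        fclose(fp); /* the common exit: cleanup happens exactly once */
    }
    return rc;
}
```

Every early exit arrives at the same cleanup code, so there is no path on which the descriptor leaks.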

Adherents of other languages both more modern and more ancient will recognize this control structure as something you might have known as a do-end. It would be great if C++, C, and Java had something similar, perhaps a way to use break to exit out the bottom of any compound statement (that is, a block of statements enclosed in { curly braces } ). Alas, a break can only occur in the context of a switch or loop construct. So in order to use break, we must provide the compiler with a loop construct, albeit one that never actually loops.

This pattern is applicable to any block of logic, not just functions. I frequently use it when I am writing a long sequence of data transformations or functions calls, all of which must succeed for the result to be useful. If any step in the sequence fails, it does a break to the end of the block. Refactoring fans will be pleased that the pattern can be used to refactor spaghetti code into something more readable. The Design By Contract crowd will like the fact that code written with a common exit can establish preconditions above the do and postconditions below the while (false). Formal Verification folks will like the idea of establishing invariants (assertions that remain true during execution) before and after the do while (false). And although I find the idea of proving any non-trivial piece of code correct pretty laughable, I do find the concept of invariants to be very powerful when reasoning about program correctness, and the use of do while (false) helps me greatly in that regard.

The use of the break statement is obviously not universally applicable. If it appears inside another looping control structure, including a nested do while (false), or inside a switch statement, it will exit that inner construct rather than drop to the bottom of the outer do while (false).
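A short sketch (with illustrative names of my own) makes the limitation concrete: the inner break exits only the for loop, and execution resumes inside the do while (false) rather than dropping out of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch: the inner break exits only the for loop, not the
 * enclosing do while (false). */
int find_negative(const int * data, int count, int * scanned) {
    int rc = 0;
    *scanned = 0;
    do {
        if (data == (const int *)0) {
            rc = -1;
            break; /* exits the do while (false), skipping everything below */
        }
        for (int ii = 0; ii < count; ++ii) {
            if (data[ii] < 0) {
                rc = 1;
                break; /* exits only the for loop */
            }
        }
        *scanned = 1; /* still executes after the inner break */
    } while (false);
    return rc;
}
```

Note that the flag is still set even when a negative value is found: only the outermost break reaches the bottom of the do while (false).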

Alternatively, if you are using an ancient language like C, which I have a lot of affection for, the same way I might have had for Latin, had I studied Latin in high school instead of goofing off in the computer lab, you could accomplish the same thing using a goto. In fact, this application is one of the few in which I find the use of goto acceptable. The maintainers of the Linux kernel use this pattern routinely, and so do I, as my clients frequently call upon me to hack the 2.4 and 2.6 kernels to support their newest bleeding edge hardware platform.
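For comparison, here is a sketch of the same common-exit pattern written with goto labels, in the style the kernel maintainers favor. The names here are mine, not the kernel's.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative sketch: the common-exit pattern using goto, kernel-style.
 * Later failures jump to later labels so earlier resources get freed. */
int process(int input) {
    int rc = 0;
    char * buffer = (char *)malloc(64);
    if (buffer == (char *)0) {
        rc = -1;
        goto out; /* nothing to free yet */
    }
    if (input < 0) {
        rc = -2;
        goto fini; /* must free the buffer first */
    }
    /* ... the actual work, using buffer ... */
fini:
    free(buffer);
out:
    return rc;
}
```

The labels stack in reverse order of acquisition, so each failure path releases exactly the resources acquired so far.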

But if you are using Java or any other C-like language that doesn’t have a goto, or if like me you find the use of goto a slippery slope, or even perhaps it is too reminiscent of those thousands of lines of FORTRAN IV that you wrote decades ago, the memory of which you are desperately trying to suppress, then this is a useful technique.

The use of do while (false) to implement a common exit flow of control is merely good practice. There is another context in which it is absolutely necessary.

Compound Statements and Preprocessor Macros

My name is Chip, and I use the C preprocessor when writing in C++. As much as the C++ purists like inline functions (and truth be told so do I), there are situations in which they just don't cut it. I have used preprocessor macros to do fun things like computing the largest signed two's complement binary number of any basic data type. I've tried to write an inline function to do that, and I would be pleased to see the results of anyone who did so successfully without using the preprocessor. (I've done it with a templated function, but then it could not be used in C.) The C preprocessor is a powerful form of code reuse known as code generation, and like all power, it comes with responsibility. It must be used only for good and never for evil.
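For example, here is one way to do it. This is a sketch of my own, not necessarily the author's macro: it computes the largest value of a signed two's complement integral type entirely as a macro, so it works on any basic type from both C and C++.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative sketch: the largest value of a signed two's complement
 * type _T_, built in two halves to avoid overflowing the type. */
#define SIGNED_MAX_OF(_T_) \
    ((_T_)((((_T_)1 << ((sizeof(_T_) * 8) - 2)) - 1) \
         + ((_T_)1 << ((sizeof(_T_) * 8) - 2))))
```

An inline function cannot do this for an arbitrary type, because the type itself is a parameter, which is exactly the kind of job only the preprocessor (or a C++ template) can take on.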

So given that I’m going to use the C preprocessor whether the C++ crowd likes it or not, consider the following code snippet.

#define TRANSFORM(_A_, _B_) \
    function1(_A_); \
    function2(_B_);

Now consider its use in this context.

if (transformable)
    TRANSFORM(a, b);

It’s not going to do the right thing, is it? The preprocessor will expand it thusly.

if (transformable)
    function1(a);
function2(b);

This is clearly not what the user of the macro intended. You might be able to make up a lot of excuses for writing macros like the one above, but regardless, you have done something to surprise anyone that uses it. You have designed an abstraction that does not conform to the behavior any competent programmer would expect. You can argue that your coding standard requires { curly braces } around even single statements in if else blocks. This is not going to be helpful to your fellow developer who has to port ten thousand lines of third-party code, code which follows its own coding standard, and wants to use your macro to make their job easier.

The logical thing is to place the function calls in a compound block instead.

#define TRANSFORM(_A_, _B_) \
    { \
        function1(_A_); \
        function2(_B_); \
    }

Then our code snippet will expand into something like this.

if (transformable)
    {
        function1(a);
        function2(b);
    };

Looks better at first glance, doesn’t it? Now both functions are part of the conditional. Yes, there is a dangling semicolon at the end which is functionally a null statement. But so far, so good.

So try this.

if (transformable)
    TRANSFORM(a, b);
else
    TRANSFORM(c, d);

Now our snippet expands to something like this.

if (transformable)
    {
        function1(a);
        function2(b);
    };
else
    {
        function1(c);
        function2(d);
    };

This will not compile. The semicolon trailing the first invocation of the TRANSFORM macro is actually a null statement, separate from the compound block preceding it. It becomes a statement in-between the if clause and the else clause. Using a semicolon following the macro invocation in the expected way leaves the else clause dangling.

The fact that this code does not compile is the good news. The programmer using your macro will merely think that you are incompetent, and will never use your macro, nor probably any code that you write, ever again.

A much worse case would be if the resulting code compiled, but did the wrong thing. I have tried very hard to find a code snippet which compiles but does the wrong thing. I have been unsuccessful. I'm not saying that such a code snippet does not exist, merely that I am not smart enough to find it. If such a snippet exists, then the programmer using your macro will think that you are incompetent while they sit with a baseball bat in the bushes next to your house waiting for you to come home. If we were truly judged by a jury of our peers, it would be completely justifiable homicide.

A common approach to fixing this is to use the macro without a semicolon at the end.

if (transformable)
    TRANSFORM(a, b)
else
    TRANSFORM(c, d);

This is an unsatisfying solution. You are requiring the user to write code in an unexpected and surprising way. Worse, the requirement to omit the semicolon is merely an artifact of having to use a compound statement. If a thousand years from now the definition of your macro changes so that it is not a compound statement, then you must churn every single application of it to add the semicolon. Or you have to put the semicolon in the macro definition itself, which may cause all sorts of wackiness to ensue. Wouldn't it be better to just make the macro work like any other C++ statement?

Like Lassie, do while (false) comes to our rescue. What is it, girl? The barn is on fire? Timmy fell into the well? We write our macro thusly.

#define TRANSFORM(_A_, _B_) \
do { \
function1(_A_); \
function2(_B_); \
} while (false)

The preprocessor now expands our macro into a single C++ statement that must be properly terminated by a semicolon. Hence

if (transformable)
    TRANSFORM(a, b);
else
    TRANSFORM(c, d);

becomes

if (transformable)
    do {
        function1(a);
        function2(b);
    } while (false);
else
    do {
        function1(c);
        function2(d);
    } while (false);

The semicolon, added by the user of the macro, is now a required part of the syntax, not a dangling null statement.

All of the snippets I have shown not only compile, but do the expected thing when executed. The do while (false) control structure serves as a compound statement that is both syntactically and semantically well behaved.
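Here is a self-contained sketch (the function bodies are mine, added purely so the behavior is observable) demonstrating that claim: the do while (false) macro behaves as a single, semicolon-terminated statement even in an if else.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative demonstration that a do while (false) macro acts as a
 * single semicolon-terminated statement, even in an if else. */
static int total = 0;
static void function1(int aa) { total += aa; }
static void function2(int bb) { total += bb; }

#define TRANSFORM(_A_, _B_) \
    do { \
        function1(_A_); \
        function2(_B_); \
    } while (false)

int demo(bool transformable) {
    total = 0;
    if (transformable)
        TRANSFORM(1, 2);
    else
        TRANSFORM(10, 20);
    return total;
}
```

Both branches compile, the user-supplied semicolon terminates the statement, and exactly one branch's pair of function calls executes.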

One context in which do while (false) does not work is when you are using the preprocessor to generate code that declares variables.

#define ALLOCATE(_A_, _B_) \
do { \
int _A_; \
int _B_; \
} while (false)

The variables will be allocated on the stack then immediately deallocated when the do while (false) construct terminates. This is fine if the scope of the variables is limited to the code inside the do while (false). It is not so useful if they are being declared for use outside of that scope. The simple compound statement has the same flaw.

The Little Control Structure That Could

I hope I have given you a new appreciation of do while (false), the control structure that does so much, while generating so little.

Saturday, November 21, 2009

Post-Modern Deck Construction

The Palatial Overclock Estate (or what the liberal media insists on calling The Heavily-Armed Overclock Compound) is getting on in years. As a result we are faced with gradually replacing many of its three decades old features, including its rear wooden deck. This has led me to ponder the latest in deck construction techniques.

See, back in the day, maybe fifty years ago, it was traditional to build a deck using a single carpenter. Back then there were very expensive master carpenters, affordable only when shared among a very large pool of homeowners who needed decks. But even so, there were a lot of advantages to having even a small share of a single master carpenter. They had the benefit of coming with a complete set of detailed deck plans, and were very experienced at following those plans. They had a lot of apprentices who made sure the carpenter never ran out of nails or lumber. The carpenter and his apprentices were a tightly knit group who were used to working together and were pretty efficient at doing so. The deck plans were pretty straightforward to follow because everyone working on the deck knew their place and their role. The carpenter did all the actual deck building, and the apprentices were relegated to fairly minor roles. Guys like Thomas J. Watson built entire companies just on the training of master uni-carpenters, and did pretty well for a long time.

Not everyone could afford a master carpenter, so a single master carpenter was shared among many folks who needed decks. Since there was always some dead time while building a particular deck, due to weather or lack of materials or what not, the master carpenter could go next door and work on that deck for a while. The carpenter could slice his time up amongst several deck projects. Even so, it took longer to get your deck built because even when the weather cleared the carpenter wouldn't be immediately available to continue on your project. And of course the priorities of the other homeowners with whom you shared the master carpenter weren't always your priorities.

In time, maybe forty years ago, journeyman carpenters became available. They were cheaper but still fairly expensive, and not nearly as fast as the master carpenter. Their apprentices weren't as smart either. But a journeyman carpenter, sometimes referred to as a mini-carpenter, was cheap enough that a few neighbors could band together and hire one to share to get some of their rear decks built faster than if they had been sharing a single master carpenter among a larger pool of homeowners, even though the journeyman carpenter wasn't nearly as fast as the master carpenter. This worked pretty well for a long time, and guys like Ken Olsen made a lot of money training journeyman carpenters.

But about thirty years ago, high school shop classes started turning out wood working students. At first, they were pretty bad. They were very slow, they weren't very smart, they bent more nails than they hammered in, they didn't make very efficient use of lumber. But they were dirt cheap. So some individual homeowners started hiring these shop students to do small jobs that the master and journeyman carpenters never seemed to have time to do or care about. That worked pretty well. Shop students, or micro-carpenters, could build a fence, maybe build a small deck, maybe a set of stairs for an existing deck built by a master carpenter.

Around twenty years ago, someone got the bright idea of banding a bunch of shop students together into a large team, in a variation of the "nine women can have a baby in a month" strategy. This is not as crazy as it sounds. Owners of very large homes, mansions and estates and resorts and such, had been doing just that for some time with master carpenters. They would band as many as eight master carpenters together, with a whole bunch of apprentices, great lots of lumber and nails, to tackle really big carpentry projects. You could think of it as a sort of super-carpenter. Guys like Seymour Cray were very good at creating these kinds of master carpenter teams.

It wasn't easy: in order to make effective use of eight very expensive master carpenters, the usual deck plans wouldn't do. You had to completely rethink the plans and the instructions so that each master carpenter could keep busy without getting in the way of another master carpenter, that each wouldn't run out of materials and have to sit idle (a very expensive proposition since you had to pay them anyway), and make sure that all the individual parts each was working on all fit together.

Designing the deck plans so that all eight master carpenters kept busy was no mean feat, and a lot of very smart people spent a lot of time coming up with those plans. Even so, there was a limit to how much parallelization could be found even in the biggest construction projects. Gene Amdahl, who started out training carpenters for Thomas Watson and went on to found his own carpenter training company, once rather famously observed that the amount by which a carpentry project could be sped up depended not just on the number of carpenters and the speed of each carpenter, but the degree to which the deck building instructions could be written to keep all of the carpenters busy at the same time. It was one of those observations that was both obvious and profound.
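Amdahl's observation can be written down in a few lines. This is a sketch of the usual formulation (the function name is mine): the speedup from N carpenters on a project whose parallelizable fraction is P.

```c
#include <assert.h>

/* Amdahl's observation in code: with N workers and a parallelizable
 * fraction P of the work, speedup = 1 / ((1 - P) + P / N). The serial
 * fraction (1 - P) puts a hard ceiling of 1 / (1 - P) on the speedup. */
double amdahl_speedup(double parallel_fraction, double workers) {
    return 1.0 / ((1.0 - parallel_fraction) + (parallel_fraction / workers));
}
```

Even with 95 percent of the work parallelizable, sixty-four thousand shop students yield a speedup of just under twenty: the serial fraction, not the headcount, dominates.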

Some of those same smart people also designed some very specialized tools for the master carpenters to make better use of their very expensive time. My favorite was the vector hammer. The vector hammer had a single handle, so that it could be wielded by a single master carpenter, but had many heads, perhaps even dozens. A single master carpenter could, if he were careful, and everything lined up just right, drive dozens of nails at a time. So the deck architects spent a lot of time designing deck plans in which the nails all just happened to line up just right so that the vector hammer could be used.

Applying this same team idea to the high school shop students seemed like a winning proposition. But to get the same level of performance out of a group of high school shop students as you would from eight master carpenters, you would need a lot of high school shop students. How many? A lot. A guy named Danny Hillis made an ambitious attempt to harness sixty-four thousand high school shop students to build a single deck as fast as eight master carpenters.

Yeah, you see the problem now don't you? How do you rewrite your deck plans to keep sixty-four thousand high school shop students busy building a single deck? How do you even keep them supplied with lumber and nails? Keep them from stumbling over one another? It's almost impossible.

Oh, for sure, there were some successes. Some carpentry projects, like building a long picket fence, were embarrassingly parallel. If the fence were long enough, you could space the students out along the length of the fence line and have them all start building in parallel towards the next student, making sure to join the end of their fence section to the beginning of the adjacent student's. For a lot of projects, this actually worked pretty well. And you can be sure that the guy trying to get you to hire his sixty-four thousand shop students used those projects as examples in his sales pitch. But in general, no one knew how to design blueprints and write instructions to make effective use of that many high school shop students. Most of the students sat around idle, drinking energy drinks, while the few that could be kept busy slowly plodded along.

But a funny thing happened. Over time, as they got older and more experienced, the high school shop students individually got better. They started learning from their mistakes, started copying some of the techniques of the master carpenters, started being more efficient at building decks. But remarkably there was one thing they didn't do: they didn't get more expensive.

Suddenly it became practical to use a single high school shop student to build a deck. Maybe not as fast as those eight master carpenters. But maybe almost as fast as (or sometimes faster than) one of them. And the instructions for a single student to build a deck were so simple, even I could write them. As time went by, each high school shop student continued to get better and better. A guy by the name of Gordon Moore recognized that high school shop students would likely continue to get better and faster at a predictable rate for the foreseeable future.

Alas, it turns out that there were still limits to how fast a single high school shop student could build a deck. Students could only move so fast, could only drink so much energy drink (and had to take more frequent bathroom breaks). And no matter what, they didn't have the experience and knowledge of the master carpenters, so that in their haste to build a deck quickly they occasionally made a mistake, but because they were still pretty good at what they did, the mistake was sometimes subtle and hard for even the deck architects to find.

It was inevitable that someone would suggest that, just as the owners of mansions and estates did decades ago with master carpenters, a group of high school shop students be banded together as a team. First it was just a couple of students. Then four. Then eight were banded together in a team in what we might call a multi-carpenter approach. Some folks, like a guy named David Patterson and his friends who think about building decks a lot, have suggested creating teams of dozens or even hundreds of high school shop students, in what they call a many-carpenter approach.

Oh. Yeah, see, now we're back to the problem of decades ago when no one knew how to write instructions and deck plans for lots of carpenters. It's like we forgot what Amdahl said decades ago.

But the problem is even worse than it was way back then. See, using the one cheap but experienced high school shop student worked for so long and so well, that there are lots and lots of instruction books and architectural plans for how to build all sorts of cool things using a single carpenter. How detailed are these plans? One place where I once worked had an instruction book for its carpentry projects that was eight million lines long. That's a book of around 120,000 pages of instructions, all based on the assumption that a single carpenter would be working on them. This instruction book, portions of which dated back twenty years, generated this company as much as a billion dollars in revenue annually.

Where would you start breaking down eight million lines of instructions so that two carpenters, much less eight, or dozens, could all be kept busy building a deck? Oh, for sure, they've tried, but without much luck. After much effort, the carpenters still keep getting in each other's way, tripping over each other's lumber, accidentally hammering another carpenter's thumb.

And that's just the legacy deck plans. In fact, even if you're designing a brand new deck from scratch, no one really knows how to write deck building instructions to keep a lot of carpenters busy in parallel. Humans don't think that way. There are lots of ideas about abandoning English altogether and writing the deck building manuals in a new language better suited for expressing parallelism in carpentry. But there are an awful lot of carpenters and deck designers who already know English. And that doesn't do anything for the myriad English-language plans, blueprints, and instruction books that already exist.

As a homeowner, I think it is an interesting problem.


Thanks to Ken Mulvihill, Robert Gallegos, and Ken Howard for the inspiration.


IBM mainframe supercomputer DEC minicomputer microprocessor multiprocessor Amdahl's Law Moore's Law Thinking Machines massively parallel processor MPP multi-core many-core threads multithreaded C C++ Java

Saturday, October 17, 2009


If you live in Europe or the Americas, every banana you have ever eaten was probably a variety known as the Cavendish. There are other varieties of bananas, but they look so different (for one thing, they may have large, hard seeds, something bred out of the Cavendish) that you may not even recognize them as the same fruit you slice up on your breakfast cereal. And that's a problem. There is so little genetic diversity in the Cavendish that a single disease could wipe the long, yellow, conveniently self-packaged fruit we all know right off the planet.

Alarmist rhetoric? Not so much. That's exactly what happened to its similarly monogenetic predecessor, the Gros Michel banana.

Michael Osinski argues that a similar lack of diversity nearly led to the collapse of Western Civilization. Conspiracy theory? Again, not so much. Osinski, in an article in New York Magazine, admits that the latest global economic meltdown might have been at least partly his fault.

Osinski wrote the software that was used by nearly every investment bank on the planet to bundle mortgages into a kind of bond known as a collateralized debt obligation or CDO, and to compute an asking price for the result. A user of Osinski's software package would pour thousands of individual mortgages into one end, puree them into a smooth consistency, then parcel the results out into many different investment vehicles, each vehicle containing a tiny portion of each of those mortgages. Then many individual investors bought a portion of those investment vehicles. Investors like your 401(k), your mutual fund, and your pension plan. His software computed the value of the CDO based on many factors including the risk of the mortgages that went into it.

Because each mortgage was blended together with thousands of other mortgages, the risk of default of any one individual mortgage was spread out. Because the resulting smoothie of mortgages was distributed into many different investment vehicles, even if many mortgage holders were to default, those bad mortgages would be a tiny portion of each investment vehicle. And because the ownership of each investment vehicle was spread out among many investors, the tiny risk was shared among many. In the case of your 401(k), your mutual fund, and your pension plan, the investor itself was in fact made up of money from many different individuals.

The only way it could possibly go wrong would be if the entire housing market suddenly collapsed. And what were the chances of that happening?

How do you determine the worth of something? As someone who has bought and sold a used motorcycle or two, I can tell you that this isn't the easiest question to answer. My Honda CB700SC was a basket case (that is, it had been separated into its component parts, and would have taken some work to get it running again); I sold it for a fraction of its book value, which is, after all, just someone's opinion based on what others have actually paid for a particular model of vehicle. My Harley-Davidson FLHS, on the other hand, I sold years later for more than I paid for it. See, in a free market, the worth of something isn't what some piece of paper or blue book says it is. Its worth is what someone else is actually willing to pay for it. Once you grasp that, it becomes clear why touchy-feely factors like investor confidence suddenly become really important, because the idea of value has a significant psychological component.

When the housing industry collapsed and house prices were suddenly severely deflated, people discovered that they owed far more on their mortgages than their homes were worth, because their homes were worth only what someone else was willing to pay for them.

That may not seem like such an issue if you've never owned a home. When you take out a loan to buy a car (or a motorcycle) for example, the vehicle is probably worth less than you owe on it as soon as you drive it off the lot. But houses are different. When your job transfers you to another state, or you graduate from college, or you lose your job and have to move to where your employment prospects are better, you can take your vehicle with you. But typically you'll have to sell your house, if for no other reason than to take the money you make on the sale to use as a down payment on a house in your new town.

That is, if you make any money on your house. But if you find you owe more on your home than someone else is willing to pay for it, you are seriously underwater. You can no longer afford to sell your home, because you can't afford to pay off the balance of the mortgage. You can't afford a new home, because you don't have the down payment from the sale of your previous home. You can't afford to move, even though your job depends on it.

The solution for many people was to just walk away from their old homes, to default on their mortgages, and leave the mortgage holder holding the bag. And by mortgage holder, I mean, ultimately, your 401(k).

The risk of default of each of those pureed mortgages was among the parameters supplied by the users of Osinski's software package. It took these risks into account when computing the value of the CDOs. The sudden default of so many mortgages more or less simultaneously was not a risk factor that the users of Osinski's software had anticipated. They had seriously underestimated the risk, and hence the software package had seriously overvalued the CDOs. Investor confidence in mortgage-backed securities eroded; taking all of that risk (which had in fact always been there) into account devalued the CDOs, which now found themselves owning toxic assets in the form of defaulted mortgages. And because of the incredibly broad distribution of the ownership of the CDOs, we all ended up owning parts of defaulted mortgages that were worth virtually nothing.

In a talk he gave at a recent conference I attended in Santa Fe, computer scientist and financial software guru Arthur Whitney (who, by the way, decades ago worked with Kenneth Iverson on the programming language APL) estimates that in a matter of days the valuation of these CDOs fell from around US$33 trillion to US$8 trillion.

Let me make this clear: almost overnight, twenty-five trillion dollars in wealth evaporated. It didn't get spent. It wasn't stolen. It simply ceased to exist because thousands of investors suddenly decided not to buy into the consensual hallucination that was the housing bubble.

So what, if anything, does this have to do with the kinds of topics I usually write about? And what the heck does it have to do with bananas?

Here's the thing: just about every investment bank on the planet was using Osinski's software package. For sure, they were all putting in the wrong parameters, and using it for far more complex investment vehicles than Osinski ever intended. But still, it was all the same software package, with the same input formats, the same calculations, the same menus, the same bugs, the same output formats, the same pop-ups, the same errors and warnings. Is it possible, just possible, that if there had been other software packages being used for this same purpose, some investment houses might have come to a different conclusion about the risk inherent in mortgage-backed securities? Even if the answer is no, the fact that everyone was using the same software package means they were all exposed to the same software bugs, the same design flaws, the same systemic errors.

They were all eating the same variety of banana. Except in this case, instead of losing a much beloved fruit, they ran the risk of a systemic error leading to the collapse of the world economy. Each of us is, in a very real sense, a victim of the success of Osinski's software package.

I recently helped develop a product in which we worked very hard to make the system recoverable in the field by the end user. If the disk-based system failed, the user could boot up a flash-based system and recover the disk. If the flash-based system failed, the user could boot up the boot loader and recover the flash-based system. If the boot loader failed, the box had to be shipped back.

All three systems ran on the same processor. Much of the very low level code, that which would be typically described as part of the board support package, could have been common between the disk system, the flash system, and the boot loader. It was just a matter of time until someone suggested just that. I argued (vehemently I'm told) against it. Even if the code were exactly the same, I argued against sharing the same source files in the source code control system. Otherwise, a single error in a single source file could lead to a systemic flaw that caused the same defect to occur in all three tiers of the product, rendering the product not just inoperable but unrecoverable in the field.

I understand -- better than most -- the economics of software reuse. And I understand how free markets drive companies to adopt what they perceive to be the best practices, including the software tools, of their competitors (whether they are really the best or not). Economies of scale drive this too. Do all of your employees use the same brand and model of desktop or laptop? Running the same operating system? All purchased about the same time because of a volume discount? And then there's people. Do you hire people that look like you? That share your opinions? That think the same as you so that you feel comfortable around them?

Whether you are talking about software, hardware, bananas, or people, a lack of diversity can lead to dire and unforeseen consequences.

Sunday, June 14, 2009

Cache Coherency and Memory Models

Sometime around 2002 I was working on an object-oriented multi-threading framework in C++ for an embedded project and the most remarkable thing happened. I had an object that represented a thread. My application created the object via the usual new operator and started the thread. The first thing the thread did was access its own thread object. It core dumped. Core dump analysis using gdb revealed that the virtual pointer in the object was null.

I know what you're thinking.

That's impossible.

That's what I thought too. That event led to a personal research project that consumed many hours of my free time and led to my putting together a presentation on the Implications of Memory Models (or Lack of Them) for Software Developers. In it, I describe the research of others that reveals that modern processor memory subsystems have broken the memory models implicit in the software.

This includes software written in higher level languages like C, C++, and Java (prior to Version 6 anyway), as well as software written in assembler. It includes multi-processors, multi-core uniprocessors, and even those single-core uniprocessors that are hyper-threaded. The upshot is that, thanks to various processor, memory, and compiler optimizations, memory may not be read or written when you think it is from looking at your code. Or even the compiled code.

Java addressed this issue in Version 6, providing you use volatile or the language's built-in synchronization mechanisms. But C and C++ developers dealing with variables shared among threads must still resort to the appropriate use of the volatile keyword and explicit memory barrier instructions, which may be provided by your threading library's synchronization functions, providing you use them.

Fortunately, at least gcc offers the generic __sync_synchronize() built-in that performs a platform-specific memory barrier with full fence semantics. Unfortunately, in their paper Volatiles are Miscompiled, and What to Do about It, Eric Eide and John Regehr at the University of Utah reveal that code containing the volatile keyword is frequently miscompiled. The life of a systems programmer is seldom simple.

Recently I discovered Ulrich Drepper's lengthy white paper on What Every Programmer Should Know About Memory. He gives a detailed explanation of how the hardware architecture of cache and memory subsystems can affect performance and correctness of software. I recommend it to everyone who does embedded or systems development.

(Actually, I recommend it to all developers. But my Java buddies have pointed out on more than one occasion that they don't really care to know how things work under the hood. That's an attitude that I can't relate to. But I was one of those kids who took clocks apart to see what made them tick.)

It took me a couple of weeks to read Drepper's 114 page paper cover to cover. The entire time I was reading his description of the MESI cache coherency protocol, I was trying, without much success, to reconcile it with my prior reading on processor memory models. (Briefly, the MESI protocol, which tries to keep the local caches of individual processor cores in sync, places each cache line in a state of Modified, Exclusive, Shared, or Invalid.) 

Then finally I got to section 6.4.2, "Atomicity Operations", page 68:

If multiple threads modify the same memory location concurrently, processors do not guarantee any specific result. This is a deliberate decision made to avoid costs which are unnecessary in 99.999% of all cases. For instance, if a memory location is in the ‘S’ state and two threads concurrently have to increment its value, the execution pipeline does not have to wait for the cache line to be available in the ‘E’ state before reading the old value from the cache to perform the addition. Instead it reads the value currently in the cache and, once the cache line is available in state ‘E’, the new value is written back. The result is not as expected if the two cache reads in the two threads happen simultaneously; one addition will be lost.

If this doesn't send chills down your spine, you're not thinking clearly. Drepper goes on to discuss some of the same atomicity operations I talk about in my presentation. His paper also details the write reordering which I suspect was the cause of my null C++ virtual pointer.

Why is all this stuff important?

In the early 1980s I was a computer science graduate student working on my thesis under the direction of my mentor Bob Dixon, in a research group that was looking at programming language and operating system architectures for massively parallel processors. This was when Japan's Fifth Generation project, dataflow architectures, and the massively parallel Connection Machine were big. At the time, it all seemed bleeding edge and, given our budget, largely hypothetical.

But just a few years later I found myself working at a national lab, the National Center for Atmospheric Research, which not only had a Connection Machine, but other exotic highly parallel systems including several Cray Research supercomputers with multiple processors and vector hardware.

A few years later still and I am more than a little surprised to find that I have a four-core Dell server running Linux in my basement. The future has a way of happening.

In The Landscape of Parallel Computing Research: A View from Berkeley, researchers David Patterson et al. describe the future that they see: not just multi-core, but many-core, the equivalent of a massively parallel processor running on your desktop, with the dozens or hundreds of cores sharing memory. Suddenly, understanding concurrency, synchronization, and memory models (or the eventual higher level languages that implement it all without the developer having to worry about it) seems pretty necessary.

For me, it's 1983 all over again.

Tuesday, February 17, 2009

The Vulnerability of Volatiles

As if having to deal with broken memory models isn't bad enough. In his latest column in Embedded Systems Design magazine, Jack Ganssle points us to research that shows that virtually all C compilers mis-compile code that uses volatiles. Here's the original article and presentation, both by Eric Eide and John Regehr at the University of Utah.

Monday, January 05, 2009

House M.D. and Evidence-Based Troubleshooting

Mrs. Overclock (a.k.a. Dr. Overclock, Medicine Woman) and I got sucked into the television series House M.D. over the holidays. The USA cable network was running a marathon of House reruns. We haven't watched the first-run episodes (which run on the Fox cable network) except mostly by accident, and the last thing we need is to watch more television. But Mrs. Overclock is understandably interested in medical dramas (and, unfettered by the need to fill an hour of air time, will often beat House and his diagnostic team to the solution). I was a little surprised to find that House tickles certain centers of my brain, too.

If you're not familiar with the show, Gregory House (played by British comic actor Hugh Laurie in an impressive dramatic turn) leads a diagnostic group of physicians at a hospital in New Jersey. Each week they try to cure some critically ill patient with a mysterious illness. They don't always succeed. It's a tribute to evidence-based medicine. They have to choose tests that won't kill the patient, without really knowing what's wrong with the patient. They keep eliminating possibilities, and racking their brains to think up new ones. Frequently there is more than one problem, and the problems interact in strange ways. And always, the clock is ticking.

It's exactly like troubleshooting large, complex, distributed, real-time, high-availability, production systems, like a PBX.

I've done my share of field support of such systems. I've lived in a small room with several other developers, clustered around a workstation looking at a remote customer system. And I've gotten on a plane with a laptop and a protocol analyzer in my checked luggage. It's just like House. You keep brainstorming ideas of what could be going wrong. You try to think of tests to isolate the problem, to indict a particular hardware or software component, without crashing the production system. You pore over log files, frequently inventing filtering tools on the fly, to make sense of the fire hose of information, to eliminate possibilities. You have to always keep track of what you know you know, and what you know you don't know, always being prepared to discard a much loved hypothesis in the face of new evidence. And always, the clock is ticking.

Frequently when I do this kind of work, and particularly when I am doing development for these kinds of systems, I have to keep reminding myself that not only are careers at stake, but lives. Customers may depend on my software to dial 911 when someone has chest pain. Even in a non-safety-critical situation, rebooting to see if it fixes the problem isn't an option if it means dropping hundreds of in-progress calls. House's team can't afford to take a cavalier attitude, and neither could I.

I found a lot to relate to, watching House M.D. If you want to know what developing for high-availability systems is like, you would be well advised to check it out.

Saturday, January 03, 2009

Abstraction in C using Sources and Sinks

The problem with telling people that I do embedded software development is that no one really knows what that means. Including me. I've worked on embedded systems that were so resource constrained that they were written completely in assembly code, had no operating system, and in order to fix a bug we had to mine the code base to find instructions we could eliminate in order to make room. I've worked on embedded systems that had a full blown multi-user Linux system as their operating system, and the code base was hundreds of thousands of lines of bleeding-edge C++ that looked like the poster child for the use of templates and dynamic memory allocation.

Somewhere in between there is a sweet spot in which the application is written in C, but could still benefit from an object oriented design and implementation. OO scares a lot of the old school embedded developers (you know who you are). Yet my introduction to the use of those classics of OO - abstraction, encapsulation, and inheritance - was in 1974 when I was programming in assembly and using IBM's OS/360 Input/Output Subsystem, which was written in the late 1960s. Many years later when I was studying how C++ worked under the hood, the idea of the virtual pointer and the virtual table seemed quite familiar. I'd seen it all before in the OS/360 IOS. When you get as old as I am, you begin to wonder if you're ever going to see anything that's truly innovative.

One of the most useful applications of OO in embedded systems, in my not-so-humble opinion, is that of abstracting out I/O: the ability to write applications that are agnostic as to from where their input comes and to where their output goes. This is exactly what the IOS and its myriad of pointer-filled control blocks (what C programmers would call structures) accomplished.

I used this idea for one of my clients when I was hired to implement a wide assortment of bitmap-graphic processing functions for an existing C-based product whose digital hardware could fit in your shirt pocket. I quickly realized that a lot of the code I had to write - like a deflate-based decompression algorithm, or a decoder for Portable Network Graphics (PNG) files - would be used in many places in the system. I also realized that the data for these algorithms could be buffered in memory, could come streaming in from a serial, Ethernet, or USB port, or could be stored in a flash-based file system. I badly needed some abstraction for sources and sinks that I could implement on this tiny, non-POSIX platform.

As I implemented it, a Source is an abstract interface that requires the implementation to provide a read byte and a push byte method. The read produces a byte if it is available, End Of File (EOF) if the Source is empty and no byte would ever be available, and End of Data (EOD) if the byte isn't available now but might be in the future. The push pushes a single byte back into the Source where it is guaranteed to be the next byte produced by a subsequent read. (This vastly simplifies any algorithm that can be described by a grammar with one-character look-ahead, for example LL(1) or LALR(1). Such grammars are extremely useful ways of describing any algorithm that does parsing or data transformation or indeed any state machine. Both the deflate and PNG applications mentioned above can be described as grammars, and from there the implementation in code is almost a mechanical process. But that is an article for another day.)

int readSource(Source * that);
int pushSource(Source * that, char data);
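To show how such an interface can be wired up in plain C, here is a self-contained sketch: a table of function pointers (much like a C++ vtable), one concrete buffer-backed implementation, and a scanner that uses the one-byte push-back for look-ahead. The details, and the slightly simplified signatures, are mine for illustration; the actual Concha implementation may differ.

```c
#include <stddef.h>
#include <stdio.h>
#include <ctype.h>

/* The abstract interface: a Source is just a pair of function pointers. */
typedef struct Source Source;
struct Source {
    int (*read)(Source *);
    int (*push)(Source *, char);
};

int readSource(Source * that) { return (*that->read)(that); }
int pushSource(Source * that, char data) { return (*that->push)(that, data); }

/* A concrete implementation that produces bytes from a memory buffer. */
typedef struct {
    Source source;      /* must be first: a BufferSource "is a" Source */
    const char * buffer;
    size_t size;
    size_t next;
    int pushed;         /* one byte of push-back, or EOF if none */
} BufferSource;

static int bufferRead(Source * that) {
    BufferSource * self = (BufferSource *)that;
    if (self->pushed != EOF) {
        int data = self->pushed;
        self->pushed = EOF;
        return data;
    }
    if (self->next >= self->size) { return EOF; }
    return (unsigned char)self->buffer[self->next++];
}

static int bufferPush(Source * that, char data) {
    BufferSource * self = (BufferSource *)that;
    self->pushed = (unsigned char)data;
    return data;
}

Source * openBufferSource(BufferSource * that, const char * buffer, size_t size) {
    that->source.read = bufferRead;
    that->source.push = bufferPush;
    that->buffer = buffer;
    that->size = size;
    that->next = 0;
    that->pushed = EOF;
    return &that->source;
}

/* One-character look-ahead in action: scan a decimal number, pushing
   back the first non-digit so that the next stage of the parser sees it. */
int scanNumber(Source * source) {
    int value = 0;
    int data;
    while ((data = readSource(source)) >= 0) {
        if (!isdigit(data)) { pushSource(source, data); break; }
        value = (value * 10) + (data - '0');
    }
    return value;
}
```

The scanner neither knows nor cares that the bytes came from a memory buffer; hand it a Source backed by a socket or a file and it works unchanged. That is the whole point of the abstraction.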

A Sink is an abstract interface that requires the implementation to provide a write byte method. The write consumes the byte provided by the caller if it can, returns EOF if the Sink is full and no byte will ever be consumed, and EOD if the byte cannot be consumed now but might be in the future.

int writeSink(Sink * that, char data);

There is also a close method for both Sources and Sinks, which may or may not do anything, depending on the underlying implementation. No open method? Nope, because as we shall soon see the open is part of the implementation, not the abstraction.

int closeSource(Source * that);
int closeSink(Sink * that);

As you can see, these simple byte operations expect a pointer to a Source or a Sink structure (what C++ programmers would call an object). What these pointers actually point to depends on the implementation. But the application doesn't need to know this. It just takes whatever Source or Sink pointer you give it and implements the application (what my younger colleagues would call the business logic) in an I/O agnostic manner.

What might an application look like that uses Sources and Sinks?

Here is a trivial one that just copies all of the data from a Source to a Sink, until either the Source is empty or the Sink is full, without having any idea what the Source and Sink really are (and not doing much in the way of error checking).

size_t copy(Source * source, Sink * sink) {
    size_t total = 0;
    int data;

    while (!0) {
        if ((data = readSource(source)) < 0) {
            break;
        }
        if (writeSink(sink, data) < 0) {
            pushSource(source, data);
            break;
        }
        ++total;
    }

    return total;
}
Here's a code snippet of a calling sequence that uses the function above to copy data from a file to an existing socket represented by an open file descriptor. It does so by providing the function with a concrete implementation of its required source and sink.

FileSource file;
DescriptorSink descriptor;
Source * source;
Sink * sink;
size_t total;

source = openFileSource(&file, "/home/coverclock/data");
sink = openDescriptorSink(&descriptor, sock);
total = copy(source, sink);

What kind of Sources and Sinks might one implement?

A BufferSource implements a Source that produces its data from a fixed size memory buffer.

Source * openBufferSource(BufferSource * that, void * buffer, size_t size);

A BufferSink implements a Sink that consumes its data into a similar memory buffer.

Sink * openBufferSink(BufferSink * that, void * buffer, size_t size);

A FileSource produces data from a file in the file system (in Linux this could be just about anything, thanks to the /proc and /sys file systems).

Source * openFileSource(FileSource * that, const char * path);

A DescriptorSink consumes data to any file descriptor (like a TCP/IP socket).

Sink * openDescriptorSink(DescriptorSink * that, int fd);

A NullSink consumes all data without any complaint and tosses it into the bit bucket.

Sink * openNullSink(NullSink * that);

A CompositeSource concatenates two Sources into a single Source.

Source * openCompositeSource(CompositeSource * that, Source * primary, Source * secondary);

An ExpanderSink writes each data byte to two different Sinks. (Yes, of course you can use another ExpanderSink as one or even both of the Sinks.)

Sink * openExpanderSink(ExpanderSink * that, Sink * primary, Sink * secondary);

A Fletcher8Sink computes a Fletcher 8-bit checksum as the Sink is written and automatically appends the checksum to the Sink when it is closed. (And before you ask, of course there is a Fletcher8Source that automatically verifies the data as it is read from the Source.)

Sink * openFletcher8Sink(Fletcher8Sink * that, Sink * primary);
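The arithmetic behind such a checksum Sink is pleasantly simple. Here is one common formulation of a Fletcher checksum, with two 8-bit running sums modulo 255, shown standalone; the actual Concha implementation may differ in detail:

```c
#include <stddef.h>

/* Compute a Fletcher checksum over a buffer: two 8-bit running sums
   modulo 255, returned through the a and b output parameters. */
void fletcher8(const unsigned char * buffer, size_t length,
               unsigned char * a, unsigned char * b) {
    unsigned int sum_a = 0;
    unsigned int sum_b = 0;
    size_t i;
    for (i = 0; i < length; ++i) {
        sum_a = (sum_a + buffer[i]) % 255; /* simple sum of the bytes */
        sum_b = (sum_b + sum_a) % 255;     /* sum of the sums */
    }
    *a = sum_a;
    *b = sum_b;
}
```

Because the second sum accumulates the first, the checksum is sensitive to byte order as well as byte values, which a simple additive checksum is not.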

A RingBuffer implements a circular buffer that exposes both a Source and a Sink interface.

RingBuffer * openRingBuffer(RingBuffer * that, void * buffer, size_t size);
Source * sourceRingBuffer(RingBuffer * that);
Sink * sinkRingBuffer(RingBuffer * that);

How are these Sources and Sinks implemented?

Easily. Most are a few lines of code, a few maybe a page. I'll be writing more about the implementation. But you can find the source code for these and other Sources and Sinks (and their unit tests) now in the
Digital Aggregates Concha distribution. Concha is a clean-room open-source implementation of the Source and Sink design pattern (which my younger colleagues will recognize as a form of dependency injection). It is licensed under the Desperado modified LGPL which allows static linking without any viral licensing implications.

Update (2011-02-21)

I improved the example slightly since the original publication of this article. Apologies as usual for the poor code formatting. (The Blogger editor doesn't respect leading spaces, even those explicitly coded as HTML special sequences, nor does it respect the pre HTML tag.)