Sunday, May 27, 2007

Alternative URL

A notice from Housekeeping: as of a few weeks ago, you can access this blog at the URL http://www.chipoverclock.com, in addition to the usual http://coverclock.blogspot.com/. Either URL ends up in the same place. I expect the first URL to remain usable even in the unlikely event that the host for this blog changes. Thanks!

Friday, May 25, 2007

Survival of the Sickest

Humanity continues to be plagued by genetic diseases like sickle cell anemia and type 1 diabetes. In his book Survival of the Sickest (William Morrow, 2007), Dr. Sharon Moalem asks "Why hasn't natural selection eliminated these diseases?"

For some, like familial tendencies toward hypertension or breast cancer, the answer is obvious: they don't kill us until after we've reproduced, so there is little or no selective pressure against them. But for others, like the two I mentioned above, which can take their victims in the prime of life, this is clearly not the case. So what's the deal?

The deal is this: as bad as these genetic diseases are, they provide some immunity to something that is even worse. They have survival value.

Sickle cell anemia occurs predominantly in populations of African descent. It offers resistance to malaria, because the malformed blood cells are poor hosts for the malaria parasite. Type 1 diabetes is statistically more common in populations of Nordic descent. It offers resistance to frostbite by virtue of the anti-freeze properties of high blood sugar. Hemochromatosis is a genetic disease in which iron levels in the blood are so high that they can lead to organ failure. A huge percentage of the population of Western European descent has at least a mild form of this malady, which in turn confers resistance to the bubonic plague. (It is also why treating folks with leeches, and otherwise bleeding them, may actually have made them better.)

Moalem (with co-author Jonathan Prince) offers many other counter-intuitive insights into how dreaded genetic diseases may have kept your ancestors alive, in this medical version of Freakonomics, another book I highly recommend. The chapter on epigenetics would have made my hair stand on end, if I had any. The fact that your maternal grandmother smoked while she was pregnant with your mother had a direct effect on the expression of the genes in your body today.

And so it is with large software systems. How does a software application get to be eight million lines of code?

One line at a time, baby, one line at a time.

No one starts out to write an eight million line program, just as natural selection doesn't set out to create genetic diseases. Everything was done for a reason, to provide some necessary capability or to solve some problem. Successful software systems evolve and grow organically, just like biological systems. And as David Parnas asserts in his classic paper "Software Aging" (Proceedings of the 16th International Conference on Software Engineering, IEEE, May 1994), software is no more immune to the effects of entropy than any other artifact in the physical universe.

Continuous maintenance inevitably occurs in any successful software product; the only software products that don't change are the ones that failed in the marketplace. That this frequently results in an awkward architecture, a compromised design, and spaghetti code just means that the software was a victim of its own success, and of the fact that it is rarely economically feasible to scrap it and start over. In evolution, we call scrapping something and starting over extinction.

In his book Object-Oriented and Classical Software Engineering (McGraw-Hill, 2002), Stephen Schach cites studies showing that fully two-thirds of the total cost of the entire software development lifecycle is maintenance (and I have read studies that cite higher numbers). Software reliability pundit Les Hatton cites studies showing that only 20% of code is written as part of initial development; 40% is corrective maintenance (bug fixing), 16% is perfective maintenance (improvements), and fully 24% is adaptive maintenance (changes made simply because something in the external environment changed).

We apparently spend a lot of time just adapting to the latest Linux kernel or Java Virtual Machine release, porting to a new processor platform, wondering whether the newest GCC will subtly alter the behavior of our code base, or dealing with the effects of the IT department upgrading the servers from 10 Mb/s to gigabit Ethernet. This is why I think far too little attention is paid to designing systems that are easy to modify. If you are not addressing the largest single piece of the software development pie, then you are not controlling your costs. Adapt or die.

This is one of the reasons why innovation seems so much easier in start-ups. They are too young, inexperienced, and immature to realize that what they're doing "can't be done". They don't have to shoehorn their new product into an existing successful portfolio. They don't have to worry that they may disrupt a revenue stream that supports the livelihood of many folks in their organization. And they don't have to carry around the legacy of their past successes, because they haven't had any. The risk of failure is large, but the cost of failure is small. It's also why large established organizations may find innovation easier by spinning off a tiny subsidiary and shielding it from the rest of the bureaucracy.

In evolution, the vast majority of mutations are neutral or lethal. But some become a benefit to survival. Likewise, we should expect a lot of new product development to fail. A few will create a new billion dollar industry. Two-time Nobel prize winner Linus Pauling once said "The best way to have a good idea is to have a lot of ideas." Innovation is not for the risk-averse, in either the software world or the biological world.

Evolution has (so far) chosen not to scrap human DNA and start over, even though doing so might lead to a more efficient design with less questionable baggage. Likewise, we are understandably reluctant to do the same with our large, successful software systems.

Thursday, May 10, 2007

Value Is Where You Find It

Here is the story as it was related to me.

More than a decade ago, the Pfizer pharmaceutical company was developing a new drug to treat hypertension. Unfortunately, the phase one clinical trials of sildenafil citrate were so disappointing that the researchers decided to cut their losses and end the trials early. They asked the participants in the trial, who were predominantly older men, to return the unused drug samples.

Many of them refused.

Then someone broke into the clinic from which the trials were being run and made off with some of the remaining sildenafil samples.

The researchers knew something was up, but what? They started interviewing the participants, and that is when they discovered that their experimental drug, which would eventually be marketed under the name Viagra, had an unintended side effect.

I’d like to think that once the researchers realized the implications, more than one of them swung by the local Mercedes dealership on the way home from work.

Sometimes the value in your product isn’t where you think it is.

About the time those clinical trials began, I started working for the research and development arm of a large high-tech equipment manufacturer. During the next decade, I worked on seven different development projects, for as many as six release cycles, all but one of which led to a successful commercial product. All of those projects, even the one that was cancelled (maybe especially the one that was cancelled), produced useful collateral beyond the product itself. This collateral included reusable code, useful design patterns, and best practices, not to mention the experience stored inside the heads of the developers.

Seldom if ever did I see this R&D organization try to leverage this collateral, beyond breaking up high-performance teams and reassigning individual members to unrelated projects, where for the most part they were encouraged to shut up and get busy. It's hard to imagine a poorer use of human resources, or a worse job of leveraging the full value of a project.

Sometimes the value is in the people.

When I finally resigned from that organization, the predominant emotion I felt was gratitude. My time there was a tremendous learning experience. I worked on many different unrelated projects, learned to come up to speed quickly, worked with a lot of great people, got along well with a broad range of personalities, exercised a variety of skills in several different roles, developed expertise in several different processes, languages, and tool chains, was exposed to lots of architectures and design patterns, and learned a lot about the various product lines and the business side of the house. I could not have asked for a better training program, and it has served me well in my subsequent work.

When it came time for me to literally walk out the door of that organization, I had to hunt down an administrative assistant to take my badge and walk me out of the building. No exit interview, no meeting with anyone from HR, no sign of any manager, not even, as had happened in the past, security guards to escort me out. If I didn’t have a sturdy ego, I would have had some self-doubts.

I’ve given a lot of thought about why this organization might have failed to leverage potentially valuable collateral, whether it was in products or people.

The organization's silos may have made it difficult to leverage collateral from a project in one silo on a project in another. Silos made it much harder to transfer staff and other resources, temporarily or permanently. They also made it difficult, when customer problems occurred, to bring the experience of engineers who had been transferred out of a silo back to bear on the legacy systems they had once worked on. I remember being specifically forbidden by one department head to consult on a complex legacy product for which I was one of the few remaining developers. Good for his silo, bad for the other, and definitely bad for the customer.

A forced ranking system may have rewarded the hoarding of information and technology. It introduced a selective pressure toward producing results efficiently in the short term, without regard for any long-term consequences. It punished the reporting of negative results, ignoring the fact that negative results are still results. Although learning what doesn't work is what we call "experience", admitting that you tried something that didn't work made it a little more likely you would be in the next wave of layoffs.

For sure the trend of treating engineers as a fungible commodity didn’t help.

Exploiting collateral, whether it's people, technology, or unintended consequences, may not be cheap. Merely identifying it can be expensive. Committing to using it carries risk. Efforts at code reuse, for example, frequently fail for legitimate reasons. The exploitation may only pay off in the long run, and so requires a long-term view.

Engineers frequently prefer to reinvent rather than reuse. It takes some experience to realize that there is no glory in solving a problem that has already been effectively solved by someone else. Some programming languages make reuse high effort. Some software designs make it difficult.

Sildenafil citrate was an unusual failure. Most negative results do not give rise to a whole new and immensely profitable industry. But I wish we did a better job of extracting the full value of our work, whether it's in the technology, the people, the processes, or the products themselves.