Wednesday, November 22, 2006

Question-Driven Development

I wrote my first computer program when I was a freshman in high school sometime around 1970. Contrary to popular belief, I was not banging out code on clay tablets. But I was using punch cards. Even so, I was using a dedicated mini-computer, and just a couple of years later, transitioned to an interactive time-sharing system. I remember fondly a couple of years after that, trudging through hip-deep snow, three miles, uphill, both ways, to the college computer center.

It took me more than thirty-five years of writing code to put a name to my personal software process: Question-Driven Development. I find that I am constantly asking questions about my code as it, more or less, magically appears on the screen in front of my eyes. Sometimes the question is what the hell is this? But just as frequently it is one of the following.

How will I test this during development?
  • Can I unit test every single code path?
  • Do I have error legs that are difficult to exercise?
  • Have I checked my code coverage using tools like Cobertura?
  • Do I have confidence that taking a rare error leg won't somehow escalate the problem?
  • Are there private inner classes that need to be tested independently?
  • Have I tested my setters and getters?
  • Do I have reasonable defaults if the setters aren't used?
  • If setters must be used, why aren't those settings set during construction?
  • Can I have a no-argument constructor, and if not, why not?
  • Have I tested my component for memory leaks using tools like JProfiler?
  • Have I done static analysis using tools like Klocwork?
  • Is this code thread-safe, and if not, why not?

How will I debug this during integration?

  • How can I tell if it's a bug in my component or in another component?
  • Are the interfaces between components independently verifiable, and if not, why not?
  • Are there independent ways to indict my component versus other components?
  • Do I have a reasonable message logging framework?
  • Have I chosen the severity of each logged message appropriately?

How will I troubleshoot this during quality assurance?

  • Can I quickly indict the failing subsystem?
  • Will turning up the log level while under load firehose the screen or the log file?
  • Will turning up the log level while under load somehow escalate the problem?
  • If performance problems arise, can I determine where the time/space is being spent?
  • Do I have a way of independently testing my component in the context of the larger system?
  • Do I have a way of testing my component end-to-end with the other components?

How will I support this during customer deployment?

  • Can I indict our software versus other software?
  • Can I efficiently capture the forensic information I need in a transportable form?
  • Can I quickly get the customer's system back up and running?
  • What is my priority when a problem occurs: forensic data capture or returning to an operational system?
  • Can I quickly determine a plausible story to reassure the customer?
  • Is that story likely to be correct or am I going to look stupid later?
  • Are development resources available to quickly fix bugs and turn around a new release?
  • Have I given the field support folks the tools they need to get their job done?
  • Can the field support folks do their jobs without calling me all the time at all hours?
  • Have I chosen the severity of each of my external alarms appropriately?
  • What is the cost of raising an alarm?
  • Can the system recover automatically or are there circumstances where our field support folks need to roll a truck?

How will I transition this to another developer during my move to another project?

  • Will the new developer need to be a subject matter expert like me?
  • Do I need paper documentation or is a wiki or javadoc more appropriate?
  • Have I left the new developer a mess to clean up?
  • Do I have deprecated code that I need to remove?
  • How do I know for sure that no one is using the deprecated code?
  • Is the deprecated code unit tested?

What other questions should I be asking?

Does anybody really know what time it is?

Does anybody really care? With apologies to Chicago, I am convinced that the answers are no, and yes, respectively.

Calendars and timekeeping, like music, model railroading, firearms, motorcycles, and a few other pastimes, are among those interests that seem to keep cropping up among technical folk. I fell into it myself a while back, and it consumed two years of my copious free time.

My interest was motivated by two things that irritated the heck out of me. First, more than once I ran into bugs in production software that miscalculated dates and times, causing me no end of grief and misery. I kept thinking, this can't be that hard. (I was both right and wrong on that count.) Second, on a single legacy software product that I worked on, there must have been at least a half dozen different ways in which the date and time was represented to the end user. I kept thinking, there has got to be a standard way of doing this. (I was right on that one.)

Realizing that I never really understood something until I either taught it or implemented it (typically both), I ended up developing the time and date routines in the Desperado library of embedded open source C++ components available from Digital Aggregates. Developers working on platforms like Linux or Java, both of which provide quite usable support for dates and times, may never really appreciate how much work must have gone into those systems. Developers working on embedded platforms which lack such support should hope they never need it.

This article is a collection of fun facts to know and tell that I learned along the way about calendars and timekeeping. It is in no way a tutorial or a technical reference (although I provide some of those at the end).

There is a standard for representing date and time: ISO 8601. It allows a few variations in the basic format to allow for cultural differences in representation and for readability by both humans and machines. But it enforces a consistency that avoids the ambiguity that still arises as to whether 02/01 is the first of February or the second of January. Plus, its format results in the character representation of dates sorting into chronological order. One form of an ISO 8601 conformant date and time stamp would look like "2006-11-22T11:54:04-07:00", which encodes the date, local time, and the offset from UTC.
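To make the sorting claim concrete, here is a minimal sketch in Java. It assumes the java.time package (Java 8 or later, so well after this article was written), and the class and method names are mine, purely for illustration. Note that plain text sorting matches chronological order only when all of the stamps carry the same UTC offset.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of any library mentioned in this article.
final class Iso8601Sketch {

    /** Parses an ISO 8601 date-time with a UTC offset and formats it back to text. */
    static String roundTrip(String stamp) {
        OffsetDateTime parsed = OffsetDateTime.parse(stamp);
        return parsed.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    /** Sorts the stamps as ordinary strings; no date arithmetic is involved. */
    static List<String> sortAsText(List<String> stamps) {
        List<String> copy = new ArrayList<String>(stamps);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("2006-11-22T11:54:04-07:00"));
        // All three stamps share the same offset, so text order is time order.
        System.out.println(sortAsText(Arrays.asList(
                "2006-11-22T11:54:04-07:00",
                "2006-02-01T08:00:00-07:00",
                "2005-12-31T23:59:59-07:00")));
    }
}
```

If the stamps mix offsets, convert them all to UTC before relying on text order.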

There is a standard calendar, the Common Era, which is a proleptic assumption of the Gregorian calendar. Proleptic means that the Common Era calendar extends backwards in time as if the Gregorian calendar had always existed. See, here's the thing. The Gregorian calendar was named for and endorsed by Pope Gregory XIII circa 1582, replacing the Julian calendar, named for Julius Caesar, which tells you something about how long it had been around. But the Gregorian calendar was not universally adopted until 1923. Yeah, that's right, folks were getting confused on dates well into the twentieth century.

What this means for those of you who are building that time machine in your basement is: I have no idea how your date-and-time controls are going to work. The date is going to depend on exactly when and where you are going. If you are dropping into Greece in 1901, you are on your own, because their 1901 is not likely to be your 1901. You might be tempted just to represent everything in seconds-in-the-past, but that's problematic too (we'll get to that in a bit).

The Common Era has no Year Zero. The first year is 1 C.E. (A.D. is a Gregorian calendar term that has unpleasant connotations for non-Christians), and the year before was 1 B.C.E. (B.C. ditto).

The Common Era calendar (and the Gregorian calendar) repeats every 400 years. Yes, that's right, the algorithm you've known since childhood of a leap year every four years is wrong. It isn't a leap year if it falls on the century (the year ends in 00), unless the year is also evenly divisible by 400, in which case, it is. So 1900 was not a leap year, but 2000 was.
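Here is a small Java sketch of that leap year rule. The class and method names are mine, not Desperado's. It also counts the days in one 400-year cycle, which shows why the calendar repeats: the total is a whole number of weeks, so the days of the week line up again too.

```java
// Hypothetical illustration of the Gregorian leap year rule.
final class LeapYearSketch {

    /** Every fourth year, except century years, unless divisible by 400. */
    static boolean isLeapYear(int year) {
        return (year % 4 == 0) && ((year % 100 != 0) || (year % 400 == 0));
    }

    /** Total days in 'count' consecutive years starting at 'firstYear'. */
    static long daysInYears(int firstYear, int count) {
        long days = 0;
        for (int year = firstYear; year < firstYear + count; year++) {
            days += isLeapYear(year) ? 366 : 365;
        }
        return days;
    }

    public static void main(String[] args) {
        System.out.println("1900 leap? " + isLeapYear(1900));
        System.out.println("2000 leap? " + isLeapYear(2000));
        long cycle = daysInYears(2001, 400);
        System.out.println("days in 400 years: " + cycle);
        System.out.println("left over after whole weeks: " + (cycle % 7));
    }
}
```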

Around 1882, a Protestant rector in Germany named Christian Zeller needed to predict on what Sunday Easter would fall. He became immortalized by Zeller's Congruence, a formula that calculates the day of the week on which any date in the Gregorian calendar falls. Everyone uses this; don't even think about developing your own algorithm.
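For the curious, this is the textbook form of Zeller's Congruence for the Gregorian calendar, sketched in Java. It is not necessarily how Desperado implements it, the names are mine, and I am assuming positive years only. The one oddity to remember is that January and February count as months 13 and 14 of the previous year.

```java
// Hypothetical illustration of Zeller's Congruence (Gregorian form).
final class ZellerSketch {

    // The formula numbers the days starting from Saturday.
    private static final String[] NAMES = {
        "Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"
    };

    /** Returns 0 for Saturday, 1 for Sunday, and so on through 6 for Friday. */
    static int dayOfWeek(int year, int month, int day) {
        int m = month;
        int y = year;
        if (m < 3) {        // January and February belong to the prior year
            m += 12;
            y -= 1;
        }
        int k = y % 100;    // year within the century
        int j = y / 100;    // zero-based century
        return (day + (13 * (m + 1)) / 5 + k + k / 4 + j / 4 + 5 * j) % 7;
    }

    static String dayName(int year, int month, int day) {
        return NAMES[dayOfWeek(year, month, day)];
    }

    public static void main(String[] args) {
        // The date on this article.
        System.out.println(dayName(2006, 11, 22));
    }
}
```

Run against the date at the top of this article, 2006-11-22, it answers Wednesday, which is reassuring.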

There is a standard clock. In fact, there are several standard clocks, depending on what kind of time you want to keep. We like to think of a day containing twenty-four hours. The problem is the concept of a "day", which is tied to the Earth's rotation. Apparently the Earth hasn't been wound up lately, because it appears to be slowing down.

International Atomic Time (TAI) is the time kept by the most precise timepieces we know how to make, atomic clocks that measure the natural vibration of the cesium atom. TAI has no correlation to the Earth's rotation. Atomic clocks aren't perfect either, but they are the best we know how to build.

Coordinated Universal Time (UTC) is the time we all know and love, the time at the Prime Meridian, once known as Greenwich Mean Time (GMT). (If you have never been to the Greenwich Observatory, I recommend it. Take the ferry.) UTC is tied to the Earth's rotation. So to keep UTC in sync with the slowing of the daily cycle of our planet, occasionally a leap second has to be inserted into the UTC timekeeping.

The decision as to when this is done is made by the International Earth Rotation Service. ("Hands up! International Earth Rotation Service! Nobody move!") So far, a leap second has been added to UTC twenty-three times, first in 1972, the most recent in 2005. The IERS is capable of inserting more than one leap second at a time, or even removing leap seconds in case the Earth starts speeding up (don't laugh, they're serious when they say this). Leap seconds are generally inserted at the end of a calendar year, but not always: nine times a leap second has been inserted (one imagines, on some kind of emergency basis) at the end of June.

(The acronyms TAI and UTC represent the official names of these measurements as written in French.)

The Global Positioning System (GPS) works strictly off computations based on the differences in timestamps received from satellites whose orbits are known with very high precision. (When I read about this, my first thought was: this really is Rocket Science!) Every GPS satellite carries multiple redundant cesium or rubidium atomic clocks. Because of this, GPS time is absolutely without question the most accurate time we know how to keep. Your cell phone is by far the most accurate timepiece you are ever likely to own, because, unless your service provider really hoses it up, the telephone system is synced to GPS time. (Before GPS existed, AT&T actually owned a couple cesium atomic clocks just for this purpose.)

Trick question: how many time zones are there? Answer: twenty-five. Not twenty-four. The time zone that falls on the Prime Meridian extends thirty minutes on either side of it. Take a one-eighty around the Earth and you find the International Date Line (IDL), which effectively splits its time zone into two thirty-minute zones, each twenty-four hours apart. So in that sixty-minute window, it is the same clock time on either side of the IDL, but the zones are a day apart.

The U.S. military has a cool way of indicating time zones using a single letter, as in 23:30T. Z indicates the time zone on the Prime Meridian. Since Z is spoken as "Zulu" in the NATO phonetic alphabet (all the cool kids are doing it), that's why time given in UTC is sometimes said to be "Zulu time". The time zone you are standing in can be indicated with a J, so local time is "Juliett", no matter what time zone you are in. That conveniently leaves twenty-four other letters to indicate the twenty-four time zones other than that of the Prime Meridian. The U.S. Mountain Standard Time zone I live in is T or "Tango", which is UTC-07:00. This of course completely ignores the fact that lots of places have time zone offsets of fractions of an hour. This may explain why the U.S. has never invaded Newfoundland (UTC-03:30). On the other hand, they did invade Afghanistan (UTC+04:30), so your mileage may vary.
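A sketch of the letter scheme in Java follows. The article itself only pins down Z, J, and T; the rest of the assignment (A through M, skipping J, for the zones east of the Prime Meridian, and N through Y for the zones west) is the conventional mapping as I understand it, so treat it as an assumption. The class and method names are mine.

```java
// Hypothetical illustration of the single-letter military time zone scheme.
final class MilitaryZoneSketch {

    /** Maps a whole-hour UTC offset in the range -12..+12 to its zone letter. */
    static char letterFor(int offsetHours) {
        if (offsetHours < -12 || offsetHours > 12) {
            throw new IllegalArgumentException("offset out of range: " + offsetHours);
        }
        if (offsetHours == 0) {
            return 'Z';                                   // Zulu, the Prime Meridian
        }
        if (offsetHours > 0) {
            char letter = (char) ('A' + offsetHours - 1); // A..I for +1..+9
            // J is reserved for local time, so +10..+12 shift to K, L, M.
            return (letter >= 'J') ? (char) (letter + 1) : letter;
        }
        return (char) ('N' - offsetHours - 1);            // N..Y for -1..-12
    }

    public static void main(String[] args) {
        System.out.println("UTC-07:00 is " + letterFor(-7));
        System.out.println("UTC+10:00 is " + letterFor(10));
    }
}
```

As the text says, fractional offsets have no letter at all, so this sketch simply does not accept them.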

Daylight Saving Time (DST) (note: "Saving" and not "Savings") is a wonderful complication if you are trying to write code. First of all, different countries that have some form of DST - and many do not - have different notions of when it begins and ends. Most of the U.S. implements DST, but not all. The U.S. has thus far used three different sets of dates for when DST begins and ends, first in 1966, then changing their minds in 1986 and 2007. The dates when DST begins and ends are always in the form of "the second Sunday in March" or "the last Sunday in October", making life especially pleasant for the developer. The decision of when to change when DST begins and ends in the U.S. is made by our elected representatives, and as you might guess, is fraught with politics.
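Rules of the form "the second Sunday in March" turn into actual dates with a little arithmetic. Here is a sketch in Java that leans on java.time's TemporalAdjusters (Java 8 or later, an assumption on my part since it postdates this article). The class and method names are mine, and the sketch only finds the dates; it knows nothing about which rule applies in which year or place.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;
import java.time.temporal.TemporalAdjusters;

// Hypothetical illustration of resolving "Nth Sunday" style DST rules.
final class DstRuleSketch {

    /** The ordinal-th Sunday (1 = first, 2 = second, ...) of the given month. */
    static LocalDate nthSunday(int year, Month month, int ordinal) {
        return LocalDate.of(year, month, 1)
                .with(TemporalAdjusters.dayOfWeekInMonth(ordinal, DayOfWeek.SUNDAY));
    }

    /** The last Sunday of the given month. */
    static LocalDate lastSunday(int year, Month month) {
        return LocalDate.of(year, month, 1)
                .with(TemporalAdjusters.lastInMonth(DayOfWeek.SUNDAY));
    }

    public static void main(String[] args) {
        System.out.println("second Sunday in March 2007: " + nthSunday(2007, Month.MARCH, 2));
        System.out.println("last Sunday in October 2006: " + lastSunday(2006, Month.OCTOBER));
    }
}
```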

Two years of hobby labor resulted in the Desperado C++ classes CommonEra, LocalTime, AtomicSeconds, LeapSeconds, TimeZone, DaylightSavingTime, and others. I am reasonably confident (so far) that I got it right.

Update (2011-07-08)

David Finkleman and his colleagues published an article in American Scientist on the proposal by the Radiocommunication Sector of the International Telecommunication Union (ITU-R) to cease adding leap seconds to Coordinated Universal Time (UTC) so that it would no longer be kept in sync with the earth's rotation. As far as I can tell, the ITU-R is proposing this because dealing with leap seconds is just too hard.

GPS time already forgoes the use of leap seconds, and as a result, just in the time that GPS time has existed, the wall clock time we know from the rotation of the Earth, slowing due to the drag caused by the lunar tides, has drifted fifteen seconds from the time GPS keeps through the use of atomic clocks. Hardware and software that displays wall clock time, but which synchronizes to GPS time, has to keep track of those missing leap seconds and add them back in. If such a change were to be made, it would cause UTC to be completely abstracted away from the wall clock time we use on a day to day basis.
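The bookkeeping involved looks something like the following Java sketch. The table lists the dates on which each leap second since the start of GPS time (1980-01-06) took effect, to the best of my knowledge as of this update; please verify it against the official IERS announcements before trusting it, and remember that it goes stale whenever another leap second is declared. The class and method names are mine, and java.time (Java 8 or later) is assumed.

```java
import java.time.LocalDate;

// Hypothetical illustration of tracking the GPS-to-UTC leap second offset.
final class GpsOffsetSketch {

    // First UTC day after each leap second inserted since the GPS epoch.
    // Assumption: complete only through mid-2011.
    private static final LocalDate[] EFFECTIVE = {
        LocalDate.of(1981, 7, 1), LocalDate.of(1982, 7, 1), LocalDate.of(1983, 7, 1),
        LocalDate.of(1985, 7, 1), LocalDate.of(1988, 1, 1), LocalDate.of(1990, 1, 1),
        LocalDate.of(1991, 1, 1), LocalDate.of(1992, 7, 1), LocalDate.of(1993, 7, 1),
        LocalDate.of(1994, 7, 1), LocalDate.of(1996, 1, 1), LocalDate.of(1997, 7, 1),
        LocalDate.of(1999, 1, 1), LocalDate.of(2006, 1, 1), LocalDate.of(2009, 1, 1)
    };

    /** Seconds by which GPS time is ahead of UTC on the given UTC date. */
    static int gpsMinusUtcSeconds(LocalDate utcDate) {
        int seconds = 0;
        for (LocalDate effective : EFFECTIVE) {
            if (!utcDate.isBefore(effective)) {
                seconds++;
            }
        }
        return seconds;
    }

    public static void main(String[] args) {
        System.out.println(gpsMinusUtcSeconds(LocalDate.of(2011, 7, 8)));
    }
}
```

A receiver displaying wall clock time subtracts this offset from GPS time, which is exactly the table that has to be maintained somewhere.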

Is this a good thing, or a bad thing? I don't know. But it sure as heck is not a minor thing. I got into the study of time for two reasons: it turned out to be critical to many of the products I've worked on in the past two decades, and much of the software I've used gets it wrong. UTC diverging from the wall clock time we use to manage our systems will only make this harder.

I've added the reference to this paper in the sources below.

Sources

I. R. Bartky, E. Harrison, "Standard and Daylight-saving Time", Scientific American, May 1979

P. Chan et al., The Java Class Libraries, Second Edition, Volume 1, Addison-Wesley, 1998, pp. 1775-1776

D. Finkleman et al., "The Future of Time: UTC and the Leap Second", American Scientist, Volume 99, No. 4, July-August 2011, pp. 312-319

ISO, Data elements and interchange formats - Information interchange - Representation of dates and times, ISO8601:1988(E), International Organization for Standardization, Geneva, Switzerland, First edition, 1988-06-15

ISO, Portable Operating System Interface (POSIX) - Part 1, ISO9945-1:1996(E), Annex B, 2.2.2, International Organization for Standardization, Geneva, Switzerland

ITU-R, Standard-frequency and time-signal emissions, ITU-R TF.460-6, International Telecommunications Union, Radiocommunication Assembly, 2002

G. Klyne, C. Newman, Date and Time on the Internet: Timestamps, RFC 3339, The Internet Society, July 2002

NIST, The NIST Reference on Constants, Units, and Uncertainty

E. M. Reingold, N. Dershowitz, Calendrical Calculations, Millennium Edition, Cambridge University Press, Cambridge, U.K., 2002

J. R. Stockton, Date and Time

B. N. Taylor, Guide for the Use of the International System of Units (SI), Special Publication 811, NIST, 1995

B. N. Taylor, Metric System of Measurements: Interpretation of the International System of Units for the United States; Notice, Part II, NIST, July 1998

B. N. Taylor, ed., The International System of Units (SI), Special Publication 330, NIST, 2001

U. S. Code, Title 15, Chapter 6, Subchapter IX, "Standard Time"

"Leap Seconds", U. S. Naval Observatory

M. Wolf, C. Wicksteed, Date and Time Formats, NOTE-datetime, World Wide Web Consortium, September 1997

Desperado, Digital Aggregates Corp., 2006

Monday, November 20, 2006

Chip's Instant Managed Beans

Logging informational and error messages is a common approach to debugging in the lab and troubleshooting in the field. So common, in fact, that there are two widely used Java frameworks for managing message logging: the logging package that now comes with Java, and the more enterprise-oriented Apache log4j. And every C/C++ project I've worked on in the past decade or so had some kind of log message management framework, even if it was roll-your-own.

But message logging sometimes isn't all it's cracked up to be. Frequently it isn't even possible. Most embedded projects I've worked on in the past three decades had no persistent store of any kind. The smaller projects were lucky to have a console serial port buried on the board somewhere, and the more complex ones might have had an Ethernet port. But even if there was a consenting syslog server to which to log messages, logging wasn't always practical or possible on real-time systems because of the processing overhead and bandwidth required to log the potential firehose of output when things really got sideways. More than once, unfortunately sometimes while troubleshooting a customer system, I have turned up the log level only to discover that the CPU cost was so high the hardware missed its watchdog poke. Wackiness ensued.

Even under the best of circumstances, a huge message log is frequently just more information than you need. Nothing like having your trusty field support person FTP a twenty megabyte log file to your desktop and tell you the customer wants to know what happened by close of business. (Fortunately, a helpful manager will call you every twenty minutes to "assist with your analysis".)

My embedded mentors Tamarra Noirot and Randy Billinger patiently taught me another approach that is an alternative or sometimes a supplement to logging: collections of counters managed as objects and exposed upon demand to the outside world. Sometimes you would give an eye tooth to know if at any time in the past had the board ever encountered a hard error on an I/O port, had the signaling stack ever thrown an exception, had a DSP ever been rebooted, had an incoming packet ever arrived with an invalid checksum. If you (or your trusty field support person) could just query the system to see if any of those error counters had ever incremented, you could, perhaps, at least come up with a plausible story of what probably happened. And maybe even get a clue that would help you fix the problem. Or at least shift the blame. Incrementing a counter is a whole lot cheaper than logging a message, and frequently carries just as much information.
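The idea is simple enough to sketch in a few lines of Java. This is not the Buckaroo implementation described later, just a minimal illustration with names I made up: one atomic counter per enum constant, cheap to bump from any thread and cheap to read on demand.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical illustration of a collection of counters managed as an object.
final class EventCounters<E extends Enum<E>> {

    /** Example events; a real system would define its own. */
    enum Fault { IO_PORT_HARD_ERROR, STACK_EXCEPTION, DSP_REBOOT, BAD_CHECKSUM }

    private final AtomicLongArray counts;

    EventCounters(Class<E> type) {
        // One slot per enum constant, indexed by ordinal.
        counts = new AtomicLongArray(type.getEnumConstants().length);
    }

    /** Increments a counter and returns its new value. */
    long inc(E which) {
        return counts.incrementAndGet(which.ordinal());
    }

    /** Reads a counter without disturbing it. */
    long get(E which) {
        return counts.get(which.ordinal());
    }

    /** Clears a counter, for example after the field support person has noted it. */
    void reset(E which) {
        counts.set(which.ordinal(), 0L);
    }

    public static void main(String[] args) {
        EventCounters<Fault> counters = new EventCounters<Fault>(Fault.class);
        counters.inc(Fault.BAD_CHECKSUM);
        counters.inc(Fault.BAD_CHECKSUM);
        for (Fault fault : Fault.values()) {
            System.out.println(fault + "=" + counters.get(fault));
        }
    }
}
```

An increment like this costs a single atomic add, which is why it survives on systems where formatting and shipping a log message does not.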

There are several mechanisms to expose counters to the outside world. Sometimes it's a simple command line interpreter and a serial port. On systems which have an internet protocol stack, it's telnet or secure shell. On yet more complex systems, it's an SNMP MIB.

If you are developing in Java, you're in luck: the latest release comes with a remote management capability: managed beans. A bean is a common Java design pattern. It is a class which has a no-argument constructor, its properties are accessible and modifiable via public getter and setter methods named according to a convention, and it is serializable. Hence, Java objects that are beans can be created, accessed, modified and even persisted, using Java's reflection mechanism, without any prior knowledge of the implementation. A managed bean, or mbean, can be registered with an mbean server so that it can be accessed and modified through the server, even remotely across a network. Java 1.5 has a built-in platform mbean server, and also includes jconsole, a GUI tool with which to browse registered mbeans.

This stuff makes embedded developers' mouths water. If you're not into Java, imagine having a standard mechanism that allowed you to export C++ objects to an SNMP MIB, then having a MIB browser that let you examine the contents of those objects, change them, and even call methods inside of them. It would give you the capability to monitor and control applications from the laptop in your office, even if those applications are deeply embedded in your remote servers without any external user interface.

As much as this sounds like rocket science, implementing a standard mbean in Java is actually pretty simple. But standard mbeans, whose capabilities and interface are fixed at compile time, aren't that flexible. Dynamic mbeans, on the other hand, are incredibly flexible, but not that simple to implement.
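To show what "pretty simple" means, here is a minimal sketch of a standard mbean using only the javax.management classes that ship with Java. The class names and the object name string are made up for illustration. The one convention that matters is that the management interface is public and is named after the implementing class with "MBean" appended. I have nested both inside one wrapper class just to keep the sketch in a single file.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical illustration of a standard mbean; all names here are invented.
final class StandardMBeanSketch {

    /** The management interface, fixed at compile time. */
    public interface ErrorCountMBean {
        long getCount();   // appears as the read-only attribute "Count"
        void reset();      // appears as the operation "reset"
    }

    /** The managed resource itself. */
    public static final class ErrorCount implements ErrorCountMBean {
        private final AtomicLong count = new AtomicLong();
        public long getCount() { return count.get(); }
        public void reset() { count.set(0L); }
        public void increment() { count.incrementAndGet(); }
    }

    /** Registers the bean with the platform mbean server under the given name. */
    static ObjectName register(ErrorCount bean, String name) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName(name);
            server.registerMBean(bean, objectName);
            return objectName;
        } catch (JMException failure) {
            throw new IllegalStateException(failure);
        }
    }

    /** Reads the "Count" attribute the way a management tool would. */
    static long readCount(ObjectName objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return (Long) server.getAttribute(objectName, "Count");
        } catch (JMException failure) {
            throw new IllegalStateException(failure);
        }
    }

    public static void main(String[] args) {
        ErrorCount errors = new ErrorCount();
        ObjectName name = register(errors, "example:type=ErrorCount");
        errors.increment();
        System.out.println("Count seen through the mbean server: " + readCount(name));
    }
}
```

The inflexibility is visible right in the interface: add a second counter and you are editing, recompiling, and redeploying.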

What we need is a serving of Chip's Instant Managed Beans. Instant mbeans are general-purpose dynamic mbeans that allow you to instrument your application with almost no effort at all. Instant mbeans come in two delicious flavors: Counters and Parameters.

The Counters instant mbean lets you expose an array of long integer counters inside your application to a management tool like jconsole. You can watch activity counters inside your application increment, giving you that warm fuzzy feeling that your software is actually doing something. You can reset error counters and see if they change. You can monitor high water marks. You can even alter tuning parameters in real-time.

And instant mbeans are so easy to use. Here is a code snippet that creates an instant mbean for some error counters in an application.

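// Each enum constant names one counter.
enum Counter {
    RECEIVED_INVALID_MESSAGE,
    ILLEGAL_STATE_ENCOUNTERED,
    CAUGHT_IO_EXCEPTION,
    SHOULD_BE_IMPOSSIBLE
}

// Create a Counters instant mbean for the counters named above.
Counters counters = new Counters(Counter.class);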

Next we register the instant mbean with an mbean server under a Java object name. Sure, we could use any mbean server and object name we wanted to, but if we don't do anything more than the code snippet below, the Counters mbean will be registered with the default platform mbean server under a perfectly usable default object name.

counters.start();

Now if our application encounters an illegal state in its state machine, it executes the following code snippet to increment the appropriate counter. It can even log it, as shown below.

counters.inc(Counter.ILLEGAL_STATE_ENCOUNTERED);

logger.warning("illegal state encountered: count="
    + counters.get(Counter.ILLEGAL_STATE_ENCOUNTERED));

If we remembered to run our application with the JVM flag

-Dcom.sun.management.jmxremote

so that the platform mbean server allows connections from the outside world, then all we need to do is run jconsole on the server itself, or remotely using the appropriate URL, and the values of all our error counters will be visible through the GUI to watch and even to modify.

Here is a screen snapshot of jconsole managing an application using exactly the code from above.

[Screenshot: jconsole showing the Counters instant mbean]

Chip's Instant Managed Beans are just that simple. But wait, there's more!

If your application is configured using, for example, a Java properties file, you can use the Parameters instant mbean to expose the Properties object. You can observe through jconsole what the configuration parameters are, and even alter them in real-time. And no inefficient polling here! The Parameters instant mbean lets you register a callback object so that your application will be notified when a parameter has been changed through jconsole.

Given the properties file

ConfigurationFile.properties

with the following keyword=value pairs

AcceptTimeoutPeriodMs=2000
MaximumQueueSize=200
Temporary=/var/tmp
RootPath=./Buckaroo

here is a code snippet to create a Parameters instant mbean from a Properties object.

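// Load the configuration from the properties file.
Properties properties = new Properties();
InputStream stream = new FileInputStream("ConfigurationFile.properties");
try {
    properties.load(stream);
} finally {
    stream.close();
}

// Expose the loaded properties through a Parameters instant mbean.
Parameters parameters = new Parameters(properties);
parameters.start();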

Now we can use jconsole to see how our application was configured, and even to change the configuration as it runs.

Here is another screen snapshot of jconsole managing an application using exactly the code from above.

[Screenshot: jconsole showing the Parameters instant mbean]

You can start out using instant mbeans to quickly get your application instrumented. Then, as the mood strikes you, use the instant mbeans as examples for developing your own more complicated managed beans.

The Counters and Parameters instant mbeans are part of the Buckaroo open source Java library available from Digital Aggregates under the Apache License. Chip's Instant Managed Beans makes remote application management easy and fun for the whole family!

Sources

Buckaroo, Digital Aggregates Corp., 2006