Saturday, August 14, 2010

Memory Models and Compiler Optimization

Those software developers who have the misfortune of being forced to hang around with me know one of my favorite topics is how memory models presented by optimizing compilers and modern microprocessors have broken the contract between most programming languages and the hardware on which they run.

This has prompted the C++ standards folks to propose augmenting the volatile keyword with a new atomic keyword in the 2010 draft of the ISO/IEC 14882 standard. This new keyword attempts to provide a synchronization mechanism similar to what Java currently does with its own volatile keyword, insuring that memory is read and written atomically using whatever means is at hand on the underlying hardware platform. (The C++ volatile keyword merely means that memory can change unexpectedly, behind the executing program's back as it were, for example as in the case of I/O devices that have control and data registers mapped into memory.)

Sarita V. Adve (University of Illinois at Urbana-Champaign) and Hans-J. Boehm (Hewlett-Packard) are two names that crop up a lot as authors of much of the material I read on this topic. I just recently read their article in Communications of the ACM,

S. Adve, H. Boehm, "Memory Models: A Case for Rethinking Parallel Languages and Hardware", CACM, 53.8, August 2010, pp. 91-101

and I found it to be an excellent yet readable survey on the topic. One of the more amusing examples they give about the issue of compiler optimization involves the following C code snippet. Given external integer variables X and Y and two local integer variables R1 and R2:

R1 = X;
R2 = X;
Y = (R1 == R2);

This may make no sense to developers who are not in the habit of dealing with concurrency such as multithreaded code or device drivers, but to the rest of us it seems pretty clear: Y is set to false if and only if the value of X changed in between the two times it was accessed. But if X isn't declared volatile, the compiler doesn't think it makes any sense either. The developer has taken advantage of a data race condition of which the compiler is completely unaware. Here are the optimizations that are likely to be performed.

First, the compiler recognizes that there is no point in accessing X twice, since it already has the value in R1.

R1 = X;
R2 = R1;
Y = (R1 == R2);

Then it realizes that there is no point in checking for equality, since it knows that R1 and R2 have the same value;

R1 = X;
R2 = R1;
Y = 1;

Then it sees that the value of Y does not depend on the values of any of the other variables, so it hides the memory subsystem write latency by moving the write operation earlier.

Y = 1;
R1 = X;
R2 = R1;

The resulting code bears no resemblance functionally to what the programmer wrote, yet the compiler optimizations are perfectly legal, and in fact are likely to be performed by the C compiler you are using today. (I really like this example, because I have written C code almost exactly like the original snippet, where X is a volatile hardware time base register.)

Adve and Boehm assert, as many of their colleagues do, that relying on such a data race, instead of using real synchronization mechanisms such as memory barriers, atomic memory operations, mutexen, semaphores and the like, even for read-only variables, leads to unreliable code. Even if it works today, it might not work after your next compiler upgrade. And as others have pointed out, depending on the C and C++ volatile keyword to be implemented correctly by your compiler vendor is unfortunately also an iffy proposition.

Adve and Boehm point out some of the widespread practices in software development that make solving the memory model problem difficult:
  • global address space;
  • imperative languages;
  • object oriented design; and
  • pointer based data structures.
Holy cats, is that all? Yeah, I think that's enough. Reminds me of my graduate school days during the furor over the Japanese "Fifth Generation" project in which Japan was going to use massively parallel data flow architectures and artificial intelligence to make all traditional computing systems obsolete. The research group I was in was looking at operating system and programming languages architectures for such highly concurrent systems. We played with designs incorporating functional languages, single assignment, lazy evaluation, and other far out ideas. As far out as those ideas were, though, many of them turned out to be used by the designers of today's optimizing compilers; it's all under the hood, used in intermediate forms that your source code takes along the path from C and C++ to machine code.

To paraphrase computer scientist Tony Hoare, I don't know what the super concurrent programming language for future massively many core processors will be, but I have an inkling it might not be called C or C++.

Saturday, July 17, 2010

Cascada and the BeagleBoard

Readers who are interested in embedded development may recall Diminuto and Arroyo, my projects to develop a simple Linux and GNU-based teaching environment around the Atmel AT91RM9200EK evaluation board. That evaluation board uses the AT91RM9200 System on a Chip (SoC) based on the 32-bit ARM9 processor core. This SoC and other AT91 variants have been very popular over the years in embedded systems in my experience. My only complaint about the AT91RM9200EK was it's price tag: around US$1200, which made it a little pricey for outfitting a lab with several workstations for teaching a class.

Some time ago, Texas Instruments introduced a new SoC family, the OMAP (Open Multimedia Application Platform). The BeagleBoard is a single board computer that uses the OMAP 3530 SoC, which is based on the 32-bit ARM Cortex-A8 processor core. The OMAP 3530 is an asymmetric multi-core system which includes a digital signal processor (DSP) core and a graphics processor, making it very popular in hand-held devices like mobile phones. The BeagleBoard retails for around US$150, and is trivially capable of running Linux, not to mention Linux-based systems like Google's Android. It includes peripherals such as USB ports, an SD card slot, video out, and audio in and out. All of this is in a form factor that will fit in your shirt pocket.

Several vendors have produced daughter cards for the BeagleBoard that provide additional peripherals. Digi-Key along with chip vendor Micrel produces one such daughter card, the Zippy2, which adds an RJ45 Ethernet jack, a second serial port, a real-time clock, a second SD card slot, and other useful stuff, all in a form factor identical to that of the BeagleBoard. The Zippy2 retails for about US$100. Add a few tens of dollars of cables and other miscellanea and you have yourself a real system.

I just completed soldering the expansion connector onto my very own BeagleBoard, joining the BeagleBoard to my Zippy2, installing the pre-built U-Boot and Linux images onto an SD card using my Linux server, and booting it all up. Entire process: about two hours. Seriously. That includes opening the boxes and setting up my soldering iron.

I expect to spend the next few months playing with this new project, which I have dubbed Cascada. This will include porting some of my Diminuto, Arroyo, and Desperado software over to it. You may expect to be subjected to an article or two on this effort.

Never has learning systems programming and embedded development been so easy or so cheap.

Sunday, July 11, 2010

Cognitive Dissonance

I reckon that spending fifteen years attending and working at a university qualifies me as a former academic, even through my time in that role ended two decades ago. I went on to work at a national laboratory and supercomputer center that was funded in part by the U.S. National Science Foundation (NSF), and then eventually to Bell Laboratories. I'm living proof that you can have a pretty good career just making it up as you go along, and I got a lot of insight into the full spectrum of the research and development work experience, from academia to federally funded big science to corporate R&D.

So it was with real interest that I read Beryl Lieff Benderly's article "The Real Science Gap" in Miller-McCune magazine, published by the non-profit Miller McCune Center for Research, Media, and Public Policy. Benderly observes something I've been wondering about for some time: if we're so short of scientists and engineers, as our politicians, high-technology pundits, and industry leaders would have us believe, how come there are so many people with advanced science and engineering degrees that are under-employed or unemployed?

Benderly describes how our current system of research in the United States evolved: following World War II, the U.S. government established a system of institutes and foundations that funded universities, the universities funded researchers, post-doctorate fellows, and graduate students, and the results of their research sometimes made it into products and technologies manufactured and applied by corporate America. But over time, the entry-level jobs for newly minted Ph.D.s dried up as funding focused on those few research organizations that have been very successful, leaving them, with just a few exceptions, out of work or at best working at relatively low-wage jobs as post-doc research assistants. Benderly's article is well worth reading no matter where you are personally on the R&D employment spectrum.

Coincidentally, right after reading that article, I came across Andy Grove's article "How to Make an American Job" in the July 5/July 11 2010 issue of Bloomberg Businessweek. Grove, you may recall, is an engineer who was employee #3 of Intel and went on to run the company. Grove makes a similar observation about how corporate R&D used to be the driver of the creation of manufacturing jobs in the U.S., but now is mostly stimulating job creation overseas.

His article is also worth reading, as is the rebuttal by Vivek Wadhwa, "Why Andy Grove Is Wrong About Job Growth", the gist of which is: thanks to globalization, things have changed since Intel was founded in the 1970s, and who wants those Foxconn jobs anyway?

The fact is, there is are useful nuggets of wisdom in all three articles. But rather than pontificating on them myself, I'd like to encourage you to read all three articles and see what you think.