Saturday, August 14, 2010

Memory Models and Compiler Optimization

Those software developers who have the misfortune of being forced to hang around with me know one of my favorite topics is how memory models presented by optimizing compilers and modern microprocessors have broken the contract between most programming languages and the hardware on which they run.

This has prompted the C++ standards folks to propose augmenting the volatile keyword with a new atomic keyword in the 2010 draft of the ISO/IEC 14882 standard. This new keyword attempts to provide a synchronization mechanism similar to what Java currently does with its own volatile keyword, insuring that memory is read and written atomically using whatever means is at hand on the underlying hardware platform. (The C++ volatile keyword merely means that memory can change unexpectedly, behind the executing program's back as it were, for example as in the case of I/O devices that have control and data registers mapped into memory.)

Sarita V. Adve (University of Illinois at Urbana-Champaign) and Hans-J. Boehm (Hewlett-Packard) are two names that crop up a lot as authors of much of the material I read on this topic. I just recently read their article in Communications of the ACM,

S. Adve, H. Boehm, "Memory Models: A Case for Rethinking Parallel Languages and Hardware", CACM, 53.8, August 2010, pp. 91-101

and I found it to be an excellent yet readable survey on the topic. One of the more amusing examples they give about the issue of compiler optimization involves the following C code snippet. Given external integer variables X and Y and two local integer variables R1 and R2:

R1 = X;
R2 = X;
Y = (R1 == R2);

This may make no sense to developers who are not in the habit of dealing with concurrency such as multithreaded code or device drivers, but to the rest of us it seems pretty clear: Y is set to false if and only if the value of X changed in between the two times it was accessed. But if X isn't declared volatile, the compiler doesn't think it makes any sense either. The developer has taken advantage of a data race condition of which the compiler is completely unaware. Here are the optimizations that are likely to be performed.

First, the compiler recognizes that there is no point in accessing X twice, since it already has the value in R1.

R1 = X;
R2 = R1;
Y = (R1 == R2);

Then it realizes that there is no point in checking for equality, since it knows that R1 and R2 have the same value;

R1 = X;
R2 = R1;
Y = 1;

Then it sees that the value of Y does not depend on the values of any of the other variables, so it hides the memory subsystem write latency by moving the write operation earlier.

Y = 1;
R1 = X;
R2 = R1;

The resulting code bears no resemblance functionally to what the programmer wrote, yet the compiler optimizations are perfectly legal, and in fact are likely to be performed by the C compiler you are using today. (I really like this example, because I have written C code almost exactly like the original snippet, where X is a volatile hardware time base register.)

Adve and Boehm assert, as many of their colleagues do, that relying on such a data race, instead of using real synchronization mechanisms such as memory barriers, atomic memory operations, mutexen, semaphores and the like, even for read-only variables, leads to unreliable code. Even if it works today, it might not work after your next compiler upgrade. And as others have pointed out, depending on the C and C++ volatile keyword to be implemented correctly by your compiler vendor is unfortunately also an iffy proposition.

Adve and Boehm point out some of the widespread practices in software development that make solving the memory model problem difficult:
  • global address space;
  • imperative languages;
  • object oriented design; and
  • pointer based data structures.
Holy cats, is that all? Yeah, I think that's enough. Reminds me of my graduate school days during the furor over the Japanese "Fifth Generation" project in which Japan was going to use massively parallel data flow architectures and artificial intelligence to make all traditional computing systems obsolete. The research group I was in was looking at operating system and programming languages architectures for such highly concurrent systems. We played with designs incorporating functional languages, single assignment, lazy evaluation, and other far out ideas. As far out as those ideas were, though, many of them turned out to be used by the designers of today's optimizing compilers; it's all under the hood, used in intermediate forms that your source code takes along the path from C and C++ to machine code.

To paraphrase computer scientist Tony Hoare, I don't know what the super concurrent programming language for future massively many core processors will be, but I have an inkling it might not be called C or C++.