Saturday, April 09, 2011

The House of Cards

I cannot say that I am especially a fan of the horror fiction genre. But I am a total fanboy of the non-fiction work of Eric Eide and John Regehr and their team at the University of Utah School of Computing. In 2008 they published a paper
Eric Eide and John Regehr, "Volatiles are miscompiled and what to do about it", Proceedings of the 8th ACM International Conference on Embedded Software, 2008-10
which scared the crap out of me.

They developed a tool called randprog to see how a broad set of open source and commercial C compilers handled the volatile keyword. volatile is used to clue in the compiler that the declared variable might change outside of the context of the program. It is most typically used for variables that represent hardware registers that are mapped into memory.
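
For instance, a memory-mapped device register might be declared something like the sketch below. The register names, addresses, and bit masks are made up for illustration; the real ones come from the part's data sheet.

#include <stdint.h>

/* Hypothetical memory-mapped UART registers; the addresses and bit
   masks are invented for this example. */
#define UART_DATA    (*(volatile uint32_t *)0x4000c000u)
#define UART_STATUS  (*(volatile uint32_t *)0x4000c018u)
#define UART_TX_FULL (1u << 5)

/* Busy-wait until the transmit FIFO has room, then write one byte.
   Without volatile the compiler would be free to hoist the status
   read out of the loop and spin forever on a stale value. */
void uart_putc(uint8_t ch)
{
    while (UART_STATUS & UART_TX_FULL) {
        /* spin: the register changes behind the program's back */
    }
    UART_DATA = ch;
}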

The results were... interesting, to say the least. They found many, many examples where the use of volatile was miscompiled. One of the tricks they suggested to get around a buggy compiler was to embed the use of a volatile variable inside accessor (getter) and mutator (setter) functions; the semantics of function parameters and return values are similar to those of volatile, and the code paths in compilers that handle those idioms are much more heavily travelled and hence more mature and bug-free.
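
Their workaround might look something like this sketch; the helper names here are mine, invented for illustration, but the idea is theirs.

/* Every access to a volatile object is funneled through an ordinary
   function call, whose parameter and return-value semantics compilers
   handle far more reliably than direct volatile loads and stores. */
int vol_read_int(volatile int *vp)
{
    return *vp;
}

void vol_write_int(volatile int *vp, int value)
{
    *vp = value;
}

A read of a device register then becomes vol_read_int(&reg) rather than a bare reg, and the write side goes through vol_write_int().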

If, like me, you make a good living writing code down near the bare metal, perhaps even code upon which lives depend, a better description than interesting might be terrifying. But not terribly surprising.

You might think that volatile would be useful in multithreaded programs, where, indeed, shared variables can change outside of the context of the code the compiler can see in a particular translation unit. But the experience in the community of developers that write real-time, embedded, and/or highly concurrent code has been so bad that the use of volatile is highly discouraged.

If you look under the Documentation directory of any fairly recent Linux kernel (I have six from which to choose on my build server), you will find the file volatile-considered-harmful.txt. The file name should give you a clue where this is heading. In that file is an article
Jonathan Corbet, Why the "volatile" type class should not be used, Linux kernel documentation
which states that, regarding Linux kernel development, "the use of volatile is likely to be seen as a bug and will bring additional scrutiny to the code". This isn't because volatile doesn't work. It's because the optimization-suppressing characteristics of volatile are seldom what is really required, except in certain very narrow circumstances, which do not include variables shared between concurrent sections of kernel code.

Outside of the kernel, where application developers are increasingly writing highly concurrent (a.k.a. multithreaded) code to leverage multi-core platforms, volatile in C and C++ code is also problematic. The memory models implemented at the hardware level by memory subsystems, which neither know nor care whether you declared a shared variable in your high-level source code on some far-off server to be volatile, frequently defeat what you intended to convey by using the keyword.

For example, in section 6.4.2, "Atomicity Optimizations", of the lengthy but entirely worthy tome
Ulrich Drepper, What Every Programmer Should Know About Memory, Red Hat Inc., 2007-11-21
the author discusses the MESI cache coherency protocol as commonly implemented in the memory subsystems of the processors you are using today.
If multiple threads modify the same memory location concurrently, processors do not guarantee any specific result. This is a deliberate decision made to avoid costs which are unnecessary in 99.999% of all cases. For instance, if a memory location is in the ‘S’ state and two threads concurrently have to increment its value, the execution pipeline does not have to wait for the cache line to be available in the ‘E’ state before reading the old value from the cache to perform the addition. Instead it reads the value currently in the cache and, once the cache line is available in state ‘E’, the new value is written back. The result is not as expected if the two cache reads in the two threads happen simultaneously; one addition will be lost.
As a result, libraries that implement threading for C and C++, like POSIX threads, handle shared variables by using a combination of arcane compiler-specific directives that are not part of the programming language, and equally arcane processor instructions that implement memory barriers that keep the memory subsystem from doing something stupid.
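
To make the lost-update scenario concrete, here is a sketch of my own, with an arbitrary iteration count: two threads increment one counter protected by nothing but volatile, which does not help, and a second counter under a POSIX mutex, whose lock and unlock operations supply the compiler and processor barriers described above. Build it with -pthread and the unprotected counter will typically come up short of the expected total.

#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 1000000

/* volatile does not make ++ atomic: the read-modify-write sequence can
   still interleave between threads, losing increments just as Drepper
   describes. */
static volatile long unsafe_counter = 0;

static long safe_counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        unsafe_counter++;            /* racy: increments can be lost */

        pthread_mutex_lock(&lock);   /* lock and unlock provide the  */
        safe_counter++;              /* barriers that volatile does  */
        pthread_mutex_unlock(&lock); /* not                          */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* safe_counter is always 2 * ITERATIONS; unsafe_counter usually is not. */
    printf("unsafe=%ld safe=%ld\n", (long)unsafe_counter, safe_counter);
    return 0;
}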

The downside is that these mechanisms frequently affect all of the memory in the application, or even in the entire processor, not just the one shared variable with which the developer was concerned. This has been cogently discussed in
Sarita V. Adve and Hans-J. Boehm, "Memory Models: A Case for Rethinking Parallel Languages and Hardware", Communications of the ACM, 53.8, 2010-08
as well as in other forums.

You Java developers out there, don't get smug. You would do well to read up on the Java Memory Model and come to understand what violence has been done to your Java compiler and Java Runtime Environment in order to fix this, so that you could remain blissfully unaware of what's going on under the hood while your code runs slower, albeit correctly. The same fixes applied by implementations of POSIX threads for C and C++ were applied to Java and the JRE.

But the fun doesn't stop there.

In
Xuejun Yang, Yang Chen, Eric Eide, John Regehr, "Finding and Understanding Bugs in C Compilers", Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011-06
my heroes are back with a new random C code generator, Csmith, which can similarly test their collection of C compilers for misbehavior beyond just volatile. And they found that C compilers do aim to misbehave.

Csmith randomly generates legal C programs within a set of constraints that avoids producing code that strays into areas that the ANSI C standard defines as, well, undefined. They compile this code with their menagerie of C compilers. If the compilers don't crash and do produce an executable program (they don't always, as it turns out), they run all of the executables and see if they all produce the same answer. This process is automated so that they can test a million little C programs across many compilers.
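
Boiled down to its essence, the loop looks something like the sketch below. The compiler invocations are assumptions on my part, and I'm pretending the generated test case, test.c, prints a single checksum line; the real Csmith harness is considerably more elaborate.

#include <stdio.h>
#include <string.h>

/* Toy differential tester: build one generated test case with several
   compilers, run each binary, and compare what they print. */
static const char *compilers[] = { "gcc -O0", "gcc -O2", "clang -O2" };

int main(void)
{
    char expected[256] = "";
    size_t n = sizeof(compilers) / sizeof(compilers[0]);

    for (size_t i = 0; i < n; i++) {
        char cmd[512];
        char output[256] = "";
        FILE *pipe;

        /* Compile and, if that succeeds, run the candidate. */
        snprintf(cmd, sizeof(cmd), "%s -o test_bin test.c && ./test_bin",
                 compilers[i]);
        pipe = popen(cmd, "r");
        if (pipe == NULL || fgets(output, sizeof(output), pipe) == NULL) {
            printf("%s: failed to build or run the test case\n", compilers[i]);
            if (pipe != NULL) pclose(pipe);
            continue;
        }
        pclose(pipe);

        if (i == 0) {
            strcpy(expected, output);        /* first result is the baseline */
        } else if (strcmp(expected, output) != 0) {
            printf("%s disagrees: \"%s\" vs \"%s\" -- possible miscompilation\n",
                   compilers[i], expected, output);
        }
    }
    return 0;
}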

They characterize most compiler optimizer implementations with this pseudo code snippet

analysis
if (safety check) {
    transformation
}

and observed that the "most common root cause for bugs that we found was an incorrect safety check". But their paper cites plenty of examples of wrong analysis and wrong transformation as well.
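
To see how a bad safety check turns into a miscompilation, consider a toy example of my own devising, not taken from any real compiler: a peephole pass that wants to rewrite division by a constant power of two into a right shift. The safety check has to insist the operand is unsigned; forget that one test and negative dividends are miscompiled, because -7 / 4 is -1 in C but -7 >> 2 is -2.

#include <stdbool.h>
#include <stdint.h>

/* A toy "IR" node: divide a virtual register by a compile-time constant. */
struct div_op {
    bool    is_signed;   /* signedness of the dividend */
    int64_t divisor;     /* constant divisor           */
};

static bool is_power_of_two(int64_t v)
{
    return v > 0 && (v & (v - 1)) == 0;
}

/* The shape the paper describes: analysis, a safety check, and (if the
   check passes) a transformation. Returns true if the caller may emit a
   right shift by log2(divisor) in place of the division. */
bool can_strength_reduce(const struct div_op *op)
{
    /* analysis */
    if (!is_power_of_two(op->divisor))
        return false;

    /* safety check: only unsigned dividends divide the way they shift.
       Omitting this one test -- the kind of mistake the paper found most
       often -- silently breaks negative dividends. */
    if (op->is_signed)
        return false;

    /* transformation is safe */
    return true;
}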

What's more, when they do misbehave, they do so in different, unrelated ways: "Csmith has yielded no evidence of correlated failures among unrelated C compilers".

Has Csmith been successful? "So far we have reported 79 GCC bugs and 202 LLVM bugs"; "To date, we have reported 325 in total across all tested compilers"; "twenty-five of the GCC bugs were marked by developers as P1: the maximum, release-blocking priority".

Where have the bugs been? "Most of the bugs we found in GCC were in the middle end: the machine-independent optimizers".

Are the bugs that Csmith is finding important? Aside from the GCC developers' own classification of some of the bugs as release-blocking, "bugs that we have found and reported have been independently rediscovered and re-reported by application developers". What has been observed in captivity has also been observed in the wild.

Is Csmith a cost-effective way of finding bugs? "Our rough estimate ... is that each of the more than 325 bugs we reported costs less than $1000 to find". That's remarkably cost-effective when you consider how few hours of a software developer's fully-loaded cost that amount of money would buy.

Whenever I am tempted to think I've run into a compiler bug, I assume I'm really experiencing a senior moment and simply don't understand my own code. Nearly all the time, that's the case. But not always. I have occasionally resorted to scrutinizing assembly output to debug my code, and have rarely encountered what was clearly a compiler misstep. For example, decades ago I caught a compiler reusing a register for an arithmetic calculation then blithely returning to using the register as a pointer to a structure. Wackiness ensued. Try code inspecting or unit testing your way out of that one.

But it's not when a compiler bug causes my software to crash, as it did in that example, that concerns me the most. It's when it doesn't crash. Applying automated bug extermination to our tool chains sounds to me like a good addition to the arsenal in defense of reliable software.
