Monday, September 29, 2014

What You Don't Know Can Hurt You

Below is a little snippet of C code from [Krebbers 2014]. Peruse it and see if you can predict what two values it will print. It's only a handful of lines long. Go ahead, take your time. I'll wait.

#include <stdio.h>
void main() {
    int x;
    int y;
    y = (x = 3) + (x = 4); 
    printf("%d %d\n", x, y); 
}

So let's compile it on my build server that's sitting a few feet away from me. It's a Dell x86_64 system with four 2.4GHz Intel cores running Ubuntu 14.04 with the 3.13 Linux kernel and the GNU 4.8.2 C compiler. It's old but still mighty.

coverclock@ubuntu:~/src/misc$ gcc -o foo foo.c
coverclock@ubuntu:~/src/misc$ 

Good; no warnings, no errors.

coverclock@ubuntu:~/src/misc$ ./foo
4 8
coverclock@ubuntu:~/src/misc$ 

Huh.

This code isn't multi-threaded. It's barely single threaded. In fact, the code snippet is so simple, it hardly qualifies as anything beyond the classic "Hello World!" program.

Here's the thing: you may have gotten completely different results if you used a different compiler. Or a different version of the same compiler. Or maybe even different compiler options for optimization or debugging levels. As Mister Krebbers points out in [Krebbers 2014]:
By considering all possible execution orders, one would naively expect this program to print 4 7 or 3 7, depending on whether the assignment x = 3 or x = 4 is executed first. However, the sequence point restriction does not allow an object to be modified more than once (or being read after being modified) between two sequence points [ISO C, 6.5 p. 2]. A sequence point occurs for example at the end ; of a full expression, before a function call, and after the first operand of the conditional ? : operator [ISO C, Annex C]. Hence, both execution orders lead to a sequence point violation, and are thus illegal. As a result, the execution of this program exhibits undefined behavior, meaning it may do literally anything.
Okay, so maybe not a huge surprise to folks who have memorized the ISO C standard. Or who are tasked with debugging problematic code by occasionally resorting to looking at the assembler code. Using a symbolic JTAG debugger that monitors the program at the hardware level, I've seen the program counter single step backwards in a sequential piece of C code, as the debugger traced the execution path the processor took through the optimized machine code and then tried to correlate it to the original source.

This is why you don't write tricky C code, playing games like trying to smash as much stuff into a single statement as you can. Because it can defy any kind of rational analysis. Because it becomes a debugging nightmare for the developer tasked with current engineering who comes after you. Because its behavior may change with your next compiler update. Or when it's ported to a project using a different compiler suite altogether.

Because it can bite you in the ass.

References

R. Krebbers, "An Operational and Axiomatic Semantics for Non-determinism and Sequence Points in C", 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2014

International Organization for Standardization, ISO/IEC 9899:2011: Programming Languages - C, ISO Working Group 14, 2012

Lambda The Ultimate, "An Operational and Axiomatic Semantics for Non-determinism and Sequence Points in C", September 2014

11 comments:

Paul Moorman said...

Using my very old programming skills, I calculated 4 and 8 the first time. Seemed to be the only logical answer.

Anonymous said...

Anyone who writes code like this should NOT be a programmer.

Anonymous said...

How did you come to 4 and 8?

Chip Overclock said...

A big THANK YOU to embedded pundit and guru Jack Ganssle for citing this article in his popular newsletter.

http://www.ganssle.com/tem/tem270.html

Chip Overclock said...

"How did you come to 4 and 8?"

I didn't. I compiled the code and ran it and that's what it printed. The question is: how did the executable code generated by the compiler come to 4 and 8? And the point is: according to the ISO C standard, it could have printed _anything_. Or nothing. Or core dumped. According to the standard, the results are undefined.

Anonymous said...

Yet another example of code that no human would ever really write to prove a point that 99.9% of programmers really don't care about.

Ok, for the 0.1% of you, I am sorry.

Back to writing real code for real things in the real world.

markm said...

It's not real code for real things in the real world, but it is a simplified version of things that often are in real code. Say,

int *x, *y;
int z;

x = malloc(sizeof(int));
y = x;
/* ... much intervening code ... */
z = fn1(x) + fn2(y);

Maybe neither function changes the referent of *x or *y. Maybe only one of them does. But maybe both of them do, and you have to be aware of all details of the code to notice that this might be a problem.

Anonymous said...

"How did you come to 4 and 8?"

I looked at the .asm file after compiling this with Microsoft C/C++ V18.00. This compiler did all the assignments in the expression first, then did the addition. So, at first x was 3, but then it changed to 4. Then adding 4 + 4 yields 8. Here is the asm:

mov DWORD PTR x$[rsp], 3
mov DWORD PTR x$[rsp], 4
mov eax, DWORD PTR x$[rsp]
mov ecx, DWORD PTR x$[rsp]
add ecx, eax
mov eax, ecx
mov DWORD PTR y$[rsp], eax

Anonymous said...

If we take for example a factorial function written in 1 line of C code, would you also consider it in the same way as the code example in your article?

Factorial can be written in other ways. It might not be a very relevant example. It doesn't have any assignments and should always give the same result. But does the fact that a developer can understand it differently without referring to a standard make it not recommended?

I want to touch on this subject since I think that before going to a C standard to understand some code, humans should understand it first.

int factorial(int n)
{
    return (n == 1 || n == 0) ? 1 : factorial(n - 1) * n;
}
(this is not my code)

Chip Overclock said...

I believe the factorial example you cite is legal C code. The example I give is not legal C code: it violates the standard, although it compiles without so much as a warning. Your factorial example is pretty easily understood and is much like C code I would routinely write.

Unknown said...

Thanks for this tidbit of C lore.

I ran the example through both GCC (v4.8.1) and Clang (v3.3) on my OpenSUSE 13.1 machine.

GCC remains silent and prints 4 8.
It does issue a warning when using -Wall.

Clang gives the following warning

warning: multiple unsequenced modifications to 'x' [-Wunsequenced]

and the program prints 4 7!

But of course you are right in saying that anyone who writes such a thing should consider a career change.