Saturday, May 03, 2008

Generic Embedded Programming Using the C Preprocessor

In the article Generic Embedded Programming Using Templates, I described how C++ templates could be used for generic programming in embedded applications. Generic programming represents a higher level of abstraction in which you express an algorithm without regard to the data types on which it operates. Experienced C developers may find that templates remind them of the C preprocessor. This is apropos, because in this article I will describe one way to use the C preprocessor to do a simple but useful form of generic programming in C.

Like C++ templates, the C preprocessor provides a powerful form of software reuse known as code generation. The symbolic substitution and macro capability provided by the C preprocessor, along with the native capabilities of the C compiler itself, lend themselves to a simple form of generic programming that can ease the burden of software maintenance. One of the headaches of using the standard header files available in C and POSIX is the wealth of integer type definitions that hide what the actual data type is. For example, size_t is returned by the standard I/O functions fread and fwrite. What is the range of size_t? Is it signed or unsigned? What about ssize_t, which is returned by the recv socket system call? How about pid_t, which is returned by the getpid system call?

Obviously, a few minutes perusing the hundreds of files under /usr/include (made simpler using a tool like cscope) will eventually lead you to discover that size_t is unsigned and its counterpart ssize_t is signed. But on some systems size_t is thirty-two bits wide, on others sixty-four. And on one system to which I've ported C code, it was a sixty-bit-wide floating point data type, because that was the only data type that architecture had.

So what happens if you want to write portable code that assigns the minimum or maximum possible values to a variable declared as either of these two data types? You may be tempted to use the symbols MININT and MAXINT that are defined in /usr/include/values.h for the ssize_t variable, but is this safe? What if the code is moved to another machine where ssize_t isn't the same width as an int? What about all those integer data types defined in the third-party library you are porting? What if the next upgrade changes one of those type definitions you are using in your application? It's enough to make you want to turn to Java, which has a fixed set of integer data types that are the same on all platforms.

But there's a simple way to write type-independent code in which you generate the correct values for the integer data type automatically, provided the architecture implements true two's-complement binary integers.

For size_t, or any other unsigned type, the minimum possible value is, of course, 0. So the minimum of an unsigned integer is easily defined. Let's suppose we define a simple macro to generate that value.

#define minuint(TYPE) ((TYPE)0)

This macro just casts the value zero to the appropriate type. It can be used just like this.

size_t min_size_t = minuint(size_t);

The maximum possible value of any unsigned type has all of its bits set. This is just the value ~0. So this is our next macro, along with an example of its use.

#define maxuint(TYPE) ((TYPE)(~minuint(TYPE)))

size_t max_size_t = maxuint(size_t);

By now you may suspect that we can do the same for signed types. And so we shall. The minimum value of any signed number is the largest (numerically smallest) negative number. In two's-complement arithmetic, this number has its most significant bit set and all of its other bits zero. Using the C compile-time operator sizeof, which returns the number of bytes in a variable or a type, this is easily computed.

#define minsint(TYPE) ((TYPE)(((TYPE)1)<<((sizeof(TYPE)*8)-1)))

ssize_t min_size_t = minsint(ssize_t);

Get it? The sizeof gives us the size in bytes of the argument TYPE. Multiplying that value by eight gives us the number of bits. Subtracting one gives us the bit position (relative to zero) of the top bit, and shifting the 1 (cast to the correct type) into that position gives us the largest (numerically smallest) negative number.

It may come as no surprise by now that the largest signed number is again the bit-wise inversion of the smallest signed number, just as it was for unsigned numbers.

#define maxsint(TYPE) ((TYPE)(~minsint(TYPE)))

ssize_t max_size_t = maxsint(ssize_t);

So there you have it.

What's that? Yes, that's right, with these macros you have to know whether the data type is signed or unsigned. What? Really, must I do everything myself? Oh, very well; how about these macros, which will work with either signed or unsigned integer types?

#define minint(TYPE) ((maxuint(TYPE)>0)?minuint(TYPE):minsint(TYPE))

#define maxint(TYPE) ((maxuint(TYPE)>0)?maxuint(TYPE):maxsint(TYPE))

See the trick? If the integer type is signed, the value generated by the maxuint macro is interpreted as a -1. If the integer type is unsigned, the value is interpreted as the largest possible number for that data type, which is always positive. The conditional expression automatically chooses the appropriate form of the minimum or maximum macros.

And here's an even better trick. All of this is done at compile time. The preprocessor expands the macros to their equivalent C expressions and submits the generated code to the C compiler. Since none of the values in the resulting expressions are run-time variables, the C compiler reduces the initializers to their equivalent integer constants. If someone changes the underlying integer type of some type definition, the C compiler will automatically generate the correct minimum and maximum values the next time your code is built. Expanding on the trick above, you can even check whether the data type is signed or unsigned.

Here's a code snippet that demonstrates the macros using a variety of integer data types.

size_t minsize = minint(size_t);
size_t maxsize = maxint(size_t);
ssize_t minssize = minint(ssize_t);
ssize_t maxssize = maxint(ssize_t);
pid_t minpid = minint(pid_t);
pid_t maxpid = maxint(pid_t);
uint64_t minuint64 = minint(uint64_t);
uint64_t maxuint64 = maxint(uint64_t);
int64_t minint64 = minint(int64_t);
int64_t maxint64 = maxint(int64_t);
short minshort = minint(short);
short maxshort = maxint(short);
unsigned short minushort = minint(unsigned short);
unsigned short maxushort = maxint(unsigned short);

printf("minsize 0x%08x %u\n", minsize, minsize);
printf("maxsize 0x%08x %u\n", maxsize, maxsize);
printf("minssize 0x%08x %d\n", minssize, minssize);
printf("maxssize 0x%08x %d\n", maxssize, maxssize);
printf("minpid 0x%08x %d\n", minpid, minpid);
printf("maxpid 0x%08x %d\n", maxpid, maxpid);
printf("minuint64 0x%016llx %llu\n", minuint64, minuint64);
printf("maxuint64 0x%016llx %llu\n", maxuint64, maxuint64);
printf("minint64 0x%016llx %lld\n", minint64, minint64);
printf("maxint64 0x%016llx %lld\n", maxint64, maxint64);
printf("minshort 0x%04x %d\n", minshort, minshort);
printf("maxshort 0x%04x %d\n", maxshort, maxshort);
printf("minushort 0x%04x %d\n", minushort, minushort);
printf("maxushort 0x%04x %d\n", maxushort, maxushort);

Here are the results when you run it. (This code was tested on a P4 Linux system and a P4 Cygwin system.)

minsize 0x00000000 0
maxsize 0xffffffff 4294967295
minssize 0x80000000 -2147483648
maxssize 0x7fffffff 2147483647
minpid 0x80000000 -2147483648
maxpid 0x7fffffff 2147483647
minuint64 0x0000000000000000 0
maxuint64 0xffffffffffffffff 18446744073709551615
minint64 0x8000000000000000 -9223372036854775808
maxint64 0x7fffffffffffffff 9223372036854775807
minshort 0xffff8000 -32768
maxshort 0x7fff 32767
minushort 0x0000 0
maxushort 0xffff 65535

This technique, as simple as it is, can make your C code more portable, more readable by eliminating some magic numbers, and more easily maintainable.

Update (2008-05-04)

When I write production code I have a habit of cranking the warning level of the GNU C compiler up to the max. This causes the compiler to object to checking for unsigned values being negative. To circumvent this in the versions of these macros I wrote for the Desperado C++ library, I reversed the check in the equivalent of the maxint and minint macros. I made this change in the macros above from their form in my original article.

Update (2008-12-03)

Jeremy Overesch identified a bug in the code above which I have corrected in this version of the article. The far more complicated Desperado C++ code, from which these macros were derived, had it right. In my effort to make the code readable, I broke it. And since I didn't port the full unit test from Desperado, I missed the bug. Shame on me, and a big Thank You to Jeremy.

Update (2009-03-16)

The Blogger editor seems intent on hosing up the source code in this article, regardless of how it looks in the preview. I apologize in advance for any formatting weirdness that I miss.


Jeremy Overesch said...

I came across your post when I was trying to find a way to determine if a variable is signed or unsigned in the preprocessor. Oddly enough, your Max/Min defines were my end goal.

I, however, have found a slight error in your code. Under most circumstances, it works great. However, when I attempted to get the size of a short, or any type smaller than the computer's native word size, I would get warnings and the comparison for signed/unsigned would not work properly. Adding (TYPE) in front of the inversion (~) on the max defines fixed this issue.

Chip Overclock said...

That'll show me not to second-guess myself. I adapted this article from the Desperado generics.h header file (which I also wrote), which has versions of these macros with so much casting (and so many parentheses) that they are unreadable. But they do have a unit test that shows that those versions work. I tried to simplify the macros for this article, and botched it. Thanks!