Monday, March 16, 2015

Being Evidence-Based Using the sizeof Operator

How long is an int in C?

How does the length of an int compare to that of a long in C++?

These and related questions crop up, I dunno, seems like weekly, in discussion forums related to embedded development on social media sites like LinkedIn.com. What then follows is a lengthy, meandering comment thread full of misinformation, or at least many answers that are only applicable to a very specific toolchain or hardware target, probably the only target the commenter has ever used.

Folks, you don't have to guess at this. You can be evidence-based and find out exactly what the answer is for your particular combination of toolchain and target.

The sizeof operator in C and C++ determines the size in bytes of the variable, array, type, or structure to which it is applied, at compile-time (C99 variable-length arrays, whose size is computed at run-time, being the exception). It can't tell you anything about the dynamic run-time behavior of how your application uses memory. Nor will it give you the results you might expect if the compiler cannot know the size of the argument. But it can tell you all sorts of other useful stuff.
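Here is a minimal sketch of that last caveat. The names table, size_seen_by_owner, size_seen_by_callee, and size_seen_through_call are made up for illustration. Once an array has been passed to a function, the callee holds only a pointer, so sizeof reports the size of the pointer and not of the array.

```c
#include <assert.h>
#include <stddef.h>

/* A hypothetical table; only its element count matters here. */
static int table[16];

/* In the scope where the array is defined, sizeof sees the whole array. */
size_t size_seen_by_owner(void)
{
    return sizeof(table);           /* 16 * sizeof(int) */
}

/* A callee receives only a pointer; the array length did not come along. */
size_t size_seen_by_callee(const int * values)
{
    assert(values != NULL);
    return sizeof(values);          /* sizeof(const int *) */
}

/* Pass the table through a call so the two answers can be compared. */
size_t size_seen_through_call(void)
{
    return size_seen_by_callee(table);
}
```

On a typical 64-bit Linux target the first function would return 64 and the last would return 8, but the point is to run it on your own target rather than trust those numbers.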

For example, given

struct Header {
 struct Header * next;
 size_t size;
};

typedef struct Header header_t;

header_t datum;

header_t data[10];

header_t * that;

you can use expressions like

sizeof(struct Header)

sizeof(header_t)

sizeof(datum)

sizeof(data)

sizeof(header_t*)

sizeof(that)

sizeof(*that)

and even

(sizeof(data)/sizeof(data[0]))

in your C and C++ code to automatically insert appropriate constant values of type size_t instead of coding those values as, say, preprocessor symbols, or hard coding them as integer constants.
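As a sketch, the declarations and expressions above can be gathered into one compilable unit. The countof macro and the print_header_sizes function are names I made up for this example, and the static assertions assume a C11 (or later) compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct Header {
    struct Header * next;
    size_t size;
};

typedef struct Header header_t;

/* Element count of a true array; do not apply this to a pointer. */
#define countof(_ARRAY_) (sizeof(_ARRAY_) / sizeof((_ARRAY_)[0]))

header_t datum;
header_t data[10];
header_t * that = &datum;

/* These relationships hold whatever the actual numbers turn out to be. */
static_assert(sizeof(datum) == sizeof(header_t), "object and type agree");
static_assert(sizeof(*that) == sizeof(struct Header), "dereference yields the struct");
static_assert(countof(data) == 10, "element count does not depend on element size");

/* Print the numbers your particular compiler and target actually chose. */
void print_header_sizes(void)
{
    printf("sizeof(header_t)=%zu\n", sizeof(header_t));
    printf("sizeof(header_t*)=%zu\n", sizeof(header_t*));
    printf("sizeof(datum)=%zu\n", sizeof(datum));
    printf("sizeof(data)=%zu\n", sizeof(data));
    printf("sizeof(that)=%zu\n", sizeof(that));
    printf("sizeof(*that)=%zu\n", sizeof(*that));
    printf("countof(data)=%zu\n", countof(data));
}
```

Calling print_header_sizes() from main shows the sizes for your toolchain. The structure may include padding, so sizeof(header_t) need not equal the sum of the sizes of its fields.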

You don't have to take my word for it. You can code it up, compile it, and run it yourself, and see what happens.

Below is a little C program that I compile and run every time I find myself using an unfamiliar compiler or processor. And sometimes even a familiar compiler or processor, because I am old and have senior moments. This is from my Diminuto library of stuff I have found useful for doing systems programming under Linux/GNU. But you can just write your own and omit all the Diminuto-specific types.

As long time readers of this blog know, I have run similar code on non-Linux systems like VxWorks-based PowerPCs; on Harvard architectures like Arduino and PIC micro-controllers; on chips that had twenty-four bit integers; on an architecture for which a signed integer type was not the same length as an unsigned of the same integer type. It pays to check and not guess. There's a lot of crazy stuff out there.
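One hedged way to keep from guessing: once you know what your code assumes, you can write those assumptions down so that the build fails when they stop being true. This is only a sketch. The particular assumptions listed are hypothetical examples, not a recommended set, bits_in is a made-up helper, and static_assert requires C11 or later.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Storage width in bits of an object that occupies the given number of bytes. */
size_t bits_in(size_t bytes)
{
    return bytes * CHAR_BIT;
}

/* Assumptions a (hypothetical) code base relies upon, checked at build time. */
static_assert(CHAR_BIT == 8, "bytes are assumed to be octets");
static_assert(sizeof(int) == sizeof(unsigned int), "signed and unsigned int differ in size");
static_assert(sizeof(long) >= sizeof(int), "long is narrower than int");
static_assert(sizeof(void *) <= sizeof(unsigned long long), "a pointer does not fit in the widest integer");
```

On a target with, say, twenty-four bit integers, bits_in(sizeof(int)) would tell you so, and any assertion that no longer held would stop the compile instead of surprising you in the field.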

(Note that the Blogger editor may wrap some of the source lines. You can always get the original source code from GitHub.)

/* vi: set ts=4 expandtab shiftwidth=4: */
/**
 * @file
 *
 * Copyright 2014 Digital Aggregates Corporation, Colorado, USA
 * Licensed under the terms in README.h
 * Chip Overclock
 * http://www.diag.com/navigation/downloads/Diminuto.html
 *
 * There's a lot of duplication here, but I'm paranoid that way. Remarkably,
 * I once worked on an embedded project using a proprietary non-GNU C compiler
 * in which the sizeof of signed type was not the same as the sizeof of the
 * unsigned of the same type. I was a little astounded by that. Also, note
 * that GCC lets you take the sizeof a void and of a function type (as opposed
 * to a void or function pointer); this is a GNU extension that yields 1, and
 * is not standard C. It's good to know these things.
 */

#include <stdio.h>
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t, off_t, pid_t */
#include "com/diag/diminuto/diminuto_types.h"

#define printsizeof(_TYPE_) printf("sizeof(%s)=%zu\nsizeof(%s*)=%zu\n", #_TYPE_, sizeof(_TYPE_), #_TYPE_, sizeof(_TYPE_*))

typedef enum Enum { ENUM = 0 } enum_t;

typedef void (function_t)(void);

int main(void)
{
    printsizeof(char);
    printsizeof(signed char);
    printsizeof(unsigned char);
    printsizeof(short);
    printsizeof(signed short);
    printsizeof(unsigned short);
    printsizeof(int);
    printsizeof(signed int);
    printsizeof(unsigned int);
    printsizeof(long);
    printsizeof(signed long);
    printsizeof(unsigned long);
    printsizeof(long long);
    printsizeof(signed long long);
    printsizeof(unsigned long long);
    printsizeof(float);
    printsizeof(double);
    printsizeof(void);
    printsizeof(enum_t);
    printsizeof(function_t);
    printsizeof(int8_t);
    printsizeof(uint8_t);
    printsizeof(int16_t);
    printsizeof(uint16_t);
    printsizeof(int32_t);
    printsizeof(uint32_t);
    printsizeof(int64_t);
    printsizeof(uint64_t);
    printsizeof(intptr_t);
    printsizeof(uintptr_t);
    printsizeof(size_t);
    printsizeof(ssize_t);
    printsizeof(off_t);
    printsizeof(pid_t);
    printsizeof(pthread_t);
    printsizeof(pthread_mutex_t);
    printsizeof(pthread_cond_t);
    printsizeof(diminuto_ticks_t);
    printsizeof(diminuto_ipv4_t);
    printsizeof(diminuto_port_t);
    printsizeof(diminuto_unsigned_t);
    printsizeof(diminuto_signed_t);
    return 0;

}

My current build system is a Dell PC with a four-core 2.4GHz Pentium processor. It's running Ubuntu 14.04, Linux 3.13.0, and gcc 4.8.2. Here is what I get when I run the program there.

sizeof(char)=1
sizeof(char*)=8
sizeof(signed char)=1
sizeof(signed char*)=8
sizeof(unsigned char)=1
sizeof(unsigned char*)=8
sizeof(short)=2
sizeof(short*)=8
sizeof(signed short)=2
sizeof(signed short*)=8
sizeof(unsigned short)=2
sizeof(unsigned short*)=8
sizeof(int)=4
sizeof(int*)=8
sizeof(signed int)=4
sizeof(signed int*)=8
sizeof(unsigned int)=4
sizeof(unsigned int*)=8
sizeof(long)=8
sizeof(long*)=8
sizeof(signed long)=8
sizeof(signed long*)=8
sizeof(unsigned long)=8
sizeof(unsigned long*)=8
sizeof(long long)=8
sizeof(long long*)=8
sizeof(signed long long)=8
sizeof(signed long long*)=8
sizeof(unsigned long long)=8
sizeof(unsigned long long*)=8
sizeof(float)=4
sizeof(float*)=8
sizeof(double)=8
sizeof(double*)=8
sizeof(void)=1
sizeof(void*)=8
sizeof(enum_t)=4
sizeof(enum_t*)=8
sizeof(function_t)=1
sizeof(function_t*)=8
sizeof(int8_t)=1
sizeof(int8_t*)=8
sizeof(uint8_t)=1
sizeof(uint8_t*)=8
sizeof(int16_t)=2
sizeof(int16_t*)=8
sizeof(uint16_t)=2
sizeof(uint16_t*)=8
sizeof(int32_t)=4
sizeof(int32_t*)=8
sizeof(uint32_t)=4
sizeof(uint32_t*)=8
sizeof(int64_t)=8
sizeof(int64_t*)=8
sizeof(uint64_t)=8
sizeof(uint64_t*)=8
sizeof(intptr_t)=8
sizeof(intptr_t*)=8
sizeof(uintptr_t)=8
sizeof(uintptr_t*)=8
sizeof(size_t)=8
sizeof(size_t*)=8
sizeof(ssize_t)=8
sizeof(ssize_t*)=8
sizeof(off_t)=8
sizeof(off_t*)=8
sizeof(pid_t)=4
sizeof(pid_t*)=8
sizeof(pthread_t)=8
sizeof(pthread_t*)=8
sizeof(pthread_mutex_t)=40
sizeof(pthread_mutex_t*)=8
sizeof(pthread_cond_t)=48
sizeof(pthread_cond_t*)=8
sizeof(diminuto_ticks_t)=8
sizeof(diminuto_ticks_t*)=8
sizeof(diminuto_ipv4_t)=4
sizeof(diminuto_ipv4_t*)=8
sizeof(diminuto_port_t)=2
sizeof(diminuto_port_t*)=8
sizeof(diminuto_unsigned_t)=8
sizeof(diminuto_unsigned_t*)=8
sizeof(diminuto_signed_t)=8
sizeof(diminuto_signed_t*)=8

My current reference target is an Nvidia Jetson board with a TK1 four-core ARMv7 processor. It's running Ubuntu 14.04, Linux 3.10.24, and gcc 4.8.2. Here is what I get when I run the program there.

sizeof(char)=1
sizeof(char*)=4
sizeof(signed char)=1
sizeof(signed char*)=4
sizeof(unsigned char)=1
sizeof(unsigned char*)=4
sizeof(short)=2
sizeof(short*)=4
sizeof(signed short)=2
sizeof(signed short*)=4
sizeof(unsigned short)=2
sizeof(unsigned short*)=4
sizeof(int)=4
sizeof(int*)=4
sizeof(signed int)=4
sizeof(signed int*)=4
sizeof(unsigned int)=4
sizeof(unsigned int*)=4
sizeof(long)=4
sizeof(long*)=4
sizeof(signed long)=4
sizeof(signed long*)=4
sizeof(unsigned long)=4
sizeof(unsigned long*)=4
sizeof(long long)=8
sizeof(long long*)=4
sizeof(signed long long)=8
sizeof(signed long long*)=4
sizeof(unsigned long long)=8
sizeof(unsigned long long*)=4
sizeof(float)=4
sizeof(float*)=4
sizeof(double)=8
sizeof(double*)=4
sizeof(void)=1
sizeof(void*)=4
sizeof(enum_t)=4
sizeof(enum_t*)=4
sizeof(function_t)=1
sizeof(function_t*)=4
sizeof(int8_t)=1
sizeof(int8_t*)=4
sizeof(uint8_t)=1
sizeof(uint8_t*)=4
sizeof(int16_t)=2
sizeof(int16_t*)=4
sizeof(uint16_t)=2
sizeof(uint16_t*)=4
sizeof(int32_t)=4
sizeof(int32_t*)=4
sizeof(uint32_t)=4
sizeof(uint32_t*)=4
sizeof(int64_t)=8
sizeof(int64_t*)=4
sizeof(uint64_t)=8
sizeof(uint64_t*)=4
sizeof(intptr_t)=4
sizeof(intptr_t*)=4
sizeof(uintptr_t)=4
sizeof(uintptr_t*)=4
sizeof(size_t)=4
sizeof(size_t*)=4
sizeof(ssize_t)=4
sizeof(ssize_t*)=4
sizeof(off_t)=4
sizeof(off_t*)=4
sizeof(pid_t)=4
sizeof(pid_t*)=4
sizeof(pthread_t)=4
sizeof(pthread_t*)=4
sizeof(pthread_mutex_t)=24
sizeof(pthread_mutex_t*)=4
sizeof(pthread_cond_t)=48
sizeof(pthread_cond_t*)=4
sizeof(diminuto_ticks_t)=8
sizeof(diminuto_ticks_t*)=4
sizeof(diminuto_ipv4_t)=4
sizeof(diminuto_ipv4_t*)=4
sizeof(diminuto_port_t)=2
sizeof(diminuto_port_t*)=4
sizeof(diminuto_unsigned_t)=8
sizeof(diminuto_unsigned_t*)=4
sizeof(diminuto_signed_t)=8
sizeof(diminuto_signed_t*)=4

See? Now was that so hard?

Friday, March 13, 2015

Finding the Door into Summer

Not being much of a fantasy reader, I wasn't a fan of Terry Pratchett's fiction, although I did develop a lot of affection for the man from the times I've seen him on panels and such at science fiction conventions (SF fans being like the ancient Greeks in that our gods walk among us and occasionally meddle in our affairs). Nor am I a huge fan of the late Robert Heinlein, although unlike Pratchett I did read a lot of his books in my youth. But the recent death of the former -- at an age not all that much greater than mine -- got me thinking about the latter.

Heinlein's 1956 novel The Door Into Summer begins with the protagonist renting a rural farmhouse and living with a cat named Petronius, the very same after which my own feline overlord Petronius is named.

Petronius Sloan

Typical of many rural buildings, the house had been added to over the years such that it had an unusual number of doors leading outside. And like many cats, Petronius was constantly demanding to go outside, even in the dead of winter. The cat insisted on exploring all the doors, all the time, and the hero of the story remarked that Petronius was looking for the "door into summer", the door that led to the grassy meadow full of butterflies that it remembered from the prior season. (SPOILER ALERT) So at the end of the book when Petronius dies of old age after a long and happy life, the hero says that his beloved cat finally found that door he had been looking for.

I liked that metaphor very much. When it is my time to have that final stroll with Death, I hope my friends say "Chip finally found the door into summer".

Friday, March 06, 2015

Configuration Files Are Just Another Form of Message Passing (or Maybe Vice Versa)

In 1989, when I was at the National Center for Atmospheric Research or NCAR in Boulder, Colorado, I was working on a spooling system for our output production system. When you have supercomputers (and NCAR had a lot of them, then and now), everything else has to be super too: storage systems, networking, and even your print system. Way back then, NCAR had two gigantic Xerox laser printers, each of which would fill a normal-sized room. It also had several automated film cameras to which we wanted to spool graphical output.

This being the time before digital images could be rendered on demand on a laptop and projected on a screen using equipment you could buy at your local office supply store, scientists wanted to capture the high resolution scientific visualizations that they generated on the big Cray supercomputers on some persistent media. A Dicomed automated film camera would render the visualizations to a CRT screen and the camera would take a photograph of it, all under software control. The system could produce still photographs, movies, and microfiche, in color or black and white. (If you don't recognize any of these technologies, don't sweat it, just go with me here.) So a small team of us set about to develop a distributed graphical spooling system in C to run under SunOS. The result was the Text And Graphics System or TAGS.

TAGS Development Team
The TAGS Development Team (ironic that this old photograph is out of focus).

I was tasked with developing a message passing system for TAGS. After a lot of thought and experimentation, I invented parcels. (Parcels with TAGS, get it?) Parcels were a text-based free-format mechanism that passed messages over stream sockets as a sequence of keyword-value pairs, each of which we might now refer to as a property. The message payload of a parcel might look something like this.

job=1234
user="Chip Overlock"
parameters="align=left title=\"Model A Run 14\""
timestamp=1989-12-01T18:04:22
color=yes

When a parcel was received, it was parsed and loaded into a binary tree that could be searched by keyword, so it was a kind of associative memory. As with the parameters property above, parcels could themselves be values of properties that could be subsequently parsed and loaded into yet another binary tree, so parcels could be nested.
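The original parcel code isn't shown here, so what follows is only a minimal sketch of the idea and not the TAGS implementation: split each keyword=value line at the first equals sign and keep the pairs in a binary tree searchable by keyword. It borrows the POSIX tsearch() and tfind() functions from <search.h> for the tree. The names property_t, property_insert, and property_find, and the fixed field sizes, are assumptions made for illustration. Quoting and nesting are left out.

```c
#define _XOPEN_SOURCE 700   /* for tsearch() and tfind() in <search.h> */

#include <assert.h>
#include <search.h>
#include <stdlib.h>
#include <string.h>

/* One keyword-value pair; the fixed sizes are arbitrary for this sketch. */
typedef struct Property {
    char keyword[32];
    char value[96];
} property_t;

/* Order properties by keyword so the tree can be searched by keyword. */
static int compare(const void * a, const void * b)
{
    return strcmp(((const property_t *)a)->keyword, ((const property_t *)b)->keyword);
}

/* Split "keyword=value" at the first '=' and store it; returns 0 on success. */
int property_insert(void ** rootp, const char * line)
{
    const char * equals = strchr(line, '=');
    property_t * pp;
    void * node;
    size_t length;

    assert(rootp != NULL);
    if (equals == NULL) { return -1; }
    length = (size_t)(equals - line);
    if ((length == 0) || (length >= sizeof(pp->keyword))) { return -1; }
    if (strlen(equals + 1) >= sizeof(pp->value)) { return -1; }

    pp = calloc(1, sizeof(*pp));    /* zeroed, so the keyword stays terminated */
    if (pp == NULL) { return -1; }
    memcpy(pp->keyword, line, length);
    strcpy(pp->value, equals + 1);

    node = tsearch(pp, rootp, compare);
    if (node == NULL) { free(pp); return -1; }
    if (*(property_t **)node != pp) {
        /* Keyword already present: update its value and drop the duplicate. */
        strcpy((*(property_t **)node)->value, pp->value);
        free(pp);
    }
    return 0;
}

/* Return the value stored for a keyword, or NULL if the keyword is absent. */
const char * property_find(void * const * rootp, const char * keyword)
{
    property_t key = { { 0 }, { 0 } };
    void * node;

    if (strlen(keyword) >= sizeof(key.keyword)) { return NULL; }
    strcpy(key.keyword, keyword);
    node = tfind(&key, rootp, compare);
    return (node != NULL) ? (*(property_t **)node)->value : NULL;
}
```

Start with a root of (void *)0, call property_insert(&root, "job=1234") for each line, and then property_find(&root, "job") yields "1234". The sketch never frees the tree. A nested parcel would simply be a value that gets handed to the same functions with a second root.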

Because parcels were text, we could generate messages for testing or other special purposes with our favorite text editor. Parcels could be written to and read from files; that's how TAGS maintained job state persistently across reboots. And best of all, if another developer added some properties to the message that his software sent my software, I didn't have to coordinate with him at all. My software didn't care that there were extra properties in the parcel, as long as the properties my software did care about hadn't changed. It hugely simplified development and deployment.

Yeah, seems kinda quaint now, doesn't it? But parcels predate the ubiquitous use of Extensible Markup Language or XML as a text-based message passing format. Using parcels was a revelation: it was a message passing format, it was a configuration file format, it was a floor polish and a dessert topping.

TAGS ran in production for about eight years until both it and the hardware it served and on which it ran were obsolete. At its peak, TAGS produced a half a million frames of film per month. Output media included 35mm color slides, 105mm microfiche, and ultimately even video tape and monochrome and color paper. My parcel code ended up having more longevity than TAGS itself, going on to be used in other NCAR systems after I left the organization.

Years later, I became quite conversant with the Document Object Model (DOM) and the facilities in Java to parse XML, as well as C and C++ based XML parsing libraries, as I worked on subsequent projects in other organizations that took a similar approach to message passing and configuration files. You probably won't be surprised to learn that XML made me positively nostalgic for my NCAR days; XML had all the good stuff that I liked about parcels.

<job>1234</job>
<user>Chip Overlock</user>
<parameters>
 <align>left</align>
 <title>Model A Run 14</title>
</parameters>
<timestamp>2006-06-15T08:24:00</timestamp>
<color>yes</color>

A couple of years ago I had a gig with Aircell (now Gogo Business Aviation) to help a team develop a satellite communications system for business aircraft. We used JavaScript Object Notation or JSON for many of our configuration files and much of our message passing. It was parcels all over again.

{
 "job": "1234",
 "user": "Chip Overlock",
 "timestamp": "2014-01-28T14:15:04",
 "color": "yes",
 "parameters": {
  "align":  "left",
  "title": "Model A Run 14"
 }
}

We also ended up making some serious use of Asterisk, the open source PBX. The final product, which today you might find in a Gulfstream or Lear business jet, allows a traveller to use a VoIP app on their smart phone to wirelessly place a SIP call that goes to space over the Iridium satellite network, down to a ground station into the telephone system, or even to another plane equipped with the same product. It was international dialing to anywhere in the world, any time, from anywhere, at any altitude. It worked. You could even send a FAX. (Remarkably, billionaires flying in seventy million dollar business jets still insist on using facsimile machines. Go figure.)

My Pelican cases and the flight test aircraft (photo by Doug Young).

Yeah, I was impressed, too. But it's funny what you carry away from projects like that. What I really wanted to use in other projects in the future was Asterisk's configuration file support. Asterisk, and lots of other systems, use a really flexible configuration file format that is sometimes referred to as the INI (for initialization) format. INI files are similar to parcels in that they have properties that are keyword-value pairs, but they allow you to group collections of properties into sections using the square bracket notation as shown below.

job=1234
user=Chip Overlock
timestamp=2015-03-06T11:08:30
color=yes
[parameters]
align=left
title=Model A Run 14
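To make the section notation concrete, here is a rough sketch of how a reader of such a file might track the current section while scanning for a keyword. This is not how Asterisk or Assay does it. The function ini_lookup is a name invented for this example, and comments, quoting, and whitespace trimming are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Scan INI-style text for a keyword within a section and copy its value.
 * Pass "" as the section for properties that precede any [section] line.
 * Returns 0 if found, -1 otherwise.
 */
int ini_lookup(const char * text, const char * section, const char * keyword,
               char * value, size_t size)
{
    char current[64] = "";
    size_t keylen = strlen(keyword);

    assert((value != NULL) && (size > 0));

    while (*text != '\0') {
        size_t length = strcspn(text, "\n");    /* length of this line */
        const char * close = (text[0] == '[') ? memchr(text, ']', length) : NULL;

        if (close != NULL) {
            /* Section header: remember its name (truncated if too long). */
            size_t n = (size_t)(close - text) - 1;
            if (n >= sizeof(current)) { n = sizeof(current) - 1; }
            memcpy(current, text + 1, n);
            current[n] = '\0';
        } else if ((strcmp(current, section) == 0)
                && (length > keylen)
                && (strncmp(text, keyword, keylen) == 0)
                && (text[keylen] == '=')) {
            /* Property in the wanted section: copy what follows the '='. */
            size_t n = length - keylen - 1;
            if (n >= size) { n = size - 1; }
            memcpy(value, text + keylen + 1, n);
            value[n] = '\0';
            return 0;
        }

        text += length;
        if (*text == '\n') { text++; }
    }

    return -1;
}
```

Given the example above as one string, ini_lookup(text, "parameters", "title", value, sizeof(value)) copies "Model A Run 14" into value, while asking the parameters section for job fails because job appears before any section header.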

Recently I found myself (somewhat deliberately, it must be admitted) between gigs with some time on my hands and a lot of unused computer equipment sitting around, so I perused my old-school index card file for project ideas.

I had always wanted to learn how to use Flex and Bison. Flex is an open source reimplementation of Lex, a UNIX tool that generates a lexical scanner based on regular expressions. Bison is similarly an open source reimplementation of Yacc or Yet Another Compiler Compiler, a UNIX tool that generates a shift-reduce parser from an LALR(1) grammar. Over the years I had written a lot of recursive descent parsers and table-driven parsers based on push down automata (PDA), but had never used Lex or Yacc, although I had seen other developers make good use of Flex and Bison (including one on the SATCOM project). I had written my share of grammars that were more or less LL(1), but had only played with LALR(1) grammars as an academic exercise when I was in graduate school.

During another interstitial period, I had added an implementation of red-black balanced binary trees to my Diminuto library, a collection of C code for systems programming under Linux/GNU, portions of which have found their way into many commercial products (including, again, the SATCOM project). I wanted a non-trivial application for the red-black trees in Diminuto beyond the unit-tests that I had written.

And I really wanted a C-based parser for INI files.

Assay is my C-based parser for INI files. The lexical scanner and shift-reduce parser used by Assay are generated by Flex and Bison from an LALR(1) grammar that I developed that describes, as near as I could determine anyway, the INI file format used by Asterisk. You can trivially import the sections and properties from an INI file into a red-black tree which can be efficiently searched. And you can just as trivially export the tree (what Assay refers to as a configuration) into a file, or into any other thing that can be represented as a standard I/O stream (FILE *).

Here's a code snippet from an Assay unit test that, given a path name, creates a new configuration object, imports the contents of an INI file into it, exports the resulting configuration in INI file format to standard output for inspection, and then interrogates the imported configuration. (I've removed all the unit test framework for readability.)

assay_config_t * cfp;
const char * value;
cfp = assay_config_export_stream(assay_config_import_file(assay_config_create(), PATH1), stdout);
value = assay_config_read_string(cfp, "section1", "keyword1");

You don't have to import an INI file to use Assay. You can create configurations programmatically (which is just what the Bison-generated Assay parser is doing as it reads the INI file).

assay_config_t * cfp;
assay_section_t * scp;
assay_property_t * prp;
cfp = assay_config_create();
scp = assay_section_create(cfp, "ALFA");
prp = assay_property_create(scp, "ONE");
assay_property_value_set(prp, "1", strlen("1") + 1);

Here's the funny thing: it wasn't until I was wrapping up the project, slamming what I thought were the last of the unit tests down to the Assay repo on GitHub, and fixing some typos in the Doxygen comments, that it occurred to me, maybe I could use Assay for message passing too, by using a FIFO or a socket that could be represented by a standard I/O stream object.

A half a day later I had new unit tests doing just that.

Here are two code snippets from an Assay unit test, one from a producer process that creates a configuration, loads it from an INI file, sends it across a socket, and then destroys the configuration, and another showing a consumer process that creates a configuration and receives the INI message into it.

assay_config_destroy(assay_config_export_stream_send(assay_config_import_file(assay_config_create(), PATH1), stream = fdopen(pipeline[1], "w")));

cfp = assay_config_import_stream(assay_config_create(), stream = fdopen(pipeline[0], "r"));
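The underlying trick doesn't depend on Assay at all. As a sketch that uses only POSIX and standard I/O calls and none of the Assay API, the code below pushes INI-style text through a pipe wrapped in FILE streams, exactly as it would be written to and read from a file. The names config_send, config_receive, and config_roundtrip are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L     /* for fdopen() and pipe() */

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write INI-style text to any stream, just as one would to a file. */
int config_send(FILE * stream, const char * text)
{
    assert(stream != NULL);
    return (fputs(text, stream) == EOF) ? -1 : fflush(stream);
}

/* Read text from any stream until end of file; returns the bytes stored. */
size_t config_receive(FILE * stream, char * buffer, size_t size)
{
    size_t total;

    assert((stream != NULL) && (buffer != NULL) && (size > 0));
    total = fread(buffer, 1, size - 1, stream);
    buffer[total] = '\0';
    return total;
}

/*
 * Pass a small message through a pipe inside one process. The message must
 * fit in the pipe's kernel buffer because nothing reads until writing is done.
 */
int config_roundtrip(const char * text, char * buffer, size_t size)
{
    int pipeline[2];
    FILE * producer;
    FILE * consumer;
    int rc;

    if (pipe(pipeline) < 0) { return -1; }
    producer = fdopen(pipeline[1], "w");
    consumer = fdopen(pipeline[0], "r");
    if ((producer == NULL) || (consumer == NULL)) {
        if (producer != NULL) { fclose(producer); } else { close(pipeline[1]); }
        if (consumer != NULL) { fclose(consumer); } else { close(pipeline[0]); }
        return -1;
    }

    rc = config_send(producer, text);
    fclose(producer);   /* closing the write end marks the end of the message */
    if (rc == 0) { config_receive(consumer, buffer, size); }
    fclose(consumer);
    return rc;
}
```

In a real producer and consumer the two ends would live in different processes, and the stream could just as well wrap a FIFO or a socket. The code that writes and reads the text would not change.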

So, after more than a quarter of a century of implementing and using stuff just like this, over and over, it finally occurred to me that maybe configuration files are just a special case of message passing. Or possibly vice versa. Whenever I am faced with implementing either a configuration file handler or a message passing sub-system, maybe it would be a really good idea to consider whether those two tasks are really the same problem.

References

J. L. Sloan, Parcels with TAGS, NCAR Technical Note, NCAR/TN-377+IA, National Center for Atmospheric Research, 1992-10, http://opensky.library.ucar.edu/collections/TECH-NOTE-000-000-000-188

Wikipedia, "INI file", http://en.wikipedia.org/wiki/INI_file

Asterisk Project, "Asterisk Configuration Files", https://wiki.asterisk.org/wiki/display/AST/Asterisk+Configuration+Files

VOIP Info, "Asterisk config files", http://www.voip-info.org/wiki/view/Asterisk+config+files

John R. Levine, Tony Mason, Doug Brown, lex & yacc, 2nd ed., O'Reilly, 1995

John R. Levine, flex & bison, O'Reilly, 2009

Tom Niemann, "LEX & YACC TUTORIAL", http://epaperpress.com/lexandyacc/download/LexAndYaccTutorial.pdf

Saumya K. Debray, "Lex and Yacc: A Brisk Tutorial", http://www.cs.arizona.edu/~debray/Teaching/CSc453/DOCS/tutorial-large.pdf

Assay, Digital Aggregates Corporation, 2015, http://www.diag.com/navigation/downloads/Assay.html