Saturday, January 03, 2009

Abstraction in C using Sources and Sinks

The problem with telling people that I do embedded software development is that no one really knows what that means. Including me. I've worked on embedded systems that were so resource constrained that they were written completely in assembly code, had no operating system, and in order to fix a bug we had to mine the code base to find instructions we could eliminate in order to make room. I've worked on embedded systems that had a full blown multi-user Linux system as their operating system, and the code base was hundreds of thousands of lines of bleeding-edge C++ that looked like the poster child for the use of templates and dynamic memory allocation.

Somewhere in between there is a sweet spot in which the application is written in C, but could still benefit from an object oriented design and implementation. OO scares a lot of the old school embedded developers (you know who you are). Yet my introduction to the use of those classics of OO - abstraction, encapsulation, and inheritance - was in 1974 when I was programming in assembly and using IBM's OS/360 Input/Ouput Subsystem, which was written in the late 1960s. Many years later when I was studying how C++ worked under the hood, the idea of the virtual pointer and the virtual table seemed quite familiar. I'd seen it all before in the OS/360 IOS. You get as old as I am, you begin to wonder if you're ever going to see anything that's truly innovative.

One of the most useful applications of OO in embedded systems, in my not-so-humble opinion, is that of abstracting out I/O: the ability to write applications that are agnostic as to from where their input comes and to where their output goes. This is exactly what the IOS and its myriad of pointer-filled control blocks (what C programmers would call structures) accomplished.

I used this idea for one of my clients when I was hired to implement a wide assortment of bitmap-graphic processing functions for an existing C-based product whose digital hardware could fit in your shirt pocket. I quickly realized that a lot of the code I had to write - like a deflate-based decompression algorithm, or a decoder for Portable Network Graphics (PNG) files - would be used in many places in the system. I also realized that the data for these algorithms could be buffered in memory, could come streaming in from a serial, Ethernet, or USB ports, or could be stored in a flash-based file system. I badly needed some abstraction for sources and sinks that I could implement on this tiny, non-POSIX platform.

As I implemented it, a Source is an abstract interface that requires the implementation to provide a read byte and a push byte method. The read produces a byte if it is available, End Of File (EOF) if the Source is empty and no byte would ever be available, and End of Data (EOD) if the byte isn't available now but might be in the future. The push pushes a single byte back into the Source where it is guaranteed to be the next byte produced by a subsequent read. (This vastly simplifies any algorithm that can be described by a grammar with one-character look-ahead, for example LL(1) or LALR(1). Such grammars are extremely useful ways of describing any algorithm that does parsing or data transformation or indeed any state machine. Both the deflate and PNG applications mentioned above can be described as grammars, and from there the implementation in code is almost a mechanical process. But that is an article for another day.)

int readSource(Source * that);
int pushSource(Source * that, char data);

A Sink is an abstract interface that requires the implementation to provide a write byte method. The write consumes the byte provided by the caller if it can, returns EOF if the Sink is full and no byte will ever be consumed, and EOD if the byte cannot be consumed now but might be in the future.

int writeSink(Sink * that, char data);

There are also a close method for both Sources and Sinks, which may or may not do anything, depending on the underlying implementation. No open method? Nope, because as we shall soon see the open is part of the implementation, not the abstraction.

int closeSource(Source * that);
int closeSink(Sink * that);

As you can see, these simple byte operations expect a pointer to a Source or a Sink structure (what C++ programmers would call an object). What these pointers actually point to depends on the implementation. But the application doesn't need to know this. It just takes whatever Source or Sink pointer you give it and implements the application (what my younger colleagues would call the business logic) in an I/O agnostic manner.

What might an application look like that uses Sources and Sinks?

Here is a trivial one that just copies all of the data from a Source to a Sink, until either the Source is empty or the Sink is full, without having any idea what the Source and Sink really is (and not doing much in the way of error checking).

size_t copy(Source * source, Sink * Sink) {
size_t total = 0;
int data;

while (!0) {
if ((data = readSource(source)) < 0) {
if (writeSink(sink, data) < 0) {
pushSource(source, data);

return total;

Here's a code snippet of a calling sequence that uses the function above to copy data from a file to an existing socket represented by an open file descriptor. It does so by providing the function with a concrete implementation of its required source and sink.

FileSource file;
DescriptorSink descriptor;
Source * source;
Sink * sink;
size_t total;

source = openFileSource(&file, "/home/coverclock/data");
sink = openDescriptorSink(&descriptor, sock);
total = copy(source, sink);

What kind of Sources and Sinks might one implement?

A BufferSource implements a Source that produces its data from a fixed size memory buffer.

Source * openBufferSource(BufferSource * that, void * buffer, size_t size);

A BufferSink implements a Sink that consumes its data into a similar memory buffer.

Sink * openBufferSink(BufferSink * that, void * buffer, size_t size);

A FileSource produces data from a file in the file system (in Linux this could be just about anything, thanks to the /proc and /sys file systems).

Source * openFileSource(FileSource * that, const char * path);

A DescriptorSink consumes data to any file descriptor (like a TCP/IP socket).

Sink * openDescriptorSink(DescriptorSink * that, int fd);

A NullSink consumes all data without out any complaint and tosses it into the bit bucket.

Sink * openNullSink(NullSink * that);

A CompositeSource concatenates two Sources into a single Source.

Source * openCompositeSource(CompositeSource * that, Source * primary, Source * secondary);

An ExpanderSink writes each data byte to two different Sinks. (Yes, of course you can use another ExpanderSink as one or even both of the Sinks.)

Sink * openExpanderSink(ExpanderSink * that, Sink * primary, Sink * secondary);

A Fletcher8Sink computes a Fletcher 8-bit checksum as the Sink is written and automatically appends the checksum to the Sink when it is closed. (And before you ask, of course there is a Fletcher8Source that automatically verifies the data as it is read from the Source.)

Sink * openFletcher8Sink(Fletcher8Sink * that, Sink * primary);

A RingBuffer implements a circular buffer that exposes both a Source and a Sink interface.

RingBuffer * openRingBuffer(RingBuffer * that, void * buffer, size_t size);
Source * sourceRingBuffer(RingBuffer * that);
Sink * sinkRingBuffer(RingBuffer * that);

How are these Sources and Sinks implemented?

Easily. Most are a few lines of code, a few maybe a page. I'll be writing more about the implementation. But you can find the source code for these and other Sources and Sinks (and their unit tests) now in the
Digital Aggregates Concha distribution. Concha is a clean-room open-source implementation of the Source and Sink design pattern (which my younger colleagues will recognize as a form of dependency injection). It is licensed under the Desperado modified LGPL which allows static linking without any viral licensing implications.

Update (2011-02-21)

I improved the example slightly since the original publication of this article. Apologies as usual for the poor code formatting. (The Blogger editor doesn't respect leading spaces, even those explicitly coded as HTML special sequences, nor does it respect the pre HTML tag.)


Anonymous said...

My C is somewhat rusty, but shouldn't it be

while (1) {
if ((data = readSource(source)) != EOF) { break; }
if (writeSink(sink, data) != EOF) { break; }


Chip Overclock said...

Yes. I'll fix that. That code snippet was actually cut and pasted from some of the unit test code, but I had to edit it on the fly because Blogger (despite what the FAQ says) doesn't seem to deal with the HTML pre tags correctly (the last time I tried it anyway) and the original C code has angle brackets that got interpreted by the Blogger editor, hosing it completely up. Thanks for pointing it out!