Thursday, November 19, 2020

Is the Virtuous Cycle really that virtuous?

In October 2020 I wrote a longish article titled Layers about the layered implementation that emerged organically from the fifteen years (so far) of work I put into my Diminuto systems programming library, which is the underlying platform for much of my C-based work. After thinking more about the "virtuous cycle" I described in that article - where feedback from actual use of Diminuto led to its evolution and expansion, and then to its subsequent use in even more projects (including some shipping products) - I added an afterword. Upon even more reflection, the afterword got longer and longer. Eventually I decided it merited its own short blog article. This is it.

I wonder from time to time whether I'm the only person still working in C. Then I see the TIOBE Index where C is the #1 most used programming language. I find this remarkable, since my years of PDP-11 experience eons ago lead me to still think of C as a kind of structured PDP-11 assembler language.

C places around fifth in other surveys, which seems about right to me. Although I find Rust challenging to work in, I couldn't criticize anyone for using it for systems-level work. And maybe using Go instead of Java for application work. And I like Python too for non-performance-sensitive high level work.

One of the things I find really mysterious is that organizations that work in C, GNU, and Linux don't seem to put any effort into making C easier, simpler, more reliable, and more productive to use. That's exactly what using Diminuto does for me. I'm not trying to talk C shops into using Diminuto. But I am arguing that they should do something like what I do with Diminuto. And they could do a lot worse than looking carefully at the layered implementation approach I talk about in this article.

I believe this is another case of the cognitive bias towards prioritizing the short term over the long term. Putting code that is likely to be commonly useful in other projects into a reusable library, along with unit tests, is a pattern that has worked very well for me. But in organizations larger than my tiny one-person company, it requires that the folks managing the project and holding the purse strings see the long term value in that. For most project managers there is no long term. And certainly nothing worth spending resources - schedule and budget - outside of their immediate project and silo.

It also requires enough experience to know what is likely to be more generally useful in the long term; I haven't always gotten that right myself. And once you have a library that is used in multiple projects, changes and maintenance on that library requires more diligence and incurs a higher price, since it affects multiple code bases. An API change that breaks legacy code requires careful consideration indeed to balance the cost of breakage against the cost of technical debt.

Line managers and developers are typically not incentivized to think long term. Agile processes do not encourage this either. I find it hard to argue against this. It's easy to get side-tracked and take your eye off of the immediate goal. Long term thinking is the job of senior architects and upper management. And as I like to say: if you don't survive in the short term, there is no long term. It's just that in my tiny corporation, I'm both the line developer and the senior architect. I have to pay attention to the tiny details and the big picture.

Balancing short term and long term needs is a hard problem.

Tuesday, October 27, 2020

Layers

 "Either you design a layered architecture, or you'll wish you had."
-- Ken Howard, Ph.D., Bell Labs

When Colorado got hit recently with an early winter storm bringing several inches of snow and 10°F temperatures before Halloween had even arrived, I found myself wearing four layers of clothing while shoveling my driveway. It gave me time to think about the software layers that developed organically in Diminuto, my open source C library and framework that has been the basis of almost all of my personal projects, and some or all of which has found its way into several shipping products. It will become obvious I'm using the term layers here a little differently than you might have seen before when referring to software architecture or network protocol stacks.

History

My work on Diminuto ("tiny") started in earnest in early 2008, with portions of it borrowed from earlier work I did in C++ in 2005, and based on even older C code I wrote in 1994. I began with a physically large and relatively expensive Atmel development board with an ARMv4 processor, a chip that seems laughably underpowered compared to the current ARMv7 and ARMv8 offerings that are in your mobile phone. This board predated the Raspberry Pi (2012), and even the BeagleBoard (2008). Having spent years writing C and C++ code for one real-time operating system (RTOS) or another like VxWorks, I was experimenting with building a minimalist Linux-based system that could run comfortably on the (then) memory-limited ARM processor. My testbed consisted of:

  • a custom Linux kernel configured with only those features I really needed;
  • uClibc, a stripped down C library;
  • BusyBox, which combines a bunch of commonly used Linux/GNU utilities into a single binary executable; and
  • my C-based application.

(You can click on any image to see a larger version.)

Board Setup

Even though I was shooting for a tidy KonMari-style system that would make Marie Kondo proud, I still found it useful to organize some of my code into a library and set of header files. Even then it had occurred to me that I might reuse some of this stuff in another context one day.

Diminuto, and my work that used it, eventually migrated to more capable platforms, like the BeagleBoard.

Cascada

And then to the first Raspberry Pi.


Eventually I found myself using later, even more powerful Raspberry Pis, and that Single Board Computer (SBC), with Diminuto, running on its Debian-based Linux/GNU distro, became my standard deployment platform for a lot of later projects, like Roundhouse, my IPv6 testbed.


I occasionally found myself porting Diminuto to more powerful platforms, like the Nvidia Jetson TK1 development board.

Nvidia Jetson TK1

Diminuto played a key role in even more exotic projects, like building Astrolabe, my own stratum-0 NTP server using a chip-scale cesium atomic clock (CSAC).

Astrolabe (O-2)

Over the span of twelve years, Diminuto continued to grow organically in a kind of positive feedback loop, as the C library and framework became more useful than the tiny embedded Linux system it was originally built for. I would work a gig to develop some commercial product, and encounter a software feature in either the Linux kernel or the GNU library that I thought was useful - like the run-time loadable user-space modules that are used by the Asterisk open source PBX - and I would fold support for that feature, along with unit and functional tests and sometimes command line tools, into Diminuto. Or I would use some or all of Diminuto in some commercial development, find things in my code that didn't work so well, and fix them. And I would continue to leverage that growing base of reusable code in my own personal work.

Today Diminuto amounts to over 56,000 lines of C and bash code distributed amongst almost 350 files, implementing 84 different features and 38 different command line utilities, with 97 unit test programs and 28 functional test programs. It forms the underpinnings for almost all of my Linux/GNU-based personal projects in C or C++, like Hazer, my project that explores technologies like Differential Global Navigation Satellite Systems (DGNSS) and Inertial Measurement Units (IMU).

(To give you another way to think about this evolution, decades ago I started out using the Source Code Control System (SCCS), and then the  Revision Control System (RCS), to manage my source code changes to the precursors to Diminuto. Then I migrated to Concurrent Versions System (CVS). Then to Subversion (SVN). And finally git, with the repository hosted on GitHub. There are pieces of Diminuto that came from that earliest work that have been managed by five different code management systems.)

Different people learn in different ways. I envy my friends and colleagues who can pick up a book on some new technology, read it, and claim to have learned something. (Liars.) I can only learn by doing. I'm a very hands-on student. (Which is why I also have git repositories of Python, Go, and Rust code.)

Working on Diminuto has been fundamental to my ability to learn not just new (to me) and evolving C, Linux, GNU, and POSIX features, but also to experiment with Application Program Interface (API) design. I've learned a lot about porting Diminuto between processor architectures (I use it almost daily on ARMv7, ARMv8, and Intel x86_64 machines), and between different versions of Linux and GNU, and even different operating systems and C libraries; in the past I've ported all or parts of it to uClibc, Cygwin, Android/Bionic, and MacOS/Darwin. It has given me a way to write useful code once, test it thoroughly, and then reuse it, either in my own projects or in products I've helped my clients develop. As much as I like being paid by the hour, there are more fulfilling things to do than to write the same underlying code over and over again.

Even though I have at times ported Diminuto to other platforms, it is not portable, nor is it intended to be. It is specific to Linux and GNU, and to the kinds of work that I am routinely called upon to do. I regression test changes only against relatively recent versions of the kernel and the C library.

As Diminuto evolved incrementally in its virtuous cycle, without really much of a long-term plan on my part, I none the less noticed that it had developed a layered architecture, driven by the needs of whatever I was using it for at the time. Maybe this was an accident, or maybe it's my forty-plus years of experience talking. But in any case, as my old friend and colleague at Bell Labs, Dr. Ken Howard, would surely remark, a layered architecture is a good thing.

Layers

Layer 0: Documentation

Like most software developers, I'm more interested in writing code than I am in writing documentation. But like most software developers of a certain level of experience, I have come to realize that I have to document my work, if not for others, then at least for myself.  Plus, documentation is its own reward: while writing comments I sometimes think: "Hey, wait a minute. That doesn't make any sense at all." It's like a one-person code review. Also, like other developers of a certain age, I've looked at code I wrote months or years ago and wondered: "What the heck does this even do? And WHY?"

Documentation for Diminuto (and for most of my other projects) comes in three flavors:

  • blog articles like the one you are reading now;
  • Wiki-style documentation like that in the Diminuto README; and
  • Doxygen comments embedded directly in the code.

Doxygen is a software tool that takes your source code containing comments written in Doxygen's markup language, and produces (with a little help from other tools) documentation in the form of HTML pages, traditional UNIX manual pages, and PDF files.

Doxygen supports comments for files, functions, types, structures, variables, constants, preprocessor symbols, pretty much anything you can do in C. (Doxygen supports other languages as well.) The PDF reference manual for Diminuto currently runs to 784 pages! Of course, the resulting documentation is only as good as the Doxygen comments from which it is generated.

My approach is to document each Diminuto feature with Doxygen comments in its .h header file - the part of Diminuto that is publicly accessible even in a binary installation - which defines the API for that feature. Doxygen comments may also appear in .c translation units to describe functions that are not part of the public API.

Screen Shot 2020-10-27 at 11.11.29 AM
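
For example, the Doxygen comment for a function declaration might look something like this (the function here is hypothetical, concocted just to illustrate the markup):

#include <stdint.h>

/**
 * Convert a duration in ticks into whole seconds, discarding any
 * fractional part. (This function is hypothetical; it exists only
 * to illustrate the Doxygen comment style.)
 * @param ticks is a duration in ticks.
 * @return the number of whole seconds in the duration.
 */
extern int64_t example_ticks2seconds(int64_t ticks);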

Even if I never refer to the generated documentation, Doxygen provides a canonical comment format for documenting my code. I routinely use the Doxygen comments to remind myself how my own code works.

Layer 1: Types

As much as possible I use the standard pre-defined data types provided by GNU and Linux, like the following:

  • size_t and ssize_t defined in <stddef.h>;
  • fixed-width integer types like int32_t and uint64_t defined in <stdint.h>;
  • bool defined in <stdbool.h>; and
  • pid_t in <sys/types.h>.

But it quickly became clear that it would be a good thing to have some standard Diminuto types that could be used broadly across all feature implementations. The header file

"com/diag/diminuto/diminuto_types.h"

not only automatically includes the standard header files cited above, but also defines some types that are used in Diminuto, like:

  • diminuto_ticks_t that holds the common Diminuto unit of time;
  • diminuto_ipv4_t that holds an IP version 4 address;
  • diminuto_ipv6_t that holds an IP version 6 address; and
  • diminuto_port_t that holds an IP port number.

All the higher layers are based on these types.
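
To give you the flavor, here is a hedged sketch - not the actual header - of the kinds of definitions involved:

#include <stdint.h>

/* A sketch only; consult diminuto_types.h for the real definitions.
 * The IPv6 type in particular is purely illustrative. */
typedef int64_t diminuto_ticks_t;     /* a duration or timestamp in ticks */
typedef uint32_t diminuto_ipv4_t;     /* an IPv4 address */
typedef uint8_t diminuto_ipv6_t[16];  /* an IPv6 address (illustrative) */
typedef uint16_t diminuto_port_t;     /* a TCP or UDP port number */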

(Why the long #include path above for the Diminuto header file? It's to make integration into other code bases easier. See Some Stuff That Has Worked For Me In C and C++.)

Layer 2: Logging

I wanted to write reusable C code that could be part of a command line tool, part of a system daemon, or even part of a loadable module that runs in kernel space. To do that, I had to abstract away the specific mechanism used to display error messages. The Diminuto logging API defined in the header file

"com/diag/diminuto/diminuto_log.h"

does that. It borrows its design from

  • the standard syslog(3) interface;
  • the Linux kernel's printk interface;
  • Android/Bionic's logging mechanism; and
  • even non-C logging facilities like those I've used in Java.

Diminuto defines eight log levels, from highest to lowest severity:

  • emergency;
  • alert;
  • critical;
  • error;
  • warning;
  • notice;
  • information; and
  • debug.

Whether log messages are displayed or not is determined by an eight-bit log mask. The default value of the log mask is 0xfc, which enables all log messages except information and debug. The log mask can be set programmatically or from an environmental variable. As a special case, if the environmental variable, which is called COM_DIAG_DIMINUTO_LOG_MASK, has the value "~0", all log messages are enabled.

So far, this sounds pretty mundane. Here's the interesting part.

  • If the log function call is compiled as part of a kernel module, the Diminuto log functions automatically revert to using the kernel's printk logging mechanism; each severity level is translated to the appropriate printk level; and their emission is controlled as usual by the value of /proc/sys/kernel/printk.
  • If the calling process is one whose parent is the init process a.k.a. process id 1 (this typically means the child process is a daemon that has been inherited by init), or if that process is the session leader (which typically means it has no controlling terminal on which to display a log message), the log function routes the formatted message to the standard syslog(3) API; its severity level is translated to the appropriate syslog(3) severity; and its emission can be controlled via the standard setlogmask(3). For most systems, this means the message will be passed to the syslog daemon (however that may be implemented) and written to a file such as /var/log/syslog.
  • In all other cases, the formatted message is written to standard error.

This vastly simplified the handling of log messages. It made Diminuto code much more easily reusable in a broad variety of contexts. The same function, using Diminuto logging, can be used in a command line utility or in a daemon; either way, output messages (in particular, error messages) will be handled appropriately. Some functions can even be used in the context of a kernel-space device driver. But more importantly, it helps insure that error messages - and, as we will see below, a lot of useful context for debugging - are not lost. Furthermore, the emission of each Diminuto log message is serialized under the hood by a POSIX mutex, insuring that messages from different threads in the same process don't step on each other in the error stream.
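
Using the feature looks something like the hedged sketch below; the macro names are my shorthand from memory, so check diminuto_log.h for the real ones.

#include "com/diag/diminuto/diminuto_log.h"

void example(int rc)
{
    /* Suppressed under the default log mask of 0xfc. */
    DIMINUTO_LOG_DEBUG("example: entering with rc=%d\n", rc);
    if (rc < 0) {
        /* Routed to standard error, syslog(3), or printk, as appropriate. */
        DIMINUTO_LOG_ERROR("example: failed with rc=%d\n", rc);
    }
}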

Layer 3: Errors

The standard function perror(3), defined in <stdio.h>, prints a caller-defined error message on standard error along with a system-defined error message associated with the global variable errno, which is typically set by failing system calls and C library functions. For example, the code snippet

errno = EINVAL;

perror("framistat");

where EINVAL is an integer constant defined in <errno.h>, displays the message

framistat: Invalid argument

on standard error.

The code snippet

errno = EINVAL;

diminuto_perror("framistat");

on the other hand displays

2020-10-27T19:28:02.889470Z "nickel" <EROR> [25416] {7f981766e740} foo.c@3: framistat: "Invalid argument" (22)

on standard error, or logs it to the system log, as appropriate. (It is displayed as one line; the rendering here folds it up.)

Because diminuto_perror goes through the normal Diminuto logging mechanism, the message includes the date and time in UTC with microsecond resolution, the host name of the computer, the severity level, the process identifier of the caller, the thread identifier of the caller, the name of the translation unit, the source line within the translation unit, and even the numeric value of the error. 

The diminuto_perror API is defined in the header file

"com/diag/diminuto/diminuto_log.h"

and is part of the Diminuto log feature.

Layer 4: Assertions

The standard assert(3) macro, defined in <assert.h>, evaluates a caller-specified scalar expression, and if the result of that expression is false, it displays an error message on standard error and ends the execution of the caller using abort(3), defined in <stdlib.h>. A failed assertion looks like this:

foo: foo.c:2: main: Assertion `1 == 2' failed.

Aborted (core dumped)

The diminuto_assert API that is defined in the header file

"com/diag/diminuto/diminuto_assert.h"

does all that too, but it emits the error message using the Diminuto log feature, so all that logging goodness applies here too.

2020-10-27T19:48:15.562081Z "nickel" <EROR> [25564] {7f5ed5040740} foo.c@2: diminuto_assert(1 == 2) FAILED! 0=""

Aborted (core dumped)

(It also prints the value of errno, just in case it's useful to the developer debugging this failure.)

This is a lot more useful than it might first appear. Truth is, most serious errors - like error returns from system calls - hardly ever happen, and if they do, they typically aren't realistically recoverable. A bunch of "error recovery" code often just serves to clutter up the source code, complicate testing and debugging, and add to the cognitive load of the developer. Most of the time, the best thing to do is to dump core and exit, and leave the recovery to higher minds - like the user sitting at the terminal, or maybe systemd. Using diminuto_assert can really clean up your code by reducing both error handling and error messaging to one function call.

Here's a tiny code snippet from the gpstool utility in my Hazer GPS project, which is built on top of Diminuto.

Screen Shot 2020-10-27 at 1.55.00 PM

If the standard malloc(3) function fails to allocate memory, the entire program just goes toes up. Not a lot of drama, just an error message and a core dump, without a lot of other error handling code cluttering up the translation unit.
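
In spirit, the pattern amounts to the following hedged sketch, where buffer_t stands in for whatever type is actually being allocated:

#include "com/diag/diminuto/diminuto_assert.h"
#include <stdlib.h>

typedef struct { char payload[512]; } buffer_t; /* hypothetical type */

buffer_t * allocate_buffer(void)
{
    buffer_t * bp;

    bp = (buffer_t *)malloc(sizeof(*bp));
    diminuto_assert(bp != (buffer_t *)0); /* on failure: log, dump core, exit */

    return bp;
}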

Layer 5: Unit Tests

I've used a lot of different unit test frameworks over the years for C, C++, Java, and Python development. The Diminuto unit tests didn't need a lot of complex support for mocking and what not. I wanted something dirt simple but functional. Most of all, I didn't want users of Diminuto to have to install an entirely different package to support the unit tests. So an almost trivial unit test framework became part of Diminuto itself. (And before you ask: yes, there is a unit test in Diminuto for the Diminuto unit test framework.)

Here is a code snippet that is one simple unit test from the suite that exercises the Diminuto InterProcess Communication (IPC) feature.

Screen Shot 2020-10-27 at 2.23.07 PM

The unit test framework is nothing more than a series of C preprocessor macros. Here you can see that a new test is declared, a mixture of assertions (where failures are fatal) and expectations (where failures are logged and counted) are made, and the outcome of the test is displayed. If the unit test succeeds, the following is displayed.

2020-10-27T20:27:08.354153Z "nickel" <NOTE> [25741] {7fbabe66a000} tst/unittest-ipc.c@237: TEST: test=8

2020-10-27T20:27:08.354193Z "nickel" <NOTE> [25741] {7fbabe66a000} tst/unittest-ipc.c@246: STATUS: test=8 errors=0 total=0 SUCCESS.

The unit test framework API is defined in the header file

"com/diag/diminuto/diminuto_unittest.h"

and makes use of the Diminuto log feature.
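
A complete unit test built on the framework looks roughly like the hedged sketch below; I'm assuming macro names like TEST, ASSERT, EXPECT, STATUS, and EXIT, so check diminuto_unittest.h for the real ones.

#include "com/diag/diminuto/diminuto_unittest.h"

int main(void)
{
    {
        TEST();              /* declare a new test */

        int value = 2 + 2;

        ASSERT(value == 4);  /* a failure here would be fatal */
        EXPECT(value != 5);  /* a failure here would be logged and counted */

        STATUS();            /* display the outcome of this test */
    }

    EXIT();                  /* exit with the cumulative error count */
}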

Layer 6: Wrappers

Most of the features in Diminuto have fairly complex implementations. Some don't. Some are not much more than wrappers around system calls and C library calls. But for something that is used often, a little wrapping can lead to fewer surprises (contrary to the usual case with wrapping).

Consider the code snippet below, from the Hazer gpstool utility. 

Screen Shot 2020-10-27 at 4.14.09 PM

You can see that if the standard fopen(3) call fails, the error leg calls both diminuto_perror, which prints the name of the file that failed to open along with the error string corresponding to the errno set by fopen(3), and diminuto_assert, which core dumps the application.

Compare that to this code snippet, also from the Hazer gpstool utility,

Screen Shot 2020-10-27 at 4.20.16 PM

in which only diminuto_assert is called.

Why the difference? Because the failing function in the second example is itself part of Diminuto. It already called diminuto_perror before it returned to the application. Even just having simple wrappers around frequently used system calls and library calls that automatically print an error message - using the smart Diminuto log-based features - reduces the amount of clutter in the source code of the application.
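
The idea behind such a wrapper can be sketched like this (my_fopen is hypothetical, not the actual Diminuto code, but diminuto_perror is the real Diminuto call described above):

#include "com/diag/diminuto/diminuto_log.h"
#include <stdio.h>

/* A hypothetical thin wrapper: on failure it logs the file name and the
 * string corresponding to errno via the Diminuto log feature, so that
 * callers can reduce their error legs to a single diminuto_assert. */
static FILE * my_fopen(const char * path, const char * mode)
{
    FILE * fp;

    fp = fopen(path, mode);
    if (fp == (FILE *)0) {
        diminuto_perror(path);
    }

    return fp;
}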

Of course, there may be circumstances in which the application should print its own error message with more specific contextual information. But most of the time, the error emitted by the Diminuto code is sufficient.

Layer 7: Models

Why do some Diminuto features have complex implementations? Because they implement a specific model or design pattern of using a particular system call or C library facility. That model may have complex behavior. Typically the model is based on forty-plus years of hard won experience on my part on doing exactly this kind of thing. Or sometimes just on my opinion of the right way to do things. Or on horrifically tragic mistakes I've made in the past.

Here's an example that I already talked about in Timer Threads. Diminuto has a timer feature whose API is defined in

"com/diag/diminuto/diminuto_timers.h"

and which is based on POSIX timers. There is a potential race condition in POSIX timers between the application that starts (arms in POSIX-speak) and stops (disarms) a timer and the callback function that is invoked by the timer. POSIX timer callbacks run in a thread-like context - which is just a man page way of saying it is in fact a POSIX thread, separate from the application thread. When the application stops the timer, the callback may already be running, especially with a periodic timer, and on a multi-core (which these days is virtually any) processor. If the application releases resources - for example,  frees memory, or closes files - that it shares with the timer callback function, that function can have those resources pulled out from under it. Wackiness ensues.

Diminuto incorporates a POSIX mutex and condition in every Diminuto timer. When the application calls the Diminuto function to stop the timer, the function calls the POSIX function to stop the timer (not shown below), enters a critical section, indicates to the timer callback that it is placing it in a disarmed state, then waits for the timer callback to acknowledge it.

Screen Shot 2020-10-19 at 07.43.00

When the Diminuto timer callback (a private proxy function that invokes the user-specified callback) has finished its current business, it enters a critical section, checks to see if it has been disarmed, and if so, acknowledges that fact and signals the waiting application.

Screen Shot 2020-10-21 at 11.24.35 AM

This prevents the race condition. This is some fairly complicated stuff, and it's all transparent to the application.
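
Stripped of everything Diminuto-specific, the handshake looks something like this hedged sketch; the names and the state machine here are illustrative, not the actual Diminuto internals.

#include <pthread.h>

typedef enum { ARMED, DISARMED, IDLE } timer_state_t;

struct timer {
    pthread_mutex_t mutex;
    pthread_cond_t condition;
    timer_state_t state;
};

/* Called by the application; the actual POSIX disarm via timer_settime(2)
 * would already have been done before this handshake. */
void timer_stop(struct timer * tp)
{
    pthread_mutex_lock(&(tp->mutex));
    tp->state = DISARMED;                /* tell the callback */
    while (tp->state != IDLE) {          /* wait for its acknowledgement */
        pthread_cond_wait(&(tp->condition), &(tp->mutex));
    }
    pthread_mutex_unlock(&(tp->mutex));
}

/* Called by the proxy callback once its current business is finished. */
void timer_acknowledge(struct timer * tp)
{
    pthread_mutex_lock(&(tp->mutex));
    if (tp->state == DISARMED) {
        tp->state = IDLE;                /* acknowledge the disarm */
        pthread_cond_broadcast(&(tp->condition));
    }
    pthread_mutex_unlock(&(tp->mutex));
}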

What if you don't want to use that model? Fine. Don't use Diminuto in that particular instance. There's nothing keeping you from coding to the native Linux or GNU API. Hey, knock yourself out. You can still make use of the Diminuto log facility. You can make your own horrifically tragic mistakes. In these here parts, we call that a learning experience.

Layer 8: Functional Tests

Besides unit tests, Diminuto also includes a host of functional tests. These form the last line of defense against some boneheaded blunder on my part. They also provide, along with Diminuto's unit tests and command line tools, some really useful examples (I use them myself) of how to use Diminuto in the real world.

Some of the functional tests use special test fixtures I fabricated just for this purpose, to test hardware-centric features like General Purpose Input/Output (GPIO) pin input and output, serial port reading and writing, and pulse width modulation (PWM).


When I make a change to any of this hardware-related code, or its dependencies, or just when I'm feeling lonely, I get the test fixture out, hook everything up, and run some functional tests that do things like light up LEDs in response to button presses, or adjust the illumination of an LED to achieve some specific value of lux on a light sensor.

The functional tests and command line tools form the outermost layer of the Diminuto framework.

End Matter

Diminuto has been enormously useful to me (and, sometimes, to my clients) for a wide variety of reasons, not the least of which are the lessons I carried away from the layered architecture which emerged organically from its development.

If you want to see non-trivial examples of Diminuto in action, check out the gpstool and rtktool applications that are part of my Hazer GPS/GNSS project. They each make good use of a variety of Diminuto features. gpstool, rtktool, and their Diminuto underpinnings are in constant use, 24x7, as part of my Differential GNSS work.

Supporting Material

Chip Overclock, "Being Evidence-Based Using the sizeof Operator", 2015-03, <https://coverclock.blogspot.com/2015/03/being-evidence-based-using-sizeof.html>

Chip Overclock, "Buried Treasure", 2017-01, <https://coverclock.blogspot.com/2017/01/buried-treasure.html>

Chip Overclock, "Some Stuff That Has Worked For Me In C", 2017-04, <https://coverclock.blogspot.com/2017/04/some-stuff-that-has-worked-for-me-in-c.html>

Chip Overclock, "renameat2(2)", 2018-04, <https://coverclock.blogspot.com/2018/04/renameat22.html>

Chip Overclock, "When The Silicon Meets The Road", 2018-07, <https://coverclock.blogspot.com/2018/07/when-silicon-meets-road.html>

Chip Overclock, "When Learning By Doing Goes To Eleven", 2020-03, <https://coverclock.blogspot.com/2020/03/when-learning-by-doing-goes-to-eleven.html>

Chip Overclock, "Headless", 2020-06, <https://coverclock.blogspot.com/2020/06/headless.html>

Chip Overclock, "Clock Time", 2020-10, <https://coverclock.blogspot.com/2020/10/clock-time.html>

Chip Overclock, "Timer Threads", 2020-10, <https://coverclock.blogspot.com/2020/10/timer-threads.html>

Afterword (updated 2020-11-19)

This afterword got so long I promoted it to its own blog article: Is the Virtuous Cycle really that virtuous?

Wednesday, October 21, 2020

Timer Threads

Way back in 2009, I wrote some code under Linux/GNU to use timers. Back then, it was the setitimer(2) feature, originally from BSD. But that approach has a flaw: real-time clock adjustments, for example via Network Time Protocol (NTP), can jitter the timer period. The newer timer_create(2) et al. POSIX timers can be configured to use a monotonic clock that makes no such adjustment.

POSIX timers can also be configured either to send a kill(2)-type signal, like SIGALRM, just like the BSD timers, or to invoke a callback function. While refactoring my code to use POSIX timers while maintaining the original signal-based API, I added the ability to use timers with callback functions as well. I functionally tested the callback feature by using it to do Pulse Width Modulation (PWM) on a General Purpose Input/Output (GPIO) pin. The functional test ran fine for hours... until it segfaulted! Pointing the GNU Debugger (GDB) at the core dump showed the callback had used a pointer that it shared with the application, and that pointer was NULL.

It only took me a few moments to realize it was a race condition: the callback runs in the context of its own POSIX thread. When the main thread in the application disarms (stops) the timer, then releases resources associated with it, the timer callback can be running, especially on a multicore processor. The timer callback had the resources pulled out from under it. I added a POSIX mutex and condition so that the main thread waits until the callback thread acknowledges that it's being stopped.

Here's the code snippet from the timer stop function in Diminuto, a framework of C code useful for the kinds of work I am typically called upon to do. (You can click on an image to see a larger version.) The stop timer function informs the timer callback that it is being disarmed, and waits for the callback to acknowledge that fact, which it will do by transitioning to the idle state and condition signaling the application. The POSIX mutex and condition objects are embedded inside the Diminuto timer object.

Screen Shot 2020-10-19 at 07.43.00

Here's a code snippet from the callback function. Note that non-periodic (a.k.a. one-shot) timers place themselves in the idle state as soon as their callback completes.

Screen Shot 2020-10-21 at 11.24.35 AM

As soon as I saw the backtrace in GDB, I knew I was going to learn something useful. That made for a good day.

Friday, October 02, 2020

Clock Time

POSIX threads (a.k.a. pthreads) - the standard that defines an API and an execution model for supporting threads in a programming language - are old news. In the past twenty years I've written or helped write three different pthread-based frameworks in C++ for different jobs or gigs, and used several other frameworks that others had written. Today finds me finishing up writing yet another pthread framework (YAPTF), this time in C, for my own use. It will be a part of Diminuto, my C-based systems programming library whose open source is available on GitHub. This is something I should have done a long time ago, instead of hacking little separate bits of pthread code into virtually every non-trivial C project that I wrote.

Here's the thing: it wasn't until I was writing a unit test for this new feature that I finally realized why the pthreads API function

int pthread_cond_timedwait(pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime);

uses the absolute wall clock time (e.g. 2020-10-02T17:38:24.03894879UTC) instead of a relative duration (e.g. 0.50000000s) to specify a timeout (abstime) for the condition that the caller is asking to wait for. Timeouts in other POSIX standard features, like the select(2) system call, are specified as a duration.

It finally dawned on me it was because this allows you to specify a consistent fixed timeout for complex situations that may have to receive multiple pthread signals to satisfy their pthread wait condition.

Here's a code snippet from the unit test. Below, my function diminuto_thread_wait_until calls pthread_cond_timedwait in the pthread library. (Click on the image to see a larger version.)

Screen Shot 2020-10-02 at 11.25.11 AM

The test initializes the thread object (a C struct instance) with a pointer to the function, body7, that implements the thread logic. The test starts the thread with an argument of just a null pointer that is passed to the function. Then the test acquires the current clock time and (effectively) adds one second to it. The test then enters a critical section which locks a pthread mutex associated with the thread. Inside the critical section, if the thread has not yet entered the running state (something the framework changes), the test waits until it receives a pthread signal on the condition associated with the thread from my framework indicating that the thread state has changed. Then it checks again. When the thread is in the running state, the test exits the critical section, unlocking the mutex.

No matter how many times the test waits, the thread state changes, a signal is sent, and the test wakes up to check the state again, the unit test will never wait longer than a total of one second. If more than a total of one second elapses, the wait will return the error code ETIMEDOUT instead of zero, and the test will fail.
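
Here is a hedged sketch of the same idiom in bare pthreads, independent of my framework: compute the absolute deadline once, then wait in a loop until the predicate holds or the deadline passes.

#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Wait for *statep to become non-zero, but never longer than one second
 * in total, no matter how many times the wait wakes up and resumes.
 * Returns 0 on success or ETIMEDOUT on timeout. */
int wait_until_ready(pthread_mutex_t * mutexp, pthread_cond_t * condp, const volatile int * statep)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;                /* the one fixed absolute deadline */

    pthread_mutex_lock(mutexp);
    while ((*statep == 0) && (rc == 0)) {
        rc = pthread_cond_timedwait(condp, mutexp, &deadline);
    }
    pthread_mutex_unlock(mutexp);

    return rc;
}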

This eliminates all the bookkeeping I've had to do in real-time operating systems that have non-POSIX threading models. In those RTOSes, I had to keep track myself of how much time had elapsed, subtract that duration from the total timeout, check that I hadn't used up all the timeout, and if not then wait again. It's not that hard (in fact, Diminuto has features that make this pretty easy), but it does call for more code.

I confess I felt a warm glow inside when I figured this out. There is a joy in understanding a problem that people smarter than you have solved.

Monday, September 21, 2020

Dead Reckoning

 U-blox, the Swiss manufacturer of GPS and GNSS receivers that I've used in a number of projects, has models that incorporate both a multi-band receiver and an Inertial Measurement Unit (IMU). The IMU uses accelerometers and gyroscopes to estimate the movement and rotation of the receiver (and whatever it is embedded in) even in the absence of a usable GNSS signal. This is referred to as dead reckoning. The u-blox NEO-M8U module, for example, uses both the ensemble GNSS fix (if it exists) and the IMU measurements to produce a continuous fusion fix that can be maintained even as the GNSS fix is temporarily lost and then reacquired.

How large is the IMU? It's inside the thumbnail-sized M8U module. Here is an image of the M8U mounted on a tiny red printed circuit board from SparkFun Electronics, which I in turn mounted in a plastic box on a test fixture, along with a multi-band GNSS active antenna.

Note: Click on any image to see a larger version.


How is this possible? The accelerometers and gyroscopes are examples of a MicroElectroMechanical System (MEMS): microscopic electro-mechanical devices with moving parts, manufactured using semiconductor chip fabrication technology.

The IMU isn't perfect. Because the MEMS devices in the module are fairly noisy, the IMU has to do a fair amount of filtering and debouncing of their output. This leads to a noticeable latency of several seconds for the IMU to respond to changes in roll, pitch (road grade), or yaw (vehicle heading). This latency might explain some of the results I present below.

The IMU requires a short period of calibration at start up for it to determine the vehicle reference frame. The automatic calibration requires that the vehicle initially make turns and travel at sufficient speeds, all with adequate GNSS reception. I had to account for this calibration period during my test drives.

I also found that I got the best results when the M8U module was leveled and securely mounted with respect to the vehicle. Ultimately, this led me to mount the module on a gimbaled camera mount (like you might use for a GoPro), and to incorporate a spirit level into the test fixture.

But it's kind of remarkable that it works at all. The controller in the device itself does all of the heavy lifting. My software, running on a laptop or small computer like a Raspberry Pi, merely configures the module using proprietary UBX commands, and then reads the continuous stream of data from the device in the form of standard NMEA sentences and UBX messages. The device generates a position fix once a second, and reports useful information about the fix, like whether GNSS, or the IMU, or (more typically) both, were used in the solution.

Test Fixture

Here is the test fixture mounted on the dashboard of my Subaru WRX during a field test.


The module connects via a USB cable to a laptop running some flavor of Linux/GNU and my open source C-based Hazer software package. The USB cable provides both power to, and two-way communication with, the M8U module. As is typical, the active antenna is powered by the M8U via a voltage bias applied to its coaxial cable.

Here is the computer I used in later test runs: a GPD MicroPC industrial laptop running MATE, an Ubuntu-based version of Linux.


You won't want to do any touch typing on this tiny, almost pocket sized, computer, but it was perfect for this application. (Earlier I used an ancient HP 110 netbook running Mint, another Ubuntu-based Linux. It worked fine too.)

Test Strategy

I had to do some pondering as to how to test this device. The most obvious way was to obscure the antenna with something that would block the signals from the various GNSS satellite constellations. I owned several RF-blocking bags - originally purchased when I was testing cellular data modems - that I could have used for this purpose. I did do some early testing using this technique. But I wanted a more dynamic, organic approach that was in line with the M8U's intended use case in road vehicles.

That's when I remembered I lived in a western suburb of Denver Colorado, just east of the Rocky Mountains. Mountains with lots of highway tunnels. Like the short tunnels built in the 1930s along the United States route 6 road just a few minutes west of where I live. Or the ginormous 2.7km (1.7 mile) Eisenhower and Johnson tunnels built in the 1970s, that carry the Interstate route 70 highway through the Continental Divide about an hour west of my home. This project clearly called for a road trip.

My software can be configured to save the current position fix - latitude, longitude, and altitude, along with a bunch of other data - once a second. The data is saved in human-readable form to a file in Comma Separated Value (CSV) format. This makes it easy to post-process using simple scripts or even Excel. I wrote scripts that converted the CSV data into various files in Keyhole Markup Language (KML), an XML-based data format used to annotate place marks and visualize two- and three-dimensional paths in "earth browsers" like Google Earth.

So far I have done four test drives with my test fixture, through either the US-6 tunnels, or the I-70 tunnels, or (as luck would have it) a route that went through both.

Note: All of the screen shots from Google Earth below follow the usual convention of west to the left, east to the right, north up, and south down. In the U.S. we drive on the right-hand side, so on an east-west road, the upper lanes would be west-bound and the lower lanes would be east-bound. The discussion below assumes that Google Earth is pretty accurate in its mapping of latitude/longitude coordinates, and that it uses the same datum, or definition of latitude/longitude, as GPS: WGS84.

Here is a screen shot of one of my test drives as visualized in Google Earth.

Screen Shot 2020-09-15 at 12.13.25 PM

The red line (a LineString in KML) marks my path, and the yellow push pins (Placemarks in KML) mark the important points like tunnels along the way (different post-processing scripts define the push pins differently). The label on each push pin is effectively elapsed seconds. (It's actually the position fix number, but there is a fix recorded every second.)

The total travel duration for this road trip was about two and a half hours. This route took me west on I-70 through a short tunnel just east of Idaho Springs, then through the Eisenhower Tunnel, turning around at Silverthorne, then back east on I-70 through its peer, the Johnson Tunnel, exiting onto US-6 through a series of smaller tunnels, and finally back home. I began each test drive with enough meandering around my neighborhood that the IMU should have been able to calibrate. (Whether more meandering would have yielded better results is an open question.)

Road Trips

Here is a satellite view of the eastern end of the Eisenhower/Johnson tunnels on I-70. The yellow push pins here mark where there were changes in the position fix - a change in the number of satellites in view, or a total loss of GNSS navigation with a fallback onto the IMU. I'm entering the west-bound Eisenhower Tunnel at the top, and later exiting the east-bound Johnson Tunnel at the bottom.

Screen Shot 2020-09-21 at 8.41.08 AM

When I entered the Eisenhower Tunnel at the top going west,  I quickly lost GNSS reception until it was completely gone and I was relying solely on the IMU at the last of the west-bound push pins. Later, on the return leg, when I exited the Johnson Tunnel at the bottom going east, having turned around, it took a few moments for the module to reacquire GNSS signals at the first east-bound push pin, but then it quickly acquired more satellites by the last east-bound push pin.

A similar story is found at the western end of the Eisenhower/Johnson tunnels, exiting the west-bound Eisenhower Tunnel at the top, entering the east-bound Johnson Tunnel at the bottom.

Screen Shot 2020-09-21 at 8.40.40 AM

It took a few moments to reacquire the fix upon exiting the tunnel when heading west. Going back east on my return leg, I lost the GNSS fix quickly upon entering the tunnel. 

I traversed the Eisenhower and Johnson tunnels on three different test drives, each with slightly different results.

Screen Shot 2020-06-22 at 1.19.49 PM

In screen shot above, at the western end of the tunnels, the purely IMU position fix of my west-bound track drifted so far left (down), it crossed the east-bound track and made it look like I was in the wrong tunnel (which is impossible). When the module reacquired GNSS, it quickly corrected the fix to put me in the correct lane.

Screen Shot 2020-09-03 at 12.54.01 PM

In the screen shot above, it's hard to say whether the IMU position fix drifted, since I don't know exactly the path of the two tunnels through the mountain range, but it seems likely. (There is a pronounced bend in the tunnels at the eastern end, which you can see from the orientation of the eastern entrance relative to the western entrance.)

Screen Shot 2020-09-15 at 12.15.02 PM

Once again, there is an obvious drift in the screen shot above that makes it look like the tunnels merge at the western end (they do not). It's corrected as soon as I get near enough to the tunnel mouth that GNSS is reacquired.

There were even more radical corrections on the far shorter tunnels along US-6. (Updated 2020-10-03) Note, however, that both of these tests shown below were performed before I came up with the more stable test fixture on which I mounted the GNSS module; results after that change were noticeably better. This illustrates the importance of having a secure mounting. (End Update)

Screen Shot 2020-06-15 at 1.44.39 PM

On the western end of the tunnel above, you can see the sudden and abrupt correction made as I exited the tunnel going west-bound. It looks like I was smoking the tires on the WRX as I made a pair of bootlegger turns.

Screen Shot 2020-09-15 at 12.19.16 PM

Above, at another tunnel along a different section of US-6, going east-bound, you can see a similar sudden correction.

But some of the test runs in some of the tunnels looked nearly perfect.

Screen Shot 2020-09-03 at 12.52.54 PM

This tunnel is far shorter than, and not as deeply buried as, the Eisenhower/Johnson tunnels. Going back and examining the original CSV file, I see in this particular case that a GNSS fix was maintained for about half way through the west-bound tunnel, and for about a third of the way through the east-bound tunnel. In both directions, the position fix was then made solely using the IMU until I exited the tunnel.

Parking Garage

Another idea I had for testing the IMU was to drive around inside a four-level parking garage at a shopping mall just a few minutes away from my home. This is what that path looked like after I converted the CSV into KML and imported it into Google Earth.

Screen Shot 2020-09-17 at 10.14.11 AM

Even driving around in the lowest two levels, then parking at the lowest, darkest, level of the garage, where no direct sunlight was visible, I was able to get a GNSS fix with at least six satellites (the minimum necessary is four if you want to also solve for altitude). The M8U used both the GNSS and the IMU for a position fix, but in this case the IMU was continuously able to be checked and if necessary corrected using GNSS.

Screen Shot 2020-09-17 at 10.34.31 AM

Looking at a close up, tilted, view of the parking garage in Google Earth, the path deviates enough that it looks like my WRX flew out of the parking deck and (somehow) back in again in a couple of places. This could be a lack of precision in the position fix of the M8U (GNSS isn't perfect either), an error on the part of Google Earth, or maybe I should audition for the next Fast and Furious movie. If I were to fly out of the parking deck and back in again, the Subaru WRX would be the car I'd use to do it.

Since the iterative least-squares algorithm typically used by GNSS receivers to compute the overdetermined solution is more accurate the more satellites it uses, and since in open sky the M8U is able to use many satellites from several constellations, it is possible that in some circumstances in the garage the IMU could actually have been more accurate than GNSS: when only a very few satellites were visible, and they were close enough together in their orbits that Dilution of Precision became an issue.

Conclusions

The integrated Inertial Measurement Unit in the NEO-M8U added to the accuracy of the position fix when GNSS signals were completely unavailable, but only for a short amount of time. The longer the device went without GNSS, the more the IMU position fix drifted.

The IMU might be quite useful for its intended application with typical road vehicles. I wouldn't use it to navigate in any coal mines. But it works fine over short periods of no satellite reception. And it's remarkable that it works at all.

Tuesday, June 16, 2020

Headless

Here's the problem I had to solve: my Differential Global Navigation Satellite System (GNSS) base station runs in "survey-in" mode for days in order to achieve a positioning resolution down to a few centimeters. This procedure takes so long that the Raspberry Pi 3B+, on which my gpstool software runs to process the output from the u-blox ZED-F9P GNSS module, is powered by an Uninterruptible Power Supply (UPS).
 (Click on any image to see a larger version.)

gpstool has a useful, simple, text-based real-time output display. I'd like to keep an eye on the display as it runs. But the Raspberry Pi base station runs headless - sans display, keyboard, or mouse - tucked inside a narrow drawer near where its antenna is mounted in a skylight. To avoid disrupting this lengthy operation, gpstool runs as a daemon, carefully disassociating itself from any human interface device, and insulating itself from Linux/GNU software signals that might interfere with it. How do I track its progress?

The typical approach in development projects on which I've worked in the past is to log copious text messages to the system log, a service provided by Linux/GNU that saves such messages to a file or files in a protected system directory, which is managed by a privileged syslog process that itself is a daemon. gpstool makes use of this facility. But the rate at which the state of things change in the GNSS module and in gpstool is frequent enough to be a kind of firehose of data to the syslog. It would be a lot more user friendly to carefully ssh into the Raspberry Pi - an action which itself is not without some risk - and use some kind of command line tool to bring up the real-time display, then later discard it and log out, all without interfering with gpstool itself.

This seemed to me to be a common enough problem that instead of merely implementing some specialized solution in Hazer, the git repository home of gpstool and my other GNSS-related software, I should implement it in Diminuto, my git repository containing a general purpose C-based Linux/GNU systems programming library and toolkit. Diminuto underlies Hazer and many of my other projects. 

This article describes what I did and how it works.

Step 1: The Application Programming Interface (API)

The Diminuto observation API provides the following function calls for applications like gpstool. (I'll explain what these library functions do under the hood in a bit.)

FILE * diminuto_observation_create(const char * path, char ** tempp)

The application calls diminuto_observation_create, passing it the path name of an observation file to which it wants to write its real-time display. The library function returns a standard input/output file pointer that the application can use with standard C library calls like fprintf to write its display. The function also provides a pointer to a character string that the application is responsible for providing to subsequent API calls.

FILE * diminuto_observation_commit(FILE * fp, char ** tempp)

When the application is finished with the observation file (all the output for its current display iteration has been written to it), the application calls diminuto_observation_commit with the original file pointer and the original variable containing the pointer to the character string. Once the observation file is committed - and not before - it is visible in the file system to other software, and to humans via the ls command. The library function closes the file pointer and releases the storage associated with the character string, so the contents of the two arguments are no longer useful. The library function returns a null file pointer to indicate success.

FILE * diminuto_observation_discard(FILE * fp, char ** tempp)

Should the application want to discard the current observation file and its contents, it calls diminuto_observation_discard. The observation file is never visible in the file system, and any data contained in it is lost. The file pointer is closed, and the storage associated with the character string is released. The library function returns a null file pointer to indicate success.

FILE * diminuto_observation_checkpoint(FILE * fp, char ** tempp)

Should the application want to keep the current observation file before it commits or discards it - an action that might be stimulated by a human operator doing something like sending the application a SIGHUP or "hangup" software signal (a common idiom in the Linux/GNU world, and one used by gpstool) - the application calls diminuto_observation_checkpoint. A new file appears in the file system that has the name of the original observation file appended with a microsecond-resolution timestamp. This checkpoint file persists in the file system with whatever data was written to the file pointer between the time of the create and the commit or the discard, regardless of when the checkpoint function was called.

Step 2: The Application Task Loop

The Hazer gpstool command line utility implements a task loop in which it reads and processes data from the GNSS module; about once a second, it pauses to update its real-time display.

At the top of the task loop, gpstool calls diminuto_observation_create and gets a pointer to a standard I/O file object. As it processes information from the GNSS module, it writes to this file pointer.

Screen Shot 2020-06-16 at 11.32.54 AM

The contents of the file look something like this once a complete display has been generated. When this file is complete, gpstool calls diminuto_observation_commit and the observation file containing this display is now visible in the file system. Then gpstool loops back to the top of the task loop, calling diminuto_observation_create again.

Should gpstool receive a SIGHUP software signal, it makes a note of this fact, and eventually calls diminuto_observation_checkpoint.

When gpstool exits, it calls diminuto_observation_discard to clean up any uncommitted temporary file that may have existed from its final, partial, iteration of the task loop.
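
Putting the calls together, the application's task loop reduces to something like this hedged sketch; the header file path and the two flags are assumptions on my part for illustration, and the discard call for a final partial iteration is omitted.

#include "com/diag/diminuto/diminuto_observation.h" /* assumed path */
#include <stdio.h>

static volatile int done = 0;    /* hypothetical: set at shutdown */
static volatile int hungup = 0;  /* hypothetical: set by SIGHUP */

void task_loop(void)
{
    char * temp = (char *)0;
    FILE * fp;

    while (!done) {
        fp = diminuto_observation_create("base.out", &temp);
        fprintf(fp, "...one complete display iteration...\n");
        if (hungup) {
            fp = diminuto_observation_checkpoint(fp, &temp);
            hungup = 0;
        }
        fp = diminuto_observation_commit(fp, &temp);
    }
}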

Step 3: The Library Implementation

This is how the Diminuto observation module is implemented.

diminuto_observation_create:
  • Dynamically allocate a character string containing the observation file path name appended with the string "-XXXXXX". The use of the observation file path name is important, as it insures that this character string will name a file that is in the same directory as that of the observation file.
  • Use the standard mkstemp function to create a file using the character string as its name, but automatically replacing the "XXXXXX" with a randomly generated character sequence like "qx03ru" (actual example) that guarantees that the file is unique in the target directory. The standard function returns an open file descriptor (fd) for this new file.
  • Use the standard fdopen function to create an open standard I/O file pointer for this descriptor.
  • Store the pointer to the allocated character string in the provided variable.
  • Return the open file pointer to the temporary file for success.
diminuto_observation_commit:
  • Recover the original observation file path name by truncating the added temporary file suffix from the character string in a second dynamically acquired character string.
  • Close the file pointer using the standard fclose function. This has the desirable side effect of flushing the standard I/O memory buffer to the temporary file.
  • Use the standard rename system call to rename the temporary file to be the observation file. Because the two files are in the same directory, this system call performs this action atomically: the temporary file disappears as if it were deleted, and the observation file appears with the full intact contents of the data written to the temporary file by the application. (rename performs its action atomically provided the source and destination are both in the same file system; being in the same directory is a simple way to insure this.) The rename system call replaces any existing file with the same name, so any prior observation file is deleted from the file system as a side effect. There is never a time when a partially written observation file is visible.
  • Free the character string that contains the temporary file name.
  • Free the observation file path name that was recovered in the first step.
  • Return a null file pointer for success.
diminuto_observation_discard:
  • Close the file pointer using the standard fclose function.
  • Delete the temporary file using the standard unlink system call.
  • Free the character string containing the temporary file name.
  • Return a null file pointer for success.
diminuto_observation_checkpoint:
  • Read the system clock using the Diminuto function diminuto_time_zulu.
  • Create a new file name by truncating the mkstemp suffix from the character string that is the name of the temporary file, and append a new suffix that is a UTC timestamp like "-20200616T161740Z958048" (actual example), in another dynamically acquired character string. Note that this contains the year, month, day, hour, minute, second, and microseconds, in an order that will collate alphabetically in time order.
  • Use the standard function fflush to flush the standard I/O buffer for the temporary file out to the file system.
  • Use the standard link system call to create a hard link - a kind of alias in the file system - between the temporary file and the checkpoint file name. Because the two files are in the same directory, this action is also done atomically: the checkpoint file appears in the file system containing all the data that is in the temporary file. Because it is a hard link, as the application continues to write to the temporary file, the data will also appear in the checkpoint file (which are, in fact, the same file, now known under two names). When the temporary file is either committed or discarded, the checkpoint file and its contents will remain.
  • Free the checkpoint file path name.
  • Return the file pointer to the temporary file for success.
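
The create and commit steps, minus the Diminuto error handling, can be sketched like this; it's a simplified reconstruction of the pattern described above, not the actual library source.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

FILE * observation_create(const char * path, char ** tempp)
{
    FILE * fp = (FILE *)0;
    char * temp;
    int fd;

    temp = (char *)malloc(strlen(path) + sizeof("-XXXXXX"));
    if (temp != (char *)0) {
        strcpy(temp, path);
        strcat(temp, "-XXXXXX");
        fd = mkstemp(temp);              /* unique name in the same directory */
        if (fd >= 0) {
            fp = fdopen(fd, "r+");
            *tempp = temp;               /* caller holds the temporary name */
        }
    }

    return fp;
}

FILE * observation_commit(FILE * fp, char ** tempp)
{
    char * temp = *tempp;
    char * path;

    path = strndup(temp, strlen(temp) - (sizeof("-XXXXXX") - 1));
    fclose(fp);                          /* flushes the standard I/O buffer */
    rename(temp, path);                  /* atomic within one file system */
    free(path);
    free(temp);
    *tempp = (char *)0;

    return (FILE *)0;                    /* a null file pointer means success */
}
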
Step 4: The Observation Script

We now have a mechanism that gpstool, or any other application, can use to create a sequential series of output files. How do these files get displayed?

The Linux kernel has a facility called inotify, which can be used to monitor file system activity and report it to an application. Lots of existing tools use this facility, like the udev mechanism that supports the hot-plugging of peripherals and the automatic attachment of removable media like USB thumb drives. Most Linux distros have a package of user-space utilities, inotify-tools, that provides command line access to this facility.

Diminuto has an observe shell script that calls the utility inotifywait in a loop with the appropriate parameters, so that the script is told when a file with the name of the observation file appears in the observation file directory as a result of a move operation. The implementation of the diminuto_observation_commit function emulates what the mv command does, and so it triggers inotifywait to emit the name of the observation file whenever a commit operation is performed. The observe script captures this name and emits it itself to whoever is running it, then loops to call inotifywait again.

Note that observe doesn't actually display the observation file. It has no idea what the observation file contains, or how an application like gpstool wants it displayed, or to where. It just watches for the file to show up in the file system.

(By the way, the observe script has its own SIGHUP implementation. So while the Hazer gpstool uses diminuto_observation_checkpoint to checkpoint the observation file, the Diminuto observe script provides a similar function.)

Step 5: The Rendering Script

To actually display the observation file, Hazer has a peruse script that includes a lot of Hazer-specific context about which Diminuto knows nothing. The Hazer peruse script merely calls the Diminuto observe script with the path name of where gpstool will create the observation file, based on the gpstool -H (for headless) command line option. It pipes the output of observe into a pipeline that reads the file name when it appears, clears the terminal screen, does some minor pretty-printing post-processing of the contents of the observation file, and copies it to standard output. It does this every time a new observation file by that name is moved to the target directory (even though it replaces an existing observation file).

This separates the processing of the input from the GNSS module from the output of the real-time display. I can ssh into the Raspberry Pi running a long-term survey-in as the Differential GNSS base station, fire up the peruse script, check on its progress, and then control-C out of the peruse script, with no impact to gpstool.

Example

base.csv  base.out                         base.out-ee02oB
base.err  base.out-20200608T190630Z025023  base.pid

This is an actual directory listing from a long term base station survey that's running right now.
  • base.out is the latest committed observation file;
  • base.out-ee02oB is the current temporary file being written, which will replace it once committed;
  • base.out-20200608T190630Z025023 is a checkpointed observation file.
In addition, there are some other files generated by gpstool.
  • base.csv is a dataset of GNSS solutions in Comma Separated Values (CSV) format;
  • base.err is the file to which gpstool is redirecting its standard error output;
  • base.pid contains the process identifier of gpstool, used to send it a SIGHUP signal.
Remarks

The ability to checkpoint observation files is so useful that I use this mechanism even when I'm not doing a long-term survey. Just yesterday, Mrs. Overclock kindly served as my co-driver as we tested the u-blox NEO-M8U, another GNSS module which includes an Inertial Measurement Unit (IMU). The tiny board-mounted module's IMU contains a gyroscope and accelerometers implemented as a Micro-ElectroMechanical System (MEMS). This can be used to approximate the module's location even when the satellite signals cannot be received - like when we drove through a series of highway tunnels on route US6 west of where we live near Denver Colorado.

I wrote a script that combined gpstool using the Diminuto observation capability, with the peruse script, and another script, hups, that sends gpstool a SIGHUP signal any time any key was pressed on the laptop running my software. This made it easy for Mrs. Overclock to capture the real-time gpstool display in a series of timestamped files, for example as we entered a short tunnel about 215 meters in length, and moments later when we exited it.

Screen Shot 2020-06-15 at 1.44.39 PM

Here's a visualization from Google Earth, produced using the Positioning, Navigation, and Timing (PNT) data captured by gpstool about once per second, converted into a Keyhole Markup Language (KML) file by another Hazer script, then imported into Google Earth. (The red continuous visualized path is not a product of the observation and checkpointing mechanism; but that mechanism was used to identify the locations marked by Google Earth with the yellow push-pins.)

The IMU tracked our path from east (right) to west (left) as we went through the tunnel. (You can see remains of the old pre-tunnel road in the satellite imagery too.) As we left the tunnel and the GNSS signals were re-acquired, the NEO-M8U determined that the IMU had our location a little off and corrected it.

I assure you that we didn't do two sudden tire-smoking turns as we exited the tunnel. Although had we done so, my Subaru WRX would have been the vehicle in which to do it.