Tuesday, July 31, 2018

We are all relying on GPS all the time whether we realize it or not.

On or around 2200 UTC on January 25, 2016, what should have been a routine system administration task was being undertaken by the folks who manage the U.S. Department of Defense Global Positioning System (GPS) satellite constellation, formerly known as "Navstar". A GPS satellite identified as PRN 32 (its pseudo-random noise code number) or as SV 23 (its satellite vehicle number indicating its launch order) was being decommissioned. Parameters controlling how that satellite corrects from GPS time to Coordinated Universal Time (UTC) were being removed from a table that was sent as a routine update to all of the other GPS satellites.

But there was something special about SV 23. It was the first of the Block IIA generation of GPS satellites, launched on November 26, 1990. By virtue of its age - it was more than twenty-five years old - its SV index placed it in the lowest slot in the table. That tickled what I'm guessing was a Day One bug in the GPS software.

As a result, for nearly thirteen hours and forty-five minutes, until the problem was discovered and corrected, UTC timestamps transmitted by those satellites that had applied the update, and received by those GPS receivers that chose one of those satellites as the source from which to update the tables in their own memory, were off by about 13.7 microseconds.

Doesn't sound like much, does it?

This event was described in the article "The World Economy Runs on GPS. It Needs a Backup Plan" [Paul Tullis, Bloomberg Businessweek, 2018-07-25], and analyzed in detail by scientists at the U.S. Department of Commerce National Institute of Standards and Technology (NIST) in the paper "The effects of the January 2016 UTC offset anomaly on GPS-controlled clocks monitored by NIST" [Jian Yao et al., Proceedings of the 2017 ION Precise Time and Time Interval Meeting, 2017-01-31]. The event tripped an alarm system at NIST designed to detect errors in the official UTC(NIST) time source - NIST being the official source for time and frequency in the United States - by comparing it to GPS; but when the alarm went off, it was GPS that had slipped.

Like all radio navigation systems, GPS relies on calculations with time for navigation. By computing its distance from a GPS satellite with an orbit known with very high accuracy, a GPS receiver knows that it is somewhere on a sphere whose radius is the distance between the satellite and the receiver. This distance is computed by the receiver by knowing the speed of light (with some adjustment for propagation delay in the atmosphere) and subtracting the timestamp received from a satellite from its own timestamp.

By doing this exact same calculation with another satellite, the receiver knows it is on the circle described by the intersection of two spheres. A third satellite reduces this to two points. A fourth satellite brings it to a single point.

But it's not exactly that simple: the clock in your GPS receiver (which, for most of us, is our mobile phone) is nowhere near the same quality as the atomic clocks - cesium or rubidium clocks, rated for space flight - that are in every GPS satellite. And even if it were, it's not synchronized to GPS time, at least not the first time you turn it on.

So this solution for the intersecting spheres isn't a point; it's a volume. But by using a little math and a lot of compute cycles, your receiver can iteratively solve for your position by adjusting its clock in whatever direction makes this volume grow smaller. It continues to iterate, adjusting its clock, until it reaches the resolution of the GPS timestamps, a handful of nanoseconds. And at that point, two wondrous things have happened.

First, through the computation I just described, known as trilateration, your GPS receiver now knows its position to within a few feet, because light travels about a foot in a nanosecond.

Second, the cheap ass clock in your GPS receiver is now synchronized to GPS time. And by continuously recomputing the navigation solution, your receiver can be kept synchronized to GPS time, even as it drifts.

And it isn't just synchronized to GPS time; it is syntonized to GPS time: it is now a frequency source that is approximately in phase with the GPS atomic clocks. This is tantamount to getting every human being in the world to jump up and down at the same time.

But here's the thing: being off by 13.7 microseconds... that's 13700 nanoseconds (actually, it was 13696.03 nanoseconds, as it turns out). That's about a 13700 foot navigation error. That's over two and a half miles.
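
The arithmetic behind that figure can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical, and it uses the exact speed of light rather than the one-foot-per-nanosecond rule of thumb, which brings the error in at roughly 13,470 feet, or about 2.55 miles - still over two and a half miles.

```c
/* Speed of light in vacuum, meters per second (exact by SI definition). */
#define SPEED_OF_LIGHT_MPS 299792458.0
/* International foot, in meters (exact by definition). */
#define METERS_PER_FOOT 0.3048
#define FEET_PER_MILE 5280.0

/* Hypothetical helper: how far light travels during a given clock error. */
double range_error_feet(double clock_error_ns)
{
    double meters = SPEED_OF_LIGHT_MPS * clock_error_ns * 1e-9;
    return meters / METERS_PER_FOOT;
}

/* Hypothetical helper: convert feet to statute miles. */
double feet_to_miles(double feet)
{
    return feet / FEET_PER_MILE;
}
```

Calling range_error_feet(13696.03) and passing the result to feet_to_miles() reproduces the numbers above; calling range_error_feet(1.0) shows that "a foot per nanosecond" is really a bit over 0.98 feet.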

Okay, that's serious.

Well, in this particular case, not really, at least not for those that rely on GPS just for geolocation: the error was in the adjustment from GPS time (which is used for navigation) to UTC time (which is what our clocks care about). The GPS timestamps were unaffected, but the UTC time had slipped. That's why the alarm at NIST went off.
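
To make the distinction concrete, here is a minimal sketch of the GPS-to-UTC adjustment as the public GPS interface specification (IS-GPS-200) describes it: a leap second count plus a small polynomial in time. The struct and function names are hypothetical, the values in the usage note are illustrative rather than the actual broadcast parameters, and the sketch ignores week number truncation and the special handling around leap second transitions.

```c
/* UTC parameters carried in the GPS navigation message.
 * Field names are illustrative, not taken from any library. */
struct utc_params {
    double a0;    /* constant offset term, seconds */
    double a1;    /* drift term, seconds per second */
    int    dt_ls; /* accumulated leap seconds */
    double t_ot;  /* reference time of the parameters, seconds of week */
    int    wn_t;  /* reference week number */
};

/* Offset to subtract from GPS time to obtain UTC, in seconds.
 * tow is the GPS time of week in seconds; wn is the GPS week number. */
double gps_minus_utc(const struct utc_params *p, double tow, int wn)
{
    double elapsed = tow - p->t_ot + 604800.0 * (double)(wn - p->wn_t);
    return (double)p->dt_ls + p->a0 + p->a1 * elapsed;
}
```

With two parameter sets that differ only in a0, say 0.0 and -13.7e-6, the two results differ by 13.7 microseconds while the GPS time of week passed in (the quantity used for ranging) is identical in both calls.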

Imagine what would have hit the fan if the GPS timestamps had been off by that margin.

But GPS isn't just about navigation. Because GPS is usually such a reliable source of precision time and frequency, any system or application that relies on precise time or frequency now typically uses an inexpensive GPS receiver to serve as a time and frequency source. There was a time when the major telecommunication carriers like AT&T had their own atomic clocks used to provide network synchronization. But today, sitting here at the Palatial Overclock Estate, I have five precision time and frequency sources on my home network, in the form of GPS-disciplined clocks, with their GPS antennas sitting in windows, peering at the sky. Commercial atomic clocks cost tens of thousands of dollars; GPS receivers are tens of dollars; GPS-disciplined clocks, hundreds of dollars. Today, GPS is the common source for precision time and frequency.

So what relies on GPS for precision time and frequency, or for geolocation?

Everything you care about.

The internet. The mobile telephone network. Emergency communication systems. Radio and television broadcasters. Electrical utilities. Cable television. The banking system and stock exchanges (because the order of transactions depends on each being labeled with a highly precise timestamp). Aircraft. Advanced weapons systems.

"The U.S. Department of Homeland Security has designated 16 sectors of infrastructure as 'critical', and 14 of them depend on GPS." [Tullis]

It isn't just failures in the GPS system due to bugs or innocent mistakes. GPS can be hacked, spoofed, and trivially jammed. In some cases, with just a couple hundred bucks of commodity parts, a little ingenuity, and some open source software.

There is currently no backup to GPS. The Russians have their own global navigation satellite system - called GLONASS - with global coverage. But it is just as prone to failure, with or without malicious intent, as the U.S. GPS system. There has been talk about reviving Long Range Navigation (LORAN), the post-WWII era ground-based radio system that used hyperbolic navigation, that GPS replaced. LORAN transmitters are defensible, more powerful, more easily repairable, and less easily jammed. But that is at best a long-term (and expensive) solution. And it's not clear to me that it solves the time and frequency distribution need. Folks are working on a sort of GPS firewall that aims to detect when GPS signals are being spoofed. But we are a long way from having that sort of capability in our mobile phones. And some solutions require that you have an atomic clock on site as a backup for when GPS is unavailable. I'm probably the only person you know that has a cesium atomic clock in his living room.

I used to watch the television series The Walking Dead. The other day it occurred to me that its portrayal of an apocalyptic zombie-plague wasn't a bad metaphor for a suddenly GPS-denied world.

Thursday, July 19, 2018

When the Silicon Meets the Road

Test what you fly; fly what you test.
-- NASA aphorism

When you are responsible for maintaining a library of reusable software that contains components that deal with honest to goodness hardware peripherals, most or all of which don't exist on your build server, how do you unit test it? As an embedded developer with decades of experience under my ever lengthening belt, I deal with this quandary often. I've worked on projects that had a vast infrastructure to simulate the underlying hardware platform; maintaining that infrastructure was its own product development effort with its own unit testing issues. I've worked on projects that had part of a sizable test organization dedicated just to testing the software on development lab bench mules that were originally hardware prototypes. I've worked on projects that had no good long term approach. You probably have too.

One of the problems with any approach is the constant economic pressure not to maintain whatever test infrastructure exists once the project gets past one or two product releases and management turns its sights onto the next shiny new revenue producing thing. When it comes time to fix some bugs and do a new release of the legacy product, the new engineer on the project - almost inevitably it is the new engineer on the project, sometimes a junior engineer tasked with proving themselves, in some group that might be euphemistically called something like "sustaining engineering" - is now faced with having to figure out all over again how to test their changes, on software they aren't familiar with.

I had to deal with this issue writ small when working on Diminuto, an open source (LGPL) library of C functions that I have been developing over the span of more than a decade. It is my go-to systems programming toolkit. These days it’s hosted on GitHub as com-diag-diminuto, but it's been around long enough to have been maintained using several different version control systems. It started out in 2008 as a collection of supporting code for a tiny (hence the name) ARMv4 embedded project using Buildroot, with a stripped down Linux kernel, uClibc, BusyBox, and not much else. The project pre-dates the existence of the Raspberry Pi, and even the BeagleBoard.

Over time I kept expanding Diminuto as my needs grew, and porting it to new - and generally less expensive yet more powerful - platforms as they became available, as well as to new Linux releases as they arose. Eventually, all or parts of Diminuto ended up inside a handful of shipping products that I helped my clients develop. While I’ve done time in Python, written hundreds of thousands of lines of embedded C++ using the STL, fielded loads of Bash scripts, and even hacked JavaScript as the need arose, most of my paying gigs continue to be lower level systems work in C, writing device drivers, daemons, utilities, and glue code that holds all the higher level stuff together that's written by application and user interface developers. Diminuto also provides the infrastructure for a bunch of my other personal projects, most of which are also hosted on GitHub.

(Added 2018-07-20) An issue with maintaining software libraries of reusable code intended for systems programming and embedded applications across multiple products is that the library is architecturally divorced from any one specific product. So while you may maintain a legacy product in a test lab for unit and functional testing, it may not be adequate to test all the features in a library because the product doesn't necessarily use all those features. So to really test a new library release, you might have to corral several otherwise unrelated test mules, if that's even possible. And even that might not be sufficient for acceptable test coverage. One reason this doesn't crop up more often - in my experience anyway - is that companies don't seem to see the value in such libraries, unless they come from somewhere else (e.g. open source). I suppose that's because, again, of the cost of maintaining them. That's another reason for Diminuto: I got tired of writing and debugging and testing and documenting the same proprietary closed source code over and over again (although, you know, paid by the hour) for different clients. Or, remarkably, sometimes for the same client, but that's a story for another time. And so, some or all of Diminuto now ships in several different products produced by completely unrelated organizations.

Diminuto has a bunch of modules, what I call features, that are designed to work together.

Some features are not much more than preprocessor macros, for example:
  • Critical Section (scoped POSIX thread mutex serialization);
  • Coherent Section (scoped acquire/release memory barriers);
  • Serialized Section (scoped spin locks); and
  • Uninterruptible Section (scoped blocked signals).
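
As an illustration of what a scoped section built from preprocessor macros can look like, here is a minimal sketch of a mutex-protected critical section. The SKETCH_ macro names are hypothetical and are not Diminuto's actual API; this version also does nothing about thread cancellation, and a return or goto out of the bracketed code would skip the unlock.

```c
#include <pthread.h>

/* Hypothetical macros: lock on entry to the bracketed block, unlock on exit.
 * A break inside the block leaves the inner loop and still unlocks. */
#define SKETCH_CRITICAL_SECTION_BEGIN(mutexp) \
    do { \
        pthread_mutex_t *sketch_mutexp_ = (mutexp); \
        pthread_mutex_lock(sketch_mutexp_); \
        do {

#define SKETCH_CRITICAL_SECTION_END \
        } while (0); \
        pthread_mutex_unlock(sketch_mutexp_); \
    } while (0)

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

/* Example use: increment a shared counter under the mutex. */
long counter_increment(void)
{
    long result;

    SKETCH_CRITICAL_SECTION_BEGIN(&counter_mutex);
        result = ++counter;
    SKETCH_CRITICAL_SECTION_END;

    return result;
}
```

The appeal of the pattern is that the lock and unlock cannot get separated by later edits: whatever goes between BEGIN and END is serialized, and forgetting the END is a compile error rather than a deadlock.
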
Some provide simpler interfaces to inherently complicated stuff:
  • IPC4 and
  • IPC6 (IPv4 and IPv6 sockets); and
  • Mux (multiplexing using the select(2) system call).
Some provide convenience interfaces that combine several operations into a single function call, sometimes atomically to make them thread safe:
  • I2C (I2C bus);
  • Log (logging API that works in kernel space or in user space, and which automatically directs log messages to the system log for daemons, or to standard error for applications);
  • Pin (general purpose I/O using the /sys file system GPIO interface); and
  • Serial (serial port configuration).
Some implement interfaces that are more consistent and mutually interoperable than the native POSIX and Linux capabilities:
  • Frequency,
  • Time,
  • Delay, and
  • Timer (the usual time-related stuff but all using the same units of time).
Some implement functionality that is a bit complex in and of itself:
  • Modulator (software pulse width modulation or PWM using POSIX interval timers);
  • Controller (proportional/integral/derivative or PID controller);
  • Shaper (traffic shaping using a virtual scheduler); and
  • Tree (red-black tree).
Some are just useful to have:
  • Phex, pronounced "fex" (print non-printable characters); and
  • Dump (print a formatted dump of selected memory).
Having this collection of documented, pre-tested, reliable functions that handle the heavy lifting of the POSIX and Linux APIs that I routinely am called upon to use means I can rapidly toss together a usable working piece of C code without having to worry about whether I did the conversion between microseconds and nanoseconds for two different POSIX calls correctly.
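
To show the kind of unit bookkeeping this avoids, here is a small sketch that keeps time in one internal unit and converts only at the POSIX boundary. The sketch_ticks_t type and the function names are hypothetical, the choice of one nanosecond per tick is an assumption made for the example, and negative durations are not handled.

```c
#include <stdint.h>
#include <sys/time.h> /* struct timeval, used by select(2) and friends */
#include <time.h>     /* struct timespec, used by nanosleep(2) and friends */

/* Hypothetical common unit of time: one tick is one nanosecond. */
typedef int64_t sketch_ticks_t;

#define SKETCH_TICKS_PER_SECOND INT64_C(1000000000)
#define SKETCH_TICKS_PER_MICROSECOND INT64_C(1000)

/* Convert non-negative ticks to seconds and nanoseconds. */
struct timespec sketch_ticks_to_timespec(sketch_ticks_t ticks)
{
    struct timespec ts;
    ts.tv_sec = (time_t)(ticks / SKETCH_TICKS_PER_SECOND);
    ts.tv_nsec = (long)(ticks % SKETCH_TICKS_PER_SECOND);
    return ts;
}

/* Convert non-negative ticks to seconds and microseconds (truncating). */
struct timeval sketch_ticks_to_timeval(sketch_ticks_t ticks)
{
    struct timeval tv;
    tv.tv_sec = (time_t)(ticks / SKETCH_TICKS_PER_SECOND);
    tv.tv_usec = (suseconds_t)((ticks % SKETCH_TICKS_PER_SECOND)
                               / SKETCH_TICKS_PER_MICROSECOND);
    return tv;
}

/* Convert a timespec back to ticks. */
sketch_ticks_t sketch_timespec_to_ticks(const struct timespec *ts)
{
    return (sketch_ticks_t)ts->tv_sec * SKETCH_TICKS_PER_SECOND
         + (sketch_ticks_t)ts->tv_nsec;
}
```

Application code then passes ticks everywhere, and the only places that know a timeval counts microseconds while a timespec counts nanoseconds are these few conversion functions.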

Virtually all of the Diminuto features have unit tests that verify the correctness of the implementation. These unit tests are part of the Diminuto repository, and like the library itself, I would expect them to build and run on any modern Linux distribution on any processor architecture.

I'm a big believer in unit tests. But some of these features, like Pin, Serial, Modulator, and Controller, can only really be adequately tested using functional tests on actual peripheral hardware. Over the years, to test these features, I’ve built a series of hardware test fixtures. These range from simple custom wired connectors, to breadboards with ICs on them. When I work on these features, I pull the appropriate fixture out of a box, hook it up to the system under test (typically a Raspberry Pi), and run the functional test.

Why is this important? Because I want to know my generically-useful but hardware-related features work before I (or anyone using my library) deploy them in an actual product development effort. And if for some reason they don’t work in the project, I want to be able to back up and use my functional tests to verify basic sanity, and to help me see where the problem might actually be. Even if the bug turns out to be in my code, the functional tests at least help me determine what does work and indict what doesn’t work. The functional tests also serve as a living example of how I intend the Diminuto features to be used.

Here are a couple of CableMax USB-to-serial adapters, one with a hand-wired loopback, the other with a commercial loopback adapter. These make use of an FTDI USB-to-serial chip, which is pretty much the only brand I use these days for this type of device.

I use these to test the Serial feature and its ability to set baud rate, data bits, and stop bits using the lbktest functional test.
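
For readers who haven't set these parameters by hand, here is a rough sketch of what configuring baud rate, data bits, and stop bits involves using the standard POSIX termios interface. The sketch_ function names are hypothetical and are not the Diminuto Serial API; the sketch assumes no parity and supports only a handful of common baud rates.

```c
#include <termios.h>
#include <unistd.h>

/* Map a numeric baud rate to a termios speed constant; 0 on success. */
int sketch_baud_to_speed(int baud, speed_t *speedp)
{
    switch (baud) {
    case 9600:   *speedp = B9600;   return 0;
    case 19200:  *speedp = B19200;  return 0;
    case 38400:  *speedp = B38400;  return 0;
    case 57600:  *speedp = B57600;  return 0;
    case 115200: *speedp = B115200; return 0;
    default:     return -1;
    }
}

/* Apply the settings to a termios structure in memory; 0 on success. */
int sketch_serial_apply(struct termios *tp, int baud, int databits, int stopbits)
{
    speed_t speed;
    tcflag_t size;

    if (sketch_baud_to_speed(baud, &speed) < 0) { return -1; }
    switch (databits) {
    case 5: size = CS5; break;
    case 6: size = CS6; break;
    case 7: size = CS7; break;
    case 8: size = CS8; break;
    default: return -1;
    }
    if ((stopbits != 1) && (stopbits != 2)) { return -1; }

    if (cfsetispeed(tp, speed) < 0) { return -1; }
    if (cfsetospeed(tp, speed) < 0) { return -1; }
    tp->c_cflag &= ~(tcflag_t)(CSIZE | CSTOPB | PARENB); /* no parity */
    tp->c_cflag |= size | CLOCAL | CREAD;
    if (stopbits == 2) { tp->c_cflag |= CSTOPB; }
    return 0;
}

/* Read the port's current settings, modify them, and write them back. */
int sketch_serial_configure(int fd, int baud, int databits, int stopbits)
{
    struct termios t;

    if (tcgetattr(fd, &t) < 0) { return -1; }
    if (sketch_serial_apply(&t, baud, databits, stopbits) < 0) { return -1; }
    return tcsetattr(fd, TCSANOW, &t);
}
```

A loopback test in this style would open the adapter's device node, call something like sketch_serial_configure(fd, 115200, 8, 1), write a known pattern, and check that the same bytes come back, repeating for each combination of settings.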

Here is a NaviSys GR-701W USB GPS dongle that supports one pulse per second (1PPS) signaling using the data carrier detect (DCD) modem control line.

NaviSys Technology GR-701W

I use it to test the Serial feature DCD support by reading NMEA sentences from the GPS receiver using the dcdtest functional test. As fringe as it may seem, I have several other projects, like com-diag-hazer and its own dependents, that rely specifically on this feature. It pays to verify its functionality before any regressions creep outside of Diminuto proper. Which is really the point of all the functional and unit tests.

Here are two more CableMax USB-to-serial adapters connected back to back with a null modem adapter in between.

I also use these to test the Serial feature, either between computers or even on the same computer using two USB ports, using Diminuto's serialtool utility.

Here is what my latest functional test breadboard looks like.

Diminuto Hardware Test Fixture

It consists of a Uctronics expansion board that connects the breadboard to the Raspberry Pi via a ribbon cable; four LEDs with built in resistors; two momentary contact buttons, one active high, the other active low with a pull up resistor; and an Avago APDS9301 ambient light sensor with an I2C interface. I also bring out the pins to the hardware console port on the Raspberry Pi.

The pintest functional test, based on Diminuto's pintool utility, uses Mux and Pin to read the buttons and to write patterns to the LEDs. Pin, Mux, and the underlying Raspberry Pi GPIO controller support multiplexing GPIO pins using select(2).
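
For context, here is a minimal sketch of waiting on a file descriptor with select(2). The function and type names are hypothetical and this is not how Mux or Pin are implemented. One detail worth knowing: the Linux sysfs GPIO documentation says that a pin's value file, once configured for edge interrupts, has to be placed in the exception set rather than the read set, which is why the sketch lets the caller choose.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

enum sketch_wait_kind { SKETCH_WAIT_READABLE, SKETCH_WAIT_EXCEPTION };

/* Wait up to timeout_ms (non-negative) for fd to become ready.
 * Returns >0 if ready, 0 on timeout, <0 on error. */
int sketch_wait(int fd, enum sketch_wait_kind kind, long timeout_ms)
{
    fd_set fds;
    struct timeval tv;

    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (kind == SKETCH_WAIT_EXCEPTION) {
        return select(fd + 1, NULL, NULL, &fds, &tv); /* e.g. sysfs GPIO edge */
    }
    return select(fd + 1, &fds, NULL, NULL, &tv);     /* e.g. socket, pipe */
}

/* Re-read a sysfs-style value file from its start.
 * Returns the first character (such as '0' or '1'), or -1 on error. */
int sketch_read_value(int fd)
{
    char c;

    if (lseek(fd, 0, SEEK_SET) < 0) { return -1; }
    if (read(fd, &c, 1) != 1) { return -1; }
    return c;
}
```

A button-reading loop built this way would call sketch_wait() with SKETCH_WAIT_EXCEPTION on the pin's value file and then sketch_read_value() to fetch the new state, so the process sleeps until an edge arrives instead of polling.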

The pwmrheostat functional test uses Pin and Modulator to control as many as four LEDs concurrently, all at different brightness levels.
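
The core of software PWM is splitting a fixed period into an on time and an off time according to the duty cycle, then toggling the pin each time a timer expires. The following is only a sketch of that split, with hypothetical names and an assumed 8-bit duty cycle (0 is always off, 255 is always on); it is not the Modulator implementation.

```c
#include <stdint.h>

/* On and off durations for one PWM period, in nanoseconds. */
struct sketch_pwm {
    int64_t on_ns;
    int64_t off_ns;
};

/* Split a period into on and off times for a duty cycle of 0..255. */
struct sketch_pwm sketch_pwm_split(int64_t period_ns, uint8_t duty)
{
    struct sketch_pwm p;

    p.on_ns = (period_ns * (int64_t)duty) / 255;
    p.off_ns = period_ns - p.on_ns;
    return p;
}
```

Driving four LEDs at different brightness levels then amounts to keeping four of these splits, one per pin, each with its own duty cycle.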

The luxrheostat functional test uses Pin, Mux, Modulator, and I2C to control a single LED pointed at the ambient light sensor, turning the LED brightness up and down, while reading the light level.

The luxcontroller (formerly pidtest) functional test uses Modulator, Mux, Pin, and I2C to control the brightness of an LED and read its intensity on an ambient light sensor, and Controller to maintain a specified light level as the background illumination changes.
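
For anyone unfamiliar with PID control, the loop being closed here can be sketched as a textbook controller: measure, compare against the target, and compute a correction from the error, its running sum, and its rate of change. The names below are hypothetical and this is not the Controller feature's code; a practical version would also clamp the output to the valid duty cycle range and limit the integral term.

```c
/* State for a textbook PID controller. */
struct sketch_pid {
    double kp;             /* proportional gain */
    double ki;             /* integral gain */
    double kd;             /* derivative gain */
    double integral;       /* accumulated error over time */
    double previous_error; /* error from the prior step */
};

void sketch_pid_init(struct sketch_pid *c, double kp, double ki, double kd)
{
    c->kp = kp;
    c->ki = ki;
    c->kd = kd;
    c->integral = 0.0;
    c->previous_error = 0.0;
}

/* One control step: dt is the time since the prior step and must be > 0.
 * Returns the correction to apply (for example, to a PWM duty cycle). */
double sketch_pid_step(struct sketch_pid *c, double target, double measured,
                       double dt)
{
    double error = target - measured;
    double derivative = (error - c->previous_error) / dt;

    c->integral += error * dt;
    c->previous_error = error;

    return (c->kp * error) + (c->ki * c->integral) + (c->kd * derivative);
}
```

In a test like the one described, measured would be the reading from the light sensor, target the desired light level, and the return value would nudge the LED's duty cycle up or down on each pass through the loop.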

(I am in the process of adding a TI ADS1115 analog to digital converter or ADC, also with an I2C interface, to this breadboard to extend the testing of the Modulator software PWM feature and the Controller PID feature. I have a first cut at the adcrheostat and adccontroller functional tests.)

Here is an FTDI TTL-232R-3V3 USB-to-serial adapter that works with 3.3 volt logic-level signals, with logic clips at the end of the signal wires.

I use this to further test the Serial feature using the Raspberry Pi hardware console port that is accessible via the breadboard.

These test fixtures are pretty simple stuff. But they allow me to exercise the hardware-related features of Diminuto such that I have high confidence that I haven't done something boneheaded. When I'm not using them, these fixtures go back into the box and go up on the shelf.

When I'm thinking of adding another hardware-centric feature to Diminuto, one of the first things I decide is how I am going to test it: if I can do so with an existing fixture; if I need to add to an existing fixture like the breadboard; if I need to build a new fixture; and how much that's going to cost. Sometimes I decide the cost isn't worth it, and forgo developing the feature in Diminuto at all, leaving it to a future client to pay for and maintain if it's needed. Or maybe I just wait for technology to catch up.

Test what you ship; ship what you test.
-- Chip Overclock aphorism

Updated 2018-07-25

Here's a photograph of the breadboard after I added the ADC. It's getting complicated enough that I probably should actually generate an official schematic just for my own records if nothing else.

The adcrheostat and adccontroller functional tests now use Pin and Modulator to do PWM on a GPIO pin, and I2C to read the voltage on the pin from the ADC (the tiny blue board). The latter test uses Controller to implement a PID controller that strives to maintain a specified voltage. These tests are a minor variation on luxrheostat and luxcontroller, which instead use an LED and an ambient light sensor (the tiny red board).