Saturday, April 04, 2020

Meet the new bug. Same as the old bug.

The Guardian in the U.K. reports on an FAA Airworthiness Directive that requires Boeing 787 aircraft be power cycled every fifty-one days to prevent "several potentially catastrophic failure scenarios".

From the AD:
The FAA is adopting a new airworthiness directive (AD) for all The Boeing Company Model 787–8, 787–9, and 787–10 airplanes. This AD requires repetitive cycling of the airplane electrical power. This AD was prompted by a report that the stale-data monitoring function of the common core system (CCS) may be lost when continuously powered on for 51 days. This could lead to undetected or unannunciated loss of common data network (CDN) message age validation, combined with a CDN switch failure. The FAA is issuing this AD to address the unsafe condition on these products.  
My friend and colleague Doug Young, who knows a thing or two about aircraft and avionics, brought this to my attention. I find fifty-one to be an interesting number, because unlike a lot of this class of bugs, it doesn't map quite so obviously into some kind of power-of-two/number-of-bits/frequency issue.

I tried to talk Doug into it being related to the number of bits (nineteen) in the data field of an ARINC 429 message (A429 being a common avionics bus that he and I have worked with), but even I thought it was a bit of a stretch, since it would require some system clock maintaining a frequency of 10000 ticks per day, causing the data field to overflow after 52.4287 days.

Both Doug and I independently arrived at the possibility of a uint32_t value in units of milliseconds, but that overflows after 49.71 days, a discrepancy that I find makes it unlikely.
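
Just to put numbers on it, here is a back-of-the-envelope sketch in C of the two rollover calculations (my own arithmetic, not anything from the AD itself).

/* Back-of-the-envelope rollover arithmetic; round numbers only. */
#include <stdio.h>

int main(void)
{
    /* An unsigned 32-bit counter of milliseconds wraps after 2^32 ms. */
    double uint32ms = 4294967296.0 / (1000.0 * 60.0 * 60.0 * 24.0);
    printf("uint32_t of milliseconds: %.2f days\n", uint32ms); /* ~49.71 */

    /* A 19-bit field ticking a hypothetical 10,000 times per day
     * saturates at 2^19 - 1 ticks. */
    double a429 = 524287.0 / 10000.0;
    printf("19-bit field at 10000/day: %.4f days\n", a429); /* 52.4287 */

    return 0;
}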

As I've mentioned before, I ran into stuff like this all the time in my Bell Labs telecommunications days. Occasionally - alas - it was in code I wrote, definitely making for a learning experience.

Oh, and by the way, just last month HP announced yet another firmware bug in which some of their solid state drives stop working after 40,000 hours of operation, also not an obvious power-of-two issue.

We will get fooled again.

Cabin Fever

Gotta wonder if the current COVID-19 craziness has exposed a big problem in the tiny house movement. Just as those microapartments you read about in ginormous (and expensive) cities like Tokyo and New York assume explicitly, the tiny house movement may have assumed implicitly that residents won't actually spend all that much time in their abodes.

The Spousal Unit and I are really fortunate: a four bedroom tri-level in a semi-rural neighborhood near Denver Colorado. But even then, we get our share of cabin fever. If we were stuck together in a tiny house, we'd be posting photographs of each other holding the daily newspaper just to prove we hadn't killed each other.

(I'm not the only one thinking about this: "Homes Actually Need To Be Practical Now" and "Trapped at Home With People You Met On Craigslist", both in my favorite magazine, The Atlantic.)

Tuesday, March 31, 2020

Habits

You’ve heard the maxim that it’s far, far more expensive to acquire a new customer than to keep an existing one? Part of this is habit. Humans are for the most part creatures of habit. To keep an existing customer, you just need to keep feeding their habit (e.g. a latte every morning at the local Starbucks drive-through), and not give them a reason to change it (e.g. by repeatedly botching their order). But to get a new customer, you have to get them to change their habits. You have to convince them to do your thing in place of whatever thing they've been doing. That’s a big deal.

I wonder how many of us are going to have our habits rewired due to all of this Coronavirus craziness. Sure, I used to go to Starbucks around 0600 every morning and read for an hour or two. Maybe when all this is over, I realize I can save a lot of calories and money by just making a cup of coffee at home - like I’ve been doing for the past several weeks... or months.

Yeah, I used to go to the gym six times a week. But maybe I can save a lot of time and effort by staying with doing those core floor exercises on my mat in the living room three times a week, and taking a long walk outside for cardio the other three times.

Maybe I learn an important lesson: I don’t really need Starbucks four dollar lattes, or an expensive gym membership.

Could be a lot of places are going to find out a lot of their regulars aren’t coming back.

Other ideas:

Businesses are slow to rehire, acting conservatively in the face of an uncertain future (H/T to Demian Neidetcher).

Businesses find their old employees have made other plans, so they are forced to hire new people who must be retrained, preventing them from achieving their prior level of service, at least anytime soon.

Businesses don't survive this, forcing us to make other accommodations anyway (H/T to Kelly Dixon).

I think it's going to be a substantially different world at the far end of all of this; not just politically and medically, but professionally and personally.

Saturday, March 21, 2020

Product Specifications

This is a photograph of a piece of paper going into the kind of paper shredder you would buy at your office supply store.

Untitled

This model has two slots: the regular intake slot labeled "6 SHEET MAXIMUM" and the auto-feed slot labeled "75 SHEET MAXIMUM". I am in the process of destroying this shredder - which was brand new just a few days ago - or at least certainly voiding its warranty, by exceeding its duty cycle: over the span of a couple of weeks, I'll be reducing several decades of financial paperwork to tiny bits of paper.

It occurred to me, as I spent hours feeding paper into this device, stopping occasionally to let it cool down, clearing jams, lubricating it, and cleaning some components using a procedure for sure not covered in the owner's manual, that some folks might not know the real meaning of those "SHEET MAXIMUM" notations. As a professional product developer with more than four decades of experience, let me enlighten you.

Decades ago I worked at a national lab in Boulder Colorado. My boss, who had a Ph.D. in physics, used to say that any claims a supercomputer manufacturer made for the ginormous systems they sold were in effect guarantees that those systems could under no circumstances exceed those specifications. It didn't matter whether the numbers were instructions per second, floating point operations per second, disk I/Os per second, or whatever.

In the fullness of time, I, as a product developer, came to understand this was much more broadly applicable than just supercomputers. Applicable to both the devices I was trying to use in the products I was helping my clients develop and ship, and to those products themselves. In this latter capacity, I may have played a role in determining a product specification or two. It is even possible you have used some of those products.

Let's focus on the auto feed specification for my paper shredder: "75 SHEET MAXIMUM". What does this really mean?

(Disclaimer: what follows is a completely fictional but probably highly accurate portrayal.)
At some point in time, an engineer, who probably had at least a master's degree in mechanical engineering, took one of these shredders and ran some tests on it. It was likely a laboratory mule used for prototyping and early firmware development, many months before the first production model came off the assembly line, since the result of this test had to be known in time to set up the manufacturing process that prints the label on the unit. This was done in a climate-controlled clean room environment, carefully cleaned and scrubbed, and devoid of any food or drink. The engineer - and indeed everyone in the lab - wore clean-room "bunny" suits, meticulously free of any foreign matter.
The engineer carefully installed brand new hand-sharpened blades; lovingly hand adjusted all the tolerances of all the moving parts, running bearings far more expensive than what would be used in the actual production unit, and carefully lubricated all the parts using the most expensive lubricant from his lab supplies. 
The engineer used brand new paper, right out of the box, devoid of any toner, perfectly flat, and as thin as was available commercially, but not too thin. The engineer cleaned the paper with aerosol canned air, fanned the stack of paper out to make sure none of the pages stuck together, and checked that each page was momentarily grounded to eliminate any static electricity. 
The engineer carefully started testing the shredder, using a pair of calipers and a stainless steel ruler to make sure the stack of paper in the input bin was precisely aligned. The engineer tested stack after stack, adding one single page to each test. For each test, the engineer took the temperature of the electric motors in the shredder, making sure they were not overtaxed, allowing the shredder to cool down completely, probably overnight, in between each test. Each morning, the engineer relubricated and realigned all the mechanical bits in the prototype, possibly replacing any parts that showed the most minuscule amounts of wear when viewed under a laboratory microscope costing tens of thousands of dollars.
When the engineer reached seventy-six pages, the shredder exploded. The shrapnel killed everyone in the room. The shredding blades were especially lethal. 
The engineer's supervisor, who had an MBA, came into the room, read the final number off the engineer's laboratory notebook, and notified manufacturing what the limit was: seventy-five sheets.
"75 SHEET MAXIMUM" means seventy-five is the limit under the most optimal, controlled conditions that are humanly possible in the most optimistic of all conceivable circumstances.

In my basement, feeding ancient reports that had been folded, spindled, and mutilated, having sat all that time in a filing cabinet, with no climate control other than the nearby furnace which runs from time to time, reports from which I had just violently removed the staples I had inserted years ago, and having one of our beloved feline overlords supervising my efforts, I find that seven sheets sometimes works. And sometimes not.

IMG_3914

That's what a product specification means.

Monday, March 16, 2020

Placer: X Macros, SQLite, and Schema

In every learning project, I try to have at least two goals. In Placer, I wanted to re-learn how to use SQLite, and to remember how to make use of X macros.

SQLite - pronounced "S-Q-L-ite", like a mineral - is the server-less, library-based, embeddable relational database system that supports the Structured Query Language or SQL. It is likely running inside a mobile device near you. It ships inside every iOS device, every Android device, most web browsers, and (rumor has it) Windows 10. It has been said that SQLite is the most widely deployed database system in the world, a claim I find plausible.

X macros are perhaps the most egregious use of the C preprocessor outside of the International Obfuscated C Code Contest. This isn't my first rodeo with SQLite, nor with X macros, nor with the application of X macros to using SQLite, having seen other developers who were a lot smarter than me do this kind of thing in a large embedded telecommunications project about thirteen years ago.
Disclaimer: I'm not a database person, even though I took a database class that covered (even then) relational databases, more than forty years ago. It doesn't seem to have changed that much.
SQLite is a relational database. So, conceptually anyway, a relational database organizes data in one or more tables. Each table contains rows. Each row contains different instances of the same collection of fields. One of the fields is identified as a primary key, a field that contains a unique value identifying a specific row in the table. The database system can optimize its indexing such that retrievals by the primary key can access a specific row very quickly, without scanning every row in the table. In this context, the term schema is used to describe the design and format of a relational database table.
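
To make that concrete, here is a hypothetical (and much abbreviated) schema for a Path-like table, expressed as the SQL an application might hand to SQLite; the field names below are illustrative, not the actual Placer schema.

/* Hypothetical, abbreviated schema for a Path-like table; not the
 * actual Placer schema, just an illustration of a table, its fields,
 * and its primary key. Suitable for handing to sqlite3_exec(). */
static const char * const CREATE_PATH =
    "CREATE TABLE IF NOT EXISTS Path ("
        "path  TEXT PRIMARY KEY,"   /* unique key: the file's path name */
        "type  INTEGER,"            /* file type */
        "nlink INTEGER,"            /* link count */
        "size  INTEGER"             /* size in bytes */
    ");";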

In the Placer repo, the C header file SchemaPath.h contains the following. (You can click on any of these images to see a larger version.)

IncludeSchemaPath

As you might guess, this code - somehow - describes the schema of a database table called Path. Every row in Path contains sixteen fields, named path, type, nlink, and so forth. The terms TEXT, TEXT16, INTEGER64, INTEGER, and BLOB determine the type of the data that will be contained in the field. It seems likely that the field path is the primary key for the table. Some of the fields seem to be arrays, since they have an additional parameter that indicates a size.

So what are these operators, PLACER_SCHEMA, PLACER_INTEGER, and so forth? They look like invocations of C preprocessor macros. Except the macros aren't defined here. Which is a good thing, because this header file doesn't have any #if #endif guards to keep the contents of this header file from being seen more than once. Because we are going to include it more than once. We're going to include it a lot.

In the functional test program survey.c, the following snippet of code can be found.

SurveyPlacerStructureDefinitionUsage

Now you're guessing that com/diag/placer/placer_structure_definition.h defines these macros. I'll cut to the chase and tell you that com/diag/placer/placer_end.h undefines them just to keep the prior definitions from bleeding into later code.

When the C preprocessor includes these files and processes them, the following code is generated.

SurveyPlacerStructureDefinition

That's because the file com/diag/placer/placer_structure_definition.h contains the following preprocessor code.

IncludePlacerStructureDefinition

That seems pretty straightforward. The macros create a C structure definition that can be used to create an object that can store the data found in a single row in the Path table.
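
If you don't have the images handy, the generic sketch below shows the shape of the trick using invented names (EXAMPLE_FIELDS and so forth) rather than the actual Placer macros or the real Path schema: the same field list is expanded once into a structure definition and once into a column list, with the per-field macros undefined after each use, which is the role placer_end.h plays for Placer's own set.

/*
 * A generic sketch of the X-macro technique with invented names; the
 * real Placer headers and schema are richer than this. The field list
 * is written once, then expanded two different ways.
 */
#include <stdio.h>
#include <stdint.h>

/* The "schema": one entry per field. In Placer this list lives in its
 * own unguarded header (like SchemaPath.h) so it can be included over
 * and over; here it is just a macro for brevity. */
#define EXAMPLE_FIELDS \
    EXAMPLE_TEXT(path, 64) \
    EXAMPLE_INTEGER(nlink) \
    EXAMPLE_INTEGER64(size)

/* Expansion 1: a structure definition, one member per field. */
#define EXAMPLE_TEXT(NAME, SIZE) char NAME[SIZE];
#define EXAMPLE_INTEGER(NAME) int32_t NAME;
#define EXAMPLE_INTEGER64(NAME) int64_t NAME;
struct Example {
    EXAMPLE_FIELDS
};
#undef EXAMPLE_TEXT
#undef EXAMPLE_INTEGER
#undef EXAMPLE_INTEGER64

/* Expansion 2: a column list usable when building SQL text. */
#define EXAMPLE_TEXT(NAME, SIZE) #NAME ", "
#define EXAMPLE_INTEGER(NAME) #NAME ", "
#define EXAMPLE_INTEGER64(NAME) #NAME ", "
static const char COLUMNS[] = EXAMPLE_FIELDS; /* "path, nlink, size, " */
#undef EXAMPLE_TEXT
#undef EXAMPLE_INTEGER
#undef EXAMPLE_INTEGER64

int main(void)
{
    printf("struct Example is %zu bytes\n", sizeof(struct Example));
    printf("columns: %s\n", COLUMNS);
    return 0;
}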

In a function in survey.c that adds a new row to the Path table, the following code snippet can be found.

SurveyPlacerSqlInsertUsage

When these header files are read and processed, the following C code is generated.

SurveyPlacerSqlInsert

That's because the header file com/diag/placer/placer_sql_insert.h contains a different set of definitions for the same schema macros.

IncludePlacerSqlInsert

This generates an SQL INSERT command that will direct SQLite to insert a new row of data into the table Path. But where's the actual data? SQLite allows the application to bind the actual data to this command by identifying each field not by its field name, but by a one-based index number. But we need a set of calls to the type-appropriate SQLite bind functions to do this.
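
For reference, the underlying mechanics are just the standard SQLite C API. Here is a small hand-written example - not Placer's generated code, and using a trivial in-memory database - of preparing a parameterized INSERT and binding the values by one-based index.

/* Hand-written illustration of a parameterized INSERT with bind-by-
 * index, using the standard SQLite C API against an in-memory
 * database; this is the pattern the generated code automates. */
#include <sqlite3.h>
#include <stdio.h>

int main(void)
{
    sqlite3 * db = NULL;
    sqlite3_stmt * stmt = NULL;

    if (sqlite3_open(":memory:", &db) != SQLITE_OK) { return 1; }
    (void)sqlite3_exec(db, "CREATE TABLE Path (path TEXT PRIMARY KEY, nlink INTEGER, size INTEGER);", NULL, NULL, NULL);

    /* The ?'s are positional parameters, numbered starting at one. */
    if (sqlite3_prepare_v2(db, "INSERT INTO Path VALUES (?, ?, ?);", -1, &stmt, NULL) != SQLITE_OK) {
        fprintf(stderr, "prepare: %s\n", sqlite3_errmsg(db));
        return 1;
    }

    sqlite3_bind_text(stmt, 1, "/etc/passwd", -1, SQLITE_TRANSIENT);
    sqlite3_bind_int(stmt, 2, 1);
    sqlite3_bind_int64(stmt, 3, 1024);

    if (sqlite3_step(stmt) != SQLITE_DONE) {
        fprintf(stderr, "step: %s\n", sqlite3_errmsg(db));
    }

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}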

By now, it will come as no surprise that

SurveyPlacerStmtBindUsage

generates the following code (keeping in mind that code generated by the C preprocessor is not meant for humans to grok, but is easily understood by the C compiler)

SurveyPlacerStmtBind

because the header file com/diag/placer/placer_stmt_bind.h once again redefines the schema macros

IncludePlacerStmtBind

doing the exact right thing for each individual field according to both its position in the INSERT statement and its data type.
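
A rough sketch of how such a second set of definitions can work - again with the invented names from the earlier sketch, not the actual placer_stmt_bind.h - is below; each field macro emits the type-appropriate bind call at the next one-based index. This fragment is meant to be compiled into a larger program rather than run on its own.

/* A sketch (invented names, not the actual placer_stmt_bind.h) of
 * redefining the field-list macros so the same list expands into the
 * type-appropriate sqlite3_bind_*() call for each field, in order. */
#include <sqlite3.h>
#include <stdint.h>

#define EXAMPLE_FIELDS \
    EXAMPLE_TEXT(path, 64) \
    EXAMPLE_INTEGER(nlink) \
    EXAMPLE_INTEGER64(size)

/* The structure generated by the earlier expansion, written out here
 * so this fragment stands alone. */
struct Example { char path[64]; int32_t nlink; int64_t size; };

static int example_bind(sqlite3_stmt * stmt, const struct Example * that)
{
    int index = 0;
    int rc = SQLITE_OK;

#define EXAMPLE_TEXT(NAME, SIZE) \
    if (rc == SQLITE_OK) { rc = sqlite3_bind_text(stmt, ++index, that->NAME, -1, SQLITE_TRANSIENT); }
#define EXAMPLE_INTEGER(NAME) \
    if (rc == SQLITE_OK) { rc = sqlite3_bind_int(stmt, ++index, that->NAME); }
#define EXAMPLE_INTEGER64(NAME) \
    if (rc == SQLITE_OK) { rc = sqlite3_bind_int64(stmt, ++index, that->NAME); }

    EXAMPLE_FIELDS /* one bind call per field, by position and type */

#undef EXAMPLE_TEXT
#undef EXAMPLE_INTEGER
#undef EXAMPLE_INTEGER64

    return rc;
}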

All of this code is generated automatically by the preprocessor. All survey.c has to do is define the schema in one header file, which it calls SchemaPath.h, and then include it along with the appropriate header files containing C macro definitions.

In the end, given an automatically generated C structure, an automatically generated SQL command, and an automatically generated bind function, survey.c just does this to add a new row into the Path table. (In the snippet below, schema is the name of an object of type struct Path.)

SurveyBind

The symbol placer_generic_steps_callback above is the name of a callback function that handles the normal housekeeping of inserting a new row into any table of any database. But suppose you need to do something more complicated, like read a row, or a collection of rows from table Path?

In the snippet below, the code is doing an SQL SELECT from Path for anywhere from zero to eight rows whose ino field (which turns out to be the unique inode number for a particular file in the file system whose metadata is held in the database) matches the value of the variable ino read from the command line. We just use SQLite's bind function for integers to bind the value of the variable ino to the parameterized SELECT statement.

SurveyPlacerStepsCallback

SQLite will read rows from the Path table in the database and provide them one at a time to the callback function placer_struct_Path_steps_callback, and then the function placer_struct_Path_display is used to display them on a FILE stream.
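
Underneath, this is the standard SQLite retrieval pattern: prepare, bind, and then step once per row. Here is a hand-written, simplified sketch of the equivalent - not Placer's generated callback machinery, and assuming a hypothetical open database handle and a reduced set of columns - that reads each column of each row by index.

/* Hand-written sketch of the standard SQLite retrieval loop: bind the
 * parameter, step once per row, and read each column by index. Not
 * Placer's generated callback machinery; a simplified equivalent. */
#include <sqlite3.h>
#include <stdio.h>

/* Assumes db is an open handle to a database containing a Path table
 * with at least (path TEXT, nlink INTEGER, ino INTEGER) columns. */
static int select_by_ino(sqlite3 * db, sqlite3_int64 ino, FILE * fp)
{
    sqlite3_stmt * stmt = NULL;
    int rc;

    rc = sqlite3_prepare_v2(db, "SELECT path, nlink FROM Path WHERE ino = ?;", -1, &stmt, NULL);
    if (rc != SQLITE_OK) { return rc; }

    sqlite3_bind_int64(stmt, 1, ino); /* SQLite's bind function for integers */

    while ((rc = sqlite3_step(stmt)) == SQLITE_ROW) {
        fprintf(fp, "path=\"%s\" nlink=%d\n",
                (const char *)sqlite3_column_text(stmt, 0),
                sqlite3_column_int(stmt, 1));
    }

    sqlite3_finalize(stmt);
    return (rc == SQLITE_DONE) ? SQLITE_OK : rc;
}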

You know where this is going.

SurveyPlacerStepsCallbackUsage

generates

SurveyPlacerPathStepsCallback

using the macros defined in com/diag/placer/placer_steps_callback.h (not all macro definitions shown)

IncludePlacerStepsCallback

to load successive rows from the table into successive instances of the Path structure.

Similarly

SurveyPlacerDisplayUsage

generates

SurveyPlacerPathDisplay

using the macros defined in com/diag/placer/placer_structure_display.h (ditto)

IncludePlacerStructureDisplay

to display each field in a selected row appropriately.
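
As a rough illustration of that last expansion - once more with the invented field list rather than the actual placer_structure_display.h - the display variant can redefine each field macro as an fprintf() of the corresponding structure member.

/* A sketch of yet another set of definitions for the same field-list
 * macros, this time expanding into per-field fprintf() calls; invented
 * names, not the actual placer_structure_display.h. */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define EXAMPLE_FIELDS \
    EXAMPLE_TEXT(path, 64) \
    EXAMPLE_INTEGER(nlink) \
    EXAMPLE_INTEGER64(size)

struct Example { char path[64]; int32_t nlink; int64_t size; };

static void example_display(const struct Example * that, FILE * fp)
{
#define EXAMPLE_TEXT(NAME, SIZE) fprintf(fp, #NAME "=\"%s\"\n", that->NAME);
#define EXAMPLE_INTEGER(NAME) fprintf(fp, #NAME "=%" PRId32 "\n", that->NAME);
#define EXAMPLE_INTEGER64(NAME) fprintf(fp, #NAME "=%" PRId64 "\n", that->NAME);
    EXAMPLE_FIELDS /* one fprintf per field, formatted by type */
#undef EXAMPLE_TEXT
#undef EXAMPLE_INTEGER
#undef EXAMPLE_INTEGER64
}

int main(void)
{
    struct Example example = { "/etc/passwd", 1, 1024 };
    example_display(&example, stdout);
    return 0;
}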

It's just that simple.

No, really. All the heavy lifting is done by the header files, which are all predefined (and tested). The X macros in Placer make the transition between the C structure and the table row as delivered to and from SQLite pretty straightforward.

As I said before, I didn't come up with this technique originally. The folks I learned it from long ago may have been the first to apply it to SQLite, I don't know. But this application of C preprocessor macros - in a manner that is just the opposite of how most developers think of using them - had already been around for many years.

The com-diag-placer repository of implementation, unit tests, and functional tests, with documentation generated using Doxygen, is available on GitHub. It is licensed under the GNU LGPL 2.1.

Monday, March 09, 2020

When Learning By Doing Goes To Eleven

There are different modes of learning, and each modality works best for different people. I am always envious of my friends and colleagues who can learn by reading a book. That never worked for me. I can only learn by doing, albeit sometimes with a book or web page propped up in front of me. This explains why I have over thirty repositories on GitHub. And why I have a GPS-disciplined stratum-0 NTP server with a cesium chip-scale atomic clock in my living room.

Astrolabe (O-2)

Some of my repos on GitHub are standalone projects, like the stratum-0 server or my learning experiences with Go and Rust. But some of them follow a pattern: a Linux/GNU-based framework implemented in C as a library and a set of header files, then some unit tests, then some functional tests, then maybe some tools I found useful. Why C? Because for the past few decades, most of my income has come from working on embedded and real-time systems, close to bare metal, in C and sometimes C++. C and C++ have been very very good to me. Although I do admit to some hacking now and then in Java and Python, and occasionally even in JavaScript (although I couldn't write a line of JavaScript from scratch even if my life depended on it).

While more or less wrapping up my most recent C-based learning project, it occurred to me that maybe there was an overall pattern to these C frameworks. I had incorporated decades of experience working in ginormous C and C++ code bases into their architecture and the design of their APIs, being careful to make sure they could be used together. I started to wonder if I was subconsciously working towards some greater goal. So I decided to list these frameworks here, all in one place, for the first time, mostly for my own benefit, in the hope the bigger picture would reveal itself.

So here they are, in no particular order. Some of these projects have been around long enough that I have ported them through four different source code control systems. All of them have been cloned, built, tested, and used on
  • x86_64 i7-7567U,
  • x86_64 i7-5557U, and
  • ARMv7 BCM2835
targets and with
  • Ubuntu 18.04 "bionic",
  • Ubuntu 19.04 "disco", and
  • Raspbian 10 "buster"
platforms, among others (but be sure to read the README files for each repo to see what issues there may be with specific platforms). All of them are licensed by default under version 2.1 of the GNU Lesser General Public License (LGPL). Most have associated with them a list of articles I have written about them here in this same blog, and all have - seriously - a playlist of music videos.

Hazer

Description: Parse NMEA and other typical output from GNSS devices.
Inception: 2017
Repository: https://github.com/coverclock/com-diag-hazer
Playlist: https://www.youtube.com/playlist?list=PLd7Yo1333iA9FMHIG_VmuBCVTQUzB-BZF

Hazer started out as a way for me to learn the National Marine Electronics Association (NMEA) standard used to describe the output of virtually all GPS, and later GNSS, devices. I had encountered NMEA with GPS in many embedded projects over the years, but had never dealt with it from scratch. Hazer was so useful, it evolved into a tool to test GPS/GNSS devices, then to implement other more specific projects like a moving map display and vehicle tracker when integrated with Google Earth, GPS-disciplined NTP servers, and most recently a differential GNSS system with a fixed base station and a mobile rover.

Articles:
Assay

Description: Parse INI-format configuration files using bison and flex.
Inception: 2015
Playlist: https://www.youtube.com/playlist?list=PLd7Yo1333iA-YIyldvOB56QS-HZed_4g1

Having written parsers and lexical scanners when I was in graduate school (one in Prolog, if you can believe it), long before I ever had access to a UNIX system, I was naturally interested in learning how to use yacc and lex. In some of my commercial work I routinely encountered INI-format configuration files. Those seemed to be generally useful things, with a format that was easier for humans to grok than the XML or JSON files I routinely dealt with too. So I implemented an INI file parser using the GNU counterparts bison and flex, using a syntax that was more or less based on the countless INI implementations I found in the field. The parser includes the ability to run a process and collect its output as the value of a property, which makes for some entertaining capabilities.

Articles:
Codex

Description: Provide a slightly simpler C API to OpenSSL and BoringSSL.
Inception: 2018

I had never written C code to directly use the various libraries that each implement the Secure Socket Layer (SSL). And I knew I was in for a lengthy learning experience, as I learned about all the stuff that came along with it: keys and their generation, certificate verification and revocation, encryption algorithms and methods, and so forth. I was fortunate to have a native guide: my old office mate from my Bell Labs days was deeply into all of this, and was generous to a fault with his time and expertise. In the end I not only learned the basics of effectively using OpenSSL and its variants, but also came up with a slightly simplified API that I was confident that I could use in other applications.

Articles:
Placer

Description: Automate C-based schema generation for the SQLite 3 RDBMS.
Inception: 2020

My most recent project entailed my getting back into SQLite, the embeddable server-less SQL relational database management system. SQLite is used in every iOS device, every Android device, and (rumor has it) Windows 10. It is said to be the most widely deployed DBMS in the world, and it seems likely that this is true. Many Linux-based embedded products I've worked on in the past used SQLite to manage persistent data. But I'd never coded up an SQLite application myself from scratch. So: yet another learning experience. I always try to accomplish more than one goal with every project; in this one, it was to leverage X macros - perhaps the most twisted use of the C preprocessor outside of the International Obfuscated C Code Contest - to automate the generation of C structures, functions, and variables to implement user-defined schemas for database tables. I still have a lot to learn about SQLite, but there is a lot of useful knowledge encapsulated in the library, unit tests, and functional tests for this project.

Articles: (none yet)

Diminuto

Description: Implement commonly useful C systems programming capabilities.
Inception: 2008

All of the other projects in this list rely on Diminuto to provide common underlying facilities. Diminuto started out as a project to build a minimal Linux 2.4 system on top of an ARMv4 processor. Over time, the framework and library became far more important than the original goal. Portions of Diminuto are now shipping in at least four different products manufactured by various clients (whether they realize it or not). When I implement a feature in any C-based project, I pause and consider whether it might be more generally useful. If I decide that it is, I implement it in Diminuto instead of the root project. Diminuto has thus grown organically to include a bunch of useful stuff, only a small portion of which is listed here.
  • a simple mechanism to debounce digital I/O pins;
  • a daemonization function;
  • an API to handle time and date stamps, time and duration measurement, and time delays;
  • functions to expand and collapse C-style escape sequences in strings;
  • a socket API that supports outgoing and incoming IPv4 and IPv6 connections;
  • a logging API that writes to the system log if the caller is a daemon, or to standard error if not;
  • a simplified API for socket and file descriptor multiplexing;
  • an API to configure serial ports;
  • a traffic shaping API based on the virtual scheduling algorithm;
  • a red-black tree implementation;
  • a simple unit test framework;
  • a file system walker.
Articles:
Update (2020-03-11)

If you install the doxygen, TeX, and LaTeX packages documented in the Makefile comments for the documentation make target for each of these projects, you can generate HTML and PDF documentation for each library via make documentation readme manuals. The build artifacts will be in subdirectories under out/host/doc in each build directory (with host being replaced with whatever you used for TARGET if you changed it).

Saturday, March 07, 2020

Considerations for Ethical Decision Making

JJ Snow, U. S. Air Force CTO, passed this along: considerations for ethical decision making for the U.S. Special Operations Forces from the Joint Special Operations University. I think it is a lot more broadly applicable than just to the SOF community: business, politics, governance, academia, science, etc. I'd argue that these are important guidelines for anyone who manages people in today's organizational environments. Substitute manager for SOF operator and management for SOF and see what you think. I found it made for thought provoking reading. (Cut and pasted from LinkedIn.)

She writes:
What should guide us in tough times? Critical thinking, ethics, compassion and morality.  
Joint Special Operations University proposed six SOF ethical decision-making truths: 
1. Individual moral character is neither inherent nor fixed. Ethical decision-making requires continuing education for even the most experienced SOF operators.   
2. SOF operators will be morally challenged when they are least prepared to deal with it. Ethical problem-solving skills must be developed and strengthened.  
3. SOF ethical decision-making must be developed with honest and frank consideration for the harsh realities of SOF environments and operational requirements. SOF units must see the world for the way it is, not for how they might want it to be.  
4. Binary ethical codes do not provide sufficient guidance in SOF environments. In fact, strict adherence to binary ethical codes can be harmful in some SOF environments.  
5. SOF leaders should not be naïve or insensitive to human behavior and must recognize that people are not as ethical as they think they are. SOF operators need training to close the gap between the expectation and reality of what they must do.
6. SOF culture must become an environment where conversations about ethical decisions, good and bad, are a natural occurrence.


Tuesday, January 28, 2020

Frames of Reference III

Light travels about a foot in a nanosecond. That's a very rough approximation. But it makes it easy for users of the old Imperial system of measurements to grasp this idea.

I look across the table at my lovely wife of nearly thirty-six years, Mrs. Overclock. I think I'm seeing her in the present. But I'm not. She's about three feet away from me, so I'm really seeing her as she was three nanoseconds in the past.

Behind Mrs. Overclock, I see the far wall. It's about four feet behind her, or about seven feet from me. I'm seeing the far wall as it existed seven nanoseconds in the past.

Through the window in the far wall I see the Rocky Mountains in the distance. The closest mountain I can see, Table Mountain near Golden Colorado, is about three miles west of us, or in the neighborhood of 16,000 feet away. So I'm seeing a version of Table Mountain that's 16,000 nanoseconds, or sixteen microseconds, in the past.

In the night sky, I can see the Moon above the mountains. The Moon is an average of about 240,000 miles away. That's more than 1.2 billion feet. So I'm not seeing the Moon as it is in the present - which by now you've figured out is a highly abstract concept. I'm seeing the Moon as it was more than 1.2 seconds ago. It's even possible it no longer exists in this present, and the shock wave and debris field just haven't arrived yet. Unlikely. But possible.

Astronomers have recently speculated that the distant star Betelgeuse is about to end its life by going nova. Betelgeuse is seven hundred light years away. So when we observe Betelgeuse, we are not seeing it as it is today, but instead are looking seven hundred years into the past. Whatever happened to Betelgeuse, it happened a long long time ago, and the light from that event is just arriving here. This kind of time shift is true for all of the stars we see in the nighttime sky to varying degrees depending on their distance from us.

So the present that I have fooled myself into believing I see is not only not the present, it's not even the same time in the past. It's a mural of perceptions taken from a broad ensemble of times in the past. This is why Albert Einstein asserted that there was no such thing as absolute simultaneity in his Theory of Special Relativity. And none of this takes into account the latency of our nervous system, or the processing time in our brains.

Real-time systems work like this too, perceiving at best an approximation of the reality around them.
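
The arithmetic is simple enough to capture in a few lines of C; here is a toy sketch using the same round figures as above (a foot per nanosecond, 5,280 feet per mile).

/* Toy light-delay arithmetic using the same round figures as the text:
 * light covers roughly one foot per nanosecond. */
#include <stdio.h>

static double delay_ns(double feet)
{
    return feet; /* about one nanosecond per foot */
}

int main(void)
{
    printf("spouse,   3 ft:        %.0f ns\n", delay_ns(3.0));
    printf("far wall, 7 ft:        %.0f ns\n", delay_ns(7.0));
    printf("mountain, 3 mi:        %.0f ns (~16 us)\n", delay_ns(3.0 * 5280.0));
    printf("Moon, 240000 mi:       %.2e ns (~1.27 s)\n", delay_ns(240000.0 * 5280.0));
    return 0;
}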

See Also

Chip Overclock, "Frames of Reference", 2018-03-14

Chip Overclock, "Frames of Reference II", 2019-04-19

Tuesday, January 14, 2020

Backing Up Raspberry Pis

It happened gradually. I built a GPS-disciplined clock/NTP server using a Raspberry Pi, a two-line LCD display, and a GPS receiver. Then I built another one incorporating a chip-scale cesium atomic clock. Then a WWVB clock. Then another GPS clock using a simpler USB-attached GPS dongle. Then an NTP/GPS monitoring station. Then a differential GPS base station. Then a web server.

Somehow, over the past few years, I have ended up with ten Raspberry Pi single board computers scattered around the Palatial Overclock Estate, all running 24x7, most on uninterruptible power supplies, almost all headless. And that doesn't count the ones in the basement that I get out from time to time for ongoing projects, like my IPv6 testbed or my differential GPS rover.

Having lost one of them to a failed micro-SD card, the Pi's boot media, it occurred to me that maybe I should start thinking about a way to back them up. You would think this was a solved problem. But not so much.

The Wrong Solution

The most common mechanism - and the one I had been using - was simply to copy the entire raw disk image on the micro-SD card to a handy backup disk as a flat file using the Linux dd command. Then, in the unlikely event that the need arises, restore it by doing a reverse image copy to an unused micro-SD card using a similar dd command.

But as common as this approach seems to be on the Salmon of Knowledge, it turns out to be uglier than it sounds. Even though those micro-SD cards from SanDisk and Samsung both claim to contain sixteen gigabytes, they aren't actually identical, differing very slightly in terms of the actual number of logical blocks they contain. You can dd a smaller image to a larger card, but obviously not vice versa. This is all beside the fact that a full raw disk image copy takes a long time.

I addressed this in the past in two ways.

First: restore, for example, a sixteen gigabyte image to a thirty-two gigabyte card. Sounds simple, but it's not scalable: when I make changes to the system, I have to make another image copy. This one will be thirty-two gigabytes. So if I have to restore it, it will be to a sixty-four gigabyte card, even though I was wasting most of that thirty-two gigabyte card; the image backup includes all of the blocks on the card I was backing up, whether they were used or not.

Second: I went on Amazon.com and ordered some micro-SD cards identical in brand, model, and capacity to the one I was backing up. Also a simple solution, as long as I can find those cards, but it ignores the growing collection I have of unused micro-SD cards.

The Right Solution

Thanks (as usual) to the web site Stack Exchange, and specifically the user goldilocks from whom I drew inspiration, I have written a set of shell scripts that use the rsync utility to make a file-by-file incremental backup of a Raspberry Pi. And to format an unused micro-SD card of any suitable capacity and restore those files to it, recreating the boot media. Which I've tested.

The first backup takes a long time because it is a full backup. Subsequent backups - done after I make a change - take almost no time at all. The approach I took was to run the backup on the Pi itself a.k.a. online (rsync can also be run remotely, but I elected not to do that). I bought a one terabyte solid state disk (SSD) with a USB interface. I hook it up to the Pi while it is running, mount the SSD, and run the backup script. Formatting and restoring to an unused micro-SD card is done on one of my desktop Linux servers a.k.a. offline.

The Gory Details

These scripts can be found here: https://github.com/coverclock/com-diag-bin . They are licensed under the GPL. When I install these scripts on a system, I typically clone the repository, then create soft links from a bin directory to the scripts I need in the repo directory. During that process I drop the .sh suffix from the script name. So in the repo the script is called pilocalbackup.sh but in the bin directory it is just pilocalbackup.

In the examples below, /dev/sdx is a stand-in for the device name of the micro-SD card from the Raspberry Pi, and /dev/sdy is a stand-in for the device name of the backup SSD. Your mileage may vary. The standard Raspbian micro-SD card will have two partitions, the boot partition in /dev/sdx1 and the root partition in /dev/sdx2. There are files in both partitions that have hard coded references to these partition numbers.

I always assume I'm restoring to a two-partition Raspbian card, but I can back up from a Noobs card for which Raspbian was selected at install time; a Noobs card will have something like seven partitions, five of which are unused in the Raspbian instance. The conversion from Noobs to Raspbian requires some editing of files in both the boot and root partitions of the card to change the boot partition to 1 and the root partition to 2. If you are restoring a micro-SD card to create a duplicate of an existing system, you might also want to change the hostname and IP address of the Raspberry Pi that you are restoring. This is shown below.

In the examples below, the name framistat is a stand-in for the host name of the Raspberry Pi which is used by default by the backup script. The name doodad is a stand-in for the release name - e.g. jessie for 8.x, stretch for 9.x, buster for 10.x - of the Raspbian version I am dealing with; the partitioning of the micro-SD card differs slightly from release to release. You can find the Raspbian (based on Debian) release version number in the file /etc/debian_version.

The mount point names I use below are purely my own personal convention. I use /mnt for the backup SSD, /mnt1 for the boot partition (1) on the micro-SD card, and /mnt2 for the root partition (2) on the micro-SD card. The backup script backs up both partitions, and the restore script restores both partitions.

All of the scripts necessarily have sudo commands embedded in them, so I encourage you to inspect them carefully; sudo is only used when necessary. 

The README.md for the repo has a bunch of examples, including dealing with full raw image files. Below I'll show just a few germane cases. The repo also has a lot of other unrelated but useful (to me, anyway) scripts.

Determine the Raspbian Release from Various Media

cat /etc/debian_version # This is online on the Pi itself.

or

sudo mount /dev/sdx2 /mnt2
cat /mnt2/etc/debian_version # This is an offline Pi micro-SD card.
sudo umount /mnt2

or

sudo mount /dev/sdy1 /mnt
cat /mnt/pi/framistat/etc/debian_version # This is on the backup.
sudo umount /mnt

Backup Local Files Online On A Pi Using rsync

sudo mount /dev/sdy1 /mnt
pilocalbackup /mnt/pi/framistat # This is the default.
sudo umount /mnt

or

sudo mount /dev/sdy1 /mnt
pilocalbackup # This uses the default.
sudo umount /mnt

Restore Files Offline To An Unused Micro-SD Card Using rsync

piimageformat /dev/sdx doodad
sudo mount /dev/sdx1 /mnt1 # This is the boot partition.
sudo mount /dev/sdx2 /mnt2 # This is the root partition.
sudo mount /dev/sdy1 /mnt # This is the backup drive.
pilocalrestore /mnt/pi/framistat /mnt1 /mnt2
sudo umount /mnt /mnt1 /mnt2

Check, Verify, And Repair a Raspberry Pi Image Offline

piimagecheck /dev/sdx /dev/sdx1 /dev/sdx2 # These are the defaults.

or

piimagecheck /dev/sdx # This uses the defaults.

Customize a Raspberry Pi Image Offline After Restoring

sudo mount /dev/sdx1 /mnt1
sudo vi /mnt1/cmdline.txt # Change the boot partition.
sudo umount /mnt1

sudo mount /dev/sdx2 /mnt2
sudo vi /mnt2/etc/fstab # Change the / root and /boot partitions.
sudo vi /mnt2/etc/dhcpcd.conf # Change the static IP address.
sudo vi /mnt2/etc/hostname # Change the host name.
sudo vi /mnt2/etc/hosts # Change the host name resolution.
sudo umount /mnt2