Friday, March 30, 2012

Small Town Big City

As far back as 1982 I was teaching a senior and graduate level computer engineering course at Wright State University called "Real-Time Software Design". The students wrote code in assembler language and tested their projects on one of two PDP-11/05 minicomputers. Both systems were equipped with a teletype terminal, a line printer, two floppy disk drives, and the dreaded card reader. To pass the course, students wrote a spooler program that read cards, spooled the data to disk, read it off, and printed it. To achieve the necessary performance the software had to be multi-threaded and interrupt driven. The course was required for seniors to graduate and for graduate students to proceed to their upper level courses. It was much hated. And much loved.

Things change. By 1996 I was writing embedded and real-time software for commercial products in the Denver, Colorado area for one Bell Labs spin-off or another. Because of that organization's long history of inventing things that we now take for granted but which are the backbone of the Internet, like UNIX, C, and C++, my colleagues and I were developing low level software in C++ when there were still assembler programmers complaining that C wasn't good enough. C was plenty good. C++ is even better. (And I've argued in this blog that Java may be good enough for many embedded applications.)

Fast forward to 2005 when I began to passionately and aggressively pursue two goals.

  • Find an economical hardware and software platform on which to teach embedded and real-time software development. It must support a gradual learning curve, accommodating beginners needing a gentle introduction, but also experienced developers who need to learn the advanced concepts used in bleeding edge commercial development environments.
  • Demonstrate that the advanced object-oriented software techniques, idioms, patterns, and architectures that I have seen evolve for embedded software development over the past several decades can be applied to this small and inexpensive platform.

This has resulted in a number of projects that long-time readers of this blog (if there should be any) will recognize.

Desperado was my effort to capture many of the useful C++ architecture, design, and implementation patterns and idioms I encountered over my years of commercial product development. (Although no longer under active development, I have used Desperado as a foundation for other projects, like Hayloft, my C++ library for Amazon Web Services' Simple Storage Service. And it continues to serve as an inspiration and a working example for much of my subsequent paid C++ embedded development.)

Diminuto and Arroyo were my attempt to build a teaching environment on a commercially available evaluation board using an Atmel AT91RM9200 system on a chip with an ARM9 processor, a minimally configured Linux kernel, and a small set of utilities including BusyBox. (Diminuto turned out to have useful collateral in the form of a C-based systems programming library for Linux that I have leveraged in a number of other projects for paying clients.)

Arroyo/Diminuto with Three "Disk" Drives

Cascada reflected my decision to move this effort to a more capable yet less expensive C4 BeagleBoard with a Texas Instruments OMAP system on a chip with an ARM Cortex-A8 processor, a Zippy2 expansion board, and a standard Linux distribution.

BeagleBoard C4 and Zippy2 Expansion Card

Contraption had me upgrading to an xM BeagleBoard and moving to a standard Android distribution, which had the added benefit of a bundled development environment and support for Java, with a GNU environment co-hosted alongside it.

BeagleBoard xM Rev C

Horsefly was where I tried using a commercial toy, the Parrot AR.drone quad-rotor helicopter, which featured a custom ARM processor, as a platform to illustrate how to reverse engineer an existing system.

Saleae Logic Analyzer and AR.drone

Although I would call all of these projects successful, none of them really satisfied my goals. The hardware was too expensive. The software was too complex for beginners. They were too closed. Or they presented a target that was moving too quickly in the market place.

Amigo is my latest attempt, using Arduino boards. These inexpensive platforms, available all over the web or even at your local Radio Shack, were originally built for teaching just the kinds of things I'm interested in. They have also proven immensely popular with hobbyists, do-it-yourselfers, and the Maker subculture, resulting in a huge hardware and software ecosystem growing up around them. They use low power but extremely capable Atmel megaAVR eight-bit microcontrollers. The run-time software is already C++ based, and is simple enough that you can completely understand everything that is going on right from power up to running your application. The Arduino software distribution includes an easy-to-use integrated development environment (IDE) that makes it simple for beginners to get embedded projects up and running quickly.

Arduino IDE on the Mac


But can you start with Arduino and its small town ways and still graduate to the big city of learning about commercial product development? Can you use Eclipse? Build using make? Use source control? Unit testing? Can you do multi-tasking? Write interrupt service routines? Do systems programming in C++? Learn about code reuse across product lines?

Could you use the open source FreeRTOS multi-tasking real-time operating system, write an interrupt-driven object-oriented device driver in C++, build it all with a makefile, maintain the code base in a Subversion repository, and run it on an inexpensive but highly versatile Arduino-compatible board?

Can you do all of this so cheaply that a student could download the software for nothing off the web to their Windows, Linux, or Mac laptop? That they could conceivably buy a board at their college bookstore and keep it to use in a series of courses ranging from beginner to advanced?

Remarkably: yes.

The shirt-pocket sized Freetronics EtherMega 2560 board uses the same Atmel ATmega2560 microcontroller as the Arduino Mega 2560 board. It includes a USB-to-serial interface to one serial port (and three more serial ports just waiting to be hooked up), an Ethernet controller with an RJ45 jack, a microSD card slot, and a big batch of general purpose I/O (GPIO) pins. It can talk to the Arduino IDE and run all the standard Arduino software.

Freetronics EtherMega2560 and Atmel AVRISP mkII

But this EtherMega isn't running Arduino. It's running FreeRTOS with a couple of concurrent tasks. Its interrupt-driven serial device driver, and indeed all of the software stack sitting on top of FreeRTOS, was written by me, in C++. I built it via make using the GNU tool chain from my code base maintained in Subversion. I developed it using Eclipse running on my desktop Mac. A tarball of my entire code base, including all of the FreeRTOS code, is available via a link on my Amigo project page.
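To give a feel for what "interrupt-driven" tends to mean in a serial driver, here is a minimal sketch of a ring buffer of the kind commonly placed between a receive interrupt and a task. It is an illustration of the general pattern only, not code taken from Amigo; the class name `RingBuffer` and its methods are hypothetical, and it is written so that it also compiles on a desktop.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer, single-consumer ring buffer. The producer is
// imagined to be a receive interrupt service routine and the consumer a task.
// It holds at most N - 1 bytes, since one slot is left empty to tell "full"
// apart from "empty".
template <std::size_t N>
class RingBuffer {
public:
    // Producer side: store one byte. Returns false (dropping the byte) when
    // the buffer is full.
    bool put(std::uint8_t byte) {
        std::size_t next = (head_ + 1) % N;
        if (next == tail_) { return false; }
        data_[head_] = byte;
        head_ = next;
        return true;
    }

    // Consumer side: fetch the oldest byte. Returns false when empty.
    bool get(std::uint8_t & byte) {
        if (tail_ == head_) { return false; }
        byte = data_[tail_];
        tail_ = (tail_ + 1) % N;
        return true;
    }

    bool empty() const { return head_ == tail_; }

private:
    // On an eight-bit target a multi-byte index is not read or written
    // atomically, so real code would guard these (for example by briefly
    // disabling interrupts) or use one-byte indices. That is omitted here.
    volatile std::size_t head_ = 0;
    volatile std::size_t tail_ = 0;
    std::uint8_t data_[N] = {};
};
```

In this arrangement the interrupt handler only calls `put()` and returns quickly, while a task drains the buffer with `get()` at its own pace.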

You have to love the global scope of this. Arduino originated in Italy with founders Massimo Banzi and David Cuartielles. The Freetronics board was designed by Marc Alexander and Jonathan Oxer in Australia. FreeRTOS was developed by Richard Barry and Real Time Engineers, Limited in the United Kingdom. The FreeRTOS port to the EtherMega was done by Phillip Stevens in New Zealand. Arduino, its clones, components, and accessories, are manufactured, well, everywhere, including Italy and the United States.

I can't remember when I've been this excited about one of my projects. I'm not kidding, my hands are shaking at the thought of it. The potential of this as a pedagogical tool is fraking huge. If you want to learn, or even better, to teach, embedded and real-time software development, it is time to quit making excuses. It hasn't ever been this easy, this accessible, or this inexpensive. Start now. If I can help you, I will.

Update (2012-05-17)

I did most of the development of Amigo, my noodlings with FreeRTOS, C++, and the AVR eight-bit microcontrollers, on the Freetronics EtherMega 2560. Here's a more recent photograph of the EtherMega with a test fixture wired up to it that the substantial Amigo unit test suite uses to exercise its various features, including its interrupt-driven USART, SPI, and ADC device drivers all written in C++. The unit test suite includes tests of the networking capabilities of the EtherMega to act as a client or as a server.

Freetronics EtherMega 2560

The EtherMega is, from a software point of view, equivalent to an Arduino Mega board with an Arduino Ethernet shield mated to it. It uses the same ATmega2560 microcontroller as the Mega, and the same WIZnet networking chip as the Ethernet shield. I had an Arduino Mega ADK, a version of the Mega designed to work with the Android Developer's Kit and interface to Android devices. And I had an Ethernet shield. So it made sense to test the same unit test suite on that configuration. I did. It worked. With no changes to the nearly three thousand line C++ unit test main program or anything else (in fact I didn't even recompile). Here's a photo of that hardware with the same test fixture wired up to it. (Yes, that Ethernet shield is pin-aligned correctly with the Mega ADK's headers despite having a couple of pins hanging out in space.)

Arduino Mega ADK with Ethernet Shield

Of course I had to try to get at least a basic multitasking configuration working on the far more resource constrained ATmega328p microcontroller used on boards like the Arduino Uno, and the Freetronics EtherTen. The EtherTen is equivalent to the Uno with an Ethernet shield. I got it working, but it wasn't pretty. The 328p has one-eighth the flash memory and one-fourth the SRAM compared to the 2560. I ran the same Amigo software but with a radically lobotomized unit test program, enough to verify that I was multitasking and that the device drivers were working. The issue was tuning the stack sizes for each task and the heap that FreeRTOS requires so that each task could run without blowing its stack, and still leave room for the variables used by the unit test suite. This would have been an issue in C as well as C++. Here's a photo of that board without any test fixture wiring but running Amigo. This particular EtherTen has the optional Power over Ethernet (PoE) module attached to it.
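The arithmetic behind that squeeze can be sketched as follows. The memory sizes are the ones quoted in this article; the task counts, stack sizes, and overhead figures passed to the hypothetical `sramHeadroom()` helper are made-up numbers for illustration, not measurements from Amigo, and the model is deliberately simplified (it just adds everything up).

```cpp
#include <cstddef>

// Memory sizes in bytes, as given in the article.
constexpr std::size_t kFlash2560 = 256u * 1024u;
constexpr std::size_t kFlash328  = 32u * 1024u;
constexpr std::size_t kSram2560  = 8u * 1024u;
constexpr std::size_t kSram328   = 2u * 1024u;

static_assert(kFlash2560 / kFlash328 == 8, "the 328p has one-eighth the flash");
static_assert(kSram2560 / kSram328 == 4, "the 328p has one-fourth the SRAM");

// Hypothetical budget: SRAM left over after the task stacks, any other kernel
// and heap overhead, and the application's static variables are accounted
// for. A negative result means the configuration does not fit.
constexpr long sramHeadroom(std::size_t sramBytes,
                            std::size_t taskCount,
                            std::size_t stackBytesPerTask,
                            std::size_t otherOverheadBytes,
                            std::size_t staticDataBytes) {
    return static_cast<long>(sramBytes)
         - static_cast<long>(taskCount * stackBytesPerTask
                             + otherOverheadBytes
                             + staticDataBytes);
}
```

With these invented figures, three tasks with 512-byte stacks plus 512 bytes each of overhead and static data overrun the 328p's two KB by 512 bytes, while the same configuration leaves more than five KB free on the 2560.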

Freetronics EtherTen

So while the 328p might be usable for a very limited application using FreeRTOS, I can't recommend using it as a learning or teaching platform for multitasking. You'll spend all your time (as I did) trying to find that memory allocation sweet spot.

Update (2012-09-28)

I've written before here about using Android as a platform on targets like the BeagleBoard. Recently I did some work on the Hardkernel ODROID-A4 as part of my Conestoga project. The ODROID-A4 is Samsung's recommended product development platform for their Android-based Galaxy smartphones. The ODROID is not cheap, and architecture-wise it's extremely complex compared to the Arduino targets. But if your goal is to develop below the Java layer for Android smartphones, it and the other ODROID models are an interesting choice. Here's a photo of my ODROID-A4 plugged into its debug board, with an accessory I/O board sitting alone just above it. You'll also notice a lovingly handcrafted universal reset tool (a bent paper clip).

ODROID-A4 with Reset Tool and Debug Board

At the opposite end of the spectrum, I've been working on a gig recently in which I've been developing firmware for a line of commercial products based on the Microchip Technology PIC16F1823 microcontroller. This tiny beast, yet another Harvard architecture, has 2048 words of flash for executable code and a mere 94 bytes of RAM. That's a little tight for a teaching platform: even though I write in C, I have to carefully consider the resource tradeoffs of every single line of code and variable. Legitimate pedagogical issues for beginning embedded developers get squeezed out pretty quickly. It turns out that important concepts like abstraction and dependency injection are not free when a single byte takes up more than 1% of the entire available memory. But it's remarkable how complex your software can be within constraints as tight as these. One of my colleagues taunts me as he writes code for his larger microcontroller: "I think I'll have a variable. It'll be used for just one thing. I think I'll make it four bytes. It doesn't need to be four bytes. But what the heck."


There has never been a better time to learn how to develop software close to bare metal. Moore's Law works to your advantage when you're scaling down just as it does when you're scaling up.

Update (2013-11-18)

I needed an ARM-based board that could host Linux to test the latest version of Diminuto, my C-based Linux systems programming tool kit that has found its way into a number of commercial products. Instead of pulling one of my BeagleBoards out of storage for this project, code-named Cobbler, I spent a (very) few bucks and bought a Raspberry Pi. This credit-card sized board features a Broadcom BCM2835 system-on-a-chip with an ARM11 core running at 700MHz and hosts a version of the Debian distribution featuring the Linux 3.6.11 kernel.

Here's a photo of my lab bench setup where I am testing the general purpose input/output (GPIO) handling functions in my tool kit. The tiny RPi is festooned with cables (clockwise from the top): HDMI to my lab monitor, USB power, a ribbon cable to the breadboard, USB keyboard and mouse, and Ethernet. The breadboard has an FTDI Friend providing USB to logic-level serial access to the processor console port.

Raspberry Pi in Diminuto Test Fixture

I still recommend the Arduino if your goal is teaching very low-level device handling; Linux places too many layers of abstraction between your code and the hardware to make a good teaching platform for beginners. But once students have outgrown the Arduino, the RPi is an interesting next step.

Update (2014-11-08)

Because so much of my paid work is on ARM-based processors of one kind or another, it pays for me to have an ARM-based reference platform. You can see this strategy in play going all the way back to the earliest platform discussed in this article. Here's the latest one: an Nvidia Jetson TK1 evaluation board, for a project code-named Stampede. The round device in the middle is the integrated cooling fan on top of the Nvidia Tegra TK1 SoC, which has four ARM Cortex-A15 cores, a fifth low-power Cortex-A15 core, and 192 graphical processing units (GPUs).

Figuring Out the GPIO Mapping

The fact that you can buy this much horsepower for only USD$192 is amazing to me. At that price, the GPUs are merely a nice to have and don't represent my core interest in the board. The ARM Cortex-A15 implements the 32-bit ARMv7 architecture. When Nvidia inevitably comes out with one of these boards using their new Denver processor, which has two ARM Cortex-A57 cores implementing the 64-bit ARMv8 architecture, I'll be in line for one of those too.

(A Google Nexus 9 tablet using the Nvidia Denver processor is already winging its way to me. I have three Google Nexus 7 tablets and a Google Nexus 5 phone, all 32-bit, that I use for development on the Android platform.)

Wednesday, March 07, 2012

Getting from Power Up to Running Main in Arduino

While trying to bring up FreeRTOS on a Freetronics EtherMega board (all the "Free" prefaces here being coincidental) as part of my Amigo project, I spent more hours than I care to admit figuring out how Arduino gets from power up to executing my software, a long and involved process whose explanation I found to be widely distributed and not easily forthcoming. This article details that journey.

I'll warn you right now: this is long and rambling, covers a lot of territory, and is fairly technical. Everything I'll be describing in this article relates to the EtherMega board and its ATmega2560 microcontroller, but is equally applicable, differing in only very minor details, to other Arduino boards and their eight-bit Atmel microcontrollers. Much of it applies even to non-Arduino uses of those microcontrollers. In fact, much of what I describe about the tool chain applies to any platform using GNU compilers, including Mac, Linux, and Windows.

The EtherMega Board

The EtherMega is a Freetronics clone of the standard Arduino Mega board, an Arduino board that uses the ATmega2560 microcontroller. The ATmega2560 has significantly more resources than the ATmega328P used on the smaller, cheaper, and more common Arduino Uno board. The 2560 has 256 kilobytes (KB) of persistent flash memory compared to the thirty-two KB in the 328, eight KB of non-persistent static random access memory (SRAM) compared to two KB, and four KB of persistent electrically erasable programmable read-only memory (EEPROM) compared to one KB. These additional resources make multitasking much more palatable since there is much more SRAM to hold the context, including the stack, for each task; although FreeRTOS has been ported to the 328, the restrictions on stack space, a resource that is especially critical if you are coding in C or C++, make it a little problematic for my purposes.

Here is a photograph of the EtherMega with a universal serial bus (USB) cable on the left connecting it to my desktop Mac, and a ribbon cable connecting it to an Atmel AVRISP mkII device that is off screen. More on that in a bit. In case the scale isn't apparent, this board is about four inches or ten centimeters in length. There's a lot packed in there. The ATmega2560 chip is the big one at the upper right.

Freetronics EtherMega2560 and Atmel AVRISP mkII

Besides all the usual general purpose I/O (GPIO) headers found on the Arduino Mega, the EtherMega also has Ethernet, a microSD card slot, and the hardware surround for each. This makes it a combination of the Arduino Mega and the Arduino Ethernet shield all on one board.

The Ethernet is provided by a WIZnet W5100 chip, an application-specific integrated circuit (ASIC) with an RJ45 connector. The W5100 provides not just the Ethernet physical interface (PHY), and Ethernet controller, but a pretty complete internet protocol (IP) stack, all on-chip. (You have to wonder if there is a microcontroller core with firmware in there somewhere, but WIZnet says it's all silicon.)

This bundling of network protocol stacks or other advanced functionality inside a chip is very common on small embedded systems. It is not uncommon for a peripheral chip to have a more powerful microcontroller core embedded within than the external microcontroller for which you find yourself writing software. In fact, external microcontrollers, and the software we write for them, are frequently little more than glue used to integrate many other powerful components like ASICs and field programmable gate arrays (FPGAs) together and provide some small level of digital control to their management or their interface to the outside world.

This makes systems like the EtherMega conceptually like an asymmetric multiprocessing (AMP) system in that there is a lot of intelligence all operating independently and concurrently. But in fact, all recent Arduino boards really are AMP systems: they (and the EtherMega) have a second Atmel microcontroller (on the EtherMega it's an ATmega8u2) programmed to be a USB-to-serial interface that connects to one of the serial ports (it has four) on the ATmega2560. Welcome to embedded development. (Stuff like this is why I and others claim that the skill sets for developing for embedded systems, for distributed systems, and for high performance systems are basically the same; it's just a matter of scale.)

The ATmega2560 Microcontroller

The ATmega2560, like the rest of the Atmel ATmega family, and indeed most small microcontrollers, is a Harvard architecture: the machine instructions that it executes are stored in a completely separate memory from that of the data on which those machine instructions operate. The machine instructions reside in persistent flash memory, which Atmel refers to as program memory, and the data reside in non-persistent SRAM or data memory. This is in contrast to the larger microprocessors found in servers, your desktop, or even your smartphone, which are Von Neumann architecture: instructions and data reside in a common non-persistent memory, typically RAM, having been copied from some persistent store during the process of booting.

(Exceptions: I've helped develop fairly large embedded products which executed their machine instructions directly from persistent read-only memory or ROM based on a silicon technology called NOR-flash. This is referred to as Execute In Place or XIP. Such systems are typically not thought of as Harvard architectures because the same machine instructions are used to access either ROM or RAM; the only difference is the range of addresses used. This is becoming rarer because NOR-flash, whose physical interface resembles that of RAM, is significantly more expensive than NAND-flash technology, whose physical interface resembles, or even emulates, that of a disk. For that reason, most of what we call flash today is NAND-flash. However, given how it's used, I'm assuming what the ATmega chips call flash is NOR-flash.)

This physical separation of program and data memory makes for a slightly different style of programming than you may be accustomed to. Fortunately, many of the details of dealing with this are handled for us by the tool chain and its library, which in our case is the GNU C and C++ compiler, the GNU assembler, the GNU linker, several GNU utilities, and the non-GNU and decidedly non-POSIX compatible AVR-specific C library.
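As a small illustration of that different style, here is a sketch of the common idiom for keeping a constant table in program memory and reading it back. `PROGMEM` and `pgm_read_byte()` come from the AVR C library's `<avr/pgmspace.h>`; the host fallback macros, the table `kSquares`, and the helper `squareOf()` are my own assumptions, added so the same source also compiles on a desktop with a single address space.

```cpp
#include <stddef.h>
#include <stdint.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>   // PROGMEM and pgm_read_byte() from the AVR C library
#else
// Host fallback (an assumption for illustration): with one address space,
// PROGMEM means nothing and a program memory read is an ordinary read.
#define PROGMEM
#define pgm_read_byte(address) (*reinterpret_cast<const uint8_t *>(address))
#endif

// A constant table that, on the AVR, is placed in flash rather than SRAM.
static const uint8_t kSquares[] PROGMEM = { 0, 1, 4, 9, 16, 25 };

// Hypothetical helper: fetch one entry. On the AVR the table cannot simply be
// dereferenced like data memory; the read goes through pgm_read_byte().
uint8_t squareOf(size_t index) {
    return pgm_read_byte(&kSquares[index]);
}
```

The C headers `<stdint.h>` and `<stddef.h>` are used here rather than their C++ equivalents because the AVR tool chain is commonly installed without the C++ standard library.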

Just to make things even harder, while data memory is byte addressed, program memory is word addressed, where a word is two bytes. There is no way for a machine instruction to address an individual byte in program memory, because that lowest order bit of an address that might indicate an odd or even byte within a word isn't even encoded in the machine instruction or in a pointer into program memory. All program memory fetches are done by the word, and every machine instruction fetched is one or two words. In this article I'll use byte addresses and lengths to refer to program memory just to be consistent in nomenclature when I talk about data memory. If you are familiar with the AVR architecture, this might make some of the lengths or addresses for program memory look wrong. (It is also possible I will have botched something.)
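Converting between the two conventions is just a factor of two, sketched below. The function names are hypothetical; the 256 KB size and the 0x3e000 byte address are figures used in this article, and the word values follow from dividing them by two.

```cpp
#include <cstdint>

constexpr std::uint32_t kBytesPerWord = 2;

// Word address (as the hardware and the Atmel manuals count) to byte address
// (as this article counts).
constexpr std::uint32_t wordToByteAddress(std::uint32_t wordAddress) {
    return wordAddress * kBytesPerWord;
}

// Byte address to word address. The low-order bit is discarded, mirroring the
// fact that program memory cannot address the odd or even byte of a word.
constexpr std::uint32_t byteToWordAddress(std::uint32_t byteAddress) {
    return byteAddress / kBytesPerWord;
}

// 256 KB of program memory is 128 K words.
static_assert(byteToWordAddress(256ul * 1024ul) == 0x20000, "128 K words");

// The byte address 0x3e000 quoted in this article is word address 0x1f000.
static_assert(byteToWordAddress(0x3E000) == 0x1F000, "byte to word");
static_assert(wordToByteAddress(0x1F000) == 0x3E000, "word to byte");
```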

In addition to flash (program) and SRAM (data) memory, both the ATmega2560 used in the Mega, the Mega ADK, and the EtherMega, and the ATmega328P used in the Uno, have a third memory, EEPROM. EEPROM can be used to persistently store application-specific read-only parameters. Unlike program memory, instructions cannot be executed from EEPROM nor can data be read directly from it; it has to be accessed more like an I/O device.

Finally, all of the AVR microcontrollers have a few additional bytes of persistent read-only memory: several bytes of signature that identifies the specific model of chip, lock bits that control who can write where and when into EEPROM and flash, and several bytes of option bits that control how the chip itself operates, including what it does when power is first applied, bytes that Atmel refers to as fuses. The ATmega2560 has three fuse bytes: an extended fuse byte, a high fuse byte, and a low fuse byte.

Like the flash memory and the EEPROM, these fuses and lock bits (and, I'm assuming, even the signature) can be modified using the right tool. But fortunately for us they are not easily changed by software. As we shall soon see, this is how the microcontroller defends itself from what my Bell Labs colleagues referred to as death by download: the reduction of a device into little more than a space heater by loading software into it in which a bug breaks not just the device's function but the software download mechanism itself, a catastrophic event that is known as bricking the device.

The internet is full of tales of people bricking their smartphone while trying to jailbreak it: loading software on it unencumbered by vendor-imposed restrictions. Once bricked, a device frequently requires a specialized tool, like an in-system programmer (ISP) in order to recover it. The ISP is typically more expensive (sometimes much more so) than the bricked device.

It is for this reason that embedded developers worry a lot about death by download. It's bad enough when the device rendered useless is a smartphone. It can be much worse when it is a core component in a multi-million dollar telecommunications system. Or a mission critical public utility, medical, or defense system. Or a space craft. My friend and occasional colleague Steve (a.k.a. The Space Cowboy) worked on the NASA Deep Impact comet probe and tells me stories of unmanned space probes launched with just the flight control avionics and a downloader. The remaining mission software was written back on Earth while the probe was en route. Talk about a hard deadline. You can be sure that that was the most thoroughly reviewed and tested downloader ever written.

The fuse bits in the ATmega2560 that are relevant to our interests are BOOTRST, BOOTSZ1, and BOOTSZ0 in the high fuse byte, and BLB01, BLB02, BLB11, and BLB12 that are lock bits. (All of this stuff is defined in detail in the Atmel reference manuals for each microcontroller model.)

BOOTRST determines at what address the microcontroller initially starts executing code when power is first applied. The unprogrammed value of this bit is the value one (1). Unprogrammed means right off the assembly line, because when they are erased or before they have been programmed for the very first time, flash and EEPROM memory devices revert to a state of all one bits; this isn't really a design choice, it is a consequence of how they work at the silicon level.

A BOOTRST value of one causes the microcontroller to start executing code at address zero (0x0) in the program memory. Location zero is known as the reset vector: typically what you program at this address is a jump machine instruction to the start of your program elsewhere in program memory. It's called a vector because it points somewhere else. It's done this way because the reset vector is just the first of many successive words that are vectors for other special conditions detected by the hardware, like interrupts or requests for service from hardware peripherals or even from devices outside of the microcontroller but connected to it via an external interrupt signal.

The BOOTRST value on the EtherMega is set to zero (0). This causes the microcontroller to instead start executing code at a location specified by the BOOTSZ1 and BOOTSZ0 bits. Two bits give you four different choices. On the EtherMega, BOOTSZ1 and BOOTSZ0 are both set to zero, which logically partitions the 256 KB of program memory into two pieces: 248 KB for the application, and eight KB for a boot loader starting at location 0x3e000. (This partition is smaller on the ATmega328P because of its smaller program memory and is hence at a different location.)
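The decoding described above can be sketched in a few lines. The bit positions and the four boot section sizes are my reading of the Atmel reference manual for the ATmega2560 and should be checked against it; the function names are hypothetical. The value 0xD8 is used only as an example of a high fuse byte whose low three bits are all zero, matching the settings described above, and 0xFF stands for a fully unprogrammed byte.

```cpp
#include <cstdint>

// Bit positions within the ATmega2560 high fuse byte (assumed from the
// Atmel reference manual).
constexpr std::uint8_t kBootRst = 1u << 0;
constexpr std::uint8_t kBootSz0 = 1u << 1;
constexpr std::uint8_t kBootSz1 = 1u << 2;

constexpr std::uint32_t kFlashBytes = 256ul * 1024ul;

// A fuse bit counts as "programmed" when it reads as zero; erased is one.
constexpr bool isProgrammed(std::uint8_t fuseByte, std::uint8_t mask) {
    return (fuseByte & mask) == 0;
}

// Boot section size in bytes selected by BOOTSZ1:BOOTSZ0:
// 11 -> 1 KB, 10 -> 2 KB, 01 -> 4 KB, 00 -> 8 KB.
constexpr std::uint32_t bootSectionBytes(std::uint8_t highFuse) {
    return 1024ul << (3u - ((highFuse >> 1) & 0x3u));
}

// The boot section sits at the very top of program memory (byte address).
constexpr std::uint32_t bootStartByteAddress(std::uint8_t highFuse) {
    return kFlashBytes - bootSectionBytes(highFuse);
}

// Byte address at which execution begins after reset.
constexpr std::uint32_t resetByteAddress(std::uint8_t highFuse) {
    return isProgrammed(highFuse, kBootRst) ? bootStartByteAddress(highFuse) : 0;
}

static_assert(isProgrammed(0xD8, kBootSz0) && isProgrammed(0xD8, kBootSz1),
              "both BOOTSZ bits are zero in the example value");
static_assert(bootSectionBytes(0xD8) == 8ul * 1024ul, "eight KB boot section");
static_assert(resetByteAddress(0xD8) == 0x3E000, "starts in the boot loader");
static_assert(resetByteAddress(0xFF) == 0x0, "unprogrammed: reset vector");
```

The result agrees with the partition described above: 256 KB less eight KB leaves 248 KB for the application, and the boot loader begins at byte address 0x3e000.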

The Boot Loader

It is in this special partition that the Arduino boot loader software resides. The boot loaders for the Uno and the Mega boards are different, by the way, and implement a different protocol: the Mega boards use stk500v2 (emulating Atmel's boot loader for their STK500 development board) and the Uno board uses the smaller optiboot. But they both operate the same way. When you power up your Arduino, the microcontroller starts executing not at the reset vector, but instead at the location of the boot loader. The boot loader listens for a while to the serial port to see if a software download is in progress. The timeout differs depending on the boot loader and the microcontroller for which it was compiled, but on the ATmega2560 it's about a second. (The source code as well as the Intel hex binary images for both boot loaders are part of the Arduino distribution. Check it out.)

If it sees a boot load protocol message it recognizes on the serial port, the boot loader uses the flash memory hardware that is part of the microcontroller to write (also known as burn or flash) the binary application image coming down the pipe into the application partition of program memory. If the timeout expires, suggesting there is no download in progress, or once the download completes, the boot loader just jumps to the reset vector, into which it assumes has been stored a jump instruction to the beginning of the application. (I haven't checked, but I'm guessing maybe that in a brand-new Arduino right off the shelf the jump instruction at the reset vector just leads the microcontroller back to the boot loader.)
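The control flow just described can be modeled on a desktop as below. This is a sketch of the decision logic only, not the stk500v2 or optiboot source; the names `runBootLoader`, `messageSeenAt`, and `burnApplication` are hypothetical, and time is simulated as a simple millisecond counter.

```cpp
#include <cstdint>
#include <functional>

// What the boot loader did before jumping to the reset vector.
enum class BootOutcome { DownloadedThenJumped, TimedOutThenJumped };

// Hypothetical model of the boot loader's decision.
//   messageSeenAt(ms): true if a recognized boot load protocol message has
//                      arrived on the serial port by simulated time ms.
//   timeoutMs:         how long to listen (about 1000 on the ATmega2560,
//                      according to the text above).
//   burnApplication(): stands in for writing the image into the application
//                      partition of program memory.
// In both outcomes the real boot loader then jumps to the reset vector.
BootOutcome runBootLoader(const std::function<bool(std::uint32_t)> & messageSeenAt,
                          std::uint32_t timeoutMs,
                          const std::function<void()> & burnApplication) {
    for (std::uint32_t now = 0; now < timeoutMs; ++now) {
        if (messageSeenAt(now)) {
            burnApplication();
            return BootOutcome::DownloadedThenJumped;
        }
    }
    return BootOutcome::TimedOutThenJumped;
}
```

Calling it with a `messageSeenAt` that always returns false models an ordinary power up: the loop runs out the timeout and the application starts without anything being burned.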

This is where the boot lock bits come in. To avoid death by download, the BLB01, BLB02, BLB11, and BLB12 bits in the EEPROM are coded to prevent the boot loader from writing into the boot loader partition of program memory. The boot loader is electrically prevented from writing over itself. In order to overwrite the bootloader, you need a special tool like the in-system programmer, a device that can write to program memory or even the EEPROM inside the microcontroller without running software on the microcontroller itself.
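The effect of that protection can be pictured with a deliberately simplified model. The real lock bits encode several modes that are defined in the Atmel reference manual; the single boolean, the struct `FlashModel`, and its method below are hypothetical stand-ins that capture only the idea that software running on the chip cannot write into the boot loader partition. The external in-system programmer is a separate path and is not modeled.

```cpp
#include <cstdint>

// Simplified, hypothetical model of self-programming protection.
struct FlashModel {
    std::uint32_t bootStartByteAddress;  // first byte of the boot loader partition
    bool bootSectionLocked;              // true when the lock bits forbid self-writes there

    // Would a write issued by software running on the microcontroller itself
    // (for example, by the boot loader) be permitted at this byte address?
    bool selfWriteAllowed(std::uint32_t byteAddress) const {
        return !(bootSectionLocked && byteAddress >= bootStartByteAddress);
    }
};
```

With the boot loader partition starting at byte address 0x3e000 and the lock engaged, a write anywhere in the application partition is allowed while a write at 0x3e000 or above is refused, so a buggy download cannot overwrite the downloader.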

The Atmel AVRISP mkII is one such device, although there are others capable of doing the same functions, and sometimes other useful things like debugging. Here is a photograph of an AVRISP connected to the EtherMega board.

Freetronics EtherMega2560 and Atmel AVRISP mkII

The ribbon cable from the AVRISP connects to a six-pin header on the EtherMega. (And on the Mega and Uno boards too. There is even a second six-pin header for programming the ATmega8u2 microcontroller; don't mix them up.) In the photograph below you can see the six-pin header on the EtherMega just below and right of the microSD card slot. If you squint you can barely see a tiny yellow "1" printed on the circuit board at the upper left of the header; this tells you how to orient the ribbon cable, which has a (barely visible) matching triangular arrow on its plastic connector near where the ribbon cable attaches to it.

Freetronics EtherMega2560 ATmega2560 ISP Connector

If you decide that you want your Arduino board to always execute your application and you don't intend to ever load other software into it (or if you do so, you'll need to use an ISP instead of the bootloader software), you can actually use an ISP to change the fuse and lock bytes, and overwrite the program memory that would otherwise be occupied by the bootloader. Most of us will never have a reason to do this, but if you are using an Arduino in a fixed production application, this is a quite reasonable thing to do. Every time power is applied, the microcontroller will jump through the reset vector into your application.

The AVRDUDE Utility

I mentioned a protocol that the boot loader software running on the microcontroller understands. What's using that protocol to send the binary application image to the microcontroller? For most of us, that would be the open source AVR Downloader UploaDEr utility, or avrdude, although other similar proprietary Atmel tools serve the same function. avrdude runs on your desktop and speaks to the Arduino boot loader. It's what the Arduino integrated development environment (IDE) uses to download your compiled sketch onto your board.

It turns out that avrdude can also speak to any number of ISP devices, including the AVRISP mkII. And there is some reason for you to want it to do so. It turns out that part of the boot load protocol involves commands for reading, and even modifying, the signature, lock, and fuse bytes in the EEPROM. Unfortunately, the Arduino boot loaders don't implement the complete boot load command set that avrdude can generate. Specifically, when you try to read EEPROM bytes using avrdude, the Arduino boot loaders lie to you, returning a hard-coded signature no matter what the actual underlying device is, and returning zeros for all the fuse bytes. This confused an old man for some time and made me even more cranky. Hence the AVRISP mkII in the photographs.

Below are two screen shots of me using avrdude to query both the EtherMega and an Uno for their fuse bytes using the ISP. (As usual you can click on these images to eventually see a larger size.)

Screen Shot:  avrdude, AVRISP mkII, ATmega2560

Screen Shot:  avrdude, AVRISP mkII, ATmega328P

The Reset Operation

Pressing the tiny reset button on the EtherMega (visible in the photograph above) or Uno boards is equivalent to a power cycle: the hardware is reset and the entire process begins again with the microcontroller beginning execution at the boot loader. Remarkably, the same thing happens when you connect to your Arduino board via its serial port using avrdude, the Arduino IDE's Serial Monitor, or even a tool like PuTTY on Windows or the screen utility on the Mac; that's because the ATmega8u2 microcontroller is programmed and wired up so that when a serial connection presents itself over the USB connection, it toggles the reset pin on the main microcontroller. (There are ways to defeat this if this isn't the behavior you want.)

Jumping to the start of the boot loader code in program memory, or even to the reset vector for that matter, is not the same thing as resetting the system, because it doesn't reset the hardware back to its initial state. That's why Atmel recommends that if you need to reset the system from inside your code, you use the watchdog feature on the microcontroller itself. Watchdogs are hardware functions built into many processors that reset the hardware in the event the software goes insane. Insanity is usually detected by the software failing to periodically perform some action on the watchdog hardware, like setting a bit in a special register. Your software can deliberately stimulate the watchdog to reset both the hardware and the software in your system.
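To make the watchdog idea concrete, here is a minimal host-side sketch that models the behavior in ordinary C++ so you can play with it on your desktop. It is not AVR register code, and the class WatchdogModel and its methods are names I made up for illustration. On a real AVR the usual approach is to enable the watchdog with a short timeout using wdt_enable() from avr/wdt.h and then spin in a loop until it fires.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of a watchdog timer; hypothetical, not AVR register code.
class WatchdogModel {
public:
    explicit WatchdogModel(std::uint32_t timeoutMs)
        : timeoutMs_(timeoutMs), elapsedMs_(0), resets_(0) {
        assert(timeoutMs > 0);
    }

    // The software calls this periodically to prove it is still sane.
    void kick() { elapsedMs_ = 0; }

    // Advance simulated time; returns true if the watchdog fired.
    bool tick(std::uint32_t ms) {
        elapsedMs_ += ms;
        if (elapsedMs_ < timeoutMs_) { return false; }
        elapsedMs_ = 0;   // the system restarts from its initial state
        ++resets_;
        return true;
    }

    // Deliberate reset: stop kicking and wait out the timeout.
    void forceReset() { while (!tick(1)) { } }

    std::uint32_t resets() const { return resets_; }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t elapsedMs_;
    std::uint32_t resets_;
};
```

As long as kick() is called more often than the timeout, tick() never reports a reset. Stop kicking, whether by accident or on purpose via forceReset(), and the model resets.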

(Update 2012-05-08: Important Safety Tip: after playing with the hardware watchdog timer (WDT) built into the ATmega2560 microcontroller, I don't think there's a safe way to make use of it when you are using the standard Arduino boot loader. I'd be happy to be proven wrong. When the microcontroller takes a watchdog reset (WDR), the WDT remains enabled as the microcontroller emerges from reset. No matter what timeout you may have set the WDT to originally -- the maximum is eight seconds -- it emerges from WDR with the shortest possible timeout value, which is about sixteen milliseconds. The boot loader has a timeout of about a second, depending on the Arduino model, before it gives up talking to avrdude and transfers control to your application. This means your application never gets a chance to disable the WDT before another WDR occurs. This results in rolling reboots. I wasn't able to get out of this mode even by power cycling the EtherMega board. I had to use the AVRISP mkII to reflash the microcontroller from scratch. If I hadn't had the in-system programmer handy, I would have bricked my EtherMega board. Not good. I found some discussions about this issue on the web. The fix is to have the Arduino boot loader disable the WDT in its own initialization.)

The Tool Chain and Library

So now we have software running on our microcontroller. Ah, not so fast; what code does the reset vector actually jump to? main()? Fat chance.

If you are developing in a higher level language like C or C++, there is a lot of stuff that has to be done under the hood before your main program ever begins. Stuff that maybe you've been taking for granted. But nothing happens by magic. Somewhere there have to be machine instructions that do it. These instructions are either generated by the compiler and reside inside the object files that result when you compile your source file, or they are already compiled and reside inside AVR-specific C or C++ library routines that are part of the run-time system and which are incorporated into your application when you link all of your object files into a single binary image to be downloaded to your Arduino. (A similar process occurs whether you're developing an application in C or C++ for an ARM processor in your smartphone, an Intel processor on your desktop, etc.)

For C this includes setting the values of variables that are not automatic (that is, global or static variables, not temporarily allocated on the stack while executing inside a function) and for which you coded an initializer. Here's an example from Arduino's EthernetClient.cpp.

uint16_t EthernetClient::_srcport = 1024;

For C++ this also includes running constructor methods (and when your main program exits, destructor methods) for objects that are neither automatic nor allocated on the heap (that is, via the C++ new operator, which, by the way, the GNU AVR C++ library doesn't implement). Here's an example from Arduino's HardwareSerial.cpp.

HardwareSerial Serial(&rx_buffer, &tx_buffer, &UBRRH, &UBRRL, &UCSRA, &UCSRB, &UDR, RXEN, TXEN, RXCIE, UDRIE, U2X);
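You can watch both kinds of initialization happen on your desktop. The sketch below is a minimal illustration in plain C++; sourcePort, packetCount, Port, console, and checkStartUp() are all hypothetical names of mine, not anything from the Arduino libraries. Only the pattern mirrors the two examples above.

```cpp
#include <cassert>
#include <cstdint>

// All names below are hypothetical; only the pattern mirrors the examples.
static std::uint16_t sourcePort = 1024;   // has an initializer
static std::uint16_t packetCount;         // no initializer, so it starts at zero

class Port {
public:
    explicit Port(std::uint16_t number) : number_(number) { ++constructed; }
    std::uint16_t number() const { return number_; }
    static int constructed;               // how many constructors have run
private:
    std::uint16_t number_;
};

int Port::constructed = 0;

// Neither automatic nor on the heap, so its constructor runs during start-up.
Port console(sourcePort);

// Call this first thing in main(): everything is already in place.
void checkStartUp() {
    assert(sourcePort == 1024);
    assert(packetCount == 0);
    assert(Port::constructed == 1);
    assert(console.number() == 1024);
}
```

If checkStartUp() is the very first statement in main() and none of the assertions fire, then something ran before main() to set those values and to call that constructor.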

To understand how all this initialization gets done (and how the corresponding finalization gets done at program termination) we have to first look at what the GNU linker actually does.

Going back into the dark ages well before Arduino, the segments or collections of bytes inside an object file generated when the compiler compiles your source code can be broadly classified into three different categories: text, which are executable machine instructions, data, which are non-automatic variables to which you have given an initial (typically non-zero) value, and bss, which are non-automatic variables which do not have initial values. (The term bss stands for Block Started by Symbol and is an ancient term going all the way back to the 1960s.)

You can see why it would be especially useful to do this on a Harvard architecture: bytes in the text segments must be loaded into program memory; bytes in the bss segments must be reserved in data memory and zeroed out during initialization; and bytes in the data segments must be stored in program memory and then copied into bytes reserved in data memory during initialization. But the process is similar even on Von Neumann architectures since the text segments are typically placed in memory pages in RAM that are marked as read-only using a memory controller.
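Here is a minimal host-side sketch of what that initialization amounts to, assuming a toy Harvard machine modeled as two separate byte arrays. MemoryModel, copyData(), and clearBss() are names I invented for illustration; the real AVR start-up code is written in assembler and is supplied by the tool chain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side model of a Harvard machine: two separate address spaces.
struct MemoryModel {
    std::uint8_t program[64];   // stands in for flash (program memory)
    std::uint8_t ram[32];       // stands in for SRAM (data memory)
};

// Copy the initial values of the data segment from program memory into RAM.
void copyData(MemoryModel& m, std::size_t loadAddr, std::size_t dataStart,
              std::size_t size) {
    assert(loadAddr + size <= sizeof(m.program));
    assert(dataStart + size <= sizeof(m.ram));
    std::memcpy(&m.ram[dataStart], &m.program[loadAddr], size);
}

// Zero the bss segment, which has no image in program memory at all.
void clearBss(MemoryModel& m, std::size_t bssStart, std::size_t size) {
    assert(bssStart + size <= sizeof(m.ram));
    std::memset(&m.ram[bssStart], 0, size);
}
```

The data segment costs you twice, once in program memory for the initial values and once in data memory for the live variables, while the bss segment costs only data memory.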

Conceptually, the linker concatenates all of the text segments, all of the data segments, and all of the bss segments, from all the object files into one big binary image. It then looks through the text and data segments in that image for any unresolved external references, that is, pointers to external variables or calls to external functions. It tries to find those variables and functions inside your image, and if it finds them, it fixes up the addresses to point to the right places. If it doesn't find them, it starts looking through all of the libraries you've told it about. If it finds them there, it appends the appropriate library routines to your binary image, and again fixes up the addresses to point to the right places. Because the library routines may themselves have unresolved references, it may have to do this several times. If it successfully resolves all of the external references by iteratively repeating this process, then it's done. If it doesn't, you get a link error.
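The iterative part of that process is easy to sketch. What follows is a toy model in plain C++, not how the GNU linker is actually implemented: ObjectFile and linkImage() are hypothetical names, symbols are just strings, and there is no address fix-up at all. It only shows the loop that keeps pulling in library members until every reference is resolved or nothing more can be found.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an object file: the symbols it defines and needs.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> needs;
};

// Returns true if every reference resolves; 'image' lists what was linked.
bool linkImage(const std::vector<ObjectFile>& objects,
               const std::vector<ObjectFile>& library,
               std::vector<std::string>& image) {
    std::set<std::string> defined, needed, pulled;
    auto add = [&](const ObjectFile& o) {
        image.push_back(o.name);
        defined.insert(o.defines.begin(), o.defines.end());
        needed.insert(o.needs.begin(), o.needs.end());
    };
    for (const ObjectFile& o : objects) { add(o); }

    bool progress = true;
    while (progress) {                 // library routines may need others
        progress = false;
        for (const std::string& symbol : needed) {
            if (defined.count(symbol)) { continue; }
            for (const ObjectFile& member : library) {
                if (member.defines.count(symbol) && !pulled.count(member.name)) {
                    pulled.insert(member.name);
                    add(member);
                    progress = true;
                    break;
                }
            }
            if (progress) { break; }   // 'needed' grew; scan it again
        }
    }
    for (const std::string& symbol : needed) {
        if (!defined.count(symbol)) { return false; }   // link error
    }
    return true;
}
```

Notice that library members nobody references never make it into the image, which matters a lot when you only have a few kilobytes of program memory.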

(Arduino programs are statically linked. Programs for other platforms may be dynamically linked against shared libraries or what Windows calls dynamic link libraries or DLLs. This means a library actually resides in memory and that one copy may be shared by all running programs, executing the very same copy of every library function at the very same memory location. But ultimately the process of linking is similar.)

But of course it isn't that simple. Each individual object file, whether it is one of yours or from a library, has its own requirements for initialization and finalization. All of those individual snippets of initialization code have to be run before your main program begins, and all of those individual snippets of finalization code have to be run after your main program returns. This is dealt with by associating those code snippets with named sections, which is just a way for the software developer to further classify byte snippets in the object file that is orthogonal to their segment classification.

By convention, the AVR tool chain supports sections with names .init0 through .init9, and .fini9 through .fini0. (The order there isn't accidental.) There are both assembler directives (if you are writing code in AVR assembler) and compiler directives (for C and C++) that tell the tool chain which, if any, of these sections you want a particular code snippet to be identified with. For example, the code below, from the stk500v2 boot loader used on the EtherMega board, specifies that the function __jumpMain is to be in section .init9. This is just an instruction to the linker regarding special handling for this particular function.

void __jumpMain (void) __attribute__ ((naked)) __attribute__ ((section (".init9")));

Remember way back when we talked about the reset vector at location zero in program memory? (It seems so long ago.) When you compile, link, and download your program into your Arduino, that reset vector ends up being a jump instruction to the entry point of a routine from the AVR libc.a library and which can be seen in the assembler source file gcrt1.S. The entry point name for that assembler routine is __init, and the reset vector contains nothing more than the instruction jmp __init.

This GNU C Run-Time routine #1 executes all of the initialization code in section .init0, then all the code in .init1, and so forth all the way through .init9. By convention, .init2 contains code to set up the stack; .init4 has code to initialize the data and bss segments; .init6 has calls to the C++ static and global constructors; and the code in .init9 is as simple as the instruction call main.

Similarly, when your main program returns, all the code in section .fini9 is executed, then .fini8, all the way down to .fini0. By convention, .fini9 contains the code implementing the C _exit function; .fini6 has the calls to the C++ static and global destructors; on most platforms, code in .fini0 would signal the operating system that the application is complete. Since Arduino has no operating system, .fini0 goes into an infinite loop.

So, I guess that __init routine is in .init0, right? Yep. And it just calls all that other initialization code, right? Nope. It doesn't have to. In fact, for the most part, __init doesn't even know about any of that initialization code. It's actually simpler than that.

Every time you link your application the GNU linker uses a linker script. This is a script written in the linker's own language that tells the linker how to put together an executable binary image for your particular processor. You can provide your own linker script, but if you don't, a default one specific to your processor architecture is used. These scripts live in a directory inside the GNU tool chain called something like /lib/ldscripts, depending on where your tool chain is installed.

The ATmega2560 microcontroller on the EtherMega is an AVR6 architecture, and it uses the script named avr6.x. The ATmega328P on the Uno is an AVR5 architecture, and it uses the script avr5.x. Perhaps you perceive a pattern. In the article Hidden Variables: Arduino and AVR Microcontrollers, I described how to display the implicit preprocessor symbols inserted into your source program by the GNU compiler. One of these symbols tells you what your microcontroller architecture is. You can find these scripts on your own system.
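The pattern is simple enough to write down. This is a small sketch under the assumption that the default script is always named after the architecture number, as it is for the two boards above; defaultLinkerScript() and kArch are hypothetical names of mine. The preprocessor symbol __AVR_ARCH__ is the one the GNU AVR compiler defines, and the fallback value is there only so the example also builds on a desktop.

```cpp
#include <cassert>
#include <string>

#if defined(__AVR_ARCH__)
const int kArch = __AVR_ARCH__;   // set by the GNU AVR compiler
#else
const int kArch = 6;              // host build: assume an ATmega2560 (AVR6)
#endif

// Build the default linker script name from the architecture number.
std::string defaultLinkerScript(int avrArch) {
    assert(avrArch > 0);
    return "avr" + std::to_string(avrArch) + ".x";
}
```

Calling defaultLinkerScript(kArch) gives you the file name to look for in your own tool chain's ldscripts directory.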

Here's a snippet right from the avr6.x script from the GNU tool chain on my desktop.

*(.init0) /* Start here after reset. */
KEEP (*(.init0))
KEEP (*(.init1))
*(.init2) /* Clear __zero_reg__, set up stack pointer. */
KEEP (*(.init2))
KEEP (*(.init3))
*(.init4) /* Initialize data and BSS. */
KEEP (*(.init4))
KEEP (*(.init5))
*(.init6) /* C++ constructors. */
KEEP (*(.init6))
KEEP (*(.init7))
KEEP (*(.init8))
*(.init9) /* Call main(). */
KEEP (*(.init9))
. = ALIGN(2);
. = ALIGN(2);
*(.fini9) /* _exit() starts here. */
KEEP (*(.fini9))
KEEP (*(.fini8))
KEEP (*(.fini7))
*(.fini6) /* C++ destructors. */
KEEP (*(.fini6))
KEEP (*(.fini5))
KEEP (*(.fini4))
KEEP (*(.fini3))
KEEP (*(.fini2))
KEEP (*(.fini1))
*(.fini0) /* Infinite loop after program termination. */
KEEP (*(.fini0))

This snippet instructs the linker to append all of the code in .init0 to the application image, then all the code in .init1, all the way through .init9, then all of the code in .text segments that are not otherwise classified, then all of the code in .fini9, and all the way through .fini0. It merely concatenates all of these sections in the order specified by the script. So __init doesn't have to call the code in .init1 or even know about it. It merely falls through to the code in .init1.
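To convince yourself that ordering alone is enough, here is a toy model of the layout step in plain C++. Snippet, sectionRank(), and layout() are hypothetical names, and strings stand in for machine instructions. The real work is done by the linker following its script, but the effect is the same: snippets arrive in any order from any object file and come out sorted by section.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: one snippet of code contributed by some object file.
struct Snippet {
    std::string section;   // ".init0" .. ".init9", ".text", ".fini9" .. ".fini0"
    std::string code;      // stands in for the machine instructions
};

// Position of a section in the image: .init ascending, .text, .fini descending.
int sectionRank(const std::string& section) {
    bool init = section.compare(0, 5, ".init") == 0;
    bool fini = section.compare(0, 5, ".fini") == 0;
    if (!init && !fini) { return 10; }   // .text and anything unclassified
    assert(section.size() == 6 && section[5] >= '0' && section[5] <= '9');
    int digit = section[5] - '0';
    return init ? digit : 20 + (9 - digit);
}

// Concatenate snippets in layout order, keeping input order within a section.
std::vector<std::string> layout(std::vector<Snippet> snippets) {
    std::stable_sort(snippets.begin(), snippets.end(),
                     [](const Snippet& a, const Snippet& b) {
                         return sectionRank(a.section) < sectionRank(b.section);
                     });
    std::vector<std::string> image;
    for (const Snippet& s : snippets) { image.push_back(s.code); }
    return image;
}
```

Execution that starts at the first element of the result and simply falls through visits the start-up code in the right order without anyone having to call anything.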

When .init9 does the call main the very next machine instruction, the one that will be executed when main returns, is jmp exit. This is a little assembler routine in the C library that can be found in exit.S which does nothing more than disable all interrupts and execute the instruction jmp _exit. That's an entry point to a routine that can be found in the GNU libgcc.a library and which can be seen in the assembler source file libgcc.S. That routine is in section .fini9, from whence execution falls through all the way through .fini0.

The Arduino Sketch

If you write code for Arduino then you already know that, unlike most C and C++ based development platforms, you don't actually write a main program for your sketch. Instead, you just write setup() and loop() functions. That's because main() is already written for you. It resides in the file main.cpp. Here is the source for that file from the Arduino 1.0 distribution.

#include <Arduino.h>

int main(void)
{
    init();
#if defined(USBCON)
    USB.begin();
#endif
    setup();
    for (;;) {
        loop();
        if (serialEventRun) serialEventRun();
    }
    return 0;
}

You can see that main() calls its own init() function, then your setup() function, then calls your loop() function in an endless loop while checking for data on the serial console port.
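If you want to exercise a sketch's setup() and loop() on your desktop, the same calling sequence is easy to mimic. This is a minimal host-side harness of my own devising, not part of the Arduino distribution: init(), setup(), and loop() here are stubs that just record that they were called, and runSketch() is a hypothetical stand-in for main() with a bounded loop so that it can return.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order of calls so the sequence can be inspected on a desktop.
static std::vector<std::string> calls;

// Stand-ins for the Arduino core and for a sketch; all three are stubs.
void init()  { calls.push_back("init"); }
void setup() { calls.push_back("setup"); }
void loop()  { calls.push_back("loop"); }

// Same shape as the Arduino main(), but bounded so that it can return.
int runSketch(int iterations) {
    assert(iterations >= 0);
    init();
    setup();
    for (int i = 0; i < iterations; ++i) {
        loop();
    }
    return 0;
}
```

After runSketch(3) the call log holds init, setup, and then loop three times, which is exactly the order your sketch sees on the real board.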

The End

My adventures with Arduino continue. What I really like about Arduino, the GNU tool chain, and the AVR C and GNU GCC libraries, is that you can actually figure this stuff out. It is not opaque. It's complicated, but not too complicated. It's mysterious, but only for a while. For sure it's the best platform I've seen so far on which to teach embedded and real-time development.

(Thanks to Udo Klein who reviewed the early version of this document and suggested some corrections, which I have made. You can find his original remarks in the comments.)

Monday, March 05, 2012

Sunshine On My Arduino Makes Me Happy

Denizens of the Great State of Colorado like to boast that we have 300 days of sunshine per year. That's a little misleading: the Colorado Climate Center says that although the sun does indeed come out for at least an hour 300 days of the year in Colorado, only 115 days can really be said to be those classic clear sunny days.

But do we get enough sunshine to run an Arduino Uno board with an Xbee radio? There was only one way to find out.

I connected two six-volt solar panels (Radio Shack P/N 2770052) in series to get a voltage somewhere in the range of what the Arduino Uno would find acceptable, and connected the combination to an M-type coaxial DC power plug. I wrote a little Arduino code to just print a time stamp, loaded it into an Uno powered by its USB cable, and tested it. I asked one of the cats where the sunniest spot was right then; Petronius assured me that in the breakfast area near a flower pot was the place to be.

In the photograph below you can see the solar cells in series, the Arduino Uno, and a SparkFun XBee shield with an XBee Series 1 radio on top of it.

Solar Powered Arduino and XBee Radio

I had to position the solar panels just right to get enough juice to power both the Uno and the XBee radio. But with a little fiddling, all the right LEDs came on and started blinking. I went back to my office a floor above and pointed the Arduino IDE Serial Monitor tool at the XBee that was USB-attached to my Mac Mini. I was rewarded with this output from my Arduino program.

Screen Shot: Elapsed Time

Here's the simple little program that generates and prints a time stamp with approximately millisecond resolution. The XBee shield is set up to route the serial output of the ATmega328P microcontroller over the wireless serial link between the two radios.

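// The baud rate and the ':' and '.' separators are assumptions.
void setup() {
 Serial.begin(9600);
}

void loop() {
 unsigned long milliseconds = millis();
 unsigned int hours = milliseconds / 3600000UL;
 milliseconds %= 3600000UL;
 unsigned int minutes = milliseconds / 60000UL;
 milliseconds %= 60000UL;
 unsigned int seconds = milliseconds / 1000UL;
 milliseconds %= 1000UL;
 Serial.print(hours);
 Serial.print(':');
 if (minutes < 10) { Serial.print('0'); }
 Serial.print(minutes);
 Serial.print(':');
 if (seconds < 10) { Serial.print('0'); }
 Serial.print(seconds);
 Serial.print('.');
 if (milliseconds < 100) { Serial.print('0'); if (milliseconds < 10) { Serial.print('0'); } }
 Serial.println(milliseconds);
}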
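The same arithmetic can be checked on a desktop without any hardware. Below is a minimal host-side sketch; formatElapsed() is a hypothetical helper of mine, and the H:MM:SS.mmm layout with ':' and '.' separators is an assumption about the output format rather than something taken from the screen shot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Format a millisecond count as H:MM:SS.mmm (the separators are an assumption).
std::string formatElapsed(unsigned long milliseconds) {
    unsigned long hours = milliseconds / 3600000UL;
    milliseconds %= 3600000UL;
    unsigned long minutes = milliseconds / 60000UL;
    milliseconds %= 60000UL;
    unsigned long seconds = milliseconds / 1000UL;
    milliseconds %= 1000UL;
    char buffer[32];
    int length = std::snprintf(buffer, sizeof(buffer), "%lu:%02lu:%02lu.%03lu",
                               hours, minutes, seconds, milliseconds);
    assert(length > 0 && static_cast<std::size_t>(length) < sizeof(buffer));
    return std::string(buffer);
}
```

On the Arduino an unsigned long is 32 bits wide, so the value returned by millis() wraps around after a bit more than 1193 hours, or about 49.7 days. Feeding formatElapsed() the largest 32-bit value shows you the last time stamp you would see before the wrap.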

I continue to look at low-power automation and wireless networking and alternate power sources for my Amigo project. In the long run what I would really like is a solar/LiPo battery combination. But this was a useful first step.

Update (2012-03-12)

The one pair of six-volt panels in series proved a little weak for reliable operation. So I built a second identical fixture and connected the two fixtures in parallel to get more power. Here's what it looks like now.

Four Panel Solar Array

Update (2012-05-30)

Here's the next iteration of this project. Clockwise from top left: a 7A solar charge regulator, an Arduino Uno with an XBee shield, a 12V sealed lead-acid rechargeable "gel cell" battery, and a 12V 1.5W solar battery charger.


I definitely went the cheap route on this new setup, the battery being the most expensive component. The idea is that the Arduino runs off the battery, which is charged by the solar panel. I'm experimenting now with seeing how long this can keep running, starting with a fully charged battery. The software on the Arduino wirelessly sends a heartbeat to my desktop, which logs it with a timestamp. The heartbeat message indicates how long the Arduino has been running since its last reset (which would happen if the power fails and later returns). The output is logged to a file so I can see what happens overnight. My guess is a (much) larger solar panel and maybe a better solar power controller would be necessary if this evolved beyond the experimental stage.
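The desktop side of this is mostly bookkeeping. Here is a minimal sketch of how the logged heartbeats could be analyzed, assuming each heartbeat carries the Arduino's uptime in milliseconds; countResets() and longestUptime() are hypothetical helpers, not the code I am actually running. An uptime that is lower than the one before it means the board restarted in between.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Count restarts in a log of heartbeats, each carrying the sender's uptime
// in milliseconds. An uptime lower than its predecessor implies a reset.
std::size_t countResets(const std::vector<unsigned long>& uptimes) {
    std::size_t resets = 0;
    for (std::size_t i = 1; i < uptimes.size(); ++i) {
        if (uptimes[i] < uptimes[i - 1]) { ++resets; }
    }
    return resets;
}

// Longest uptime seen in the log: how long the board ran without a reset.
unsigned long longestUptime(const std::vector<unsigned long>& uptimes) {
    assert(!uptimes.empty());
    return *std::max_element(uptimes.begin(), uptimes.end());
}
```

A log with no resets after a full night is the result I'm hoping for. The longest uptime tells me how far the battery carried the board when it didn't make it.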

Update (2012-07-19)

I went to a larger 5W solar panel, then to a much larger 15W solar panel. We'll see if that's enough to charge the 12V battery during the day and let the system run all night. I also had to go to XBee Series 2 radios with external antennas on both the Arduino Uno with the XBee shield, and on the XBee Explorer that is USB connected to my desktop, in order to get the range I needed, which was just a few tens of yards from the south-western edge of my back yard to my home office on the south side of my house.

Here's the "instrument pod" with its external antenna visible at the lower left.

Instrument Pod: Cover Off

Here is the instrument pod in my back yard connected to the largish 15W solar panel.

Instrument Pod and 15W Solar Panel

Here is the tiny Xbee Explorer, with its own external antenna, USB attached to my desktop Mac.

Xbee Coordinator on Xbee Explorer

I'll monitor this for the next few days and see what happens.