Thursday, August 28, 2008

Diminuto Right Out of the Box

Diminuto is my attempt to put together a platform on which to teach real-time software design, systems programming, embedded software development and assembly language programming, all using real commercial hardware and open source software. In Choosing Hardware for Diminuto, I described how I chose the Atmel AT91RM9200-EK evaluation kit as my first cut at a hardware platform. In this article, I'll describe the EK in detail and show you how to get it up and running in about an hour, just with the software it comes with.

(Blogger sometimes truncates the right-hand portion of the photographs. You can always click on a photograph to bring a full-size version up in your browser.)

The EK is a single board computer (SBC) that is about 6.5" x 7.5" in size. It has an AT91RM9200 microprocessor, part of the Atmel AT91 family. The AT91RM9200 is a system-on-a-chip (SoC) that integrates an ARM920T RISC processor core with several special purpose cores that implement controllers for a variety of peripheral devices. The ARM920T is part of the ARM ARM9 family. The SoC incorporates a real-time clock, a system timer, and a memory management unit (MMU). The EK has 8MB of flash, pre-programmed with the U-Boot boot loader, and 32MB of RAM. Peripheral devices include a Secure Digital (SD) card slot, two USB host ports, one USB gadget port, two DB-9 serial ports, one 10/100 Base-T Ethernet RJ-45 port, a video controller (which is the largest chip on the board) with a VGA port, a JTAG debugging connector, and some other goodies.

Here is a photograph of the EK hooked up to my development setup at the Palatial Overclock Estate (a.k.a. the Heavily-Armed Overclock Compound). You can see the beige serial console cable to the left, the blue CAT5 Ethernet cable front center, and the power supply to the right. Other accoutrements include a little fan (not really necessary, the EK runs pretty cool), a static mat and wrist strap (absolutely required), and a magnifying glass (because I'm old).

Board Setup

The EK is available retail in single quantities from any number of distributors for about $1250. (Yes, I know; I bought an absolutely immense Dell 530 quad-core server for less than this; economies of scale are at work here.) I purchased mine from Avnet. Here's what it looks like when the box arrives at your door.

Atmel AT91RM9200-EK Box

The box contains (left to right, top to bottom): a European electrical cord, a U.S. electrical cord, a 12V power supply, the EK board in a static bag, a DB9S-to-DB9S null modem serial cable, a Type A-to-Type B USB cable, three DataFlash cards with sample systems on them, a CAT5 Ethernet crossover cable, and two CD-ROMs, one with software for the EK and one with a Windows CE demo.

Atmel AT91RM9200-EK Contents

You can use the CAT5 Ethernet crossover cable to connect the EK directly to the Ethernet port on another computer. Instead, I connected the EK to a nearby Ethernet switch using a standard CAT5 Ethernet cable. I used the DB9S-to-DB9S serial cable to attach the console port on the EK to an old Windows ThinkPad laptop, and I used PuTTY to talk to the board. For sure, you will need access to the serial console, since much if not all of the work you will be doing on the EK will be in single-user mode. In a teaching laboratory with several stations, you might consider connecting all of the EKs to a reverse telnet server, which is common practice in the field for serial console ports on all kinds of devices.

The EK has a configuration strap, J15, that needs to be set to cause it to boot U-Boot from flash. This is described in the documentation that comes with the EK. (And this is why I needed the magnifying glass.) Ground yourself with a static strap, then hold the board so that the printing is right-side-up; J15 is in the bottom left quadrant of the board. It should be set to the right so that it connects the center of three pins to the right-hand pin labeled EXT. The photograph below is kind of fuzzy, but it shows you the correct orientation of the board.

Atmel AT91RM9200-EK Close Up

The software CD-ROM contains a Windows Trivial File Transfer Protocol (TFTP) server utility, U-Bootable Linux 2.4 kernel images for both the EK and the older (and more expensive) AT91RM9200-DK boards, a corresponding initrd RAM disk root file system image, and a tarball of the tool chain. The tool chain includes the C cross-compiler environment, which runs under Linux on an i386 platform and generates code for the ARM9. I used an existing Linux server (a Dell 530) as both my development platform for the tool chain and as a TFTP server.

The U-Boot boot loader is widely used in embedded products. It is preloaded into flash on the EK. It is set up to boot the EK over TFTP. All you have to do is drop the linux-ek kernel image and the initrd ramdisk root file system image from the CD-ROM in the appropriate directory on your TFTP server, administer U-Boot with the parameters for your network, and run some predefined U-Boot commands.

Here is the initial U-Boot dialog that you will see on the console when you hit the reset button on the EK. U-Boot is waiting for you to enter a command.

U-Boot Initial Dialog Upon Power-Up/Reset

You will need to use a series of U-Boot setenv commands to configure U-Boot by setting parameters in the form of keyword value pairs. Then use the U-Boot saveenv command to save your settings in non-volatile flash so that they persist across resets. Finally, you can use the U-Boot printenv command to see all of your saved settings. This is described in the Getting Started guide on the software CD-ROM. Use the printenv command first to see what parameters might already be set. (There is a typo on the first page of the guide: when setting the bootargs parameter, there is a space instead of a comma between 115200 and mem.)

You will likely want to replace the IP addresses in the guide for the parameters ipaddr, the static IP address of the EK, and serverip, the IP address of your TFTP server. Here are the commands I used.

setenv ipaddr
setenv serverip

The guide also shows you how to define long U-Boot command sequences, save them as variables, and then run them. I followed the manual and defined the commands kernel to download the kernel image, and ramdisk to download the ramdisk image. This is useful to save time in the long run, although don't hesitate to try using the raw U-Boot commands. (The guide doesn't show using single quotes to enclose strings with embedded spaces as the value portion of the U-Boot setenv command, but I've used that practice on U-Boot on other projects in the past.)

Below you can see how the U-Boot printenv command displays all the settings for my EK.

U-Boot printenv after provisioning using setenv

To boot the board, you need to tell U-Boot to use TFTP to load the kernel image into memory, then load the ramdisk image into memory, then finally to boot the board. U-Boot boots the board by literally calling the kernel image as a subroutine and passing to it the address of the ramdisk image as a parameter.

First, I download the kernel image linux-ek from the TFTP server to memory location 0x21000000 using the command kernel that I defined.

U-Boot run kernel

Next, I download the ramdisk image ramdisk from the TFTP server to memory location 0x21100000 using the command ramdisk that I defined.

U-Boot run ramdisk

Finally, I tell U-Boot to boot the kernel image at location 0x21000000 using the bootm command. You can see below that Linux 2.4 boots and I log into the system.

U-Boot bootm

Using the EK right out of the box, I was able to compile "Hello, World!" on my Linux PC using the arm-linux-gcc command in the provided tool chain, download the resulting ARM executable to the board using TFTP from the shell prompt on the EK, and run it on the EK. There is a lot of learning experience in just getting this far.

Of course, I wasn't satisfied. This was the older Linux 2.4. I wanted to run the current Linux 2.6 kernel. I wanted to use Windows and Linux file systems on SD cards and USB drives. I wanted to experiment with tools to reduce the memory footprint of the system. I wanted a richer set of commands on the EK than what was provided out of the box. I wanted to port my Desperado embedded C++ library and run the unit tests. I wanted Linux to configure the Ethernet port automatically during boot.

In upcoming articles, I'll describe how to get Linux 2.6 running on the EK to do all that other fun stuff.

Wednesday, August 27, 2008

Choosing Hardware for Diminuto

Diminuto is my attempt to put together a platform on which to teach real-time software design, systems programming, embedded software development and assembly language programming, all using real commercial hardware and open source software. Sure, it may be a stupid idea. But damn it, it's my stupid idea.

When I first started thinking about this, I scribbled down some basic goals.

The cost of entry has to be reasonable.

A university had to be able to put together a teaching laboratory with several stations at a reasonable cost. And although a university CS, CEG or EE department was the customer I had in mind, the cost shouldn't rule out the occasional individual from trying their hand at learning this stuff. This ruled out a lot of otherwise interesting evaluation boards and development kits that cost thousands of dollars or required specialized equipment to program. It also required the use of open source tools to keep the cost of the tool chain down. Using open source also admitted the possibility of modifying existing code, like the kernel or device drivers, as part of a class assignment.

(Disclaimer: if you are a professional embedded developer, or if you manage such, I would be remiss if I didn't tell you that paying for commercial tools for embedded development is, IMHO, well worth it. I've used tools like IAR Embedded Workbench and MontaVista DevRocket to extremely good effect, and would not hesitate to use them again. While embedded development is completely doable with open source tools, you will be trading time for money, and an essential element of happiness is knowing what your time is worth. If you are a university with a big endowment, you are certainly encouraged to use these tools also. Your students will no doubt encounter them in the real world of embedded product development.)

The hardware had to be commercially available.

No special developments or custom boards, nor processors that were far off the mainstream, nor stuff that was hard to find. Partly this was a cost issue, but also I wanted it to be likely that students using this platform would encounter something very similar in the work place.

The instruction set of the processor had to be amenable to learning assembly language programming.

The advent of reduced instruction set computer (RISC) architectures made this challenging. The PDP-11 had the simplest instruction set of any processor I've ever examined, but many current RISC machines make even the old IBM 370 instruction set seem simple in comparison. The platform had to have an instruction set that was rich enough to not be too painful to use and to illustrate all the necessary concepts, yet simple enough to be taught in sufficient depth in perhaps a ten week college course.

The processor had to be widely used in actual devices, especially consumer hand-held devices.

I wanted students to use processor architectures that they would be likely to encounter in the real world. I wanted them to look at their PDA, cell phone, MP3 player, or other hand-held, and be able to say to themselves "I'm writing code that could run on this device".

The platform had to be sufficient to run Linux, but had to be resource constrained enough that developing for it wasn't like developing for a Pentium-based Linux desktop.

I've used a lot of real-time operating systems (RTOS) during my years in embedded development: roll-your-own, pSOS, VxWorks, PSX/C-Executive, and RTX are the ones that come immediately to mind. But Linux is becoming more and more common in embedded devices, as the devices become more powerful, and tools become available that allow Linux to run in a smaller memory footprint. I had already gotten Linux running on several embedded platforms, ranging from one that booted from ROM and ran only single-user with a RAM disk, to one that booted and ran multi-user from a persistent disk-like file system.

Still, embedded systems have resource constraints that make developing for them unlike developing for your typical white box PC. In fact, the resource constraints often get tighter and tighter, because companies continue to shoehorn microcontrollers into smaller and cheaper devices. The ability to run Linux on a resource constrained platform was the best of both worlds: familiar, relatively inexpensive, and challenging.

Realistically, this meant a 32-bit processor with a memory management unit (MMU).

The platform had to be capable of running a lighter weight real-time operating system without undue pain.

It couldn't just run Linux, because Linux isn't always the answer, particularly for hard-real-time or highly memory constrained applications. There had to be at least the possibility of running some other open source or at least inexpensive RTOS without a huge amount of pain writing a board support package to initialize the processor, the MMU, etc. The fact that the platform could run Linux meant that much of this code could be cribbed from the Linux boot code for the platform and used in other, perhaps standalone, applications.

The platform had to have a rich set of contemporary, useful devices, including Ethernet and removable storage media.

Students had to have access to devices both for accessing and running the system and for programming as part of their class assignments. Removable media would allow them to take their projects with them when they left their station in the laboratory.

For cost reasons, this almost certainly required a system-on-a-chip (SoC), in which I/O controllers are embedded in the processor chip itself. This has become common in real-life embedded development, with ARM-based and PowerPC-based processors having integrated controllers for technologies like serial, Ethernet, and even USB. It is also a good example of specialized multi-core architecture.

The platform must include a boot-loader in ROM.

I didn't care so much what the boot-loader was, although something mainstream like U-Boot, widely used in the embedded community, was very desirable. The boot-loader was necessary so that each station didn't require an exotic (read: expensive) tool, like a JTAG hardware debugger, to load software. The ability to boot from removable media or the network was very highly desirable. Having a boot-loader whose source code was open source also meant that code to initialize the processor could be cribbed from this and used elsewhere.

The tool chain must provide C and C++ compilers and an assembler, and be expandable to include other languages.

C is the lingua franca of embedded development, and is an absolute requirement if you are going to go about hacking the Linux kernel. C++ is, in my personal experience, a lot more common in embedded application development than embedded development pundits would have you believe. If you have C and C++, chances are you are getting an assembler as part of the deal too, but in any case it's required if you are going to learn even a little bit about assembly language.

But I wanted students to be able to explore other options as well, like Java, or Digital Mars' D, or even scripting languages. A tool chain based on the GNU Compiler Collection (GCC) seemed like a pretty open-ended choice.

The platform must expose its devices physically and programmatically.

By this I mean I wanted the chips physically accessible, and didn't want the lower level device details hidden by a BIOS in ROM. I wanted the possibility of students getting experience using logic analyzers, oscilloscopes, or other diagnostic equipment, because that's part of the real-life embedded experience. Not having a BIOS in ROM meant that existing Linux device drivers for the platform could be studied in all their glorious awful detail regarding the hardware interface and perhaps even modified as part of a class assignment.

The platform should be an unlikely theft target.

When I started contemplating this project, my first step was to Google around to see what others had done before me. (I also interviewed some of my embedded mentors from Bell Labs, Tam, Doug and Randy, quizzed my former colleagues from graduate school who were still in academia, Dale and David, and even pinged my former professors, most of whom had long since retired.)

I was particularly impressed with the work of Arnold Berger (University of Washington) and Dennis Brylow (Marquette University), both of whom tackled the problem of teaching embedded software development. (In fact, I will excuse you if you simply ignore everything else I have to say on this topic and go off and base a course around their work, so delighted I will be that you are doing anything at all.) Berger took an approach very similar to mine. Brylow did what I thought was pretty damn clever and used existing consumer devices like LinkSys routers that could be pried open and programmed and modified. Before embarking on this project, I also experimented with my own Nokia N800 hand-held to see if it could be used as a platform.

But Berger did his work in 2000, so his choices for hardware were different from mine. Things change a lot in just a few years. And while consumer devices like the LinkSys and N800 were more cost effective, I didn't think they exposed the hardware sufficiently, and more importantly, I thought they would be attractive theft targets, having had some experience with that during my time in academia.

The platform should have a form factor that is typical of prototypes in the real world of embedded software development.

I just liked the idea of students laboring over a circuit board. It was nostalgic, reminiscent in particular of the prototype phase of a project, before the physical cabinet or chassis has been completed.

My next step was to choose some candidates for processors. I admit I had my own prejudices.

The Motorola (now Freescale Semiconductor) PowerPC was the original processor architecture of the IBM RS/6000, with which I've also had much experience during the supercomputing phase of my career. It was also the architecture of choice for almost every embedded telecommunications product I've ever worked on. I wasn't that crazy about its instruction set from a teaching point of view, having had to learn enough of it to debug the occasional ugly problem. Development kits were easy to come by, but a little more expensive than I was happy with. SoC implementations were common. I was very familiar with running Linux and other RTOSes on such processors, and was confident that the project was completely doable on the PowerPC.

The ARM series of processor architectures can be found in, I dunno, thousands of consumer devices. ARM doesn't actually produce hardware, they produce IP (intellectual property) processor cores that can be dropped into various logic devices. Which means everybody and his brother (247 licensees to be exact, including Freescale Semiconductor) makes processor chips that run the ARM instruction set. I've worked on a couple of projects that used ARM cores, one of which ran Linux. I probably own a dozen devices that contain ARM-based processors, several within arm's reach as I write this. The ARM instruction set is pretty nice, a little reminiscent of the PDP-11, particularly in its 16-bit Thumb mode. ARM-based development kits are widely available and reasonably priced, and SoC implementations are easy to come by.

The next step was crucial to making life easier for all of us: I downloaded the latest Linux 2.6 kernel, ran make menuconfig for ARCH=arm and ARCH=powerpc, and explored the kernel configuration menu for what development kits were already supported with board support packages by the kernel and its build environment. I limited my Google search for development kits to these choices.

No use in making life harder than it needs to be.

Eventually I settled on the Atmel AT91RM9200-EK evaluation kit. This board has an AT91RM9200 SoC based on the ARM9 core, has the U-Boot bootloader in flash, an MMU, 32 megabytes of RAM, 10/100 Base-T Ethernet, several serial ports, supports removable media in the form of both SD cards and USB drives, and has a JTAG debugging port.

In upcoming articles I'll describe the AT91RM9200-EK in detail, and discuss how to get Linux running on it.

Tuesday, August 26, 2008

Diminuto and the War Against Abstraction

I don't want to know how it works under the hood.
-- Former coworker and Java developer

Decades ago, back when software was programmed instead of developed, practitioners of the art were classified into two broad categories: application programmers and systems programmers. The former wrote programs that used databases and produced reports. The latter understood how everything actually worked under the hood and was able to fix it when it broke.

Today we just call all of them software developers. Most of them develop software that uses databases and produces reports (typically in HTML). And systems programmers, in the classical sense, appear to me to be few and far between.

Several years ago, in his blog Joel On Software, Joel Spolsky wrote about The Perils of JavaSchools (which I just re-read in his latest book More Joel on Software). Joel bemoaned the transition of many computer science curricula to exclusively teaching Java. CS programs want a simple programming language that allows them to teach important concepts without unnecessary pain. Joel's argument was that a little pain is a good thing: until you have to debug something where the only diagnostic you get is "Segmentation Fault", you aren't really going to understand what's going on under the hood.

Now don't get me wrong: I've made good money developing in Java, enjoyed it, and was glad to get the work. I helped develop a successful commercial telecommunications product whose firmware I wrote partly in Java. I paid my dues in the Enterprise Java wars by working on a team that developed a business process automation product based on the Java Business Integration specification. I've been known to sling some open source Java stuff out there too.

But even though I wrote a good portion of that firmware in Java, I wrote the underlying device drivers in C. And when that enterprise product got sideways, I had to dive in and figure out how the heck threads were being managed in the framework we were using. Even the relatively simple open source stuff required that I figure out how the JMX protocol interacted with internet firewalls.

For my typical daily work, much of which involves writing firmware for custom hardware that I've never seen before, I'd be glad to see "Segmentation Fault" on a display when my code fails. Hell, I'd be glad to see an LED change state. Sometimes all I notice is that things have gotten really quiet as the switching power supply gears down.

The software industry has seen immense growth since I started programming in 1970. The typical hardware platform for both the desktop and servers has undergone a brutal natural selection to where it seems like everything is a Pentium processor running either Linux or Windows, depending on your religion. Nearly all user interfaces are now browser based, and even those that aren't are still graphical. Everything else was just too expensive, or too hard to use, and was winnowed out in a process that Darwin would have appreciated without understanding any of the technology. Even Apple was subject to this effect, moving to a version of UNIX and eventually into the house of Intel.

This evolution has served us well. It's why I can afford to have a quad-core server in the basement. And why Mrs. Overclock (a.k.a. Dr. Overclock, Medicine Woman) loves our DVR.

But, see, here's the thing: it only seems that way. In fact, the number of embedded processors which are not Pentium chips, frequently running an operating system other than Linux or Windows, has exploded as consumer devices like cell phones, internet tablets, PDAs, GPS receivers, MP3 players, and other hand-held devices have taken over our purses, briefcases, and messenger bags. Even more so is the introduction of microprocessors into our everyday lives in places where we aren't really aware of them, from our kitchen appliances to our vehicles to our office equipment.

Someone has to know how to develop software for these kinds of devices: weird-ass hardware that no one has seen before, stuff that has severe resource constraints in terms of CPU speed and memory footprint, complex interfaces to physical hardware, and form factors so small your hardware guy has problems getting a logic probe on to a pin to tell you what his hardware thinks it's saying to your software. You can't just grab a Java framework that you found via Google to solve these kinds of problems.

And with the advent of multi-core servers, and the breakage of simple consistent memory models on such systems, developers using C, C++, and even Java and C#, on white box Pentium servers now have to have a deeper understanding of threading models and how the hardware handles data coherency between processor cores. Simple errors like "Segmentation Fault" will be a fond memory. Failures will instead be of the nature of "it works most of the time" and "we can't reproduce it".

Yeah, I know I sound like a cranky old man: I remember when I had to walk three miles to the computer center, uphill both ways, through a blinding snow storm, in order to use a keypunch to write software in IBM 360 assembler. But to be a systems programmer, you must understand, to significant depth, what is going on under the hood, no matter whether you are writing in Java, C#, C++, C, or assembly language. You have to be both willing and able to tunnel through all the layers of code to figure out how something works. You must be fearless. You must be a foot soldier in the war against abstraction.

Alas, it will be an uphill battle. Because abstraction isn't just a good thing, it is a necessary thing. All human knowledge builds on what came before it. Engineering is the invention of abstractions and the layering of them on top of one another in the quest for conceptual simplicity. If, in order to have a new car, we each had to start by mining ore with our hands, we'd all still be walking.

But systems programmers must be able to operate on many levels of abstraction simultaneously. They must be able to appreciate that a Java framework supports inversion of control to connect several web services components, while at the same time understanding when the volatile keyword is necessary for the JVM to insert the necessary memory barrier machine instructions.

Believe it or not, universities used to teach systems programming. They didn't call it that, but that was the mindset that they were trying to instill. For me, it was CS431 (later CEG431), "Real-Time Software Design", which I took from Professor Robert Dixon, way back in 1976. In this course, students developed a tiny real-time operating system that supported semaphores and message passing, and then used it to develop a multi-threaded application with interrupt-driven memory-mapped device drivers. It was all written in assembler language. It had to run on an actual PDP-11 minicomputer with real devices. It had to run all of the devices at their rated speed. And to pass the course, you had to take a core dump of your running system and do a complete dump analysis to the satisfaction of the instructor.

CS431 was required for undergraduate computer science and computer engineering majors to graduate, and for graduate students in those majors to take upper division courses. It had a 50% attrition rate. I passed it. Barely. And, remarkably, when I went on to graduate school, I ended up teaching it. (And Bob Dixon became my thesis advisor and mentor.)

It was the best of courses. It was the worst of courses. For some students, it prevented them from ever completing their degree. For others, it was the first time in their academic careers they had ever been challenged, and they loved it. I was somewhere in the middle. It was a bitch of a course to pass, but since at the time I was working full-time in a job where my title actually was "systems programmer", I was compelled to understand it.

Looking back, I find that CS431 has informed a successful career on my part for the past 30+ years. No matter what I do for a living, from writing board support packages for custom hardware, to developing protocol stacks from reams of specifications, to designing network traffic management firmware, to writing Java frameworks, there's a little bit of CS431 in all of it.

If you were to teach such a course today, what would you use? The PDP-11 evolved into the VAX, which evolved into the Alpha, all of which died when the Digital Equipment Corporation merged with Compaq, which then merged with Hewlett-Packard. What processor architecture would you choose? What hardware? What kind of software would you write for it? What language would you use? What kinds of projects could you assign?

Diminuto is my small, personal attempt to answer those questions. It is one megalomaniacal supervillain's vision for teaching a new generation of systems programmers. I'll be writing about it in the articles to come.

Friday, August 01, 2008

Large Hadron Rap

I've been amusing myself by telling my friends that the solution to the Fermi Paradox is that civilizations exist just long enough to build a Large Hadron Collider. (And check out these incredible photographs.) Even so, the destruction of Earth may just have been worth it for this, the Large Hadron Rap: