Wednesday, June 20, 2012

Welcome to the Major Leagues

I'm not the only person to have recognized that the skill sets for doing embedded development are the same as for developing for large distributed systems and multicore/multiprocessor architectures. And I'm sure I'm not the only one to have built a career on that common skill set based on dealing with concurrency and parallelism (and understanding why those are two different things), frequently working close to bare metal, developing scalable solutions, worrying about real-time and emergent behavior, and needing to understand what goes on under the hood.

But despite having spent decades on and off doing it, when I'm asked to define embedded development, I'm at a bit of a loss. Ask this question in an internet forum, and you will be surprised to find that there is no real consensus even among those who describe themselves as embedded developers.

Part of this is because the hardware target for which an embedded developer writes software ranges from tiny, inexpensive, low-power eight-bit microcontrollers (which in fact may not even be independent chips themselves but cores implemented inside of another chip like an FPGA or some other surface-mounted device) with kilobytes of memory to thirty-two-bit microprocessors with multiple cores and megabytes or even gigabytes of memory. The developer may code in assembler, C, C++, Java, or even scripting languages like Python and Perl. The underlying software platform that runs on the so-called embedded system ranges from none (remarkably, it is not uncommon to have no OS-like layer at all), to a simple task scheduler like FreeRTOS or even something home-grown, to a large commercial RTOS like VxWorks, to a full-blown multi-user GNU/Linux or even Windows operating system.

This uncertainty about how to describe what we do for a living to our friends, children, spouses, and even colleagues is just going to get worse.

Yesterday I successfully built the entire Android "Ice Cream Sandwich" (a.k.a. ICS) software stack for my ODROID-A4 platform as part of my Conestoga project. It took a while. I'm used to that, having built other Android releases like "Frozen Yogurt" and "Gingerbread" for the BeagleBoard as part of Contraption. While I was watching the build process scroll along in a window, I began to idly wonder just how big this software distribution was. So in another window I unloosed some of my favorite commands like find, wc, and awk.

ICS consists of 25,410 files ending in .java containing 5,325,103 lines of what is presumably Java code. It also includes 62,861 files ending in .cpp, .c, .hpp, or .h, containing 13,225,200 lines of presumed C or C++ code. This is just the Android stack. The same statistics for the Linux kernel and the U-Boot boot loader would make these numbers significantly larger.

To be fair, the ICS counts include some stuff that may not run on the Android target itself. Much of the code that does run on the target is what we would normally associate with the standard underlying C and C++ libraries on any other system. And one of the reasons not to count the Linux and U-Boot code bases is that they each contain vast amounts of code for targets completely unrelated to Android or the ODROID-A4 target. But even so, those are still big numbers, and developers working in the lower levels of Android close to bare metal may still end up indexing and searching all those unrelated files because there is no easy way to exclude them.

So just a naive census of the Android ICS code base for the ODROID-A4 yields 18,550,303 lines of code in 88,271 source files.
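The exact find, wc, and awk invocations are not recorded here, but a census of this kind can be sketched in a few lines of Python. Treat what follows as a rough stand-in under stated assumptions, not the commands actually used: the function name census and the group labels "java" and "c_cpp" are made up for illustration, the extension list is the one given above, and a "line" is a newline byte, which is what wc -l counts, so blank lines and comments are included.

```python
import os
from collections import Counter

# File extensions named in the text, mapped to illustrative group labels
# (the labels themselves are assumptions, not anything from the build).
GROUPS = {
    ".java": "java",
    ".c": "c_cpp",
    ".cpp": "c_cpp",
    ".h": "c_cpp",
    ".hpp": "c_cpp",
}

CHUNK_SIZE = 1 << 20  # read files 1 MiB at a time to keep memory use flat


def census(root):
    """Walk root and return (files, lines) Counters keyed by group label."""
    files = Counter()
    lines = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            group = GROUPS.get(os.path.splitext(name)[1])
            if group is None:
                continue  # not one of the extensions being counted
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and other non-regular files
            files[group] += 1
            with open(path, "rb") as handle:
                # Count newline bytes, the same thing `wc -l` reports.
                while True:
                    chunk = handle.read(CHUNK_SIZE)
                    if not chunk:
                        break
                    lines[group] += chunk.count(b"\n")
    return files, lines
```

Calling census on the top of a source tree returns two counters, and summing the values of each gives the overall file and line totals. Like the numbers above, the result is naive: it counts everything with a matching extension, whether or not it ever runs on the target.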

Surprised? Do you think those numbers are too big or too small? The answer may depend on whether or not you see Android as an embedded system. The A4 runs on a battery, connects wirelessly to a network, fits in my shirt pocket, and has tightly integrated hardware components. Any embedded developer would be completely at home perusing its circuit board. But it implements a complex user-facing graphical user interface and a bunch of applications. So desktop developers see a lot of stuff they recognize as well.

For sure these are pretty naive numbers. And I would be the first to say that lines of code is a kind of useless metric of just about anything. But if you are a developer tasked with working in this code base, then part of your job may be to reverse engineer portions of this code to understand it and to integrate with it, which means you'll be indexing, searching through, and reading this code, whether it's blank lines, comments, or the implementation of a cryptographic hash function.

What does this mean? I'm not sure. But I do believe that product and project managers who are looking at using Android, whether it's for a small embedded device or a large complex system, may need to recalibrate the mental models they carry around in their heads regarding the complexity of the tasks they are asking developers to take on. If you spend most of your time in the embedded world, or if you don't but you see shirt-pocket-sized mobile wireless devices as embedded systems, then the idea of a code base of more than eighteen million lines of code might seem surprising, not to mention a little daunting.

It's not as bad as I may make it sound. The Android APIs are well documented (albeit a moving target with each new release), and tools like the Android SDK and its Eclipse plug-in are very good.

But Android is not Minor League.
