Monday, March 07, 2011

Deconstructing the AR.drone: Part 3

Although the AR.drone is based on the Linux kernel, the way embedded systems use Linux typically differs significantly from how desktop or server systems use it due to resource constraints in memory, storage, CPU speed, and power consumption.

The AR.drone has 128 megabytes of memory, which is generous compared to other embedded systems of my acquaintance. But that is a fraction of what the typical Linux-based desktop or server would have.

The AR.drone has a read-write persistent file system implemented in NAND flash using the Unsorted Block Image File System (UBIFS), which is designed specifically for raw flash devices. But the size of the flash partition in which the root file system is implemented (I'm guessing that's the Psystem partition in flash partition /dev/mtd3) is sixteen megabytes. That's smaller than any USB thumb drive you can buy these days. And I have implemented Linux-based embedded systems that had no read-write persistent file system at all.

And so on. Many of the constraints are due to the application. Many embedded systems are mobile devices that are handheld (so they have to be small and light) and battery powered (so they have to be extremely power efficient). But cost is also a driving factor. The number of processors manufactured and sold each year into embedded applications far outstrips the number sold into desktops and servers over the same period. Embedded processors, the smaller and less powerful of which are often called microcontrollers but which may still run Linux, are ubiquitous, and most of their applications are extremely price sensitive. Given these factors and volumes, saving even a fraction of a cent in the design of an embedded system can have significant economic consequences for the manufacturer.

As we continue to peer into the details of the AR.drone we will continue to see how clever its designers and engineers were. Although at US$300 it might seem to be on the high side for a toy, I find its design to be remarkable in its tradeoff between price and capability.

Now we'll look at how the AR.drone, the smallest flying Linux system of which I'm aware, gets to a shell prompt.

It is not unusual for Linux systems, embedded and otherwise, to have a multi-stage boot process such that several executable programs are run before Linux gets control of the CPU. All processors have to have some kind of rudimentary fixed mechanism to start running software. For some I've used, the CPU merely starts executing instructions at a fixed address, and it is up to the designer to have executable machine instructions, typically in read-only memory (ROM), present at that address. Other processors have sophisticated (relatively speaking) mechanisms involving reading instructions into memory over simple serial buses from some kind of persistent storage device that may be flash-based.

I actually don't know yet how the lowest-level boot mechanism works on the AR.drone; although it is based on an ARM core, the drone has a proprietary processor manufactured especially for Parrot, the parrot-6. I'm guessing the parrot-6 is a System on a Chip (SoC): a processor core plus I/O controllers integrated into a single piece of silicon. SoCs are very common in embedded systems.

In Part 2 of this series I mentioned that there are two flash partitions that look to be boot loaders: /dev/mtd0, which is named Pbootloader, and /dev/mtd1, which is named Pmain_boot. (I'm guessing the P stands for Partition, but it might also stand for Parrot, the AR.drone manufacturer.) What might these be? Running the strings utility against the read-only versions of these partitions might offer some clues. (I had thought I was going to have to FTP the contents of these two partitions from the AR.drone to my desktop and run strings there; I was delighted, and more than a little surprised, to find the strings utility resident on the AR.drone itself. Even though /usr/bin/strings is, as you might have guessed, a link to /bin/busybox, many embedded systems omit that applet from their BusyBox configuration.)
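On a target whose BusyBox was built without the strings applet, a rough stand-in can be pieced together from tr and awk. This is an approximation under stated assumptions, not a reimplementation of strings: it assumes a POSIX shell, treats any run of four or more printable ASCII bytes as a string (the default minimum length of strings), and the name ministrings is made up for illustration.

```shell
#!/bin/sh
# ministrings FILE: approximate `strings FILE` using only tr and awk.
# Every byte that is not printable ASCII becomes a line break, runs of
# line breaks are squeezed to one, and lines shorter than 4 bytes are dropped.
ministrings() {
    LC_ALL=C tr -cs '[:print:]' '\n' < "$1" |
        LC_ALL=C awk 'length($0) >= 4'
}

# Example (on the drone): ministrings /dev/mtd0ro | grep -i boot
```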

In bootloader (after editing out a lot of cruft) we find some interesting snippets.

# strings /dev/mtd0ro
#IBU
!IBU
PLF!
return from stage1_boot
Parrotboot for target MYKONOS, built on Nov 18 2010
-> Change VDD2 reset value
volume %d : %s ok
volume %d is bad
read error on volume %d
scaning start %x end %x page shift %d eb size %d
skipping bad block %d
vid_hdr_offset %d, data_offset %d leb_size %d
image_seq is %x
unexpected image_seq %x
lnum(%d) >= used_ebs(%d)
new volume %d
can't alloc volume %d
not vtbl found
incomplete vtbl
no valid vtbl
ubi scan failure
Attempt booting on UBI volume with ID %d...
Failure (%d)
main_boot
alt_boot
i2cm0
Booting Linux...
0123456789ABCDEF
0123456789
0123456789abcdef
%08X:
%02X
%04X
%08X
uart0
uart1
uart2
uart3
sdram
dmac
nand
mpmc
i2cm1
spi0
spi1
spi2
2 x Micron MT46H64M16LF 6ns @ CL=3 BL=4 156 MHz - Parelia
Micron MT46H32M32LF 6ns @ CL=3 BL=4 156 MHz - P6dev
Micron MT46H16M16LF 6ns @ CL=3 BL=4 156 MHz - Palladium
Elpida EDD20323ABH 6ns @ CL=3 BL=4 156 MHz - RnB4
Micron MT46H64M32LF 6ns @ CL=3 BL=4 156 MHz - RnB4
-error during driver configuration
unknown error
no error
device is not responding
I/O error
write protection is on
an ECC error was detected and fixed
an incorrectable ECC error was detected
nand_flash: %s
nand_flash: By ONFI Manufacturer: %s, Model: %s
nand_flash: Support ONFI v1.0
nand_flash: Support ONFI v2.0
nand_flash: Support ONFI v2.1
nand_flash: Support ONFI v2.2
nand_flash: unknown device: manid=0x%02x, devid=0x%02x
unknown
large
small
1.8V
3.3V
nand_flash: %s %s page device, %d %cbits (x%d), %s
Fujitsu
Renesas
ST-Micro
National
Toshiba
Samsung
Hynix
Micron
Micron MT46H64M16LF 6ns @ CL=3 BL=4 156 MHz - P6idev
Micron MT48H32M16LF 6ns @ CL=3 BL=4 104 MHz - P6idev
Micron MT46H32M16LF 6ns @ CL=3 BL=4 156 MHz - FC6150 B/W
Micron MT46H16M16LF 6ns @ CL=3 BL=4 156 MHz - FC6050 B/W
Micron MT48H16M16LF 7.5ns @ CL=3 BL=4 104 MHz - FC6050
Micron MT48H8M16LF 7.5ns @ CL=3 BL=4 104 MHz - Voyager
Micron MT48H16M16LF 7.5ns @ CL=3 BL=4 104 MHz -FC6050-HW00
Micron MT48H4M16LF 8ns @ CL=3 BL=4 104 MHz - FC6000
Micron MT48H4M16LF 8ns @ CL=3 BL=4 104 MHz - FC6000-HW00
/dev/nand_ba315_boot
return from stage1_boot
Parrotboot for target MYKONOS, built on Nov 18 2010
-> Change VDD2 reset value
volume %d : %s ok
volume %d is bad
read error on volume %d
scaning start %x end %x page shift %d eb size %d
skipping bad block %d
vid_hdr_offset %d, data_offset %d leb_size %d
image_seq is %x
unexpected image_seq %x
lnum(%d) >= used_ebs(%d)
new volume %d
can't alloc volume %d
not vtbl found
incomplete vtbl
no valid vtbl
ubi scan failure
Attempt booting on UBI volume with ID %d...
Failure (%d)
main_boot
alt_boot
i2cm0
Booting Linux...
0123456789ABCDEF
0123456789
0123456789abcdef
%08X:
%02X
%04X
%08X
uart0
uart1
uart2
uart3
sdram
dmac
nand
mpmc
i2cm1
spi0
spi1
spi2
2 x Micron MT46H64M16LF 6ns @ CL=3 BL=4 156 MHz - Parelia
Micron MT46H32M32LF 6ns @ CL=3 BL=4 156 MHz - P6dev
Micron MT46H16M16LF 6ns @ CL=3 BL=4 156 MHz - Palladium
Elpida EDD20323ABH 6ns @ CL=3 BL=4 156 MHz - RnB4
Micron MT46H64M32LF 6ns @ CL=3 BL=4 156 MHz - RnB4
-error during driver configuration
unknown error
no error
device is not responding
I/O error
write protection is on
an ECC error was detected and fixed
an incorrectable ECC error was detected
nand_flash: %s
nand_flash: By ONFI Manufacturer: %s, Model: %s
nand_flash: Support ONFI v1.0
nand_flash: Support ONFI v2.0
nand_flash: Support ONFI v2.1
nand_flash: Support ONFI v2.2
nand_flash: unknown device: manid=0x%02x, devid=0x%02x
unknown
large
small
1.8V
3.3V
nand_flash: %s %s page device, %d %cbits (x%d), %s
Fujitsu
Renesas
ST-Micro
National
Toshiba
Samsung
Hynix
Micron
Micron MT46H64M16LF 6ns @ CL=3 BL=4 156 MHz - P6idev
Micron MT48H32M16LF 6ns @ CL=3 BL=4 104 MHz - P6idev
Micron MT46H32M16LF 6ns @ CL=3 BL=4 156 MHz - FC6150 B/W
Micron MT46H16M16LF 6ns @ CL=3 BL=4 156 MHz - FC6050 B/W
Micron MT48H16M16LF 7.5ns @ CL=3 BL=4 104 MHz - FC6050
Micron MT48H8M16LF 7.5ns @ CL=3 BL=4 104 MHz - Voyager
Micron MT48H16M16LF 7.5ns @ CL=3 BL=4 104 MHz -FC6050-HW00
Micron MT48H4M16LF 8ns @ CL=3 BL=4 104 MHz - FC6000
Micron MT48H4M16LF 8ns @ CL=3 BL=4 104 MHz - FC6000-HW00
/dev/nand_ba315_boot

The things that interest me here are Parrotboot, which suggests to me that this is a boot loader written especially for this custom processor (probably with its low-level boot mechanism in mind), main_boot, which is the name of our other flash partition of interest, and alt_boot, which suggests this boot loader can load a different secondary boot loader if it exists (could be a redundant backup or something used during development).

In main_boot, clues are not nearly so forthcoming, making me wonder if this partition is compressed (not unusual in multi-stage boot systems). But we do see a few snippets that pique our interest (after editing out hundreds of lines of cruft).

# strings /dev/mtd1ro
UBI#
UBI!
main_boot
alt_boot
parrotparts=nand0:256K(Pbootloader),
8M(Pmain_boot),8M(Pfactory),16M(Psystem),
98048K(Pupdate)
console=ttyPA0,115200 loglevel=4
ubi.mtd=Pfactory,2048 ubi.mtd=Psystem,2048
ubi.mtd=Pupdate,2048
root=ubi1:system rootfstype=ubifs
parrot5.low_latency=1
PLF!
-- System halted
ran out of input data
Malloc error
Out of memory
incomplete literal tree
incomplete distance tree
bad gzip magic numbers
internal error, invalid method
Input is encrypted
Multi part input
Input has invalid flags
invalid compressed format (err=1)
invalid compressed format (err=2)
out of memory
invalid compressed format (other)
crc error
length error
Uncompressing Linux...
done, booting the kernel.
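Given the gzip error strings above ("bad gzip magic numbers" and friends), one cheap test of the compression hypothesis is to look for the gzip header signature in the raw partition: the magic bytes 1f 8b followed by 08, the method byte for Deflate. A minimal sketch, assuming od and awk are available (BusyBox usually provides both); find_gzip_headers is a hypothetical name, and a hit is only suggestive, since any three bytes can match by chance.

```shell
#!/bin/sh
# find_gzip_headers FILE: print the decimal byte offset of every
# occurrence of 1f 8b 08 (gzip magic + Deflate method) in FILE.
find_gzip_headers() {
    od -A n -t x1 -v "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                # p2 and p1 hold the two bytes preceding the current one.
                if (p2 == "1f" && p1 == "8b" && $i == "08")
                    print off - 2
                p2 = p1
                p1 = $i
                off++
            }
        }'
}

# Example (on the drone): find_gzip_headers /dev/mtd1ro
```

The awk program keeps only a two-byte sliding window rather than buffering the whole partition, which matters on a device with 128 megabytes of RAM.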

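We see the command line that, as we saw in Part 2 via /proc/cmdline, is passed from the boot loader to the Linux kernel (so this is probably the final boot loader stage before the Linux kernel executes). We also see messages that in the past I have associated with the popular U-Boot boot loader, so I'm guessing this partition contains a compressed U-Boot executable image. Then again, "Uncompressing Linux... done, booting the kernel." and "-- System halted" are also what the ARM Linux kernel's own built-in decompressor prints, so the partition might instead hold a self-decompressing kernel image (a zImage).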
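The parrotparts= string uses what looks like the same size(name) syntax as the kernel's mtdparts= parameter, so the flash layout can be worked out by hand: each partition starts where the previous one ends. The sketch below does that arithmetic. It assumes the partitions are laid out back to back (no @offset or ro flags, which it does not handle) and numbered mtd0, mtd1, ... in the order listed, which matches the names seen in Part 2; parse_parts is a hypothetical name.

```shell
#!/bin/sh
# parse_parts SPEC: walk a list like "nand0:256K(Pbootloader),8M(...)"
# and print each partition's assumed mtd number, name, offset and size
# in KiB, followed by the total.
parse_parts() {
    spec=${1#*:}                # drop the "nand0:" device prefix
    index=0
    offset=0
    old_ifs=$IFS
    IFS=,
    for part in $spec; do
        size=${part%%"("*}      # e.g. 256K
        name=${part#*"("}       # e.g. Pbootloader)
        name=${name%")"}
        case $size in
            *K) kib=${size%K} ;;
            *M) kib=$(( ${size%M} * 1024 )) ;;
            *)  kib=$(( size / 1024 )) ;;    # bare byte count
        esac
        echo "mtd$index $name offset=${offset}K size=${kib}K"
        offset=$(( offset + kib ))
        index=$(( index + 1 ))
    done
    IFS=$old_ifs
    echo "total=${offset}K"
}

parse_parts 'nand0:256K(Pbootloader),8M(Pmain_boot),8M(Pfactory),16M(Psystem),98048K(Pupdate)'
```

Run against the drone's string, it places Psystem (mtd3) at an offset of 16640K, and the five partitions sum to 131072K, that is, 128 MiB of NAND in all.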

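Notice the UBI# and UBI! strings in both boot loaders, which appear byte swapped (#IBU and !IBU) in the first one (that's interesting). These are the magic numbers that UBI, the volume management layer beneath UBIFS, writes into its on-flash metadata: UBI# (0x55424923) marks the erase counter header at the start of each eraseblock, and UBI! (0x55424921) marks a volume identifier header. Their unswapped presence in main_boot suggests that partition is managed by UBI. UBIFS compresses its data using either zlib (a.k.a. Deflate) or Lempel-Ziv-Oberhumer (LZO), so that is at least consistent with the hypothesis that the main_boot partition is compressed. Why are these strings byte swapped in bootloader? I'm guessing that this partition isn't itself formatted as UBI, but that its code knows how to read UBI volumes (its strings mention UBI scanning, volume tables, and booting from a UBI volume). So the strings are actually constants in the boot loader's code and not magic numbers in flash metadata: UBI stores its headers big-endian, while a little-endian ARM (which I'm assuming the parrot-6 is, as most ARM Linux systems are) stores a 32-bit constant least significant byte first, so the same value reads backwards. That would be consistent with it not being compressed itself, acting as a pre-stage to running U-Boot, and being amenable to whatever native boot mechanism is in the parrot-6 SoC.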
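The byte-swapping explanation can be checked with a few lines of shell arithmetic. The sketch prints a 32-bit magic number's bytes as text in big-endian order (as UBI writes its headers to flash) and in little-endian order (as a little-endian ARM would lay the same constant out in a boot loader's code). The two values are the UBI erase counter and volume identifier header magics; magic_as_text is a made-up helper, and it assumes every byte is printable ASCII.

```shell
#!/bin/sh
# byte_char N: print the character whose byte value is N (decimal),
# by building an octal escape such as \125 for printf.
byte_char() {
    printf "\\$(printf '%03o' "$1")"
}

# magic_as_text VALUE big|little: print VALUE's four bytes as text.
magic_as_text() {
    value=$1
    b0=$(( (value >> 24) & 0xff ))    # most significant byte
    b1=$(( (value >> 16) & 0xff ))
    b2=$(( (value >> 8) & 0xff ))
    b3=$(( value & 0xff ))            # least significant byte
    if [ "$2" = big ]; then
        set -- "$b0" "$b1" "$b2" "$b3"
    else
        set -- "$b3" "$b2" "$b1" "$b0"
    fi
    for b in "$@"; do
        byte_char "$b"
    done
    echo
}

UBI_EC_HDR_MAGIC=0x55424923
UBI_VID_HDR_MAGIC=0x55424921
magic_as_text "$UBI_EC_HDR_MAGIC" big       # prints UBI#
magic_as_text "$UBI_EC_HDR_MAGIC" little    # prints #IBU
magic_as_text "$UBI_VID_HDR_MAGIC" little   # prints !IBU
```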

Although I haven't necessarily shown it, in both of these flash partitions I see a lot of string sequences duplicated. That could indicate multiple redundant copies of the code in each partition. But it is more likely in my opinion that these are artifacts of system updates and I am just seeing older copies of the same boot loaders.

Eventually the Linux kernel is loaded and executed. The root=ubi1:system rootfstype=ubifs in the command line from the boot loader tells the kernel that its root file system is the volume named system on UBI device ubi1 and that it is a UBIFS file system, so we know the root file system is in one of the flash partitions. The ubi.mtd=Psystem,2048 parameter is the second such parameter in the command line (number one, counting from zero), so Psystem presumably becomes ubi1. That, plus the mounted file systems we saw in Part 2, suggests that we were probably correct in assuming that Psystem is the root partition.
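The ubi.mtd= reasoning can be made concrete. Assuming UBI numbers the devices it attaches in the order the ubi.mtd= parameters appear (which is what root=ubi1:system implies here), the sketch below lists which MTD partition becomes which ubiN. Per the kernel's description of the ubi.mtd= parameter, the 2048 after each name is the offset of the volume identifier header within an eraseblock, which fits the vid_hdr_offset message among the boot loader strings. ubi_devices is a hypothetical helper.

```shell
#!/bin/sh
# ubi_devices CMDLINE: list the ubiN device each ubi.mtd= parameter
# would create, numbering from 0 in order of appearance.
ubi_devices() {
    n=0
    for word in $1; do
        case $word in
            ubi.mtd=*)
                mtd=${word#ubi.mtd=}       # e.g. Psystem,2048
                echo "ubi$n ${mtd%%,*}"    # keep just the partition name
                n=$(( n + 1 ))
                ;;
        esac
    done
}

CMDLINE='console=ttyPA0,115200 loglevel=4 ubi.mtd=Pfactory,2048 ubi.mtd=Psystem,2048 ubi.mtd=Pupdate,2048 root=ubi1:system rootfstype=ubifs parrot5.low_latency=1'
ubi_devices "$CMDLINE"
# On the drone itself: ubi_devices "$(cat /proc/cmdline)"
```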

Once the Linux kernel is running, it has to bring the system up into single-user mode. For many embedded systems this is the only state there is; multi-user mode may only make sense for desktop and server systems. The kernel typically does this by finding the /sbin/init program and running it as the very first process, with process identifier (PID) 1. (If the kernel fails to find /sbin/init, it will also look for /etc/init, /bin/init, and even /bin/sh, the latter giving you some hope of recovering a seriously scrogged system, or, as I have done, building a really minimal Linux-based embedded system.)
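The fallback order in the parenthetical can be mimicked in a few lines. This is a simplification for illustration only: the real kernel tries to execute each candidate (after honoring any init= or rdinit= override) rather than merely testing for it, and find_init, along with the idea of pointing it at a directory that stands in for the root file system, is hypothetical.

```shell
#!/bin/sh
# find_init ROOT: print the first of the kernel's default init
# candidates that exists and is executable under directory ROOT.
find_init() {
    root=$1
    for candidate in /sbin/init /etc/init /bin/init /bin/sh; do
        if [ -x "$root$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no init found" >&2
    return 1
}

# Example: find_init /mnt/rootfs
```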

If you have been paying attention, it will not surprise you to know that /sbin/init is simply a link to the Swiss Army Knife of embedded tools, /bin/busybox. As it does with all its other applets, BusyBox looks at the name by which it has been invoked and then does the right thing. In this case, the right thing is to look for the /etc/inittab file for further instructions.
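BusyBox's trick of choosing an applet from the name it was invoked by is easy to mimic in a shell function, where the first argument plays the role of argv[0]. This is a toy illustration of the dispatch idea only, not how BusyBox is implemented (BusyBox does this in C); the function name and its messages are invented.

```shell
#!/bin/sh
# multicall PATH [ARGS...]: pick a behavior from the basename of PATH,
# much as BusyBox picks an applet from the name in argv[0].
multicall() {
    applet=${1##*/}     # /sbin/init -> init
    shift
    case $applet in
        init)    echo "init: would read /etc/inittab" ;;
        sh)      echo "sh: would start a shell" ;;
        strings) echo "strings: would scan $*" ;;
        *)       echo "$applet: applet not found" >&2; return 127 ;;
    esac
}

multicall /sbin/init
multicall /usr/bin/strings /dev/mtd0ro
```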

The /etc/inittab file will seem new and different to the youngsters in the crowd, since Linux distributions for mainstream desktop and server systems have evolved far more complex mechanisms for system initialization. My big quad-core Dell server running Ubuntu looks for initialization scripts in the directory /etc/init. Splitting the initialization scripts into discrete files vastly simplifies the incremental installation of new features, but such a capability isn't necessary for most embedded systems. In fact, at one time all System V-derived UNIX systems took their initial marching orders from /etc/inittab, which contains a series of text lines indicating what programs should be run at what stage of the initialization process, and which programs should be restarted if they terminate. BusyBox implements a subset of the old /etc/inittab semantics.

Of particular interest, we find the following lines in the /etc/inittab on the AR.drone.

# Startup the system and run any rc scripts
::sysinit:/etc/init.d/rcS

# Put a getty on the serial port
ttyPA0::askfirst:/bin/sh

The first line directs the init process to run the program /etc/init.d/rcS, which, as we shall see shortly, is a shell script. (Historically, rc has stood for Run Control and S for Single-user mode.) The second line starts the shell /bin/sh (which, of course, will also turn out to be a link to /bin/busybox) on the serial terminal /dev/ttyPA0 after first asking the user (if one exists and has access to that terminal) to hit return. Despite the comment, there is no getty and hence no login prompt; it's straight to a shell. We know from /proc/cmdline that /dev/ttyPA0 is the system console, so we'll keep that in mind as we continue to peruse the AR.drone. (It is sheer guesswork on my part that the PA stands for Parrot Asynchronous.)
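A BusyBox inittab entry has four colon-separated fields, id:runlevels:action:process. BusyBox ignores the runlevels field and, unusually, treats id as the tty on which to run the process; a blank id means init's own standard input and output, in practice the console. A minimal sketch that splits entries such as the two above into those fields; show_inittab is a hypothetical name.

```shell
#!/bin/sh
# show_inittab: read BusyBox-style inittab lines on stdin and print the
# action, the tty (blank id = console) and the process for each entry.
show_inittab() {
    while IFS=: read -r id runlevels action process; do
        case $id in
            '#'*) continue ;;        # comment line
        esac
        if [ -z "$id$runlevels$action$process" ]; then
            continue                 # blank line
        fi
        echo "action=$action tty=${id:-console} process=$process"
    done
}

# Usage on the drone: show_inittab < /etc/inittab
```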

Perusing /etc/init.d/rcS we see lots of interesting stuff. Of particular interest is the fact that most of the network setup is in a second script.

/bin/hostname -F /etc/hostname
/sbin/ifconfig lo 127.0.0.1 up
/sbin/route add -net 127.0.0.0 netmask 255.0.0.0 lo
/bin/wifi_setup.sh

Here are some snippets extracted from /bin/wifi_setup.sh where we can see the AR.drone setting up its ad hoc WiFi network as well as starting its TELNET and DHCP daemons. We can also see it using arping with -D, for duplicate address detection mode; I'm guessing it's verifying that it isn't stepping on any other device on the same ad hoc WiFi network. Note that the shell variable SSID contains the service set identifier of the ad hoc WiFi network, RANDOM_CHAN is set to a random WiFi channel number, BASE_ADRESS (spelled that way in the script) is the IP network that the AR.drone will use, and PROBE is its initial try at an IP host number.

SSID=`grep ssid_single_player /data/config.ini | awk -F "=" '{print $2}'`

RANDOM_CHAN=`/bin/channelselector`

iwconfig ath0 mode ad-hoc
iwconfig ath0 channel $RANDOM_CHAN
iwconfig ath0 essid "$SSID"

BASE_ADRESS=192.168.1.
PROBE=1

ifconfig ath0 $BASE_ADRESS$PROBE
arping -I ath0 -q -f -D -w 2 $BASE_ADRESS$PROBE

/bin/pairing_setup.sh

iwconfig ath0 rate 54M
iwconfig ath0 rate auto

telnetd -l /bin/sh
udhcpd /tmp/udhcpd.conf
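The excerpt shows a single arping probe, but the PROBE variable suggests the script may retry with the next host number when an address turns out to be taken. A minimal sketch of such a loop, assuming the arping -D convention of exiting 0 when no other host answers (the behavior of the iputils and BusyBox versions as I understand them); pick_address and addr_in_use are hypothetical names, not taken from wifi_setup.sh, and the sketch only chooses an address rather than also re-running ifconfig.

```shell
#!/bin/sh
# addr_in_use ADDR: succeed if some other host already answers for ADDR.
# With -D (duplicate address detection) arping exits 0 when no reply
# arrives, so its status is inverted here.
addr_in_use() {
    ! arping -I ath0 -q -f -D -w 2 "$1"
}

# pick_address BASE PROBE: starting at host number PROBE, print the first
# BASE+PROBE address nobody else claims; give up after host number 254.
pick_address() {
    base=$1
    probe=$2
    while addr_in_use "$base$probe"; do
        probe=$(( probe + 1 ))
        if [ "$probe" -gt 254 ]; then
            return 1
        fi
    done
    echo "$base$probe"
}

# Usage on the drone: pick_address 192.168.1. 1
```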

Although I didn't show it above, this script also creates the DHCP daemon configuration file /tmp/udhcpd.conf. We can see that the DHCP daemon serves IP addresses in the range from 192.168.1.2 through 192.168.1.5 (reserving 192.168.1.1 for itself), with a subnet mask of 255.255.255.0 and a router (gateway) address pointing to itself. This all jibes with what we saw in Part 1.

# cat /tmp/udhcpd.conf
start 192.168.1.2
end 192.168.1.5
interface ath0
decline_time 1
conflict_time 1
opt subnet 255.255.255.0
opt router 192.168.1.1
opt lease 1200
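The script's own code for writing this file isn't shown, but a file equivalent to the one above can be produced from the same two variables with a here-document. This is a hypothetical reconstruction, not Parrot's code: in particular, deriving the lease pool as PROBE+1 through PROBE+4 is my assumption, chosen only because it reproduces the .2 through .5 range seen here.

```shell
#!/bin/sh
# write_udhcpd_conf BASE PROBE: print a udhcpd.conf equivalent to the
# drone's, with the drone itself (BASE followed by PROBE) as the router.
write_udhcpd_conf() {
    base=$1      # network prefix, e.g. 192.168.1.
    probe=$2     # drone's own host number, e.g. 1
    cat <<EOF
start ${base}$(( probe + 1 ))
end ${base}$(( probe + 4 ))
interface ath0
decline_time 1
conflict_time 1
opt subnet 255.255.255.0
opt router ${base}${probe}
opt lease 1200
EOF
}

# Usage: write_udhcpd_conf 192.168.1. 1 > /tmp/udhcpd.conf
```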

The /bin/wifi_setup.sh script in turn invokes /bin/pairing_setup.sh. Here are some snippets from that script. The logic in this script (not all of which is shown here) is kind of remarkable. Depending on the contents of /data/config.ini, a firewall may be established using iptables that restricts traffic to a particular user identified by a specific MAC address. Otherwise, all rules are cleared and one of the LEDs is flashed on the drone.

if [ $MAC_ADDR != $NULL_MAC ]
then
echo "Setting pairing for: $MAC_ADDR"
# Clearing all rules
iptables -P INPUT ACCEPT
iptables -F
# Allowing only owner's traffic
iptables -A INPUT -m mac --mac-source $MAC_ADDR -j ACCEPT
# allowing ICMP (ping), ftp and nfs traffic for everyone.
# Telnet is only allowed for paired user
iptables -A INPUT --protocol icmp -j ACCEPT
#iptables -A INPUT --protocol tcp --dport 23 -j ACCEPT
iptables -A INPUT --protocol tcp --dport 21 -j ACCEPT
iptables -A INPUT --protocol tcp --dport 2049 -j ACCEPT
# Blocking all incoming traffic by default
iptables -P INPUT DROP
else
echo "Clearing pairing rule"
# Switching rad LED on
gpio 63 -d ho 1

# Clearing all rules
iptables -F
# Allows incoming connections from anywhere outside
iptables -P INPUT ACCEPT

# Switching rad LED off
gpio 63 -d ho 0
fi

/data/config.ini is a parameter file containing keyword-value pairs. For example, it specifies the SSID of the ad hoc WiFi network. The shell variable NULL_MAC above is initialized to 00:00:00:00:00:00 in the script and MAC_ADDR is extracted from the owner_mac parameter in the file. Below are some snippets from the file that include information that is relevant to my interests.

[network]
ssid_single_player = ardrone_040582
ssid_multi_player = ardrone_040582
infrastructure = TRUE
secure = FALSE
passkey =
navdata_port = 5554
video_port = 5555
at_port = 5556
cmd_port = 0
owner_mac = 00:00:00:00:00:00
owner_ip_address = 0
local_ip_address = 0
broadcast_address = 0
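Pulling single values out of /data/config.ini is something any script poking at the drone will want to do. A minimal sketch using awk; ini_get is a hypothetical helper, and it ignores [section] headers, which is fine for unique keys such as owner_mac. It also trims the blanks around the value: taken as excerpted, the grep | awk -F "=" pipeline in wifi_setup.sh would keep the space that follows the = in lines formatted like those above.

```shell
#!/bin/sh
# ini_get KEY FILE: print the value of the first "KEY = value" line in
# FILE, with surrounding blanks removed. Prints nothing if KEY is absent.
ini_get() {
    awk -v key="$1" '
        index($0, "=") > 0 {
            k = substr($0, 1, index($0, "=") - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }' "$2"
}

# Usage on the drone:
#   MAC_ADDR=$(ini_get owner_mac /data/config.ini)
#   [ "$MAC_ADDR" != 00:00:00:00:00:00 ] && echo "paired with $MAC_ADDR"
```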

The values of some of the keywords in the file change dynamically, for example when I connect the Parrot Free Flight iDevice app to the AR.drone. But the parameter pertinent to the firewall, owner_mac, did not change, nor as far as I can tell do any of the other networking-related parameters. This merits further investigation.

I confess at the moment I am less excited about the firewall than I am about the fact that I now know that there is a tool to manipulate General Purpose I/O (GPIO) pins on the AR.drone. I can't tell you how many times I've written a little tool to do this exact kind of thing (and have written about it here and here).

Did I try running gpio myself? Of course I did. It turns out the arguments used above turn the LED on the bottom of the body of the AR.drone from green to red and back again. Does gpio have a help menu? Of course it does.

Usage : gpio num_gpio options [output_value]
num_gpio : the gpio to configure.
options :
-h : Display this help.
-d x : configure the direction of the gpio, where x can be:
- i for input.
- lo for low output.
- ho for high output.
-r : read value on the GPIO.
output_value : 0 or 1. Required when configuring gpio as output.
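The help text is enough to wrap the tool in a few convenience functions. A sketch under two assumptions: that GPIO 63 behaves as in the experiment above (the same arguments pairing_setup.sh uses turn the LED red and back to green), and that the gpio binary is on the PATH. The function names and the GPIO override variable, handy for a dry run with GPIO=echo, are mine.

```shell
#!/bin/sh
# Thin wrappers around the drone's gpio tool, using only options its help
# text lists. Set GPIO=echo to print the commands instead of running them.
GPIO=${GPIO:-gpio}
LED_GPIO=63

led_red()   { "$GPIO" "$LED_GPIO" -d ho 1; }
led_green() { "$GPIO" "$LED_GPIO" -d ho 0; }
led_read()  { "$GPIO" "$LED_GPIO" -r; }

# led_blink N: alternate red and green N times, one second apart.
led_blink() {
    n=$1
    while [ "$n" -gt 0 ]; do
        led_red
        sleep 1
        led_green
        sleep 1
        n=$(( n - 1 ))
    done
}
```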

By this time the AR.drone has its ad hoc WiFi network established, it has configured an IP subnet, the drone is running a DHCP server ready to respond to anyone who joins the network, and a TELNET server is running.

Remember the BASE_ADRESS and PROBE variables in /bin/wifi_setup.sh? It might occur to you that you could trivially change the IP network and address that the AR.drone uses by editing that script and changing those variables. Could this possibly work? Yes. Yes, it does. But I can give you three good reasons not to do this.

  1. Doing so would almost certainly void the warranty on your expensive $300 toy.
  2. If you make a mistake, you are likely to brick your AR.drone by rendering unusable the only channel through which you can fix it: TELNETing through its ad hoc WiFi network.
  3. It probably won't do you any good. The iDevice apps I have used to control the AR.drone, including the one provided by Parrot, hard code the IP address of the drone to 192.168.1.1. This is also consistent with the example app source code Parrot provides on their web site. (One of the Android apps I've looked at does give you the option of using a different address.)

This is unfortunate, since for many of us the 192.168.1.0 subnet will conflict with our conventional LAN. Indeed, for most of us the 192.168.1.1 host address will conflict with our WiFi access point and router. Hard coding the drone's network and host address on the client side prevents me from integrating the drone into the LAN at the Palatial Overclock Estate (or what the media still insists on calling the Heavily Armed Overclock Compound). This is especially problematic, given the drone runs a DHCP server which would conflict with the server on my household AP and router. I don't really need my MacBook Air connecting unexpectedly and exclusively to the AR.drone when I'm trying to browse LOLCats. Yes, I could completely reconfigure the household LAN, which is a crazy-quilt mixture of CAT5, PowerLine, and WiFi technologies, but with several servers and other assets having static IP addresses, that's harder than it might appear.
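Whether a given LAN collides with the drone's hard-coded network comes down to masking both addresses with the subnet mask and comparing the results. A minimal sketch in shell arithmetic, assuming the shell's arithmetic is at least 64 bits wide (as in bash and dash) and that dotted quads are written without leading zeros, which the shell would read as octal; ip_to_int and same_subnet are hypothetical names.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the address as a single 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1               # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet ADDR1 ADDR2 MASK: succeed if both addresses fall in the
# same subnet, i.e. they are equal once masked.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Example: is a home router at 192.168.1.254 on the drone's network?
if same_subnet 192.168.1.1 192.168.1.254 255.255.255.0; then
    echo "conflict: same subnet as the drone"
fi
```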

Why would I want to integrate the AR.drone into the household LAN? Oh, um, ...

We noticed in Part 2 that the AR.drone runs klogd and syslogd, daemons that log status and error messages from the kernel and user applications respectively to a system log file. /var/log/messages is full of information about what the kernel is doing as it initializes itself and its device drivers but before it runs the init process.

Next up: we go back in time to briefly look at /var/log/messages.

12 comments:

Unknown said...

Integration of a drone fleet into the compound could provide mission-critical 24x7 feline surveillance.

Chip Overclock said...

I was going to ask whether you meant surveillance of felines, or surveillance by felines, but I realized it's all good either way!

Craig Ruff said...

I received a AR.drone for Christmas, and have had fun flying it around the yard. The dog, however, does not like it at all, probably due to the ultrasonic sensor and the fact that he likes to bark at birds. I haven't yet had time to hack on it, so I've read your accounts with interest!

Chip Overclock said...

Thanks for the kind words, Craig (and long time no see)! I'm a little behind on this project but hope to catch up Real Soon Now. Mrs. Overclock (a.k.a. Dr. Overclock, Medicine Woman) attended a medical conference in Maui, so of course I had to go along to carry her luggage. We had many adventures about which I could write, including my first experience with scombroid poisoning, but alas this margin is too small to contain it.

Nikhileswar reddy N said...

Hi, Is it possible to change name of the interface of drone ath0 to ath1???can u please tell me how??

Chip Overclock said...

I'm afraid this project has been mothballed for some time. But I'm guessing, based on experience with other network interfaces, it would take a kernel config and rebuild, and maybe even some changes to what I assume is the Atheros wifi chip driver code.

Oswin said...

Great analysis and very good explanations. Thanks for sharing this, it's very usefull for me!

Unknown said...

A very helpful series overall for understanding the Drone's architecture and hardware layout. For a newbie, how might one go about the virtualization of the AR Drone? Any insight would be greatly appreciated. Thanks!

Chip Overclock said...

Paul, I can already see your dreams are significantly bigger than mine. Are you talking about running a VM on the drone's processor itself, or running some kind of virtualized simulation of the drone on another platform (like a desktop)? Either way, interesting idea. My only thought is that the latter is more doable than the former, but either would be challenging. If you blog about your work I hope you'll post a URL as a comment here.

hadi said...

Hi John

Thanks for sharing your experience. Have you worked with AR.Drone 2? I have messed up the Kernel by changing libc in it. Now the bootloader stops at initializing Busybox. It gives kernel panic. I was wondering if you know how to fix this problem

Thanks

hadi said...

Hi John

Thanks for sharing your experience. I wondered if you played with AR.Drone 2. I messed up the kernel by changing libc. Now it does not boot correctly and gives kernel panic right at initializing Busybox. Do you have any suggestions?

Thanks

Chip Overclock said...

I'm so sorry to hear that. Alas, no, I haven't invested in an AR.Drone 2 (although I might the next time I have a big block of time open to reverse engineer it). I don't really have anything very useful to suggest, except that my experience with embedded systems over the past couple of decades leads me to believe that you may end up needing a JTAG or equivalent debugger. My company owns several of these for various processors, and which range in price from a couple thousand dollars to under a hundred dollars. Once you break the bootloader, you have often run out of options. Best of luck!