Chip Overclock®: Can what you don't know hurt you?

In my recent article "Pulse-Width Modulation with Arduino" for my Amigo project, I showed a series of screen captures from a Saleae Logic to illustrate the kinds of square waves you could generate using Pulse-Width Modulation (PWM). The use of the Logic was critical to my understanding of what was going on under the hood in my software. The Logic is a logic analyzer: a tool that captures, interprets, and displays signals from digital circuits.

Logic analyzers have been a standard piece of kit for digital hardware designers for decades. They come in a broad range of prices, capabilities, and form factors, from units costing tens of thousands of dollars that support sub-microsecond measurements and require a dedicated equipment cart to wheel them around, to the latest generation of relatively inexpensive devices that connect to your laptop and fit in your shirt pocket. The Logic fits in the latter category. We'll talk more about it in a bit. I've become convinced that devices like the Logic need to become part of the everyday toolkit of software developers like me who routinely deal with real-time events and work close to bare metal.

If you've perused that article on PWM with Arduino, and if you have any experience with Arduino, you can be excused if you thought I had gone off the deep end with my lengthy and seemingly overly complex software to generate a PWM square wave. Under other circumstances, I would have thought so as well. I had a Logic sitting in one of my tool bags and I thought it would be a great idea (and less effort than the alternatives) to use it to create snapshots of the PWM square wave output from my program. My original program was only a dozen or so lines long and was dirt simple. What could possibly go wrong?

The first clue that something was seriously amiss was one of my first Logic traces taken when I was hard-coding the duty cycle, before I'd added the code to read the potentiometer using the analog/digital converter.

There are several things not to like about this trace.

First, I had intended to generate a square wave with a 50% duty cycle, but in this trace the off phase is significantly longer than the on phase, by nearly 13%. The off part of the square wave had significantly more software latency: it was taking me 13% longer to get around to turning the GPIO pin back on. I couldn't account for that in my own code.
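To put numbers on that asymmetry, here is a minimal sketch of the arithmetic. The function names and the figures of 100 and 113 are illustrative assumptions, not measurements read off the trace; only the roughly 13% difference comes from it.

```cpp
#include <cassert>

// Fraction of each period the signal spends high, from measured phase lengths.
// The units cancel, so microseconds or sample counts both work.
double dutyCycle(double onTime, double offTime) {
    assert(onTime >= 0.0 && offTime >= 0.0 && (onTime + offTime) > 0.0);
    return onTime / (onTime + offTime);
}

// How much longer the off phase is than the on phase, as a fraction of the on phase.
double offExcess(double onTime, double offTime) {
    assert(onTime > 0.0);
    return (offTime - onTime) / onTime;
}
```

With a hypothetical on phase of 100 units and an off phase of 113, offExcess reports 0.13 and dutyCycle reports about 0.469 instead of the intended 0.5.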

Second, the frequency was a lot lower than I thought it should be, given the tiny amount of code I thought I was executing and the real-time delays I had built into my code. I was using the Arduino digitalWrite function and it was taking a long time to turn the GPIO bit on and off.

It was time to take a deep dive into the open-source code of the Arduino run-time system. I was reminded of the various bootloaders I've had to hack in times past: no OS, no multi-threading, everything running in the equivalent of kernel mode, software accessing hardware as directly as possible and routinely using privileged machine instructions. This means that delays can't be accounted for by the kinds of scheduling latency you see when running software under non-real-time multi-tasking operating systems like Linux. Delays mean that my software was doing something, even if that something was busy-waiting for some peripheral hardware to respond.

The first step was to see who called the loop function that is the equivalent of the main function in every Arduino program. A common design pattern in this kind of low-level OS-less software is to have a work loop that is continuously executed. It's a loop because a running processor has nothing else it can do: apart from sleeping until the next interrupt, the only mode the central processing unit knows is to continuously execute machine code, even if it's just a tight loop. Operating systems like Linux also have such a work loop; you just don't see it. It consists of the Linux kernel looking for a runnable task to execute. There is always at least one such task, if nothing else a do-nothing task typically referred to as the idle task.

Here's the code in Arduino that calls the setup and the loop functions that are part of every Arduino application program. It is the main program in Arduino that is automatically invoked by the C++ run-time system once it has completed initialization, and it is the complete contents of the file main.cpp.

#include <Arduino.h>

int main(void)
{
    init();

#if defined(USBCON)
    USB.attach();
#endif

    setup();

    for (;;) {
        loop();
        if (serialEventRun) serialEventRun();
    }

    return 0;
}

Pretty simple. It calls its own init function, it calls the application's setup function, and then it enters an infinite loop that calls the application's loop function followed by some code that checks for events on the serial port.

The function serialEventRun is declared elsewhere as a weak external reference: it's a function that may not be defined, and if it's not, its address will be set to zero (NULL) instead of causing an error when the Arduino run-time system is compiled and linked. This allows Arduino to be compiled for microcontroller chips that do not have a serial port, or in hardware speak, a Universal Asynchronous Receiver/Transmitter (UART), in which case the serialEventRun code is conditionally not included as part of the Arduino run-time system.
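The weak reference trick can be sketched in a few lines. This assumes GCC or Clang on an ELF target, where the weak attribute behaves as described; the name hypotheticalEventRun is made up for the example and is deliberately never defined.

```cpp
// Weak declaration: the linker does not insist on a definition. Because
// hypotheticalEventRun (an invented name) is never defined anywhere, its
// address resolves to null instead of producing an undefined-symbol error.
extern "C" void hypotheticalEventRun(void) __attribute__((weak));

// Calls the hook only if some object file supplied it; reports whether it did.
bool runHookIfPresent() {
    if (hypotheticalEventRun) {
        hypotheticalEventRun();
        return true;
    }
    return false;
}
```

Built as shown, runHookIfPresent returns false. Linking in another file that defines hypotheticalEventRun would make the same call site run it, which is the same test the main loop applies to serialEventRun.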

The ATmega328P microcontroller I'm using has a serial port, so this code was being executed. This explains why the timing of the off period in the square wave wasn't quite right: my original naive code turned the GPIO bit on, delayed for a while, turned the GPIO bit off, delayed for a while, and then the main program was doing some additional work. That's when I added the more complex mechanism that computed a variable delay.
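A fixed delay on each side cannot produce a symmetric wave when one side also pays for the trip back through main. Below is a minimal sketch of one way to compensate, assuming the overhead has already been measured. It is not the variable-delay mechanism from the PWM article, and the function name and the figures in the note are hypothetical.

```cpp
#include <cstdint>

// Delay to request for the off phase so that the delay plus the loop overhead
// adds up to the intended phase length. Both arguments are in microseconds;
// the overhead has to be measured, for example with a logic analyzer.
uint32_t compensatedOffDelayUs(uint32_t phaseUs, uint32_t overheadUs) {
    // If the overhead alone exceeds the phase, no delay can fix it: ask for none.
    return (overheadUs >= phaseUs) ? 0u : (phaseUs - overheadUs);
}
```

For a hypothetical 100 microsecond phase with 13 microseconds of overhead, the off delay to request is 87 microseconds.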

I tackled the second problem by looking at the implementation of the digitalWrite function.

void digitalWrite(uint8_t pin, uint8_t val)
{
    uint8_t timer = digitalPinToTimer(pin);
    uint8_t bit = digitalPinToBitMask(pin);
    uint8_t port = digitalPinToPort(pin);
    volatile uint8_t *out;

    if (port == NOT_A_PIN) return;

    // If the pin that support PWM output, we need to turn it off
    // before doing a digital write.
    if (timer != NOT_ON_TIMER) turnOffPWM(timer);

    out = portOutputRegister(port);

    uint8_t oldSREG = SREG;
    cli();

    if (val == LOW) {
        *out &= ~bit;
    } else {
        *out |= bit;
    }

    SREG = oldSREG;
}

This function in Arduino is doing a lot of computation to derive a register address and bit mask from a pin number, and it does it every time. A pin number in Arduino is an abstraction that simplifies the use of the GPIO pins by mapping a number printed on the actual Arduino circuit board to a GPIO register address and bit mask. This mapping may not seem like much overhead, but Jean-Claude Wippler of JeeLabs has characterized the digitalWrite function as taking as long as fifty-eight times (update from Jean-Claude: twenty-nine times) more processor cycles than a direct register write. So I added the code to my own setup function to compute the register address and bit mask once, and then added code to do direct register reads and writes in my loop function. After that it was just some minor tweaks here and there to get the timing right.
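The compute-once, write-directly idea looks roughly like the following. This is a host-side sketch, not the code from the project: fakePort stands in for a memory-mapped output register, and cachePin, pinHigh, and pinLow are hypothetical names. On the real board the cached values would come from the digitalPinToPort, portOutputRegister, and digitalPinToBitMask macros seen in the listing above.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a memory-mapped output register; an assumption for this sketch.
static volatile uint8_t fakePort = 0;

// Cached state, filled in once by cachePin().
static volatile uint8_t *cachedOut = nullptr;
static uint8_t cachedMask = 0;

// Do the pin-to-hardware mapping once, as setup() would. The mapping is
// simulated here: bit is simply the bit position within the fake port.
void cachePin(uint8_t bit) {
    assert(bit < 8);
    cachedOut = &fakePort;
    cachedMask = static_cast<uint8_t>(1u << bit);
}

// Direct register writes: one read-modify-write each, no table lookups.
inline void pinHigh() { *cachedOut = static_cast<uint8_t>(*cachedOut | cachedMask); }
inline void pinLow()  { *cachedOut = static_cast<uint8_t>(*cachedOut & ~cachedMask); }
```

Unlike digitalWrite, this sketch does not save the status register and disable interrupts around the read-modify-write. That matters if an interrupt handler changes other bits in the same port.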

Then I added the code using the Arduino analogRead function to sense the value of the potentiometer at the beginning of every square wave period, which was every one hundred iterations of my loop function. My next Logic trace looked like this.

You can see the bounce in the GPIO pin output at the beginning of every square wave. So the next step was to look at the Arduino analogRead function.

int analogRead(uint8_t pin)
{
    uint8_t low, high;

#if defined(__AVR_ATmega1280__) || defined(__AVR_ATmega2560__)
    if (pin >= 54) pin -= 54; // allow for channel or pin numbers
#elif defined(__AVR_ATmega32U4__)
    if (pin >= 18) pin -= 18; // allow for channel or pin numbers
#else
    if (pin >= 14) pin -= 14; // allow for channel or pin numbers
#endif

#if defined(__AVR_ATmega32U4__)
    pin = analogPinToChannel(pin);
    ADCSRB = (ADCSRB & ~(1 << MUX5)) | (((pin >> 3) & 0x01) << MUX5);
#elif defined(ADCSRB) && defined(MUX5)
    // the MUX5 bit of ADCSRB selects whether we're reading from channels
    // 0 to 7 (MUX5 low) or 8 to 15 (MUX5 high).
    ADCSRB = (ADCSRB & ~(1 << MUX5)) | (((pin >> 3) & 0x01) << MUX5);
#endif

    // set the analog reference (high two bits of ADMUX) and select the
    // channel (low 4 bits). this also sets ADLAR (left-adjust result)
    // to 0 (the default).
#if defined(ADMUX)
    ADMUX = (analog_reference << 6) | (pin & 0x07);
#endif

    // without a delay, we seem to read from the wrong channel
    //delay(1);

#if defined(ADCSRA) && defined(ADCL)
    // start the conversion
    sbi(ADCSRA, ADSC);

    // ADSC is cleared when the conversion finishes
    while (bit_is_set(ADCSRA, ADSC));

    // we have to read ADCL first; doing so locks both ADCL
    // and ADCH until ADCH is read. reading ADCL second would
    // cause the results of each conversion to be discarded,
    // as ADCL and ADCH would be locked when it completed.
    low = ADCL;
    high = ADCH;
#else
    // we dont have an ADC, return 0
    low = 0;
    high = 0;
#endif

    // combine the two bytes
    return (high << 8) | low;
}
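The arithmetic in that listing is easier to follow with the hardware registers taken out. The sketch below reproduces only the default branch (the one without the ATmega1280, ATmega2560, or ATmega32U4 special cases) as plain functions. The function names are hypothetical and no real registers are touched.

```cpp
#include <cstdint>

// Map an Arduino pin number to an ADC channel on the default (#else) branch:
// pin numbers 14 and up are aliases for channels 0 and up.
uint8_t pinToChannel(uint8_t pin) {
    return (pin >= 14) ? static_cast<uint8_t>(pin - 14) : pin;
}

// Build the ADMUX value: reference in the high two bits, channel masked
// with 0x07 exactly as the listing does.
uint8_t admuxValue(uint8_t analogReference, uint8_t channel) {
    return static_cast<uint8_t>((analogReference << 6) | (channel & 0x07));
}

// Combine the low and high result bytes into one reading.
int combineResult(uint8_t low, uint8_t high) {
    return (high << 8) | low;
}
```

For example, a reference value of 1 and channel 2 give an ADMUX value of 0x42, and result bytes of 0xFF (low) and 0x03 (high) combine to 1023.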

There is again a lot of mapping from the abstract pin number to the actual hardware resources. But there isn't anything that obviously links this code to the square wave output pin, which is different from the input pin for the potentiometer. The next step was the 567-page Atmel reference manual for this family of microcontrollers. The Arduino Uno circuit board schematic was also worth a look.

567 pages is not long compared to other processor reference manuals. When you are working close to bare metal you quickly learn that more detail is better. You'll groan when you find the manufacturer's reference manual for the microcontroller you are using is only 100 pages long. You know it's going to be relatively useless. (Simpler devices may have perfectly serviceable data sheets of only a dozen pages.) You'll be tempted to send candy and flowers to vendors whose reference manuals have a good table of contents and index. A searchable PDF is also a very good thing indeed.

Even if you're not a hardware person, if you work close to bare metal you should get used to reading schematics. Regardless of your age and your eyesight, unless you have a printer that prints pages the size of movie posters, buy a magnifying glass and get used to using it. You will be amazed at what you learn when you start following circuit traces from a GPIO pin that is misbehaving and discover to what it is really connected. You might even earn some respect from your hardware colleagues. Maybe.

The Atmel documentation warns against making changes in digital output pins close in time to doing analog/digital conversions. Rather than hosing up my timing that I worked so hard to refine, I decided to limit the number of times I read the potentiometer to four times a second. I figured this would minimize the number of times I hosed up the square wave while still giving me a reasonable response time between turning the dial on the potentiometer and the electric motor speeding up or slowing down. That seemed to work okay.
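Limiting a reading to four times a second can be sketched as a small gate checked on every pass through the loop. This is an assumption about how such a limit could be written, not the code from the project: the class name SampleGate is hypothetical, and the time is taken to come from a free-running millisecond counter such as Arduino's millis().

```cpp
#include <cstdint>

// Decides when enough time has passed to take another reading. Times are in
// milliseconds; unsigned subtraction keeps the comparison valid when the
// counter wraps around.
class SampleGate {
public:
    explicit SampleGate(uint32_t intervalMs)
        : interval_(intervalMs), last_(0), primed_(false) {}

    // Returns true at most once per interval; the first call always returns true.
    bool due(uint32_t nowMs) {
        if (!primed_ || (nowMs - last_) >= interval_) {
            last_ = nowMs;
            primed_ = true;
            return true;
        }
        return false;
    }

private:
    uint32_t interval_;
    uint32_t last_;
    bool primed_;
};
```

A gate built with an interval of 250 milliseconds lets through four readings a second; on every other pass the loop skips the conversion and leaves the square wave timing alone.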

So what's the point of all of this?

I wouldn't even have known of any of these issues had I not hooked up the Saleae Logic and looked in detail at what was happening on microsecond scales. In fact, even with my original naive software design, the electric motor sped up and slowed down appropriately when I turned the potentiometer. For most applications of Arduino, that's something we call success.

But that's not always the case, with Arduino or other real-time applications of microcontrollers and microprocessors. For many products on which I've worked, what happens at the microsecond scale matters. If it doesn't matter to me, it may matter to another piece of hardware that has a much higher frequency attention span than I do. It may be that you, like me, routinely work on real-time systems which are expected to have 99.999% uptime. Or on which lives depend. Or perhaps you just don't want to be humiliated when something you wrote goes pear-shaped in the field in a failure that's visible to hundreds of people.

That's why I think tools like logic analyzers in general, and for lower speed applications, PC and Mac based tools like the Saleae Logic in particular, need to be part of every real-time software developer's toolkit. The Saleae comes in eight and sixteen channel models at prices of around US$150 to US$300. The eight channel model samples at 24MHz and the sixteen channel model can sample at 100MHz. I have the less expensive model and it is likely to be all I will ever need for this kind of work. The Logic can also decode common serial communication and bus standards at signal levels of 5V and below such as asynchronous serial, I²C, and SPI. I used it to verify the pin-outs on the diagnostic serial port of the AR.drone.
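Those sample rates translate directly into timing resolution. Here is a minimal sketch of the arithmetic; the function names are hypothetical, and the 1 kHz signal in the note is only an example.

```cpp
#include <cassert>

// Time between samples, in nanoseconds, for a given sample rate in hertz.
double sampleIntervalNs(double sampleRateHz) {
    assert(sampleRateHz > 0.0);
    return 1e9 / sampleRateHz;
}

// Number of samples captured across one period of a signal.
double samplesPerPeriod(double sampleRateHz, double signalHz) {
    assert(sampleRateHz > 0.0 && signalHz > 0.0);
    return sampleRateHz / signalHz;
}
```

At 24MHz the samples are about 41.7 nanoseconds apart, and at 100MHz they are 10 nanoseconds apart. A 1 kHz square wave sampled at 24MHz gets 24,000 samples per period, which is plenty for spotting microsecond-scale jitter.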

The device has a tiny logic pod smaller than my penknife, barely big enough for a company property sticker, to which you connect a wiring harness with eight signal inputs plus ground. You can connect the wires on the harness directly to pins on a circuit board or use the detachable logic clips.

The pod is a rugged piece made of machined aluminum, which, let's admit, is kind of sexy. It all goes into a little zippered case that fits handily in my tool bag.

The pod connects via USB 2.0 to your PC or Mac on which the Logic software runs. I run Logic on both a Windows 7 netbook I use for field troubleshooting and on my Mac Mini desktop in my home office. Here's a photograph of me using the Logic with the netbook on the AR.drone.

Other similar tools provide comparable capabilities at competitive prices. There is really no excuse not to use one of these tools. Sometimes you really need to know what's going on under the hood. What you don't know might just hurt you.

2 comments:

Bob Nye said...: Hey Chip :) How do you like the AR.drone? I was looking at these to maybe do some stuff at work.

Can it navigate to GPS coordinates?

Bob; Tuesday, January 24, 2012 3:45:00 PM
Chip Overclock said...: Not natively, no integrated GPS.

I've seen someone in one of the discussion groups claim that they got this to work by connecting a GPS with serial output to the TTL (3.3V) level serial pins on the diagnostic port on the bottom of the drone. You'd have to draw power from the drone's battery too, which will shorten its 9 or so minute flying time even more. So while I suspect it could be done, I don't think it's practical for any serious application.

Big drones with longer flying time can be had, but not cheaply. The CineStar 8 is an eight rotor drone that can carry an HD camera in a gimbal and is battery powered with about 25 minutes flying time, but costs 13,000 euros, just to give you an idea. The military and commercial models are frequently fuel-cell or even gasoline powered and require a team to maintain and fly them.

Fascinating stuff, though. Always good to hear from you, Bob!; Tuesday, January 24, 2012 3:59:00 PM

Tuesday, January 17, 2012
