Sunday, December 21, 2014

Is the General-Purpose Processor a Myth?

In a recent article in ACM Queue, the Association for Computing Machinery's magazine for practicing software engineers, David Chisnall argues that "There's No Such Thing as a General Purpose Processor" [12.10, 2014-11-06]. And he has some interesting stuff to say along the way.
It's therefore not enough for a processor to be Turing complete in order to be classified as general purpose; it must be able to run all programs efficiently. The existence of accelerators (including GPUs) indicates that all attempts thus far at building a general-purpose processor have failed. If they had succeeded, then they would be efficient at running the algorithms delegated to accelerators, and there would be no market for accelerators.
His argument is that modern processors implement a specific memory and processing model suited for the execution of C-like languages, which makes them unsuited for the kinds of applications for which we use various specialized hardware like graphical processing units (GPUs) and digital signal processors (DSPs). If modern CPUs were indeed general purpose, they would be able to run GPU- or DSP-style applications efficiently.

I don't agree with him -- I would say being general purpose means that modern processors can run graphical or digital signal processing applications at all, not that they are necessarily optimal for doing so -- but I get his point. Modern processors are as specialized as GPUs and DSPs in the sense that they are designed around a particular application model.
The ability to run an operating system is fundamental to the accepted definition. If you remove the ability to run an operating system from a processor that is considered general purpose, then the result is usually described as a microcontroller. Some devices that are now regarded as microcontrollers were considered general-purpose CPUs before the ability to run a multitasking, protected-mode operating system became a core requirement.
I kinda sorta like this too as the definition of a microcontroller. True, I am among many who have run FreeRTOS on an eight-bit microcontroller, but FreeRTOS isn't a protected-mode operating system. And, remarkably, Dmitry Grinberg actually booted Linux on an eight-bit microcontroller, although it wasn't pretty.
Parallelism in software comes in a variety of forms and granularity. The most important form for most CPUs is ILP (instruction-level parallelism). Superscalar architectures are specifically designed to take advantage of ILP. They translate the architectural instruction encodings into something more akin to a static single assignment form (ironically, the compiler spends a lot of effort translating from such a form into a finite-register encoding) so that they can identify independent instructions and dispatch them in parallel.
Any mention of single assignment gets my attention since it was the basis of my master's thesis a few decades ago. In a single assignment form, a variable can be assigned a value once and only once. Single assignment languages (like the one I implemented) sound impossible to write software in, but in fact it is simply a different coding style. For example, iteration can be done by recursion, where different values of a variable in each iteration are in fact held in different locations in the stack frame. I was surprised, years later, to discover that compiler writers depended upon translating conventional programming languages into a single assignment form for purposes of parallelization (which was exactly why I was using it in the early 1980s).
It's worth noting that, in spite of occupying four times the die area and consuming four times the power, clock-for-clock the ARM Cortex A15 (three-issue, superscalar, out-of-order) achieves only 75-100 percent more performance than the (two-issue, in-order) A7, in spite of being able (theoretically) to exploit a lot more ILP.
Chisnall is applying Amdahl's Law here: no matter how and at what level in the hardware architecture parallelism is implemented, an application has to be specifically designed to take advantage of it, and only a portion of it is likely to be able to do so. My former supercomputer colleagues would recognize his argument immediately, and would understand that algorithms designed to run efficiently on GPUs and DSPs are as specialized as those that run well on supercomputers, the latter by virtue of being embarrassingly parallel.

Chisnall's article is worth a read. He has made me ponder how we may be missing out on radical improvements in efficiency because of the blinders we have on as we continue to design around special purpose processors like Pentium and ARM.

Tuesday, November 25, 2014

One Prototype Is Worth A Dozen Meetings

When I founded my company in 1995, things were different. The legal and tax classification of limited liability company was brand new, and there was so little legal precedent concerning LLCs that the best legal and tax advice I could get was to stick with a subchapter-S corporation or S-corp. The World Wide Web existed, but barely, so none of the incorporation process was online. I had to register actual physical paperwork, my articles of incorporation, with the state of Colorado. It was the same when I applied to the Internal Revenue Service for an Employer Identification Number, which is like a Social Security Number except for companies instead of people. The process was complicated enough that I used a firm that specialized in dealing with all that bureaucracy to guide me through it.

And, most shockingly, Google didn't exist. And wouldn't exist for years. I outsourced my email handling to a local service provider to which I connected (and still do today) via arcane protocols like SMTP, POP, and IMAP. My desktop was an Apple Macintosh Classic II, and the Linux distro I ran on my Dell server was Yggdrasil.

The entire process, both the business side and the technical side, took weeks to get set up and running, culminating in the creation of the Digital Aggregates Corporation nineteen years ago this month.

Just the other day I had suggested to one of my long-time clients that it might be a good idea to create a different company under which to productize and market some of the awesome intellectual property they had created as collateral from one of their projects. The more I thought about it, the more I realized that I really had no idea the magnitude of what I was suggesting. But I was pretty sure that it would be a whole lot easier today than it was in 1995.

There was only one way to find out.

Here's the thing: I'm a very hands-on learner. For me to understand something, I have to actually do it, touch it, play with it. For sure, I want to understand the theory behind it, and build my own abstract mental model of it, but I'm not going to be able to internalize those things and make them a part of me until I have my hands on it.

And that includes code. Until I suck the code base into Eclipse, peruse it, build it, install it, run it, until I look at your schematics, and then lay my hands on the actual hardware and apply power to it and see some LEDs blink, it's all too theoretical for me. By which I mean, I can't claim to have any expertise in it or knowledge about it.

I have thirteen projects hosted on GitHub -- and a few more that are not -- all of which are the result of me deciding something like "Well, I guess I'd better learn Python." Thousands of lines of working code later, I feel that I can claim with a straight face to know a little Python. For me, just reading a book is not going to cut it.

It is the same for me in business. That's why I created Digital Aggregates all those years ago. I had some vague notion that maybe it would be a way to earn a living someday, but mostly I did it just to see how it was done.

So I decided to create another company.

Disclaimer: I'm not a lawyer, nor a tax accountant.  But I know people who are, from whom I get good advice. Before setting off on a new business venture, do what I did when I first incorporated in 1995, and talk to some experts.

First, I got out a few of my favorite coffee table reference books, ones with beautiful color pictures of largely historical artifacts, and started picking out potential domain names. All the good ones are already taken, of course. I didn't want some high falutin' name like "Framistat Systems" or "Extremely Awesome Products". I wanted something short, one word. It would be nice if it were pronounceable, but it wasn't a deal breaker. And in order to have some business credibility, it had to be in the .com top level domain. I made a list.

Next, I went to my domain name registrar, Network Solutions, and started searching. To no one's surprise, the very first one I tried, gladius.com (gladius being the Latin name for the short sword used to such good effect by the Roman soldiers of ancient times), was taken. By LucasFilm. Bet there's a story there. I only went through a handful before I found one on my list that was available. I immediately registered cranequin.com, plus some additional services, for an annual cost of around US$45.00.

Crossbow winder (cranequin) by Ulrich Wildisen, Switzerland, c. 1550 - Higgins Armory Museum
(credit: Wikimedia)
Cranequin, if you mangle French like I do, is pronounced something like "krwa-ne-ke(n)" where that final nasal "n" is not even really said aloud. A cranequin is a mechanical winding apparatus used to draw back the bow on the kinds of crossbows you might use to take down large game. Or a man in armor.

Cranequin (Rack & Pinion)
(credit: Wikipedia)
The limited liability company now has a long history of success as the preferred organization for small businesses. And for a long time now, the State of Colorado has allowed you to file pretty much all your business paperwork online, something I have to do once a year or so, so I knew exactly where to go. It has become obvious to me over the years that both the State of Colorado and the U.S. government really want you to start a business, and they make it remarkably easy to do so. I established that Cranequin LLC was available as a business name in Colorado, so I submitted the electronic form to create it. After just a few minutes and US$50.00, I had my Colorado business identification number. In the eyes of the law, Cranequin LLC is a wholly owned subsidiary of Digital Aggregates.

Then I went to the Internal Revenue Service web site and with some judicious searching figured out that I probably didn't really need a separate EIN, but it would probably make my tax accountant happy. So with the information from the State of Colorado web site in hand, I registered Cranequin LLC with the IRS and had a federal tax identification number for it. No charge.

Next stop was to Google to set up all the technical infrastructure like electronic mail in their cloud, for the cost of US$5.00 per user per month. There's an annual plan too, which saves you US$10.00 per user per year, but I decided to stick with the pay as you go plan. I was already familiar with Google's Apps for Work from using the service at one of my clients. It seemed like the obvious choice for a tiny startup (and maybe now for large organizations as well). By far the hardest part of this step was having to laboriously edit the MX DNS records via the Network Solutions website to point electronic mail addressed to cranequin.com to Google, and then learning patience because it takes a while for everything to propagate across the interwebs.

I had already purchased web forwarding from Network Solutions. I hacked at the Digital Aggregates web site to create a really basic home page, nothing more really than a place holder. Then it was back to the Network Solutions website to set up the web forwarding, and www.cranequin.com was born.

Cranequin

All told it took about six hours total, spread across three days. I registered the domain name on Sunday, did the stuff with the State of Colorado, the Internal Revenue Service, and Google on Monday, and the home page and web forwarding on Tuesday. Everything was done online over a web interface. No paper, no signatures.

The up front costs were US$50.00, and the annual costs look to be about US$160.00 for domain services and (so far) two email users. If you do your own tax returns, that's probably it. I piggybacked Cranequin LLC on the same mailbox I rent for a mailing and shipping address for Digital Aggregates Corporation.

Will I keep my little prototype limited liability company? For a while at least. After all, when I created Digital Aggregates Corporation as an experiment, I really had no idea that years later it would become the principal mechanism through which I made a very good living. Maybe I'll have similar success with Cranequin LLC.

Saturday, November 08, 2014

Unintended Consequences of the Information Economy

William Lynn, former Deputy Secretary of Defense in the Obama administration, writes in the journal Foreign Affairs on how the transition from a manufacturing economy to an information economy has affected the U.S. Department of Defense in "The End of the Military-Industrial Complex".
For more than a decade, U.S. defense companies have been lagging further and further behind large commercial companies in technology investment. Although the Pentagon historically exported many technologies to the commercial sector, it is now a net importer. Indeed, next-generation commercial technology has leapt far ahead of what the defense industry can produce in areas spanning 3-D printing, cloud computing, cybersecurity, nanotechnology, robotics, and more. In addition, commercial information technology dominates national security today as much as it does the private sector. Soldiers now use smartphones to gather real-time surveillance from drones and send messages to fellow soldiers. 
Keeping up with commercial innovations will be difficult, if not impossible. The combined R & D budgets of five of the largest U.S. defense contractors (about $4 billion, according to the research firm Capital Alpha Partners) amount to less than half of what companies such as Microsoft or Toyota spend on R & D in a single year. Taken together, these five U.S. defense titans do not even rank among the top 20 individual industrial investors worldwide. Instead of funding R & D, defense companies have been returning the overwhelming majority of their available cash to shareholders in the form of dividends and stock buybacks. As a result, from 2000 to 2012, company-funded R & D spending at the top U.S. defense firms dropped from 3.5 percent to roughly two percent of sales, according to Capital Alpha Partners. The leading commercial companies, by contrast, invest an average of eight percent of their revenue in R & D.
Lynn opens with the example of Google's purchase of Boston Dynamics, the robotics firm that designed, among other devices, the BigDog, the four-legged load carrying robot. BigDog was originally funded by the U. S. DoD. Google announced that while they would honor Boston Dynamics' existing military commitments, they would not be seeking further work from the DoD. Google basically reached into their deep pockets and pulled the advanced robotic technology rug right out from under the Defense Department.

Lynn identifies several trends that may be ending the Military-Industrial Complex as we have come to know it. High tech companies are reticent to reveal what may be valuable intellectual property to the U. S. government. They don't see a reason to have to deal with the vast government procurement and contracting bureaucracy when more money can be made more easily in the commercial space. Commercial companies increasingly exploit globalization, manufacturing goods or building research facilities overseas where it makes economic sense, something the U.S. military is understandably reluctant to do for both national security and political reasons. Lynn talks about how the Defense Department is going to have to come to terms with these trends unless it wants to lose its technological advantage.

What Lynn doesn’t talk about (but probably knows): in the 21st century information economy, the key component to growth isn’t enormous capital investment in manufacturing capacity, something the military establishment used to good effect during World War II, but instead enormous people investment in innovation capacity.

Just yesterday National Public Radio ran a story on this very topic: "Future U. S. Manufacturing Jobs Will Require More Brain Than Brawn". Planet Money's Adam Davidson remarks on how this is affecting the world of work.
If you want to succeed for the coming decades, you don't just need to be trained and then a few years later retrained. You need a continuous improvement in your education. The main skill you need is the skill to learn more skills. The one certainty we have is manufacturing is going to look more and more like computer programming and engineering. It's going to involve a lot more brain work and a lot less brawn work. And that means probably a smaller number of people can benefit, but those who can benefit will probably benefit quite a bit.
The DoD can’t just dial up more innovation capacity by throwing money at the problem, like they did in WWII. Nor, in a free country, can the U. S. government just mandate for whom companies choose to work. Innovation capacity requires not only brilliant engineers, who are hard enough to come by, and who cannot be easily identified in the job market, but also a willingness to accept a lot of risk: to try and perhaps to fail, over and over. To old school 20th century managers, this looks a lot like waste, but in fact it’s a necessary part of the innovation process. The economics of conflict is changing just like the economics of everything else is changing. It’s as if one smart guy with a laptop, some open source software stacks, and a 3-D printer can now manufacture hydrogen bombs.

The DoD has to start thinking more about leveraging not just globalization (like, as Lynn suggests, by buying German-made artillery), but also consumer technologies where there are enormous economies of scale (so they’re relatively cheap compared to specialized albeit low volume goods), not to mention more profitable than the DoD could make it for the contractor. The days of the DoD calling the shots in high-tech are over. Global market forces are going to make the decisions about who makes what and for whom.

My career, so far spanning four decades, has been distributed among academia and big science, defense contracting, and commercial high-tech product development. While the transition to the information economy has been very very good to me, I've been thinking about Lynn's article a lot lately, and what it means for my life, my colleagues, my clients, and my country.

Monday, September 29, 2014

What You Don't Know Can Hurt You

Below is a little snippet of C code from [Krebbers 2014]. Peruse it and see if you can predict what two values it will print. It's only a handful of lines long. Go ahead, take your time. I'll wait.

#include <stdio.h>
void main() {
    int x;
    int y;
    y = (x = 3) + (x = 4); 
    printf("%d %d\n", x, y); 
}

So let's compile it on my build server that's sitting a few feet away from me. It's a Dell x86_64 system with four 2.4GHz Intel cores running Ubuntu 14.04 with the 3.13 Linux kernel and the GNU 4.8.2 C compiler. It's old but still mighty.

coverclock@ubuntu:~/src/misc$ gcc -o foo foo.c
coverclock@ubuntu:~/src/misc$ 

Good; no warnings, no errors.

coverclock@ubuntu:~/src/misc$ ./foo
4 8
coverclock@ubuntu:~/src/misc$ 

Huh.

This code isn't multi-threaded. It's barely single threaded. In fact, the code snippet is so simple, it hardly qualifies as anything beyond the classic "Hello World!" program.

Here's the thing: you may have gotten completely different results, if you used a different compiler. Or a different version of the same compiler. Or maybe even different compiler options for optimization or debugging levels. As Mister Krebbers points out in [Krebbers 2014]:
By considering all possible execution orders, one would naively expect this program to print 4 7 or 3 7, depending on whether the assignment x = 3 or x = 4 is executed first. However, the sequence point restriction does not allow an object to be modified more than once (or being read after being modified) between two sequence points [ISO C, 6.5 p. 2]. A sequence point occurs for example at the end ; of a full expression, before a function call, and after the first operand of the conditional ? : operator [ISO C, Annex C]. Hence, both execution orders lead to a sequence point violation, and are thus illegal. As a result, the execution of this program exhibits undefined behavior, meaning it may do literally anything.
Okay, so maybe not a huge surprise to folks who have memorized the ISO C standard. Or who are tasked with debugging problematic code by occasionally resorting to looking at the assembler code. Using a symbolic JTAG debugger that monitors the program at the hardware level, I've seen the program counter single step backwards in a sequential piece of C code, as the debugger traced the execution path the processor took through the optimized machine code and then tried to correlate it to the original source.

This is why you don't write tricky C code, playing games like trying to smash as much stuff into a single statement as you can. Because it can defy any kind of rational analysis. Because it becomes a debugging nightmare for the developer tasked with current engineering who comes after you. Because its behavior may change with your next compiler update. Or when it's ported to a project using a different compiler suite altogether.

Because it can bite you in the ass.

References

R. Krebbers, "An Operational and Axiomatic Semantics for Non-determinism and Sequence Points in C", 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), January 2014

International Organization for Standardization, ISO/IEC 9899:2011, Programming Languages - C, ISO Working Group 14, 2012

Lambda The Ultimate, "An Operational and Axiomatic Semantics for Non-determinism and Sequence Points in C", September 2014

Saturday, September 20, 2014

The Very Big and the Very Small

In
K. Asanovic et al., The Landscape of Parallel Computing Research: A View from Berkeley, EECS Department, U. C. Berkeley, UCB/EECS-2006-183, December 2006
(a paper I've cited here before) its authors, who include David Patterson (as in Patterson and Hennessy), remark
Note that there is a tension between embedded and high performance computing, which surfaced in many of our discussions. We argue that these two ends of the computing spectrum have more in common looking forward than they did in the past. [Page 4]
That's been my experience too, although perhaps for different reasons than the authors cite. I’ve made an excellent living flipping back and forth between the high performance and embedded domains. It turns out the skill sets are mostly the same. In particular, developers in both domains are constantly concerned about very low level details in the realm where software runs close to bare metal, and are always dealing with issues of real-time, asynchronicity, parallelism, and concurrency. These are relatively rare skills that are hard to come by for both the employer and the employee.

I was reminded of this as my tiny one-man company, Digital Aggregates Corporation, buys its fourth Android tablet to use as a development system. These tablets contain powerful multi-core ARM-based processors as well as other embedded microcontrollers and devices. And increasingly I am seeing the embedded and mobile device domains adopt technologies originally developed for large-scale systems, like Security Enhanced Linux (SELinux) and OS-level containerization like Linux Containers (LXC).

I’ve seen large development organizations axe their firmware developers as the company decided to get out of the hardware business to focus on large multi-core server-side software applications. What a remarkable lack of insight into the nature of the technologies on which their businesses depend.

Thursday, September 11, 2014

I've C++11ed and I can't get up!

(Updated 2014-09-14)

C++11 is the latest iteration of the standard for the C++ programming language. This is the 2011 version of the standard that was known as C++0x in its draft form. (C++14 is forthcoming.) There were some new features of C++11 that I thought I’d play around with since I have a little bit of time between gigs. I'm a big believer in using C++ for embedded and even real-time applications whenever possible. But it's not a slam dunk. The language is complex, and growing more complex with every standards iteration.

Using C++ effectively has many benefits, even in the embedded/real-time domain. But it can place a burden on the development team; I have found it relatively easy to write C++ code that is nearly incomprehensible to anyone except the original author. Try debugging a complex problem in code that you did not write and that uses the Standard Template Library or the Boost Library to see what I mean.

My little test program that I've been futzing around with can be found here

http://www.diag.com/ftp/main.cpp

which is useful since Blogger seems to enjoy hosing up the angle brackets in my examples below that use templates.

I like the decltype but I wish they had used typeof to be consistent with sizeof. I cheated.

#define typeof decltype

    long long int num1;
    typedef typeof(num1) MyNumType;
    MyNumType num2;

    printf("sizeof(num1)=%zu sizeof(num2)=%zu\n", sizeof(num1), sizeof(num2));

I really like the ability for one constructor to delegate to another constructor (something Java has always had). I also like the instance variable initialization (ditto).

    class Thing {
    private:
        int x;
        int y = 2;
    public:
        Thing(int xx) : x(xx) {}
        Thing() : Thing(0) {}
        operator int() { return x * y; }
    };

    Thing thing1, thing2(1);

    printf("thing1=%d thing2=%d\n", (int)thing1, (int)thing2);

An explicit nullptr is nice although a 0 still works.

    void * null1 = nullptr;
    void * null2 = 0;

    printf("null1=%p null2=%p equal=%d\n", null1, null2, null1 == null2);

The new alignas and alignof operators solve a problem every embedded developer and systems programmer has run into and has had to resort to proprietary, compiler-specific directives to solve.

    struct Framistat {
        char a;
        alignas(int) char b;
        char c;
    };
    printf("alignof(int)=%zu sizeof(Framistat)=%zu alignof(Framistat)=%zu\n", alignof(int), sizeof(Framistat), alignof(Framistat));

    Framistat fram1[2];

    printf("Framistat.a=%td[%zu],b=%td[%zu],c=%td[%zu]\n"
        , &fram1[1].a - &fram1[1].a, sizeof(fram1[1].a)
        , &fram1[1].b - &fram1[1].a, sizeof(fram1[1].b)
        , &fram1[1].c - &fram1[1].a, sizeof(fram1[1].c)
    );

I like the auto keyword (which has been repurposed from its original definition). You can declare a variable to be a type that is inferred from its context.

    auto foo1 = 0;
    auto bar1 = 'a';


    printf("sizeof(foo1)=%zu sizeof(bar1)=%zu\n", sizeof(foo1), sizeof(bar1));

You can use {} for initialization in many contexts, pretty much anywhere you can initialize a variable. (Yes, the missing = below is correct.)

    int foo3 { 0 };
    char bar3 { 'a' };

    printf("sizeof(foo3)=%zu sizeof(bar3)=%zu\n", sizeof(foo3), sizeof(bar3));

Here’s where my head explodes.

    auto foo2 { 0 };
    auto bar2 { 'a' };

    printf("sizeof(foo2)=%zu sizeof(bar2)=%zu\n", sizeof(foo2), sizeof(bar2)); // WTF?

The sizeof(foo2) is 16. 16? 16? What type is foo2 inferred to be? I haven’t figured that one out yet.

I like the extended for statement where it can automatically iterate over a container or an initialization list. The statements

    enum class Stuff : uint8_t {
        THIS,
        THAT,
        OTHER,
    };

    for (const auto ii : { 1, 2, 4, 8, 16, 32 }) {
        printf("ii=%d\n", ii);
    }

    printf("sizeof(Stuff)=%zu\n", sizeof(Stuff));

    for (const Stuff ss : { Stuff::THIS, Stuff::THAT, Stuff::OTHER }) {
        printf("ss=%d\n", static_cast<int>(ss));
    }

    std::list<int> mylist = { 1, 2, 3, 5, 7, 11, 13, 17 };

    for (const auto ll : mylist) {
        printf("ll=%d\n", ll);
    }

do exactly what you would expect. Also, notice I can now set the base integer type of an enumeration, something embedded developers have needed forever. And I can use a conventional initialization list to initialize the STL list container. But if there's a way to iterate across all of the values in an enumeration, I haven't found it.

I’m kind of amazed that I figured out the lambda expression stuff so easily (although I have a background in functional languages going all the way back to graduate school), and even more amazed that it worked flawlessly, using GNU g++ 4.8. Lambda expressions are a way to, in effect, insert a portion of control of the calling function into a called function. This is much more powerful than just function pointers or function objects, since the inserted lambda can refer to local variables inside the calling function when it is being executed by the called function.

const char * finder(std::list<std::pair<int, std::string>> & list, const std::function<bool (std::pair<int, std::string>)> & selector)
{
    const char * result = nullptr;

    for (auto ll : list) {
        if (selector(ll)) {
            result = ll.second.c_str();
            break;
        }
    }

    return result;
}

    std::list<std::pair<int, std::string>> list;

    list.push_back(std::pair<int, std::string>(0, std::string("zero")));
    list.push_back(std::pair<int, std::string>(1, std::string("one")));
    list.push_back(std::pair<int, std::string>(2, std::string("two")));
    list.push_back(std::pair<int, std::string>(3, std::string("three")));

    for (auto ll : list) {
        printf("ll[%d]=\"%s\"\n", ll.first, ll.second.c_str());
    }

    int selection;
    selection = 0;
    printf("list[%d]=\"%s\"\n", selection, finder(list, [&selection] (std::pair<int, std::string> entry) -> bool { return entry.first == selection; }));
    selection = 1;
    printf("list[%d]=\"%s\"\n", selection, finder(list, [&selection] (std::pair<int, std::string> entry) -> bool { return entry.first == selection; }));
    selection = 2;
    printf("list[%d]=\"%s\"\n", selection, finder(list, [&selection] (std::pair<int, std::string> entry) -> bool { return entry.first == selection; }));
    selection = 3;
    printf("list[%d]=\"%s\"\n", selection, finder(list, [&selection] (std::pair<int, std::string> entry) -> bool { return entry.first == selection; }));
    selection = 4;

    printf("list[%d]=\"%s\"\n", selection, finder(list, [&selection] (std::pair<int, std::string> entry) -> bool { return entry.first == selection; }));

Lambda expressions appeal to me from a computer science viewpoint (there's that graduate school thing again), but I do wonder whether they actually provide anything more than syntactic sugar over alternatives like function objects whose type inherits from a predefined interface class. Lambdas remind me of call-by-name and call-by-need argument evaluation strategies, both forms of lazy evaluation.

Whereas C is a portable structured assembler language, C++ is a big, complicated, high-level programming language that can be used for applications programming or for systems programming. It has a lot more knobs to turn than C, and some of those knobs are best left alone unless you really know what you are doing. In my opinion it is much easier to write poor and/or incomprehensible code in C++ than it is in C. And this is coming from someone who has written hundreds of thousands of lines of production C and C++ code for products that have shipped, and who was mentored by colleagues at Bell Labs, which had a long history of using C++ in embedded and real-time applications. One of my old office mates at the Labs had worked directly with Bjarne Stroustrup; I sucked as much knowledge from his brain as I could.

C++, and especially C++11, is not for the faint of heart. But C++ is an immensely powerful language that can actually produce code that has a smaller resource footprint than the equivalent code in C... if such code could be written in C at all. C++ is worth considering even if you end up using a limited subset of it; although having said that, I find even widely used subsets like MISRA C++ too restrictive.

Monday, September 08, 2014

Yellowstone

When I decided that it would be fun to ride my BMW R1100RT motorcycle from Denver, Colorado to Cheyenne, Wyoming to get a personalized tour of the new NCAR-Wyoming Supercomputing Center (NWSC), it was 90°F, dry, and sunny. When the day came to actually do the ride, it was 48°F, raining, and dismal. At least I got to test my cold and wet weather riding gear.

The NWSC was built to house the latest supercomputers dedicated to climate research that are operated by the National Center for Atmospheric Research (NCAR), a national laboratory based in Boulder, Colorado that is sponsored by the U.S. National Science Foundation (NSF). The Boulder Mesa Laboratory, where I worked for several years, still houses its own supercomputer center. But both space and electricity in Boulder are precious. So when the time came for NCAR to expand its supercomputing resources, a new facility was constructed about two hours north of Boulder and just a few minutes west of Cheyenne, Wyoming. My old friend and colleague Marla Meehl was good enough to talk Assistant Site Manager Jim Vandyke into giving me a tour of the new facility. It's nice to have friends in high places (the NCAR Mesa Lab being at an altitude of 6,109 feet above sea level).

The NWSC houses Yellowstone, an IBM iDataPlex compute cluster and the latest in a lengthy series of computers managed by NCAR for the use of climate scientists. NCAR has a long history of providing supercomputers for climate science, going all the way back to a CDC 3600 in 1963, and including the first CRAY supercomputer outside of Cray Research, the CRAY-1A serial number 3.

Yellowstone represents a long evolution from those early machines.  It is an IBM iDataPlex system consisting of (currently) 72,576 2.6GHz processor cores. Each Intel Xeon chip has eight cores, each pizza-box-sized blade server compute node has two chips, each column has at most (by my count) thirty-six pizza boxes, and each cabinet has at most two columns. There are one hundred cabinets, although not all cabinets are compute nodes. Each compute node uses InfiniBand in a fat-tree topology (like the old Connection Machines, which NCAR also used at one time) as an interconnect fabric, Fibre Channel to reach the disk and tape storage subsystem, and ten gigabit Ethernet for more mundane purposes. Yellowstone has an aggregate memory capacity of more than 144 terabytes, eleven petabytes of disk space, and (my estimate) at least a hundred petabytes of tape storage organized into two StorageTek robotic tape libraries.
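Those counts can be sanity-checked with a bit of arithmetic (a sketch; the per-cabinet packing is my own count, as noted above, not an official figure):

```python
# Back-of-the-envelope check of the Yellowstone counts above.
CORES = 72576              # total 2.6 GHz Xeon cores
CORES_PER_CHIP = 8         # eight cores per Intel Xeon chip
CHIPS_PER_NODE = 2         # two chips per pizza-box compute node
NODES_PER_COLUMN = 36      # my count of pizza boxes per column
COLUMNS_PER_CABINET = 2    # at most two columns per cabinet

cores_per_node = CORES_PER_CHIP * CHIPS_PER_NODE            # 16
nodes = CORES // cores_per_node                             # compute nodes
cabinets = nodes / (NODES_PER_COLUMN * COLUMNS_PER_CABINET) # full cabinets

print(nodes)     # 4536 compute nodes
print(cabinets)  # 63.0 fully populated cabinets' worth
```

Sixty-three fully packed cabinets out of one hundred is consistent with not all cabinets being compute nodes.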

It all runs on Red Hat Linux. Major software subsystems include the IBM General Parallel File System (GPFS), the High Performance Storage System (HPSS), and IBM DB2.

The NWSC computer room runs, for the most part, dark. There is a full-time 24x7 staff at the NWSC of perhaps twenty people, although that includes those who man NCAR's Network Operations Center (NOC). This is cloud computing optimized for computational science and climate research. Not only Yellowstone's large user community of climate scientists and climate model developers, but its system administrators, all access the system remotely across the Internet.

This is remarkable to an old (old) hand like me who worked for years at NCAR's Mesa Lab and was accustomed to its computer room being a busy noisy place full of operators, administrators, programmers, and even managers, running to and fro. But the NWSC facility has none of that. It is clean, neat, orderly, even quiet (relatively speaking), and mostly uninhabited. This is the future of computing (until, you know, it changes).

The infrastructure around Yellowstone was, for me, the real star of the tour. The NWSC was purpose built to house massive compute clusters like Yellowstone (which currently takes up a small portion of the new facility; lots of room for expansion).

Below are a series of photographs that I was graciously allowed to take with my iPhone 5 during my tour. I apologize for the photo quality. Better photographs can be found on the NWSC web site. All of these photographs, along with the comments, can be found on Flickr. Some of the photographs were truncated by Blogger; you can click on them to see the original. For sure, any errors or omissions are strictly mine.

* * *

Compute Node Cabinet (Front Side)

Yellowstone Cabinet - Front Side - Open

I told Jim Vandyke, assistant site manager, to "point to something". I count thirty-six pizza box "compute nodes" in each column, two columns, each node with dual Intel Xeon chips, each chip with eight 2.6 GHz execution cores, in this particular cabinet. There are also cabinets of storage nodes crammed with disk drives, visualization/graphics nodes with GPUs, and even login nodes where users ponder their work and submit jobs.

Compute Node Cabinet (Back Side)

Yellowstone Cabinet - Back Side - Open

Each pair of pizza boxes looks to have four fans. But what's interesting is the door to the cabinet on the right: that's a big water-cooled radiator. You can see the yellow water lines coming up from under the floor at the lower right.

Radiator

Yellowstone Cabinet - Back Side - Closed

This is what the radiator door looks like closed. Again, note the yellow water lines at the bottom.

Interconnects

Yellowstone Cabinet - Front Side - Interconnects

Classic separation of control and data: the nodes use an InfiniBand network in a Connection Machine-like fat-tree topology for the node interconnect, Fibre Channel to access the storage farm, and ten gigabit Ethernet for more mundane purposes.

Sticky Mat

Sticky Mats

You walk over a sticky mat when you enter any portion of the computer room. The tape libraries are kept in an even stricter clean-room environment (which is why I didn't get to see them); tape densities (five terabytes uncompressed) are so high that a speck of dust poses a hazard.

Bug Zapper

Bug Zapper

Here's a detail you wouldn't expect: I saw several bug zappers in the infrastructure spaces. And they were zapping every few seconds. As they built the building out in an undeveloped area west of Cheyenne, where land and energy are relatively cheap compared to NCAR's Boulder supercomputer center location, all sorts of critters set up housekeeping.

Cooling Tower

Cooling Tower

This is looking out from the loading dock. There are a number of heat exchangers in the computer room cooling system. Ultimately, the final stage goes out to a cooling tower, but not before it is used to heat the LEED Gold-certified building.

Loading Dock

Loading Dock

The loading dock, the hallway outside (big enough to assemble and stage equipment being installed), and the computer room are all at one uniform level, making it easy to roll in equipment right off the semi-trailer. I dimly recall being told that Yellowstone took 26 trailers to deliver. You can see a freight elevator on the right down to the lower infrastructure space. The gray square on the floor on the lower left is a built-in scale, so you can verify that you are not going to exceed the computer room floor's load capacity.

Heat Exchanger

Heat Exchanger

There are a number of heat exchangers in the cooling system. This is the first one, for water from the computer room radiators. I had visions of the whole thing glowing red, but the volume of water used in the closed system, and the specific heat of water, are such that the temperature going into this heat exchanger is only a few degrees hotter than that of the water leaving it. It wouldn't be warm enough for bath water.

Fan Wall

Fan Wall

This is a wall o' fans inside a room that is part of the cooling system. These fans pull air from the space above the computer room through a vertical plenum and down to this space below the computer room, where it is cooled by what are effectively gigantic swamp coolers. All of the air conditioning in the building is evaporative. Each fan has a set of louvers on the front that close if the fan stops running, to prevent pressure loss.

Power

Power and Power Conditioning

This is right below a row of Yellowstone compute node racks in the computer room above. If you lifted a floor tile in the computer room, you would be looking twelve feet down into this area.

Filter Ceiling

Ceiling of Air Filters

If you want to run a supercomputer, everything has to be super: networking, interconnect, storage, power, HVAC. This is part of the air filtration system, sitting right below the computer room floor into which the clean cool air flows.

Vertical Air Plenum

Air Plenum

This photograph was taken from a window in the visitors area, looking into a vertical air plenum big enough to drive an automobile through. The wall of fans is directly below. The computer room is visible through a second set of windows running along the left. Through them one of the Yellowstone cabinets is just visible. Air moves from the enormous space (another story of the building, effectively) above the computer room, down through this vertical plenum, through the fan wall, through the cooling system, up into the ceiling of filters and through the computer room floor. I didn't get a photograph of it, but the enormous disk farm generates so much heat that its collection of cabinets are glassed off in the computer room with dedicated air ducts above and below.

Visitor Center

Visitors Center

This is where you walk in and explain to the security folks who you are and why you are here. Before I got three words out of my mouth the guard was handing me my name badge. It's nice to be expected. There's a quite nice interactive lobby display. The beginning of the actual computer facility is visible just to the right.

Warehouse of Compute Cycles

Computer Facility

Typical: simple and unassuming, looks like a warehouse. You'd never know that it contains what the TOP500 lists as the 29th most powerful computer in the world, according to industry standard benchmarks.

* * *

A big Thank You once again to Jim Vandyke, NWSC assistant site manager, for taking time out of his busy schedule for the tour (Jim is also the only network engineer on site), and a Hat Tip to Marla Meehl, head of NCAR's networking and telecommunications section, for setting it all up for me.

Thursday, September 04, 2014

War Stories

Nearly forty years ago I was taking a required senior/graduate level computer science course for which we had to write in assembler a multitasking system with device drivers and such for a PDP-11/05. I would go into the lab first thing in the morning before work and my software would work flawlessly. I would go into the lab in the evening after work and could not get my DMA disk device driver to work at all. This went on for days. I was nearly in tears, pulling my hair out.

I had to demonstrate my software to the professor to pass the course. So I signed up for a morning slot in his schedule. My software worked. I passed.

After the term was over I had reason to go back into that lab during the break and I ran into the hardware support guys taking the system apart.

"What's the deal?"

"We think there's a loose solder joint or something in the disk controller. It quits working when it gets warm."

I smiled and nodded and went on my way.

(I would go on to teach this class as a graduate student, the original professor would be my thesis advisor, and what I learned in that class formed the basis for my entire career since then. It would also form the basis for Mrs. Overclock's Rule: "If Mr. Overclock spends too much time debugging his software, he should start looking at the hardware.")

* * *

Decades ago I ran a systems administration and lab support group at a state university. It was the end of the academic term and I was deleting the course accounts to clean up the disk on a VAX/750 running Berkeley Unix 4.2 in one of the labs I was responsible for. This is something my student assistants normally did, but I thought I would get started on it.

The cleanup actually took a long time to execute, so I decided to run it as a background process so that I could do other stuff on the system console while it ran. I logged in and executed the following commands.

su
cd /
rm -rf home/cs123 &

I noticed that I didn't get a shell prompt back as I expected to. I waited for a moment or two more, began to get concerned, then started looking more closely at exactly what I had typed.

Have you ever noticed that the * character and the & character are right next to each other on the QWERTY keyboard?

I tried to cancel the command but it was too late. I had just started a process that would delete the entire root file system — including the operating system — from the disk.

One of my student assistants walked by and noticed me staring at the console. She asked "How is it going?"

I sighed and said "Could you please go fetch me the backup tapes?"

(I would go on to automate the process of creating and deleting student directories with shell scripts so that this would be unlikely to ever occur again.)

* * *

Back in my Bell Labs days I was in a lab struggling to troubleshoot some complex real-time traffic shaping firmware I had written for an ATM network interface card that had an OC-3 fiber optic physical interface. Using fiber optics meant the test equipment was all horrendously expensive.

I was working late one night — and truth be told a little peeved at myself for taking this long to debug my code — when it suddenly dawned on me that between the ATM broadband analyzer, the ATM protocol analyzer, the multi-cabinet telecom equipment under test, the network traffic generators, and all the fiber optic cable I had strewn all over the place, I was probably using a million dollars' worth of equipment, just to debug my code. It was a major insight: I could never have done that kind of work in a smaller organization.

With all the emphasis these days on cheap computers and free open source software (which much of my current work certainly takes advantage of), that's something I think is often unappreciated: there are some problems you just can't tackle without a million dollars' worth of equipment.

* * *

A long time client asked me to come in for an afternoon to one of their labs to help debug some cellular telecom equipment that had been returned from the field and for which I was one of the principal platform developers. We sat at the lab bench watching log messages scroll by on a laptop connected to the unit while a technician got the unit to go into its failure mode.

"Okay", I began, "this is likely to be a hardware problem. There is a failure with the connection between the ARM processor and the PowerPC processor. It sure looks like an intermittent solder joint failure."

"Oh, no", said the technician, "we think this is a software problem. We were thinking you could..."

As he spoke I slammed my hand against the side of the cabinet and the problem went away.

"... oh... Okay, I'll mark that one down as a hardware problem."

Of course, I had no idea what was going to happen when I hit the side of the cabinet. I was just doing that as a diagnostic step.

But it did make me look like a fraking genius.

Wednesday, September 03, 2014

Forward Looking Infrared

I'm futzing around with a FLIR Systems E8, a forward looking infrared (FLIR) camera. My goal is to use it to, among other things, characterize heating in semiconductor components in embedded systems. Or anything else I can get my clients to pay for.

Here's an Nvidia Jetson TK1 evaluation board. The round hot spot in the middle is the integrated cooling fan on top of the Tegra SoC, which has four ARM Cortex-A15 cores, a fifth low-power Cortex-A15 companion core, and 192 GPU cores. The fact that you can buy this much horsepower for only USD$192 is amazing to me. The Jetson is the next in a long (long) series of ARM-based evaluation boards that I've used to do reference ports of my company's software libraries whose code finds its way into my clients' products.

Nvidia Jetson TK1 with Tegra 124 SoC

I'm an Apple user principally, and I do most of my development work on Linux or Android. But I have to keep a Windows laptop around for one technical reason or another. (I also run Windows in a VM on one of my Macs.) This is my Lenovo Thinkpad T430s that I use for field troubleshooting. What's interesting is you can really tell where the exhaust vent for the fan is by that heat signature on the left hand side. Also note the power brick on the right.

Lenovo ThinkPad T430s

This stuff is tricky. This is my ghostly reflection in a Pella double-paned low-emissivity sliding glass door at the Palatial Overclock Estate. There is clearly a bit of an art to discriminating direct IR from reflected IR. I ordered a roll of emissivity calibration tape that is made for just this purpose.

My IR Reflection In Patio Door

FLIRcat!

Leonidas

I chose the FLIR Systems E8 for a variety of reasons. My tiny company could afford it, although it was still an expensive piece of kit, even for a guy that's used to buying a new server, laptop, or evaluation board at the drop of a hat. I think the E-series' MSX imaging technology that incorporates both an IR camera and a conventional camera makes it a lot easier to tell just what the heck you're really looking at, particularly when you are reviewing images taken perhaps days earlier. Ditto with the 320 x 240 pixel resolution, which I expect to come in handy when peering at components on a printed circuit board. And the unit is portable and easy to use so I won't hesitate to schlep it around to client sites.

I have used a relatively inexpensive IR thermometer with a laser sight for similar purposes for several years now. It will be interesting to see if the E8 replaces my IR thermometer completely or is merely an additional tool in my kit.

Wednesday, May 14, 2014

Revealing Developer Options on Android Jelly Bean

Oh, fer Pete's sake. Upgrade your Android device and now you can't find the menu item Developer options? It's an Easter egg in Android 4.2.2 (Jelly Bean). From the home screen: Settings -> About device -> Build number and (srsly) click on Build number seven times. The Developer options menu item suddenly appears.

A big Thank You to http://gs4.wonderhowto.com/how-to/enable-hidden-developer-options-your-samsung-galaxy-s4-0146687/ .

Update (2014-05-15)

The original article I cited was specific to the Samsung Galaxy S4, but its technique worked on my Samsung Galaxy Tab 2 7.0. I now believe this behavior is universal to Jelly Bean and independent of the vendor. I've tweaked this article to reflect that.

Update (2014-05-19)

I had unrelated reasons to revisit the Android page on hardware devices and indeed this is clearly documented there (and probably elsewhere) for devices using Android 4.2 or newer.

Tuesday, April 15, 2014

Observations on Product Development: Part 5

  1. The Kübler-Ross Model (a.k.a. "The Five Stages of Grief": Denial, Anger, Bargaining, Depression, Acceptance) is also applicable to product development projects that are in trouble.
  2. Build servers must have monotonically increasing clocks; otherwise builds using tools like make are not reliably reproducible.
  3. Consider completely divorcing your product build system from whatever damn fool build system that generates your root file system; the latter is seldom optimized for the same things as the former should be.
  4. If you want to eventually generate revenue, you must first optimize for developer productivity; everything else is negotiable.
  5. If you don't believe #4, read your company's financial report and see where most of its money is being spent.
  6. Don't build the Taj Mahal on a foundation of sand and loose gravel, no matter what your precious agile methodology says about customer value.
  7. It almost never needs to be gold plated.
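Observation #2 above can be made concrete with a sketch of make's freshness rule (a simplification of my own; real make compares file modification times):

```python
# Sketch of make's freshness rule: rebuild a target if and only if some
# prerequisite's timestamp is newer than the target's. A build server
# whose clock steps backward can leave a stale target stamped "in the
# future", so make wrongly skips the rebuild.
def needs_rebuild(target_mtime, prereq_mtimes):
    return any(m > target_mtime for m in prereq_mtimes)

# Normal case: the source was edited after the object was built -> rebuild.
assert needs_rebuild(target_mtime=100, prereq_mtimes=[150])

# Clock stepped back: the object was stamped at t=200 by the old (fast)
# clock, then the source was edited at t=150 on the corrected clock.
# The stale object looks fresh, and make silently skips the rebuild.
assert not needs_rebuild(target_mtime=200, prereq_mtimes=[150])
```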

Saturday, April 05, 2014

No Comment


Macroscopic Quantum Effects and the P vs. NP Problem

This article on the Physics arXiv Blog cites a paper by Arkady Bolotin at Ben-Gurion University whose premise is that the fact that we don't typically see quantum effects (like superposition of states) at the macroscopic level (roughly defined as those systems made up of a number of quantum particles above Avogadro's Number) may imply that P != NP.

https://medium.com/the-physics-arxiv-blog/7ef5eea6fd7a

I have only a dilettante's knowledge of either quantum physics or computational complexity theory, but this is a really interesting idea.

Richard Cook: How Complex Systems Fail

I reread Richard Cook's paper on How Complex Systems Fail this morning for maybe the third time. My only complaint is that the paper is a summary of a ton of safety research over the past several decades without a lot of references being cited. But it's a terrific executive summary, and it mentions the constant tension between safety and production pressure that is discussed at length in the work by safety researchers Jens Rasmussen and Sidney Dekker that I've cited.

http://www.ctlab.org/documents/How%20Complex%20Systems%20Fail.pdf

Cook's paper is just a few pages long and is worth your time. Cook is a medical doctor whose interest is specifically in patient safety, a topic that Mrs. Overclock (a.k.a. Dr. Overclock, Medicine Woman) and I frequently discuss.

Here's a video of Dr. Cook giving a talk on this topic. (If your device doesn't support Flash you can find the video on YouTube.)




Thursday, March 20, 2014

Python, Bash, and Embedded Systems

Lately I've been working on a little development project, Hackamore, as an excuse to learn the programming language Python. Hackamore is a multi-threaded framework that connects to one or more Asterisk PBXes via their Asterisk Management Interface (AMI) ports with the goal of dynamically modeling their channel and call states, including calls that cross PBXes via SIP trunks.

candidates = [ candidate for candidate in self.sources.values() if candidate.fileno() >= 0 ]
effective = 0.0
while candidates:
    # Service all pending I/O on every open Socket. Our goal here
    # is to consume data in the platform buffers as quickly as possible.
    for source in select.select(candidates, NONE, NONE, effective)[0]:
        source.service()
    active = False
    # Process queued events on every open socket. Should we process all
    # Events on each Source before moving onto the next one, or should
    # we round robin? There's probably no answer that will be right
    # every time. The code below does the former. Mostly we want to
    # stimulate the Model with each Event as quickly as possible
    # regardless of the Source.
    for source in candidates:
        while True:
            event = source.get(self)
            if event is None:
                break
            active = True
            message = Event(event, source.logger)
            yield message
    effective = 0.0 if active else timeout
    if effective > self.effective:
        self.logger.debug("Multiplex.multiplex: WAITING. %s", str(self))
    self.effective = effective

It's definitely a work in progress. Hackamore is more of a developer's and tester's tool than it is any real means for an administrator to monitor their PBX activity. Its output is just a straightforward ASCII report with some minimal ANSI terminal control that works with an X terminal. And mostly it was a way to figure out how to meet all my usual needs -- multithreading, synchronization, sockets, and general source code organization -- with a programming language I had never used before.


I like Python. It reminds me somewhat of the experimental languages like FP, BADJR, and BAFL that I played with in the early 1980s when I was in graduate school and working in a research group that was investigating programming language and operating system architectures for hypothetical hardware inspired by the Japanese Fifth Generation project. Python's evolution started in 1989, so I sometimes wonder if its creator, Guido van Rossum, had similar inspiration.

But there's a more pragmatic reason to like Python. Being the Type A sort of personality, I start every morning with a list of things I want to accomplish. The other day I settled in to my office around 0800 with my list in mind to work on Hackamore. By 1000 I was done.

"Huh," I thought, "guess I better start making a longer list."

That's happened a lot on this project. And it's why embedded developers need to start thinking beyond assembler, C, and even C++, and start considering the use of interpreted languages, byte code languages, and shell scripts whenever possible. While my colleagues that work on tiny eight-bit microcontrollers may remain firmly in the assembler and C camps, and those that work on resource constrained platforms with an RTOS may never get past C++ (although I would encourage them to at least consider going that far), the rest of us need to think beyond int main(int argc, char ** argv).

With the growing number of embedded systems that run Linux, it is becoming increasingly possible that a significant portion of our applications can be written in languages that feature an enormous productivity improvement. This improvement comes from a vastly shortened duration of the compile-test-debug iteration, from better development tools, from a capability to develop off target, from the availability of a large number of open source frameworks, and from an ability to work at a significantly higher level of abstraction where an application takes a few dozen lines of code instead of a few thousand. Python may not be your embedded tool of choice. But a shell script, even the relatively simple ash shell that is implemented in the ubiquitous BusyBox embedded tool, might be sufficient.

Years ago I remember talking to a colleague about some code we needed for a commercial Linux-based embedded telecommunications product. This little piece of code was something that would only be run very occasionally, and only on demand by someone logged into the system. It became clear in the conversation that my colleague wanted to go off and start writing C code so we could have something to use in a few days. "Or," I said, "we could write a twenty-line bash script and be done in an hour." And that's what we did. It might have taken a wee bit more than an hour.

That happens a lot too. When all you have is a hammer, everything looks like a nail. And many an embedded developer's first instinct is to go straight to Kernighan & Ritchie. It doesn't help that most project managers are too clueless technically to know that this is a really expensive decision. I've had developers argue with me about the performance of scripting languages, but when you're talking about a program that will only be run occasionally and has no real-time requirements, the difference in total cumulative execution time between a script and a compiled C program may amount to only minutes over the entire lifetime of the commercial product in which it is used.

Even applications that talk to hardware can be scripted. That's why I wrote memtool, a utility written in C that makes it easy to read, write, and modify memory-mapped hardware registers from the command line. For sure, memtool is useful interactively. But where it really pays off is in shell scripts where you can do stuff like manipulate an FPGA or interrogate a status register without having to write a single line of new C code.  (The shell output below was scraped right off a BeagleBoard of mine running Android.)

bash-3.2# memtool -?
usage: memtool [ -d ] [ -o ] [ -a ADDRESS ] [ -l BYTES ] [ -[1|2|4|8] ADDRESS ] [ -r | -[s|S|c|C|w] NUMBER ] [ -u USECONDS ] [ -t | -f ] [ ... ]
-1 ADDRESS    Use byte at ADDRESS
-2 ADDRESS    Use halfword at ADDRESS
-4 ADDRESS    Use word at ADDRESS
-8 ADDRESS    Use doubleword at ADDRESS
-C NUMBER     Clear 1<<NUMBER mask at ADDRESS
-S NUMBER     Set 1<<NUMBER mask at ADDRESS
-a ADDRESS    Optionally map region at ADDRESS
-c NUMBER     Clear NUMBER mask at ADDRESS
-d            Enable debug mode
-f            Proceed if the last result was 0
-l BYTES      Optionally map BYTES in length
-o            Enable core dumps
-r            Read ADDRESS
-s NUMBER     Set NUMBER mask at ADDRESS
-t            Proceed if the last result was !0
-u USECONDS   Sleep for USECONDS microseconds
-w NUMBER     Write NUMBER to ADDRESS
-?            Print menu

Even if your embedded target is too small to host even a simple shell interpreter, learning programing languages that are not natively compiled to machine code will prove valuable. This is true of Python in particular. Python is so easily interfaced with C-based libraries that hardware vendors are starting to provide Python bindings for libraries that interface with their chips so that developers can trivially write code to monitor and manipulate their product. My friend and occasional colleague Doug Gibbons was just telling me the other day that he was using Python to monitor the performance of his signal processing code on DSPs. Python and other similar languages offer such an enormous productivity boost that I expect this trend to continue upwards. I'm also seeing Quality Assurance testers using Python more and more to automate functional testing of the embedded systems on which I work. Knowing a little Python helps me relate to them.
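That ease of binding is worth a concrete sketch. The ctypes module in the Python standard library calls straight into a C shared library with no generated glue code; here I'm using the C runtime's own strlen as a stand-in for a vendor library (on Linux; finding the library may differ elsewhere):

```python
# A sketch of how easily Python binds to a C library: ctypes needs only
# the shared library and the call signatures, no wrapper code at all.
import ctypes
import ctypes.util

# Load the C runtime as if it were a vendor-supplied shared library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *).
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"Hackamore"))  # 9
```

A vendor's Python binding for a chip's support library is, at bottom, not much more than a bundle of declarations like these.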

If you're an embedded developer, you are quickly running out of excuses for not learning some of the new programming languages, even if you never expect to run those languages directly on the embedded target for which you're developing.

I'm old as the hills. If I can do this, so can you.