Chip Overclock®: Reverse Engineering gnireenignE esreveR

Reverse engineering is the process of figuring out how something works. It is a process that comes naturally to all engineers, maybe to all humans, as a result of hundreds of thousands of years of evolution. You started doing reverse engineering the first time you took something apart just for the pure joy of destruction, put it back together, and wondered "Hey, how come I have some stuff left over?" Here is what I've learned from decades of reverse engineering the products of others.

Their lawyers are at least as good as mine.

There are three principal reasons that I personally have reverse engineered someone else's product: the joy of figuring out how something works, the desire to build an interoperable product, and the desire to build a competing product. If your goal is the first, probably only you care, as long as you don't tell anyone you did it. The second is probably okay too, as long as you are not competing with the producer of the product with which you will interoperate. If your goal is to build a competing product, I advise you to tread carefully.

End user license agreements (EULAs) are full of language forbidding you to use any of the information provided with the product for the purposes of reverse engineering it. I am not a lawyer. I suggest you consult with one before reverse engineering a commercial product with the goal of competing with it.

We all know there are all kinds of clever tricks you can do with disassemblers, byte code analyzers, and hardware debuggers to figure out how something works. I advise you to steer clear of them. Treat the product as if it were a black box, and base your own implementation only on the information that would be publicly available to a normal consumer of the product. Even then you may be on shaky ground. If possible, base your implementation purely on publicly available specifications. Depending on the domain, however, a good protocol analyzer is not only fair, but may be a necessity. Logic analyzers and oscilloscopes are a tougher call, depending on whether you are examining normal user outputs of the device or its internal implementation.

The open source advocates will tell you that this is exactly the problem with closed, proprietary systems and the current intellectual property climate that surrounds high tech. And they are right. But the fact that they are right still does not give you the right to violate the copyright, the trade secrets, or the patents of your competitor. Consider a Golden Rule approach: treat your competitor with the same respect you would hope to receive. As we shall soon see, poking into the innards of your competitor's product doesn't really buy you anything anyway.

Their documentation isn't any more accurate than mine.

The publicly available documentation for a commercial product is at best some poor technical writer's best guess at how the product works and is to be used. Because of the lead time necessary for producing user documentation, even in electronic form, the user documentation is frequently generated while the product is still under development. It is often based on early, incomplete, and ultimately erroneous specifications and requirements. No product design ever survives its implementation. And your competitor's documentation is no more likely to have kept up with the changes in their product than yours is to have kept up with yours.

For this reason, it is important to get a working example of the product you are reverse engineering as soon as possible. On Day One of the reverse engineering project you should have the original product sitting on your desk or in your lab. Your goal is to develop such a level of expertise in the use of that product that it will occur to you that consulting on its use could become a handy secondary source of revenue.

Do not open the product up. Do not peek inside. Hook it up as appropriate and start using it just as a normal user would. Before you write a single line of code or breadboard a single circuit, try the command or function in question on the original product and see what happens. Under no circumstances trust the manual to accurately describe what the product does. Otherwise you will end up implementing, at best, somebody's preliminary idea of how some hypothetical product might work, which bears only a superficial resemblance to the actual product under study.

Their customers are just as smart as mine.

My whole career has been an exercise in quickly becoming an expert in some technology or problem domain with which I have never worked before. One thing I have learned over and over again: customers use the products I build in ways I never anticipated, in fact never could have anticipated. This is one of the reasons developers should get out more to visit customer sites. It is also the mechanism through which temporary flaws in products are transformed virtually overnight into necessary features.

The customers that use the product you are reverse engineering are just as smart as the customers of your own products. Those customers will find new and clever ways to use the product to meet their own business needs. Their use will drive the evolution of the product: what features are added, and how they work.

For this reason, whether you realize it or not, you desperately need a heaping helping of domain knowledge about the market into which the product you are reverse engineering is being sold (and there may be more than one). A common pattern for me over the past thirty years is to be thrown headlong into a new domain where I spend at least a while as the resident ignoramus. (You get used to it.) I've been lucky enough to have people with extensive domain knowledge working next to me whom I could ask "Okay, why the heck does it work this way?" Almost invariably, the answer makes perfect sense in the context of some specific use case or application scenario.

Be prepared, however, for the occasional "it's always been that way". This is perfectly legitimate. Some developer in 1963 may have implemented a new feature in a certain way, maybe at the request of a single customer, and a user culture, or even an industry, grew up around it. It's like the pattern of numbers on a touch-tone telephone pad. Sure, other patterns are possible. And from a technical point of view, they may even be desirable. But if you deviate from that pattern, the few people that buy your phone are going to end up cursing your name.

Their software isn't any more reliable than mine.

As you become the resident expert in the use of the product you are reverse engineering, you will find bugs. If you are lucky, you will cause the product to reboot, lock up, or spontaneously combust. Lucky, because you probably won't be expected to reproduce that behavior. (Well, as in the example below, maybe the rebooting.)

If you are unlucky, you will try something that causes the product to exhibit unexpected, meaning undocumented, behavior. Now you are faced with the question: do I implement the equivalent feature in my product so that it works in a sane manner, or do I make my product bug for bug compatible with the original? For sure, this is a judgement call. However, the expectation of many (if not most) of the customers of the original product being reverse engineered is that your product will misbehave in the same way. One customer's misbehavior is another customer's critical feature upon which all joy depends. "It's handy that the device reboots when we send it this command because that's the only way we've discovered to return it to the factory defaults."

This is even more difficult than it sounds, because the behavior of the product in question is a moving target. The product being reverse engineered will undergo revisions and updates even as you are studying it. I have found it wise to develop a set of regression test cases to test not only my product, but the product under study as I update its firmware or software.
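One way to organize such a regression suite is as a differential harness: replay the same command sequences against the original product and against your own, and also record the original's replies so you can tell when one of its firmware updates changes them. What follows is a minimal Python sketch under some loud assumptions. It assumes both products can be driven through a hypothetical `send(command) -> reply` interface that returns exactly one text reply per command. The `ToyDevice` class, its `SET`/`GET`/`RESET` commands, and the test cases are all invented stand-ins for real hardware, loosely echoing the reboot-to-factory-defaults example above.

```python
from typing import Callable, Dict, List, Protocol, Tuple


class Device(Protocol):
    """Hypothetical interface: send one command, get one reply back."""

    def send(self, command: str) -> str: ...


# A test case is a name plus a sequence of commands replayed in order.
TestCase = Tuple[str, List[str]]
# A factory returns a device in a known starting state (for real hardware,
# that might mean power-cycling it or restoring factory defaults first).
Factory = Callable[[], Device]


def transcript(device: Device, commands: List[str]) -> List[str]:
    """Replay the commands against one device and capture every reply."""
    return [device.send(cmd) for cmd in commands]


def record(make_reference: Factory, cases: List[TestCase]) -> Dict[str, List[str]]:
    """Capture 'golden' transcripts from the original product's current firmware."""
    return {name: transcript(make_reference(), cmds) for name, cmds in cases}


def compare(make_reference: Factory, make_candidate: Factory,
            cases: List[TestCase]) -> Dict[str, List[Tuple[int, str, str]]]:
    """Per test case, list each step (index, reference reply, candidate reply)
    where the candidate does not behave like the original, bugs included."""
    mismatches: Dict[str, List[Tuple[int, str, str]]] = {}
    for name, cmds in cases:
        ref = transcript(make_reference(), cmds)
        cand = transcript(make_candidate(), cmds)
        diffs = [(i, r, c) for i, (r, c) in enumerate(zip(ref, cand)) if r != c]
        if diffs:
            mismatches[name] = diffs
    return mismatches


def drift(make_reference: Factory, golden: Dict[str, List[str]],
          cases: List[TestCase]) -> List[str]:
    """Names of test cases whose original-product behavior has changed since
    the golden transcripts were recorded (e.g. after a firmware update)."""
    return [name for name, cmds in cases
            if transcript(make_reference(), cmds) != golden.get(name)]


class ToyDevice:
    """Invented stand-in for real hardware, for illustration only."""

    def __init__(self, reboots_on_reset: bool) -> None:
        self.reboots_on_reset = reboots_on_reset
        self.setting = "default"

    def send(self, command: str) -> str:
        if command.startswith("SET "):
            self.setting = command[len("SET "):]
            return "OK"
        if command == "GET":
            return self.setting
        if command == "RESET":
            if self.reboots_on_reset:
                # The "bug": the original reboots, which happens to restore defaults.
                self.setting = "default"
                return "REBOOTING"
            return "OK"
        return "ERROR"


CASES: List[TestCase] = [
    ("set-get", ["SET 42", "GET"]),
    ("reset", ["SET 42", "RESET", "GET"]),
]

if __name__ == "__main__":
    original = lambda: ToyDevice(reboots_on_reset=True)
    mine = lambda: ToyDevice(reboots_on_reset=False)
    print(compare(original, mine, CASES))
```

Run as a script, this prints `{'reset': [(1, 'REBOOTING', 'OK'), (2, 'default', '42')]}`: the "sane" candidate, which does not reboot on `RESET`, diverges from the original at steps 1 and 2 of the `reset` case. Whether that divergence is a bug in your product is exactly the judgement call described above; the harness just makes sure you are making it on purpose. The `drift` function answers the other question: after the original product's firmware is updated, which of its recorded behaviors no longer hold.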

If you are especially unfortunate, it will never occur to you to even try that particular command sequence, protocol variation, or pattern of function invocation that causes such misbehavior, because, well gosh, it just doesn't make sense. In which case the lack of this bug in your product constitutes a bug in your product.

Hey, if it were easy, anyone could do it.

Their software is just as susceptible to entropy as mine.

As features are added over time to any product, the initial design, which likely did not foresee these features, becomes more and more problematic. Sometimes this is an issue that can be resolved with nothing more than some major refactoring. "If I had known we were going to have to support more than one network protocol, I would have created a network abstraction layer." Sometimes this is a show stopper. "If I had known we were going to have to run multi-threaded on shared-memory multi-core processors, I would have... crap, we're hosed."

However, you have an advantage over the developers of the product you are reverse engineering. Where their product evolved organically over the span of several years, you at least know all of the requirements up front, and the current environment in which it will be used. Whereas their implementation may be full of warts and bolted-on features (not that you'll ever know; see above regarding "lawyers"), you are starting with a clean slate. Plus, you can take advantage of faster processors, larger memory models, and several years of open source code development that you may be able to leverage (depending on the licensing).

Once you ship, of course, you're in the same trap as they are.

Their engineers aren't any smarter than me.

I've seen features in the product I was reverse engineering that looked like the work of a student intern, and not one of those smart interns that I knew would eventually replace me either. Sometimes it is really obvious that different features were specified by different architects, or designed by different engineers, because of the lack of consistency or a radical departure from the typical interface. Of course, it is your goal to build an equally inconsistent emulation no matter how bad it may smell.

But I've also seen features whose design seemed a complete mystery, yet smelled okay. I knew deep down that I just wasn't seeing the big picture. But I also knew that the engineers of the product I was reverse engineering weren't any smarter than me, at least, not by much. I was just missing some critical piece of domain knowledge (see above, "customers"), or was not seeing the design pattern at work. If the former, a quick trip to the office of my domain expert was usually enough to clear things up. If the latter, sometimes just starting on the design and implementation was enough, because it forced me to face the same design tradeoffs as the original developer. Your mileage may vary, but don't hesitate to do a little designing and coding under the auspices of "prototyping". It is surprising what clarity this can bring to the problem at hand.

My product will be at least as fun to develop as theirs.

Reverse engineering is often its own reward, teaching me all sorts of things, because there is nothing quite like learning from a working example. The forensic detective work that goes into reverse engineering has a real CSI quality to it. And when I produce a product that interoperates with or emulates another product, I feel like there is a weird kind of bond between me and the developers of that product. I feel like shouting "You magnificent bastard, I read your book!"

Thursday, June 28, 2007