Monday, October 31, 2011

Up the Source Code Organization

I've spent decades working in humongous C and C++ source code bases. Humongous means many millions of lines of code. Some of it I even wrote. Most of it I didn't. Most of it didn't even originate in the organization in which I found myself working. Integrating large source bases in which major components weren't necessarily designed to work together can be a challenge.

With the plethora of open source software now available, it's not unusual to discover that using a new software stack that could save you months of development time means first installing a handful of other software stacks on which it depends. And those stacks have their own dependencies. And so on. Gone are the days in which you could just get by with the standard C library.

These days at Digital Aggregates I've been working on a project that incorporates not only several third-party software stacks written in C or C++, but also some of my own software libraries, some of which I wrote years ago. My integration experience has caused me to revisit how I put those libraries together. Laziness being a virtue among software developers, I've come up with some organizational techniques to make my life easier at least when using my own software. These techniques exploit two things: the Digital Aggregates domain name, diag.com, and a project name unique within the company. They borrow from similar techniques used in the Java world.

Every Digital Aggregates project gets a project name. The name itself doesn't have to have any significance to the project, although it usually does in at least a pun-ish way that may only have meaning to me. The name is not an acronym, nor is it a name already used at the time by some well-known (to me, anyway) software package.

For example, today I wrote code using Desperado, a collection of C++ classes that implement design patterns I've found useful in embedded software development; Diminuto, a collection of C functions that implement design patterns I've found useful in systems programming on Linux and GNU based systems; Lariat, a small software wrapper around Google Test, Google's excellent C++ unit testing framework, that allows you to set resource limits like real-time execution duration on unit tests from the command line; and Hayloft, a work in progress. The project names help me keep everything straight.

As mundane as it sounds, the project name starts life on a manila file folder. I keep at least some paper documentation around, either temporarily or permanently. The manila file folder keeps all of it together and it can be easily identified as it lies on my desk or is filed in the file cabinet. On my desk in my home office right now I have file folders labelled Biscuit, Lariat, and Hayloft. Just minutes ago I consulted the file folder labelled Desperado in the file cabinet.

I keep lab notebooks that are organized not by project but chronologically and by client. I use the project name to identify notes I make in the notebook.

Obviously, I use the project name when I write about my work here in my blog, as in Automating Maintenance on Embedded Systems with Biscuits where Biscuit is the project name.

I also use it as part of the URL for the project page on the Digital Aggregates web site, for example

which of course incorporates the Digital Aggregates domain name as well. Furthermore, the project name becomes part of the tar ball name. For example

is a compressed tar ball for the Zinc distribution of Desperado.

I use the project name as the repository name in Subversion, my current source code control system of choice. My Subversion layout for every project follows the pattern I used for Desperado, and is more or less right out of the Subversion documentation.

desperado/trunk/Desperado
desperado/tag/Desperado
desperado/branches/Desperado

desperado is the Subversion repository name. The directory trunk contains the main development branch, tag contains a check point of each major release, and branches contains any temporary or ancillary development branches.

Directory names on disk incorporate the project name. For example, the implementation files for Desperado are in

${HOME}/src/Desperado

where this is just a check out of the main development branch from Subversion.

I'm a big fan of Eclipse, the open source GUI-based IDE, for C, C++, and Java development. The project name becomes the Eclipse project name, so the name Desperado shows up in the Project Explorer or C/C++ views in Eclipse, with all the source files underneath it.

Organizing header files for projects which use multiple components can be especially challenging, and I frequently see it done poorly. For my C and C++ projects that result in libraries intended to be used in other work, I've borrowed from some Java best practices and use both the domain name and project name as part of the header file path name. For example, a source file I was editing today had the following #include statements.

#include "com/diag/hayloft/Packet.h"
#include "com/diag/desperado/Platform.h"
#include "com/diag/desperado/Print.h"
#include "com/diag/desperado/Dump.h"

The domain name becomes part of the path used to include the header file for a specific project. This approach is used not just by source files using the libraries, but also in the source files that implement the library. This makes it perfectly clear from which project a header file is being included, and prevents any header file name collisions. For example, the Desperado header files are under

${HOME}/src/Desperado/include/com/diag/desperado

while the Hayloft header files are under

${HOME}/src/Hayloft/include/com/diag/hayloft

and the GNU g++ or gcc command line options

-I${HOME}/src/Desperado/include

and

-I${HOME}/src/Hayloft/include

are used to point the compiler to the right places. (You might choose to use the -iquote option instead.) Although these point to the source code directories where I do development, they could just as easily point to /usr/include or maybe /usr/local/include and the same naming system would prevent any conflicts with other header files.

I'm also a fan of namespaces in C++, where I use a similar approach. All of the Desperado C++ symbols are in the namespace

::com::diag::desperado

whereas all of the Hayloft C++ symbols are in the namespace

::com::diag::hayloft

which results in code snippets that look like the one below.

namespace com {
namespace diag {
namespace hayloft {

/**
* Ctor.
*/
explicit Logger(::com::diag::desperado::Output &ro)
: ::com::diag::desperado::Logger(ro)
, mask(0)
{}

}
}
}

This is a constructor for the Hayloft class Logger that derives from the Desperado class Logger and uses a reference to an object of the Desperado Output class as an argument.
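
The snippet above shows only the constructor. A minimal, self-contained sketch of the same pattern follows. The Output and Logger classes in the desperado namespace here are simplified stand-ins, not the real Desperado classes, and the member functions write, contents, log, setMask, and getMask, as well as the type of mask, are assumptions made purely for illustration.

```cpp
#include <string>

// Simplified stand-ins for the Desperado classes named in the text.
// The real classes have different and richer interfaces.
namespace com {
namespace diag {
namespace desperado {

class Output {
public:
    // Hypothetical sink that accumulates text in memory.
    void write(const std::string & text) { buffer += text; }
    const std::string & contents() const { return buffer; }
private:
    std::string buffer;
};

class Logger {
public:
    explicit Logger(Output & ro) : output(ro) {}
    virtual ~Logger() {}
    virtual void log(const std::string & message) { output.write(message); }
private:
    Output & output;
};

}
}
}

namespace com {
namespace diag {
namespace hayloft {

// Same class name as its base class; only the namespace tells them apart.
class Logger : public ::com::diag::desperado::Logger {
public:
    explicit Logger(::com::diag::desperado::Output & ro)
    : ::com::diag::desperado::Logger(ro)
    , mask(0)
    {}
    // Hypothetical accessors, present only so the sketch can be exercised.
    void setMask(unsigned long mm) { mask = mm; }
    unsigned long getMask() const { return mask; }
private:
    unsigned long mask; // The type is an assumption.
};

}
}
}
```

Because both classes are spelled out with their full namespace, a Hayloft Logger can be constructed from a Desperado Output with no ambiguity about which Logger is meant, even though the two classes share a name.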

Furthermore, some projects require more complex namespace organizations, and this is reflected in both the implementation and header file directory hierarchies. For example, C++ symbols in the namespace

::com::diag::hayloft::s3

find their header files in

${HOME}/src/Hayloft/include/com/diag/hayloft/s3

and their implementation files in

${HOME}/src/Hayloft/s3 .

This may sound complex, but it is in practice easily done.

I predict you are now wondering what I do for C functions, where namespaces aren't an option. Using the Digital Aggregates domain name as part of C function names violates my laziness rule. It just seems to be more work than it's worth. But I do include the project name and a name roughly equivalent to a C++ class name in the function name. For example

void * diminuto_map_map(
  uintptr_t start,
  size_t length,
  void ** startp,
  size_t * lengthp
)

is a prototype for a C function in the Diminuto library, part of its map feature, that maps a physical memory address to a virtual memory address.
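
The same project-then-feature-then-verb pattern can be sketched with a made-up example. Everything below is hypothetical: the project name "example", the feature "counter", and all three functions exist only for illustration and are not part of Diminuto. The code sticks to the subset of the language that C and C++ share.

```cpp
#include <stddef.h>

/* Hypothetical project "example", feature "counter". */
typedef struct {
    size_t value;
} example_counter_t;

/* Every function name starts with the project name, then the feature name. */
example_counter_t * example_counter_init(example_counter_t * cp) {
    cp->value = 0;
    return cp;
}

/* Adds one to the counter and returns the new value. */
size_t example_counter_increment(example_counter_t * cp) {
    return ++cp->value;
}

/* Returns the current value without changing it. */
size_t example_counter_get(const example_counter_t * cp) {
    return cp->value;
}
```

The prefix plays the role that the namespace and class name play in C++: a counter feature in some other project would have a different leading project name, so the two could be linked into the same program.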

I use a similar approach when defining macros that are expanded by the C preprocessor, resulting in names like

DIMINUTO_LOG_PRIORITY_NOTICE

or

diminuto_list_next

which can sometimes seem a little cumbersome.

I try not to use preprocessor macros at all when writing C++. But I do use the expanded naming system in both C and C++ when defining preprocessor guard symbols that prevent header files from being included more than once. This results, for example, in the guard preprocessor symbol

_H_COM_DIAG_HAYLOFT_S3_LOCATIONCONSTRAINT

for a C++ header file in the s3 sub-directory and sub-namespace.
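
As a rough sketch, a header using that guard symbol might be laid out as below. The guard symbol is the one given above; the body of the class is a placeholder, since the real LocationConstraint interface is not shown here. The second guarded block mimics what the preprocessor sees when the same header is included a second time.

```cpp
// First inclusion: the guard symbol is not yet defined, so the body is compiled.
#ifndef _H_COM_DIAG_HAYLOFT_S3_LOCATIONCONSTRAINT
#define _H_COM_DIAG_HAYLOFT_S3_LOCATIONCONSTRAINT

namespace com {
namespace diag {
namespace hayloft {
namespace s3 {

// Placeholder body, not the real interface.
struct LocationConstraint {
    const char * name;
};

}
}
}
}

#endif

// Second inclusion of the same header: the guard symbol is now defined,
// so the preprocessor skips everything between #ifndef and #endif.
#ifndef _H_COM_DIAG_HAYLOFT_S3_LOCATIONCONSTRAINT
#error "not reached: the guard symbol is already defined"
#endif
```

The guard symbol mirrors the header path include/com/diag/hayloft/s3 and the namespace ::com::diag::hayloft::s3, so two headers with the same file name in different projects get different guards.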

I do use the domain name in the name of any environmental variables. For example, you can set the log level in Hayloft by setting the value of an environmental variable

COM_DIAG_HAYLOFT_LOGGER_MASK

which incorporates the domain name, the project name, and even the class name. Since environmental variables are in a global namespace that is shared among pretty much all of the software being used by a particular user, and because you typically don't have to type the environmental variable name very often, this seems the safest approach.
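
A minimal sketch of how a program might read such a variable follows. It assumes the value is a numeric mask; how Hayloft actually interprets COM_DIAG_HAYLOFT_LOGGER_MASK is not shown here, and the function name maskFromEnvironment and its fallback parameter are hypothetical.

```cpp
#include <cstdlib>

// Reads a numeric mask from the named environment variable.
// Returns fallback when the variable is unset, empty, or not a valid number.
// (Hypothetical helper; not taken from Hayloft.)
inline unsigned long maskFromEnvironment(const char * name, unsigned long fallback = 0) {
    const char * value = std::getenv(name);
    if ((value == 0) || (*value == '\0')) {
        return fallback;
    }
    char * end = 0;
    // Base 0 accepts decimal, octal (leading 0), and hexadecimal (leading 0x).
    unsigned long mask = std::strtoul(value, &end, 0);
    // Reject values with trailing characters that are not part of the number.
    return (*end == '\0') ? mask : fallback;
}
```

With this helper, running a program after export COM_DIAG_HAYLOFT_LOGGER_MASK=0x0f in the shell would make maskFromEnvironment("COM_DIAG_HAYLOFT_LOGGER_MASK") return 15, while leaving the variable unset would yield the fallback.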

In a future article I'll be talking about how I apply Google Test in projects like Hayloft, where I use a similar naming scheme that differs slightly to prevent collisions between the header files and classes under test and the test and mock classes themselves.

Update (2016-07-22)

Some time ago I converted from Subversion to Git, and started hosting my source code repositories on GitHub. I follow a convention similar to the one described above by giving my repositories names like com-diag-diminuto and com-diag-scattergun. This makes it really easy to keep track of stuff when I clone a repo in my src directory on my build server. When I use Eclipse, I give my projects the same name as the repo name. After using this scheme for almost five years now, I like it better and better.
