Monday, December 25, 2006

Traffic Management

Traffic management turns out to be no less important in real-time systems than it is in your daily rush hour commute, in the network to which your telephone (no matter what kind it is) is connected, or even in the plumbing from your toilet to the sewer system. Freeways are engineered assuming that not everyone will try to get on them at the same time, and busier entrance ramps meter traffic with timed stop lights. Telephone systems assume that not everyone will go off hook simultaneously. And if you want to play a prank, try flushing all the toilets in your school or office building simultaneously.

Actually, don't. Even your sewer system is based on some stochastic assumptions, and when those assumptions are violated, congestion occurs. In the case of toilets, I think you know what kind of congestion I mean.

Even in relatively simple messaging systems involving a single process or a single Java virtual machine, shared resources can be temporarily exhausted if many components try to firehose messages at each other with no feedback mechanism to affect some kind of rate control. At best, performance slows down for a spell until things sort themselves out. At worst, buffer pools run dry, the heap is exhausted, and the entire system may even deadlock. This is a well known failure mode in real-time systems, yet I routinely see developers writing code that assumes that they can fire away messages as quickly and as often as possible with little or no regard as to whether the recepient is in a position to receive them.

There are a lot of mechanisms used in real-time systems that can control how quickly a message producer may send a message so as to not overwhelm the message consumer that receives it. (It's important to understand that producer and consumer are roles played by communicating applications for the duration of a single message exchange. An application may and typically will act as both a producer and a consumer.) Below are brief descriptions of some of the traffic management mechanisms I've encountered, from the simplest to the most complex.

XON/XOFF

XON/XOFF is a form of traffic management used for flow control that will be familiar to anyone who remembers ASCII terminals. XON is the ASCII character produced by typing control-Q. XOFF is a control-S. The X is a common abbreviation for "trans" as in "transmit". When the consumer's buffer is full (or nearly so) it sends an XOFF to the producer, who suspends sending until it receives an XON. XON/XOFF is probably the most widely used traffic management mechanism on the planet by virtue of it being implemented in nearly every RS-232 serial device in existence.

The downside of XON/XOFF is that it doesn't scale well with latency and bandwidth. Just to make the problem obvious, let's assume that your producer and consumer are using a 155 megabit per second (OC-3) communication channel over a geosynchronous satellite link. Data transmitted via geosynchronous satellites have a round-trip time (RTT) of about half a second just due to the speed of light; it takes a quarter of a second to get up to the satellite, another quarter of a second to get down again.

By the time the consumer sends an XOFF to the producer, the producer operating at full channel speed will already have sent over thirty-eight megabits of data that is just travelling along the radio signal up to the satellite and back down to earth, and will send another thirty-eight megabits in the time it takes the XOFF to travel from the consumer to the producer over that same channel.

To use a simple mechanism like XON/XOFF, the consumer has to be able send the XOFF to the producer while it still has seventy-seven megabits of slack in its input buffer in order to receive all the data that is in transit, or will be in transit, before the producer can shut off.

ACK/NAK

ACKnowledge and Negative AcKnowledge is probably the most straightforward traffic management mechanism. The producer is blocked upon sending a message until the consumer either acknowledges or fails to acknowledge receiving the message. It reliably prevents the producer from overwhelming the consumer.

Its biggest downside is performance. Assume the worst case where the producer sends one bit of data over that same satellite link, then waits for the acknowledgement from the consumer. It takes the one bit of data a quarter second to go from the producer to the consumer. Ignoring any processing latency, it takes the acknowledgement another quarter of a second to go from the consumer to the producer. It took a half-second to transmit a bit from producer to consumer before a second bit could be sent, so your 155Mb/s channel has an effective bandwidth of two bits per second. This is why communications channels are more efficient with larger packet sizes.

Suppose the producer instead sends a megabit before it waits for a reply. The first bit of that megabit still takes a quarter of a second to get from the producer to the consumer, and the last bit takes more than six milliseconds at 155Mb/s. Then the acknowledgment takes another quarter of a second to go from the consumer back to the producer. Your channel now has an effective bandwidth of just under two megabits per second. You are still only utilizing a little over one percent of your channel.

What is the magic number for packet size? With a simple protocol like ACK/NAK, there is no magic number. No matter how many bits you send before turning the channel around, you are still wasting bandwidth when you stop transmitting and wait for an acknowledgement.

The ACK/NAK pattern shows up more often than you might think, even if your application isn't using a communications network at all. Synchronous message passing, where the producer blocks waiting for the consumer to receive the message, is a form of the ACK/NAK pattern. Synchronous message passing does not scale well as the complexity of the application grows, yet it is very common in web services or in any remote procedure call (RPC) mechanism like DCE, Corba, or Axis. Applications may unwittingly make requests concurrently to each other, blocking waiting for an acknowledgement that will never come because the two ends are now deadlocked waiting for the other end.

Synchronous message passing can also be a subtle form of priority inversion. While an application acting as a producer is blocked waiting for an acknowledgement from its consumer, other perhaps higher priority applications are blocked trying to send messages to the blocked application in its role as a consumer.

Windowing and Credit

To use the full capacity of your data channel, you would like to keep it full all the time. To do that, the producer has to be able to put enough new data in the channel to keep it full while an acknowledgement for prior data travels from the consumer. Communications protocols handle this by implementing a windowing or credit mechanism.

A window is a fixed amount of data that may be outstanding at any one time, as yet unacknowledged by the consumer. The producer is allowed to send this much data before being forced to wait for an acknowledgement. The producer guarantees that it has at least this much buffer space so that it can save the data it has already sent, in case it receives a negative acknowledgement from the consumer and has to send it again. Examples of windowing protocols are ISDN's Link Access Protocol for the D-Channel (LAP-D) which is used for signaling in non-IP digital telephony, and IP's Transmission Control Protocol (TCP). The producer and consumer may agree on a window size as part of their initial connection setup, or the window size may be fixed as part of the protocol itself.

A credit is the amount of free buffer space in the consumer signalled to the producer as part of an acknowledgement. This feedback mechanism lets the producer know how much more data it is allowed to send before having to stop and wait for more credit. The consumer guarantees that it has buffer space at least as large as the credit it sends. Because a consumer may be receiving messages from more than one producer into the same buffer space, the credit that a producer sees may bear little relation to the amount of data it has actually sent to the consumer. TCP/IP also uses a credit mechanism.

The optimal window or initial credit size is the bandwidth-delay product. The bandwidth-delay product of our satellite link is the 155 megabits per second bandwidth multiplied by the half second RTT, or more than nine megabytes. Although most communication channels don't suffer the RTT of a geosynchronous satellite link, many, such as gigabit Ethernet, have substantially higher bandwidth, still yielding a large bandwidth-delay product. To make effective use of channels with high bandwidth or high RTT (or both), both the producer and the consumer require large amounts of memory for buffering outgoing and incoming data. (Think what implications there might be regarding the reliability of your communications network when you have a high bandwidth-delay product. In an unreliable network, the producer may well spend most of its time and network bandwidth retransmitting a huge amount of buffered data.)

The windowing and credit patterns also show up more often than you would think. They are both a form of asynchronous message passing in which the a limit is placed on how far ahead a producer may get ahead of a consumer before being blocked waiting for an acknowledgement. Allowing applications to use unfettered asynchronous message passing does not scale well as load increases. Resources in the underlying message passing mechanism can be quickly exhausted by an enthusiastic producer with a slow witted consumer. This can lead to a deadlock of the entire system.

Traffic Shaping and Policing

All of the flow control mechanisms discussed so far require consent between the producer and the consumer. Because they all involve temporarily pausing the producer, they also introduce jitter into the data stream. Jitter is variation in the inter-arrival time (IAT) of successive packets in the data stream.

Jitter isn't generally a problem for data transfer using applications like FTP. It can be inconvenient when using interactive applications like TELNET. But it wreaks havoc with data streams that have real-time constraints like audio or video streams or VOIP. Jittered packets in real-time streams may arrive too early for play back and have to be buffered, or too late and have to be discarded completely. (What the audio or video player or VOIP telephone may play back in place of the discarded packet is an interesting problem left as an exercise to the reader.)

Technologies like Asynchronous Transfer Mode (ATM) impose traffic contracts on individual data streams. A traffic contract is a formal description of the bandwidth and burstiness characteristics of the data stream plus any real-time contraints it may have regarding jitter. ATM devices police traffic by marking incoming packets that exceed their data stream's specified traffic contract. Marked packets may be immediately dropped by the hardware, or dropped later down stream should congestion occur. Your local speed trap is a form of traffic policing. ATM devices may shape traffic by timing the introduction of outgoing packets into the network to conform to their data stream's traffic contract. Timed stoplights at freeway entrance ramps are a form of traffic shaping.

Although traffic policing and shaping to an arbitrary traffic contract may sound complicated, there are relatively simple algorithms, such as the virtual scheduling algorithm described the ATM Forum's Traffic Management 4.0 specification, that implement them. Such algorithms are efficient enough that in ATM equipment they are implemented in commercially available chipsets.

Traffic shaping is implemented solely by the producer, and traffic policing solely by the consumer. Traffic shaping smooths data flow, while traffic policing prevents congestion. I like traffic policing and shaping mechanisms not only because I've implemented a few in commercial products, but also because they allow me to reason about the maximum inflow and outflow rates of packets while troubleshooting a customer system.

Connection Admission Control

Connection Admission Control (CAC) in its simplest form is a function that returns true or false when given a traffic contract as an argument. CAC decides whether a network device can handle a new connection with the specified contract. CAC takes into account the traffic contracts of all existing connections, and its intent is to guarantee the fulfillment of existing contracts even if that means rejecting a new connection request.

In service provider networks, rejecting a new connection often means rejecting revenue, so this is not done lightly. The ATM switch or other network device that can accept more connections than another similar device and still fulfill all of its contracts has a leg up on the competition. This kind of intelligence is typically the result of complex statistical traffic models. I have implemented several CAC algorithms in commercial products, but my designs were based upon theoretical work by Ph.D.s in ivory towers. (That didn't keep me from giving the occasional talk on the topic.)

SOA and EDA

I have tried to stress here and there that traffic management is important even if you are not using an actual communications network. Lately where I have seen its need emerge is in projects implementing a Service Oriented Architecture (SOA) or Event Driven Architecture (EDA), often over some kind of Enterprise Service Bus (ESB). Whether their designers realize it or not, these systems are real-time messaging systems, and suffer from all of the traffic management issues of more traditional communications systems.

For the past year I have been working with a team of engineers on an application that uses an implementation of the Java Business Integration (JBI) standard (JSR-208), a specification for a Java-based ESB. JBI defines both synchronous and asynchronous messaging passing mechanisms, but traffic management and flow control are left up to the applications that use it, by virtue of not being discussed in the specification. Companies using JBI or any other ESB (or indeed, implementing any kind of SOA or EDA at all) would be well advised to take into account issues of traffic management in their systems.

Sources

Chip Overclock, "Gene Amdahl and Albert Einstein", October 2006

N. Giroux, et al., Traffic Management Specification Version 4.0, ATM Forum, af-tm-0056.000, April 1996

L. He, et al., "Connection Admission Control Design for GlobeView-2000 ATM Core Switches", Bell Labs Technical Journal, 3.1, January-March 1998

J. L. Sloan, "Introduction to TCP Windows and Window Shifting/Scaling", Digital Aggregates Corp., April 2005

J. L. Sloan, "ATM Traffic Management", Digital Aggregates Corp., August 2005

Ron Ten-Hove, at al., Java Business Integration (JBI), JSR-208, August 2005

3 comments:

Paul said...

Another large factor in shaping traffic is how aggressively policies are enforced. Most applications behave very poorly with even minimal packet drops (voice calls are an exception - the human ear and brain are remarkably adept at hearing in the absence of some data), so giving the producer the chance to slow things up before its packets are discarded is critical. The quality of the communication channel is also key, but most networks have gotten good at this this.

If you drive your car at high speeds and at a very close distance to the car in front of you, you can see the point. Better to slow down and speed up gradually than crash.

Oddly enough, some Service Level Agreements (SLAs) delivered by network vendors focus on latency, which is easy to deliver, versus actually delivering all the packets it receives, which is much harder. But delivering on a latency SLA can lead to very aggressive packet dropping, which in turn can cause incredibly bad application performance. Nice that the SLA is met (vendor doesn't pay you any penalty dollars), but your users are unhappy (which may result in you having to pay penalty dollars).

In any case, knowing exactly your traffic control mechanisms, and their effect on your particular application set, is more than worth the time and effort.

Chip Overclock said...

A former boss of mine, Bill Buzbee, was fond of saying (you have to imagine this in a Texas drawl) something to the effect "A vendor's published performance numbers are a guarantee that their systems will never exceed them." He was talking about supercomputers, but his wisdom is equally applicable (maybe more so) to networking, and I would guess SLAs as well.

I remember a few years ago participating in a conference call between application developers (I was one of them), a network service provider, the engineers from a network equipment manufacturer (who I once worked for), and the end-user who was our customer. Our customer, bless them, had allowed us to test their production system during off hours and we knew exactly at what load the network equipment started to fail. The manufacturer's engineer asked "What's the load at which the failure occurs?" The customer replied "Twenty thousand calls an hour." The engineer, anxious to point the finger away from their equipment, asked us "Do you test your application at that load?" Another one of the developers on our project replied "No, we test our application at one hundred thousand calls an hour." and then she could not resist adding "Twenty thousand calls an hour is kind of wimpy for us." As you might imagine, there was a awkward pause in the conversation, and the manufacturer's engineer said "We have no way to test at that kind of load." The conversation devolved from there.

I have read articles arguing that load testing is no longer really needed. The people that write such articles live in a different universe than I do.

Paul said...

Load testing can be an expensive and useless waste of time. Or it can be absolutely critical. Very situational to me.

Load testing is very worthwhile when introducing a new system/application/major upgrade. A recent web application failed miserably the first time out and was reintroduced after running a number of load tests that pointed out some obvious oversights.

In the business transactional systems I deal with we can more bang for the buck by testing individual transactions really well and look there for performance issues. Other than the first implementation of the application, in which load testing was performed, no other load testing has been done and the application performed as expected.

The more mission critical, the more of all types of testing.