Network Testing - Fault Emulation
Network emulation, that is, generating all kinds of losses and other real-life scenarios, is a useful technique for testing. We could go low-level and use libpcap's packet injection capabilities to alter our network, but that requires quite a bit of effort. A quick and dirty iptables shell script that drops packets is not quite good enough either. But if we look at a current Linux 2.6 kernel distribution, netem is already enabled in the kernel and a current version of iproute2 is included. The netem kernel component is enabled under Networking --> QoS and/or fair queueing --> Network emulator (CONFIG_NET_SCH_NETEM).
Netem is controlled by the command line tool 'tc' which is part of the iproute2 package of tools. The tc command uses shared libraries and data files in the /usr/lib/tc directory.
There are various capabilities that can be exercised with the tc command line tool, in a setup like the following:
Each WAN node can run its own set of tc commands and achieve the faulty network behaviour it desires. Note that this diagram is simplified, and it requires the Linux router to be the machine running the tc commands. This post is meant to display the capabilities of netem, not to be a step-by-step guide.
Examples
Emulating wide area network delays
This is the simplest example, it just adds a fixed amount of delay to all packets going out of the local Ethernet.
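A minimal command for this, assuming eth0 is the outgoing interface:

```shell
# Add a fixed 100ms delay to all packets leaving eth0
tc qdisc add dev eth0 root netem delay 100ms
```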
Now a simple ping test to a host on the local network should show an increase of 100 milliseconds. The delay is limited by the clock resolution of the kernel (HZ). On most 2.4 systems, the system clock runs at 100 Hz, which allows delays in increments of 10ms. On 2.6, the value is a configuration parameter ranging from 1000 down to 100 Hz. Later examples just change parameters without reloading the qdisc.
Real wide area networks show variability so it is possible to add random variation.
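With eth0 assumed as the interface again, the jitter is given as a second value after the delay:

```shell
# Change the qdisc: 100ms delay with ±10ms of random variation
tc qdisc change dev eth0 root netem delay 100ms 10ms
```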
This causes the added delay to be 100ms ± 10ms. Network delay variation isn't purely random, so to emulate that there is a correlation value as well.
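The correlation is appended as a percentage after the jitter value:

```shell
# 100ms ± 10ms, with the next delay depending 25% on the previous one
tc qdisc change dev eth0 root netem delay 100ms 10ms 25%
```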
This causes the added delay to be 100ms ± 10ms with the next random element depending 25% on the last one. This isn't true statistical correlation, but an approximation.
Delay distribution
Typically, the delay in a network is not uniform. It is more common to use something like a normal distribution to describe the variation in delay. The netem discipline can take a table to specify a non-uniform distribution.
The actual tables (normal, pareto, paretonormal) are generated as part of the iproute2 compilation and placed in /usr/lib/tc; so it is possible with some effort to make your own distribution based on experimental data.
Packet loss
Random packet loss is specified in the 'tc' command in percent. The smallest possible non-zero value is:
2^-32 = 0.0000000232%
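A more practical loss rate, again assuming eth0:

```shell
# Drop 0.1% of packets at random
tc qdisc change dev eth0 root netem loss 0.1%
```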
This causes 1/10th of a percent (i.e., 1 out of 1000) of packets to be randomly dropped.
An optional correlation may also be added. This causes the random number generator to be less random and can be used to emulate packet burst losses.
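The correlation is given as a second value after the loss percentage (eth0 assumed):

```shell
# 0.3% loss, each loss probability depending 25% on the previous one
tc qdisc change dev eth0 root netem loss 0.3% 25%
```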
This will cause 0.3% of packets to be lost, and each successive probability depends by a quarter on the last one.
Prob(n) = 0.25 * Prob(n-1) + 0.75 * Random
Packet duplication
Packet duplication is specified the same way as packet loss.
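For instance, to duplicate 1% of outgoing packets on eth0 (interface name assumed):

```shell
# Duplicate 1% of the packets
tc qdisc change dev eth0 root netem duplicate 1%
```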
Packet corruption
Random noise can be emulated (in 2.6.16 or later) with the corrupt option. This introduces a single bit error at a random offset in the packet.
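A corruption rate is given in percent, just like loss (eth0 assumed):

```shell
# Flip a single random bit in 0.1% of the packets
tc qdisc change dev eth0 root netem corrupt 0.1%
```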
Packet re-ordering
There are two different ways to specify reordering. The first method gap uses a fixed sequence and reorders every Nth packet. A simple usage of this is:
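With the assumed eth0 interface, the gap form looks like this:

```shell
# Send every 5th packet immediately; delay all other packets by 10ms
tc qdisc change dev eth0 root netem gap 5 delay 10ms
```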
This causes every 5th (10th, 15th, ...) packet to be sent immediately and every other packet to be delayed by 10ms. This is predictable and useful for base protocol testing such as reassembly.
The second form of re-ordering, reorder, is more like real life. It causes a certain percentage of the packets to get mis-ordered.
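The reorder percentage and an optional correlation follow the keyword (eth0 assumed):

```shell
# 25% of packets (with 50% correlation) are sent immediately; the rest get 10ms delay
tc qdisc change dev eth0 root netem delay 10ms reorder 25% 50%
```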
In this example, 25% of packets (with a correlation of 50%) will get sent immediately, others will be delayed by 10ms.
Newer versions of netem will also re-order packets if the random delay values are out of order. The following will cause some reordering:
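One way to trigger this, assuming eth0, is to make the jitter large relative to the spacing between packets:

```shell
# Jitter larger than the inter-packet gap causes reordering on its own
tc qdisc change dev eth0 root netem delay 100ms 75ms
```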
If the first packet gets a random delay of 100ms (100ms base - 0ms jitter) and the second packet is sent 1ms later and gets a delay of 50ms (100ms base - 50ms jitter), the second packet will be sent first. This is because the queue discipline tfifo inside netem keeps packets in order by time to send.
Rate control
There is no rate control built into the netem discipline; instead, use one of the other disciplines that does do rate control. In this example, we use the Token Bucket Filter (TBF) to limit output.
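A sketch of this combination, with eth0 assumed as the interface: netem is installed as the root qdisc and TBF is attached beneath it.

```shell
# netem as the root qdisc, with an explicit handle so a child can be attached
tc qdisc add dev eth0 root handle 1:0 netem delay 100ms
# TBF attached under netem to limit the rate to 256kbit
tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 256kbit buffer 1600 limit 3000
```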
Check the options for buffer and limit, as you might find you need bigger values than these (they are in bytes).
Non FIFO queuing
Just like the previous example, any of the other queuing disciplines (GRED, CBQ, etc) can be used.
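For instance, a plain pfifo queue can replace netem's internal tfifo (eth0 assumed):

```shell
# netem as the root qdisc
tc qdisc add dev eth0 root handle 1:0 netem delay 100ms
# A pfifo queue attached beneath it instead of the default tfifo
tc qdisc add dev eth0 parent 1:1 pfifo limit 1000
```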
Delaying only some traffic
Here is a simple example that only controls traffic to one IP address.
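A sketch of the setup, assuming eth0 as the interface; the delay and rate values here are illustrative:

```shell
# A simple priority queueing discipline at the root
tc qdisc add dev eth0 root handle 1: prio
# TBF on band 3 for rate control
tc qdisc add dev eth0 parent 1:3 handle 30: tbf rate 20kbit buffer 1600 limit 3000
# A netem qdisc attached under the TBF
tc qdisc add dev eth0 parent 30:1 handle 31: netem delay 200ms 10ms distribution normal
# Filter: classify traffic to 65.172.181.4 as priority 3
tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 \
    match ip dst 65.172.181.4/32 flowid 1:3
```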
These commands make a simple priority queueing discipline, then add a TBF to do rate control, then attach a basic netem qdisc. Finally, a filter classifies all packets going to 65.172.181.4 as priority 3.
Netem is very powerful, and network administrators and testers should definitely get their hands on it!