What are latency, throughput and degree of concurrency?

chrisapotek asked.

How do you define throughput and latency for your test?

This is not a simple question, so I have replied with a post.

Sustained Throughput

I consider throughput to be the number of actions a process can perform over a sustained period of time, between 10 seconds and a day (assuming you have a quiet period overnight to catch up). I measure this as the number of actions per second or megabytes (MB) per second, but I feel the test needs to run for more than a second to be robust. Shorter tests can still report a throughput of X/s, but this can be unrealistic because systems are designed to handle bursts of activity with caches and buffers.

If you test one behaviour alone, you get a figure which assumes nothing else is running on the system and that the limits of these buffers are not important. When you run a real application on a real machine doing other things, it will not have full use of the caches, buffers, memory and bandwidth, and you may not get within 2-3x of the sustained throughput, let alone the more optimistic burst throughput. A SATA HDD can report a burst throughput of 500 MB/s, but it might only achieve a sustained 40 MB/s. When running a real program you might expect to get 15-25 MB/s.
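As a rough illustration of measuring throughput over a sustained period rather than a short burst, here is a minimal sketch. The class and method names are hypothetical, and the counter increment is only a stand-in for whatever action your system actually performs.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SustainedThroughput {
    /**
     * Runs the action repeatedly for roughly durationMillis and returns
     * the observed rate in actions per second.
     */
    static double actionsPerSecond(Runnable action, long durationMillis) {
        long start = System.nanoTime();
        long deadline = start + durationMillis * 1_000_000L;
        long count = 0;
        // Compare by subtraction, as recommended for System.nanoTime() values.
        while (System.nanoTime() - deadline < 0) {
            action.run();
            count++;
        }
        long elapsedNanos = System.nanoTime() - start;
        return count * 1e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(); // stand-in for real work
        // A short run mostly shows burst behaviour; a longer run is closer to sustained.
        double burst = actionsPerSecond(counter::incrementAndGet, 100);
        double sustained = actionsPerSecond(counter::incrementAndGet, 10_000);
        System.out.printf("100 ms run: %.0f/s, 10 s run: %.0f/s%n", burst, sustained);
    }
}
```

A trivial in-memory action like this will show little difference between the two runs. The gap tends to appear once the action touches disk, network or anything else with a buffer that can fill up.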

Latency

There are two ways to report latency: one-way latency and round trip latency (or Round Trip Time). Often the first is reported because it is lower, but it is difficult to measure accurately as you need synchronised clocks at both ends. For this reason you often measure the round trip latency (as you can use just one accurate clock) and possibly halve it to infer the one-way latency. I tend to be interested in what you can expect from a real application, and the higher round trip latency is usually a better indication.

A common way to estimate latency is to take the inverse of the throughput. While this is easier to calculate, it is only comparable to other tests measured this way because it gives you the most optimistic view of the latency. For example, if you send messages asynchronously over TCP on loopback you may be able to send two million messages per second, and you might infer that the latency is the inverse of this, or 500 ns each. If you place a timestamp in each message you may find the typical time between sending and receiving is actually closer to 20 micro-seconds. What can you infer from this discrepancy? That there are around 40 (20 us / 500 ns) messages in flight at any time.

Typical, average and percentile latency

Typical latency can be calculated by taking the individual latencies, sorting them and taking the middle value. This can be a fairly optimistic value, but because it is the lowest, it can be the value you might like to report. The average latency is the sum of latencies divided by the count. This is often reported because it is the simplest to calculate and it is easy to understand what it means. Because it takes into account all values, it can be more realistic than the typical latency.

A more conservative view is to report a percentile of latency like the 90%, 99%, 99.9% or even 99.99% latency. This is calculated by sorting the individual latencies and taking the value at the start of the highest 10%, 1%, 0.1% or 0.01%. As this represents the latency you will get, or better, most of the time, it is a better figure to work with. The typical latency is actually the 50th percentile.

It can be useful to compare the typical and average latencies to see how "flat" the distribution is. If the typical and average latencies are within 10% of each other, I consider this to be fairly flat. A much bigger difference than this indicates opportunities to optimise your performance. In a well performing system I look for about a factor of 2x in latency between each of the 90%, 99% and 99.9% latencies.
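To make these definitions concrete, here is a minimal sketch of the three calculations. The class name, the method names and the sample values are all made up for illustration (they are not measurements), and the percentile uses the simple nearest-rank method, which is only one of several reasonable conventions.

```java
import java.util.Arrays;

final class LatencyStats {
    /**
     * Nearest-rank percentile of an ascending-sorted array.
     * fraction is in (0, 1], e.g. 0.5 for the typical latency, 0.99 for the 99%.
     */
    static long percentile(long[] sorted, double fraction) {
        int rank = (int) Math.ceil(fraction * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Sum of the latencies divided by the count. */
    static double average(long[] latencies) {
        long sum = 0;
        for (long latency : latencies) {
            sum += latency;
        }
        return (double) sum / latencies.length;
    }

    public static void main(String[] args) {
        // Invented sample latencies in micro-seconds, with one large outlier.
        long[] latencies = {12, 500, 10, 13, 11, 30, 14, 12, 15, 13};
        Arrays.sort(latencies);
        System.out.println("typical (50%): " + percentile(latencies, 0.5));
        System.out.println("average:       " + average(latencies));
        System.out.println("90%:           " + percentile(latencies, 0.9));
    }
}
```

With these invented values a single outlier pulls the average well above the typical latency, which is the kind of gap that suggests the distribution is not flat.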

The distribution of latencies often has what are called "fat tails". Every so often you will have values which are much larger than all the other values. These can be 10 - 1000x higher. This is what makes looking at the average or percentile latencies more important, as these are the ones which will cause you trouble. The typical latency is more useful for determining if the system can be optimised.

A test which reports these latencies and throughputs

The test in "How much difference can thread affinity make" is what I call an echo or ping test. One thread or process sends a short message which contains a timestamp. The service picks up the message and sends it back. The original sender reads the message and compares the timestamp in the message with another timestamp it takes when the message is read. The difference is the latency, measured in nano-seconds (or micro-seconds in some tests I do).
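The shape of such an echo test can be sketched as below. This is not the code from that test: it assumes two threads in one JVM exchanging messages over `ArrayBlockingQueue`s instead of sockets, the class name is hypothetical, and the boxing and queue overhead will inflate the numbers. It only illustrates the timestamp-in, timestamp-out pattern.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class EchoTest {
    /** Sends count timestamped messages to an echo thread; returns each round trip time in ns. */
    static long[] run(int count) {
        BlockingQueue<Long> requests = new ArrayBlockingQueue<>(1);
        BlockingQueue<Long> replies = new ArrayBlockingQueue<>(1);

        // The "service": picks up each message and sends it straight back.
        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    replies.put(requests.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.setDaemon(true);
        echo.start();

        long[] latencies = new long[count];
        try {
            for (int i = 0; i < count; i++) {
                requests.put(System.nanoTime());           // the message is the send timestamp
                long sentAt = replies.take();              // read the echoed message
                latencies[i] = System.nanoTime() - sentAt; // same clock at both ends
            }
            echo.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during echo test", e);
        }
        return latencies;
    }

    public static void main(String[] args) {
        run(10_000); // warm up so the JIT has compiled the hot paths
        long[] latencies = run(100_000);
        Arrays.sort(latencies);
        System.out.println("typical round trip (ns): " + latencies[latencies.length / 2]);
        System.out.println("99% round trip (ns):     " + latencies[(int) (latencies.length * 0.99)]);
    }
}
```

Because the sender takes both timestamps, only one clock is involved, which is the reason round trip latency is easier to measure than one-way latency.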

Wouldn't less latency lead to more throughput? Can you explain that concept in mere mortal terms?

There are many techniques which improve both latency and throughput, e.g. using faster hardware or optimising the code to make it faster. However, some techniques improve only throughput OR latency. For example, using buffering, batching or asynchronous communication (in NIO2) improves throughput, but at the cost of latency. Conversely, making the code as simple as possible and reducing the number of hops tends to reduce latency but may not give as high a throughput, e.g. sending one byte at a time instead of using a buffered stream. Each byte can be received with lower latency, but throughput suffers.

Can you explain that concept in mere mortal terms?

In simplest terms, latency is the time per action and throughput is the number of actions per time.

The other concept I use is the quantity "in flight" or "degree of concurrency", which is given by Concurrency = Throughput * Latency.

Degree of Concurrency examples

If a task takes 1 milli-second and the throughput is 1,000 per second, the degree of concurrency is 1 (1/1000 * 1000). In other words, the task is single-threaded.
If a task takes 20 micro-seconds and the throughput is 2 million messages per second, the number "in flight" is 40 (2e6 * 20e-6).
If a HDD has a latency of 8 ms but can write 40 MB/s, the amount of data written per seek is about 320 KB (40e6 B/s * 8e-3 s = 3.2e5 B).
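
The arithmetic in these three examples can be checked with a few lines of code. This is only a sketch of Concurrency = Throughput * Latency; the class and method names are hypothetical, and throughput is taken per second with latency in seconds.

```java
final class DegreeOfConcurrency {
    /** Concurrency = Throughput * Latency, with throughput per second and latency in seconds. */
    static double inFlight(double throughputPerSecond, double latencySeconds) {
        return throughputPerSecond * latencySeconds;
    }

    public static void main(String[] args) {
        // 1 ms task at 1,000 per second
        System.out.printf("tasks in flight:    %.1f%n", inFlight(1_000, 1e-3));
        // 20 micro-second latency at 2 million messages per second
        System.out.printf("messages in flight: %.1f%n", inFlight(2e6, 20e-6));
        // 8 ms seek at 40 MB/s, giving bytes written per seek
        System.out.printf("bytes per seek:     %.0f%n", inFlight(40e6, 8e-3));
    }
}
```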

Vanilla Java