Lies, statistics and vendors
Overview
Reading performance results supplied by vendors is a skill in itself. It can be difficult to compare numbers from different vendors on a fair basis, and even more difficult to estimate how a product will behave in your system.
Lies and statistics
One of the few quotes from university that I remember goes roughly like this:
Peak Performance - A manufacturer's guarantee not to exceed a given rating
-- Computer Architecture, A Quantitative Approach. (1st edition)
At first this appears rather cynical, but over the years I have come to the conclusion that it is unavoidable, and once you accept this you can see the numbers you are given in a new light.
Why is it so hard to give a trustworthy performance number?
There are many challenges in giving good performance numbers. Most vendors try hard to give trustworthy numbers, but it is not as easy as it looks.
- Latencies and throughputs don't follow a normal distribution, which is the basis of mathematically rigorous statistics. This means you are modelling something for which there isn't a generally accepted mathematical model.
- There are many different assumptions you can make, ways to test your solution and ways to represent the results.
- You need to use benchmarks to measure something, but those benchmarks are either a) not standard, b) not representative of your use case, or c) can be optimised for in ways which don't help you.
- Vendors understand their products and sensibly select the best hardware for their product. This works best if you only have one product to consider. Multi-product systems may not have an optimal hardware solution for all the products, even if your organisation allowed you to buy the optimal hardware.
- It is easy to report the best results tested and not include results which were not so good.
Any decent vendor will use their benchmarks to optimise their solution. The downside of this is that the solution will have been optimised more for the benchmarks they report than for use cases the vendor hasn't tested, e.g. your use case.
BTW: I often find it interesting to see what use cases the vendor had in mind when they benchmark their solutions. This can be a good indication of a) what it is good for, b) the assumptions made in designing the solution, and c) how it is generally used already.
Should we ignore all benchmarks?
This can lead people to give up on micro-benchmarks, and benchmarks in general, because they have been "lied" to many times before.
However, used correctly, benchmarks can be a good guide even if they cannot give you definitive or completely reliable answers. As such, I suggest you should be highly sceptical that small differences in performance give you any indication of what you would expect to see, and only take note of wide variations in performance. By wide variations I mean differences of 3 to 10 times.
Percentiles for latency
Customers generally remember the worst service they ever got and take the average service for granted. When looking at the latency of your systems, it is generally the higher latencies which cause the most issues if not customer complaints.
A common approach for modelling the distribution of latencies is to sort all the latencies and report a sample of the worst.
Percentile | One in N | Scale | Notes |
---|---|---|---|
50% | "typical" | 1x | This is a good indication of what is possible. It is the most optimistic figure you could use. |
90% | one in ten | 2x-3x | This is a better indication of performance if tested on a real, complex system. |
99% | one in 100 | 4x-10x | For benchmarks of simplified systems, this is a better indication of what you can realistically expect to achieve. |
99.9% | one in 1,000 | 10x-30x | For benchmarks of simplified systems, this is a conservative indication of what you can expect. |
99.99% | one in 10,000 | 20x-100x | This number is nice to have but difficult to reproduce, even for the same benchmark, let alone for a different use case. See below. |
99.999% | one in 100,000 | varies | This number is almost impossible to reproduce between systems. See below. |
Generally speaking, the latencies escalate geometrically as you get into the higher percentiles. The very high percentiles have limited value, as you have to take more samples to get a reproducible number, even on the same system from one day to the next, and they can vary dramatically based on the use case or system.
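The sort-and-report-the-worst approach described above can be sketched in a few lines of Java. This is a minimal illustration, not a production histogram; the sample latencies in `main` are made-up numbers for demonstration.

```java
import java.util.Arrays;

public class LatencyPercentiles {
    // Returns the latency at the given percentile (fraction between 0.0 and 1.0)
    // by sorting all samples and picking the worst value within that fraction.
    public static long percentile(long[] latencies, double fraction) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        // index of the worst sample within the given fraction of the sorted data
        int index = (int) Math.ceil(fraction * sorted.length) - 1;
        return sorted[Math.max(index, 0)];
    }

    public static void main(String[] args) {
        // hypothetical latencies in microseconds
        long[] latencies = {10, 12, 11, 10, 250, 14, 13, 11, 12, 90};
        System.out.println("50%: " + percentile(latencies, 0.50) + " us");
        System.out.println("90%: " + percentile(latencies, 0.90) + " us");
        System.out.println("99%: " + percentile(latencies, 0.99) + " us");
    }
}
```

Note how even in this tiny sample the 99% figure (the one outlier) is an order of magnitude above the 50% figure, which is exactly the escalation the table describes.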
A guide to the number of samples you need for reproducible numbers
Java has an additional feature: it gets faster as it warms up. In the past I have advocated removing these warm-up figures, but given that micro-benchmarks give overly optimistic figures, I am now more inclined to include them, if for no other reason than it is simpler.
Percentile | One in N | Simple test samples | Complex test samples |
---|---|---|---|
90% | one in ten | ~ 30 | ~ 100 |
99% | one in 100 | ~ 300 | ~ 10,000 |
99.9% | one in 1,000 | ~ 30,000 | ~ 1 million |
99.99% | one in 10,000 | ~1 million | ~ 100 million |
99.999% | one in 100,000 | ~ 30 million | ~ 10 billion |
99.9999% | one in 1,000,000 | ~ one billion | ~ one trillion |
Maximum or 100% | never | Infinite | Infinite |
Based on this rule of thumb, I don't believe a real maximum can be measured empirically. Nevertheless, not reporting it at all isn't satisfactory either. Some benchmarks report the "worst in sample", which is better than nothing, but very hard to reproduce.
To mitigate the cost of warm-up in real systems, I suggest latency-critical classes should be pre-loaded, if not warmed up, on start-up of your application.
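A minimal sketch of doing this at start-up. The class name, the lambda, and the 12,000-iteration count are illustrative assumptions (the count is chosen to be past the HotSpot server compiler's default compile threshold of 10,000 invocations), not requirements.

```java
public class Warmup {
    // Runs a latency-critical code path repeatedly so the JIT compiler
    // optimises it before real traffic arrives. Returns the number of
    // iterations completed.
    public static int warmUp(Runnable criticalPath, int iterations) {
        for (int i = 0; i < iterations; i++)
            criticalPath.run();
        return iterations;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pre-load a class used on the critical path so class loading
        // doesn't add latency to the first real request. (ArrayList is
        // a stand-in for your own latency-critical class.)
        Class.forName("java.util.ArrayList");

        // Exercise a hypothetical critical path past the compile threshold.
        int[] checksum = {0}; // keeps a side effect so the work isn't eliminated
        int done = warmUp(() -> checksum[0] += "NEW-ORDER-42".hashCode(), 12_000);
        System.out.println("warm-up iterations: " + done);
    }
}
```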
In summary
If you are looking for a performance figure you can use, I suggest the 99th percentile as a good indication of what you can expect in a real system. If you want to be cautious, use the 99.9th percentile.
If this number is not given, I would assume you might get about 10x the average or typical latency and 1/10th of the throughput the vendor can get under ideal conditions. Usually this is still more than enough.
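As a sketch, here is the rule of thumb applied to hypothetical vendor figures (the 5 us latency and 1,000,000 msg/s throughput are made-up inputs for illustration):

```java
public class RuleOfThumb {
    // Assume ~10x the quoted typical latency and ~1/10th of the quoted
    // throughput when only ideal-condition figures are available.
    public static double expectedLatency(double typicalLatency) {
        return typicalLatency * 10;
    }

    public static double expectedThroughput(double peakThroughput) {
        return peakThroughput / 10;
    }

    public static void main(String[] args) {
        // hypothetical vendor figures: 5 us typical latency, 1,000,000 msg/s peak
        System.out.println("plan for ~" + expectedLatency(5) + " us");
        System.out.println("plan for ~" + expectedThroughput(1_000_000) + " msg/s");
    }
}
```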
If the vendor quotes performance figures close to what you need, or worse, doesn't quote figures at all, beware! I am amazed how many vendors will say they are fast, quick, the fastest, efficient, or high performance, but don't quote any figures at all.