Performance of inlined virtual method invocations in Java
Overview
One of the benefits of dynamic compilation is the ability to support extensive inlining of virtual method calls.
While inlining improves performance, the inlined code still has to check the type of the object (in case it has changed since the code was optimised) or select between multiple possible implementations.
This raises the question: how does having multiple implementations of a method, called via an interface, affect performance?
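To make the "check the type" step concrete, here is a hand-written Java sketch of roughly what a JIT might produce when profiling has seen only two implementations at one call site. It is an illustration, not the output of any particular JVM: the names `Shape`, `Square`, `Circle` and `totalAreaGuarded` are hypothetical, and a real JIT such as HotSpot may replace the fallback branch with deoptimisation and recompilation instead of a plain virtual call.

```java
import java.util.List;

public class GuardedInlineSketch {
    // Hypothetical interface with two implementations, standing in for
    // "a method called via an interface".
    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    // What the programmer writes: a plain interface call.
    static double totalArea(List<Shape> shapes) {
        double total = 0;
        for (Shape s : shapes) total += s.area();
        return total;
    }

    // Hand-written approximation of guarded inlining: cheap exact-class
    // checks protect the inlined bodies, and an unexpected class falls back
    // to an ordinary virtual call.
    static double totalAreaGuarded(List<Shape> shapes) {
        double total = 0;
        for (Shape s : shapes) {
            if (s.getClass() == Square.class) {
                Square sq = (Square) s;
                total += sq.side * sq.side;                 // inlined Square.area()
            } else if (s.getClass() == Circle.class) {
                Circle c = (Circle) s;
                total += Math.PI * c.radius * c.radius;     // inlined Circle.area()
            } else {
                total += s.area();                          // fallback: virtual call
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Shape> shapes = List.of(new Square(2), new Circle(1), new Square(3));
        System.out.println(totalArea(shapes) == totalAreaGuarded(shapes)); // prints "true"
    }
}
```

Each extra implementation seen at the call site adds another guard (or forces a full virtual dispatch), which is why the number of classes reaching a given call site is the quantity of interest here.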
Benchmarking
This benchmark calls a trivial method on each element of a list. It compares different numbers of classes in the list, and also varies the degree to which the loop is unrolled, which reduces the number of possible classes at each call site. For example, for a list of 12 different classes in a repeating pattern, without unrolling the method invocation could be to any of the 12 classes, but when the loop is unrolled to a degree of two, each call has only 6 possible classes (12/2). When unrolled to a degree of three, there are 4 possible classes for each call (12/3), and when unrolled to a degree of six, there are only 2. Note: each call site (each line of code in the unrolled loop) sees a different set of classes.
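The unrolling scheme can be sketched in Java as follows. This is not the original benchmark code: the interface `Op`, the enum `Impl` and the helper `classesPerCallSite` are hypothetical. Each enum constant with a body compiles to its own anonymous class, which is a compact way to get 12 distinct implementation classes. `sumUnrolled2` shows a loop unrolled to a degree of two, and `classesPerCallSite` counts how many distinct classes each call site would see for a given degree.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnrollSketch {
    // Hypothetical interface standing in for the benchmark's trivial method.
    interface Op {
        int apply(int x);
    }

    // Each constant with a body is a separate anonymous class implementing Op.
    enum Impl implements Op {
        C0 { public int apply(int x) { return x; } },
        C1 { public int apply(int x) { return x + 1; } },
        C2 { public int apply(int x) { return x + 2; } },
        C3 { public int apply(int x) { return x + 3; } },
        C4 { public int apply(int x) { return x + 4; } },
        C5 { public int apply(int x) { return x + 5; } },
        C6 { public int apply(int x) { return x + 6; } },
        C7 { public int apply(int x) { return x + 7; } },
        C8 { public int apply(int x) { return x + 8; } },
        C9 { public int apply(int x) { return x + 9; } },
        C10 { public int apply(int x) { return x + 10; } },
        C11 { public int apply(int x) { return x + 11; } }
    }

    // Not unrolled: a single call site sees every class in the list.
    static int sum(Op[] ops, int x) {
        int total = 0;
        for (Op op : ops) total += op.apply(x);
        return total;
    }

    // Unrolled to a degree of two: two call sites, one seeing only the classes
    // at even positions, the other only those at odd positions.
    // Assumes ops.length is a multiple of 2.
    static int sumUnrolled2(Op[] ops, int x) {
        int total = 0;
        for (int i = 0; i < ops.length; i += 2) {
            total += ops[i].apply(x);     // call site 0
            total += ops[i + 1].apply(x); // call site 1
        }
        return total;
    }

    // Distinct classes seen by each call site when unrolled to `degree`
    // (call site k handles the elements at indices where i % degree == k).
    static List<Integer> classesPerCallSite(Op[] ops, int degree) {
        List<Set<Class<?>>> seen = new ArrayList<>();
        for (int k = 0; k < degree; k++) seen.add(new HashSet<>());
        for (int i = 0; i < ops.length; i++) seen.get(i % degree).add(ops[i].getClass());
        List<Integer> counts = new ArrayList<>();
        for (Set<Class<?>> s : seen) counts.add(s.size());
        return counts;
    }

    public static void main(String[] args) {
        Op[] pattern = Impl.values();
        // Repeat the 12-class pattern so the list spans many cycles.
        Op[] ops = new Op[pattern.length * 100];
        for (int i = 0; i < ops.length; i++) ops[i] = pattern[i % pattern.length];

        System.out.println(sum(ops, 1) == sumUnrolled2(ops, 1)); // prints "true"
        for (int degree : new int[] {1, 2, 3, 4, 6}) {
            System.out.println("degree " + degree + ": " + classesPerCallSite(ops, degree));
        }
    }
}
```

For degrees 1, 2, 3, 4 and 6 this prints per-site counts of 12, 6, 4, 3 and 2 respectively, matching the 12/2, 12/3 and 12/6 arithmetic in the text.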
These are the results on a 2.5 GHz Intel Core i5. Times are in nanoseconds. The column headings give the number of different classes used in the list; the row headings give the number of possible classes called from a given line of code (call site).
| | 1 used | 2 used | 3 used | 4 used | 6 used | 8 used | 12 used |
|---|---|---|---|---|---|---|---|
| 1 per call site | 48.4 | 46.6 | 46.9 | 43.7 | 48.7 | 54.9 | |
| 2 per call site | | 115.8 | | 80.5 | 92.8 | 87 | 112 |
| 3 per call site | | | 285 | | 283 | | 271 |
| 4 per call site | | | | 669 | | 281 | 275 |
| 6 per call site | | | | | 562 | | 275 |
| 8 per call site | | | | | | 498 | |
| 12 per call site | | | | | | | 530 |
It appears that the number of classes loaded, or even the number of classes in the list, is not as important as the number of possible classes called from a given piece of code. It also appears that different pieces of code in the same loop can be optimised independently.
You can see that if a given call can only ever reach one class, the number of classes used elsewhere in the program makes little difference.
Another implication is that if you are comparing two alternatives via a common interface, you have to be careful how you run the tests, so that the first one run is not favoured by the fact that only one type has been used so far. To address this, I suggest running all the tests multiple times, for at least 2 seconds in total. This should reduce the bias associated with the order in which you perform the tests.
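One way to follow that advice is a small harness that interleaves the alternatives round after round until a minimum total time has passed, so that every alternative is measured after all types have been seen at the shared call site. The sketch below is an assumption-laden illustration, not the author's harness: `Op`, `AddOne`, `TimesTwo`, `work` and `run` are hypothetical names, and keeping the best (minimum) time per alternative is just one reasonable choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InterleavedHarness {
    // Two hypothetical alternatives behind a common interface.
    interface Op { long apply(long x); }
    static final class AddOne implements Op { public long apply(long x) { return x + 1; } }
    static final class TimesTwo implements Op { public long apply(long x) { return x * 2; } }

    // Shared benchmark body: the op.apply call site is used by every
    // alternative, so its profile depends on which types have run so far.
    static long work(Op op, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) total += op.apply(i);
        return total;
    }

    // Accumulates results so the JIT cannot discard the work as dead code.
    static long sink;

    // Runs every alternative in turn, round after round, until at least
    // minNanos have elapsed in total, keeping the best time per alternative.
    static Map<String, Long> run(Map<String, Op> alternatives, int n, long minNanos) {
        Map<String, Long> best = new LinkedHashMap<>();
        long start = System.nanoTime();
        do {
            for (Map.Entry<String, Op> e : alternatives.entrySet()) {
                long t0 = System.nanoTime();
                sink += work(e.getValue(), n);
                best.merge(e.getKey(), System.nanoTime() - t0, Long::min);
            }
        } while (System.nanoTime() - start < minNanos);
        return best;
    }

    public static void main(String[] args) {
        Map<String, Op> alternatives = new LinkedHashMap<>();
        alternatives.put("AddOne", new AddOne());
        alternatives.put("TimesTwo", new TimesTwo());
        // Run for at least 2 seconds in total.
        Map<String, Long> best = run(alternatives, 10_000, 2_000_000_000L);
        best.forEach((name, nanos) -> System.out.println(name + ": best " + nanos + " ns"));
    }
}
```

Because the alternatives alternate within each round, neither one gets to run while the shared call site has seen only a single type, which is the ordering bias described above. Reporting a median or mean instead of the minimum would also be reasonable.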