Low GC in Java: Use primitives instead of wrappers

Overview

There are two good reason to use primitives instead of wrappers where possible.
  • Clarity. By using a primitive, you are making it clear that a null value is not appropriate.
  • Performance. Using primitives is often much faster.
Clarity is often more important than performance, and is the best reason to use them. However, this article discussed the performance implications of using wrappers.

I have had a lot of interest in this article How to avoid Garbage Collection, however this was lacking in much practical detail. This is the first article in a series on ways to reduce demands on the GC.

A followup article Low GC in Java: Using primitives looks at using primitives and collections which support them.

Performance of using wrappers

The following micro-benchmark behaves in a way many application do.
Loop using Wrappers and Wrapper Collection
Map<Integer, Integer> counters = new HashMap<Integer, Integer>();
int runs = 20 * 1000;
for (Integer i = 0; i < runs; i++) {
    Integer x = i % 12;
    Integer y = i / 12 % 12;
    Integer times = x * y;
    Integer count = counters.get(times);
    if (count == null)
        counters.put(times, 1);
    else
        counters.put(times, count + 1);
}
This creates objects for each task. While it is common practice to use int for loop counters its is also common practice to use Iterator You can play around with the types and parameters of this micro-benchmark, however you get a memory profile which will be familiar to many developers who have tried to tune their application. Using VisualVM the heap usage looks like this over a five minute period.
There was 20 minor GCs in about 6 minutes.
The average time of each loop is sub-microsecond which is pretty fast.

Took 4,099 ns per loop
Took 559 ns per loop
Took 115 ns per loop
Took 240 ns per loop
Took 255 ns per loop
In the first test, the JVM hasn't warmed up.

Can using primitives really make much difference?

Performance of using primitives

The following benchmark behaves rather differently to most applications. Even though it is doing the same work as the previous benchmark, there is no objects created.
Loop using Primitives and array
int[] counters = new int[144];
int runs = 20 * 1000;
for (int i = 0; i < runs; i++) {
    int x = i % 12;
    int y = i / 12 % 12;
    int times = x * y;
    counters[times]++;
}
and the heap usage reflects this
There was no GCs over a period of 5 minutes. The test could have run longer and still not triggered a GC.
And the average time per loop is much lower as well

Took 198 ns per loop
Took 17 ns per loop
Took 16 ns per loop
Took 14 ns per loop
Took 15 ns per loop
In the first test, the JVM hasn't warmed up.

Conclusion

Using primitives will perform better. (Unless there is excessive boxing and unboxing)

Even in applications where performance is not critical, it will improve clarity, both of the code and when you do attempt to profile your application, it will reduce the level of "noise" making it clearer as to what the problem is.

Notes

Even in the test where few objects were created, you can see some object allocation. This is mostly due to VisualVM's polling. To reduce this I changed the polling interval from 3 seconds to 20 seconds.

The Eden size was increased to make the graphs clearer with -XX:NewSize=100m This value is not recommend (except perhaps for micro-benchmarks) but its is a parameter you may need to tune for your application.

Full code

Primitive benchmark
Wrapper benchmark

Comments

  1. What am I missing here?

    First test, with boxed values is using an HashMap and the second is by using an fixed size array.

    I think the entry in hashmap is mutable all right, but in no way should you be taking timings from hashmap vs. table access (that is the slowest part in first test).

    You should have at least made the test better by using an array in first as well.

    Next step could be to allow the heap grow way more than necessary, take a heap dump somewhere near it's maximum and inspect it. If it's full of Integer instances and those are not in eden, then you are correct. If not, you are wrong.

    I would say that hotspot should stop any boxing activities after a while, especially for your trivial loop.

    What java version were you using? How about explaining why the heap grows all the time, even when using the int array? I suspect this has more to do with JMX issues and leaks than autoboxing.

    ReplyDelete
  2. @joonas, Thank you for your thoughtful comment.

    You are right that there are a number of differences between the first and second test and any one of these could be important. However from the perspective of using wrappers vs using primitives its follows the expected. I should change the test so it is a clearer example. (i.e. with less differences)

    I don't see what difference it would make if more Integer or Map.Entry objects are created.

    Hotspot will cache certain autoboxed values, its not something which changes as the JVM warms up. (See the java.lang.Integer.Cache for more details).

    I am using Java 6 update 25.

    The first note near the end I mention why there is some garbage being produced and what I did to reduce it. The JMX is identical in both cases so it does not explain the difference.

    There is no leak in the first example, you can see after each GC the used memory is the same. In the second test, if I trigger a GC manually it goes back to the same value.

    ReplyDelete
  3. you can create the HashMap like "new HashMap(144)" so the map won't take time updating its internal structures, but I also think that an array is MUCH faster than an HashMap and it does not need initialization

    ReplyDelete
  4. @scanti, Thank you for your suggestions. HashMap will reach this size after the first 144 of 20,000 entries are updated.

    In term of garbage collection I don't expect the results to change significantly. I need to make the test more like for like and the performance difference should narrow.

    ReplyDelete
  5. I really think the problem is in HashMap too. When using wrappers, you only cause unboxing wich means, basically a method call (which is always the same). Internal representation of Integer is an ... int.
    But, with the HashMap, you first get() so you cause a search then put() so you cause an hash calculation then a store in internal entries. Besides, you cause a redimensionning of the internal entries.
    Using an array causes a direct access by JVM virtual memory adressing. That's arythmetic access, not hash search nor hash calculation. This is the cause of the low speed.

    Can you please indicate the % of time caused by GC ?

    ReplyDelete
  6. @waddle, There are a number of factors which cause low speed. I am writing a new article which will only use different hash maps.

    The main purpose of the comparison was to look at the memory usage profile.

    I will try to include some information on the time spent doing a GC. In the primitive example. 0% of the time was spent doing a GC. ;)

    ReplyDelete
  7. I agree primitives takes least memory if you have and that's the reason I prefer Array over any other collection wherever possible because with array you can get same performance but with less memory.

    Thanks
    Javin
    How GC works in Java

    ReplyDelete
  8. @Javin Paul, Similarly you can use use collections like Trove4j which use primitives in collections without wrappers.

    ReplyDelete
  9. In your second example, you should also consider moving the declaration of x, y and times to outside the loop.

    ReplyDelete
  10. @Ishwor, That raises a good question. I hope you don't mind me mentioning you in http://vanillajava.blogspot.co.uk/2012/12/local-variables-inside-loop-and.html

    ReplyDelete
  11. Could you export sources of example to GitHub? Thanks

    ReplyDelete

Post a Comment

Popular posts from this blog

Low Latency Microservices, A Retrospective

Unusual Java: StackTrace Extends Throwable

System wide unique nanosecond timestamps