Reading/writing GC-less memory
Overview
How you access data can make a difference to its speed. Whether you unroll loops manually or let the JIT do it for you can also make a difference to performance.
I have included C++ and Java tests doing the same thing for comparison.
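The sketch below shows the two loop styles for a long[]. The first is a plain loop, which the JIT is free to optimise and unroll itself. The second is unrolled by hand four times. This is a minimal sketch, not the code in MemoryTest.java, and the class UnrollSketch and its method names are hypothetical.

```java
public final class UnrollSketch {
    // Plain loop: the JIT may unroll this itself.
    static void writeSimple(long[] array) {
        for (int i = 0; i < array.length; i++)
            array[i] = i;
    }

    // Hand unrolled four times, with a tail loop for lengths not divisible by 4.
    static void writeUnrolled(long[] array) {
        int i = 0;
        int limit = array.length - 3;
        for (; i < limit; i += 4) {
            array[i] = i;
            array[i + 1] = i + 1;
            array[i + 2] = i + 2;
            array[i + 3] = i + 3;
        }
        for (; i < array.length; i++)
            array[i] = i;
    }

    // Plain read loop.
    static long sumSimple(long[] array) {
        long sum = 0;
        for (int i = 0; i < array.length; i++)
            sum += array[i];
        return sum;
    }

    // Read loop hand unrolled four times, plus a tail loop.
    static long sumUnrolled(long[] array) {
        long sum = 0;
        int i = 0;
        int limit = array.length - 3;
        for (; i < limit; i += 4)
            sum += array[i] + array[i + 1] + array[i + 2] + array[i + 3];
        for (; i < array.length; i++)
            sum += array[i];
        return sum;
    }

    public static void main(String[] args) {
        long[] a = new long[1003];
        long[] b = new long[1003];
        writeSimple(a);
        writeUnrolled(b);
        System.out.println(sumSimple(a) + " " + sumUnrolled(b)); // prints "502503 502503"
    }
}
```

The tail loop keeps the unrolled version correct for any length. Both versions compute the same result, so any difference in the timings comes only from how the loop is compiled.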
Tests
In each case, different approaches to storing 16 GB of data were compared. The tests compared:
- allocating, writing to, reading from and total GC times
- byte (smallest primitive) and long (largest primitive)
- arrays, direct ByteBuffer and Unsafe
- JIT optimised and hand unrolled four times
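The three Java storage options differ mainly in who owns the memory. The sketch below writes and reads the same 1,000 longs through a heap long[], a direct ByteBuffer and sun.misc.Unsafe. It assumes a JDK where sun.misc.Unsafe can still be obtained by reflecting on its private theUnsafe field. That is a widely used workaround, not a supported API, and recent JDKs print deprecation warnings for it. The class name ThreeStores is hypothetical.

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import sun.misc.Unsafe;

public final class ThreeStores {
    static final int N = 1000;

    // Unsafe has no public factory for application code; read the private
    // static "theUnsafe" field instead (unsupported workaround).
    static Unsafe getUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Heap array: zeroed on allocation, managed by the GC.
    static long viaArray() {
        long[] array = new long[N];
        for (int i = 0; i < N; i++) array[i] = i;
        long sum = 0;
        for (int i = 0; i < N; i++) sum += array[i];
        return sum;
    }

    // Direct ByteBuffer: off-heap, zeroed on allocation, indexed in bytes.
    static long viaDirectByteBuffer() {
        ByteBuffer buffer = ByteBuffer.allocateDirect(N * 8).order(ByteOrder.nativeOrder());
        for (int i = 0; i < N; i++) buffer.putLong(i * 8, i); // absolute byte offset
        long sum = 0;
        for (int i = 0; i < N; i++) sum += buffer.getLong(i * 8);
        return sum;
    }

    // Unsafe: raw off-heap block, NOT zeroed, must be freed explicitly.
    static long viaUnsafe() {
        Unsafe unsafe = getUnsafe();
        long address = unsafe.allocateMemory(N * 8L);
        try {
            for (int i = 0; i < N; i++) unsafe.putLong(address + i * 8L, i);
            long sum = 0;
            for (int i = 0; i < N; i++) sum += unsafe.getLong(address + i * 8L);
            return sum;
        } finally {
            unsafe.freeMemory(address);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaArray() + " " + viaDirectByteBuffer() + " " + viaUnsafe());
    }
}
```

A new long[] and ByteBuffer.allocateDirect both zero their memory. Unsafe.allocateMemory returns uninitialised memory. This may account for the allocation column below, where arrays and ByteBuffers take seconds and Unsafe takes microseconds.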
|Structure||Memory||Element||Hand unrolled||Allocate||Write||Read||Total GC|
|C++ char||native||8-bit char||no||31 μs||12.0 s||8.7 s||N/A|
|C++ char||native||8-bit char||yes||5 μs||8.8 s||6.6 s||N/A|
|C++ long long||native||64-bit int||no||11 μs||4.6 s||1.4 s||N/A|
|C++ long long||native||64-bit int||yes||12 μs||4.2 s||1.2 s||N/A|
|byte||heap||byte||no||4.9 s||20.7/7.8 s||7.4 s||51 ms|
|byte||heap||byte||yes||4.9 s||7.1 s||8.5 s||44 ms|
|long||heap||long||no||4.7 s||1.6 s||1.5 s||37 ms|
|long||heap||long||yes||4.7 s||1.5 s||1.4 s||45 ms|
|ByteBuffer||direct||byte||no||4.8 s||18.1/10.0 s||14.0 s||6.1 ms|
|ByteBuffer||direct||byte||yes||4.8 s||12.2/10.0 s||16.7 s||6.1 ms|
|ByteBuffer||direct||long||no||4.7 s||6.0/3.9 s||2.4 s||6.1 ms|
|ByteBuffer||direct||long||yes||4.6 s||4.7/2.3 s||7.9 s||6.1 ms|
|Unsafe||direct||byte||no||10 μs||18.2 s||13.8 s||6.0 ms|
|Unsafe||direct||byte||yes||10 μs||8.7 s||8.3 s||6.0 ms|
|Unsafe||direct||long||no||10 μs||5.2 s||1.9 s||6.0 ms|
|Unsafe||direct||long||yes||10 μs||4.2 s||1.3 s||6.0 ms|
In each case, this is the time to perform 8-bit byte or 64-bit long operations over 16 GB of data in the structure listed. In C++ and with Unsafe, a single block of memory was used. For Java arrays and ByteBuffers, multiple objects were used to make up the same total amount of space, because a single array or ByteBuffer is indexed by an int and cannot hold 16 GB.
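Splitting the data across several buffers means every long index has to be mapped to a chunk number and an offset within that chunk. The sketch below does this with direct ByteBuffers. It assumes power-of-two chunk sizes, so the mapping is a shift and a mask. ChunkedLongStore is a hypothetical name, and the chunk size actually used in MemoryTest.java is not stated in this post.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class ChunkedLongStore {
    private final ByteBuffer[] chunks;
    private final int shift;   // log2(longs per chunk)
    private final long mask;   // longs per chunk - 1

    // size: total number of longs. shift: log2 of longs per chunk, at most 27
    // (1 GB per chunk) so that every byte offset fits in an int.
    ChunkedLongStore(long size, int shift) {
        if (size < 0 || shift < 0 || shift > 27)
            throw new IllegalArgumentException("bad size or shift");
        this.shift = shift;
        this.mask = (1L << shift) - 1;
        long perChunk = 1L << shift;
        int count = (int) ((size + perChunk - 1) >>> shift);
        chunks = new ByteBuffer[count];
        for (int c = 0; c < count; c++) {
            // The last chunk may be smaller than the others.
            long longs = Math.min(perChunk, size - ((long) c << shift));
            chunks[c] = ByteBuffer.allocateDirect((int) (longs * 8))
                                  .order(ByteOrder.nativeOrder());
        }
    }

    // High bits of the index pick the chunk, low bits give the byte offset.
    void put(long index, long value) {
        chunks[(int) (index >>> shift)].putLong((int) ((index & mask) << 3), value);
    }

    long get(long index) {
        return chunks[(int) (index >>> shift)].getLong((int) ((index & mask) << 3));
    }

    int chunkCount() {
        return chunks.length;
    }

    public static void main(String[] args) {
        // Tiny demo: 1000 longs in chunks of 64 longs (512 bytes) each.
        ChunkedLongStore store = new ChunkedLongStore(1000, 6);
        for (long i = 0; i < 1000; i++) store.put(i, i * 3);
        long sum = 0;
        for (long i = 0; i < 1000; i++) sum += store.get(i);
        System.out.println(store.chunkCount() + " chunks, sum " + sum); // prints "16 chunks, sum 1498500"
    }
}
```

With shift = 27, each chunk holds 2^27 longs (1 GB), so 16 GB of longs needs 16 chunks. A single Unsafe block avoids the extra shift, mask and array lookup on each access because it needs only a base address plus an offset.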
C++ test configuration
All tests were performed with gcc 4.5.2 on Ubuntu 11.04, compiled with -O2.
Java test configuration
All tests were performed with Java 6 update 26 and Java 7 update 0 on a fast PC with 24 GB of memory. Where a cell shows two timings (x/y), they are for Java 6/Java 7. Where there is only one value, both versions gave the same result.
All tests were run with the options -mx23g -XX:MaxDirectMemorySize=20g -verbosegc.
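The tests were run with -verbosegc, which logs each collection. To total GC time from inside a test instead, the standard java.lang.management API reports the accumulated collection time for each collector. This is an alternative sketch, not necessarily how MemoryTest.java measures it, and the class name GcTime is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public final class GcTime {
    // Sums the accumulated collection time (ms) over all collectors.
    // getCollectionTime() returns -1 if a collector does not report it; those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        // Allocate some short-lived heap arrays so there is something to collect.
        long checksum = 0;
        for (int i = 0; i < 64; i++) {
            long[] chunk = new long[1 << 20]; // 8 MB each
            chunk[i] = i;
            checksum += chunk[i];
        }
        System.gc(); // a request only; the JVM may ignore it
        long after = totalGcMillis();
        System.out.println("GC time: " + (after - before) + " ms (checksum " + checksum + ")");
    }
}
```

The counters are cumulative since JVM start, so taking the difference between two readings gives the GC time spent during a test.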
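For me, the most curious result was the performance of long on the heap, which was very fast in Java. Writing 16 GB to long[] arrays took about 1.5 s, compared with 4.2 to 5.2 s for C++ and Unsafe, and reading was about as fast as C++.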
The code
C++ tests - memorytest/main.cpp
Java tests - MemoryTest.java