Reading/writing GC-less memory

Overview

How you access data can make a difference to speed. Whether you unroll loops by hand or let the JIT do it for you can also affect performance.

I have included C++ and Java tests doing the same thing for comparison.

Tests

Each test stored 16 GB of data, and the tests compared:
  • the time to allocate, write to and read from the memory, plus the total GC time
  • byte[] (the smallest primitive) and long[] (the largest primitive)
  • arrays, direct ByteBuffers and Unsafe
  • loops left to the JIT to optimise and loops unrolled four times by hand
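To make "hand unrolled four times" concrete, here is a minimal sketch of the two loop styles over a long[]. The class and method names are hypothetical and this is not the code from MemoryTest.java; it only illustrates the shape of the comparison.

```java
// Illustrative sketch only: UnrollSketch and its methods are hypothetical names.
final class UnrollSketch {
    // Plain loop: the JIT is free to unroll this itself.
    static void writeSimple(long[] data) {
        for (int i = 0; i < data.length; i++)
            data[i] = i;
    }

    // Hand-unrolled four times, with a tail loop for lengths not divisible by four.
    static void writeUnrolled(long[] data) {
        int i = 0;
        int limit = data.length - 3;
        for (; i < limit; i += 4) {
            data[i] = i;
            data[i + 1] = i + 1;
            data[i + 2] = i + 2;
            data[i + 3] = i + 3;
        }
        for (; i < data.length; i++)
            data[i] = i;
    }

    // Plain read loop.
    static long sumSimple(long[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++)
            sum += data[i];
        return sum;
    }

    // Read loop unrolled four times by hand.
    static long sumUnrolled(long[] data) {
        long sum = 0;
        int i = 0;
        int limit = data.length - 3;
        for (; i < limit; i += 4)
            sum += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
        for (; i < data.length; i++)
            sum += data[i];
        return sum;
    }

    public static void main(String[] args) {
        long[] data = new long[1000003];
        writeUnrolled(data);
        System.out.println(sumSimple(data) == sumUnrolled(data));
    }
}
```

Both versions do the same work; the only difference is whether the unrolling is written out by hand or left to the JIT.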

store            type    size        unrolled  allocate  writing      reading  GC time
C++ char[]       native  8-bit char  no        31 μs     12.0 s       8.7 s    N/A
C++ char[]       native  8-bit char  yes       5 μs      8.8 s        6.6 s    N/A
C++ long long[]  native  64-bit int  no        11 μs     4.6 s        1.4 s    N/A
C++ long long[]  native  64-bit int  yes       12 μs     4.2 s        1.2 s    N/A
byte[]           heap    byte        no        4.9 s     20.7/7.8 s   7.4 s    51 ms
byte[]           heap    byte        yes       4.9 s     7.1 s        8.5 s    44 ms
long[]           heap    long        no        4.7 s     1.6 s        1.5 s    37 ms
long[]           heap    long        yes       4.7 s     1.5 s        1.4 s    45 ms
ByteBuffer       direct  byte        no        4.8 s     18.1/10.0 s  14.0 s   6.1 ms
ByteBuffer       direct  byte        yes       4.8 s     12.2/10.0 s  16.7 s   6.1 ms
ByteBuffer       direct  long        no        4.7 s     6.0/3.9 s    2.4 s    6.1 ms
ByteBuffer       direct  long        yes       4.6 s     4.7/2.3 s    7.9 s    6.1 ms
Unsafe           direct  byte        no        10 μs     18.2 s       13.8 s   6.0 ms
Unsafe           direct  byte        yes       10 μs     8.7 s        8.3 s    6.0 ms
Unsafe           direct  long        no        10 μs     5.2 s        1.9 s    6.0 ms
Unsafe           direct  long        yes       10 μs     4.2 s        1.3 s    6.0 ms

In each case, this is the time to perform 8-bit byte or 64-bit long operations on 16 GB of data in the structure listed. In C++ and with Unsafe, a single array or block of memory was used. For Java arrays and ByteBuffers, multiple objects were used to create the same total amount of space, since a single Java array or ByteBuffer cannot hold 2^31 or more elements (bytes, in the case of a ByteBuffer).
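One way to present several objects as a single large block is to split a long index into a chunk number and an offset within that chunk. The sketch below does this with direct ByteBuffers. The class name, the power-of-two chunk size and the use of native byte order are my assumptions for illustration, not necessarily what MemoryTest.java does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch only: ChunkedLongStore is a hypothetical name.
final class ChunkedLongStore {
    private final int chunkShift;   // log2 of the number of longs per chunk
    private final int chunkMask;    // mask selecting the offset within a chunk
    private final ByteBuffer[] chunks;

    // chunkShift must be at most 27 so that one chunk stays under 2^31 bytes.
    ChunkedLongStore(long totalLongs, int chunkShift) {
        this.chunkShift = chunkShift;
        this.chunkMask = (1 << chunkShift) - 1;
        int longsPerChunk = 1 << chunkShift;
        int count = (int) ((totalLongs + longsPerChunk - 1) >>> chunkShift);
        chunks = new ByteBuffer[count];
        for (int i = 0; i < count; i++)
            chunks[i] = ByteBuffer.allocateDirect(longsPerChunk * 8)
                                  .order(ByteOrder.nativeOrder());
    }

    // The high bits of the index pick the chunk, the low bits the slot inside it.
    void put(long index, long value) {
        chunks[(int) (index >>> chunkShift)]
                .putLong((int) (index & chunkMask) << 3, value);
    }

    long get(long index) {
        return chunks[(int) (index >>> chunkShift)]
                .getLong((int) (index & chunkMask) << 3);
    }

    public static void main(String[] args) {
        // Small sizes so the sketch runs anywhere: 100 longs in chunks of 16.
        ChunkedLongStore store = new ChunkedLongStore(100, 4);
        for (long i = 0; i < 100; i++)
            store.put(i, i * 3);
        System.out.println(store.get(99));
    }
}
```

The same index split works for long[][] on the heap. For 16 GB, the direct memory limit would need raising with -XX:MaxDirectMemorySize.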

C++ test configuration

All tests were performed with gcc 4.5.2 on Ubuntu 11.04, compiled with -O2.

Java test configuration

All tests were performed with Java 6 update 26 and Java 7 update 0, on a fast PC with 24 GB of memory. Timings are shown as Java 6/Java 7. Where there is only one value, the two were the same.

All tests were run with the options -mx23g -XX:MaxDirectMemorySize=20g -verbosegc
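The GC times above come from running with -verbosegc. As an alternative that can be checked from inside the program, the standard GarbageCollectorMXBean API reports accumulated collection time. This is only a sketch of that approach (the class name is hypothetical), and it is not how the numbers in the table were gathered.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch only: GcTimeSketch is a hypothetical name.
final class GcTimeSketch {
    // Sums the approximate accumulated collection time, in ms, over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0)
                total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        long[][] blocks = new long[64][];
        for (int i = 0; i < blocks.length; i++)
            blocks[i] = new long[1 << 17]; // 1 MB per block
        long after = totalGcMillis();
        System.out.println("GC time during allocation: " + (after - before) + " ms");
    }
}
```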


Curiosity


For me, the most curious result was the performance of long[], which was very fast in Java: faster than using C++ or Unsafe directly, most noticeably for writes (1.5 to 1.6 s, against 4.2 to 4.6 s for C++ and 4.2 to 5.2 s for Unsafe).
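For readers who have not used Unsafe directly, this is roughly what long access to a single off-heap block looks like. It is a sketch under assumptions: the class name is hypothetical, sun.misc.Unsafe is an unsupported internal API obtained here by reflection, and recent JDKs deprecate its memory-access methods and may print warnings when they are called.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustrative sketch only: UnsafeLongBlock is a hypothetical name.
final class UnsafeLongBlock {
    private static final Unsafe UNSAFE = loadUnsafe();
    private final long address; // start of the native block

    // Unsafe.getUnsafe() rejects application code, so read the singleton field.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }

    // One contiguous block; the memory is not zeroed and there are no bounds checks.
    UnsafeLongBlock(long count) {
        address = UNSAFE.allocateMemory(count * 8);
    }

    void put(long index, long value) {
        UNSAFE.putLong(address + (index << 3), value);
    }

    long get(long index) {
        return UNSAFE.getLong(address + (index << 3));
    }

    // The block is invisible to the GC, so it must be freed explicitly.
    void free() {
        UNSAFE.freeMemory(address);
    }

    public static void main(String[] args) {
        UnsafeLongBlock block = new UnsafeLongBlock(1000);
        for (long i = 0; i < 1000; i++)
            block.put(i, i * 2);
        System.out.println(block.get(999));
        block.free();
    }
}
```

Unlike long[], nothing here checks the index, which is part of why one might expect Unsafe to be at least as fast as the array.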

The code

C++ tests - memorytest/main.cpp

Java tests - MemoryTest.java

Related Link

Collections Library for millions of elements

 

Comments

  1. Hi Peter,
    Thanks for all the good analysis and numbers you provide in your blog... Could you also write a post on the performance of the "copyOnWrite" collections in Java?

  2. Not sure why the Java heap reading time for byte[] would be similar to C++.
    How about a microbenchmark that accesses a 1024-byte array 1 million times? Would it have similar latency to accessing a large array of 1 GB?
    Under the hood, baload can be JITed into several calls, according to the HotSpot C++ source code.

    I am not sure how byte[] -> baload will be JITed, especially how many times arrayOopDesc::base_offset_in_bytes will be called, and whether the JIT can compile *HeapWordSize into << 3 for cases when HeapWordSize = 8.

    void TemplateTable::baload() {
      transition(itos, itos);
      __ pop_ptr(rdx);
      // eax: index
      // rdx: array
      index_check(rdx, rax); // kills rbx
      __ load_signed_byte(rax,
                          Address(rdx, rax,
                                  Address::times_1,
                                  arrayOopDesc::base_offset_in_bytes(T_BYTE)));
    }

