Is using Unsafe really about speed or functionality?

Overview

Around 6 years ago, I started using a class which up to that point was just a curiosity: sun.misc.Unsafe.  I had used it for deserialization and re-throwing Exceptions, but had not used all its capabilities or talked about it publicly.

The first open source library I saw which used Unsafe in a serious way was Disruptor. This encouraged me that it could be used in a stable library.  About a year later I released my first open source libraries, SharedHashMap (later Chronicle Map) and Chronicle (later Chronicle Queue).  These used Unsafe to access off heap memory in Java 6.  This made a real difference to the performance of off heap memory, but more importantly to what I could do with shared memory, i.e. data structures shared across JVMs.

But how much difference does it make today? Is using Unsafe always faster?

What we are looking for is a compelling performance difference.  If the difference is not compelling, using the simplest code possible makes more sense, i.e. using natural Java.

The tests

In these tests I do a simple accumulation of data which originates in off heap memory.  This is a simple test which models parsing data (or hashing data) which originates off heap, e.g. from a TCP connection or a file system. The data is 128 bytes in size.  The results below can be affected by the size of the data, however this size is assumed to be representative.

I look at different sizes of access: a byte, an int or a long at a time.  I also look at using ByteBuffer, or copying the data on heap and using natural Java (which I assume is how most programs do this).
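The benchmark sources are not included in this post, so the following is only a minimal sketch of what the three access paths look like for the byte-at-a-time case. The class name, method names and the 0..127 fill pattern are my own assumptions for illustration, and the code makes no attempt to measure anything.

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

public class ByteSumSketch {
    static final int SIZE = 128; // block size used in the tests described above
    static final Unsafe UNSAFE = loadUnsafe();

    // The usual reflective lookup; sun.misc.Unsafe has no public constructor.
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // 1. Bounds-checked absolute reads from a direct ByteBuffer.
    static long sumByteBuffer(ByteBuffer bb) {
        long sum = 0;
        for (int i = 0; i < SIZE; i++)
            sum += bb.get(i);
        return sum;
    }

    // 2. Raw reads from a native address, with no bounds checks at all.
    static long sumUnsafe(long address) {
        long sum = 0;
        for (int i = 0; i < SIZE; i++)
            sum += UNSAFE.getByte(address + i);
        return sum;
    }

    // 3. Copy the block on heap first, then use a plain byte[] loop.
    static long sumOnHeap(ByteBuffer bb, byte[] scratch) {
        ByteBuffer dup = bb.duplicate(); // leave the caller's position untouched
        dup.position(0);
        dup.get(scratch, 0, SIZE);
        long sum = 0;
        for (int i = 0; i < SIZE; i++)
            sum += scratch[i];
        return sum;
    }

    // Fills two off heap blocks with the bytes 0..127 and returns the three sums.
    static long[] sums() {
        ByteBuffer bb = ByteBuffer.allocateDirect(SIZE);
        long address = UNSAFE.allocateMemory(SIZE);
        try {
            for (int i = 0; i < SIZE; i++) {
                bb.put(i, (byte) i);
                UNSAFE.putByte(address + i, (byte) i);
            }
            return new long[] {
                sumByteBuffer(bb), sumUnsafe(address), sumOnHeap(bb, new byte[SIZE])
            };
        } finally {
            UNSAFE.freeMemory(address);
        }
    }

    public static void main(String[] args) {
        long[] s = sums();
        System.out.println("ByteBuffer=" + s[0] + " Unsafe=" + s[1] + " onHeap=" + s[2]);
    }
}
```

The int and long variants would follow the same shape with getInt/getLong and an int[] or long[] on heap. A real comparison would need a benchmark harness with warm-up rather than a single call like this.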

I also compare using Java 6 update 45, Java 7 update 79, Java 8 update 51 to see how using different approaches has changed between releases.

Byte by byte processing

Something which has really improved in processor design is how fast it can copy large blocks of data. This means copying a large block of data so it can be processed more efficiently can make sense. i.e. a redundant copy can be cheap enough that it can result in a faster solution.

This is the case for byte by byte processing. In this example, the "On heap" figure includes the cost of copying the data on heap before processing it.  These figures are in operations per micro-second on an i7-3790X.

                   Java 6         Java 7         Java 8
ByteBuffer           15.8           16.9           16.4
Unsafe               17.2           17.5           16.9
On heap              20.9           22.0           21.9

The important takeaway from this is that "On heap" not only uses natural Java, it is also the fastest in all three versions of Java.  The most likely explanation is that the JIT has an optimisation which it can apply in the on heap case but doesn't apply if you use Unsafe, directly or indirectly.

Int by int processing

A faster way to parse verbose wire protocols is to read an int at a time, e.g. you can write an XML parser for a known format by reading an int at a time instead of looking at each byte individually.  This can speed up parsing by a factor of 2 to 3.  This approach works best for content of a known structure.
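As a rough illustration of reading an int at a time, the sketch below recognises a four byte tag with a single int comparison instead of four byte comparisons. The tag "<id>", the class name and the helper names are hypothetical and chosen only for this example; it relies on ByteBuffer's default big-endian byte order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class IntTagSketch {
    // The four ASCII bytes of "<id>" packed in big-endian order,
    // which is the default byte order of a ByteBuffer.
    static final int TAG_ID = ('<' << 24) | ('i' << 16) | ('d' << 8) | '>';

    // One 32-bit read and one comparison.
    static boolean isIdTagInt(ByteBuffer bb, int offset) {
        return offset + 4 <= bb.limit() && bb.getInt(offset) == TAG_ID;
    }

    // The equivalent byte by byte check, for comparison.
    static boolean isIdTagBytes(ByteBuffer bb, int offset) {
        return offset + 4 <= bb.limit()
                && bb.get(offset) == '<'
                && bb.get(offset + 1) == 'i'
                && bb.get(offset + 2) == 'd'
                && bb.get(offset + 3) == '>';
    }

    // Copies an ASCII string into a direct (off heap) buffer.
    static ByteBuffer offHeap(String ascii) {
        byte[] bytes = ascii.getBytes(StandardCharsets.US_ASCII);
        ByteBuffer bb = ByteBuffer.allocateDirect(bytes.length);
        bb.put(bytes);
        bb.flip();
        return bb;
    }

    public static void main(String[] args) {
        ByteBuffer bb = offHeap("<id>42</id>");
        System.out.println(isIdTagInt(bb, 0));   // tag starts at offset 0
        System.out.println(isIdTagInt(bb, 1));   // no tag at offset 1
        System.out.println(isIdTagBytes(bb, 0)); // same answer, four reads
    }
}
```

This only works when the set of tags is known in advance, which is why the approach suits content of a known structure.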

                   Java 6         Java 7         Java 8
ByteBuffer           12.6           36.2           35.1
Unsafe               44.5           52.7           54.7
On heap              46.0           49.5           56.2

Again, this is operations per micro-second on an i7-3790X.  What is interesting is that using natural Java after a copy is about as fast as using Unsafe.  For this use case, there is no compelling reason to use Unsafe either.

Long by long processing

While you could write a parser which reads 64-bit long values at a time, I have found this to be rather harder than parsing using 32-bit int values.  I haven't found the result to be much faster either.  However, hashing a data structure can benefit from reading long values, provided the hashing algorithm was designed with this in mind.
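To show what "designed with this in mind" can look like, here is a minimal sketch of a hash that consumes eight bytes per step and only falls back to single bytes for the tail. The multiply and xor-shift mixing is an assumption made up for this example, not a vetted hash function and not the one used in the tests.

```java
import java.nio.ByteBuffer;

public class LongHashSketch {
    // Odd 64-bit multiplier; an arbitrary choice for this sketch.
    static final long K = 0x9E3779B97F4A7C15L;

    static long hash(ByteBuffer bb, int length) {
        long h = length;
        int i = 0;
        // Main loop: one 64-bit read per iteration.
        for (; i + 8 <= length; i += 8) {
            h = h * K + bb.getLong(i);
            h ^= h >>> 29;
        }
        // Tail: any remaining bytes, one at a time.
        for (; i < length; i++)
            h = h * 31 + bb.get(i);
        return h;
    }

    // Direct buffer filled with a simple repeating pattern.
    static ByteBuffer filled(int length) {
        ByteBuffer bb = ByteBuffer.allocateDirect(length);
        for (int i = 0; i < length; i++)
            bb.put(i, (byte) (i * 7));
        return bb;
    }

    public static void main(String[] args) {
        ByteBuffer bb = filled(128);
        System.out.println(Long.toHexString(hash(bb, 128)));
    }
}
```

For a 128 byte block the main loop runs 16 times, compared with 128 iterations for a byte at a time hash.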

                   Java 6         Java 7         Java 8
ByteBuffer           12.1           56.7           53.3
Unsafe               66.7           83.0           94.9
On heap              60.9           61.2           70.0

It is interesting to see how much faster using ByteBuffer has become since Java 6.  The most likely explanation is the addition of an optimisation for swapping between little-endian (the native order on x86) and the default big-endian order of ByteBuffer.  The x86 has an instruction to swap the bytes around, but I suspect Java 6 didn't use it and instead used the more expensive shift operations.  Confirming this would require more testing and an examination of the assembly code generated.
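To make the two ways of swapping bytes concrete, the sketch below reverses the byte order of an int with shifts and masks, with Integer.reverseBytes, and by re-reading the same bytes from a ByteBuffer in little-endian order. Whether a given JVM compiles any of these to a single byte-swap instruction is an assumption I have not verified here; the class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteSwapSketch {
    // Byte swap built from shifts and masks.
    static int swapWithShifts(int v) {
        return (v >>> 24)
                | ((v >>> 8) & 0xFF00)
                | ((v << 8) & 0xFF0000)
                | (v << 24);
    }

    // Library call that a JIT can treat specially.
    static int swapWithLibrary(int v) {
        return Integer.reverseBytes(v);
    }

    // Write in big-endian (the ByteBuffer default), read back as little-endian.
    static int swapViaByteBuffer(int v) {
        ByteBuffer bb = ByteBuffer.allocateDirect(4);
        bb.putInt(0, v);
        return bb.order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    public static void main(String[] args) {
        int v = 0x11223344;
        System.out.println(Integer.toHexString(swapWithShifts(v)));
        System.out.println(Integer.toHexString(swapWithLibrary(v)));
        System.out.println(Integer.toHexString(swapViaByteBuffer(v)));
    }
}
```

All three produce the same value, so any performance difference comes down to how the JIT compiles each form.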

In this case, using Unsafe is consistently faster.  Whether you think this improvement is worth the risk associated with using Unsafe directly is another matter.

Additional notes

These tests assumed uniform data types of bytes, or ints, or longs.  In most real cases, there is a combination of these data types and this is where on heap struggles, e.g. if you need to parse an arbitrary combination of bytes, shorts, ints, longs, floats and doubles.  ByteBuffer is a good way to do this, however it is otherwise the slowest option in each case above.  Only Unsafe gives you the flexibility to mix and match types without overhead.
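As a sketch of the mixed-type case, the example below writes and reads one record containing a byte, a short, an int, a long and a double through ByteBuffer's absolute accessors. The record layout, field names and offsets are hypothetical and exist only for this illustration; a plain byte[], int[] or long[] cannot express this without manual shifting.

```java
import java.nio.ByteBuffer;

public class MixedRecordSketch {
    // Hypothetical record layout: field offsets in bytes from the record start.
    static final int TYPE = 0;        // byte
    static final int COUNT = 2;       // short
    static final int ID = 4;          // int
    static final int TIMESTAMP = 8;   // long
    static final int PRICE = 16;      // double
    static final int RECORD_SIZE = 24;

    static void write(ByteBuffer bb, int base, byte type, short count,
                      int id, long timestamp, double price) {
        bb.put(base + TYPE, type);
        bb.putShort(base + COUNT, count);
        bb.putInt(base + ID, id);
        bb.putLong(base + TIMESTAMP, timestamp);
        bb.putDouble(base + PRICE, price);
    }

    static String describe(ByteBuffer bb, int base) {
        return "type=" + bb.get(base + TYPE)
                + " count=" + bb.getShort(base + COUNT)
                + " id=" + bb.getInt(base + ID)
                + " timestamp=" + bb.getLong(base + TIMESTAMP)
                + " price=" + bb.getDouble(base + PRICE);
    }

    public static void main(String[] args) {
        // Room for two records; write into the second one.
        ByteBuffer bb = ByteBuffer.allocateDirect(2 * RECORD_SIZE);
        write(bb, RECORD_SIZE, (byte) 1, (short) 3, 42, 1234567890123L, 101.25);
        System.out.println(describe(bb, RECORD_SIZE));
    }
}
```

The Unsafe equivalent would replace each accessor with a getXxx/putXxx call on a raw address, trading the bounds checks for the risks discussed above.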

It is difficult to give a fair test for on heap with these mixed types, as natural Java doesn't support these operations directly.

Conclusions

Even if performance is your primary concern, there are cases where natural Java either performs better or is as fast as using Unsafe.  It often outperforms ByteBuffer, as the JIT is better at optimising away overheads like bounds checks for natural Java code.

The natural Java code relied on the fact we could model the data as either byte[], int[] or long[]. There is no option for an array of a mixture of primitive types.

Where natural Java struggles is in its range of support for either
  • arbitrary combinations of different primitive types e.g. a byte, int, long, double.
  • thread safe operations on shared/native memory.
Unfortunately, this lack of support in natural Java makes it hard to create a fair benchmark to compare the performance.

In summary, if you can implement an algorithm in natural Java, it is probably the fastest as well as the simplest.  If you need to parse data with a mixture of data types, or need thread safe access to off heap memory, there is still no good way to do this from natural Java.

Note: this is an area where VarHandles in Java 9 should be able to help so watch this space for an update on VarHandles.
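For reference, the VarHandle API as it shipped in Java 9 can perform an atomic compare-and-set on an int stored in a direct ByteBuffer. The following is a minimal sketch of that, assuming Java 9 or later; the idea of using the int as a simple lock word, and the class and method names, are my own illustration rather than anything from the tests above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class VarHandleSketch {
    // Views a ByteBuffer as if it held ints at byte offsets.
    static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Atomically changes the int at the given byte offset from 0 to 1.
    // The offset must be 4-byte aligned for atomic access.
    static boolean tryLock(ByteBuffer shared, int offset) {
        return INT_VIEW.compareAndSet(shared, offset, 0, 1);
    }

    // Resets the lock word to 0 with release semantics.
    static void unlock(ByteBuffer shared, int offset) {
        INT_VIEW.setRelease(shared, offset, 0);
    }

    // Volatile read of the lock word.
    static int state(ByteBuffer shared, int offset) {
        return (int) INT_VIEW.getVolatile(shared, offset);
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.allocateDirect(8);
        System.out.println(tryLock(shared, 0)); // first attempt takes the lock
        System.out.println(tryLock(shared, 0)); // second attempt fails
        unlock(shared, 0);
        System.out.println(state(shared, 0));
    }
}
```

The same calls work on a MappedByteBuffer, which is one route to the cross-JVM shared memory case, although how it performs against Unsafe is not something this sketch measures.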




Comments

  1. Could you share sources of the benchmarks?

    Replies
    1. and what those numbers actually mean?

    2. and maybe Java 9 benchmarks seeing as how performance goes up maybe the safe methods have gotten better again

  2. Re: "It is interesting to see how much faster using ByteBuffer has become. The most likely explanation is the addition of an optimisation of swapping little-endian to the default big-endian in ByteBuffer." The main difference between java 6 and 7, is that they added intrinsics on method calls. As per java 8 of Unsafe and ByteBuffer, it is strange to me. Can you show your test?

