sun.misc.Unsafe and off heap memory

Overview

The class sun.misc.Unsafe allows you to do many of the things you shouldn't be able to do in Java, but which are still useful in very specific cases.  It should be avoided 99% of the time; however, there are rare occasions where it is the only solution that makes sense.

This post considers how it has been used in OpenHFT and what functionality I would like to see in Java 9. In particular, accessing large amounts of memory without impacting the GC can be done this way. Sharing memory between processes, without significant overhead, can only be done this way in Java.

Allocating and freeing off heap memory

public native long allocateMemory(long bytes);
public native void freeMemory(long address);

These two methods allow you to allocate any size of off heap memory.  The size is not limited to Integer.MAX_VALUE bytes, and you get raw memory where you can apply bounds checking as you need to. For example, Bytes.writeUTF(String) calculates the length of the encoded string and checks once that the whole string would fit, rather than checking on each byte.
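As a rough sketch of the two calls above: the class OffHeapLongs and its methods are hypothetical names for illustration, not part of OpenHFT. It assumes Unsafe can be obtained reflectively through its private theUnsafe field, which is the usual workaround; newer JDKs may print deprecation warnings for these methods. The point is the single bounds check per bulk write.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical helper for illustration: a fixed-size off heap block of longs. */
final class OffHeapLongs {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long address;   // start of the raw block
    private final long capacity;  // size in bytes; a long, so it may exceed Integer.MAX_VALUE

    OffHeapLongs(long count) {
        capacity = count * Long.BYTES;
        address = UNSAFE.allocateMemory(capacity); // contents are uninitialised
    }

    /** Writes all values after one bounds check, rather than one check per element. */
    void writeAll(long index, long... values) {
        if (index < 0 || (index + values.length) * Long.BYTES > capacity)
            throw new IndexOutOfBoundsException("index " + index + ", length " + values.length);
        long p = address + index * Long.BYTES;
        for (long v : values) {
            UNSAFE.putLong(p, v);
            p += Long.BYTES;
        }
    }

    long read(long index) {
        if (index < 0 || (index + 1) * Long.BYTES > capacity)
            throw new IndexOutOfBoundsException("index " + index);
        return UNSAFE.getLong(address + index * Long.BYTES);
    }

    /** Must be called exactly once; the block must not be used afterwards. */
    void free() {
        UNSAFE.freeMemory(address);
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    public static void main(String[] args) {
        OffHeapLongs block = new OffHeapLongs(4);
        try {
            block.writeAll(1, 10L, 20L, 30L);
            System.out.println(block.read(1) + block.read(2) + block.read(3));
        } finally {
            block.free();
        }
    }
}
```

Nothing here frees the memory automatically; forgetting to call free() leaks the block, which is why a Cleaner-style mechanism matters.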

Java-Lang uses the same internal Cleaner class that DirectByteBuffer uses to ensure the memory is freed.  Ideally this wouldn't be so internal.

Raw access to memory

public native Xxx getXxx(Object o, long offset); // intrinsic
public native void putXxx(Object o, long offset, Xxx value); // intrinsic

In both cases, the Object is null when dealing with off heap memory and the offset is just the address. This allows you to perform raw memory access using single machine code instructions on JVMs which treat these methods as intrinsics.  This significantly improves performance for memory access.

The problem with this raw approach is that you have to manage the layout of the fields in your data structures yourself.  The Java-Lang library addresses this by allowing you to define an interface of getters and setters (even for object types like String and enums), and it will generate the implementation at runtime. That is, you can use the getters and setters without needing to know the "objects" are off heap.
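To illustrate what managing the layout yourself looks like, here is a minimal hand-written sketch. The class OffHeapTrade, its fields and its offsets are all made up for this example; it is not what Java-Lang generates, only the kind of code such a generator saves you from writing.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/**
 * Hypothetical hand-laid-out record:
 * int id at offset 0, 4 bytes of padding, long timestamp at 8, double price at 16.
 */
final class OffHeapTrade {
    static final long ID = 0, TIMESTAMP = 8, PRICE = 16, SIZE = 24;
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long address;

    OffHeapTrade() {
        address = UNSAFE.allocateMemory(SIZE);
        UNSAFE.setMemory(address, SIZE, (byte) 0); // allocateMemory does not zero the block
    }

    // The Object argument is null, so the "offset" is an absolute address.
    int getId()                  { return UNSAFE.getInt(null, address + ID); }
    void setId(int id)           { UNSAFE.putInt(null, address + ID, id); }
    long getTimestamp()          { return UNSAFE.getLong(null, address + TIMESTAMP); }
    void setTimestamp(long ts)   { UNSAFE.putLong(null, address + TIMESTAMP, ts); }
    double getPrice()            { return UNSAFE.getDouble(null, address + PRICE); }
    void setPrice(double price)  { UNSAFE.putDouble(null, address + PRICE, price); }

    void free() { UNSAFE.freeMemory(address); }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    public static void main(String[] args) {
        OffHeapTrade trade = new OffHeapTrade();
        try {
            trade.setId(7);
            trade.setTimestamp(1_400_000_000_000L);
            trade.setPrice(101.25);
            System.out.println(trade.getId() + " " + trade.getTimestamp() + " " + trade.getPrice());
        } finally {
            trade.free();
        }
    }
}
```

Every offset is a constant you have to keep consistent by hand, including the padding that keeps the long and the double aligned.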

Thread safe access to memory

public native Xxx getXxxVolatile(Object o, long offset); // intrinsic
public native void putOrderedXxx(Object o, long offset, Xxx value); // intrinsic

These two sets of methods allow you to use a field as volatile with a lazy set.  The lazy set is faster for the setting thread but could result in the same thread reading an old value if done too quickly.  The solution is to avoid reading back a value you have just written.

These methods are particularly useful when sharing data between processes.
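A minimal sketch of this pattern, using two threads in one process rather than two processes, since sharing between processes would also need a memory mapped file, which is not shown here. The class OrderedPublish and the READY/PAYLOAD layout are assumptions made for the example. The writer stores the data with a plain write, then publishes a flag with a lazy set; the reader polls the flag with a volatile read.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical example: publish a value off heap with putOrderedLong, read it with getLongVolatile. */
final class OrderedPublish {
    static final long READY = 0, PAYLOAD = 8, SIZE = 16; // made-up layout
    private static final Unsafe UNSAFE = loadUnsafe();

    static long publishAndRead(long value) {
        final long address = UNSAFE.allocateMemory(SIZE);
        UNSAFE.setMemory(address, SIZE, (byte) 0);
        try {
            Thread writer = new Thread(() -> {
                UNSAFE.putLong(null, address + PAYLOAD, value);   // plain write of the data
                UNSAFE.putOrderedLong(null, address + READY, 1L); // lazy set, ordered after the write above
            });
            writer.start();
            // Volatile read: spin until the flag becomes visible.
            while (UNSAFE.getLongVolatile(null, address + READY) == 0L)
                Thread.yield();
            long seen = UNSAFE.getLong(null, address + PAYLOAD);
            writer.join();
            return seen;
        } catch (InterruptedException e) {
            // The writer has already finished its writes once READY was seen.
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            UNSAFE.freeMemory(address);
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead(42L));
    }
}
```

The reader never writes the flag and the writer never reads it back, which fits the advice above about not reading a value you have just written.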

CAS operations

public native boolean compareAndSwapXxx(Object o, long offset, Xxx expected, Xxx setTo); // intrinsic

This method is essential for building locks off heap.  In particular, it is the most efficient way to share data in a thread safe manner between processes.  In tests I have done on a Haswell i7-4500 processor, the round trip latency between two processes on the same machine is typically:

TCP           - 9 microseconds
FileLocks     - 5.5 microseconds
CAS           - 0.12 microseconds
Ordered write - 0.02 microseconds (half round trip, if this pattern can be used)
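To show how CAS can be used to build a lock off heap, here is a minimal spin lock sketch. It uses threads within one process; locking between processes would place the lock word in a shared memory mapping, which is not shown. The class OffHeapSpinLock and its LOCK/COUNTER layout are hypothetical, and it makes no attempt at fairness, back off or recovery from a dead lock holder.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical example: an off heap int used as a spin lock guarding an off heap counter. */
final class OffHeapSpinLock {
    static final long LOCK = 0, COUNTER = 8, SIZE = 16; // made-up layout
    private static final Unsafe UNSAFE = loadUnsafe();

    static long countWithLock(int threads, int incrementsPerThread) {
        final long address = UNSAFE.allocateMemory(SIZE);
        UNSAFE.setMemory(address, SIZE, (byte) 0);
        try {
            Thread[] workers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                workers[t] = new Thread(() -> {
                    for (int i = 0; i < incrementsPerThread; i++) {
                        // Acquire: swap 0 (free) for 1 (held), retrying until it succeeds.
                        while (!UNSAFE.compareAndSwapInt(null, address + LOCK, 0, 1))
                            Thread.yield();
                        try {
                            long c = UNSAFE.getLong(null, address + COUNTER);
                            UNSAFE.putLong(null, address + COUNTER, c + 1);
                        } finally {
                            // Release with an ordered write, so the update above is published first.
                            UNSAFE.putOrderedInt(null, address + LOCK, 0);
                        }
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers)
                joinUninterruptibly(w);
            return UNSAFE.getLong(null, address + COUNTER);
        } finally {
            UNSAFE.freeMemory(address); // safe: all workers have finished
        }
    }

    private static void joinUninterruptibly(Thread t) {
        boolean interrupted = false;
        while (true) {
            try {
                t.join();
                break;
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, restore the flag afterwards
            }
        }
        if (interrupted) Thread.currentThread().interrupt();
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithLock(4, 10_000));
    }
}
```

Without the lock, the unsynchronised read-modify-write on the counter would lose updates under contention.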

On heap object allocation

public native Object allocateInstance(Class<?> cls) throws InstantiationException;

When deserializing a class, you want to reconstitute the values in that class the way they were when it was serialized.  This doesn't work well with the current constructors, as noted in JEP 187: Serialization 2.0. A workaround is to avoid the constructors entirely and create an instance without calling one. This assumes a lot of trust in the data you have, but it has the advantage of being easy to use and makes no assumptions about which constructors you have.
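A small sketch of creating an instance without running a constructor. The Account class and its validation rule are invented for the example; a real deserializer would go on to populate the fields from the serialized data, typically by reflection or Unsafe rather than by direct assignment as done here.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical example: allocateInstance skips constructors and field initialisers. */
final class NoConstructorDemo {
    private static final Unsafe UNSAFE = loadUnsafe();

    static final class Account {
        long balance;

        Account(long balance) {
            if (balance <= 0)
                throw new IllegalArgumentException("balance must be positive");
            this.balance = balance;
        }
    }

    /** Returns an Account whose constructor never ran, so every field holds its default value. */
    static Account blankAccount() {
        try {
            return (Account) UNSAFE.allocateInstance(Account.class);
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    public static void main(String[] args) {
        Account account = blankAccount();
        System.out.println(account.balance); // constructor validation was bypassed
        account.balance = 250L;              // a deserializer would restore the saved state here
        System.out.println(account.balance);
    }
}
```

Because the validation in the constructor is skipped, the instance can hold a state the class would normally reject, which is the trust assumption mentioned above.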

Conclusion

It has often been noted that embedded databases, without a network overhead, can outperform distributed databases in terms of latency.  I believe the next generation of low latency databases will give the performance of embedded databases while being shared between processes, with both update and query response times well below a microsecond.

I see no reason these should not be implemented in Java.  For Java users, a native Java interface will perform best, as it needs neither JNI nor translation from a C view of the world to a Java view.


Comments

  1. Hi, Peter "The lazy set is faster for the setting thread but could result in the same thread reading an old value if done too quickly. The solution to this is don't read a value you just wrote."

    Is it the "same thread" or the "other thread" reading an old value?

  2. Hi Peter,
    Thanks for this nice article. Could you pls. also determine some generic usages of compareAndSwapxxx()?

    Do you find usage for park/unpark/monitorEnter/monitorExit?

    Thanks
