Using JMH to find the fastest way to encode/decode UTF8

Overview

I had played with JMH before, but this is the first time I have used it to solve a production problem.  I had some idea of how to optimise the code involved, but trying different combinations with JMH led to a significant improvement.

In this test I am encoding a String as UTF8 to direct memory so it can be written to a TCP socket.  I also need to take data read into direct memory via an NIO SocketChannel.read and decode it into a StringBuilder (which can be reused).
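As a point of reference, the plain-loop approach to both directions might look like the sketch below. This is not the benchmark code from this post: the class and method names (Utf8SimpleLoop, encode, decode) are made up for illustration, and the code assumes well-formed input and a buffer with enough room.

```java
import java.nio.ByteBuffer;

// Hypothetical baseline: UTF-8 encode/decode with a simple loop over a direct ByteBuffer.
final class Utf8SimpleLoop {

    /** Encodes s as UTF-8 into out. Assumes surrogate pairs are well formed. */
    static void encode(CharSequence s, ByteBuffer out) {
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            // Combine a surrogate pair into one code point.
            if (Character.isHighSurrogate((char) c) && i + 1 < s.length()) {
                c = Character.toCodePoint((char) c, s.charAt(++i));
            }
            if (c < 0x80) {
                out.put((byte) c);
            } else if (c < 0x800) {
                out.put((byte) (0xC0 | (c >> 6)));
                out.put((byte) (0x80 | (c & 0x3F)));
            } else if (c < 0x10000) {
                out.put((byte) (0xE0 | (c >> 12)));
                out.put((byte) (0x80 | ((c >> 6) & 0x3F)));
                out.put((byte) (0x80 | (c & 0x3F)));
            } else {
                out.put((byte) (0xF0 | (c >> 18)));
                out.put((byte) (0x80 | ((c >> 12) & 0x3F)));
                out.put((byte) (0x80 | ((c >> 6) & 0x3F)));
                out.put((byte) (0x80 | (c & 0x3F)));
            }
        }
    }

    /** Decodes the remaining bytes of in into sb, which is cleared first so it can be reused. */
    static void decode(ByteBuffer in, StringBuilder sb) {
        sb.setLength(0);
        while (in.hasRemaining()) {
            int b = in.get();
            if (b >= 0) {                          // 1 byte: 0xxxxxxx
                sb.append((char) b);
            } else if ((b & 0xE0) == 0xC0) {       // 2 bytes: 110xxxxx 10xxxxxx
                int b2 = in.get() & 0x3F;
                sb.append((char) (((b & 0x1F) << 6) | b2));
            } else if ((b & 0xF0) == 0xE0) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                int b2 = in.get() & 0x3F;
                int b3 = in.get() & 0x3F;
                sb.append((char) (((b & 0x0F) << 12) | (b2 << 6) | b3));
            } else {                               // 4 bytes: 11110xxx and three continuation bytes
                int b2 = in.get() & 0x3F;
                int b3 = in.get() & 0x3F;
                int b4 = in.get() & 0x3F;
                sb.appendCodePoint(((b & 0x07) << 18) | (b2 << 12) | (b3 << 6) | b4);
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(64);
        StringBuilder sb = new StringBuilder();
        String text = "h\u00e9llo \u20ac";
        encode(text, buf);
        buf.flip();                                // switch the buffer from writing to reading
        decode(buf, sb);
        System.out.println(sb.toString().equals(text));
    }
}
```

The StringBuilder is cleared rather than reallocated, which is the reuse mentioned above. The faster variants in the results replace the charAt, put and append calls with more direct access.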

The tests

These tests combined two techniques: using reflection to obtain the underlying data structure of String and StringBuilder, and using Unsafe to access native memory directly.
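To give a rough idea of what accessing native memory through Unsafe means, here is a minimal sketch. It is not the code that was benchmarked: the class and method names (UnsafeAsciiCopy, writeAscii, readAscii) are invented, it handles 7-bit ASCII only to stay short, and it allocates its own native memory instead of using the address of a direct ByteBuffer. Note that sun.misc.Unsafe is an unsupported internal API, and newer JDKs may warn about it or restrict it.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: copy ASCII text to and from native memory with sun.misc.Unsafe.
final class UnsafeAsciiCopy {
    static final Unsafe UNSAFE = loadUnsafe();

    // The Unsafe singleton is held in a private static field, so it is obtained via reflection.
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    /** Writes s (assumed to be 7-bit ASCII) to native memory at address; returns the byte count. */
    static int writeAscii(CharSequence s, long address) {
        int len = s.length();
        for (int i = 0; i < len; i++) {
            UNSAFE.putByte(address + i, (byte) s.charAt(i));
        }
        return len;
    }

    /** Reads len ASCII bytes from native memory at address into sb, which is cleared first. */
    static void readAscii(long address, int len, StringBuilder sb) {
        sb.setLength(0);
        for (int i = 0; i < len; i++) {
            sb.append((char) (UNSAFE.getByte(address + i) & 0x7F));
        }
    }

    public static void main(String[] args) {
        long address = UNSAFE.allocateMemory(64);   // no bounds checks: the caller must stay within 64 bytes
        try {
            int len = writeAscii("hello", address);
            StringBuilder sb = new StringBuilder();
            readAscii(address, len, sb);
            System.out.println(sb);
        } finally {
            UNSAFE.freeMemory(address);             // native memory is not garbage collected
        }
    }
}
```

There is no bounds checking on these reads and writes. That helps speed, but a wrong address or length can crash the JVM instead of throwing an exception.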

To my surprise, accessing String via reflection initially appeared to be no faster, and possibly slower. However, @jponge pointed out that I needed to return a result from each benchmark to avoid dead code elimination.  Once I did this, access via reflection was faster.

See encode_unsafeLoopCharArray and encode_unsafeLoopCharAt.

Another surprise was that accessing StringBuilder via reflection did make a difference.

See decode_usingSimpleLoop and decode_usingCharArray. Accessing the underlying char[] was over 4x faster.  Note: this is clearly an optimisation issue, and future versions of Java might not have this problem.

All the results

The performance score is in operations per microsecond.

Benchmark                                     i7-3790X   i7-4770K   E5-2650v2
DecodeMain.decode_fromUTF8                         7.3        6.8         6.3
DecodeMain.decode_usingCharArray                   8.4        8.1         8.8
DecodeMain.decode_usingCharArrayAndAddress        12.7       11.8        11.3
DecodeMain.decode_usingSimpleLoop                  1.9        1.9         1.8
EncodeMain.encode_simpleToUTF8                     6.5        6.6         5.9
EncodeMain.encode_unsafeLoopCharArray             19.8       17.0        17.5
EncodeMain.encode_unsafeLoopCharAt                15.3       12.0        13.7
EncodeMain.encode_unsafeLoopCharAtUnrolled        15.3       11.8        13.0
EncodeMain.encode_usingSimpleLoop                  8.4        7.9         7.4
EncodeMain.encode_usingSimpleLoopUnrolled          6.3        5.6         6.3

Conclusion

In future this functionality might be built into the JVM. However, there is likely to be other functionality which is not built into the JVM and which is causing a performance issue, and having an alternative can make a difference.

Thank you

@jponge and @shiplev for your feedback.

