How to get C-like performance in Java

Overview

Java has many areas which can be slow. However, for every problem there is a solution. Many of these solutions (or hacks) require working around Java's protections, but if you need low-level performance, it is still possible.

Java makes high-level programming simpler and easier at the cost of making low-level programming much harder. Fortunately, most applications follow the rule of thumb that you spend 90% of your time in 10% of the code. This implies you are better off 90% of the time and worse off 10% of the time. ;)

It makes me wonder why, for most projects, you would write more than 10% of your code in C/C++. There will be some projects where C/C++ is the only sensible solution, but I suspect most C/C++ projects would be more productive using higher-level languages like Java.

One way to get C-like performance is to use C via JNI for key sections of code. If you want to avoid using C or JNI, there are still ways you can get the performance you want.

Note: Most of these suggestions only work for standalone applications rather than applets.

Note 2: Use at your own risk. When using low-level Java, you are likely to need to test edge cases which you wouldn't normally need to worry about.

Fast array access

One area where Java can be slower is array access, because every access is implicitly bounds-checked. The JVM is smart enough to optimise the checks in loops by checking only the first and last element; however, this doesn't always apply (for example, when the index is read from another array).
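
As an illustration (a sketch, not a benchmark; whether the JIT actually removes a given check depends on the JVM version and the exact loop shape), the first loop below is the kind HotSpot can usually optimise, while in the second the index comes from another array, so the check on `data` usually has to stay in the loop. The class and method names are hypothetical:

```java
public class BoundsCheckLoops {
    // Canonical counted loop: i runs from 0 up to data.length, so the JIT can
    // typically prove every index is in range and hoist or remove the per-element check.
    static long sumAll(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Indirect indexing: data[indices[i]] depends on values loaded at run time,
    // so the check on data generally has to remain in the loop body.
    static long sumSelected(int[] data, int[] indices) {
        long sum = 0;
        for (int i = 0; i < indices.length; i++) {
            sum += data[indices[i]]; // may throw ArrayIndexOutOfBoundsException
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40};
        int[] picks = {3, 0, 3};
        System.out.println(sumAll(data));             // 100
        System.out.println(sumSelected(data, picks)); // 90
    }
}
```

To see what the JIT actually did with each loop, inspect the generated native code as described below.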

One workaround is to use the sun.misc.Unsafe class (which is only available on some JVMs; OpenJDK-based JVMs have it). This class has getXxxx() and putXxxx() methods for each primitive type and gives you direct access to an object, an array or direct memory, where you have to do the bounds checking yourself. In native code, these are compiled to a single machine-code instruction. There are also getObject() and putObject() methods; however, I suspect they don't provide as much of a performance improvement (by the time you access the Object as well).
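
A minimal sketch of the idea, assuming a JVM that ships sun.misc.Unsafe (recent JDKs still do, via the jdk.unsupported module, but have deprecated its memory-access methods for removal and may print warnings). The class and helper names are hypothetical; note that the range check is done once per call rather than once per element:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeIntArray {
    private static final Unsafe UNSAFE;
    private static final long BASE;  // byte offset of element 0 from the start of the array object
    private static final long SCALE; // bytes per int element

    static {
        try {
            // Unsafe is not meant for application code; the usual workaround is to read its singleton field.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        BASE = UNSAFE.arrayBaseOffset(int[].class);
        SCALE = UNSAFE.arrayIndexScale(int[].class);
    }

    // Writes with no JVM bounds check: the caller must guarantee 0 <= index < array.length.
    static void put(int[] array, int index, int value) {
        UNSAFE.putInt(array, BASE + index * SCALE, value);
    }

    // One explicit range check for the whole call instead of one per element.
    static long sum(int[] array, int from, int to) {
        if (from < 0 || to > array.length || from > to) {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        long total = 0;
        for (int i = from; i < to; i++) {
            total += UNSAFE.getInt(array, BASE + i * SCALE);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[10];
        for (int i = 0; i < data.length; i++) {
            put(data, i, i + 1);
        }
        System.out.println(sum(data, 0, data.length)); // 55
    }
}
```

An out-of-range index passed to put() is not caught: it silently corrupts memory or crashes the JVM, which is exactly the trade-off described above.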

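You can check the native code generated for a method by downloading the debug version of OpenJDK and getting it to print the compiled native code. (With a product build, -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly does the same, provided the hsdis disassembler plugin is installed.)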

Arbitrary memory access

You can use the Unsafe class again for arbitrary access; however, a "friendlier" way is to use a DirectByteBuffer and change its address and limit as desired (via reflection or via JNI). This gives you a Buffer which points to an arbitrary area of memory, such as a device buffer.
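
The Unsafe route can be sketched as below, under the same assumption that sun.misc.Unsafe is available. Changing a DirectByteBuffer's address and limit is not shown, because that relies on JDK-internal fields that vary between versions. The class and method names are hypothetical:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapMemory {
    private static final Unsafe UNSAFE;

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Allocates a block outside the heap, writes a long at a byte offset, reads it back, frees the block.
    static long roundTrip(long value, long offset) {
        long size = 64;
        if (offset < 0 || offset + Long.BYTES > size) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        long address = UNSAFE.allocateMemory(size); // raw address: nothing checks accesses from here on
        try {
            UNSAFE.setMemory(address, size, (byte) 0); // allocateMemory does not zero the block
            UNSAFE.putLong(address + offset, value);
            return UNSAFE.getLong(address + offset);
        } finally {
            UNSAFE.freeMemory(address); // must be freed explicitly; the GC does not track it
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(roundTrip(0x1234_5678_9ABC_DEF0L, 16))); // 123456789abcdef0
    }
}
```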

Using less memory

This is not as much of an issue as it used to be. At the time of writing, a 16 GB server costs about $1,000 and a 1 TB server about $70K.

However, CPU cache memory is still at a premium, and for some applications it's worth cutting memory consumption. A simple thing to do is to use Trove, which supports primitives in collections efficiently. If you have a large table of data (lots of rows and a few columns), you can store the data by column instead of by row. This can improve caching behaviour if you are scanning data by field but don't need all fields.
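
For example, a table of trades with a few fields could be held as one primitive array per column rather than as an array of Trade objects. This is a sketch with hypothetical names using plain arrays (not Trove); besides the caching benefit, it avoids an object header and a reference per row:

```java
/** Hypothetical trade table stored by column: one primitive array per field. */
public class TradeTable {
    private final long[] ids;
    private final double[] prices;
    private final int[] quantities;
    private int size;

    public TradeTable(int capacity) {
        ids = new long[capacity];
        prices = new double[capacity];
        quantities = new int[capacity];
    }

    public void add(long id, double price, int quantity) {
        if (size == ids.length) throw new IllegalStateException("table is full");
        ids[size] = id;
        prices[size] = price;
        quantities[size] = quantity;
        size++;
    }

    public long id(int row) {
        if (row < 0 || row >= size) throw new IndexOutOfBoundsException("row " + row);
        return ids[row];
    }

    // Scans a single column: only the quantities array is read, sequentially,
    // so the cache lines fetched hold almost nothing but quantities.
    public long totalQuantity() {
        long total = 0;
        for (int i = 0; i < size; i++) total += quantities[i];
        return total;
    }

    public double maxPrice() {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < size; i++) max = Math.max(max, prices[i]);
        return max;
    }

    public static void main(String[] args) {
        TradeTable t = new TradeTable(3);
        t.add(1, 10.5, 100);
        t.add(2, 11.0, 200);
        t.add(3, 9.75, 300);
        System.out.println(t.totalQuantity()); // 600
        System.out.println(t.maxPrice());      // 11.0
    }
}
```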

You can also use direct memory to store data however you wish. This is the approach the BigMemory library uses.
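
A minimal sketch of the idea, using only the standard ByteBuffer API rather than BigMemory itself. The class name and the 16-byte record layout are assumptions for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical fixed-width record store kept in direct (off-heap) memory. */
public class DirectRecordStore {
    // Assumed layout per record: long id at offset 0, double value at offset 8.
    private static final int ID_OFFSET = 0;
    private static final int VALUE_OFFSET = 8;
    private static final int RECORD_SIZE = 16;

    private final ByteBuffer buffer;

    public DirectRecordStore(int records) {
        // One direct buffer holds every record; the GC sees the small buffer object, not the records.
        buffer = ByteBuffer.allocateDirect(records * RECORD_SIZE).order(ByteOrder.nativeOrder());
    }

    public int capacity() {
        return buffer.capacity() / RECORD_SIZE;
    }

    public void set(int index, long id, double value) {
        int base = index * RECORD_SIZE;
        buffer.putLong(base + ID_OFFSET, id); // absolute put: position is unchanged, bounds still checked
        buffer.putDouble(base + VALUE_OFFSET, value);
    }

    public long id(int index) {
        return buffer.getLong(index * RECORD_SIZE + ID_OFFSET);
    }

    public double value(int index) {
        return buffer.getDouble(index * RECORD_SIZE + VALUE_OFFSET);
    }

    public static void main(String[] args) {
        DirectRecordStore store = new DirectRecordStore(1000);
        store.set(42, 7L, 3.25);
        System.out.println(store.id(42) + " " + store.value(42)); // 7 3.25
    }
}
```

Millions of records stored this way add no objects for the GC to trace, at the cost of doing your own layout and (de)serialisation.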

Stream-based IO is slow and NIO is a pain to use

How can you have the best of both worlds? Use blocking IO with NIO (blocking is the default mode for a Channel). Don't use Selectors unless you need them; in many cases, they just add complexity. Most systems can handle 1K-10K threads efficiently. If you need more connections than that, buy another server; a cheap one costs about $500.
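
A minimal sketch of this approach, assuming one thread per connection is acceptable at your connection count: ServerSocketChannel and SocketChannel are left in their default blocking mode, so each connection's code reads sequentially and no Selector is needed. The class name and the echo behaviour are illustrative only:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical echo server: blocking NIO channels, one thread per connection, no Selector. */
public class BlockingEchoServer implements AutoCloseable {
    private final ServerSocketChannel server;
    private final ExecutorService threads = Executors.newCachedThreadPool();

    public BlockingEchoServer() throws IOException {
        server = ServerSocketChannel.open(); // blocking by default
        server.bind(new InetSocketAddress("localhost", 0)); // port 0 = any free port
        threads.execute(this::acceptLoop);
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    private void acceptLoop() {
        try {
            while (true) {
                SocketChannel client = server.accept(); // blocks until a client connects
                threads.execute(() -> echo(client));
            }
        } catch (IOException closed) {
            // thrown once close() closes the server channel
        }
    }

    private static void echo(SocketChannel client) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4096);
        try (SocketChannel c = client) {
            while (c.read(buf) >= 0) { // blocks until data arrives; -1 means end of stream
                buf.flip();
                while (buf.hasRemaining()) c.write(buf);
                buf.clear();
            }
        } catch (IOException ignored) {
            // connection reset or server shut down
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        threads.shutdownNow();
    }

    // Starts a server, sends a message and reads the same number of bytes back.
    static String roundTrip(String message) throws IOException {
        try (BlockingEchoServer s = new BlockingEchoServer();
             SocketChannel c = SocketChannel.open(new InetSocketAddress("localhost", s.port()))) {
            ByteBuffer out = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
            while (out.hasRemaining()) c.write(out);
            ByteBuffer in = ByteBuffer.allocate(out.capacity());
            while (in.hasRemaining() && c.read(in) >= 0) {
                // keep reading until the whole echo has arrived
            }
            return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello")); // hello
    }
}
```

The per-connection code is ordinary sequential IO; the price is one thread per connection, which is the trade-off the 1K-10K figure above is about.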

I suggest 1K to 10K connections per server because, IMHO, no business or website with 10K concurrent users will be using just one server. Scalability beyond 10K users per server doesn't buy you anything in the real world.

Fast, efficient strings

Java 6 update 21 has an option, -XX:+UseCompressedStrings, which can use byte[] instead of char[] for strings which don't need 16-bit characters. For many applications this saves memory, but it is 5%-10% slower.

Instead, you can use your own Text type which wraps a byte[], get your text data from a ByteBuffer or CharBuffer, or use Unsafe.
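
As a minimal sketch of the own-Text-type idea (the class name is hypothetical, and it assumes your text fits in 8-bit characters), a CharSequence backed by a byte[] uses half the per-character storage of a char[]:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical text type: one byte per character, so it only suits
 * 8-bit (ASCII / ISO-8859-1) data. Characters outside that range become '?'.
 */
public final class ByteText implements CharSequence {
    private final byte[] bytes;
    private final int offset;
    private final int length;

    private ByteText(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    public static ByteText of(String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        return new ByteText(b, 0, b.length);
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index " + index);
        return (char) (bytes[offset + index] & 0xFF); // widen the unsigned byte to a char
    }

    // Shares the underlying array instead of copying; safe because it is never modified.
    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException(start + ".." + end);
        return new ByteText(bytes, offset + start, end - start);
    }

    @Override public String toString() {
        return new String(bytes, offset, length, StandardCharsets.ISO_8859_1);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ByteText)) return false;
        ByteText t = (ByteText) o;
        return Arrays.equals(bytes, offset, offset + length, t.bytes, t.offset, t.offset + t.length);
    }

    @Override public int hashCode() {
        int h = 1;
        for (int i = offset; i < offset + length; i++) h = 31 * h + bytes[i];
        return h;
    }

    public static void main(String[] args) {
        ByteText text = ByteText.of("Hello, world");
        System.out.println(text.length());                            // 12
        System.out.println(text.subSequence(7, 12));                  // world
        System.out.println(text.equals(ByteText.of("Hello, world"))); // true
    }
}
```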

Faster startup times

Java tends to have slow startup times when you load lots of bloated libraries. If this is really a problem for you, load fewer libraries. Keeping them to a minimum is good practice anyway. Do this and your startup time will be a few seconds (not as fast as C, but likely to be fast enough).
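
One way to keep a rarely used library off the startup path is to avoid touching its classes until they are actually needed. A minimal sketch using the initialization-on-demand holder idiom, where HeavyService is a hypothetical stand-in for an entry point into a large library: the JVM initialises the Holder class, and so constructs the service, only on the first call to heavy().

```java
public class LazyStartup {
    // Set when the expensive service is constructed (only here to make the laziness visible).
    static volatile boolean heavyLoaded = false;

    // Hypothetical stand-in for an entry point into a large library.
    static final class HeavyService {
        HeavyService() {
            heavyLoaded = true; // imagine expensive setup here
        }

        String greet(String name) {
            return "Hello, " + name;
        }
    }

    // The JVM initialises Holder, and so runs new HeavyService(), on the first read of INSTANCE.
    // Class initialisation is thread-safe, so no extra locking is needed.
    private static final class Holder {
        static final HeavyService INSTANCE = new HeavyService();
    }

    static HeavyService heavy() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println("started, heavyLoaded=" + heavyLoaded); // started, heavyLoaded=false
        System.out.println(heavy().greet("world"));                // Hello, world
        System.out.println("heavyLoaded=" + heavyLoaded);          // heavyLoaded=true
    }
}
```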

Fewer GC pauses

Most Java libraries create objects freely and generally this is not a problem.

However, this doesn't mean you can't pre-allocate your objects and use direct ByteBuffers and object recycling techniques to minimise object creation. By increasing the Eden size (e.g. by enlarging the young generation with -Xmn), you can have an application which rarely GCs. You may even reduce it to one GC per day (say, as a scheduled overnight job).
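
Following the approach described in the author's reply in the comments below (a thread that runs once a minute and triggers a full GC when it has crossed 5 AM), a minimal sketch might look like this. The class name, the 5 AM constant and the use of System.gc() as the trigger are assumptions:

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NightlyGc {
    // Hypothetical quiet time at which a full GC is considered acceptable.
    static final LocalTime GC_TIME = LocalTime.of(5, 0);

    /** True if the most recent occurrence of {@code at} falls in (lastCheck, now]. */
    static boolean crossed(LocalDateTime lastCheck, LocalDateTime now, LocalTime at) {
        LocalDateTime latest = now.toLocalDate().atTime(at);
        if (latest.isAfter(now)) {
            latest = latest.minusDays(1); // not reached yet today, so use yesterday's occurrence
        }
        return lastCheck.isBefore(latest) && !now.isBefore(latest);
    }

    /** Checks once a minute and requests a full GC once per day, just after GC_TIME. */
    static ScheduledExecutorService start() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "nightly-gc");
            t.setDaemon(true); // don't keep the JVM alive just for this
            return t;
        });
        LocalDateTime[] last = { LocalDateTime.now() }; // time of the previous check
        ses.scheduleAtFixedRate(() -> {
            LocalDateTime now = LocalDateTime.now();
            if (crossed(last[0], now, GC_TIME)) {
                System.gc(); // a request only; ignored if -XX:+DisableExplicitGC is set
            }
            last[0] = now;
        }, 1, 1, TimeUnit.MINUTES);
        return ses;
    }

    public static void main(String[] args) {
        LocalDateTime t0459 = LocalDateTime.of(2024, 1, 1, 4, 59);
        LocalDateTime t0500 = LocalDateTime.of(2024, 1, 1, 5, 0);
        LocalDateTime t0501 = LocalDateTime.of(2024, 1, 1, 5, 1);
        System.out.println(crossed(t0459, t0500, GC_TIME)); // true
        System.out.println(crossed(t0500, t0501, GC_TIME)); // false
    }
}
```

Comparing against the previous check, rather than testing "is it 5 AM now?", means a delayed or skipped run still fires once and never fires twice.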

Related Articles

Low GC in Java: Use primitives instead of wrappers

Low GC in Java: Using primitives

Comments

  1. Very nice article. Thx for sharing. One thing I am interested in is that you mentioned loading libraries only as you need them on startup. How can one achieve this in the context of an application server deploying your EAR or WAR file?

  2. Hi Peter,

    "You may even reduce it to one GC per day (say as a scheduled over night job)" - can you enlighten a little bit the correct way of scheduling the GC, please?

    Thank you.

    1. You could use a tool like Quartz; however, I had a thread, run once a minute by a ScheduledExecutorService, which checked whether it had crossed the 5 AM line, i.e. the last run was before 5 AM and it is now after 5 AM. The assumption is that you can find an hour of the day when running a full GC is considered acceptable.

