Can synchronization be optimised away?

Overview

There is a common misconception that, because the JIT is smart and can eliminate synchronization on an object that is local to a single method, using a synchronized class in this way has no performance impact.
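To make the distinction concrete, here is a minimal sketch of the two situations. The class and method names (`EscapeSketch`, `joinLocal`, `appendTo`) are hypothetical and used only for illustration. In `joinLocal` the StringBuffer never leaves the method, so in principle its locks could be elided. In `appendTo` the buffer is supplied by the caller and could be shared with other threads, so its locks have to stay. Whether a given JVM actually elides the locks in the first case is what the test in the next section tries to measure.

```java
public class EscapeSketch {

    // The buffer is created, used and discarded inside this method,
    // so no other thread can ever see it.
    public static String joinLocal(String[] words) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(words[i]);
        }
        return sb.toString();
    }

    // The buffer is passed in by the caller, who may also have handed it
    // to other threads, so its synchronization cannot simply be dropped.
    public static void appendTo(StringBuffer shared, String[] words) {
        for (String word : words) {
            shared.append(word).append(',');
        }
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c"};
        System.out.println(joinLocal(words)); // prints a,b,c

        StringBuffer shared = new StringBuffer();
        appendTo(shared, words);
        System.out.println(shared); // prints a,b,c,
    }
}
```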

A test comparing StringBuffer and StringBuilder

These two classes do basically the same thing, except that one is synchronized (StringBuffer) and the other is not (StringBuilder). Both are often used within a single method to build a String. The following test attempts to determine how much difference using one over the other can make.

static String dontOptimiseAway = null;
static String[] words = new String[100000];

public static void main(String... args) {
    for (int i = 0; i < words.length; i++)
        words[i] = Integer.toString(i);

    for (int i = 0; i < 10; i++) {
        dontOptimiseAway = testStringBuffer();
        dontOptimiseAway = testStringBuilder();
    }
}

private static String testStringBuffer() {
    long start = System.nanoTime();
    StringBuffer sb = new StringBuffer();
    for (String word : words) {
        sb.append(word).append(',');
    }
    String s = sb.substring(0, sb.length() - 1);
    long time = System.nanoTime() - start;
    System.out.printf("StringBuffer: took %d ns per word%n", time / words.length);
    return s;
}

private static String testStringBuilder() {
    long start = System.nanoTime();
    StringBuilder sb = new StringBuilder();
    for (String word : words) {
        sb.append(word).append(',');
    }
    String s = sb.substring(0, sb.length() - 1);
    long time = System.nanoTime() - start;
    System.out.printf("StringBuilder: took %d ns per word%n", time / words.length);
    return s;
}


With -XX:+DoEscapeAnalysis on Java 7 update 10, the end of the run prints:

StringBuffer: took 69 ns per word
StringBuilder: took 32 ns per word
StringBuffer: took 88 ns per word
StringBuilder: took 26 ns per word
StringBuffer: took 62 ns per word
StringBuilder: took 25 ns per word


Testing with one million words doesn't change the results significantly.

Using -XX:BiasedLockingStartupDelay=0 improves the situation where only one thread uses the lock. This helps when locking has been used but isn't needed, but at a cost to locks that are genuinely needed, i.e. those used by more than one thread. With this option the same test prints:

StringBuffer: took 34 ns per word
StringBuilder: took 31 ns per word
StringBuffer: took 51 ns per word
StringBuilder: took 25 ns per word
StringBuffer: took 28 ns per word
StringBuilder: took 25 ns per word
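When results depend this much on JVM options, it can help to print the options alongside the timings. The following is a minimal sketch and not part of the original test: the class name `JvmFlagsSketch` and the helper `xxOptions` are hypothetical. It relies on the standard `RuntimeMXBean.getInputArguments()`, which reports only the options passed explicitly on the command line, not options that are on by default.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class JvmFlagsSketch {

    // Keeps only the -XX: options from the given JVM arguments.
    public static List<String> xxOptions(List<String> jvmArgs) {
        List<String> result = new ArrayList<String>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Arguments given to the JVM itself, excluding those passed to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("-XX options:  " + xxOptions(jvmArgs));
    }
}
```

Calling this at the start of `main` in the benchmark would label each set of timings with the flags that produced it.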

Conclusion

While the cost of using synchronization is small, it is measurable. If you can use StringBuilder, it is preferred, as the Javadoc for both classes recommends.

In theory, synchronization can be optimised away, but this test suggests that it does not yet happen, even in simple cases.


Comments

  1. Of course I am no expert, but possibly this is because the JVM needs not the synchronized method to be "hot", but the calling method, the one holding the local variable. In your test they are called only 10 times, so I increased that to 100 and re-ran the test, expecting a performance improvement for StringBuffer. But this is what I got:

    StringBuffer: took 68 ns per word
    StringBuilder: took 29 ns per word

    This was a bit surprising, but I realized that HotSpot needs more encouragement to optimize, so I added -XX:+AggressiveOpts and re-ran. This time it was better:

    StringBuffer: took 34 ns per word
    StringBuilder: took 26 ns per word

  2. The critical code is run 10*100000 times, so I am not surprised that doing it 100*100000 times did not make much difference, though it might produce more consistent results.

    It is interesting that AggressiveOpts shows an improvement. I will re-test with a shorter array, as this may be more representative when you have only tens to hundreds of calls.

  3. @raj can you add a comment on how this article is related?

  4. Please re-run your benchmark with -XX:BiasedLockingStartupDelay=0

    and read this article:
    http://mechanical-sympathy.blogspot.fr/2011/11/biased-locking-osr-and-benchmarking-fun.html

    1. While the tests in the article are good for comparing locking performance, they all suffer from the flaw that they would be much faster if you used just one thread with no locking. You can see that performance dropped as threads were added in every case, and it wasn't compared with the performance of using no locking at all, i.e. it is using locking in a situation which is clearly unrealistic and potentially unrepresentative. Although there is a lot of multi-threaded code out there which was never tested to determine whether using multiple threads was faster, this is not something which should be encouraged. ;)

  5. @Jean I am assuming this might reduce the impact of synchronization when there are multiple threads, but it wouldn't optimise away the synchronization.

  6. This comment has been removed by the author.

  7. @Peter,

    Biased locking optimizes away most of the synchronization when there is no contention, as is the case in your example.

    The BiasedLockingStartupDelay=0 VM option just removes the wait before biased locking applies. By default in HotSpot, there is a 4 second delay before biased locking can be applied. Since your example runs in less than 4 seconds, this optimization cannot be performed with the default settings; reducing the startup delay enables it.

  8. @Jean

    Shouldn't UseBiasedLocking be enabled along with BiasedLockingStartupDelay for this to work?

  9. @Valery,

    Yes, it should, but it is enabled by default unless you are using JDK 5.

  10. The point of this article is to determine if synchronization can be eliminated as an optimisation by the JIT.

    While you can use options to reduce its impact (which may or may not be suitable for real programs), this is still not as good as not having it in the first place. In some ways these options make it harder to see that synchronization is still there, so they are not a good idea in this situation either.
