Recycling objects to improve performance

Overview

In a previous article I stated that the reason the deserialisation of objects was faster was that it used recycled objects. This is potentially surprising for two reasons: 1) the common belief that creating objects is so fast these days that it doesn't matter, or is just as fast as recycling them yourself; 2) none of the serialisation libraries use recycling by default.

This article explores deserialisation with and without recycling objects. It shows that not only is it slower to create objects, but doing so slows down the rest of your program by pushing data out of your CPU caches.

While this article talks about deserialisation, the same applies to parsing text or reading binary files, as the actions being performed are the same.

The test

In this test, I deserialise 1000 Price objects, but also time how long it takes to copy a block of data. The copy represents work which the application might have to perform after deserialising.
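The shape of one test run can be sketched as follows. This is a sketch only; the buffer size, class and method names are assumptions, and the original benchmark code is linked below under "The code".

```java
// Sketch of one timed run: deserialisation, then a block copy representing
// the "other work" an application performs afterwards. Names and sizes are
// assumptions, not the original benchmark code.
public class RunTimer {
    static final byte[] SRC = new byte[64 * 1024]; // block size is an assumption
    static final byte[] DST = new byte[64 * 1024];

    // Returns { deserialiseNanos, copyNanos } for a single run.
    static long[] timeOneRun(Runnable deserialise1000Prices) {
        long t0 = System.nanoTime();
        deserialise1000Prices.run();          // deserialise the 1000 Price objects
        long t1 = System.nanoTime();
        System.arraycopy(SRC, 0, DST, 0, SRC.length); // the follow-on work
        long t2 = System.nanoTime();
        return new long[] { t1 - t0, t2 - t1 };
    }
}
```

Timing the copy separately is what exposes the cache effect: the copy itself does no allocation, so any slowdown it shows is collateral damage from the deserialisation phase.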

The test is timed one million times and those results are sorted. The X-axis shows the percentile timing, e.g. the 90% value is the time below which 90% of runs fall (or, equivalently, 10% of values are higher).
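The percentile values can be derived from the raw timings like this (a minimal sketch; the plotted data comes from the original benchmark, not this code):

```java
import java.util.Arrays;

public class Percentiles {
    // Given raw timings in nanoseconds, return the value at the given
    // percentile, e.g. percentile(timings, 90) is the time 90% of runs
    // were at or below.
    static long percentile(long[] timings, int pct) {
        long[] sorted = timings.clone();
        Arrays.sort(sorted);
        int index = Math.min(sorted.length - 1, sorted.length * pct / 100);
        return sorted[index];
    }
}
```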

As you can see, the deserialisation takes longer if it has to create objects as it goes, and sometimes it takes much, much longer. This is perhaps not so surprising, as creating objects means doing more work and possibly being delayed by a GC. However, the increase in the time to copy a block of data is surprising. It demonstrates that not only is the deserialisation slower, but any work which needs the data cache is also slower as a result (which is just about anything you might do in a real application).
Performance tests rarely show you the impact on the rest of your application.

In more detail

Examining the higher percentiles (longest times), you can see that performance is consistently bad if the deserialisation has to wait for the GC.
The time taken by the copy also increases significantly in the worst case.

The code

Recycling example code
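The linked code has the full example; a minimal sketch of the recycling idea follows. The field names and wire layout here are assumptions, not the original code.

```java
import java.nio.ByteBuffer;

// Sketch: deserialising into recycled objects vs. creating new ones.
public class RecycleBench {
    static final int PRICES = 1000;

    // Recycling: overwrite the fields of pre-allocated objects in place,
    // so the loop performs no allocation at all.
    static void deserialiseRecycled(ByteBuffer in, Price[] recycled) {
        in.rewind();
        for (Price p : recycled)
            p.readFrom(in);
    }

    // Creating: allocate a fresh object for every record read.
    static Price[] deserialiseCreating(ByteBuffer in) {
        in.rewind();
        Price[] prices = new Price[PRICES];
        for (int i = 0; i < PRICES; i++) {
            prices[i] = new Price();   // allocation on every record
            prices[i].readFrom(in);
        }
        return prices;
    }

    static class Price {
        long instrumentId;
        double bid, ask;

        void writeTo(ByteBuffer out) {
            out.putLong(instrumentId).putDouble(bid).putDouble(ask);
        }

        void readFrom(ByteBuffer in) {
            instrumentId = in.getLong();
            bid = in.getDouble();
            ask = in.getDouble();
        }
    }
}
```

The recycled variant touches the same small set of objects on every run, which is what keeps them (and the rest of the working set) in the CPU caches.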

Comments

  1. Hi, could you please explain what the x-axis in these graphs is?

    Cheers!

  2. @Maksim Sipos, I have added this comment to the post

    "The test is timed one million times and those results are sorted. The X-axis shows the percentile timing, e.g. the 90% value is the time below which 90% of runs fall (or, equivalently, 10% of values are higher)."

  3. This is an interesting benchmark.

    The thing I always try to keep in mind after reading an article like this is how it applies to a real-life scenario in which your application is doing other things.

    You quote a common Java developer belief: "creating objects is so fast these days, it doesn't matter or is just as fast as recycling yourself". I guess this is probably true in most use cases.

  4. @Andre, when I code I make sure all the data structures taken from input (e.g. sockets) are recyclable, with the convention that any mutable data to be retained has to be copied.

    Creating immutable objects is often simpler and fast enough for most use cases. But if you are creating too much garbage or need to go faster, there is something you can do about it.

  5. Good article, but a question: even when you call 'readResolve', the object has already been created by the deserialization process, right? i.e., readResolve being an instance method, it has to be called on an object, which means the process does create an object; so object reuse will only help reduce the cost of the data copy, or am I missing something?

  6. readResolve only reduces the size of the resulting object. It does mean you create temporary objects but these are relatively cheap.

    In this benchmark I use completely custom serialization and don't use ObjectInput/OutputStream so I don't have this restriction.

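The convention described in comment 4 above (recycle objects read from input; copy any mutable data you need to retain) can be sketched as follows. The names here are illustrative, not from the original code.

```java
// Sketch of the copy-on-retain convention: the instance passed to the
// callback is recycled by the reader, so anything kept past the callback
// must be a copy.
public class RetainByCopy {
    static class Price {
        long instrumentId;
        double bid, ask;

        // Deep copy for callers that need the data after the recycled
        // instance has been overwritten by the next read.
        Price copy() {
            Price c = new Price();
            c.instrumentId = instrumentId;
            c.bid = bid;
            c.ask = ask;
            return c;
        }
    }

    static Price lastRetained;

    static void onPrice(Price recycled) {
        // Do NOT store 'recycled' directly: its fields will change
        // on the next message. Copy before retaining.
        lastRetained = recycled.copy();
    }
}
```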
