Trivially Copyable Objects in Java
Introduction
For low-latency systems, every microsecond has tangible business impact. In high-frequency trading, real-time analytics, and similarly time-sensitive workloads, even minor inefficiencies in serialisation and deserialisation can degrade throughput and responsiveness. The seemingly mundane act of converting objects into bytes and back often becomes a performance bottleneck.
This article explores how we can emulate a C++-like concept of Trivially Copyable Objects within Java to achieve far more efficient serialisation. By ensuring objects contain only fixed-size primitives, we can sidestep the traditional overheads of object graph traversal, reflection, and per-field copying. Instead, we can treat them as contiguous memory blocks, dramatically reducing the time taken to read and write data. We shall also consider how to get very close to this performance using Chronicle Bytes without strictly requiring trivial copyability. This technique blends low-level efficiency with the safety and familiarity of Java’s ecosystem.
The Core Challenge of Java Serialisation
Most Java object graphs are composed of references linking scattered heap allocations. Serialising such objects typically involves visiting numerous memory locations, reading each field individually, and writing them out one at a time. This is akin to foraging around a warehouse for individual items whenever you need to pack a box. Although this might be fine at moderate data rates, it does not scale well when the volume and frequency of messages skyrocket.
In latency-critical domains (such as algorithmic trading, telemetry processing, or high-throughput messaging) this overhead is costly. Minimising the time to transform data from in-memory objects to a binary format and back can make the difference between meeting a service-level agreement (SLA) and missing it. The objective: reduce the cost of serialisation to something approaching a single memcpy operation.
Emulating Trivially Copyable Objects in Java
C++ defines a Trivially Copyable Object as one that can be safely cloned by copying its memory representation directly (e.g. using memcpy). This guarantees efficient and predictable memory layouts. Java does not expose this concept directly, but we can achieve a similar result.
If an object is composed solely of primitive fields (long, int, double, etc.) with known fixed sizes, it resembles a flat, contiguous block of data. This allows us to perform a bulk copy of its contents in one go. The result is similar to how C++ handles trivially copyable types, but now we are in Java’s safe, garbage-collected environment.
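To make the idea concrete, here is a minimal sketch using only the JDK's java.nio.ByteBuffer rather than Chronicle Bytes. The FlatQuote class, its three fields, and the 20-byte layout are hypothetical and chosen purely for illustration. Because every field has a fixed size, the whole record can be moved as one block without inspecting individual fields.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-size record: long id, int qty, double price.
final class FlatQuote {
    // 8 + 4 + 8 = 20 bytes, known at compile time because all fields are primitives.
    static final int SIZE = Long.BYTES + Integer.BYTES + Double.BYTES;

    // Writes the three fields at fixed offsets from the given base offset.
    static void encode(ByteBuffer buf, int offset, long id, int qty, double price) {
        buf.putLong(offset, id);
        buf.putInt(offset + Long.BYTES, qty);
        buf.putDouble(offset + Long.BYTES + Integer.BYTES, price);
    }

    // Copies the whole record as one block of SIZE bytes, with no per-field logic.
    static ByteBuffer bulkCopy(ByteBuffer src, int offset) {
        ByteBuffer dst = ByteBuffer.allocate(SIZE);
        ByteBuffer window = src.duplicate(); // independent position and limit
        window.limit(offset + SIZE);
        window.position(offset);
        dst.put(window);                     // single bulk transfer
        dst.flip();
        return dst;
    }
}
```

The copy knows nothing about the fields; it only needs the record's start and length, which is exactly the property a trivially copyable layout provides.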
Steps to Achieve a Contiguous Layout
- Restrict to Primitives Only: Avoid String, List, or other reference fields. Instead, store identifiers, timestamps, and short textual representations as primitive longs (using converters like @ShortText or @NanoTime) to maintain a fixed-size layout.
- Use Fixed-Size Fields: Ensure that all fields have a predictable size. This not only simplifies the copy but also makes the data layout predictable.
- Leverage Chronicle Utilities: Chronicle Bytes provides methods like unsafeReadObject() and unsafeWriteObject() to perform bulk memory operations. These bypass the overhead of per-field reflection or loops.
By structuring your objects this way, you remove the CPU overhead of repeatedly checking field offsets and types. Instead, you treat the entire object as a known layout of bytes that can be slurped up or written out in a single shot.
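As an illustration of storing short text in a primitive long, the sketch below packs up to eight ASCII characters into a single long, one byte per character. This is not the encoding Chronicle's @ShortText converter uses; the AsciiPacker class and its byte-per-character scheme are assumptions made only to show the principle of replacing a String reference with a fixed-size field.

```java
// Hypothetical helper: packs a short ASCII string into a long (one byte per character).
final class AsciiPacker {
    static long pack(String text) {
        if (text.length() > Long.BYTES) {
            throw new IllegalArgumentException("at most 8 characters fit in a long");
        }
        long packed = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == 0 || c > 0x7F) {
                throw new IllegalArgumentException("only non-NUL ASCII characters are supported");
            }
            packed |= ((long) c) << (8 * i); // character i occupies byte i
        }
        return packed;
    }

    static String unpack(long packed) {
        StringBuilder sb = new StringBuilder(Long.BYTES);
        for (int i = 0; i < Long.BYTES; i++) {
            int c = (int) ((packed >>> (8 * i)) & 0xFF);
            if (c == 0) {
                break; // a zero byte marks the end of the text
            }
            sb.append((char) c);
        }
        return sb.toString();
    }
}
```

For example, AsciiPacker.unpack(AsciiPacker.pack("EURUSD")) returns "EURUSD", and the field holding it is a plain long that takes part in a bulk copy like any other primitive.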
Not every system can fully adopt trivially copyable objects. Some may need reference fields. In such cases, you can still improve performance incrementally:
- Explicit Serialisation Methods: By implementing custom readMarshallable() and writeMarshallable() methods, you can avoid reflection overhead. This alone can drastically reduce latency, as shown by the benchmarks below.
- Direct Memory Access: For a moderate increase in complexity, explicitly reading and writing fields by their offsets using Chronicle Bytes lowers overhead even further. While more effort is required, this approach narrows the gap to trivial copyability.
- Full Trivial Copying: For the absolute best performance, treat the entire object as a single contiguous memory block. Bulk copying here is the key to near C++-style efficiency.
Code Examples
Consider a MarketData DTO with many fields. A straightforward SelfDescribingMarshallable class might rely on reflection or explicit read/write methods:
abstract class MarketData extends SelfDescribingMarshallable {
    @ShortText
    long securityId;
    @NanoTime
    long time;
    int bidQty0, bidQty1, bidQty2, bidQty3;
    int askQty0, askQty1, askQty2, askQty3;
    double bidPrice0, bidPrice1, bidPrice2, bidPrice3;
    double askPrice0, askPrice1, askPrice2, askPrice3;
    // Getters and setters omitted for clarity
}
Default approaches often rely on reflection or self-describing fields. While convenient, this is not the fastest method. Explicitly coding each field read and write is faster:
public final class ExplicitMarketData extends MarketData {
    @Override
    public void readMarshallable(BytesIn bytes) {
        // Fields must be read in exactly the order they are written
        securityId = bytes.readLong();
        time = bytes.readLong();
        bidQty0 = bytes.readInt(); // bidQty0 is an int, so it is read as a 4-byte value
        // ... repeated for all fields
    }

    @Override
    public void writeMarshallable(BytesOut bytes) {
        bytes.writeLong(securityId);
        bytes.writeLong(time);
        bytes.writeInt(bidQty0);
        // ... repeated for all fields
    }
}
For even lower overhead, we might write a DirectMarketData class that manually calculates offsets:
public final class DirectMarketData extends MarketData {
    @Override
    public void readMarshallable(BytesIn bytes) {
        BytesStore<?, ?> bs = ((Bytes<?>) bytes).bytesStore();
        long position = bytes.readPosition();
        // generated by GitHub Copilot
        // 2 longs (16) + 8 ints (32) + 8 doubles (64) = 112 bytes
        bytes.readSkip(112);
        securityId = bs.readLong(position + 0);
        time = bs.readLong(position + 8);
        // ... repeated for all fields
    }

    @Override
    public void writeMarshallable(BytesOut bytes) {
        BytesStore<?, ?> bs = ((Bytes<?>) bytes).bytesStore();
        long position = bytes.writePosition();
        // generated by GitHub Copilot
        // Reserve the full 112-byte record, then fill it in by absolute offset
        bytes.writeSkip(112);
        bs.writeLong(position + 0, securityId);
        bs.writeLong(position + 8, time);
        // ... repeated for all fields
    }
}
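The offsets used by such a class can be derived rather than hard-coded. The following sketch assumes the wire layout follows the declaration order of MarketData with no padding, which is consistent with the 112-byte skip above but is not confirmed by the elided code. MarketDataLayout and its method names are hypothetical, and it uses the JDK's ByteBuffer with absolute reads and writes in place of Chronicle's BytesStore.

```java
import java.nio.ByteBuffer;

// Hypothetical offset table for the MarketData wire layout (declaration order, no padding).
final class MarketDataLayout {
    static final int SECURITY_ID = 0;
    static final int TIME = SECURITY_ID + Long.BYTES;           // 8
    static final int BID_QTY = TIME + Long.BYTES;                // 16, four ints
    static final int ASK_QTY = BID_QTY + 4 * Integer.BYTES;      // 32, four ints
    static final int BID_PRICE = ASK_QTY + 4 * Integer.BYTES;    // 48, four doubles
    static final int ASK_PRICE = BID_PRICE + 4 * Double.BYTES;   // 80, four doubles
    static final int SIZE = ASK_PRICE + 4 * Double.BYTES;        // 112

    // Writes one bid level (0..3) at its absolute offset from the record's base.
    static void writeBid(ByteBuffer buf, int base, int level, int qty, double price) {
        buf.putInt(base + BID_QTY + level * Integer.BYTES, qty);
        buf.putDouble(base + BID_PRICE + level * Double.BYTES, price);
    }

    static int readBidQty(ByteBuffer buf, int base, int level) {
        return buf.getInt(base + BID_QTY + level * Integer.BYTES);
    }

    static double readBidPrice(ByteBuffer buf, int base, int level) {
        return buf.getDouble(base + BID_PRICE + level * Double.BYTES);
    }
}
```

Deriving each offset from the previous one keeps the total in step with the field list, so adding a field changes SIZE automatically instead of leaving a stale magic number.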
Finally, a TriviallyCopyableMarketData class uses Chronicle’s unsafeReadObject() and unsafeWriteObject() methods to perform a single bulk copy:
public final class TriviallyCopyableMarketData extends MarketData {
    static final int START =
            triviallyCopyableStart(TriviallyCopyableMarketData.class);
    static final int LENGTH =
            triviallyCopyableLength(TriviallyCopyableMarketData.class);

    @Override
    public void readMarshallable(BytesIn bytes) {
        bytes.unsafeReadObject(this, START, LENGTH);
    }

    @Override
    public void writeMarshallable(BytesOut bytes) {
        bytes.unsafeWriteObject(this, START, LENGTH);
    }
}
These methods bypass iterative per-field copying. Instead, they use knowledge of the object’s layout to copy memory in one go.
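Because a bulk copy is only meaningful when the copied region holds primitives, it can help to guard against a reference field slipping in during later maintenance. The sketch below is a hypothetical helper built on plain JDK reflection; it is not part of Chronicle Bytes, and the Flat and WithReference classes exist only to demonstrate it.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical guard: checks that a class declares only primitive instance fields.
final class PrimitiveOnlyCheck {
    static boolean hasOnlyPrimitiveFields(Class<?> type) {
        // Walk the class and its superclasses, stopping before java.lang.Object.
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) {
                    continue; // static fields are not part of the instance layout
                }
                if (!field.getType().isPrimitive()) {
                    return false; // a reference field breaks the flat layout
                }
            }
        }
        return true;
    }

    // Example types used only for demonstration.
    static final class Flat { long id; int qty; double price; }
    static final class WithReference { long id; String name; }
}
```

A check like this could run once in a unit test or a static initialiser, so a stray String field fails fast instead of corrupting a copy at runtime.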
The Benchmark Results
Running benchmarks on a high-end CPU (e.g. a Ryzen 5950X) shows the progressive improvements:
Benchmark                              Mode  Cnt     Score    Error  Units
BenchmarkRunner.defaultWriteRead       avgt   25  1204.359 ± 72.394  ns/op
BenchmarkRunner.defaultBytesWriteRead  avgt   25   375.479 ±  6.066  ns/op
BenchmarkRunner.explicitWriteRead      avgt   25    45.769 ±  0.661  ns/op
BenchmarkRunner.directWriteRead        avgt   25    27.303 ±  0.867  ns/op
BenchmarkRunner.trivialWriteRead       avgt   25    25.568 ±  0.228  ns/op
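Here, trivialWriteRead approaches raw memory copy speeds: it is roughly 47 times faster than defaultWriteRead and roughly 15 times faster than defaultBytesWriteRead, well over an order of magnitude in both cases. directWriteRead is within about 2 ns/op of it, and because it reads and writes each field explicitly it is not affected by changes to how the JVM lays out fields within an object.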
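The relative gains can be worked out directly from the scores in the table. The short sketch below does only that arithmetic; the SpeedUp class name is hypothetical and the numbers are copied from the results above.

```java
import java.util.Locale;

// Computes how many times slower each variant is than trivialWriteRead.
final class SpeedUp {
    static double ratio(double baselineNs, double candidateNs) {
        return baselineNs / candidateNs;
    }

    public static void main(String[] args) {
        double trivial = 25.568; // ns/op, trivialWriteRead
        String[] names = {"default", "defaultBytes", "explicit", "direct"};
        double[] scores = {1204.359, 375.479, 45.769, 27.303}; // ns/op, from the table
        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT, "%-12s %.1fx%n", names[i], ratio(scores[i], trivial));
        }
    }
}
```

The ratios come out at about 47x for the default approach, 15x for default Bytes, 1.8x for explicit methods, and 1.1x for direct offsets, which shows that most of the gain arrives as soon as reflection is removed.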
Considerations and Caveats
- JVM Stability: Bulk copying relies on low-level assumptions about how the JVM lays out fields in memory. These are typically stable, but they can differ between JVM versions or distributions. Test carefully if you need cross-JVM compatibility.
- Loss of Flexibility: Restricting fields to primitives means losing some convenience. Often, you can mitigate this by mapping strings or enumerations to integers, or converting short texts via @ShortText, and timestamps with @NanoTime.
- Schema Evolution: Changes to object structures require coordination. Both sender and receiver must remain compatible. Use versioning strategies and robust integration tests.
- Nearly Trivial Without Going Fully Trivial: If you cannot fully restrict yourself to primitives, consider direct copying of at least the performance-critical parts of the data, and handle the rest with explicit methods.
- Leverage Chronicle’s Tooling: Chronicle Bytes and Queue provide the building blocks. While they add complexity, the performance pay-off justifies it in latency-critical systems.
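One possible versioning strategy for the schema evolution concern is to prefix each raw record with a small header so that a receiver can reject a layout it does not understand. The sketch below is an assumption for illustration, not something the benchmark or Chronicle prescribes; VersionedFrame, its header format, and the version number are all hypothetical, and it uses the JDK's ByteBuffer.

```java
import java.nio.ByteBuffer;

// Hypothetical framing: [int layoutVersion][int payloadLength][payload bytes].
final class VersionedFrame {
    static final int LAYOUT_VERSION = 2;              // bump whenever the field layout changes
    static final int HEADER_BYTES = 2 * Integer.BYTES;

    static ByteBuffer wrap(byte[] payload) {
        ByteBuffer out = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        out.putInt(LAYOUT_VERSION).putInt(payload.length).put(payload);
        out.flip();
        return out;
    }

    static byte[] unwrap(ByteBuffer in) {
        int version = in.getInt();
        if (version != LAYOUT_VERSION) {
            throw new IllegalStateException("Unsupported layout version " + version);
        }
        int length = in.getInt();
        if (length < 0 || length > in.remaining()) {
            throw new IllegalStateException("Invalid payload length " + length);
        }
        byte[] payload = new byte[length];
        in.get(payload);
        return payload;
    }
}
```

The header costs eight bytes per message, which is a modest price for failing loudly when sender and receiver disagree about the layout rather than silently misreading fields.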
Key Points
- Treating objects as contiguous blocks of primitive fields significantly reduces serialisation overhead.
- Moving from self-describing, reflective approaches to explicit field reads/writes yields large gains.
- Using direct memory offsets or bulk copying is yet more efficient, approaching C++-like speeds.
- While not free of trade-offs, trivial copyability offers a compelling pattern for systems where latency and throughput trump convenience.
Try It Yourself
Why not measure the impact on your own workload? The benchmark harness is available here:
BenchmarkRunner.java on GitHub
Run it with JMH to see if trivial copyability can enhance your system’s performance. Experiment with different layouts, measure the impact, and adopt the approach incrementally.
This Article Is Based On...
This article is an update of two articles by Per Minborg: "How to Get C++ Speed in Java Serialisation" and "Did You Know the Fastest Way of Serializing a Java Field is not Serializing it at All?". It builds on the original concepts and benchmarks, providing a fresh perspective on achieving ultra-low-latency Java systems.
Conclusion
Java may not natively support trivially copyable objects, but we can still achieve near C++-like serialisation speeds by restructuring data and using Chronicle’s low-level operations. By experimenting with these techniques and applying them judiciously, developers can build ultra-low-latency Java systems that confidently handle high-throughput workloads. If you have been searching for that extra edge in performance, give trivial copyability—or its direct-copy variants—a try. It might just be the key to unlocking new levels of efficiency.