Performance Tip: Rethinking Collection.toArray(new Type[0])

Introduction

Have you ever considered the performance implications of converting collections to arrays in Java? It's a common task, and the method you choose can affect your application's efficiency. In this article, I explore different approaches to toArray(), benchmark their performance, and consider which method suits which scenario.

The Challenge

Converting a Collection to an array seems straightforward, but the standard practice of using collection.toArray(new Type[0]) might not be the most efficient. Understanding the nuances of this method can help you write more performant code.

Exploring the Approaches

Let's look at four common ways to convert a collection to an array, plus a fifth that combines two of them:

1. Using toArray() Without Arguments

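List<String> list = new ArrayList<>(List.of("Hello", "world"));
Object[] array = list.toArray();     // runtime type is Object[], not String[]
String[] strings = (String[]) array; // Throws ClassCastException at runtime

While this approach avoids allocating an argument array and can be fast, it always returns an Object[]. It lacks type safety: each element has to be cast individually, and casting the array itself to String[] fails at runtime, as shown above.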
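To make the type issue concrete, here is a small runnable sketch (the class and method names are illustrative, not from the benchmark). It shows that toArray() with no arguments returns an array whose runtime type is Object[]: the elements are Strings and can be cast one by one, but the array as a whole cannot be cast to String[].

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class NoArgToArray {
    // Illustrative helper: the no-argument form always yields an Object[].
    static Object[] noArg(Collection<?> c) {
        return c.toArray();
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("Hello", "world"));
        Object[] objects = noArg(list);
        System.out.println(objects.getClass().getSimpleName()); // Object[]

        // Casting an individual element is fine...
        String first = (String) objects[0];
        System.out.println(first); // Hello

        // ...but casting the array itself fails, because its runtime type is Object[].
        try {
            String[] strings = (String[]) objects;
            System.out.println(strings.length);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```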

2. Passing a Zero-Length Array: toArray(new Type[0])

A common practice is to pass a new zero-length array to toArray(), which uses it mainly to determine the runtime type of the array to return.

String[] notifTypesArray = notifTypes.toArray(new String[0]);

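This code allocates a new zero-length array on every call (although the JIT's escape analysis may eliminate it), and for a non-empty collection toArray() must then create a correctly sized array of the same runtime type reflectively. In performance-critical code, that extra work can add up.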
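A runnable sketch of the zero-length idiom follows; the class name and the element values are illustrative. As the Collection.toArray(T[]) contract specifies, if the collection fits in the array passed in, that array itself is returned, so an empty collection simply hands back the zero-length argument. Since Java 11, Collection also has toArray(IntFunction), whose default implementation calls toArray(generator.apply(0)), so toArray(String[]::new) is a more readable spelling of the same idiom.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class ZeroLengthToArray {
    // The common idiom: the zero-length argument supplies the component type.
    static String[] toStrings(Collection<String> c) {
        return c.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> notifTypes = new ArrayList<>(List.of("email", "sms")); // illustrative values
        String[] arr = toStrings(notifTypes);
        System.out.println(arr.length + " " + arr.getClass().getSimpleName()); // 2 String[]

        // An empty collection fits in the zero-length array, so that same array is returned.
        String[] zero = new String[0];
        System.out.println(new ArrayList<String>().toArray(zero) == zero); // true

        // Java 11+: the default toArray(IntFunction) delegates to toArray(generator.apply(0)).
        String[] viaGenerator = notifTypes.toArray(String[]::new);
        System.out.println(viaGenerator.length); // 2
    }
}
```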

3. Pre-Sizing the Array: toArray(new Type[collection.size()])

return v.toArray(new String[v.size()]); // no cast needed if v is a Collection<String>

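Because the array passed in is already the right size, toArray() fills it and returns it directly instead of allocating a second array. One caveat: size() and toArray() are separate calls, so if a concurrent collection shrinks in between, the returned array ends with null elements.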
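The sketch below (class and method names are illustrative) shows the pre-sized form returning the very array passed in, so only one array is allocated. It also shows a detail of the Collection.toArray(T[]) contract worth knowing: when the array has room to spare, the slot immediately after the last element is set to null, while any slots beyond it are left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class PreSizedToArray {
    // Allocate exactly size() slots; toArray() fills this array and returns it.
    static String[] preSized(Collection<String> c) {
        return c.toArray(new String[c.size()]);
    }

    public static void main(String[] args) {
        List<String> v = new ArrayList<>(List.of("a", "b", "c"));
        System.out.println(Arrays.toString(preSized(v))); // [a, b, c]

        // The array passed in is the array returned: no second allocation.
        String[] target = new String[v.size()];
        System.out.println(v.toArray(target) == target); // true

        // With room to spare, only the element right after the last one is set to null.
        String[] big = {"x", "x", "x", "x", "x"};
        v.toArray(big);
        System.out.println(Arrays.toString(big)); // [a, b, c, null, x]
    }
}
```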

4. Using a Constant Empty Array

private static final String[] NO_STRINGS = {};
// later
return s.toArray(NO_STRINGS);

This approach avoids array creation entirely when the collection is empty, because the constant itself is returned, but for a non-empty collection toArray() still has to create the result array reflectively.
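Sharing one zero-length array is safe because it has no elements that any caller could overwrite. The following sketch (class and helper names are illustrative) checks that an empty ArrayList, HashSet and TreeSet all return the NO_STRINGS constant itself, as the Collection.toArray(T[]) contract requires when the collection fits in the array, while a non-empty collection gets a newly allocated String[].

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

class ConstantEmptyArray {
    private static final String[] NO_STRINGS = {};

    static String[] toStrings(Collection<String> s) {
        return s.toArray(NO_STRINGS);
    }

    // True when toArray() handed back the shared constant rather than a new array.
    static boolean returnsConstant(Collection<String> s) {
        return toStrings(s) == NO_STRINGS;
    }

    public static void main(String[] args) {
        List<Collection<String>> empties =
                List.of(new ArrayList<String>(), new HashSet<String>(), new TreeSet<String>());
        for (Collection<String> c : empties) {
            // Empty collections fit in the zero-length array, so it is returned as-is.
            System.out.println(c.getClass().getSimpleName() + " " + returnsConstant(c));
        }

        // A non-empty collection does not fit, so a new String[] is created.
        String[] some = toStrings(new ArrayList<>(List.of("a", "b")));
        System.out.println(some.length + " " + (some == NO_STRINGS)); // 2 false
    }
}
```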

5. Attempt to Get the Best of Both Worlds

private static final String[] NO_STRINGS = {};
// later
return s.isEmpty() ? NO_STRINGS : s.toArray(new String[s.size()]);

This way, the shared empty array is returned whenever the collection is empty, and an array of the correct size and type is allocated when it has one or more elements.
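If the empty-or-pre-sized pattern is needed for several element types, it can be wrapped in a helper. The following is a hypothetical generic version (the names EmptyOrPreSized and toArray(Collection, T[]) are my own, not from any library): the caller passes its own shared zero-length constant, which is returned for an empty collection and otherwise used as the template for a correctly sized array.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

class EmptyOrPreSized {
    private static final String[] NO_STRINGS = {};

    // Hypothetical helper: `empty` must be a shared zero-length array of the wanted type.
    static <T> T[] toArray(Collection<? extends T> c, T[] empty) {
        if (c.isEmpty()) {
            return empty; // reuse the constant, no allocation
        }
        // Arrays.copyOf builds a new array with empty's runtime type and c.size() null slots,
        // which toArray() then fills and returns.
        return c.toArray(Arrays.copyOf(empty, c.size()));
    }

    public static void main(String[] args) {
        System.out.println(toArray(List.<String>of(), NO_STRINGS) == NO_STRINGS); // true

        String[] sorted = toArray(new TreeSet<>(List.of("world", "Hello")), NO_STRINGS);
        System.out.println(Arrays.toString(sorted)); // [Hello, world]
    }
}
```

Be aware that Arrays.copyOf creates the new array reflectively from the template's runtime type, so this generic form may not perform like the hand-written String[] version; it would need its own benchmark before being relied on.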

The Benchmark

To evaluate these methods, I conducted a benchmark using JMH (Java Microbenchmark Harness), available here.

Collections Tested

  • ArrayList: Sizes of 0, 3, 7, and 16 elements.
  • HashSet and TreeSet: Created from the same elements as the ArrayLists.
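The benchmark source isn't reproduced in this post, but collections like those described above could be built along these lines. This is a hypothetical fixture: the element values ("e0", "e1", ...) and the method names are assumptions, not taken from the benchmark.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ToArrayFixtures {
    static final int[] SIZES = {0, 3, 7, 16};

    // Hypothetical element values; the benchmark's actual contents are not shown here.
    static ArrayList<String> listOfSize(int n) {
        ArrayList<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add("e" + i);
        }
        return list;
    }

    public static void main(String[] args) {
        for (int n : SIZES) {
            List<String> list = listOfSize(n);
            // HashSet and TreeSet are created from the same elements as the ArrayList.
            Set<String> hashSet = new HashSet<>(list);
            Set<String> treeSet = new TreeSet<>(list);
            System.out.println(n + ": " + list.size() + " " + hashSet.size() + " " + treeSet.size());
        }
    }
}
```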

Benchmark Configuration

  • Warmup: 2 iterations, 1 second each.
  • Measurement: 3 iterations, 10 seconds each.
  • Threads: Configurable via -Dthreads, defaulting to 8.
  • Forks: 7 separate JVM instances, so the results don't depend on the JIT decisions of a single run.

Results and Analysis

The benchmark results on a Ryzen 9 5950X (a 16-core CPU; the benchmark defaults to 8 threads) were illuminating:

  • Throughput: Between 210 million and 450 million operations per second.
  • Margin of Error: Approximately 15 million ops/sec for HashSet and ArrayList, and about 40 million ops/sec for TreeSet.

Figure: Performance Comparison of toArray() Methods

Practical Recommendations

Based on the results:

  • Avoid toArray(new Type[0]): It adds overhead without a significant benefit.
  • Leverage Constant Empty Arrays When Appropriate: If collections are frequently empty, reusing a constant avoids allocating a new array for every empty result.
  • Alternatively, Use Pre-Sized Arrays: toArray(new Type[collection.size()]) is efficient and straightforward.

Conclusion

Where you have a choice, avoid Collection.toArray(new Type[0]). The difference probably isn't worth changing existing code for, but when writing new code, use whichever alternative you consider simplest. For me, that means using the NO_STRINGS constant.

Which details of the benchmark would you like me to cover, in the comments or in a follow-up post?

Have you faced performance issues with the toArray() methods? How did you tackle them? Share your experiences and join the discussion!

About the author

As the CEO of Chronicle Software, Peter Lawrey leads the development of cutting-edge, low-latency solutions trusted by 8 out of the top 11 global investment banks. With decades of experience in the financial technology sector, he specialises in delivering ultra-efficient enabling technology that empowers businesses to handle massive volumes of data with unparalleled speed and reliability. Peter's deep technical expertise and passion for sharing knowledge have established him as a thought leader and mentor in the Java and FinTech communities. Follow Peter on Bluesky or Mastodon.
