Posts

Showing posts from November, 2024

Storing 1 TB in Virtual Memory on a 64 GB Machine with Chronicle Queue

As Java developers, we often face the challenge of handling very large datasets within the constraints of the Java Virtual Machine (JVM). When the heap size grows significantly, often beyond 32 GB, garbage collection (GC) pause times can escalate, leading to performance degradation. This article explores how Chronicle Queue enables the storage and efficient access of a 1 TB dataset on a machine with only 64 GB of RAM.

The Challenge of Large Heap Sizes
Using standard JVMs like Oracle HotSpot or OpenJDK, increasing the heap size to accommodate large datasets can result in longer GC pauses. These pauses occur because the garbage collector requires more time to manage the larger heap, which can negatively impact application responsiveness. One solution is to use a concurrent garbage collector, such as the one provided by Azul Zing, designed to handle larger heap sizes while reducing GC pause times. However, this approach may only scale well when the dataset is within the available main ...
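To make the virtual-memory idea concrete, here is a minimal sketch of a memory-mapped file using only the JDK. It is not Chronicle Queue's API; it only illustrates the underlying mechanism, where the operating system pages data between disk and RAM on demand. The class name `MappedFileSketch` and the method `roundTrip` are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: plain JDK memory mapping, not Chronicle Queue's API.
class MappedFileSketch {

    /** Writes a long at the given offset of a mapped region and reads it back. */
    static long roundTrip(long regionSize, int offset, long value) {
        try {
            Path file = Files.createTempFile("mapped-sketch", ".dat");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping reserves address space off the Java heap; the OS
                // brings pages into RAM only when they are touched.
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, regionSize);
                buffer.putLong(offset, value);
                return buffer.getLong(offset);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Map a 64 MB region but touch only a single 8-byte slot.
        System.out.println(roundTrip(64L << 20, 1 << 20, 42L));
    }
}
```

A single `MappedByteBuffer` cannot exceed `Integer.MAX_VALUE` bytes, so a dataset far larger than that would need many mapped regions; how Chronicle Queue organises this is not shown here.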

Unveiling Floating-Point Modulus Surprises in Java

When working with double in Java, floating-point representation errors can accumulate, leading to unexpected behaviour, especially when using the modulus operator. In this article, we'll explore how these errors manifest and why they can cause loops to terminate earlier than anticipated.

The Unexpected Loop Termination
Consider the following loop:

    Set<Double> set = new HashSet<>();
    for (int i = 0; set.size() < 1000; i++) {
        double d = i / 10.0;
        double mod = d % 0.1;
        if (set.add(mod)) {
            System.out.printf("i: %,d / 10.0 = %s, with %% 0.1 = %s%n", i, new BigDecimal(d), new BigDecimal(mod));
        }
    }

At first glance, this loop should run indefinitely. After all, the modulus of d % 0.1 for multiples of 0.1 should always be zero, right? Surprisingly, this loop completes after 2,243 iterations, having collected 1,000 unique modulus values. How is this possible? The full code is available on GitHub.

Understanding Flo...
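A single case makes the effect easier to see than the full loop. The sketch below compares the `double` remainder with an exact decimal remainder computed with `BigDecimal`; the class and method names are hypothetical, and the `BigDecimal` variant is included only for contrast, not as the article's recommended fix.

```java
import java.math.BigDecimal;

// Hypothetical sketch contrasting binary and decimal remainders.
class ModulusSketch {

    /** Remainder of (tenths / 10.0) % 0.1 using binary floating point. */
    static double doubleRemainder(int tenths) {
        return (tenths / 10.0) % 0.1;
    }

    /** The same remainder computed exactly in decimal arithmetic. */
    static BigDecimal decimalRemainder(int tenths) {
        // valueOf(3, 1) is exactly 0.3; new BigDecimal("0.1") is exactly 0.1.
        return BigDecimal.valueOf(tenths, 1).remainder(new BigDecimal("0.1"));
    }

    public static void main(String[] args) {
        // Neither 0.3 nor 0.1 is exactly representable as a double, so the
        // double remainder lands just below 0.1 instead of at 0.
        System.out.println(new BigDecimal(doubleRemainder(3)));
        System.out.println(decimalRemainder(3));
    }
}
```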

Thread Safety Issues with Vector and Hashtable

Considered legacy since Java 1.2 (1998), Vector and Hashtable remain prevalent in many Java applications. A common belief is that their synchronised methods render them inherently thread-safe. However, this assumption can lead to subtle concurrency problems that are often overlooked. Have you encountered unexpected exceptions when iterating over a Vector in a multi-threaded environment? Let’s delve into why this happens and how to address it.

The Misconception of Thread Safety
Both Vector and Hashtable synchronise individual method calls, leading many developers to assume that these classes are safe to use concurrently without additional synchronisation. While each method is thread-safe, combining multiple method calls can introduce race conditions if not properly managed.

Are Iterators Thread-Safe?
The Iterator is not thread-safe for most collections, including Vector. Iterators are designed to fail fast by throwing a ConcurrentModificationException when t...
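The fail-fast behaviour can be reproduced deterministically, even on one thread, by changing a Vector while iterating over it. The sketch below shows that, plus one common remedy: holding the Vector's own lock for the whole traversal. The class and method names are hypothetical, and this is only one of several possible fixes.

```java
import java.util.ConcurrentModificationException;
import java.util.Vector;

// Hypothetical sketch of fail-fast iteration and a guarded traversal.
class VectorIterationSketch {

    /** Returns true if a structural change mid-iteration makes the iterator fail fast. */
    static boolean failsFast() {
        Vector<Integer> vector = new Vector<>();
        vector.add(1);
        vector.add(2);
        vector.add(3);
        try {
            for (Integer value : vector) {
                vector.add(value * 10); // structural change while iterating
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    /** Sums the elements while holding the lock that Vector's own methods use. */
    static int sum(Vector<Integer> vector) {
        synchronized (vector) {
            int total = 0;
            for (int value : vector) {
                total += value;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails fast: " + failsFast());
    }
}
```

Because Vector's synchronised methods lock on the Vector instance itself, other threads calling `add` or `remove` block until the `synchronized (vector)` traversal has finished.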

Exceptional Exception, StackTrace extends Throwable

Exploring Surprising Properties of Extending Throwable in Java
In Java, most developers are familiar with extending Exception or Error to create custom exceptions. However, directly extending Throwable can lead to surprising and potentially useful behaviours. In this article, we'll delve into the nuances of extending Throwable and explore practical applications that can enhance debugging and monitoring in Java applications. The example code is available here.

Extending Throwable
At first glance, extending Throwable might seem unusual. Unlike Exception, which is checked, or Error, which is unchecked, Throwable itself can be extended to create a new checked throwable that is neither an exception nor an error.

    public class MyThrowable extends Throwable {
    }

    public static void main(String... args) throws MyThrowable {
        throw new MyThrowable(); // Must be declared or caught
    }

In this example, MyThrowable is a checked throwable, and the compiler enforces that it mu...
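One consequence of a throwable being "neither an exception nor an error" is that a `catch (Exception e)` block does not catch it. The sketch below reuses the `MyThrowable` name from the excerpt; `ThrowableSketch`, `risky` and `whichHandler` are hypothetical names added for illustration.

```java
// MyThrowable mirrors the class in the excerpt; the rest is a hypothetical sketch.
class MyThrowable extends Throwable {
}

class ThrowableSketch {

    static void risky() throws MyThrowable {
        throw new MyThrowable();
    }

    /** Reports which catch block ends up handling MyThrowable. */
    static String whichHandler() {
        try {
            risky();
            return "none";
        } catch (Exception e) {
            return "Exception"; // not reached: MyThrowable is not an Exception
        } catch (Throwable t) {
            return "Throwable";
        }
    }

    public static void main(String[] args) {
        System.out.println(whichHandler()); // prints "Throwable"
    }
}
```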

StringBuffer is Dead, Long Live StringBuffer

When Java 5.0 was released on 30th September 2004, it introduced StringBuilder as a replacement for StringBuffer in cases where thread safety isn't required. The idea was simple: if you're manipulating strings within a single thread, StringBuilder offers a faster, unsynchronized alternative to StringBuffer. This is an updated article from 2011.

From the Javadoc for StringBuilder:
This class provides an API compatible with StringBuffer, but with no guarantee of synchronization. This class is designed for use as a drop-in replacement for StringBuffer in places where the string buffer was being used by a single thread (as is generally the case). Where possible, it is recommended that this class be used in preference to StringBuffer as it will be faster under most implementations.

Is StringBuffer Really Dead?
You might think that StringBuffer has become redundant, given that most single-threaded scenarios can use StringBuilder, and thread safety often require...

What can make Java code go faster, and then slower?

It's well-known that the JVM optimises code during execution, resulting in faster performance over time. However, less commonly understood is how operations performed before a code section can negatively impact its execution speed. In this post, I'll use practical examples to explore how warming up and cooling down code affects performance. The code is available here for you to run.

Warming Up Code
When code is executed repeatedly, the JVM optimises performance. Consider the following code snippet:

    int[] display = {0, 1, 10, 100, 1_000, 10_000, 20_000, 100_001};
    for (int i = 0; i <= display[display.length - 1]; i++) {
        long start = System.nanoTime();
        doTask();
        long time = System.nanoTime() - start;
        if (Arrays.binarySearch(display, i) >= 0)
            System.out.printf("%,d: Took %,d us to serialise/deserialise GregorianCalendar%n", i, time / 1_000);
    }

This code measures the time taken to execute doTask() over multiple iterations, printing...
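The excerpt does not show `doTask()` itself. Going by the printed message, it serialises and deserialises a `GregorianCalendar`, so a plausible stand-in is sketched below. This is an assumption, not the author's code: the class name `WarmUpSketch` is hypothetical, and this version returns the deserialised copy so the round trip can be checked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.GregorianCalendar;

// Hypothetical stand-in for the doTask() used in the timing loop.
class WarmUpSketch {

    /** Serialises a GregorianCalendar to bytes and reads it back. */
    static GregorianCalendar doTask() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new GregorianCalendar());
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (GregorianCalendar) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Time the first call and a later call; the gap reflects warm-up.
        for (int i = 0; i <= 10_000; i++) {
            long start = System.nanoTime();
            doTask();
            long time = System.nanoTime() - start;
            if (i == 0 || i == 10_000)
                System.out.printf("%,d: Took %,d us%n", i, time / 1_000);
        }
    }
}
```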

Overly Long Class Names in Java or Geeky Poem?

In Java development, clear and concise naming conventions are essential for code readability and maintainability. However, sometimes, we stumble upon class names that stretch the limits of practicality. One such example is InternalFrameTitlePaneMaximizeButtonWindowNotFocusedState. But did you know that in Java 6, this class name was even longer? Within the Java 6 JRE, there's a class with an astonishingly lengthy name:

    com.sun.java.swing.plaf.nimbus.InternalFrameInternalFrameTitlePaneInternalFrameTitlePaneMaximizeButtonWindowNotFocusedState

This mouthful appears to be the product of a code generator whose output needed reviewing, leading to redundant and cumbersome naming. Or is it a geeky poem buried in the code?

    InternalFrame InternalFrame Title Pane,
    Internal Frame Title Pane.
    Maximize Button Window,
    Not Focused State.

The moral of the story: always check the readability and sanity of generated code. In this Hacker News Discussion another class was also consid...

Unexpected Full GCs Triggered by RMI in Latency-Sensitive Applications

We observed an unexpected increase in Full Garbage Collections (Full GCs) while optimising a latency-sensitive application with minimal object creation. Although we had reduced the frequency of minor GCs to enhance performance, the system began to exhibit periodic hourly pauses due to Full GCs, which was counterintuitive.

Investigating the Source of Full GCs
Upon closer examination, we discovered that the Java Remote Method Invocation (RMI) system was initiating Full GCs every hour. Specifically, the RMI Distributed Garbage Collector (DGC) checks if a GC has occurred in the last hour and, if not, forces a Full GC. This behaviour occurs even if the application does not actively use RMI, leading to unnecessary performance overhead.

Understanding RMI's Impact on Garbage Collection
The RMI DGC periodically forces a garbage collection so that unused remote objects can be cleaned up. By default, it is configured to trigger a Full GC if none has occurred within a specified interval (defaulting to one hour). This me...
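The interval is controlled by two system properties, `sun.rmi.dgc.client.gcInterval` and `sun.rmi.dgc.server.gcInterval`, both in milliseconds and usually set with `-D` on the command line. The sketch below only reports what has been configured, falling back to the one-hour default described above. The class and method names are hypothetical, and the code does not query the RMI runtime itself.

```java
// Hypothetical sketch: reports the configured RMI DGC intervals.
class RmiGcIntervalSketch {

    /** One hour in milliseconds, the default described in the article. */
    static final long DEFAULT_INTERVAL_MS = 3_600_000L;

    /** Returns the interval set via -D<property>=<millis>, or the default. */
    static long effectiveInterval(String property) {
        return Long.getLong(property, DEFAULT_INTERVAL_MS);
    }

    public static void main(String[] args) {
        String[] properties = {
            "sun.rmi.dgc.client.gcInterval",
            "sun.rmi.dgc.server.gcInterval"
        };
        for (String property : properties) {
            System.out.printf("%s = %,d ms%n", property, effectiveInterval(property));
        }
    }
}
```

For example, running with `-Dsun.rmi.dgc.server.gcInterval=86400000` would report a one-day interval for the server side.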

How SLOW can you read/write files in Java?

A common question on Stack Overflow is: Why is reading/writing from a file in Java so slow? What is the fastest way? The discussion often revolves around comparing NIO versus IO. However, the bottleneck is usually not the read/write operations themselves, and the specific approach often has little significance in the bigger picture. To demonstrate, I’ll show one of the simplest (and perhaps slowest) ways to read/write text, using PrintWriter and Files.lines(Path). The code is available here. While it’s slower than writing binary using NIO or IO, it’s fast enough for most typical use cases.

Example Output
The program on a Ryzen 5950X running Linux outputs:

    Run 1, Write speed: 0.900 GB/sec, read speed 0.832 GB/sec
    Run 2, Write speed: 0.918 GB/sec, read speed 1.208 GB/sec
    Run 3, Write speed: 0.933 GB/sec, read speed 1.197 GB/sec

If you find that 900 MB/s is more than fast enough for your application, the specific method of reading/wri...
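For readers who want to see the shape of that approach without following the link, here is a minimal sketch that writes text with `PrintWriter` and reads it back with `Files.lines(Path)`. It is not the author's benchmark: the class name, method name and line format are hypothetical, and it measures nothing, it only shows the two calls working together.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch of the PrintWriter / Files.lines round trip.
class TextFileSketch {

    /** Writes the given number of lines to a temp file, then counts them. */
    static long writeThenCountLines(int lines) {
        try {
            Path file = Files.createTempFile("text-sketch", ".txt");
            try {
                try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file))) {
                    for (int i = 0; i < lines; i++) {
                        out.println("line " + i);
                    }
                }
                // Files.lines is lazy, so the stream must be closed after use.
                try (Stream<String> stream = Files.lines(file)) {
                    return stream.count();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenCountLines(1_000_000));
    }
}
```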

Advanced Applications of Dynamic Code in Java

Dynamic code compilation and execution in Java offer powerful capabilities that can enhance application flexibility and performance. Back in 2008, I developed a library called Essence JCF, which has since evolved into the Java Runtime Compiler. Initially, its purpose was to load configuration files written in Java instead of traditional XML or properties files. A key advantage of this library is its ability to load classes into the current class loader, allowing immediate use of interfaces or classes without the need for reflection or additional class loaders.

Why Use Dynamic Code Compilation?
While dynamic code compilation didn't initially solve a pressing problem, over time, several practical use cases have emerged where it proves particularly beneficial:

1. Objects in Direct Memory
By generating code dynamically, you can build data stores from interfaces that are either row-based or column-based, stored in the heap or direct memory. This approach reduces the number ...

Two Overlooked Uses of Enums in Java

Enums in Java are commonly used to represent a fixed set of constants. However, they offer more versatility than is often realised. In this article, we'll explore two practical yet often overlooked uses of enums: creating utility classes and implementing singletons.

1. Using Enums as Utility Classes
Utility classes contain static methods and are not meant to be instantiated. A typical approach is to define a class with a private constructor to prevent instantiation. Enums provide a more straightforward way to achieve this by leveraging their inherent characteristics. Here's how you can define a utility class using an enum:

    public enum MyUtils {
        ;

        public static String process(String text) {
            // Your utility method implementation
            return text.trim().toLowerCase();
        }
    }

By declaring an enum with no instances (note the semicolon after the enum name), you prevent instantiation naturally. This approach simplifies the code and clearly indicates tha...
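The second use mentioned in the introduction, the singleton, follows the same idea with exactly one instance. The sketch below places both side by side; `MyUtils` mirrors the excerpt, while `Registry` and its `next()` method are hypothetical names invented for this illustration.

```java
// MyUtils mirrors the excerpt: an enum with no instances acts as a utility class.
enum MyUtils {
    ;

    static String process(String text) {
        return text.trim().toLowerCase();
    }
}

// Hypothetical singleton: an enum with exactly one instance.
enum Registry {
    INSTANCE;

    private int counter;

    /** Returns the next id; synchronised because the instance is shared. */
    synchronized int next() {
        return ++counter;
    }
}

class EnumUsesSketch {
    public static void main(String[] args) {
        System.out.println(MyUtils.process("  Hello World  ")); // prints "hello world"
        System.out.println(MyUtils.values().length);            // prints 0
        System.out.println(Registry.INSTANCE.next());           // prints 1
    }
}
```

The JVM guarantees that `INSTANCE` is created once, on first use of the enum, so no explicit lazy-initialisation code is needed.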

How to avoid using a Triple Cast

OMG: Using a Triple Cast
We’ve all faced situations where a simple task spirals into unexpected complexity. In 2010, I encountered such a scenario and wrote a triple cast! 😅

The Problem: Default Values for Primitive Types
Dealing with default values for various Java primitive types can be surprisingly tricky. Primitives each have a well-defined default: 0 for numeric types, false for boolean, and '\u0000' (the null character) for char. Yet, when working with generics and requiring a "default value" factory, we may be tempted to do something unorthodox. I needed a method to return a given type’s default value. Here’s the first approach I wrote at the time:

    public static <T> T defaultValue(Class<T> clazz) {
        if (clazz == byte.class)
            return (T) (Byte) (byte) 0;
        // Other primitive types handled...
        return null;
    }

Yes, this is casting madness! Let’s break it down: (byte) 0 : Initialises the ...
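One way to sidestep the per-type casts, not necessarily the one the article goes on to recommend, is to let the JVM supply the default: a freshly created one-element array is zero-initialised, and `java.lang.reflect.Array.get` returns that element boxed. The class name `Defaults` is hypothetical; the method keeps the `defaultValue` signature from the excerpt.

```java
import java.lang.reflect.Array;

// Hypothetical sketch: derive the default from a zero-initialised array.
class Defaults {

    /**
     * Returns the default value for the given type: a boxed zero, false or
     * '\u0000' for primitives, and null for reference types.
     * Note: void.class is not supported and makes Array.newInstance throw.
     */
    @SuppressWarnings("unchecked")
    static <T> T defaultValue(Class<T> clazz) {
        return (T) Array.get(Array.newInstance(clazz, 1), 0);
    }

    public static void main(String[] args) {
        System.out.println(defaultValue(byte.class));    // prints 0
        System.out.println(defaultValue(boolean.class)); // prints false
        System.out.println(defaultValue(String.class));  // prints null
    }
}
```

A single unchecked cast remains, but it is confined to one line, and no primitive type needs its own branch.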

Calculating an Average Without Overflow: Rounding Methods

Calculating the midpoint between two integers may seem trivial, but the naive approach can lead to overflow errors. The MidpointCalculator code sample is available here.

The classic midpoint formula:

    int m = (h + l) / 2;

This is prone to overflow if h and l are large, causing the result to be incorrect. This bug appears in many algorithms, including binary search implementations.

Understanding the Problem of Overflow
In Java, the int type has a fixed range from -2,147,483,648 to 2,147,483,647. If h and l are large, their sum might exceed this range, leading to overflow. When overflow occurs, Java silently wraps the result around into the negative range, producing an incorrect midpoint.

Safer Approaches to Calculate a Midpoint
Several alternative methods can be employed to circumvent the overflow issue. Below, we discuss three approaches, each with merits and use cases.

1. Using a Safer Formula
A well-...
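To see the wrap-around in practice, the sketch below compares the naive formula with two widely used alternatives. These may or may not match the three approaches the article goes on to discuss; the class and method names are hypothetical, and both alternatives assume 0 <= l <= h, as in an array-index binary search.

```java
// Hypothetical sketch; the safer variants assume 0 <= l <= h.
class MidpointSketch {

    /** The classic formula: overflows when h + l exceeds Integer.MAX_VALUE. */
    static int naive(int l, int h) {
        return (h + l) / 2;
    }

    /** Adds half the distance to the lower bound, so the sum never overflows. */
    static int bySubtraction(int l, int h) {
        return l + (h - l) / 2;
    }

    /** Lets the sum wrap, then treats it as unsigned when halving. */
    static int byUnsignedShift(int l, int h) {
        return (l + h) >>> 1;
    }

    public static void main(String[] args) {
        int l = Integer.MAX_VALUE - 2;
        int h = Integer.MAX_VALUE;
        System.out.println(naive(l, h));           // prints -2
        System.out.println(bySubtraction(l, h));   // prints 2147483646
        System.out.println(byUnsignedShift(l, h)); // prints 2147483646
    }
}
```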