Showing posts from November, 2011

Java puzzle, low latency queue

There a number of ways to use queues. A simple and very fast queue is to use AtomicReference. (It stores 0 or 1 element) 1. Write a producer and consumer thread which communicates via an AtomicReference. [Click for sample solution] Took an average of 48 ns per object public class Main { public static void main(String... args) throws InterruptedException { AtomicReference<Long> queue = new AtomicReference<Long>(); final Consumer<Long> consumer = new Consumer<Long>(queue); Thread t = new Thread(new Runnable() { @Override public void run() { while(consumer.take() >= 0); } }); t.start(); Producer<Long> producer = new Producer<Long>(queue); long start = System.nanoTime(); int runs = 10*1000*1000; for(long i=0;i<runs;i++) producer.push(i); producer.push(-1L); // poison pill t.join();

Low Latency Slides

Last weekend was LJC Open Conference #4, and like many people I got a lot out of it. My talk was up first which meant I could relax for the rest of the day. Here are the slides Note: the message size was 16 longs or 128 bytes. This makes a difference at higher throughputs. In answer to @aix's questions On slide 12; what are we measuring here, elapsed time from the message hitting the buffer to the "echo" reply showing up on the client socket? In each message I add a timestamp as it is written. When each message is read, I compare the timestamp with the current time. I sort the results and take the middle / 50%tile as typical timings and the worst 0.01% as the 99.99%tile. What's causing the large difference between "nanoTime(), Normal" and "RDTSC, Normal" in the bottom half of the slide (2M/s)? The reason for taking the RDTSC directly (9 ns) is that System.nanoTime() is quite slow on Centos 5.7 (180 ns) and the later is a system calls

Ever decreasing cost of main memory

I have blogged about the relative cost of memory to the cost of your time. Memory in servers is so cheap that for small amounts of memory, it just isn't worth your time to introduce complexity into your application to try to save it. The average salary for a developer in the UK is £40,000 and the average cost of memory is £5 per GB. The cost to your organisation for both is higher and if you have many machines saving a 1 MB on 1000 machines saves 1 GB, however memory is so divisible you can quickly run into small amounts of memory which are not worth your time saving. Hard disk space is even cheaper and most people get its not worth keeping your disks completely clean all the time, because the cost of deleting a file you later need is high, but also the value of the space you save is not worth your time. A typical 2 TB of mirrored disk in a network appliance costs about £300. Your time Main memory to save in total     Disk space to save Blink of an eye (20 ms) 20 KB 1.7 MB

Stupid performance trick in Java

ByteBuffer has a compact() method which is useful for re-using a buffer if it is not full consumed. However, in the likely case it is consumed, it can perform surprising badly. while ( >= 0 || buf.position != 0) { buf.flip(); out.write(buf); buf.compact(); // In case of partial write } In my application, the throughput increased by 6% by replacing compact() with this method. public static void compact(ByteBuffer bb) { if (bb.remaining() == 0) bb.clear(); else bb.compact(); } Note: this is not the increase in a micro-benchmark, but across the entire application! Your Mileage May Vary of course

Java puzzle (Single threaded client server)

Without creating additional threads, create a ServerSocket on a random unused port, a client Socket connecting to localhost and a server Socket. Part 1) Send messages from client to server and back again. (This can be useful in unit tests) [Click for sample solution] ServerSocket ss = new ServerSocket(0); Socket c = new Socket("localhost", ss.getLocalPort()); Socket s= ss.accept(); ss.close(); // sends one byte. c.getOutputStream().write(1); int num = s.getInputStream().read(); s.getOutputStream().write(num+1); int num2 = c.getInputStream().read(); s.close(); c.close(); Part 2) Do the same with blocking NIO. [Click for sample solution] ServerSocketChannel ss =; ss.socket().bind(new InetSocketAddress(0)); SocketChannel c = InetSocketAddress("localhost", ss.socket().getLocalPort())); SocketChannel s = ss.accept(); ss.close(); // sends one byte. ByteBuffer bb1 = ByteBuffer.allocateDirect(1); c.write(

Java puzzle (Sum of even Fibonacci numbers)

Write a function which calculates the sum of even Fibonacci numbers. public static void main(String... args) { for(int i=1;i<22;i++) System.out.print(sumEvenFibonacci(i) + " "); } public static long sumEvenFibonacci(int n) { // add code here. return sum; } prints 0 0 2 2 2 10 10 10 44 44 44 188 188 188 798 798 798 3382 3382 3382 14328 For bonus points, optimise the sumEvenFibonacci method so it contains only one condition and doesn't test for even or odd. [Click for sample solution] public static long sumEvenFibonacci(int n) { long a = 1, b = 1, sum = 0; for (int i = n; i > 2; i -= 3) { long c = a + b; sum += c; a = b + c; b = a + c; } return sum; }

How arbitary is the start of System.nanoTime()?

System.nanoTime() is defined as having an arbitrary start time. However, there is no reason to make it random and it is based on an unspecified system call. On Centos 5.7 and other Linuxes I have tried, this appears to be based on the uptime. System.out.println("/proc/uptime " + new Scanner(new FileInputStream("/proc/uptime")).next()); System.out.println("System.nanoTime " + System.nanoTime() / 1e9); prints /proc/uptime 265671.85 System.nanoTime 265671.854834206

Why Concurrency examples are confusing

Overview I was recently at a talk were the audience was asked who used concurrency and who found it confusing. Nearly all who use concurrency found it confusing. This was a little surprising to me, and I was left wondering what the reason might be. I have been thinking about writing a book, so I have been watching a number of web talks and reading a number of books which talk about concurrency and I think I know at least one reason. Bad Examples The problem is; in a talk or a book you need a simple, easily understood example. Unfortunately, simple and easily understood examples almost always make very BAD examples for concurrency. What makes them bad examples is that concurrency code comes with an overhead both to the developer in terms of complexity of the code and in CPU performance. Simple examples are likely to have far more overhead than the work done making the performance of the concurrent example far worse than just writing it single threaded and also the code more c