The Exchanger and GC-less Java
Overview
The Exchanger class is very efficient at passing work between thread and recycling the objects used. AFAIK, It is also one of the least used Concurrency classes.As @Marksim Sipos points out, if you don't need GC less logging using an ArrayBlockingQueue is much simpler.
Exchanger class
The Exchanger class is useful for passing data back and forth between two threads. e.g. Producer/Consumer. It has the property of naturally recycling the data structures used to pass the work and supports GC-less sharing of work in an efficient manner.Here is an example, passing logs to a background logger.
Work (a log entry) is batched into LogEntries and passed to a background thread which later passes it back to the thread so it can add more work. Provided the background thread is always finished before the batch is full, it is almost transparent. Increasing the size of the batch reduces how often the batch is full but increase the number of unprocessed entries waiting at any one time. Calling flush() can push out the data.
The key line is the following which exchanges the batch in the current thread with the batch in the other thread. The producer fills up the batch while the consumer is emptying it.
The exchange when it occurs typically takes 1-4 micro-seconds. In this case, once every 64 lines.
entries = logEntriesExchanger.exchange(entries);
How does this compare to the LMAX disruptor pattern
This approach has similar principles to the Disruptor. No GC using recycled, pre-allocated buffers and lock free operations (The Exchanger not completely lock free and doesn't busy wait, but it could)Two keys difference are:
- there is only one producer/consumer in this case, the disruptor supports multiple consumers.
- this approach re-uses a much smaller buffer efficiently. If you are using ByteBuffer (as I have in the past) an optimal size might be 32 KB. The disruptor library was designed to exploit large amounts of memory on the assumption it is relative cheap and can use medium sized (MBs) to very large buffers (GBs). e.g. it was design for servers with 144 GB. I am sure it works well on much smaller servers. ;)
If you have dozen logs files (for different purposes) and you want to minimise memory foot print and you prefer the consuming thread to be blocking rather than busy waiting which consumes 100% of a thread (which adds a small latency of up to 10 us) then the Exchanger is better suited.
Exchanger example
import java.util.concurrent.Exchanger; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; public class BackgroundLogger implements Runnable { static final int ENTRIES = 64; static class LogEntry { long time; int level; final StringBuilder text = new StringBuilder(); } static class LogEntries { final LogEntry[] lines = new LogEntry[ENTRIES]; int used = 0; } private final ExecutorService executor = Executors.newSingleThreadExecutor(); final Exchanger<LogEntries> logEntriesExchanger = new Exchanger<LogEntries>(); LogEntries entries = new LogEntries(); BackgroundLogger() { executor.submit(this); } public StringBuilder log(int level) { try { if (entries.used == ENTRIES) entries = logEntriesExchanger.exchange(entries); LogEntry le = entries.lines[entries.used++]; le.time = System.currentTimeMillis(); le.level = level; return le.text; } catch (InterruptedException e) { throw new RuntimeException(e); } } public void flush() throws InterruptedException { if(entries.used > 0) entries = logEntriesExchanger.exchange(entries); } public void stop() { try { flush(); } catch (InterruptedException e) { e.printStackTrace(); // use standard logging. } executor.shutdownNow(); } @Override public void run() { LogEntries entries = new LogEntries(); try { while (!Thread.interrupted()) { entries = logEntriesExchanger.exchange(entries); for (int i = 0; i < entries.used; i++) { bgLog(entries.lines[i]); entries.lines[i].text.delete(0, entries.lines[i].text.length()); } entries.used = 0; } } catch (InterruptedException ignored) { } finally { System.out.println("Warn: logger stopping."); // use standard logging. } } private void bgLog(LogEntry line) { // log the entry to a file. } }
Shouldn't you be using Exchanger ?
ReplyDeleteOops, it lost my angle brackets. That should have been Exchanger<LogEntries>
ReplyDelete@Don, Thank you. I forgot generics appears as HTML tags which are ignored even if they are in a <pre> block.
ReplyDeleteFor the purpose of having the Producer produce log entries, and Worker consume them, wouldn't it be more appropriate to use ArrayBlockingQueue?
ReplyDelete@Maksim, A good question.
ReplyDeleteArrayBlockingQueues are simpler however, you have to create a new objects (3 in this example) to add to the queue and every time the consumer has to wait it adds another object (to record its a waiting thread)
There is no simple way to recycle the objects by returning them back to the producer without creating more objects. (Defeating the purpose)
If you want a GC-less application, (though most people don't) the Exchanger is a better solution.
@Maksim, I have added a comment at the start to include your point.
ReplyDelete@Peter: Thanks, I missed the point about GC.
ReplyDeleteGood post and interesting blog!
This reminds of an article about a Disruptor framework I read awhile ago http://code.google.com/p/disruptor/
ReplyDeleteBasically, the idea that makes the Disruptor so fast (the authors claim, not me) is that it uses:
1. Lock-free data structure (queue)
2. Avoid GC by create a huge array with entries created initially and *only* modify the entries. By not having to create/delete the entries the Disruptor thus avoids GC.
@Doug, I have added a section on a comparison with the Disruptor framework.
ReplyDelete@Peter, i didn't get your point about BlockingQueue.
ReplyDeleteWhat are those 3 objects? Are they details of queue implementation? waiting for produced entries to consume also good for me. In BackgroundLogger, if producer is faster it will ends with OutOfBounds exception in log(), which is unlikely desirable.
Thank you, Alex
@Alexander, The three objects in this example are the objects added to the queue. In your case it might not be 3, but it will be a minimum of one. The consumer added object is due to the queue implementation.
ReplyDeleteIf the producer is consistently too fast for the consumer, you have a design problem. You are likely to be trying to write logs faster than your underlying hardware can support. In this case buy a fast disk subsystem, or a better solution is to write less logs.
If you have high bursts of activity, the buffer should be large enough that the producer is not waiting for the consumer. On the rare occasion, the buffer is still not big enough, the producer will block instead of the consumer. This may be undesirable, but making the buffer too large can also be undesirable.
Peter,
ReplyDeleteIn the log( ... ), shouldn't these operations be atomic?
LogEntry le = entries.lines[entries.used++];
le.time = System.currentTimeMillis();
le.level = level;
Cheers
Vic
@Unknown excellent question. The way it works is that each thread has it own buffer its working on. When exchange() is called the buffers are swapped.
ReplyDelete"If the producer is consistently too fast for the consumer ..."
ReplyDeletePeter, I am curious if writing faster than the underlying hardware is a likely scenario?
As a test, I took your sample code and incessantly called the log( .. ) a million times.
My backing LogEntries array had a capacity of 1000 and buffers were exchanged when 25% of the buffer was full (rather than waiting till the buffer was 100% full).
Further, I was logging to the console (which is slower than logging to a File) and it worked perfectly.
Perhaps one could fill the buffer if multiple threads are calling log ( ... ), even then one can increase the buffer size and exchange quickly ( when 15% of the buffer is full? ).
Any thoughts?
Cheers,
Vic
Exchanging when the buffer is 25% full is the same as exchange when the buffer is 100% but 1/4 of the size.
ReplyDeleteIf the producer is faster than the consumer, the producer is forced to wait until the consumer is ready to exchange. Similarly if you write directly to the screen, the producer is slowed down to never be far ahead of the consumer.
In some models you can't allow the producer to be delayed and you need to use a different model.
Peter,
ReplyDeleteWhat modification would it require to support multiple producers?
Thanks