Posts

Showing posts from July, 2013

Tutorial on writing and monitoring low latency, high throughput systems in Java.

I am presenting a three hour tutorial at JAX London this year. In a presentation similar to the one I will be giving at JavaOne 2013, I will be covering topics such as: GC-free, lockless coding; low latency coding (less than 10 microseconds); high throughput (over 100K request/responses per second on a laptop); using Chronicle for low latency persistence; using sun.misc.Unsafe; using System.nanoTime() between machines; and use cases in finance. All using Apache 2.0 open source software, i.e. nothing to buy. About 80% of the time will be hands on, examining a demonstration program which you can run on a laptop, lifting the bonnet and learning how to extend it for your purposes. What you should get out of this is to see how being able to log everything you could want changes your architecture and the way you test and monitor your applications in production (in particular their performance). If you are going to JAX London and you want to attend, sign up quickly because this session…

C++ like Java for low latency

Overview

Previously I wrote an article on C-like Java. This is a term I had come across before. However, on reflection I thought C++-like Java is a better term, as you still use OOP practices (which are not C-like), but you put more work into managing and recycling memory yourself. The term I favour now is "low level" Java programming.

Low latency and "natural" Java

Many low latency systems are still programmed in what I call natural Java. In most cases, this is the simplest and most productive approach. The problems really start when you have large heap sizes and low latency requirements, as these don't work well together. If you have an established program and a rewrite is not reasonable, you need something like Zing, which has a truly concurrent collector. Although you may see worst pauses of 1-4 ms, this is about as good as a non real time system will get. RedHat is rumoured to be producing a concurrent collector, but as Oracle has found with G1, writing an e…
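The "managing and recycling memory yourself" style can be sketched with a small, hypothetical example (the class and method names here are illustrative, not from the article): instead of building a new String per message, one mutable buffer is reused, so the steady-state path allocates nothing.

```java
// Sketch of the "C++ like" recycling style: reuse one mutable buffer
// instead of allocating a new String per message. Names are illustrative.
public class RecycledFormatter {
    // One buffer reused for every call; no per-message garbage.
    private final StringBuilder buffer = new StringBuilder(64);

    // Formats a price (in cents) into the recycled buffer and returns it.
    // The caller must consume the CharSequence before the next call.
    public CharSequence format(String symbol, long priceInCents) {
        buffer.setLength(0);              // recycle instead of reallocating
        buffer.append(symbol).append(' ')
              .append(priceInCents / 100).append('.');
        long cents = priceInCents % 100;
        if (cents < 10) buffer.append('0');
        buffer.append(cents);
        return buffer;
    }

    public static void main(String[] args) {
        RecycledFormatter f = new RecycledFormatter();
        System.out.println(f.format("VOD.L", 12345));  // VOD.L 123.45
        System.out.println(f.format("BP.L", 507));     // BP.L 5.07
    }
}
```

The trade-off is exactly the one the post describes: the code is less "natural" (the returned CharSequence is only valid until the next call), in exchange for an allocation-free hot path.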

How do I calculate the remainder for extremely large exponential numbers using Java?

This StackOverflow question asks: "How do I calculate the remainder for extremely large exponential numbers using Java? e.g. (48^26)/2401". I thought the answer was worth reproducing, as there is a surprisingly simple solution. The answer is to take the modulus in each iteration of calculating the power:

    (a * b) % n
    = ((A * n + AA) * (B * n + BB)) % n        where AA = a % n and BB = b % n
    = (A * B * n^2 + A * n * BB + AA * B * n + AA * BB) % n
    = AA * BB % n                              since x * n % n == 0
    = (a % n) * (b % n) % n

In your case, you can write 48^26 % 2401 as (48^2)^13 % 2401:

    int n = 48;
    for (int i = 1; i < 26; i++)
        n = (n * 48) % 2401;
    System.out.println(n);

    int n2 = 48 * 48;
    for (int i = 1; i < 13; i++)
        n2 = (n2 * 48 * 48) % 2401;
    System.out.println(n2);

    System.out.println(BigInteger.valueOf(…
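The excerpt cuts off at the BigInteger cross-check, but the standard library provides this directly as BigInteger.modPow. A self-contained version of the same calculation, with the iterative reduction factored into a helper (the class and method names are mine, not the original answer's):

```java
import java.math.BigInteger;

// Cross-check of the iterative reduce-every-step approach against
// the standard library's BigInteger.modPow.
public class ModPowDemo {
    // Iterative version: reduce mod n at every step, so the running
    // value stays below mod and (mod - 1) * base must fit in an int.
    static int powMod(int base, int exp, int mod) {
        int r = base % mod;
        for (int i = 1; i < exp; i++)
            r = (r * base) % mod;
        return r;
    }

    public static void main(String[] args) {
        System.out.println(powMod(48, 26, 2401));
        System.out.println(BigInteger.valueOf(48)
                .modPow(BigInteger.valueOf(26), BigInteger.valueOf(2401)));
        // Both lines print 1128.
    }
}
```

modPow handles arbitrarily large exponents and moduli, so it is the right tool once the numbers outgrow int; the loop version is only safe while the intermediate product fits.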

OpenHFT Java Lang project

Overview

OpenHFT/Java Lang started as an Apache 2.0 library to provide the low level functionality used by Java Chronicle without the need to persist to a file. It allows serialization and deserialization of data and random access to memory in native space (off heap). It supports writing and reading enumerable types with object pooling, e.g. writing and reading a String without creating an object (if it has been pooled). It also supports writing and reading primitive types in binary and text without creating any garbage. Small messages can be serialized and deserialized in under a microsecond.

Recent additions

Java Lang supports a DirectStore, which is like a ByteBuffer but can be any size (up to 40 to 48 bits on most systems). It supports 64-bit sizes and offsets. It supports compacted types and object serialization. It also supports thread safety features such as volatile reads, ordered (lazy) writes, CAS operations and using an int (4 bytes) as a lock in native memory. Testi…
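The off-heap idea can be sketched with the standard library alone: a direct ByteBuffer lives in native memory outside the GC heap and allows random access by offset. This is not the DirectStore API, just the underlying concept; DirectStore removes the 2 GB ByteBuffer size limit, which this sketch does not.

```java
import java.nio.ByteBuffer;

// Off-heap (native) memory with random access by offset, using only
// the standard library. The buffer's contents are invisible to the GC.
public class OffHeapDemo {
    public static void main(String[] args) {
        ByteBuffer store = ByteBuffer.allocateDirect(64);

        store.putLong(0, System.nanoTime());  // absolute (offset-based) writes
        store.putInt(8, 42);

        long ts = store.getLong(0);           // absolute reads; no objects created
        int value = store.getInt(8);
        System.out.println("value=" + value + " ts=" + ts);
    }
}
```

The absolute get/put overloads create no garbage on the read or write path, which is the property the library is built around.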

ext4 slower than tmpfs

While you might expect ext4 to be slower than tmpfs, I found that even buffered writes to ext4 are slower than to tmpfs, i.e. the writes are not actually made to disk in either case. I have a test which writes data/messages via a shared memory file as an IPC between two threads. The data written is smaller than the dirty data cache size. I have used these kernel parameters on a machine with 32 GB; the test only writes 8-12 GB, so it should fit within that space easily:

    vm.dirty_background_ratio = 50
    vm.dirty_ratio = 80

I have mounted the file system with the following options. The disk is a PCI SSD, but this shouldn't make a difference:

    noatime,data=writeback,barrier=0,nobh

These are the results I get (using a beta version of Chronicle 2.0):

    filesystem   tiny (4 bytes)    small (16 bytes)   medium (64 bytes)   large (256 bytes)
    tmpfs        185 M msg/sec     96 M msg/sec       30.7 M msg/sec      10.9 M msg/sec
    ext4         148 M msg/sec     71 M msg/sec       17.7 M msg/sec      7.2 M msg/sec
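The mechanism under test can be sketched as follows. This is not the Chronicle benchmark itself, just a minimal illustration of buffered writes through a memory-mapped file: the putLong calls only dirty pages in the page cache, and the kernel flushes them later according to the vm.dirty_* settings above.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Sketch of writing small fixed-size "messages" through a memory-mapped
// file. Writes land in the page cache, not on disk, until the kernel
// flushes the dirty pages.
public class MappedWriteDemo {
    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("ipc", ".dat");
        file.deleteOnExit();
        int size = 1 << 20; // 1 MB of messages for the sketch

        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            MappedByteBuffer map = raf.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, 0, size);

            long start = System.nanoTime();
            for (int offset = 0; offset < size; offset += 16)
                map.putLong(offset, offset);   // one 16-byte "message"
            long time = System.nanoTime() - start;

            System.out.printf("%,d msgs in %,d us (dirty pages, not yet on disk)%n",
                    size / 16L, time / 1000);
        }
    }
}
```

Mapping the file on tmpfs versus ext4 is what produces the difference in the table: the instruction stream is identical, so the gap comes from the filesystem's write path, not the disk.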

Micro jitter, busy waiting and binding CPUs

Performance profiling a new machine

When I work on a new machine, I like to get an understanding of its limitations. In this post I am looking at the jitter on the machine and the impact of busy waiting, for a new PC I built this weekend. The specs for the machine are interesting but not the purpose of the post. Nevertheless, they are:

    i7-3970X six core running at 4.5 GHz (with HT turned on)
    32 GB of PC-1600 memory
    An OCZ RevoDrive 3, PCI SSD (actual write bandwidth of 600 MB/s)
    Ubuntu 13.04

Note: the OCZ RevoDrive is not officially supported on Linux, but is much cheaper than their models which are.

Tests for jitter

My micro jitter sampler looks at interruptions to a running thread. It is similar to jHiccup, but instead of measuring how delayed a thread is in waking up, it measures the delays a thread sees once it has started running. Surprisingly, how you run your threads impacts the sort of delays they will see once they wake up. This chart is a bit dense. I…
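The sampling idea can be sketched like this. It is not the actual micro jitter sampler, just the core technique: busy-wait on System.nanoTime() and record any gap between successive reads above a threshold, since a large gap means the running thread was descheduled or interrupted.

```java
// Sketch of jitter sampling: a busy-waiting thread reads the clock in a
// tight loop; any large gap between reads is an interruption to the thread
// while it was running (not a wake-up delay, which is what jHiccup measures).
public class JitterSketch {
    public static void main(String[] args) {
        long thresholdNs = 10_000;                      // report pauses over 10 us
        long end = System.nanoTime() + 1_000_000_000L;  // sample for ~1 second
        long worst = 0;
        int interruptions = 0;

        long last = System.nanoTime();
        while (last < end) {
            long now = System.nanoTime();   // busy wait; never sleeps
            long gap = now - last;
            if (gap > thresholdNs) {        // the thread was delayed mid-run
                interruptions++;
                if (gap > worst) worst = gap;
            }
            last = now;
        }
        System.out.println("interruptions=" + interruptions
                + " worst=" + worst / 1000 + " us");
    }
}
```

Running this pinned to an isolated CPU versus a shared one is what shows the binding effect the post goes on to chart: the loop is identical, so any change in the gap distribution comes from how the thread is scheduled.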