Showing posts from July, 2011

Synchronized vs Lock performance

Overview There are a number of articles on whether synchronized or Locks are faster. There appear to be opinions in favour of either option. In this article I attempt to show how you can reach different conclusions depending on how you benchmark the two approaches. I have included AtomicInteger to see how a volatile field compares. What are some of the differences? The synchronized keyword has built-in language support. This means the JIT can optimise synchronized blocks in ways it cannot with Locks, e.g. it can combine synchronized blocks. A Lock has additional methods, such as tryLock, which are not an option for a synchronized block. The test As the JIT can optimise synchronized in ways it cannot optimise a Lock, I wanted a test which might demonstrate this and one which might "fool" the JIT. In this test, 25 million locks were performed between each of the threads. Java 6 update 26 was used. Threads 1x synch 1x Lock 1x AtomicInteger 2x synch 2x Lock
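The tryLock method mentioned above can be sketched as follows. This is a minimal example of my own (the class and method names are not from the original benchmark), showing something a synchronized block cannot express: giving up instead of waiting when the lock is busy.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockExample {
    private final Lock lock = new ReentrantLock();
    private int count = 0;

    // Attempt the increment only if the lock is free; a synchronized
    // block would have to wait, tryLock lets us skip the work instead.
    public boolean tryIncrement() {
        if (lock.tryLock()) {
            try {
                count++;
                return true;
            } finally {
                lock.unlock();
            }
        }
        return false;
    }

    public int count() {
        return count;
    }

    public static void main(String... args) {
        TryLockExample e = new TryLockExample();
        for (int i = 0; i < 5; i++)
            e.tryIncrement(); // uncontended here, so every attempt succeeds
        System.out.println(e.count()); // 5
    }
}
```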

C++ or Java, which is faster for high frequency trading?

Overview There are conflicting views as to what is the best solution for high frequency trading. Part of the problem is that what counts as high frequency trading varies more than you might expect; another part is what is meant by faster. My View If you have a typical Java programmer and a typical C++ programmer, each with a few years' experience writing a typical Object Oriented Program, and you give them the same amount of time, the Java programmer is likely to have a working program sooner and will have more time to tweak the application. In this situation it is likely the Java application will be faster. IMHO. In my experience, Java is better than C++ at eliminating code which doesn't need to be executed, esp. micro-benchmarks which don't do anything useful. ;) If you tune Java and C++ as far as they can go, given any amount of expertise and time, the C++ program will be faster. However, given limited resources and a changing environment, a dynamic language will outperform. i.e.

How fast are Java Datagrams?

Overview Following my article How fast are Java sockets, this article runs the same tests, except with Datagrams, which use UDP rather than TCP. The timings The tests are the same, except that Datagrams don't support busy waiting in Java 6, which hurts the Threaded Ping latency.
UDP Pings per second: 224 K/s
UDP Ping latency 1/50/99%tile: 4.1/4.2/4.7 us
Threaded UDP Pings per second: 131 K/s
Threaded UDP Ping latency 1/50/99%tile: 9.8/11.0/33.2 us
Comparison
Test          | Threaded | Throughput | Typical latency
Datagram Ping | no       | 224 K/s    | 4.2 μs
Socket Ping   | no       | 170 K/s    | 5.8 μs
Datagram Ping | yes      | 131 K/s    | 11.0 μs
Socket Ping   | yes      | 235 K/s    | 8.5 μs
The Code Related articles: How fast are Java sockets, Send XML over a Socket fast
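A minimal sketch of how such a UDP ping might be timed (my own simplified code using the modern DatagramChannel API, not the article's test harness): an echo thread sends each datagram back, and the client measures one round trip over loopback.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;

public class UdpPing {
    // Echo one datagram back to its sender, and time one round trip.
    public static long onePingNanos() throws Exception {
        final DatagramChannel server = DatagramChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 0));
        Thread echo = new Thread(() -> {
            try {
                ByteBuffer bb = ByteBuffer.allocate(64);
                java.net.SocketAddress from = server.receive(bb);
                bb.flip();
                server.send(bb, from); // echo the payload straight back
            } catch (Exception ignored) {
            }
        });
        echo.start();

        DatagramChannel client = DatagramChannel.open();
        client.connect(server.getLocalAddress());
        ByteBuffer out = ByteBuffer.wrap(new byte[]{1, 2, 3, 4});
        long start = System.nanoTime();
        client.write(out);
        ByteBuffer in = ByteBuffer.allocate(64);
        client.read(in); // blocks until the echo arrives
        long time = System.nanoTime() - start;
        echo.join();
        client.close();
        server.close();
        return time;
    }

    public static void main(String... args) throws Exception {
        System.out.printf("Round trip took %,d ns%n", onePingNanos());
    }
}
```

One run measures a single round trip; the article's figures come from taking percentiles over many such pings.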

Why are the keys of a Map in a jumbled order

Overview When you first try to print out a HashMap, the order doesn't make any sense, and when you add entries, they appear to jump around. What is going on? The order of Maps The order in which keys appear is not defined in many implementations. HashMap uses a hash of the key to place the key/value in a pseudo-random place in the underlying store (an array). How it does this differs between implementations. For HashMap the size of the underlying array is always a power of 2. For Hashtable the size grows 11, 23, 47, 95. For LinkedHashMap, the order is the order the keys were added, and for TreeMap & ConcurrentSkipListMap, the order is based on the Comparable.compareTo() result (asciibetical order for String). For Hashtable and HashMap, changing the initial size changes how the keys are hashed and thus their order. public static void main(String... args) { populate(new Hashtable ()); populate(new Hashtable (47), " (47)"); populate(new Hashtable (95), &q
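The different orderings described above can be seen with a small self-contained demonstration (a sketch of my own, not the article's populate() method):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    // Fill a map with the same keys and show how they come back out.
    public static String keysOf(Map<String, Integer> map) {
        for (String key : new String[]{"one", "two", "three", "four", "five"})
            map.put(key, key.length());
        return map.keySet().toString();
    }

    public static void main(String... args) {
        // HashMap: pseudo-random order based on the keys' hashes.
        System.out.println("HashMap:       " + keysOf(new HashMap<String, Integer>()));
        // LinkedHashMap: the order the keys were added.
        System.out.println("LinkedHashMap: " + keysOf(new LinkedHashMap<String, Integer>()));
        // TreeMap: compareTo() order, "asciibetical" for String.
        System.out.println("TreeMap:       " + keysOf(new TreeMap<String, Integer>()));
    }
}
```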

Size of an entry in a Map

Overview There have been some very good articles on the size of a map. However, as a map grows, its initial size becomes less important and the size per entry is what matters. How are the sizes measured? In these tests an int key and long value are used. This adds a small but realistic size to each entry. Size per entry of a medium sized Map The following are the sizes per entry in bytes. The Map has 1024 entries.
Type of Map | 32-bit | 64-bit compressed | 64-bit not compressed
TIntLongHashMap: 26.9, 26.9, 27.0
FastMap (recycled): 32.0, 39.9, 47.9
IdentityHashMap: 48.0, 56.0, 80.0
ConcurrentSkipListMap: 68.3, 76.1, 108.3
TreeMap: 64.0, 80.0, 112.0
HashMap: 64.0, 80.0, 112.0
SynchronizedMap: 64.0, 80.0, 112.0
ConcurrentHashMap: 65.2, 81.4, 114.0
Properties: 68.0, 84.0, 120.0
Hashtable: 68.0, 84.0, 120.0

Send XML over a Socket fast

Overview XML parsers are designed to handle any type of XML message you can send. However, if you assume the XML will always follow a specific format, how fast can you make it? This follows my article How fast are Java sockets. The format For this article I have written a self-contained test which sends a heartbeat request and expects a heartbeat response. <Root timestamp="{timestamp}"><HeartbeatRequest sequence="{sequence}"/></Root> <Root timestamp="{timestamp}"><HeartbeatResponse sequence="{sequence}"/></Root> In this test, the sequence number contains the System.nanoTime(). When the client gets back the heartbeat response it can calculate the round trip time. Messages are parsed and a listener interface is called. static interface EventListener { void heartbeatRequest(long timestamp, long sequenceNumber); void heartbeatResponse(long timestamp, long sequenceNumber);

How fast are Java sockets

Overview How long a request/response takes, and the rate at which requests can be performed, in a Java application depend on a number of factors: the network, the network adapter, the Java Socket and TCP layers, and what your application does. Usually the last factor is your limitation. What if you want to test the overhead Java/TCP contributes? Here is a way you can test it. Latency and throughput The latency, in this test, is the Round Trip Time (sometimes written as RTT). This is the time between sending a request and receiving the response. It includes the delay on the client side, the transport, and the delay on the server side. The throughput is a measure of how many request/responses can be performed in a given amount of time. How long each request/response takes is not measured, and has no impact unless it is really large. The Results These results are for a fast PC doing nothing but pass data back and forth over loopback. This will be one of the limiting factors to using Java
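A minimal sketch of such a ping test over loopback (my own simplified version, not the article's code): a server thread echoes a byte back, and the client measures one round trip with TCP_NODELAY on so small packets are not batched.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketPing {
    public static long onePingNanos() throws Exception {
        final ServerSocket server = new ServerSocket(0); // any free port
        Thread echo = new Thread(() -> {
            try (Socket s = server.accept()) {
                s.setTcpNoDelay(true); // don't batch small packets
                InputStream in = s.getInputStream();
                OutputStream out = s.getOutputStream();
                int b = in.read();
                out.write(b); // echo it straight back
                out.flush();
            } catch (Exception ignored) {
            }
        });
        echo.start();

        try (Socket client = new Socket()) {
            client.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()));
            client.setTcpNoDelay(true);
            long start = System.nanoTime();
            client.getOutputStream().write(42);
            client.getOutputStream().flush();
            int reply = client.getInputStream().read();
            long rtt = System.nanoTime() - start;
            echo.join();
            server.close();
            return reply == 42 ? rtt : -1;
        }
    }

    public static void main(String... args) throws Exception {
        System.out.printf("Round trip time %,d ns%n", onePingNanos());
    }
}
```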

Java: How much memory do different arrays consume

Overview Arrays can be large and consume a significant amount of memory. It can be worth choosing the most memory efficient array/collection. Comparing array sizes How much space does a `new int[1024]` take compared with a `new Integer[1024]`? int[] ints = new int[1024]; for (int i = 0; i < ints.length; i++) ints[i] = i; compared with Integer[] ints = new Integer[1024]; for (int i = 0; i < ints.length; i++) ints[i] = i; Note: 1/8th of the Integer values will come from the auto-box cache and not use additional memory. All possible Boolean and Byte values are cached.
array | size in bytes, 32-bit JVM | size in bytes, 64-bit JVM
new BitSet(1024): 168, 168
new boolean[1024]: 1040, 1040
new Boolean[1024]: 4112, 4112
new ArrayList<Boolean>(1024): 4136, 4136
new LinkedList<Boolean>() with 1024: 24624, 24624
new byte[1024]: 1040, 1040
new Byte[1024]: 4112, 4112
new ArrayList<Byte>(1024): 4136, 4136
new LinkedList<Byte>() with 1024: 24624, 24624
new char[1024]: 2064, 2064
new

Java and Memory Leaks

Overview The term "memory leak" is used in Java in a manner which is different to how it is used in other languages. What does a "memory leak" mean in general terminology, and how is it used in Java? Wikipedia definition A memory leak, in computer science (or leakage, in this context), occurs when a computer program consumes memory but is unable to release it back to the operating system. The JVM reserves the heap as virtual memory on startup and doesn't give that memory back until it exits. This virtual memory turns into main memory as it is used. This is why the virtual size and the resident size of a JVM can be very different, and the resident memory can grow without the virtual memory changing. In object-oriented programming, a memory leak happens when an object is stored in memory but cannot be accessed by the running code. The GC can always find every object on the heap, even those which are not reachable by the application. As such there is no obje

Java: What is the limit to the number of threads you can create?

Overview I have seen a number of tests where a JVM has 10K threads. However, what happens if you go beyond this? My recommendation is to consider having more servers once your total reaches 10K. You can get a decent server for $2K and a powerful one for $10K. Creating threads gets slower The time it takes to create a thread increases as you create more threads. For the 32-bit JVM, the stack size appears to limit the number of threads you can create. This may be due to the limited address space. In any case, the memory used by each thread's stack adds up. If you have a stack of 128KB and you have 20K threads, the stacks alone will use 2.5 GB of virtual memory.
Bitness | Stack Size | Max threads
32-bit, 64K: 32,073
32-bit, 128K: 20,549
32-bit, 256K: 11,216
64-bit, 64K: stack too small
64-bit, 128K: 32,072
64-bit, 512K: 32,072
Note: in the last case, the thread stacks total 16 GB of virtual memory. Java 6 update 26 32-bit, -XX:ThreadStackSize=64, 4,000 threads: Time to create 4,000 threads was 0.522
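The stack-size effect can be explored with the four-argument Thread constructor, which takes a requested stack size. The sketch below is mine (with a deliberately small thread count so it runs anywhere); note the stack size is only a hint which the JVM may round or ignore.

```java
public class ThreadStackDemo {
    // Create `count` threads with the requested stack size and
    // return how many actually started.
    public static int createThreads(int count, long stackSize) throws InterruptedException {
        Thread[] threads = new Thread[count];
        Runnable sleeper = () -> {
            try {
                Thread.sleep(200); // keep the thread alive briefly
            } catch (InterruptedException ignored) {
            }
        };
        long start = System.nanoTime();
        int started = 0;
        for (int i = 0; i < count; i++) {
            // the 4-arg constructor lets us request a per-thread stack size
            threads[i] = new Thread(null, sleeper, "t" + i, stackSize);
            threads[i].start();
            started++;
        }
        System.out.printf("Creating %,d threads took %.3f ms%n",
                count, (System.nanoTime() - start) / 1e6);
        for (Thread t : threads)
            t.join();
        return started;
    }

    public static void main(String... args) throws InterruptedException {
        createThreads(100, 128 * 1024); // 128 KB requested stacks
    }
}
```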

Java: Getting the size of an Object

Overview Java was designed with the principle that you shouldn't need to know the size of an object. There are times when you really would like to know and want to avoid the guesswork. Measuring how much memory an object uses There are three factors which make measuring how much memory an object uses difficult. The TLAB allocates blocks of memory to a thread. This means small amounts of memory don't appear to reduce the free memory. If you do this repeatedly, you will see a block of free memory be used. The way around this is to turn off the TLAB: -XX:-UseTLAB A GC can occur while you are creating your object. This will result in more free memory at the end than when you started. I ignore any negative sizes in this test ;) Other threads in the system could use memory at the same time. I perform multiple tests and take the median, which removes any outliers. Size of objects in a 32-bit JVM Running this SizeofTest with 32-bit Sun/Oracle Java 6 update 26 and -XX:-UseTLAB, I ge
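The approach described above can be sketched as follows. This is my own minimal version (repeat the measurement, discard negative samples, take the median), not the article's SizeofTest; for meaningful numbers it must be run with -XX:-UseTLAB, otherwise most samples read as zero.

```java
import java.util.Arrays;

public class SizeofSketch {
    // Estimate the heap used by one object by sampling free memory
    // before and after creating it, many times, and taking the median.
    public static long medianSizeOf(int samples) {
        long[] sizes = new long[samples];
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < samples; i++) {
            long before = rt.freeMemory();
            Object o = new Object();
            long size = before - rt.freeMemory();
            if (size < 0)
                size = 0; // a GC ran mid-measurement; discard the sample
            sizes[i] = size;
        }
        Arrays.sort(sizes); // the median removes outliers caused by other threads
        return sizes[samples / 2];
    }

    public static void main(String... args) {
        System.out.println("Median measured size: " + medianSizeOf(1001) + " bytes");
    }
}
```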

30,000 hits last month.

I keep expecting interest to wane, but it is still growing. I have lots more ideas for articles on back-to-basics Java. :) BTW, I apologise for my English. If you see any typos or bad grammar, don't be afraid to tell me. If you have any suggestions for articles you would like to see, or you have seen an article you think I should review, let me know. peter.lawrey (a)

Incorrect Core Java Interview Answers

Overview On the Internet, Java interview questions and answers get copied from one web site to another. This can mean that an incorrect or out of date answer might never be corrected. Here are some questions and answers which are not quite correct, or are now out of date, i.e. are pre Java 5.0. How many ways can an argument be passed to a subroutine, and explain them? An argument can be passed in two ways: passing by value and passing by reference. Passing by value: This method copies the value of an argument into the formal parameter of the subroutine. Passing by reference: In this method, a reference to an argument (not the value of the argument) is passed to the parameter. Java only supports pass-by-value. You can pass a reference by value, but you cannot pass by reference in Java. Java references can be described as Call By Sharing, but this term is not commonly used. What is Garbage Collection and how to call it explicitly? When an object is no longer referred to by any v
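The pass-by-value point can be demonstrated directly (a sketch of mine, not from any interview site): reassigning a parameter inside a method has no effect on the caller's variable, though mutating the object it refers to does.

```java
public class PassByValueDemo {
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("reassigned"); // only changes the local copy
    }

    static void mutate(StringBuilder sb) {
        sb.append("!"); // changes the object both references share
    }

    public static void main(String... args) {
        StringBuilder sb = new StringBuilder("original");
        reassign(sb);
        System.out.println(sb); // still "original": the reference was passed by value
        mutate(sb);
        System.out.println(sb); // "original!": the shared object was mutated
    }
}
```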

Java Secret: Generated methods

Overview The Java compiler generates extra methods which appear in stack traces. These can be confusing at first. What are they, and why do they exist? The access model in the JVM The access model in the JVM has not changed significantly since version 1.0. However, later versions of Java added features such as inner/nested classes (Java 1.1) and covariant return types (Java 5.0) which were not in the original design. So how are they supported? The issue with nested classes Nested classes can access private methods and fields of an outer class. However, the JVM does not support this, so the compiler generates methods as a workaround. Private Methods In this example, the inner class calls a private method init() of the outer class.
public class PrivateMethod { // line 4
    private void init() {
        throw new UnsupportedOperationException(); // line 6
    }
    class Inner {
        Inner() {
            init();
        }
    }
    public static void main(String... args) {
        PrivateMethod pm = new

Java low level: Converting between integers and text (part 2)

Parsing an integer from text Parsing an integer is relatively simple, depending on how many checks you make. This loop does almost no validation: the input can contain multiple '-' signs and the number can overflow. As soon as it reaches a character it does not accept, it stops. It is left to the caller to check whether this character was a valid separator.
long num = 0;
boolean negative = false;
while (true) {
    byte b = buffer.get();
    // if (b >= '0' && b <= '9')
    if ((b - ('0' + Integer.MIN_VALUE)) <= 9 + Integer.MIN_VALUE)
        num = num * 10 + b - '0';
    else if (b == '-')
        negative = true;
    else
        break;
}
return negative ? -num : num;
Note: The expression (b >= '0' && b <= '9') has been re-written, taking advantage of an underflow, to turn two comparisons into one. To follow... Converting floating point numbers to text
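A self-contained version of the loop above can look like this (the class and method names are mine); it parses until the first non-accepted character and leaves separator checking to the caller, exactly as described. I have added a hasRemaining() guard so the sketch cannot run off the end of the buffer.

```java
import java.nio.ByteBuffer;

public class TextToLong {
    // Parse a (possibly negative) integer from the buffer, stopping at
    // the first character which is neither a digit nor '-'.
    public static long parse(ByteBuffer buffer) {
        long num = 0;
        boolean negative = false;
        while (buffer.hasRemaining()) {
            byte b = buffer.get();
            // same as (b >= '0' && b <= '9') but with one comparison
            if ((b - ('0' + Integer.MIN_VALUE)) <= 9 + Integer.MIN_VALUE)
                num = num * 10 + b - '0';
            else if (b == '-')
                negative = true;
            else
                break;
        }
        return negative ? -num : num;
    }

    public static void main(String... args) {
        System.out.println(parse(ByteBuffer.wrap("12345|".getBytes()))); // 12345
        System.out.println(parse(ByteBuffer.wrap("-67 ".getBytes())));   // -67
    }
}
```

The single-comparison trick works because adding Integer.MIN_VALUE makes any byte below '0' or above '9' wrap to a value greater than Integer.MIN_VALUE + 9.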

Java low level: Converting between integers and text (part 1)

Overview As Java has a number of ways to convert an integer to a String, it's not something you might have considered writing yourself. However, there may be situations where you want to do this. One of them is when performance is critical. Java libraries tend to create more objects than are required. Usually this doesn't matter, but there are times when you need the system to go faster and reading and writing numbers is a big hit for you. Performance difference The following is based on tests which write and read 128K integers in binary and text formats repeatedly, to take an average time to write and read an integer. Source to run all tests Source for the examples Scroll down for the Unsafe examples. On a 2.6 GHz Xeon: Unsafe text: Typically took 41.4 ns to write/read per long. ByteBuffer direct text: Typically took 63.9 ns to write/read per long. ByteBuffer heap text: Typically took 68.9 ns to write/read per long. Print text: Typically took 325.7 ns to write/read per long. DecimalFormat
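As an illustration of the kind of allocation-free writing being measured, here is a minimal sketch of mine (not the benchmarked source) which appends a long as decimal text directly to a ByteBuffer, creating no intermediate String.

```java
import java.nio.ByteBuffer;

public class LongToText {
    // Append `num` to the buffer as decimal digits without
    // creating a String or any other intermediate object.
    public static void append(ByteBuffer buffer, long num) {
        if (num < 0) {
            buffer.put((byte) '-');
            num = -num; // note: fails for Long.MIN_VALUE, kept simple here
        }
        // find the highest power of ten <= num
        long power = 1;
        while (num / power >= 10)
            power *= 10;
        // emit digits from most to least significant
        while (power > 0) {
            buffer.put((byte) ('0' + num / power));
            num %= power;
            power /= 10;
        }
    }

    // helper for demonstration: what was written, as a String
    public static String written(long num) {
        ByteBuffer bb = ByteBuffer.allocate(32);
        append(bb, num);
        return new String(bb.array(), 0, bb.position());
    }

    public static void main(String... args) {
        System.out.println(written(0));      // 0
        System.out.println(written(-12345)); // -12345
    }
}
```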

Java: All about 64-bit programming

Overview I found this article series very interesting: All about 64-bit programming in one place, which collects "a lot of links on the topic of 64-bit C/C++ software development." However, some of these issues are relevant to Java and can make a difference. The size_t in C++ is 64-bit Java uses the int type for sizes, and this doesn't change in a 64-bit JVM. This gives backward compatibility but limits arrays, collections and ByteBuffers to this size. Generally 2 billion is enough, but as file sizes and memory sizes get larger, the number of situations where it is not will grow over time. As of mid 2011, for £1K you can buy a PC with 24 GB of memory and for £21K you can buy a server with 512 GB of memory. This is already a problem for memory mapping a file larger than 2 GB, which can only be done by mapping a 2 GB window at a time (2^31-1 bytes). This is rather ugly and removes one of memory mapping's key features: transparency. BTW: I h

Human Readable vs Machine Readable Formats

Overview Most file/serialization formats can be broadly broken into two categories: Human Readable Text and Machine Readable Binary. The Human Readable formats have the advantage of being easily understood by a person reading them. Machine Readable formats are easier/faster for a machine to encode/decode. There are formats which attempt to be a little of both; XML, JSON and CSV are examples of these. However, these do not achieve close to the performance a binary format can achieve. Myth: Machine Readable Binary is always more compact than Human Readable Text. Binary can be more compact, however the obscurity of its format makes it difficult to ensure every byte counts. i.e. it's usually hard enough getting something to work; making it compact as well is an added complication. However, with Human Readable formats, determining how the format can be made more compact is more easily understood. As text: 38 bytes long, [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] As binary: 290 bytes long,

20000 hits last month.

Thank you to all who read my articles. It encourages me to do more. :)

Low GC in Java: Using primitives

Overview In a recent article I examined how using primitives, and collections which support primitives natively, instead of wrappers and standard collections, can reduce memory usage and improve performance. Different ways to have a Map of int/Integer There are a number of ways you can use int/Integer, and a number of collections you can store them in. Which approach you use can make a big difference to the performance and the amount of garbage produced.
Test | Performance Range | Memory used
Use Integer wrappers and HashMap: 71 - 134 (ns), 53 MB/sec
Use int primitives and HashMap: 45 - 76 (ns), 36 MB/sec
Use int primitives and FastMap: 58 - 93 (ns), 28 MB/sec
Use int primitives and TIntIntHashMap: 18 - 28 (ns), nominal
Use int primitives and a simple hash map: 6 - 9 (ns), nominal
The performance range was the typical (50%tile) and one of the higher (98%tile) timings. The garbage was the result of 900,000 loops per second. These tests were run on my new system. The Code Loop tes
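The "simple hash map" row refers to a map written directly against int keys and values. A minimal open-addressing sketch of the idea (my own code, far less complete than TIntIntHashMap — no growth, no removal) shows why it produces no garbage per operation: keys and values live in plain int arrays.

```java
public class SimpleIntIntMap {
    private final int[] keys;
    private final int[] values;
    private final boolean[] used;
    private final int mask;

    // Capacity must be a power of two and comfortably larger than the
    // number of entries; this sketch never grows.
    public SimpleIntIntMap(int capacity) {
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
    }

    public void put(int key, int value) {
        int i = key & mask;
        while (used[i] && keys[i] != key)
            i = (i + 1) & mask; // linear probing, no Entry objects created
        used[i] = true;
        keys[i] = key;
        values[i] = value;
    }

    public int get(int key, int notFound) {
        int i = key & mask;
        while (used[i]) {
            if (keys[i] == key)
                return values[i];
            i = (i + 1) & mask;
        }
        return notFound; // no boxing, so "missing" needs a sentinel value
    }

    public static void main(String... args) {
        SimpleIntIntMap map = new SimpleIntIntMap(64);
        for (int i = 0; i < 40; i++)
            map.put(i, i * i);
        System.out.println(map.get(6, -1)); // 36
    }
}
```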

World's first commercial quantum computer

There is news of the World's first commercial quantum computer being sold to Lockheed Martin. You get a 128 qubit computer for $10 million. Since this computer works on different paradigms, it's likely it will need a new programming language to describe the problem.

New toy for home

I have set up a new PC for home. It's got: an Intel i7 running at 3.8 GHz, 24 GB of 1600 MHz memory, a 120 GB SSD drive, two 20" screens, and Ubuntu 11.04 64-bit. It cost me just over £1K incl tax. I will be using this as a base system to benchmark in future articles. With lots of memory and a flash disk it does pretty much everything immediately. ;) If you want to look at a good range of overclocked processors up to 4.8 GHz, try Scan's Overclocked Bundles

Low GC in Java: Use primitives instead of wrappers

Overview There are two good reasons to use primitives instead of wrappers where possible. Clarity: by using a primitive, you are making it clear that a null value is not appropriate. Performance: using primitives is often much faster. Clarity is often more important than performance, and is the best reason to use them. However, this article discusses the performance implications of using wrappers. I have had a lot of interest in the article How to avoid Garbage Collection, however it was lacking in practical detail. This is the first article in a series on ways to reduce demands on the GC. A followup article, Low GC in Java: Using primitives, looks at using primitives and collections which support them. Performance of using wrappers The following micro-benchmark behaves in a way many applications do. Loop using Wrappers and a Wrapper Collection Map<Integer, Integer> counters = new HashMap<Integer, Integer>(); int runs = 20 * 1000; for (Integer i = 0; i &
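A sketch of the comparison (my own simplified version, not the article's benchmark): the same counting loop with an Integer counter, which unboxes, adds and re-boxes on every increment, versus an int counter, which allocates nothing.

```java
public class WrapperVsPrimitive {
    // Sum with a boxed counter; each i++ creates garbage once the
    // value leaves the small autobox cache (-128 to 127 by default).
    public static long sumBoxed(int runs) {
        long sum = 0;
        for (Integer i = 0; i < runs; i++)
            sum += i;
        return sum;
    }

    // The same loop with a primitive counter allocates nothing.
    public static long sumPrimitive(int runs) {
        long sum = 0;
        for (int i = 0; i < runs; i++)
            sum += i;
        return sum;
    }

    public static void main(String... args) {
        int runs = 20 * 1000;
        long start = System.nanoTime();
        long a = sumBoxed(runs);
        long mid = System.nanoTime();
        long b = sumPrimitive(runs);
        long end = System.nanoTime();
        System.out.printf("boxed %,d ns, primitive %,d ns, same result: %b%n",
                mid - start, end - mid, a == b);
    }
}
```

Both loops produce the same answer; only the allocation behaviour differs, which is exactly what the GC sees.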

Java Secret: Loading and unloading static fields.

Overview To start with, it is natural to assume that static fields have a special life cycle and live for the life of the application. You could assume that they live in a special place in memory, like the start of memory in C, or in the perm gen with the class meta information. However, it may be surprising to learn that static fields live on the heap, can have any number of copies, and are cleaned up by the GC like any other object. This follows on from a previous discussion: Are static blocks interpreted? Loading static fields When a class is obtained for linking, it may not result in the static block being initialised. A simple example
public class ShortClassLoadingMain {
    public static void main(String... args) {
        System.out.println("Start");
        Class aClass = AClass.class;
        System.out.println("Loaded");
        String s = AClass.ID;
        System.out.println("Initialised");
    }
}
class AClass {
    static final String ID;

15,000 hits last month.

The Vanilla Java blog got more than 15,000 hits last month. Thank you for your interest. :)

Java Secret: Are static blocks interpreted?

Overview An interesting comment was raised by @Vyadh: are static blocks interpreted, or does the JIT play a part? When are methods optimised? A method will be optimised when it is called often enough. This is controlled by the -XX:CompileThreshold=10000 flag. The compilation occurs in the background by default, and a short time later the optimised version of the code will be used. (This is why it doesn't always happen at exactly the 10K mark.) However, loops can also be optimised if they run for long enough. This typically takes about 2-5 seconds, which is why any bound loop which does nothing takes about 2-5 seconds. (That is how long it takes the JIT to realise the loop doesn't do anything.) How does this apply to static blocks? Static blocks are never called more than once. Even if they are loaded by different class loaders, they are optimised independently. (In the unlikely event you loaded the same class 10,000 times, it still wouldn't be optimised.) Howe