Posts

Showing posts from 2011

Happy New Year all

Thank you everyone who read my blog, especially those who gave feedback. It encourages me to write me more. I hope everyone has a Happy New Year.

Using a memory mapped file for a huge matrix

Overview Matrices can be really large, sometimes larger than you can hold in one array. You can extend the maximum size by having multiple arrays however this can make your heap size really large and inefficient. An alternative is to use a wrapper over a memory mapped file. The advantage of memory mapped files is that they have very little impact on the heap and can be swapped in and out by the OS fairly transparently. A huge matrix This code supports large matrices of double. It partitions the file into 1 GB mappings. (As Java doesn't support mappings of 2 GB or more at a time, a pet hate of mine ;) import java.io.Closeable; import java.io.IOException; import java.io.RandomAccessFile; import java.nio.MappedByteBuffer; import java.nio.channels.FileChannel; import java.util.ArrayList; import java.util.List; public class LargeDoubleMatrix implements Closeable { private static final int MAPPING_SIZE = 1 << 30; private final RandomAccessFile raf; private

Thread Affinity library for Java supports JNA

Java Thread Affinity library 1.1 BegemoT has provided same code to allow the Thread Affinity library can use JNA if its available. This means you don't have to have the JNI library. The latest version is available HERE Click on the ZIP button to download the whole thing as a zip file.

What skills should a Core Java Developer have?

Overview I have been trying to put together a list of basic skills a Java developer should have to move on to being an advanced Core Java programmer. Skills You; can write code on paper which has a good chance of compiling. can use a debugger to debug programs and profile an application. are familiar all the primitives types and operators in Java. understands the class loading process and how class loaders work can use multiple threads both correctly and can prove this improve performance or behaviour. e.g. wait/notify/notifyAll, SwingUtils.invokeLater, the concurrency library can use checked exceptions, generics and enums can time a small benchmark and get reproducible results can write a very simple client server TCP service have an understanding of garbage collection, when is it triggered, what can you do to minimise it understand when to use design patterns such as Singleton, Factory, Fly-weight, Builder, Object Pool, Iterator, Strategy, Visitor, Composite Sugge

Solution: What is this broken code doing

Following my earlier blog , here is my solutions to these puzzles. Loop every second value List list = new ArrayList<>(); for (int i = 0; i < 10; i++) list.add(i); for (int i = 0; i < list.size(); i++) { System.out.println(list.get(i)); list.remove(i); } When you call remove(i) it shifts all the elements down one AND i++ shifts the index up one. This means, you see every second value. A work around is as follows. while(!list.isEmpty) System.out.println(list.remove(0)); Endless loop This prints all the characters which have value for(char ch = Character.MIN_VALUE; ch<= Character.MAX_VALUE; ch++) { int i = Character.getNumericValue(ch); if (i >= 0) System.out.println("char "+ch + ' ' + (int) ch+" = "+i); } As many realised, this doesn't stop because a char is always less than or equals to the maximum value. (Which is why its called the maximum ;) Some suggested avoiding this difficult value an

What is this broken code doing

Overview Here are some examples of broken code. I find the interesting to understand what they don't do what they appear to. Can you work out why these don't work? ;) Loop every second value List list = new ArrayList<>(); for (int i = 0; i < 10; i++) list.add(i); for (int i = 0; i < list.size(); i++) { System.out.println(list.get(i)); list.remove(i); } prints 0 2 4 6 8 Endless loop This prints all the characters which have value for(char ch = Character.MIN_VALUE; ch<= Character.MAX_VALUE; ch++) { int i = Character.getNumericValue(ch); if (i >= 0) System.out.println("char "+ch + ' ' + (int) ch+" = "+i); } However it doesn't stop. Can you change the code to use a do/while loop so that every char is considered but it does stop? (Without using a larger type) What am I waiting for? This "program" deadlocks. The main method is optional ;) class Main { static int value;

Solutions: Testing an Anagram of a Palindrome

There are many solutions to this problem . However, it is made simpler if you determine the number of letter with an odd number occurrences. As long as the text has no more than 1 odd numbered letter, you can make a palindrome from it. A straight forward solution public static boolean isPalindromeAnagram(String text) { BitSet bs = new BitSet(27); for (int i = 0; i < text.length(); i++) { char ch = text.charAt(i); if (('A' <= ch && ch <= 'Z') || ('a' <= ch && ch <= 'z')) bs.flip(ch % 32); } return bs.cardinality() <= 1; } A trickier solution public static boolean isPalindromeAnagram(String text) { int count = 0; for (int i = 0; i < text.length(); i++) { char ch = text.charAt(i); if (ch + Integer.MIN_VALUE - 'A' <= 'z' + Integer.MIN_VALUE - 'A') count ^= 1 << ch; } return Integer.bitCount(count

synchronized is slower but not worth worrying about

Overview Often developer jump through hoops to avoid synchronization. There are justifications like; the application does lots of writes or reads. Without a clear idea of what "lots" means, they have no idea whether the optimisations they are using will help or just add complexity (and possibly make it slower) Vector is slower than ArrayList, but that's not a good reason to avoid it I prefer to use lock free objects and collections in a single threaded context. This is more for clarity than performance. It makes it obvious this code was designed to be single threaded. The performance difference by comparison is often trivial. public static void main(String... args) throws IOException { for (int i = 0; i < 5; i++) { perfTest(new ArrayList ()); perfTest(new Vector ()); } } private static void perfTest(List objects) { long start = System.nanoTime(); final int runs = 100000000; for (int i = 0; i < runs; i += 20) { /

DZone adds page view summary.

DZone have added page views to the summary of articles an individual has posted. My articles on DZone This is interesting to me as the articles which are most popular on blogger and are not the same as those on DZone. (A factor for 2 one way or the other in many cases) Its is also interesting that the total number of pages views is currently 173K compared with on blogger its 163K page views. (Surprisingly similar)

Java Interview Question; Anagram of a Palindrome

Overview In many interview questions, what they are looking for is a simple solution to an apparently complex problem. Unfortunately, the solutions are usually obvious if you know the solution, but very hard to come up with in an interview situation. ;) Anagram of Palindrome Write a method to test if a String is an anagram of a palindrome. The method should have O(n) time complexity and O(1) space complexity. You can assume all letters are between ' ' (space) and 'z'. Assume only letters matter and case is ignored. e.g. "kayak" "Rats live on no evil star" "A man, a plan, a canal, Panama" "Doc, note: I dissent. A fast never prevents a fatness. I diet on cod." The solution A brute force approach is to generate anagrams and test if they are a palindromes. This is not required. All you need to test is if the String is the sort of string which can be turned into a palindrome. Sample solution available HERE

Thread Affinity library for Java.

Overview Locking critical thread to a CPU can improve throughput and reduce latency. It can make a big difference to 99%, 99.9% and 99.99%tile latencies. Unfortunately there is no standard calls in the JDK to do this, so I wrote a simple library so you can manage and see how CPUs have been assigned. What does it look like running? In the following example, there are four threads main, reader, writer and engine. The main thread finishes before the engine starts so they end up using the same CPU. Estimated clock frequency was 3400 MHz Assigning cpu 7 to Thread[main,5,main] Assigning cpu 6 to Thread[reader,5,main] Assigning cpu 3 to Thread[writer,5,main] Releasing cpu 7 from Thread[main,5,main] Assigning cpu 7 to Thread[engine,5,main] The assignment of CPUs is 0: General use CPU 1: General use CPU 2: Reserved for this application 3: Thread[writer,5,main] alive=true 4: General use CPU 5: General use CPU 6: Thread[reader,5,main] alive=true 7: Thread[engine,5,main] alive=true Releas

Java Puzzle operator confusion

Why does this int _=1, __=2; int j = __ ^_^ __, k = __-- -+- _ -+- --__; System.out.println(j + " " + k); print the following? 1 3

Java puzzle System.exit and locks.

When you call System.exit() it will stop the execution of the thread at that point and not call any finally blocks. private static final Object lock = new Object(); public static void main(String... args) { Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() { @Override public void run() { System.out.println("Locking"); synchronized (lock) { System.out.println("Locked"); } } })); synchronized (lock) { System.exit(0); } } What does this program print? Replace System.exit(0) with Thread.currentThread().stop() and run again for comparison.

Test a complete failure of the JVM

Say you want to test that your application behaves correctly on restart even after the application crashes. One approach is to trigger a crash in test code and check that data is in a correctable state on restart. Unsafe unsafe = getUnsafe(); // use reflection unsafe.setMemory(0, 1, (byte) 0); This triggers a SIGSEGV # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00000032b967ae09, pid=17870, tid=1082034496 # # JRE version: 7.0_01-b08 # Java VM: Java HotSpot(TM) 64-Bit Server VM (21.1-b02 mixed mode linux-amd64 compressed oops) # Problematic frame: # C [libc.so.6+0x7ae09] memset+0x9 # If your tests still pass, you can be reasonably confident it is recoverable even on a complete failure of the JVM. For the code for getUnsafe(). Warning: This class is appropriately named and misuse can result in a crash of the JVM. It can also change in future versions (It is more likely new methods will be added rather than removed) public

Java puzzle, low latency queue

There a number of ways to use queues. A simple and very fast queue is to use AtomicReference. (It stores 0 or 1 element) 1. Write a producer and consumer thread which communicates via an AtomicReference. [Click for sample solution] Took an average of 48 ns per object public class Main { public static void main(String... args) throws InterruptedException { AtomicReference<Long> queue = new AtomicReference<Long>(); final Consumer<Long> consumer = new Consumer<Long>(queue); Thread t = new Thread(new Runnable() { @Override public void run() { while(consumer.take() >= 0); } }); t.start(); Producer<Long> producer = new Producer<Long>(queue); long start = System.nanoTime(); int runs = 10*1000*1000; for(long i=0;i<runs;i++) producer.push(i); producer.push(-1L); // poison pill t.join();

Low Latency Slides

Last weekend was LJC Open Conference #4, and like many people I got a lot out of it. My talk was up first which meant I could relax for the rest of the day. Here are the slides Note: the message size was 16 longs or 128 bytes. This makes a difference at higher throughputs. In answer to @aix's questions On slide 12; what are we measuring here, elapsed time from the message hitting the buffer to the "echo" reply showing up on the client socket? In each message I add a timestamp as it is written. When each message is read, I compare the timestamp with the current time. I sort the results and take the middle / 50%tile as typical timings and the worst 0.01% as the 99.99%tile. What's causing the large difference between "nanoTime(), Normal" and "RDTSC, Normal" in the bottom half of the slide (2M/s)? The reason for taking the RDTSC directly (9 ns) is that System.nanoTime() is quite slow on Centos 5.7 (180 ns) and the later is a system calls

Ever decreasing cost of main memory

I have blogged about the relative cost of memory to the cost of your time. Memory in servers is so cheap that for small amounts of memory, it just isn't worth your time to introduce complexity into your application to try to save it. The average salary for a developer in the UK is £40,000 and the average cost of memory is £5 per GB. The cost to your organisation for both is higher and if you have many machines saving a 1 MB on 1000 machines saves 1 GB, however memory is so divisible you can quickly run into small amounts of memory which are not worth your time saving. Hard disk space is even cheaper and most people get its not worth keeping your disks completely clean all the time, because the cost of deleting a file you later need is high, but also the value of the space you save is not worth your time. A typical 2 TB of mirrored disk in a network appliance costs about £300. Your time Main memory to save in total     Disk space to save Blink of an eye (20 ms) 20 KB 1.7 MB

Stupid performance trick in Java

ByteBuffer has a compact() method which is useful for re-using a buffer if it is not full consumed. However, in the likely case it is consumed, it can perform surprising badly. while (in.read(buf) >= 0 || buf.position != 0) { buf.flip(); out.write(buf); buf.compact(); // In case of partial write } In my application, the throughput increased by 6% by replacing compact() with this method. public static void compact(ByteBuffer bb) { if (bb.remaining() == 0) bb.clear(); else bb.compact(); } Note: this is not the increase in a micro-benchmark, but across the entire application! Your Mileage May Vary of course

Java puzzle (Single threaded client server)

Without creating additional threads, create a ServerSocket on a random unused port, a client Socket connecting to localhost and a server Socket. Part 1) Send messages from client to server and back again. (This can be useful in unit tests) [Click for sample solution] ServerSocket ss = new ServerSocket(0); Socket c = new Socket("localhost", ss.getLocalPort()); Socket s= ss.accept(); ss.close(); // sends one byte. c.getOutputStream().write(1); int num = s.getInputStream().read(); s.getOutputStream().write(num+1); int num2 = c.getInputStream().read(); s.close(); c.close(); Part 2) Do the same with blocking NIO. [Click for sample solution] ServerSocketChannel ss = ServerSocketChannel.open(); ss.socket().bind(new InetSocketAddress(0)); SocketChannel c = SocketChannel.open(new InetSocketAddress("localhost", ss.socket().getLocalPort())); SocketChannel s = ss.accept(); ss.close(); // sends one byte. ByteBuffer bb1 = ByteBuffer.allocateDirect(1); c.write(

Java puzzle (Sum of even Fibonacci numbers)

Write a function which calculates the sum of even Fibonacci numbers. public static void main(String... args) { for(int i=1;i<22;i++) System.out.print(sumEvenFibonacci(i) + " "); } public static long sumEvenFibonacci(int n) { // add code here. return sum; } prints 0 0 2 2 2 10 10 10 44 44 44 188 188 188 798 798 798 3382 3382 3382 14328 For bonus points, optimise the sumEvenFibonacci method so it contains only one condition and doesn't test for even or odd. [Click for sample solution] public static long sumEvenFibonacci(int n) { long a = 1, b = 1, sum = 0; for (int i = n; i > 2; i -= 3) { long c = a + b; sum += c; a = b + c; b = a + c; } return sum; }

How arbitary is the start of System.nanoTime()?

System.nanoTime() is defined as having an arbitrary start time. However, there is no reason to make it random and it is based on an unspecified system call. On Centos 5.7 and other Linuxes I have tried, this appears to be based on the uptime. System.out.println("/proc/uptime " + new Scanner(new FileInputStream("/proc/uptime")).next()); System.out.println("System.nanoTime " + System.nanoTime() / 1e9); prints /proc/uptime 265671.85 System.nanoTime 265671.854834206

Why Concurrency examples are confusing

Overview I was recently at a talk were the audience was asked who used concurrency and who found it confusing. Nearly all who use concurrency found it confusing. This was a little surprising to me, and I was left wondering what the reason might be. I have been thinking about writing a book, so I have been watching a number of web talks and reading a number of books which talk about concurrency and I think I know at least one reason. Bad Examples The problem is; in a talk or a book you need a simple, easily understood example. Unfortunately, simple and easily understood examples almost always make very BAD examples for concurrency. What makes them bad examples is that concurrency code comes with an overhead both to the developer in terms of complexity of the code and in CPU performance. Simple examples are likely to have far more overhead than the work done making the performance of the concurrent example far worse than just writing it single threaded and also the code more c

Recycling objects to improve performance

Image
Overview In a previous article I stated that the reason the deserialisation of objects was faster was due to using recycled objects. This is potentially surprising for two reasons, 1) the belief that creating objects is so fast these days, it doesn't matter or is just as fast as recycling yourself, 2) None of the serialisation libraries use recycling by default. This article explores deserialisation with and without recycling objects. How it not only is slower to create objects, but it slows down the rest of your program by pushing data out of your CPU caches. While this talks about deserialisaton, the same applies to parsing text or reading binary files, as the actions being performed are the same. The test In this test, I deserialise 1000 Price objects, but also time how long it takes to copy a block of data. The copy represents work which the application might have to perform after deserialising. The test is timed one million times and those results sorted. The X

Serialization using ByteBuffer and recycled objects

Image
Overview I have always been of the view that using recycled objects is faster for serialization. So I wrote a test based on Thrift Protobuf Compare to see where it does well or poorly. The benchmark suggest that serialization/deserialization is fast with ByteBuffer and recycled objects, but creating new objects is relatively expensive. This suggests that provided you have a simple strategy for reusing objects, it may be worth using this approach. However, if you can't recycle objects I suspect it won't be worth the extra effort. Total Serailization time All results Note: the creation time for a recyclable object is much higher, but the serialization/deserialization times are much better. Starting , Object create, Serialize, /w Same Object, Deserialize, and Check Media, and Check All, Total Time, Serialized Size ByteBuffer-specific , 154.98000, 242.50000, 303.00000, 132.00000, 132.00000,

Serialization benchmarks and charts

I was looking at this serialization benchmark Thrift Protobuf Compare and saw at the end oif the report it spits out a series of HTML. One of these reads. <img src='http://chart.apis.google.com/chart?chtt=totalTime&chf=c||lg||0||FFFFFF||1||76A4FB||0|bg||s||EFEFEF&chs=689x430&chd=t:1263.0,1552.0,1747.0,1878.0,1966.5,2119.0,2203.0,2559.5,2706.0,2963.0,3585.5,3912.5,4182.5,4186.5,5948.5,7337.5,9050.5,10078.0,12404.0,12931.0,18031.0,28068.5&chds=0,30875.350000000002&chxt=y&chxl=0:|java|json/jackson-databind|JsonMarshaller|xstream (stax with conv)|binaryxml/FI|hessian|javolution xmlformat|stax/woodstox|protostuff-json|stax/aalto|protostuff-numeric-json|json (jackson)|thrift|avro-generic|sbinary|avro-specific|activemq protobuf|protobuf|kryo|kryo-optimized|MessagePack (buggy)|java (externalizable)&chm=N *f*,000000,0,-1,10&lklk&chdlp=t&chco=660000|660033|660066|660099|6600CC|6600FF|663300|663333|663366|663399|6633CC|6633FF|666600|666633|666666&a

Strine translator

Translating to Strine ;) public static void main(String... args) { System.out.println("Hello World"); } static { try { Field value = String.class.getDeclaredField("value"); value.setAccessible(true); value.set("Hello World", value.get("G'Day Mate.")); } catch (Exception e) { throw new AssertionError(e); } } prints G'Day Mate. BTW: Strine is the Australian Dialect of English.

Randomly not so random

In a random sequence, all sequences are equally likely, even not so random ones. Random random = new Random(441287210); for(int i=0;i<10;i++) System.out.print(random.nextInt(10)+" "); } prints 1 1 1 1 1 1 1 1 1 1 and Random random = new Random(-6732303926L); for(int i=0;i<10;i++) System.out.println(random.nextInt(10)+" "); } prints 0 1 2 3 4 5 6 7 8 9 Lastly public static void main(String ... args) { System.out.println(randomString(-229985452)+' '+randomString(-147909649)); } public static String randomString(int seed) { Random rand = new Random(seed); StringBuilder sb = new StringBuilder(); for(int i=0;;i++) { int n = rand.nextInt(27); if (n == 0) break; sb.append((char) ('`' + n)); } return sb.toString(); } prints hello world

Java plus

A confusing piece of code here for you to parse. ;) int i = (byte) + (char) - (int) + (long) - 1; System.out.println(i); prints 1 Later discussed on Stack Overflow here.

HugeCollections moved to github

HugeCollections has been moved to github. to support easier forking and collaboration. https://github.com/HugeCollections/Collections Its a pain to setup , but once that is done it makes working on the code and merging changes much easier.

New Contributors to HugeCollections

What is the HugeCollection library The objectives of HugeCollections for the first release are fairly ambitious. Scale to massive collections i.e. sizes much larger than 2 billion without significant heap foot print or GC impact. i.e. using heap-less memory Be faster and more efficient that using plain JavaBeans with an ArrayList or Map and a vector or unordered_map in C++. Support durability (transparently saved and loaded from disk) Support thread safety (with no overhead if not required) and using multiple threads implicitly. i.e. large operations are automatically distributed. Be faster and more efficient than using a database. Transactions will NOT be in this release.  Support in one application what might have to be distributed otherwise. Dynamic code generation as required (no need to pre-generate code in the build) A prototype has been built which shows these objectives are possible, however to turn this library in to a usable release, will take some help. Note

Forum for ideas

Someone recently asked me if had a forum to include suggestions for articles to add. Does anyone have a forum to work well with blogger. suggestions for articles. Please comment with your ideas.

The Exchanger and GC-less Java

Overview The Exchanger class is very efficient at passing work between thread and recycling the objects used. AFAIK, It is also one of the least used Concurrency classes. As @Marksim Sipos points out, if you don't need GC less logging using an ArrayBlockingQueue is much simpler. Exchanger class The Exchanger class is useful for passing data back and forth between two threads. e.g. Producer/Consumer. It has the property of naturally recycling the data structures used to pass the work and supports GC-less sharing of work in an efficient manner. Here is an example, passing logs to a background logger. Work (a log entry) is batched into LogEntries and passed to a background thread which later passes it back to the thread so it can add more work. Provided the background thread is always finished before the batch is full, it is almost transparent. Increasing the size of the batch reduces how often the batch is full but increase the number of unprocessed entries waiting at an

Is making Boost more like Java a good idea?

Overview I was reading a question StackOverflow about What is Boost missing? and I was wondering how many of these features are available in Java already and whether making boost more like Java is good idea. Top suggestions Suggestion What is in Java SQL support JDBC JSon Requires an additional library like XStream, Possibly should be in core Java Audio Java has basic support. Don't know if it much good. logging Logger callstack, a standard API Throwable.getStackTrace() and Thread.getStackTrace() Support for archives Support ZIP and JAR Redeveloped collections The same could be said for Java Collections Standard XML parsing for UTF-8 text SAX (event driven) and DOM (Document object model) parsers for XML Platform independent GUI Swing Concurrency with lock free collections and atomic operations Concurrency library Arbitrary precision floating point and decimal BigDecimal and BigInteger Python, Ruby and Lua support Not built in, but there is Jython, JRuby and Lu

Memory alignment in C, C++ and Java

Overview You might assume that reducing the size of a struct or class saves the same amount of memory.    However due to memory alignment in memory allocators it can make no difference, or perhaps more difference than you might expect.    This is because the the amount of memory reserved is usually a multiple of the memory alignment. For Java and C this can 8 or 16 bytes. Size of memory reserved These tests were performed in a 64-bit C program (gcc 4.5.2) and a 64 JVM (Oracle Java 7) In Java, direct memory is largely a wrapper for malloc and free. Bytes C malloc() reserved Java ByteBuffer.allocateDirect() 0 to 24 32 bytes 32 bytes + a ByteBuffer 25 to 40 48 bytes 48 bytes + a ByteBuffer 41 to 56 64 bytes 64 bytes + a ByteBuffer 57 to 72 80 bytes 80 bytes + a ByteBuffer Constructing objects is a similar story Number of fields C class of int (heap/stack) C class of void * (heap/stack)  Java class   with int  Java class with Object references 1 32/16 bytes 32/16 bytes 16

Why thread priority rarely matters

Overview Its is tempting to use the Thread.setPriority() option in Java. However for many applications this is more a comment for the developer than something which will make a measurable difference. esp. with multi-core systems. Why it usually doesn't matter If you have plenty of free CPU, every thread which can run will run. The OS has no reason not to run a low priority thread or process when it has free resources. If your system is close to 100% of CPU on every core , the OS has to make a choice as to how much time each thread or process gets on the CPU and it is likely to give favour to higher priority threads over lower priority threads, (many OSes ignore the hint) and other factors are likely to matter as well. This priority only extends to raw CPU. Threads compete equally for CPU cache, heap space, CPU to memory bandwidth, file cache, disk IO, network IO and everything else. If any of these resource are in competition, they are all equal. To set a high priority on Wi

Order of elements in a hash collection

Overview While is it generally understood that keys or elements in a HashMap or HashSet occur in a pseudo random order, what is not obvious is that two collections with the same elements can be in different orders. This is because the capacity of the collection also determines the order the elements appear. This can be important if you have only an Iterator to a Set. It is easy to assume the order will be the same and most of the time it will work (you can write unit tests with this assumption which will pass) However, if the Set has a different capacity (something you don't normally know or worry about) the order will change. Find what order elements can appear in The following test adds the same 11 elements to a HashSet, each time with a different collection and prints out all the combinations it finds. @Test public void testSetOrder() { Set<String> order = new HashSet<String>(); Collection<String> elements = Arrays.asList( &quo

The importance of innovation

Overview In a recent article I wrote about how you do things can make a big difference. Java can be significantly faster than C The aim was to show how you can use the same algorithm (the what), implemented with a different approach (the how) and make significant improvement in performance. What I have learnt I now understand that the purpose of the web site is compare languages and remove the developer from the equation by making "the how" as similar to what has been written before as possible. I see this is legitimate approach to have a fair comparison of languages. The Importance of Innovation To me it shows you what can happen when you deliberately limit innovation, even if you have good reasons to do so. If you prescribe one way of implementing a requirement, it can make a big difference to the solution you can achieve. In the case of this benchmark, following the requirements as closely as I could, but admittedly disregarding how these requirements had been ach

Java can be significantly faster than C

Image
Preface When I wrote this article it was to comment on how writing the same algorithm implement a different way can make a major difference. I now believe this is not the point of the web site perhaps shouldn't be the point of this article either. I have written a follow up article. The importance of innovation Overview Whether you use Java or C is not always as important as the approach you use. By "approach" I mean; algorithms not specified in the requirements or "the how" to do something. You might not see this as a fair test of Java vs C, but in the real world human factors are matter. There is no point saying C is faster in theory but there isn't anyone available to make it so. Its like getting a very cheap price on an item which is not in stock. ;) Benchmark Shootout Java is temporarily the fastest for this particular knucleotide benchmark. It is quite likely that if the algorithm I used is translated to C it would be faster again. But for

MethodHandle performance in Java 7

Overview A new feature in Java 7 provides a MethodHandle which "is a typed, directly executable reference to an underlying method, constructor, field, or similar low-level operation, with optional transformations of arguments or return values" This supports currying methods and many other features lambda based languages take for granted. Note: this method compares invoke() in reflections and MethodHandles. I couldn't get invokeExact to work when returning a primitive (it got confused between int and Integer ) However another benchmark Using Method Handles by Andrew Till indicated using invokeExact is much faster than using Method.invoke() with reflections. Thank you wensveen for the link. Example Say you have two method public static int multiply(int i, int j) { return i * j; } public static int subtract(int i, int j) { return i - j; } With MethodHandle you can not only get a reference to the Method (like in reflection) but you can construct new MethodH

Added Memory Mapped support to HugeCollections

Image
Overview The Huge collections library is designed to support large collections on data in memory efficiently without GC impact. It does this using heap less memory and generated code for efficiency. One of the benefits of this approach is memory mapped files can be larger than the available memory, use trivial heap and no direct memory. Memory mapping Loading large amounts of data is time consuming so being able to re-use a store persisted to disk can improve efficiency. With the use of SSD drives the amount of "memory" a Java system can access is effectively extended provided it can be used. How does it perform Adding an object with 12 fields and reading objects sequentially takes about 110 ns for one billion elements. It takes less than two minutes to add all the entries, and the same to read all the elements. Random access is much more expensive until all the data is in memory, with a fast SSD drive it takes about 600 ns when most of the data is in cache, slower i

Writing to stdin

Overview System.in or stdin is usually used as an InputStream. However on Linux you can access this stream in other ways. Accessing file descriptor 0 In linux, each file descriptor is accessible via /proc/{process-id}/fd/{file-id} You can use this to see what files a process has open but also see the contents of a file. Writing and memory mapping Getting the process id in Java is obscure, but once you have this you can re-open existing file descriptors such as stdin, file descriptor 0. int processId = Integer.parseInt(ManagementFactory.getRuntimeMXBean().getName().split("@")[0]); RandomAccessFile raf = new RandomAccessFile("/proc/" + processId + "/fd/0", "rw"); final FileChannel fc = raf.getChannel(); final MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size()); bb.putLong(System.nanoTime()); bb.force(); raf.close(); This opens the System.in is a read-write mode and memory maps it before changing its contents.

Avoiding Java Serialization to increase performance

Overview Many frameworks for storing objects in an off-line or cached manner, use standard Java Serialization to encode the object as bytes which can be turned back into the original object. Java Serialization is generic and can serialise just about any type of object. Why avoid it The main problem with Java Serialization is performance and efficiency. Java serialization is much slower than using in memory stores and tends to significantly expand the size of the object. Java Serialization also creates a lot of garbage. Access performance Say you have a collection and you want to update a field of many elements. Something like for (MutableTypes mt : mts) { mt.setInt(mt.getInt()); } If you update one million elements for about five seconds how long does each one take. Collection Fast PC Labtop Huge Collection 5.1 ns 33 ns List<JavaBean> 6.5 ns 54 ns List<byte[]> with Externalizable 5,841 ns 17,508 ns List<byte[]> with Serializable 23,217 ns 60