Printing arrays by hacking the JVM.

Overview

One of the most common gotchas in Java is knowing how to print arrays.  If an answer on how to print an array gets more than 1000 upvotes, you have to wonder if there is a simpler way.  Just about every other popular language has that simpler way, so it's not clear to me why Java still does this.

Unlike other JDK classes, arrays don't have a particularly sane toString() as it is inherited from Object.

It prints the type and address, right?

Actually, it doesn't print the address; it just looks as cryptic as one. It prints the internal representation of the type and the hashCode() of the object.  As every array is an Object, it has a hashCode(), a type, a synchronized lock and everything else an Object has, but no methods specific to an array. This is why the toString() isn't useful for arrays.
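
To make that concrete, here is a minimal sketch that rebuilds the default text by hand and compares it with what the array itself returns. The class and method names (DefaultToStringDemo, describe) are made up for illustration.

```java
public class DefaultToStringDemo {
    // Rebuilds the text Object.toString() produces: class name, '@', hex hash code.
    static String describe(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        int[] ints = {1, 22, 333};
        // Something like [I@5f205aa; the hex part varies from run to run.
        System.out.println(ints);
        // The same text, assembled by hand.
        System.out.println(describe(ints));
        // prints true
        System.out.println(describe(ints).equals(ints.toString()));
    }
}
```

The part after the @ is a hash code rather than a memory address, which is why it tells you nothing about the contents.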

What does it look like unhacked?

If I run the following program:

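import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

import org.junit.Test; // assuming JUnit 4

public class ObjectTest {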
    boolean[] booleans = {true, false};
    byte[] bytes = {1, 2, 3};
    char[] chars = "Hello World".toCharArray();
    short[] shorts = {111, 222, 333};
    float[] floats = {1.0f, 2.2f, 3.33f, 44.44f, 55.555f, 666.666f};
    int[] ints = {1, 22, 333, 4_444, 55_555, 666_666};
    double[] doubles = {Math.PI, Math.E};
    long[] longs = {System.currentTimeMillis(), System.nanoTime()};
    String[] words = "The quick brown fox jumps over the lazy dog".split(" ");

    @Test
    public void testToString() throws IllegalAccessException {

        Map<String, Object> arrays = new LinkedHashMap<>();
        for(Field f : getClass().getDeclaredFields())
            arrays.put(f.getName(), f.get(this));
        arrays.entrySet().forEach(System.out::println);
    }
}

it prints:

booleans=[Z@277c0f21
bytes=[B@6073f712
chars=[C@43556938
shorts=[S@3d04a311
floats=[F@7a46a697
ints=[I@5f205aa
doubles=[D@6d86b085
longs=[J@75828a0f
words=[Ljava.lang.String;@3abfe836

I think that is obvious to everyone. o_O Like the fact that J is the internal code for a long and L is the internal code for a Java class. Also, Z is the code for boolean even though b is unused.
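
If you want to see these codes without decoding them from a toString(), Class.getName() returns them directly. The sketch below is only an illustration (ArrayTypeCodes and codes() are names I made up); it lists the source-level name next to the internal one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ArrayTypeCodes {
    // Maps each array type's source-level name to the name the JVM reports for it.
    static Map<String, String> codes() {
        Class<?>[] types = {
            boolean[].class, byte[].class, char[].class, short[].class, float[].class,
            int[].class, double[].class, long[].class, String[].class, int[][].class
        };
        Map<String, String> names = new LinkedHashMap<>();
        for (Class<?> type : types)
            names.put(type.getSimpleName(), type.getName());
        return names;
    }

    public static void main(String[] args) {
        // Prints lines such as:
        //   boolean[] -> [Z
        //   long[] -> [J
        //   String[] -> [Ljava.lang.String;
        //   int[][] -> [[I
        codes().forEach((simple, internal) -> System.out.println(simple + " -> " + internal));
    }
}
```

Each leading [ stands for one array dimension, so a two-dimensional int array shows up as [[I.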

What can we do about it?

In this program we end up having to write a special toString method for the object, which then needs to be called by our special method for printing a Map.Entry.  Repeat this many times throughout your program and it's just easier to avoid using arrays in Java because they are hard to debug.
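
As a rough sketch of what that extra work looks like without touching the JVM, assuming a hypothetical helper class ArrayPrinter with a format method (neither is from the program above):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class ArrayPrinter {
    // Hypothetical helper: readable text for arrays, the usual toString() for anything else.
    static String format(Object o) {
        if (o instanceof boolean[]) return Arrays.toString((boolean[]) o);
        if (o instanceof byte[]) return Arrays.toString((byte[]) o);
        if (o instanceof short[]) return Arrays.toString((short[]) o);
        if (o instanceof char[]) return Arrays.toString((char[]) o);
        if (o instanceof int[]) return Arrays.toString((int[]) o);
        if (o instanceof long[]) return Arrays.toString((long[]) o);
        if (o instanceof float[]) return Arrays.toString((float[]) o);
        if (o instanceof double[]) return Arrays.toString((double[]) o);
        if (o instanceof Object[]) return Arrays.deepToString((Object[]) o);
        return String.valueOf(o);
    }

    public static void main(String[] args) {
        Map<String, Object> arrays = new LinkedHashMap<>();
        arrays.put("ints", new int[]{1, 22, 333});
        arrays.put("words", "The quick brown fox".split(" "));
        // Map.Entry.toString() would use the array's own toString(),
        // so each entry has to be formatted by hand.
        // prints ints=[1, 22, 333] then words=[The, quick, brown, fox]
        arrays.forEach((name, value) -> System.out.println(name + "=" + format(value)));
    }
}
```

It works, but every place that prints a value which might be an array has to remember to go through the helper.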

What about hacking the JVM?

What we can do is change Object.toString().  We have to change this class as it is the only parent of arrays we have access to. We cannot change the code for an array as it is internal to the JVM.  There is no byte[] Java class file, for example, to hold all the byte[]-specific methods.

Take a copy of the source for java.lang.Object, add an import for java.util.Arrays, and replace the toString() with

    public String toString() {
        if (this instanceof boolean[])
            return Arrays.toString((boolean[]) this);
        if (this instanceof byte[])
            return Arrays.toString((byte[]) this);
        if (this instanceof short[])
            return Arrays.toString((short[]) this);
        if (this instanceof char[])
            return Arrays.toString((char[]) this);
        if (this instanceof int[])
            return Arrays.toString((int[]) this);
        if (this instanceof long[])
            return Arrays.toString((long[]) this);
        if (this instanceof float[])
            return Arrays.toString((float[]) this);
        if (this instanceof double[])
            return Arrays.toString((double[]) this);
        if (this instanceof Object[])
            return Arrays.deepToString((Object[]) this);
        return getClass().getName() + "@" + Integer.toHexString(hashCode());
    }

and in Java <= 8 we can add this class to the start of the bootclasspath by adding this to the command line

    -Xbootclasspath/p:target/classes

(or wherever your classes have been compiled to) and now when we run our program we see

booleans=[true, false]
bytes=[1, 2, 3]
chars=[H, e, l, l, o,  , W, o, r, l, d]
shorts=[111, 222, 333]
floats=[1.0, 2.2, 3.33, 44.44, 55.555, 666.666]
ints=[1, 22, 333, 4444, 55555, 666666]
doubles=[3.141592653589793, 2.718281828459045]
longs=[1457629893500, 1707696453284240]
words=[The, quick, brown, fox, jumps, over, the, lazy, dog]

just like you would see in just about any other language.

Conclusion

While this is a cool trick, the best solution is that they finally fix Java so it produces a sane output for arrays. It knows you need one and provides it, but hides it away in a class you have to google to find, so that every new Java developer has to have a WTF moment the first time they try to work with arrays.


Comments

  1. There is no excuse for the JVM not to be able to print primitive arrays, but for Object arrays, deepToString uses a worst-case approach to avoid cycles. When printing large arrays, it would have high memory requirements.

    1. It would be even better if callers to toString could explicitly (ideally implicitly) pass the Appendable to write to, in which case it could stream the results to the System.out.

  2. There are always improvements between major versions of Java which might fix a program. If you have a program which depends on a byte[] printing `[B@` something, you have a problem with your program.

    It was suggested that a char[] which contains a password shouldn't be printable, but there are other ways to fix that IMHO.

    1. By "fix a program" I meant "break a program"

  3. Better toString only the first 1000 or so entries, or you might deeply regret this in some production logs :) (in fact, better not ship this to production - is this even legal?)

    1. You would have the same problem with any collection. Using Instrumentation is legal.

    2. Yep, logging any Collection is a major performance hit if you forget the guard clauses.
      I wish all Java objects had a `public StringBuilder toString(StringBuilder)`

    3. Funny you should mention that, but we do something similar. We have a Marshallable interface which, when extended, allows you to marshal your object to a variety of outputs, including YAML. The toString() is implemented via off heap bytes in the form of YAML by default.

      The upshot is that toString() only creates one String no matter how complex the data structure is. You can also serialize within an existing section, and it even handles indenting appropriately.

    4. Hi Peter, Big fan of your open source libraries, thanks. If you want reasonable printing of collections my jpad project may interest you:
      http://jpad.io/
      It outputs collections as HTML tables and charts.
