Performance Tip: Set the capacity of a collection is you know its size.

Overview

An advantage of using a Collection or Map over using a plain array is you don't need to know how big it will be when you create it. Often this is the case. However sometimes you know what the size will be and its can be more efficient in CPU and memory if you give the correct length when it is created.

Don't set the length if you don't know what it is

Often the default size is likely to be the best, so just leave the default.
public static <T> ArrayList<T> list(Enumeration<T> e) {
    ArrayList<T> l = new ArrayList<T>();
    while (e.hasMoreElements())
        l.add(e.nextElement());
    return l;
}
Enumeration does give you a hint as the the final size, so just leave the default.

If you know the size of a Collection set it

In this example, the number of elements in the List is fixed and known.

From AnyImpl.checkExtractBadOperationList(int[] expected)
List list = new ArrayList() ;
for (int ctr=0; ctr<expected.length; ctr++)
    list.add( getTCKindName( expected[ctr] ) ) ;
As you can see the List will have expected.length elements. This would be more efficent if the list were defined
List list = new ArrayList(expected.length);

Set the capacity of a HashMap or LinkedHashMap with the load factor

When you set the capacity of a HashMap, you have to consider the load factor as well. This is because the collection will grow when the load factor is reached. For an ArrayList this is always 1 (it won't grow until the size reaches the capacity) However the typical load factor of a HashMap is 0.75f which means the capacity will grow when the size reaches 75% of capacity.

From HashAttributeSet.readObject(ObjectInputStream)
attrMap = new HashMap();
int count = s.readInt();
Attribute attr;
for (int i = 0; i < count; i++) {
    attr = (Attribute)s.readObject();
    add(attr);
} 
To optimise this the size could be set.
int count = s.readInt();
attrMap = new HashMap(count*4/3); // divide by 0.75 to get the desired capacity.
for (int i = 0; i < count; i++) {
    add((Attribute)s.readObject());
} 
Note: For HashMap, the capacity is always a power of 2 or the next power of two you provide.

Bonus tip: A collection doesn't need to be thread safe if it is only read by multiple threads.

It is possible to write a collection which is modified by just reading it. Using LinkedHashMap with accessOrder=true is one example. However most collections are not modified by reading them and don't need to be thread safe if they are only read by multiple threads.
static Hashtable contentTypes = new Hashtable();

static {
contentTypes.put("CDATA", new Integer(CDATA));
contentTypes.put("RCDATA", new Integer(RCDATA));
contentTypes.put("EMPTY", new Integer(EMPTY));
contentTypes.put("ANY", new Integer(ANY));
}

public static int name2type(String nm) {
Integer val = (Integer)contentTypes.get(nm);
return (val != null) ? val.intValue() : 0;
}
The static block is guaranteed to be thread safe, after that, the collection is not modified and does need to be thread safe. A way you could write it now is
static final Map contentTypes = new HashMap() {{
    put("CDATA", CDATA);
    put("RCDATA", RCDATA);
    put("EMPTY", EMPTY);
    put("ANY", ANY);
}

public static int name2type(String nm) {
    Integer val = contentTypes.get(nm);
    return val != null ? val : 0;
}

Hashtable is a legacy class which was replaced by HashMap in Java 1.2 (1998)

Comments

  1. count*4/3 isnt exactly power of 2. Java7 HashMap implementation is different then Java6 as they have a inflateTable(toSize) function, but not for the constructor in example. Wouldnt this create collisions?

    ReplyDelete
  2. The *4/3 is to avoid growing the map due to the load factor. It will choose the next power of 2 as a capacity.

    ReplyDelete

Post a Comment

Popular posts from this blog

Java is Very Fast, If You Don’t Create Many Objects

System wide unique nanosecond timestamps

Comparing Approaches to Durability in Low Latency Messaging Queues