Common misconception: How many objects does this create?
Overview
A common question is how many objects, or how many Strings, a section of code creates. Often the answer is not what you think, nor should you really need to know. It is useful to have an idea of when an object is created, but there are so many other factors, often far more important, that the total number for an application may not be what you expect.
String is not a single object
A String wraps a char[]. This means that when you see a new String there could be a new char[] involved as well. If you use + with a String, it could use a StringBuilder (from Java 5.0), which also wraps a char[]. This means that there are usually more char[] objects created in Java than String objects. Sometimes char[] is the most common object type in a JVM.
String literals are still Strings
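As a rough sketch of the point about +, the two methods below build the same text. The class and method names are made up for illustration, and the StringBuilder form is only approximately what javac for Java 5 to 8 generates. From Java 9 onward javac typically emits an invokedynamic call instead, and String is backed by a byte[] rather than a char[].

```java
public class ConcatSketch {
    // What the source code says: concatenation with +.
    static String viaPlus(String a, String b) {
        return a + b;
    }

    // Roughly what older javac versions generate for the method above:
    // a StringBuilder (with its own backing array) plus a new String.
    static String viaBuilder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    public static void main(String[] args) {
        String x = viaPlus("Hello ", "world");
        String y = viaBuilder("Hello ", "world");
        System.out.println(x.equals(y)); // same contents
        System.out.println(x == y);      // but two separate String objects
    }
}
```

Either way, one line of source can mean a builder, one or more backing arrays and a String, which is why arrays tend to outnumber Strings.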
A common misconception is that String literals don't count. They don't add to the total after the code has been run at least once; however, most of the time the question is about code which is run once, i.e. String literals still count.
Another common misconception concerns when String literals get loaded. In Java 6 and earlier they were loaded when the class was loaded; however, they are now (Java 7+) loaded when they are first used. This means that a section of code where String literals appear for the first time will create new String objects.
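A small sketch of why a literal only counts once: the literal is resolved to a single pooled String, and later executions of the same code reuse that object. The class and method names here are hypothetical.

```java
public class LiteralSketch {
    static String greeting() {
        // The literal is turned into a String object once;
        // every later call returns that same object.
        return "Hello world";
    }

    public static void main(String[] args) {
        String first = greeting();   // may create the String on first use
        String second = greeting();  // reuses it, nothing new is created
        System.out.println(first == second);
    }
}
```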
The JVM uses Strings, lots of them.
The JVM uses Java code, and this code uses Strings. The ClassLoader which loads your class uses Strings. The name of the class you want to load is a String, as are all the System properties and all the environment variables which are created so you can run your program, both the values and the key names.
Let us consider a Hello World program and see how many Strings are created so this program can run. Is it 0, 1 or 2 Strings? See if you can guess how many are actually created.
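To get a feel for the scale, the sketch below counts only the keys and values of the system properties and environment variables. The class and method names are assumptions for illustration, and the result is just a slice of what the JVM creates at start-up, not a substitute for a heap histogram.

```java
import java.util.Map;

public class StartupStrings {
    // One String for each property name and one for each value.
    static int propertyStringCount() {
        return System.getProperties().stringPropertyNames().size() * 2;
    }

    // One String for each environment variable name and one for each value.
    static int environmentStringCount() {
        Map<String, String> env = System.getenv();
        return env.size() * 2;
    }

    public static void main(String[] args) {
        System.out.println("Strings in system properties: " + propertyStringCount());
        System.out.println("Strings in environment: " + environmentStringCount());
    }
}
```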
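import java.io.IOException;

public class HowManyStrings {
    public static void main(String[] args) throws IOException {
        System.out.println("Hello world");
        // Block here so the heap can be inspected while the JVM is still alive.
        System.in.read();
    }
}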
This program stops on System.in.read(), allowing me to take a dump of the heap. The utility jmap can give a histogram count of the number of objects currently on the heap; assuming there have been no GCs, this will be the number created.
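If you want to repeat the experiment, jmap needs the process id. The variant below is a sketch, assuming Java 9 or later for ProcessHandle, and the class name is made up. It prints the id along with a jmap -histo command line before blocking. Note that building the extra message creates a few more Strings of its own, so the count will differ slightly from the plain program.

```java
import java.io.IOException;

public class HowManyStringsWithPid {
    // Process id of the running JVM (ProcessHandle requires Java 9+).
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Hello world");
        // In another terminal, run the printed command to get the histogram.
        System.out.println("jmap -histo " + currentPid());
        System.in.read(); // wait so the heap can be inspected
    }
}
```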
The histogram showed that the number of Strings was 2490. If I had a few more environment variables or a different update of Java, it would be a different number.
In short, if you are arguing over 2 to 5 Strings in the code you can see, when the code is run once, you may be missing most of the Strings.
But what if I call the code lots of times?
If you are talking millions of times, it is likely to matter, but here is the thing: the JVM will optimise code which is called this many times, and it can do two things.
Dead Code Elimination
Code which the JIT detects doesn't do anything useful can be dropped. The JIT is pretty good at this, and most likely the example you are looking at doesn't do anything useful either. However, real world code hopefully does something useful, which is where the next optimisation comes in.
Escape Analysis
The JIT can look at a method (or what the method would look like after everything it calls has been inlined) and see if an object escapes the method. If it doesn't escape the method, it can be placed on the stack, or effectively have its fields unpacked onto the stack. This means no object is created on the heap; in fact the object header doesn't even have to be created, and not all of its fields, possibly none of them, need to be created. In short, just because you see new String in the code doesn't mean the JIT has to actually create an object, provided it makes no difference to the result (unless you are counting the number of objects created).
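Here is a minimal sketch of the kind of code escape analysis targets. The Point class and method names are hypothetical. The object created in sumCoordinates never leaves the method, so once the method is hot the JIT is free to skip the heap allocation. Whether it actually does depends on the JVM and its settings. On HotSpot you could compare a run with -verbose:gc against one with -XX:-DoEscapeAnalysis to see the difference in GC activity.

```java
public class EscapeSketch {
    // Small value holder used only inside one method.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static int sumCoordinates(int x, int y) {
        Point p = new Point(x, y); // never stored or returned: it does not escape
        return p.x + p.y;
    }

    public static void main(String[] args) {
        long total = 0;
        // Call it often enough for the JIT to compile and optimise the method.
        for (int i = 0; i < 10_000_000; i++) {
            total += sumCoordinates(i, 1);
        }
        // Use the result so the loop is not dead code.
        System.out.println(total);
    }
}
```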
Conclusion
The number of Strings created by even a trivial application is likely to be far more than you can imagine a use for, but call a method enough times and you might find that it no longer creates any objects at all.
I'm sure you are aware of it, but it's worth mentioning the String concatenation optimization, which is capable of turning a StringBuilder usage into a single char[] + String allocation. You really hope that happens because the default StringBuilder isn't particularly clever or fast.
The javac compiler will do constant inlining which replaces "Hello" + " World" with one string. However, these have to be constants known at compile time.
I'm talking about the runtime string optimization (e.g. http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/opto/stringopts.cpp ), which does not require constant strings, just a fairly simple new StringBuilder.append(x).append(y)(...).toString form such as what javac will generate when you use +
-Jackson
Thank you for the link. I suspect it did this but didn't know for sure.
I've been looking for an automated way to count objects created via a jUnit test, but thus far unsuccessfully. The JVM TI would be one way to go, but getting that back into Java feels clunky and unstable. Do you have any thoughts?
Hi, I wanted to know: when we say String str = new String("Hello"); then will the object created on the heap have the value "Hello", or will it have a reference which points to the "Hello" present in the constant pool? If I assume the object is created on the heap with the value "Hello", then how can I prove it programmatically?
To compare two references to an object, you can use ==. If s == t returns true, you have the same object. In your example str is a copy of the String in the constant pool so == will be false even though equals() will be true as the contents are the same.
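The check described above can be sketched as follows, with made-up class and method names. It shows that new String("Hello") is a separate object from the pooled literal, and that intern() leads back to the pooled one. Whether the copy also has its own backing array is an implementation detail that this check does not reveal.

```java
public class NewStringSketch {
    // True only when both references point at the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Hello";         // the pooled String
        String str = new String("Hello"); // a new String object on the heap

        System.out.println(sameObject(str, literal));          // false: different objects
        System.out.println(str.equals(literal));               // true: same contents
        System.out.println(sameObject(str.intern(), literal)); // true: intern() returns the pooled one
    }
}
```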