Regex vs IndexOf in Java.

Overview

The author of these articles Are C# .Net Regular Expressions Fast Enough for You? and .Net Regex: Can Regular Expression Parsing be Faster than XmlDocument or Linq to Xml? recently pointed out to me that he had found that Regular expressions were at least as fast or faster than the alternatives in C#. However it has been my experience that regular expressions in Java were slower. It is hard for me to say why this might be different in Java and C# or even if these are fair comparisons but here is what I found in Java.

Searching XML for the start of a field

This test searches for the expression "\\s*" in short section of text.
Performing 1 loops, regex took 323.389 us and indexOf took 73.026 us on average, ratio=4.4
Performing 10 loops, regex took 61.513 us and indexOf took 42.480 us on average, ratio=1.4
Performing 100 loops, regex took 84.899 us and indexOf took 8.134 us on average, ratio=10.4
Performing 1,000 loops, regex took 16.307 us and indexOf took 5.851 us on average, ratio=2.8
Performing 10,000 loops, regex took 3.751 us and indexOf took 1.321 us on average, ratio=2.8
Performing 100,000 loops, regex took 3.586 us and indexOf took 1.260 us on average, ratio=2.8
Using indexOf is somewhat faster, but more complex to code. You would only optimise this example if you really needed to as even one million searches takes a fraction of a second.

Code for Search XML example

Searching for a value= expression in a short string

This test search for the {value} in value="{value}" of a short string.
Performing 1 loops, regex took 227.029 us and indexOf took 9.387 us on average, ratio=24.2
Performing 10 loops, regex took 11.141 us and indexOf took 1.195 us on average, ratio=9.3
Performing 100 loops, regex took 9.631 us and indexOf took 1.108 us on average, ratio=8.7
Performing 1,000 loops, regex took 14.993 us and indexOf took 2.491 us on average, ratio=6.0
Performing 10,000 loops, regex took 2.472 us and indexOf took 0.254 us on average, ratio=9.7
Performing 100,000 loops, regex took 0.583 us and indexOf took 0.097 us on average, ratio=6.0
Performing 100,000 loops, regex took 0.465 us and indexOf took 0.032 us on average, ratio=14.5
Performing 100,000 loops, regex took 0.467 us and indexOf took 0.032 us on average, ratio=14.6
Since the times keep improving even after 100K iterations, I suspect the JIT is able to optimise indexOf more than the relatively complex code supporting regular expressions.

Code for Searching a short string example

Comments

  1. Hey Peter, Great post! :-) Give some time for a reply. -Bill

    ReplyDelete
  2. @OmegaMan I thought your articles worthy of a well thought out reply. ;)

    ReplyDelete
  3. hi Peter, this link can be worth looking at in regards to regex performance:

    http://tusker.org/regex/regex_benchmark.html

    built-in implementation definitely is not the fastest possible.

    ReplyDelete
  4. Your second link "Code for Searching a short string example" does not work as of the date of this comment.

    ReplyDelete
  5. @jay, It is working for me now. Perhaps if you try it again?

    You could try http://code.google.com/p/core-java-performance-examples/source/browse/trunk/src/test/java/com/google/code/java/core/regex/ which is the parent directory for both code examples.

    ReplyDelete
  6. Weird...the link didn't start working for me until I access the link you provided in the comment.

    Anyway, in the second example, by
    1. using this pattern: "value=\"([^\"]+?)"
    2. Creating the Matcher once and using matcher.reset() inside the for loop
    3. Passing in the Pattern instead of recompiling it in the function

    I reduced the last ratio from 10.7 to around 3. But my laptop is slower (originally 0.766 us vs. 0.467 on your machine for the regex)

    ReplyDelete
  7. An assumption I am making is you wouldn't actually want to parse the same string repeatedly, otherwise you would cache the result. What this should really test is a String which can change. Reusing a compiled Pattern makes sense, but I suspect isn't taking much time.

    Is it possible to re-use a Matcher with different Strings?

    ReplyDelete
  8. Yes, it is possible to reuse a Matcher with different Strings:

    matcher.reset(myNewString);

    By the way, I just found your blog yesterday and have enjoyed what I have read so far. Thanks!

    ReplyDelete

Post a Comment

Popular posts from this blog

Low Latency Microservices, A Retrospective

Unusual Java: StackTrace Extends Throwable

System wide unique nanosecond timestamps