Why testing code for thread safety is flawed
Overview
We know that ++ is not thread safe, even for a volatile field; however, there is a trick to proving this. The problem with testing code for thread safety is that it can happen to work repeatedly, but still be unsafe.
An example
I wrote a lock-free ring buffer which passed multi-threaded tests of one billion entries repeatedly. However, a couple of days later the test consistently failed. What was the difference? There were a couple of CPU-intensive processes also running on the same box. When these processes finished, the test passed again.
The problem
If you try to prove that incrementing a volatile variable is not thread safe, you can get a result like this.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // The class name is illustrative; the original listing shows only the members.
    public class VolatileIncrementTest {
        public static void main(String... args) throws InterruptedException {
            for (int nThreads = 1; nThreads <= 64; nThreads *= 2)
                doThreadSafeTest(nThreads);
        }

        static class VolatileInt {
            volatile int num = 0;
        }

        private static void doThreadSafeTest(final int nThreads) throws InterruptedException {
            final int count = 32 * 1000;
            ExecutorService es = Executors.newFixedThreadPool(nThreads);
            final VolatileInt vi = new VolatileInt();
            for (int i = 0; i < nThreads; i++)
                es.submit(new Runnable() {
                    public void run() {
                        // j steps by nThreads, so the tasks together perform count increments.
                        for (int j = 0; j < count; j += nThreads)
                            vi.num++;
                    }
                });
            es.shutdown();
            es.awaitTermination(10, TimeUnit.SECONDS);
            System.out.printf("With %,d threads should total %,d but was %,d%n", nThreads, count, vi.num);
        }
    }

On my machine with 8 logical threads, this prints
    With 1 threads should total 32,000 but was 32,000
    With 2 threads should total 32,000 but was 32,000
    With 4 threads should total 32,000 but was 32,000
    With 8 threads should total 32,000 but was 32,000
    With 16 threads should total 32,000 but was 32,000
    With 32 threads should total 32,000 but was 32,000
    With 64 threads should total 32,000 but was 32,000

There doesn't appear to be a problem. Why? It takes time to start each thread, and each task doesn't take long to complete, so even though many tasks are submitted, each one completes before the next one starts. In other words, the test is effectively single threaded.
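One way to reduce the effect of thread start-up time, other than running the test for longer, is to hold every thread at a gate and release them together. The following is a minimal sketch of that idea, not part of the original test: the names StartGateTest and runGated are hypothetical, and it uses CountDownLatch from java.util.concurrent. It makes overlapping increments more likely, but it still cannot guarantee that a lost update will show up on any given run.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: StartGateTest and runGated are illustrative names,
// not taken from the original test.
class StartGateTest {
    static class VolatileInt {
        volatile int num = 0;
    }

    /** Runs nThreads threads that together perform count increments, all released at once. */
    static int runGated(final int nThreads, final int count) {
        final VolatileInt vi = new VolatileInt();
        final CountDownLatch ready = new CountDownLatch(nThreads);
        final CountDownLatch go = new CountDownLatch(1);
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    ready.countDown();
                    try {
                        go.await(); // block until the main thread opens the gate
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int j = 0; j < count; j += nThreads)
                        vi.num++; // still an unsafe read-modify-write
                }
            });
            threads[i].start();
        }
        try {
            ready.await();  // wait until every thread has started
            go.countDown(); // release them together
            for (Thread t : threads)
                t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return vi.num;
    }

    public static void main(String... args) {
        for (int nThreads = 1; nThreads <= 64; nThreads *= 2)
            System.out.printf("With %,d threads should total %,d but was %,d%n",
                    nThreads, 32 * 1000, runGated(nThreads, 32 * 1000));
    }
}
```

Whatever the interleaving, the result can never exceed the expected total, and with one thread it always matches it. Any shortfall with more threads is a lost update.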
Thread safety bugs can hide
In a more complex test, you might not know what subtle thing to change to make your code break. In fact, you can deploy your application into production and it can work for years. Then one day something changes: you add another application that increases the load, or the version of Java or your machine is upgraded, and suddenly it fails intermittently. It is tempting to assume that the most recent change is the cause of the problem when actually the bug has always been there; it just hasn't shown itself.
Changing the test
When we run the test long enough to have multiple threads running at once (here, 100,000,000 increments in total), we see a different pattern. The single-threaded test behaves as expected, without losing any counts; however, the multi-threaded test runs start dropping increments.

    With 1 threads should total 100,000,000 but was 100,000,000
    With 2 threads should total 100,000,000 but was 75,127,690
    With 4 threads should total 100,000,000 but was 51,338,289
    With 8 threads should total 100,000,000 but was 35,177,375
    With 16 threads should total 100,000,000 but was 15,264,270
    With 32 threads should total 100,000,000 but was 14,385,095
    With 64 threads should total 100,000,000 but was 15,818,747
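The counts go missing because ++ on a field is not one operation: it reads the value, adds one, and writes the result back. Declaring the field volatile makes each read and each write visible to other threads, but it does not make the three steps atomic. The sketch below is only an illustration, with hypothetical names (LostUpdateDemo, replayLostUpdate): it replays one unlucky interleaving of two increments by hand, in a single thread, so the outcome is deterministic.

```java
// Hypothetical illustration: replays one interleaving of two increments by hand,
// in a single thread, so the outcome is the same on every run.
class LostUpdateDemo {
    static volatile int num = 0;

    static int replayLostUpdate() {
        num = 0;
        int readByA = num; // "thread A" reads 0
        int readByB = num; // "thread B" reads 0 before A has written back
        num = readByA + 1; // A writes 1
        num = readByB + 1; // B also writes 1, overwriting A's update
        return num;
    }

    public static void main(String... args) {
        // Two increments were attempted, but only one is recorded.
        System.out.println("Two increments produced " + replayLostUpdate());
    }
}
```

In the real test the same interleaving happens between genuinely concurrent threads, and the more threads compete for the field, the more often one update overwrites another.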
Fixing the test
If you replace the volatile int with an AtomicInteger, you get the following result.

    With 1 threads should total 100,000,000 but was 100,000,000
    With 2 threads should total 100,000,000 but was 100,000,000
    With 4 threads should total 100,000,000 but was 100,000,000
    With 8 threads should total 100,000,000 but was 100,000,000
    With 16 threads should total 100,000,000 but was 100,000,000
    With 32 threads should total 100,000,000 but was 100,000,000
    With 64 threads should total 100,000,000 but was 100,000,000
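The post does not show the changed code, so the following is a sketch of what the AtomicInteger version might look like. The names AtomicIncrementTest and countWithAtomic are hypothetical, the use of incrementAndGet() is an assumption about which method replaces ++, and the 60-second timeout is my own choice.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: AtomicIncrementTest and countWithAtomic are illustrative names.
class AtomicIncrementTest {
    /** Runs nThreads tasks that together perform count atomic increments. */
    static int countWithAtomic(final int nThreads, final int count) {
        final AtomicInteger num = new AtomicInteger();
        ExecutorService es = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < nThreads; i++)
            es.submit(new Runnable() {
                public void run() {
                    for (int j = 0; j < count; j += nThreads)
                        num.incrementAndGet(); // atomic read-modify-write
                }
            });
        es.shutdown();
        try {
            // Assumed timeout; the original test waits 10 seconds.
            if (!es.awaitTermination(60, TimeUnit.SECONDS))
                throw new IllegalStateException("tasks did not finish in time");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return num.get();
    }

    public static void main(String... args) {
        final int count = 100 * 1000 * 1000;
        for (int nThreads = 1; nThreads <= 64; nThreads *= 2)
            System.out.printf("With %,d threads should total %,d but was %,d%n",
                    nThreads, count, countWithAtomic(nThreads, count));
    }
}
```

Unlike the volatile version, the total here is correct by construction rather than by luck, because incrementAndGet() performs the read, add and write as one atomic step.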
Check out ConTest at IBM alphaworks
Concurrency testing can be valuable if it finds bugs. However, just because a test passes doesn't mean you won't have thread safety issues. The only sure way is to understand the code and the guarantees that the platform provides.
Taking a slightly different slant to ConTest, multithreadedtc allows you to test very specific thread interleavings in a simple manner.
http://www.cs.umd.edu/projects/PL/multithreadedtc/overview.html