Try optimising the memory consumption first
Overview

You would think that if you wanted your application to go faster, you would start with CPU profiling. However, when looking for quick wins, it's the memory profiler I target first.

Allocating memory is cheap

Allocating memory has never been cheaper. Memory itself is cheaper: you can get machines with thousands of GBs of memory, and you can buy 16 GB for less than $200. The memory allocation operation is also cheaper than in the past, and it's multi-threaded, so it scales reasonably well.

However, memory allocation is not free. Your CPU cache is a precious resource, especially if you are trying to use multiple threads. While you can buy 16 GB of main memory easily, you might only have 2 MB of cache per logical CPU. If you want these CPUs to run independently, you want to spend as much time as possible within the 256 KB L2 cache.

Cache level   Size                            Access time (clock cycles)   Concurrency
1             32 KB data, 32 KB instruction   1                            cores independent
...