I have seen a number of tests where a JVM has 10K threads. However, what happens if you go beyond this?
My recommendation is to consider having more servers once your total reaches 10K. You can get a decent server for $2K and a powerful one for $10K.
Creating threads gets slower
The time it takes to create a thread increases as you create more thread. For the 32-bit JVM, the stack size appears to limit the number of threads you can create. This may be due to the limited address space. In any case, the memory used by each thread's stack add up. If you have a stack of 128KB and you have 20K threads it will use 2.5 GB of virtual memory.
Bitness
Stack Size
Max threads
32-bit
64K
32,073
32-bit
128K
20,549
32-bit
256K
11,216
64-bit
64K
stack too small
64-bit
128K
32,072
64-bit
512K
32,072
Note: in the last case, the thread stacks total 16 GB of virtual memory.
You still have to watch how many objects you create. This article looks at a benchmark passing events over TCP/IP at 4 billion events per minute using the net.openhft.chronicle.wire.channel package in Chronicle Wire and why we still avoid object allocations.. One of the key optimisations is creating almost no garbage. Allocation is a very cheap operation and collection of very short-lived objects is also very cheap. Does this really make a difference? What difference does one small object per event (44 bytes) make to the performance in a throughput test where GC pauses are amortised? While allocation is as efficient as possible, it doesn’t avoid the memory pressure on the L1/L2 caches of your CPUs and when many cores are busy, they are contending for memory in the shared L3 cache. Results Benchmark on a Ryzen 5950X with Ubuntu 22.10. JVM Vendor, Version No objects Throughput, Average Latency* One object per event Throughput, Average Latency* Azul Zulu 1.8.0_322 60.6 M event/s, 528
A Unique Identifier can be very useful for tracing. Those ids are even more useful when they contain a high-resolution timestamp. Not only do they record the time of an event, but if unique can help trace events as they pass through the system. Such unique timestamps however can be expensive depending on how they are implemented. This post explores a lightweight means of producing a unique, monotonically increasing system-wide nano-second resolution timestamp available in our open-source library. Uses for Unique Identifiers Unique identifiers can be useful to associate with a piece of information so that information can be referred to later unambiguously. This could be an event, a request, an order id, or a customer id. They can naturally be used as a primary key in a database or key/value store to retrieve that information later. One of the challenges of generating these identifiers is avoiding creating duplicates while not having an increasing cost. You could record every identi
A significant feature of Chronicle Queue Enterprise is support for TCP replication across multiple servers to ensure the high availability of application infrastructure. I generally believe that replicating data to a secondary system is faster than syncing to disk, assuming the round trip network delay wasn’t high due to quality networks and co-located redundant servers. This is the first time I have benchmarked it with a realistic example. Little’s Law and Why Latency Matters In many cases, the assumption is that the latency won't be a problem as long as throughput is high enough. However, latency is often a key factor in why the throughput isn’t high enough. Little’s law states, “ the long-term average number L of customers in a stationary system is equal to the long-term average effective arrival rate λ multiplied by the average time W that a customer spends in the system”. In computer terminology, the level of concurrency or parallelism a system has to support must be
Comments
Post a Comment