JVM Performance Tuning for High Throughput and Low Latency
Optimize JVM performance by tuning heap size and garbage collection for low latency and high throughput in memory-intensive, multi-threaded applications.
Java Virtual Machine (JVM) tuning is the process of adjusting the default parameters to match our application's needs. This ranges from simple adjustments, such as the size of the heap, through choosing the right garbage collector, to enabling more specialized runtime optimizations.
Understanding the Java Virtual Machine (JVM)
What Is JVM?
The Java Virtual Machine (JVM) is a key component in the Java ecosystem that enables Java applications to be platform-independent. It interprets Java bytecode and executes it as machine code on various operating systems, making it possible to "write once, run anywhere."
Optimizing Garbage Collection
Garbage Collection
The Java application creates many objects to handle incoming requests. After these requests are serviced, the objects become 'garbage' and must be cleaned up. Garbage Collection (GC) is essential for freeing up memory but can slow response times and increase CPU usage. Therefore, tuning GC is important for optimizing performance.
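As a simple illustration of this lifecycle, the hypothetical handler below (my own sketch, not from the original article) allocates temporary objects for every request; once the method returns, those objects become unreachable and are eligible for collection.

import java.util.ArrayList;
import java.util.List;

public class GarbageDemo {

    // Hypothetical request handler: every call allocates temporary objects.
    static String handleRequest(int requestId) {
        List<String> buffer = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            buffer.add("request-" + requestId + "-line-" + i);
        }
        // Only the summary survives; the list and its strings become garbage
        // as soon as this method returns.
        return "request " + requestId + " produced " + buffer.size() + " lines";
    }

    public static void main(String[] args) {
        for (int id = 0; id < 10_000; id++) {
            handleRequest(id);
        }
        System.out.println("Done - the temporary objects are now unreachable and will be reclaimed by GC.");
    }
}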
ZGC Algorithm
The main feature of the Z Garbage Collector (ZGC) is its focus on minimizing GC pause times. It is designed to keep pauses to a few milliseconds, even with large heap sizes.
Choose ZGC for multi-threaded or memory-intensive applications with large heaps and for high-throughput, low-latency use cases. It minimizes pause times, scales with large memory sizes, and makes pause behavior more predictable.
JVM Arguments
To achieve high throughput and low latency in multi-threaded or memory-intensive services, you have to tune the following JVM arguments.
-Xss256k
The -Xss option in Java is used to set the thread stack size for each thread in the Java Virtual Machine (JVM). The -Xss256k option specifically sets the thread stack size to 256 kilobytes (KB). This value is useful when tuning Java applications, particularly in multi-threaded scenarios.
Pros: In highly concurrent applications, such as servers handling many simultaneous requests (e.g., web servers, message brokers, etc.), reducing the thread stack size can help prevent memory exhaustion by allowing more threads to be created without hitting memory limits.
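To see how the stack size bounds thread creation, a rough experiment like the sketch below (my own illustration, not from the article) can be run with different -Xss values; smaller stacks typically allow more threads before creation fails. Note that OS limits (ulimit, native memory) also play a role, so run this only on a test machine.

import java.util.concurrent.CountDownLatch;

public class ThreadStackDemo {

    public static void main(String[] args) {
        CountDownLatch done = new CountDownLatch(1);
        int created = 0;
        try {
            // Keep starting parked daemon threads until thread creation fails
            // (or an arbitrary safety cap of 200,000 threads is reached).
            while (created < 200_000) {
                Thread t = new Thread(() -> {
                    try {
                        done.await(); // park so the thread's stack stays allocated
                    } catch (InterruptedException ignored) {
                    }
                });
                t.setDaemon(true);
                t.start();
                created++;
            }
        } catch (OutOfMemoryError e) {
            // Typically "unable to create native thread" once memory or OS limits are hit.
            System.out.println("Thread creation failed: " + e.getMessage());
        } finally {
            done.countDown();
            System.out.println("Created " + created + " threads.");
        }
    }
}

Running this once with java -Xss256k ThreadStackDemo and once with java -Xss1m ThreadStackDemo shows the trade-off in practice.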
-Xms<Size>g
The -Xms JVM option in Java specifies the initial heap size for the JVM when an application starts.
Pros: A large-scale web application, a data processing system, or an in-memory database might benefit from a large initial heap allocation to accommodate its memory needs without constantly resizing the heap during startup.
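One way to confirm what the JVM actually picked up is to read the runtime memory figures at startup. In the sketch below (my own illustration), Runtime.totalMemory() roughly reflects the initially committed heap (-Xms) and Runtime.maxMemory() the configured maximum (-Xmx).

public class InitialHeapCheck {

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // With -Xms the heap is committed up front, so totalMemory() at startup
        // is roughly the configured initial size (e.g., ~1 GiB for -Xms1g).
        long committedMb = rt.totalMemory() / (1024 * 1024);
        long maxMb = rt.maxMemory() / (1024 * 1024);
        System.out.println("Committed heap at startup: " + committedMb + " MiB");
        System.out.println("Maximum heap:              " + maxMb + " MiB");
    }
}

For example: java -Xms1g -Xmx4g InitialHeapCheck.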
-Xmx<MaxSize>g
The -Xmx JVM option in Java sets the maximum heap size for the JVM. The maximum size could be up to 90% of the available RAM. A quick runtime check of how close the application runs to this limit is sketched after the list below.
Pros:
- Avoids OutOfMemoryError
- Optimizes GC performance
- Optimizes memory-intensive applications (in-memory databases, caching, etc.)
- Avoids frequent heap resizing
- Improves performance for multi-threaded applications
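Because running close to -Xmx puts pressure on the GC, a lightweight check like the sketch below (my own illustration; the 80% threshold is an arbitrary choice) can log a warning when used heap approaches the configured maximum.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapHeadroomCheck {

    // Arbitrary illustration threshold: warn when 80% of the maximum heap is in use.
    private static final double WARN_RATIO = 0.80;

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();

        long used = heap.getUsed();
        long max = heap.getMax(); // -1 if undefined, otherwise roughly -Xmx
        if (max > 0 && used > (long) (max * WARN_RATIO)) {
            System.err.printf("Warning: heap usage is %.0f%% of the maximum (%d of %d bytes)%n",
                    100.0 * used / max, used, max);
        } else {
            System.out.printf("Heap usage: %d of %d bytes%n", used, max);
        }
    }
}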
-XX:+UseZGC
The -XX:+UseZGC option in Java enables ZGC in the JVM. ZGC is a low-latency garbage collector designed to minimize pause times during garbage collection, even for applications running with very large heaps (up to terabytes of memory). A way to verify that ZGC is active at runtime is sketched after the list below.
Pros:
- Low-latency GC
- Scalability with large heaps
- Concurrent and incremental GC
- Suitable for containerized and cloud-native environments
- Low pause time, even during Full GC
- Better for multi-core systems
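A quick way to confirm that ZGC is actually in use is to list the registered garbage collector MXBeans. With -XX:+UseZGC the names typically include ZGC-specific entries (such as "ZGC Pauses" and "ZGC Cycles", though exact names vary by JDK version). This is my own sketch, not part of the original article.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {

    public static void main(String[] args) {
        // Prints the active collectors, e.g. run with: java -XX:+UseZGC GcCheck
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " - collections: " + gc.getCollectionCount()
                    + ", time: " + gc.getCollectionTime() + " ms");
        }
    }
}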
-XX:+ZGenerational
The -XX:+ZGenerational option is used in Java to enable a generational mode for ZGC. By default, ZGC operates as a non-generational garbage collector, meaning it treats the entire heap as a single, unified region when performing garbage collection.
Pros:
- Improves performance for applications with many short-lived objects
- Reduces GC pause times by collecting young generation separately
- Improves heap management
- Reduces the cost of full GCs
Using -XX:+ZGenerational enables generational garbage collection in ZGC, which improves performance for applications with a mix of short-lived and long-lived objects by segregating these into different regions of the heap. This can lead to better memory management, reduced pause times, and improved scalability, particularly for large-scale applications that deal with large amounts of data.
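The allocation pattern that generational ZGC targets looks roughly like the sketch below (my own illustration): long-lived data such as a cache survives many collections and ends up in the old generation, while per-request objects die young and can be reclaimed cheaply.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenerationalWorkload {

    // Long-lived objects: survive many GC cycles and settle in the old generation.
    private static final Map<Integer, String> CACHE = new HashMap<>();

    public static void main(String[] args) {
        for (int request = 0; request < 100_000; request++) {
            // Short-lived objects: allocated per request, dead almost immediately.
            List<String> scratch = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                scratch.add("tmp-" + request + "-" + i);
            }
            // A small fraction of the data is promoted into the long-lived cache.
            if (request % 1_000 == 0) {
                CACHE.put(request, String.join(",", scratch));
            }
        }
        System.out.println("Cache entries kept alive: " + CACHE.size());
    }
}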
-XX:SoftMaxHeapSize=<Size>g
The -XX:SoftMaxHeapSize JVM option is used to set a soft limit on the maximum heap size for a Java application. When you use -XX:SoftMaxHeapSize=4g, you're telling the JVM to aim for a heap size that does not exceed 4 gigabytes (GB) under normal conditions, but the JVM is allowed to exceed this limit if necessary.
Pros:
- Memory management with flexibility
- Handling memory surges without crashing
- Effective for containerized or cloud environments
- Performance tuning and resource optimization
- Preventing overuse of memory
-XX:+UseStringDeduplication
The -XX:+UseStringDeduplication option in Java enables string deduplication as part of the garbage collection process. String deduplication is a technique that allows the JVM to identify String objects with identical values and share a single copy of the underlying character data, effectively reducing memory usage even when the same value appears many times in the application.
Pros:
- Reduces memory usage for duplicate strings
- Optimizes memory in large applications with many strings
- Complements string interning
- Helps with large textual data handling
- Automatic deduplication with minimal configuration
Using the -XX:+UseStringDeduplication flag can be a great way to optimize memory usage in Java applications, especially those that deal with large numbers of repeated string values. By enabling deduplication, you allow the JVM to eliminate redundant copies of strings in the heap, which can lead to significant memory savings and improved performance in memory-intensive applications.
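Deduplication targets cases like the sketch below (my own illustration), where many distinct String instances carry the same character data; with -XX:+UseStringDeduplication the GC can make them share one underlying array. Deduplication statistics can be surfaced via GC logging, though the exact logging flags and tags vary by JDK version.

import java.util.ArrayList;
import java.util.List;

public class StringDedupDemo {

    public static void main(String[] args) throws InterruptedException {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            // Concatenation creates a new String instance each time, but most of
            // them hold exactly the same characters ("status=OK" or "status=ERROR") -
            // ideal candidates for GC-driven string deduplication.
            rows.add("status=" + (i % 2 == 0 ? "OK" : "ERROR"));
        }
        // Give the collector a chance to run deduplication in the background.
        Thread.sleep(5_000);
        System.out.println("Held " + rows.size() + " strings with only 2 distinct values.");
    }
}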
-XX:+ClassUnloadingWithConcurrentMark
The -XX:+ClassUnloadingWithConcurrentMark option in Java is used to enable the concurrent unloading of classes during the concurrent marking phase of garbage collection. This option is particularly useful when running applications that dynamically load and unload classes, such as application servers (e.g., Tomcat, Jetty), frameworks, or systems that rely on hot-swapping or dynamic class loading; a small class-loading sketch follows the list below.
Pros:
- Reduces GC pause times
- Improves memory management in long-lived applications
- Better scalability for servers and containers
- Hot-swapping and frameworks
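Class unloading only matters when classes are loaded through a class loader that can later become unreachable. The sketch below (my own illustration, with a hypothetical plugin.jar path and class name) loads a class through a throwaway URLClassLoader; once all references to the loader and its classes are dropped, the GC may unload them, and with -XX:+ClassUnloadingWithConcurrentMark that work happens during concurrent marking rather than in a pause.

import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

public class PluginLoaderDemo {

    public static void main(String[] args) throws Exception {
        // Hypothetical plugin jar and class name, purely for illustration.
        URL pluginJar = Path.of("plugins/plugin.jar").toUri().toURL();

        URLClassLoader loader = new URLClassLoader(new URL[] {pluginJar});
        Class<?> pluginClass = loader.loadClass("com.example.Plugin");
        Object plugin = pluginClass.getDeclaredConstructor().newInstance();
        System.out.println("Loaded " + plugin.getClass().getName());

        // Drop every reference to the loader and its classes; they are now
        // eligible for unloading during a future GC cycle.
        plugin = null;
        pluginClass = null;
        loader.close();
        loader = null;
    }
}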
-XX:+UseNUMA
The -XX:+UseNUMA JVM option is used to optimize the JVM for systems with a Non-Uniform Memory Access (NUMA) architecture. NUMA is a memory design used in multi-processor systems where each processor has its own local memory but can also access the memory of other processors, albeit with higher latency. The -XX:+UseNUMA option enables the JVM to optimize memory allocation and garbage collection on NUMA-based systems to improve performance.
Pros:
- Improves memory access latency and performance
- Better GC efficiency
- Optimizes memory allocation
- Better scalability on multi-socket systems
What Is NUMA?
In a NUMA system, processors are connected to local memory, and each processor can access its own local memory more quickly than the memory that is attached to other processors. In contrast to Uniform Memory Access (UMA) systems, where all processors have the same access time to all memory, NUMA systems have asymmetric memory access due to the varying latency to local versus remote memory.
NUMA architectures are commonly used in large-scale servers with multiple processors (or sockets), where performance can be improved by ensuring that memory is accessed locally as much as possible.
What Does -XX:+UseNUMA Do?
When you enable the -XX:+UseNUMA
option, the JVM is configured to optimize memory access by considering the NUMA topology of the system. Specifically, the JVM will:
- Allocate memory from the local NUMA node associated with the processor executing a given task (whenever possible).
- Keep thread-local memory close to the processor where the thread is running, reducing memory access latency.
- Improve garbage collection performance by optimizing how the JVM manages heap and other memory resources across multiple NUMA nodes.
-XX:ConcGCThreads=<size>
The -XX:ConcGCThreads option in Java allows you to control the number of concurrent GC threads that the JVM uses during the concurrent phases of garbage collection.
Pros:
- Controls the degree of parallelism in GC
- Minimizes garbage collection pause times
- Optimizes performance based on hardware resources
- Improves throughput for multi-threaded applications
What Does -XX:ConcGCThreads Do?
When the JVM performs garbage collection, certain collectors (e.g., G1 GC or ZGC) can execute phases of garbage collection concurrently, meaning they run in parallel with application threads to minimize pause times and improve throughput. The -XX:ConcGCThreads option allows you to specify how many threads the JVM should use during these concurrent GC phases.
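There is no dedicated API for reading this setting, but the JVM's input arguments can be inspected at runtime to confirm which flags a process was actually started with; the sketch below (my own illustration) simply prints them.

import java.lang.management.ManagementFactory;

public class JvmFlagsDump {

    public static void main(String[] args) {
        // Prints every flag the JVM was launched with,
        // e.g. -XX:ConcGCThreads=4, -XX:+UseZGC, -Xmx4g, ...
        ManagementFactory.getRuntimeMXBean()
                .getInputArguments()
                .forEach(System.out::println);
    }
}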
-XX:+ZUncommit
The -XX:+ZUncommit JVM option is used to control the behavior of memory management in ZGC, specifically how the JVM returns (uncommits) unused heap memory to the operating system after it has been allocated.
Pros:
- Reduces memory footprint
- Dynamic memory reclamation in low-memory environments
- Avoids memory fragmentation
- Optimizes GC overhead
-XX:+AlwaysPreTouch
The -XX:+AlwaysPreTouch JVM option is used to pre-touch the memory pages that the JVM will use for its heap, meaning that the JVM touches (i.e., accesses) each page of memory as soon as the heap is allocated rather than lazily touching pages when they are actually needed.
Pros:
- Reduces latency during application runtime
- Prevents OS page faults during initial execution
- Preloads virtual memory in large heap applications
- Improves memory allocation efficiency in multi-core systems
- Avoids memory swapping during startup
-XX:MaxGCPauseMillis=<size>
The -XX:MaxGCPauseMillis option in Java is used to set a target for the maximum acceptable pause time during GC. When you specify -XX:MaxGCPauseMillis=100, you are instructing the JVM's garbage collector to aim for a maximum GC pause time of 100 milliseconds; a way to sample actual pause behavior is sketched after the list below.
Pros:
- Minimize application latency
- Control and balance throughput vs. pause time
- Optimized for interactive or high-throughput systems
- Garbage collection tuning in G1 GC
- Improved user experience in web and server applications
- Use with ZGC and other low-latency garbage collectors
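Whether the collector is meeting the target can be sampled from the GC MXBeans. The sketch below (my own illustration) derives a rough average pause from the cumulative counters and compares it with a 100 ms target; note this is an average rather than a maximum, so it only gives a coarse signal.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseReport {

    private static final long TARGET_MILLIS = 100; // mirrors -XX:MaxGCPauseMillis=100

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            long totalMs = gc.getCollectionTime();
            double avgMs = count > 0 ? (double) totalMs / count : 0.0;
            System.out.printf("%s: %d collections, avg %.1f ms (target %d ms)%n",
                    gc.getName(), count, avgMs, TARGET_MILLIS);
        }
    }
}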
-XX:+UseLargePages
The -XX:+UseLargePages JVM option is used to enable the JVM to use large memory pages (also known as huge pages or superpages) for the Java heap and other parts of memory, such as the metaspace and JIT (Just-In-Time) compilation caches.
Pros:
- Improves memory access performance
- Reduces operating system overhead
- Better performance for memory-intensive applications
- Lower memory fragmentation
- Reduces memory paging activity
What Does -XX:+UseLargePages Do?
Operating systems typically manage memory in pages, which are the basic unit of memory allocation and management. The size of a memory page is usually 4 KB by default on many systems, but some systems support larger page sizes—commonly 2 MB (on x86-64 Linux and Windows systems) or even 1 GB for certain processors and configurations.
-XX:+UseTransparentHugePages
The -XX:+UseTransparentHugePages JVM option enables the use of Transparent Huge Pages (THP) for memory management in the Java Virtual Machine (JVM). Transparent Huge Pages are a Linux kernel feature designed to automatically manage large memory pages, improving performance for memory-intensive applications.
Bonus: How to Use JVM Arguments in a Dockerfile for Java and Spring Boot Services.
ENTRYPOINT [ \
"java", \
"-Xss256k", \
"-Xms1g", \
"-Xmx4g", \
"-XX:+UseZGC", \
"-XX:+UseStringDeduplication", \
"-XX:+ZGenerational", \
"-XX:SoftMaxHeapSize=4g", \
"-XX:+ClassUnloadingWithConcurrentMark", \
"-XX:+UseNUMA", \
"-XX:ConcGCThreads=4", \
"-XX:+ZUncommit", \
"-XX:+AlwaysPreTouch", \
"-XX:MaxGCPauseMillis=100", \
"-XX:+UseLargePages", \
"-XX:+UseTransparentHugePages", \
"org.springframework.boot.loader.launch.JarLauncher" \
]
Conclusion
Performance tuning in the JVM is essential for optimizing multi-threaded and memory-intensive applications, especially when aiming for high throughput and low latency. The process involves fine-tuning garbage collection, optimizing memory management, and adjusting concurrency settings.