Degradation in String Deduplication Performance in Recent Java Versions

String deduplication is an important feature aimed at optimizing memory usage by eliminating duplicate strings from heap memory.

Ram Lakshmanan

CORE ·

May. 07, 24 · Tutorial

Likes (3)

Comment

Save

2.3K Views

String deduplication is an important feature aimed at optimizing memory usage by eliminating duplicate strings from heap memory. However, recent observations suggest a concerning trend – a degradation in string deduplication performance across newer Java versions. Thus, we embarked on a comparative analysis to assess the string deduplication performance behavior in Java versions 11, 17, and 21. This post intends to share our observations and insights gleaned from this analysis.

WebCrawler Spring Boot Application

In order to experiment with string deduplication, we used an open-source Web Crawler application developed in Spring Boot. The web crawler is a REST-based application, which will crawl any given website and archive that site’s information into the H2 Database. When an HTTP POST request is made, WebCrawler will start the crawling job on the site and return a jobId. This jobId can be used later to query the status of the crawling task.

To crawl the Wikipedia website, you need to pass the seed with depth. The depth will decide how deep it should crawl the websites and extract the information. For example, this URL will crawl the Wikipedia site with a depth of 100 in the background.

Once you make a POST request, it will return the jobId as a response and the crawling will start in the background. Once the crawling task reaches the depth level, automatically will stop that particular crawling task.

We are going to load test this web crawler with a POST request in Java 11, 17, and 21 and study the String deduplication performance.

Enabling String Deduplication

To study the behavior of String Deduplication, it must first be enabled in the JVM. This can be achieved by passing the following JVM argument:

-XX:+UseStringDeduplication

String Deduplication events performance characteristics (like how many strings examined, how many of them were deduplicated, how long it took to complete, …) can be printed into the log file by passing the following JVM arguments to the application:

-Xlog:stringdedup*=debug:file=string-dup-logfile.log  -Xloggc:string-dup-logfile.log

Based on the above JVM arguments, we are directing both string deduplication debug events and the GC events log messages into one single ‘string-dup-logfile.log’ file

For detailed information about the JVM arguments, refer to this article for String deduplication.

JMeter Load Test

We conducted load testing on the WebCrawler application using JMeter, simulating a load of 50 users submitting crawling tasks with identical seed URLs and depths for approximately 1 hour. We are submitting the same URL so that a lot of duplicate strings will be simulated in the application.

Note this test was conducted on my local laptop, whose configuration is:

Operating System: Windows 11
System type: 64-bit operating system, x64-based processor
RAM: 12 GB
Processor: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz 1.19 GHz
Java Heap Size (i.e. -Xm512mb)

The image below shows the JMeter configuration in the above case:

*Fig: POST method configuration in JMeter*

Garbage Collection Analysis Study

After completing the tests, we uploaded the generated Garbage Collection log file to the online GCeasy tool for analysis. The tool promptly generated reports showcasing String Deduplication performance metrics for each Java version tested. Here are the reports generated by the tool:

Comparison Study of String Deduplication Key Metrics Across Java Versions

The table below summarizes String deduplication key metrics across the Java versions:

	Java 11	Java 17	Java 21
Inspected Strings	1,540,837	997,693	650,245
Deduplicated Strings	1,342,865	505,376	299,261
Deduplicated Size	102.4 mb	39.57 mb	15.81 mb
Deduplicated Percentage	34.3 %	9.3 %	3.4 %
Deduplication time	1,264.442 ms	2,323.492 ms	3,439.466 ms

The analytics reveal that Java 11 exhibits the best String Deduplication performance, eliminating 34.3% of strings. Conversely, Java 17 and 21 only managed to eliminate 9.3% and 3.4%, respectively. Moreover, the time taken to deduplicate strings increased in modern Java versions, with Java 11 completing the process in only 1,264.442 ms, compared to 2,323.492 ms for Java 17 and 3,439.466 ms for Java 21.

In essence, modern JVMs are spending more time inspecting a higher number of strings while eliminating fewer duplicates from memory. This underscores a clear degradation in JVM performance.

Conclusion

The investigation into string deduplication performance across Java versions 11, 17, and 21 yields significant insights into the evolution of JVM behavior. While string deduplication serves as a vital mechanism for optimizing memory utilization, our analysis reveals a concerning trend of degradation in performance across newer Java releases. Java 11 emerges as the standout performer, efficiently eliminating a substantial portion of duplicate strings within a shorter time frame. In contrast, Java 17 and 21 exhibit diminished effectiveness, both in terms of the percentage of strings deduplicated and the time taken to execute the process.

Java virtual machine garbage collection Java (programming language) Strings Performance

Published at DZone with permission of Ram Lakshmanan, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending