Going Beyond Java 8: Compact Strings
Compact strings are one of the most compelling reasons to move forward from Java 8. Here's why.
Originally published December 15, 2020
Introduction
According to some surveys such as that of JetBrains, version 8 of Java is currently the most used by developers all over the world, despite being a 2014 release.
What you are reading is the first in a series of articles titled “Going beyond Java 8”, inspired by the contents of my book “Java for Aliens”. These articles will guide the reader step by step to explore the most important features introduced starting from version 9. The aim is to make the reader aware of how important it is to move forward from Java 8, explaining the enormous advantages that the latest versions of the language offer.
In this article, we will talk about compact strings, a mechanism introduced with Java 9, which represents one of the most valid reasons to abandon Java 8 and upgrade to one of the most recent versions.
Spoiler Alert
The String class is statistically the most used class in Java programming, so it is worth asking how efficient its objects are. The good news is that, starting from Java 9, these objects perform significantly better than in previous versions. Moreover, this advantage comes practically without effort: it is enough to launch our program with a JVM of version 9 or higher, without changing anything in our code. So, let's understand what compact strings are and how to use them.
Behind the Scenes
Figure 1 – Location of the src.zip file inside the JDK version 8 installation folder.
Up to Java 8, the String class used an array of char internally to store the characters that made up the string. We can verify this by reading the source code of the String class: simply look for the String.java file in the src.zip file located in the installation folder of JDK version 8.
This file contains all the source files of the standard Java library. After unzipping it, we can find String.java under the java/lang path (the String class belongs to the java.lang package). If we open this file with any editor, we can verify that the String class is declared as follows (we have removed some comments and other elements not useful for our discussion):
public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
    // omitted the rest of the code
}
Up to Java 8, therefore, the existence of the value character array implied that 16 bits (2 bytes) of memory were allocated for each character of a string.
In most applications, however, we mainly use characters that can be stored in only 8 bits (1 byte). So, to improve the speed and memory usage of our programs, in Java 9 the implementation of the String class was revised to be backed by a byte array instead of a char array. The following is the initial part of the declaration of the String class in version 15 of Java, stripped of uninteresting elements:
public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final byte[] value;
    /**
     * The identifier of the encoding used to encode the bytes in
     * {@code value}.
     */
    private final byte coder;
    // omitted the rest of the code
}
From JDK 9, the src.zip file has been moved to the lib directory, and the packages are placed in folders that represent the modules. So, the String.java source is now under the java.base/java/lang folder, since java.base is the name of the module that contains the java.lang package.
However, it is always possible to use less common characters that need to be stored in 16 bits (2 bytes). For this reason, the String class implements a mechanism based on the coder variable, which records whether the bytes in value use one byte per character (Latin-1) or two bytes per character (UTF-16). The choice is made for the string as a whole: if even one character does not fit in a single byte, the entire string is stored with two bytes per character. This mechanism is known as compact strings, and since version 9 of Java it is used by default by the JVM. Nothing changes programmatically: we will use strings as we have always used them. However, Java applications will perform better.
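The idea behind the coder can be sketched in a few lines. This is a simplified illustration and not the JDK's actual code: the class `CoderSketch`, its constants, and its methods are hypothetical names, and the byte order used for the two-byte case is an arbitrary choice made for the example.

```java
/** Simplified illustration of a per-string coder; not the real java.lang.String code. */
public class CoderSketch {
    static final byte LATIN1 = 0; // hypothetical constant: one byte per char
    static final byte UTF16 = 1;  // hypothetical constant: two bytes per char

    /** Returns LATIN1 only if every char of the string fits in 8 bits. */
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    /** Packs the chars into a byte array according to the chosen coder. */
    static byte[] encode(String s) {
        int n = s.length();
        if (chooseCoder(s) == LATIN1) {
            byte[] value = new byte[n];
            for (int i = 0; i < n; i++) {
                value[i] = (byte) s.charAt(i);
            }
            return value;
        }
        byte[] value = new byte[2 * n];
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            value[2 * i] = (byte) (c >> 8); // high byte first (assumption of this sketch)
            value[2 * i + 1] = (byte) c;    // low byte
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode("hello").length);      // 5 bytes: all chars fit in Latin-1
        System.out.println(encode("hell\u03A9").length); // 10 bytes: one Greek letter forces two bytes per char
    }
}
```

Note that a single character outside Latin-1 doubles the storage of the whole string in this sketch, which mirrors the per-string nature of the coder field.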
Are We Really Going to Use Half the Memory for Strings?
Although we have seen that the String class is now backed by a byte array instead of a char array as in version 8, with Java it is unfortunately not possible to determine a priori how much memory a program will use. Memory is managed automatically by the complex mechanisms of the Garbage Collector, and at each execution our program could use very different amounts of memory. Furthermore, Java offers no simple way to know precisely how much memory a certain object is using at any given time, as is possible with other languages.
With a strategy based on the Instrumentation interface of the java.lang.instrument package, it is possible to obtain an approximation of the size of an object, but this is of little help with strings: the value reported for a String typically covers only the String object itself and not its backing array, and string literals are shared through the string pool. So, even if the compact strings mechanism seems to imply a memory saving, for a whole program this is neither certain nor easy to demonstrate. So, let's see what advantage a JDK version 9 or higher brings, with a code example.
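For reference, the Instrumentation-based strategy mentioned above could look roughly like the following sketch. The class name `SizeAgent` and the helper `sizeOf` are hypothetical; the sketch assumes the class is packaged in a jar whose manifest declares it as `Premain-Class` and that the JVM is started with the `-javaagent` option pointing at that jar.

```java
import java.lang.instrument.Instrumentation;

/** Minimal agent sketch; SizeAgent and sizeOf are hypothetical names. */
public class SizeAgent {
    private static volatile Instrumentation instrumentation;

    /** Called by the JVM before main() when the agent jar is passed via -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    /** Returns the implementation-specific size approximation for the given object. */
    static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("JVM was not started with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```

The value returned by `getObjectSize` is only an approximation defined by the JVM implementation, which is why it is not a reliable way to compare the memory used by strings across Java versions.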
Example
Let's consider the following example:
public class CompactStringsDemo {
    public static void main(String[] args) {
        long initialTime = System.currentTimeMillis();
        long limit = 100_000;
        String s = "";
        for (int i = 0; i < limit; i++) {
            s += limit;
        }
        long totalTime = System.currentTimeMillis() - initialTime;
        System.out.println("Created " + limit + " strings in " + totalTime +
                " milliseconds");
    }
}
In this class, the value of limit is appended to the string s 100,000 times, so each iteration instantiates a new, longer string. The program then measures and prints the milliseconds needed to create and concatenate these strings.
Let's try to launch this application 5 times using the JDK version 15.1, and analyze the outputs:
java CompactStringsDemo
Created 100000 strings in 3539 milliseconds
java CompactStringsDemo
Created 100000 strings in 3548 milliseconds
java CompactStringsDemo
Created 100000 strings in 3564 milliseconds
java CompactStringsDemo
Created 100000 strings in 3561 milliseconds
java CompactStringsDemo
Created 100000 strings in 3609 milliseconds
We can observe that the execution time is almost constant across launches, at around 3.5 seconds.
So let's try to disable compact strings using the -XX:-CompactStrings option, run the same application 5 times, and then analyze the results:
java -XX:-CompactStrings CompactStringsDemo
Created 100000 strings in 8731 milliseconds
java -XX:-CompactStrings CompactStringsDemo
Created 100000 strings in 8263 milliseconds
java -XX:-CompactStrings CompactStringsDemo
Created 100000 strings in 8547 milliseconds
java -XX:-CompactStrings CompactStringsDemo
Created 100000 strings in 8602 milliseconds
java -XX:-CompactStrings CompactStringsDemo
Created 100000 strings in 8353 milliseconds
Again, the execution time is almost constant, but much worse than when we used compact strings. In fact, the average execution time of this application without compact strings turns out to be about 8.5 seconds, while with compact strings the average was only about 3.5 seconds: a significant advantage that saved us almost 60% of the time.
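When comparing runs like these, it can be useful to confirm which setting a given JVM is actually using. The following is a minimal sketch that assumes a HotSpot-based JVM of version 9 or higher; the class name `CompactStringsFlag` and its helper method are hypothetical, and on other JVMs the sketch simply reports "unknown".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Reports the CompactStrings VM flag; HotSpot-specific sketch with hypothetical names. */
public class CompactStringsFlag {

    /** Returns "true", "false", or "unknown" if the flag cannot be read. */
    static String compactStringsSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("CompactStrings").getValue();
        } catch (RuntimeException | LinkageError e) {
            // Non-HotSpot JVM, a JVM older than version 9, or a restricted runtime.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("CompactStrings = " + compactStringsSetting());
    }
}
```

Running it once with the default settings and once with -XX:-CompactStrings should show the flag changing, which helps rule out a mistyped option when the timings are compared.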
If we recompile and relaunch the program with the latest build of Java 8 (JDK 1.8.0_261), the difference is even more evident:
"C:\Program Files\Java\jdk1.8.0_261\bin\java" CompactStringsDemo
Created 100000 strings in 31113 milliseconds
"C:\Program Files\Java\jdk1.8.0_261\bin\java" CompactStringsDemo
Created 100000 strings in 30376 milliseconds
"C:\Program Files\Java\jdk1.8.0_261\bin\java" CompactStringsDemo
Created 100000 strings in 32868 milliseconds
"C:\Program Files\Java\jdk1.8.0_261\bin\java" CompactStringsDemo
Created 100000 strings in 32508 milliseconds
"C:\Program Files\Java\jdk1.8.0_261\bin\java" CompactStringsDemo
Created 100000 strings in 35328 milliseconds
The deterioration in performance this time is even more evident: the runs average about 32 seconds, so with JDK 15 and compact strings the application was roughly 9 times faster! Of course, this does not mean that all programs will see such great improvements, because our example was based exclusively on the allocation and concatenation of strings.
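The averages and ratios quoted above can be rechecked with a few lines of arithmetic. The sketch below only recomputes them from the fifteen timings printed in this article; the class and method names (`TimingSummary`, `mean`, `savingPercent`) are hypothetical helpers, not part of any library.

```java
import java.util.Arrays;

/** Recomputes the averages from the timings listed in the article; names are hypothetical. */
public class TimingSummary {

    /** Arithmetic mean of the given timings in milliseconds. */
    static double mean(long... millis) {
        return Arrays.stream(millis).average().orElse(Double.NaN);
    }

    /** Percentage of time saved when going from the slow average to the fast one. */
    static double savingPercent(double fast, double slow) {
        return 100.0 * (1.0 - fast / slow);
    }

    public static void main(String[] args) {
        double compact = mean(3539, 3548, 3564, 3561, 3609);    // JDK 15, default settings
        double nonCompact = mean(8731, 8263, 8547, 8602, 8353); // JDK 15, -XX:-CompactStrings
        double java8 = mean(31113, 30376, 32868, 32508, 35328); // JDK 1.8.0_261

        System.out.println("Average with compact strings (ms):    " + compact);
        System.out.println("Average without compact strings (ms): " + nonCompact);
        System.out.println("Average on Java 8 (ms):               " + java8);
        System.out.println("Saving from compact strings (%):      " + savingPercent(compact, nonCompact));
        System.out.println("Java 8 time / JDK 15 time:            " + java8 / compact);
    }
}
```

With the published numbers, the averages come out at about 3564, 8499, and 32439 milliseconds, which corresponds to a saving of about 58% from compact strings on JDK 15 and a factor of about 9.1 between Java 8 and JDK 15.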
Regarding the saving in memory usage, although probable, as we have said, it cannot be proved, since the Garbage Collector performs a complex job that depends on the current situation.
Conclusions
In this article, we have seen the first valid reason to move forward from Java 8. The compact strings introduced in version 9 allow our programs to be more efficient when strings are used. Since the String class is statistically the most used class in Java programs, we can conclude that just using a JDK with a version greater than 8 should give our applications a faster execution speed. We also found that JDK 15, even without compact strings, still delivers significantly higher performance than the latest build of JDK 8.
Updating the JDK seems like the first step.
Author Notes
Even ignoring the increased security offered by the latest versions of the JDK, there are plenty of reasons to upgrade your knowledge of Java, or at least your own Java runtime installations. My book "Java for Aliens", which inspired the "Going beyond Java 8" series, contains all the information you need to learn Java from scratch, and uses a well-tested teaching method, perfected over 20 years of experience, that makes learning simple and exciting. It is also structured to explore topics in depth and build the kind of advanced knowledge that can make a difference in your career.
Published at DZone with permission of Claudio De Sio Cesari. See the original article here.