Java: ChronicleMap, Part 1: Go Off-Heap
Off-heap ChronicleMap can contain billions of objects with little or no heap impact.
Filling up a HashMap
with millions of objects will quickly lead to problems such as inefficient memory usage, low performance, and garbage collection issues. Learn how to use off-heap ChronicleMap
that can contain billions of objects with little or no heap impact.
The built-in Map
implementations, such as HashMap
and ConcurrentHashMap
, are excellent tools when we want to work with small to medium-sized data sets. However, as the amount of data grows, these Map
implementations deteriorate and start to exhibit a number of unpleasant drawbacks, as shown in this first article in a series about the open-source ChronicleMap
.
Heap Allocation
In the examples below, we will use Point
objects. Point
is a POJO with a public default constructor and getters and setters for X and Y properties (int). The following snippet adds a million Point
objects to a HashMap
:
final Map<Long, Point> m = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u, v) -> { throw new IllegalStateException(); },
            HashMap::new
        )
    );

// Convenience method that creates a Point from
// a long by applying modulo prime number operations
private static Point pointFrom(long seed) {
    final Point point = new Point();
    // Reduce first, then narrow, so the cast can never truncate the seed
    point.setX((int) (seed % 4517));
    point.setY((int) (seed % 5011));
    return point;
}
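The Point class itself is not listed in this article. A minimal sketch that matches the description above (a public default constructor plus int getters and setters for X and Y) could look as follows. The field names are assumptions, and the class is left package-private here only to keep the sketch in a single file:

```java
// Minimal sketch of the Point POJO described above (assumed, not the author's listing).
// In a real project it would normally be a public class in its own file.
class Point {

    private int x;
    private int y;

    // Public default constructor, as required by the description
    public Point() {}

    public int getX() { return x; }

    public void setX(int x) { this.x = x; }

    public int getY() { return y; }

    public void setY(int y) { this.y = y; }
}
```

On a typical 64-bit HotSpot JVM with compressed class pointers, such an object would usually take a 12-byte header plus two 4-byte ints, padded to 24 bytes.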
We can easily see the number of objects allocated on the heap and how much heap memory these objects consume:
Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34366 | head
num #instances #bytes class name (module)
-------------------------------------------------------
1: 1002429 32077728 java.util.HashMap$Node (java.base@10)
2: 1000128 24003072 java.lang.Long (java.base@10)
3: 1000000 24000000 com.speedment.chronicle.test.map.Point
4: 454 8434256 [Ljava.util.HashMap$Node; (java.base@10)
5: 3427 870104 [B (java.base@10)
6: 185 746312 [I (java.base@10)
7: 839 102696 java.lang.Class (java.base@10)
8: 1164 89088 [Ljava.lang.Object; (java.base@10)
For each Map
entry, a Long
, HashMap$Node
, and Point
object need to be created on the heap. There are also a number of arrays with HashMap$Node
objects created. In total, these objects and arrays consume 88,515,056 bytes of heap memory. Thus, each entry consumes on average 88.5 bytes.
NB: The extra 2429 HashMap$Node
objects come from other HashMap
objects used internally by Java.
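The total and the per-entry average can be reproduced from the four relevant histogram rows. The sketch below only restates that arithmetic; the class and constant names are made up for illustration:

```java
// Reproduces the heap footprint arithmetic from the jmap histogram above.
final class HeapFootprint {

    // Byte counts copied from the histogram rows
    static final long NODE_BYTES = 32_077_728L;       // java.util.HashMap$Node
    static final long LONG_BYTES = 24_003_072L;       // java.lang.Long
    static final long POINT_BYTES = 24_000_000L;      // Point
    static final long NODE_ARRAY_BYTES = 8_434_256L;  // HashMap$Node[]
    static final long ENTRIES = 1_000_000L;

    // Sum of the four rows that belong to the map and its content
    static long totalBytes() {
        return NODE_BYTES + LONG_BYTES + POINT_BYTES + NODE_ARRAY_BYTES;
    }

    // Average heap cost of one map entry
    static double bytesPerEntry() {
        return (double) totalBytes() / ENTRIES;
    }

    public static void main(String[] args) {
        System.out.println("Total bytes:     " + totalBytes());
        System.out.println("Bytes per entry: " + bytesPerEntry());
    }
}
```

Note that the rows include the roughly 2,400 HashMap$Node objects used internally by Java, so the true per-entry figure is marginally lower.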
Off-Heap Allocation
Contrary to this, a ChronicleMap
uses very little heap memory as can be observed when running the following code:
final Map<Long, Point> m2 = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u, v) -> { throw new IllegalStateException(); },
            () -> ChronicleMap
                .of(Long.class, Point.class)
                .averageValueSize(8)
                .valueMarshaller(PointSerializer.getInstance())
                .entries(1_000_000)
                .create()
        )
    );
Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34413 | head
num #instances #bytes class name (module)
-------------------------------------------------------
1: 6537 1017768 [B (java.base@10)
2: 448 563936 [I (java.base@10)
3: 1899 227480 java.lang.Class (java.base@10)
4: 6294 151056 java.lang.String (java.base@10)
5: 2456 145992 [Ljava.lang.Object; (java.base@10)
6: 3351 107232 java.util.concurrent.ConcurrentHashMap$Node (java.base@10)
7: 2537 81184 java.util.HashMap$Node (java.base@10)
8: 512 49360 [Ljava.util.HashMap$Node; (java.base@10)
As can be seen, there are no Java heap objects allocated for the ChronicleMap
entries and consequently no heap memory either.
Instead of allocating heap memory, ChronicleMap
allocates its memory off-heap. Provided that we start our JVM with the flag -XX:NativeMemoryTracking=summary
, we can retrieve the amount of off-heap memory being used by issuing the following command:
Pers-MacBook-Pro:chronicle-test pemi$ jcmd 34413 VM.native_memory | grep Internal
- Internal (reserved=30229KB, committed=30229KB)
Apparently, our one million objects were laid out in off-heap memory using a little more than 30 MB of off-heap RAM. This means that each entry in the ChronicleMap
used above needs on average 30 bytes.
This is much more memory-efficient than a HashMap
that required 88.5 bytes. In fact, we saved about 66 percent of RAM and almost 100 percent of heap memory. The latter is important because the Java garbage collector only manages objects that are on the heap.
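The comparison can be checked with a few lines of arithmetic. The sketch below assumes that the KB figure reported by jcmd means units of 1,024 bytes and that the whole "Internal" figure is attributable to the map; the class and method names are made up for illustration. With the unrounded figures, the off-heap cost comes out at just under 31 bytes per entry and the saving at about 65 percent, in line with the rounded numbers above:

```java
// Compares the measured heap and off-heap footprints per entry.
final class MemoryComparison {

    static final long ENTRIES = 1_000_000L;
    static final long HEAP_BYTES = 88_515_056L;  // HashMap total from the heap histogram
    static final long OFF_HEAP_KB = 30_229L;     // "Internal" committed figure from jcmd

    // Assumption: 1 KB in the jcmd output equals 1,024 bytes
    static double offHeapBytesPerEntry() {
        return OFF_HEAP_KB * 1024.0 / ENTRIES;
    }

    static double heapBytesPerEntry() {
        return (double) HEAP_BYTES / ENTRIES;
    }

    // Fraction of memory saved by going off-heap (0.0 to 1.0)
    static double savedFraction() {
        return 1.0 - offHeapBytesPerEntry() / heapBytesPerEntry();
    }

    public static void main(String[] args) {
        System.out.println("Heap bytes per entry:     " + heapBytesPerEntry());
        System.out.println("Off-heap bytes per entry: " + offHeapBytesPerEntry());
        System.out.println("Saved fraction:           " + savedFraction());
    }
}
```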
Note that we have to decide upon creation how many entries the ChronicleMap
can hold at maximum. This is different compared to HashMap
which can grow dynamically as we add new associations. We also have to provide a serializer (i.e. PointSerializer.getInstance()
), which will be discussed in detail later in this article.
Garbage Collection
Many Garbage Collection (GC) algorithms complete in a time that is proportional to the square of the number of objects that exist on the heap. So if we, for example, double the number of objects on the heap, we can expect the GC to take four times longer to complete.
If we, on the other hand, create 64 times more objects, we can expect to suffer an agonizing 4,096-fold increase in expected GC time. This effectively prevents us from ever being able to create really large HashMap
objects.
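Taking the quadratic model above at face value (it is a simplification, and the behavior of real collectors varies), the expected slowdown is simply the square of the growth in object count. A small sketch, with made-up names:

```java
// Relative GC time under the simplified assumption: time is proportional to objects squared.
final class GcScaling {

    // Returns how many times longer a collection takes when the number
    // of heap objects grows by the given factor.
    static long relativeGcTime(long objectGrowthFactor) {
        return Math.multiplyExact(objectGrowthFactor, objectGrowthFactor);
    }

    public static void main(String[] args) {
        for (long factor : new long[] {2, 8, 32, 64}) {
            System.out.println(factor + "x objects -> "
                + relativeGcTime(factor) + "x GC time");
        }
    }
}
```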
With ChronicleMap
, we can simply put new associations without any concern for garbage collection times.
Serializer
The mediator between heap and off-heap memory is often called a serializer. ChronicleMap
comes with a number of pre-configured serializers for most built-in Java types such as Integer
, Long
, String
and many more.
In the example above, we used a custom serializer that was used to convert a Point
back and forth between heap and off-heap memory. The serializer class looks like this:
public final class PointSerializer implements
    SizedReader<Point>,
    SizedWriter<Point> {

    private static final PointSerializer INSTANCE = new PointSerializer();

    public static PointSerializer getInstance() { return INSTANCE; }

    private PointSerializer() {}

    @Override
    public long size(@NotNull Point toWrite) {
        // Two ints (x and y), so always 8 bytes
        return Integer.BYTES * 2;
    }

    @Override
    public void write(Bytes out, long size, @NotNull Point point) {
        out.writeInt(point.getX());
        out.writeInt(point.getY());
    }

    @NotNull
    @Override
    public Point read(Bytes in, long size, @Nullable Point using) {
        // Reuse the supplied instance if there is one, otherwise create a new Point
        if (using == null) {
            using = new Point();
        }
        using.setX(in.readInt());
        using.setY(in.readInt());
        return using;
    }
}
The serializer above is implemented as a stateless singleton and the actual serialization in the methods write()
and read()
is fairly straightforward. The only tricky part is the null check in the read()
method: when the "using" variable does not reference an existing object that can be reused, a new Point must be created.
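The same write/read-with-reuse pattern can be tried out with nothing but the JDK. The sketch below is an analogy, not the ChronicleMap API: it uses a direct java.nio.ByteBuffer as a stand-in for off-heap memory, and all class and method names are made up for illustration:

```java
import java.nio.ByteBuffer;

// JDK-only illustration of serializing a Point to off-heap memory and back.
final class DirectBufferPointDemo {

    // Nested stand-in for the Point POJO used in this article
    static final class Point {
        private int x;
        private int y;
        int getX() { return x; }
        void setX(int x) { this.x = x; }
        int getY() { return y; }
        void setY(int y) { this.y = y; }
    }

    // A serialized Point is always two ints
    static final int POINT_BYTES = Integer.BYTES * 2;

    static void write(ByteBuffer out, Point point) {
        out.putInt(point.getX());
        out.putInt(point.getY());
    }

    // Fills and returns "using" if it is non-null, otherwise a new Point
    static Point read(ByteBuffer in, Point using) {
        if (using == null) {
            using = new Point();
        }
        using.setX(in.getInt());
        using.setY(in.getInt());
        return using;
    }

    public static void main(String[] args) {
        // allocateDirect reserves memory outside the Java heap
        ByteBuffer buffer = ByteBuffer.allocateDirect(POINT_BYTES);

        Point original = new Point();
        original.setX(3);
        original.setY(4);
        write(buffer, original);

        buffer.flip(); // switch the buffer from writing to reading
        Point reusable = new Point();
        Point result = read(buffer, reusable);

        System.out.println("Reused instance: " + (result == reusable));
        System.out.println("x=" + result.getX() + ", y=" + result.getY());
    }
}
```

Passing an existing instance as "using" avoids allocating a new heap object on every read, which is the point of the null check.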
How to Install It?
When we want to use ChronicleMap
in our project, we just add the following Maven dependency in our pom.xml file and we have access to the library.
<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>chronicle-map</artifactId>
    <version>3.17.3</version>
</dependency>
If you are using another build tool, for example, Gradle, you can depend on ChronicleMap
using the same group ID, artifact ID, and version.
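For Gradle, the Maven coordinates above translate directly. This fragment is a sketch that assumes a build.gradle file using the Groovy DSL and the "implementation" configuration:

```groovy
dependencies {
    // Same coordinates as the Maven dependency above
    implementation 'net.openhft:chronicle-map:3.17.3'
}
```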
The Short Story
Here are some properties of ChronicleMap
:
- Stores data off-heap
- Is almost always more memory efficient than a
HashMap
- Implements
ConcurrentMap
- Does not affect garbage collection times
- Sometimes needs a serializer
- Has a fixed maximum number of entries
- Can hold billions of associations
- Is free and open-source
Published at DZone with permission of Per-Åke Minborg, DZone MVB. See the original article here.