Weak, Soft, and Phantom References in Java (and Why They Matter)
This breakdown of weak, soft, and phantom references explains how they impact GC and memory allocation as well as some ideal use cases.
Allocation problem at Sunset restaurant in Crete
Almost every Java programmer knows there are Soft and Weak references, but usually, they are not fully understood. Phantom ones are even less well-known.
I think this is a bit of a shame because they are not complex to understand (compared to Locks, for example), and they can be really useful to know if you have memory problems with your application.
So, I have prepared a tiny GitHub project to show how to use them. It is also quite interesting to see how the different garbage collectors treat them, but that will be the topic for my next post.
Before analyzing the code, let’s consider why we need memory references at all.
The Problem
One common problem of all computer languages that allow for dynamic memory allocation is finding a way to “collect” the memory after it is no longer used.
It is a bit like in a restaurant. In the beginning, you can accommodate customers with empty tables, but when you don’t have empty tables anymore, you need to check if some of the already-allocated tables have become free in the meantime.
Some languages, like C, leave the responsibility to users: You have the memory and now it is your responsibility to free it. It’s a bit like in fast food, where you are (in theory) supposed to clean up your table after the meal.
This is very efficient… if everybody behaves correctly. But if some customers forget to clean up, it will easily become a problem. Same with memory: It’s very easy to forget to free an area of memory.
So Garbage Collectors (GC from here on) come to help. Some languages, such as Java, use a special algorithm to collect all the memory that is no longer used. That is very nice of them and very convenient for programmers. You may be forgiven if you think that GC is a relatively recent technique.
Garbage collection was invented by John McCarthy around 1959 to simplify manual memory management in Lisp.
Modern GCs are very sophisticated programs, and they use several combined techniques to quickly identify memory that can be reused. For the moment, let’s assume Java GC works flawlessly and that it will free all objects that are not reachable anymore.
That introduces a new problem: What if we want to keep a reference to an object, but we don’t want to prevent GC from freeing it if there is no other reference? It’s a bit like sitting a while at a table at a restaurant after having finished, but also being ready to leave if a new customer needs the table.
The Solution
You may wonder why you would need such a thing. Actually, there are a few use cases. Let’s introduce our protagonists from the Java documentation:
SoftReference: Soft reference objects are cleared at the discretion of the garbage collector in response to memory demand. Soft references are most often used to implement memory-sensitive caches. All soft references to softly reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError.
WeakReference: Weak reference objects do not prevent their referents from being made finalizable, finalized, and then reclaimed. Weak references are most often used to implement canonicalizing mappings. (Here, canonicalizing mappings means mapping only reachable object instances.)
PhantomReference: Phantom reference objects are enqueued after the collector determines that their referents may otherwise be reclaimed. Phantom references are most often used for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism. Unlike soft and weak references, phantom references are not automatically cleared by the garbage collector as they are enqueued. An object that is reachable via phantom references will remain so until all such references are cleared or themselves become unreachable.
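So in brief: Soft references try to keep the referenced object alive for as long as memory allows. Weak references don’t try to keep it alive at all. Phantom references never hand the object back and, in the Java 8 behavior quoted above, don’t let its memory be reclaimed until they are cleared. (Since Java 9, phantom references are cleared automatically when they are enqueued.)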
To reuse (and stretch) our restaurant metaphor one last time: A SoftReference is like a customer that says, "I’ll leave my table only when there are no other tables available." A WeakReference is like someone ready to leave as soon as a new customer arrives. A PhantomReference is like someone ready to leave as soon as a new customer arrives, but actually not leaving until the manager gives them permission.
Now let’s go back to the code.
The small program runs from the command line and does a few simple things:
- Allocate 500,000 1KB blocks in a linked list.
- Reference them using one of the 3 reference types.
- De-reference half of them.
- Remove unused references.
- Repeat all of the above 100 times and exit.
To make things difficult for the GC, it removes alternate elements from the linked list so that the list is always composed of elements of a different “age”. This is important for how GC works, as we will see in the next post.
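The allocation and de-referencing steps can be sketched roughly as follows. This is only an illustration, not the code from the GitHub project: the Node class is a hypothetical stand-in for HeavyList, the helper names are invented, and WeakReference is picked arbitrarily as the reference type.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;

final class AllocationSketch {
    /** Hypothetical stand-in for the article's HeavyList: a list node carrying 1 KB. */
    static final class Node {
        final byte[] payload = new byte[1024];
        Node next;
    }

    /** Builds a linked list of count nodes and registers one weak reference per node. */
    static Node allocate(int count, Set<Reference<Node>> refs, ReferenceQueue<Node> queue) {
        Node head = null;
        for (int i = 0; i < count; i++) {
            Node node = new Node();
            node.next = head;
            head = node;
            refs.add(new WeakReference<>(node, queue));
        }
        return head;
    }

    /** Unlinks every second node, so that survivors and garbage are interleaved. */
    static void dropAlternate(Node head) {
        for (Node node = head; node != null && node.next != null; node = node.next) {
            node.next = node.next.next;
        }
    }

    /** Counts the nodes still linked into the list. */
    static int length(Node head) {
        int n = 0;
        for (Node node = head; node != null; node = node.next) {
            n++;
        }
        return n;
    }
}
```

After dropAlternate, the unlinked nodes are reachable only through their references, so when (and whether) they show up in the queue is entirely up to the collector.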
Reference<HeavyList> softRef = new SoftReference<>(curr, queue);
Reference<HeavyList> weakRef = new WeakReference<>(curr, queue);
Reference<HeavyList> phantomRef = new PhantomReference<>(curr, queue);
As you can see, it is very easy to create a reference to our object (curr in this case). All reference types take the referenced object in the constructor and, optionally, a queue as a parameter.
We can try to reach the referenced object with the method get(). In the case of Weak and Soft, get() returns the actual object as long as the reference has not been cleared, which is always the case while other objects still hold strong references to it. Once the object has been collected, get() returns null.
This opens a possible problem if someone manages to “resurrect” the object using the reference get() during the finalization. For this reason, Phantom always returns null in the get() regardless of whether the object is still active. In this way, we can pass a PhantomReference to another object without risking that it will store a new, hard reference to it.
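A small sketch of the difference in get() between the three types, while the referent is still strongly held. The class and method names are made up for the example; only the java.lang.ref calls are real API.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class GetSemantics {
    /**
     * Reports, for soft, weak and phantom (in that order), whether get()
     * hands back the referent while a strong reference to it still exists.
     */
    static boolean[] referentVisible() {
        Object referent = new Object(); // the strong reference keeps the object alive
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Reference<Object> soft = new SoftReference<>(referent, queue);
        Reference<Object> weak = new WeakReference<>(referent, queue);
        Reference<Object> phantom = new PhantomReference<>(referent, queue);
        return new boolean[] {
            soft.get() == referent,   // true: soft references return the live object
            weak.get() == referent,   // true: weak references return the live object
            phantom.get() == referent // false: phantom get() always returns null
        };
    }
}
```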
The other parameter in the constructor is the ReferenceQueue. To understand why it is important, we have to consider how we know when the referenced object is finalized.
For Soft and Weak references, we can check the get() method, but it would be very time consuming if we have a big list of references. Moreover, for Phantom references, we cannot use it at all.
For this reason, if we pass a queue in the constructor of the reference, we will get a notification when the referenced object expires. In my simple example, I poll the queue after the deallocation:
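// Drains the queue and drops every enqueued reference from the tracking set.
private static int removeRefs(ReferenceQueue<HeavyList> queue, Set<Reference<HeavyList>> references) {
    int removed = 0;
    Reference<? extends HeavyList> r;
    // poll() never blocks; it returns null as soon as the queue is empty
    while ((r = queue.poll()) != null) {
        references.remove(r);
        removed++;
    }
    return removed;
}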
If queue.poll() returns null, then the queue is empty. A less naive approach is to create a separate thread and call queue.remove(), which will block until there is something to remove.
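The separate-thread approach could look something like the sketch below. ReferenceReaper and its methods are hypothetical names invented for this example, and it tracks WeakReferences only to keep things short. It is not part of the GitHub project.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ReferenceReaper<T> {
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();
    // The set keeps the Reference objects themselves reachable until they are enqueued.
    private final Set<Reference<? extends T>> tracked = ConcurrentHashMap.newKeySet();
    private final AtomicInteger removed = new AtomicInteger();

    private ReferenceReaper() { }

    /** Creates a reaper and starts its daemon thread. */
    static <T> ReferenceReaper<T> start() {
        ReferenceReaper<T> reaper = new ReferenceReaper<>();
        Thread thread = new Thread(reaper::drainForever, "reference-reaper");
        thread.setDaemon(true); // do not keep the JVM alive just for cleanup
        thread.start();
        return reaper;
    }

    /** Registers a weak reference to the given object with the reaper's queue. */
    Reference<T> track(T referent) {
        Reference<T> ref = new WeakReference<>(referent, queue);
        tracked.add(ref);
        return ref;
    }

    int trackedCount() { return tracked.size(); }

    int removedCount() { return removed.get(); }

    private void drainForever() {
        try {
            while (true) {
                Reference<? extends T> ref = queue.remove(); // blocks until a reference arrives
                tracked.remove(ref);
                removed.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop quietly when interrupted
        }
    }
}
```

The trade-off compared with poll() is that cleanup happens as soon as the collector enqueues something, at the cost of one extra thread and thread-safe bookkeeping.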
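Just remember the ordering: Weak and Soft references are cleared and put in the queue when the collector decides the object can go, before it is finalized, while Phantom references are put in the queue only after the object has been finalized, just before its memory is reclaimed. On Java 8 and earlier, an object held by a Phantom reference is not reclaimed until that reference is cleared or becomes unreachable itself, so if for any reason you never poll the queue and drop those references, you can run into an OutOfMemoryError.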
Possible Uses
Well, so now that we understand memory references better, what can we use them for?
The Java documentation already suggests some uses for the references.
SoftReferences can be used to implement a cache that can grow without risking an application crash. To do this, you need to implement the Map interface with the values stored wrapped inside a SoftReference. SoftReferences will keep the objects alive as long as there is memory available on the heap, but the collector will discard them before throwing an OutOfMemoryError.
If you are interested, there is an example in Guava to study. You need to keep in mind that filling almost all your memory can slow down your program so much that a cache hardly matters. It’s easy to verify this just by running the program and uncommenting the line that creates the SoftReference.
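To give an idea of the shape of such a cache, here is a minimal sketch. It does not implement the full Map interface, it is not how Guava does it, and SoftCache and SoftValue are names invented for the example.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

final class SoftCache<K, V> {
    /** A soft reference that remembers its key, so a cleared entry can be found again. */
    private static final class SoftValue<K, V> extends SoftReference<V> {
        final K key;

        SoftValue(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, SoftValue<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    synchronized void put(K key, V value) {
        expungeStale();
        map.put(key, new SoftValue<>(key, value, queue));
    }

    /** Returns the cached value, or null if it was never cached or has been collected. */
    synchronized V get(K key) {
        expungeStale();
        SoftValue<K, V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    synchronized int size() {
        expungeStale();
        return map.size();
    }

    /** Removes the map entries whose values the collector has already cleared. */
    private void expungeStale() {
        Object ref;
        while ((ref = queue.poll()) != null) {
            @SuppressWarnings("unchecked") // only SoftValue instances are registered with this queue
            SoftValue<K, V> stale = (SoftValue<K, V>) ref;
            map.remove(stale.key, stale); // leave the entry alone if the key was re-mapped since
        }
    }
}
```

A caller has to treat a null from get() as a cache miss and recompute the value, because the collector may clear any entry at any time.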
WeakReferences can be used, for example, to store some information related to an object until the object gets finalized. To do this, you can implement a Map in which the keys are wrapped in a WeakReference. As soon as GC reclaims the key object, you can remove the value as well.
Of course, it can also be done using some notification mechanism, but using GC will be more robust and efficient. As an example, you can look at java.util.WeakHashMap, but it is not thread-safe.
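As a rough sketch of this use, the class below attaches a note to arbitrary objects through a WeakHashMap wrapped with Collections.synchronizedMap to work around the thread-safety issue. ObjectNotes and its methods are hypothetical names. Keep in mind that WeakHashMap compares keys with equals() rather than by identity, and that a value must not strongly reference its own key, or the entry will never be dropped.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

final class ObjectNotes {
    // Keys are held weakly: once a key object is collected, its entry disappears.
    private static final Map<Object, String> NOTES =
            Collections.synchronizedMap(new WeakHashMap<>());

    private ObjectNotes() { }

    /** Attaches a note to the target without keeping the target alive. */
    static void annotate(Object target, String note) {
        NOTES.put(target, note);
    }

    /** Returns the note attached to the target, or null if there is none. */
    static String noteFor(Object target) {
        return NOTES.get(target);
    }
}
```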
PhantomReferences can be used to notify you when some object is no longer reachable, so that you can do some resource cleanup. Remember that the object.finalize() method is not guaranteed to be called at the end of the life of an object, so if you need to close files or free resources, you can rely on Phantom. Since Phantom doesn't have a link to the actual object, a typical pattern is to derive your own Reference type from Phantom and add some info useful for the final freeing, for example, a filename.
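The derived-reference pattern might be sketched like this. PhantomCleanup, ResourceRef and releaseExpired are hypothetical names, and the actual cleanup (deleting or closing the file) is left out: the sketch only collects the filenames that would need it.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class PhantomCleanup {
    /** Phantom reference carrying the data needed for cleanup, here just a file name. */
    static final class ResourceRef extends PhantomReference<Object> {
        final String fileName;

        ResourceRef(Object owner, String fileName, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.fileName = fileName;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The references must stay reachable themselves, otherwise they are never enqueued.
    private final Set<ResourceRef> pending = ConcurrentHashMap.newKeySet();

    /** Associates a file name with the owner object without keeping the owner alive. */
    ResourceRef register(Object owner, String fileName) {
        ResourceRef ref = new ResourceRef(owner, fileName, queue);
        pending.add(ref);
        return ref;
    }

    /** Polls the queue and returns the file names whose owners are gone. */
    List<String> releaseExpired() {
        List<String> released = new ArrayList<>();
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            ResourceRef expired = (ResourceRef) ref; // only ResourceRef uses this queue
            pending.remove(expired);
            expired.clear(); // needed on Java 8 and earlier so the referent can be reclaimed
            released.add(expired.fileName); // real code would close or delete the file here
        }
        return released;
    }
}
```

On Java 9 and later, java.lang.ref.Cleaner packages up much the same idea, which may be preferable to hand-rolling the queue handling.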
@simonebordet suggested another use for Phantom (or Weak) references: To verify memory leaks. You can look at the Jetty LeakDetector class as an example.
Playing around with this small program, I also verified that WeakReferences are noticeably faster than SoftReferences. In a project I am working on, I added a WeakReference to some critical resources for each request, and then I added monitor info to verify they are actually freed in a reasonable time after the request expires.
I learned a lot writing this, and I hope this post can be useful to other people as well. The next blog post will continue the analysis of memory allocation and performance, comparing Java CMS and G1 Garbage Collectors.
Published at DZone with permission of Uberto Barbini, DZone MVB. See the original article here.