JVM and Garbage Collection Interview Questions: The Beginner’s Guide
Have an interview coming up? Let us help you prep with these JVM and garbage collection basics.
The Java Virtual Machine is the Achilles heel of most developers and can cause even the most seasoned developers to come unstuck. The simple fact is that unless something is going wrong, we don’t normally care about it. Maybe we tune it a little when the application goes live, but after that it remains untouched until something goes wrong. This makes it a very difficult subject to excel in during interviews. Even worse, interviewers love to ask questions about it. Everyone should have a basic knowledge of the JVM to be able to do their job, but often recruiters are looking for someone who knows how to fix a problem like a memory leak when it happens.
In this guide we take a ground-up approach to the JVM and garbage collection so you can feel some level of confidence going into your big day.
Java Question: What is the JVM? Why is it a good thing? What is “write once, run anywhere”? Are there negatives?
JVM stands for Java Virtual Machine. Java code is compiled down into an intermediate language called bytecode. The Java Virtual Machine is then responsible for executing this bytecode. This is unlike languages such as C++, which are compiled directly to native code for a specific platform.
This is what gives Java its ‘write once, run anywhere’ ability. In a language which compiles directly to a platform you would have to compile and test the application separately on every platform on which you wish it to run. There would likely be several issues with libraries, such as ensuring they are available on all of the platforms. Every new platform would require new compilation and new testing. This is time consuming and expensive.
On the other hand, a Java program can be run on any system where a Java Virtual Machine is available. The JVM acts as the intermediary layer and handles the OS-specific details, which means that as developers we shouldn’t need to worry about them. In reality there are still some kinks between different operating systems, but these are relatively small. This makes it quicker and easier to develop, and means developers can write software on Windows laptops that may be destined for other platforms. I only need to write my software once and it is available on a huge variety of platforms, from Android to Solaris.
In theory, this is at the cost of speed. The extra layer of the JVM means it is slower than direct-to-tin languages like C. However, Java has been making a lot of progress in recent years, and given the many other benefits such as ease of use, it is being used more and more often for low-latency applications.
The other benefit of the JVM is that any language that can compile down to bytecode can run on it, not just Java. Languages like Groovy, Scala and Clojure are all JVM-based languages. This also means the languages can easily use libraries written in other languages. As a Scala developer I can use Java libraries in my applications as it all runs on the same platform.
The separation from the real hardware also means the code is sandboxed, limiting the amount of damage it can do to a host computer. Security is a great benefit of the JVM.
There is another interesting facet to consider: not all JVMs are built equal. There are a number of different implementations beyond the standard JVM implementation from Oracle. JRockit is renowned for being an exceptionally quick JVM. OpenJDK is an open source equivalent. There are tons of JVM implementations available. Whilst this choice is ultimately a good thing, all of the JVMs may behave slightly differently. A number of areas of the Java specification are left intentionally vague with regards to their implementation, and each VM may do things differently. This can result in a bug which only manifests in a certain VM on a certain platform. These can be some of the hardest bugs to figure out.
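When chasing a bug that only shows up on one VM, the first step is knowing which VM you are actually on. A minimal sketch, using only the standard system properties (the class name JvmInfo is just for illustration):

```java
class JvmInfo {
    /** Returns a one-line description of the JVM and OS running this code. */
    static String describe() {
        return System.getProperty("java.vm.name")
                + " (" + System.getProperty("java.vm.vendor") + "), Java "
                + System.getProperty("java.version")
                + " on " + System.getProperty("os.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line at startup makes it much easier to compare two environments that behave differently.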
From a developer perspective, the JVM offers a number of benefits, specifically around memory management and performance optimisation.
Java Interview Question: What is JIT?
JIT stands for “just in time”. As discussed, the JVM executes bytecode. However, if it determines a section of code is being run frequently, it can optionally compile that section down to native code to increase the speed of execution. The smallest block that can be JIT compiled is a method. By default, a method needs to be executed 1,500 times before it is JIT compiled (this is the classic HotSpot client-compiler threshold; the server compiler defaults to 10,000), although this is configurable with -XX:CompileThreshold. This leads to the concept of “warming up” the JVM. It becomes more performant the longer it runs, as these optimisations occur. On the downside, JIT compilation is not free; it takes time and resources when it occurs.
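Warm-up is easy to see for yourself. The sketch below is a rough illustration rather than a proper benchmark (a harness such as JMH is the right tool for real measurements); the class and method names are made up for this example. It times repeated batches of calls to one small method, and on most JVMs the later rounds tend to be quicker than the first ones, although the exact numbers depend entirely on your VM and machine.

```java
class JitWarmup {
    /** A small hot method for the JIT compiler to optimise: 0^2 + 1^2 + ... + (n-1)^2. */
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    /** Times `rounds` batches of calls and returns the elapsed nanoseconds of each batch. */
    static long[] timeRounds(int rounds, int callsPerRound) {
        long[] nanos = new long[rounds];
        long sink = 0; // accumulate results so the work cannot be optimised away
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerRound; c++) {
                sink += sumOfSquares(1_000);
            }
            nanos[r] = System.nanoTime() - start;
        }
        if (sink == 42) System.out.println(); // never true; keeps `sink` in use
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 5_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] / 1_000 + " us");
        }
    }
}
```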
Java Garbage Collection Interview Questions
Java Interview Question: What do we mean when we say memory is managed in Java? What is the garbage collector?
In languages like C the developer has direct access to memory. The code literally references memory addresses. This can be difficult and dangerous, and can result in damaging memory leaks. In Java, on the other hand, all memory is managed. As programmers we deal exclusively in objects and primitives and have no concept of what is happening underneath with regards to memory and pointers. Most importantly, Java has the concept of a garbage collector. When objects are no longer needed, the JVM will automatically identify them and reclaim the memory for us.
Java Interview Question: What are the benefits and negatives of the garbage collector?
On the positive side:
- The developer can worry much less about memory management and concentrate on actual problem solving. Although memory leaks are still technically possible, they are much less common.
- The GC has a lot of smart algorithms for memory management which work automatically in the background. Contrary to popular belief, these can often be better at determining when best to free memory than a developer managing it manually.
On the negative side:
- When a garbage collection occurs it has an effect on the application performance, notably slowing it down or stopping it. In so-called “stop the world” garbage collections the rest of the application will freeze whilst this occurs. This can be unacceptable depending on the application requirements, although GC tuning can minimise or even remove the impact.
- Although it’s possible to do a lot of tuning with the garbage collector, you cannot specify exactly when or how the application performs GC.
Java Interview Question: What is “stop the world”?
When a GC happens it is often necessary to completely pause all of the threads in an application whilst the collection (or part of it) occurs. This is known as stop the world. For most applications long pauses are not acceptable. As a result it is important to tune the garbage collector so that the impact of collections is acceptable for the application.
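Before tuning anything it helps to know how much time is going on collections at all. A minimal sketch using the standard java.lang.management API (the class name GcStats is hypothetical). Note that the reported figure is the approximate accumulated collection time, which for concurrent collectors is not the same thing as pause time:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    /** Approximate milliseconds spent collecting so far, summed over all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total: " + totalCollectionMillis() + " ms");
    }
}
```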
Java Interview Question: How does generational GC work? Why do we use generational GC? How is the Java heap structured?
It is important to understand how the Java heap works to be able to answer questions about GC. All objects are stored on the heap (as opposed to the stack, which holds the frames of method calls along with their local primitives and references to objects on the heap). Garbage collection is the process of removing objects which are no longer needed from the heap and returning the space for general consumption. Almost all GCs are “generational”, where the heap is divided into a number of sections, or generations. This has proven significantly more efficient, which is why almost all collectors use this pattern.
New Generation
Most applications have a high volume of short-lived objects. Analyzing all objects in an application during a GC would be slow and time consuming, so it makes sense to separate the short-lived objects so that they can be quickly collected. As a result all new objects are placed into the new generation. New gen is split up further:
- Eden space: all new objects are placed in here. When it becomes full, a minor GC occurs. All objects that are still referenced are then moved to a survivor space.
- Survivor spaces: the implementation of survivor spaces varies based on the JVM but the premise is the same. Each GC of the new generation increments the age of objects in the survivor space. When an object has survived a sufficient number of minor GCs (defaults vary but normally start at 15) it will then be promoted to the old generation. Some implementations use two survivor spaces, a from space and a to space. During each collection these swap roles, with all surviving Eden objects and surviving from-space objects moved to the to space, leaving from empty.
A GC in the new gen is known as a minor GC.
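The ageing and promotion rules can be easier to remember once you have seen them as code. What follows is a toy model only: it is not how any real JVM implements its new generation, and every name in it (ToyNewGen, Obj, the threshold constant of 15) is an assumption made for illustration. Reachability is faked with a boolean flag.

```java
import java.util.ArrayList;
import java.util.List;

class ToyNewGen {
    static final int TENURING_THRESHOLD = 15; // assumed value, for illustration only

    static class Obj {
        int age;                  // number of minor GCs survived
        boolean reachable = true; // stand-in for "still referenced by the application"
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> survivor = new ArrayList<>(); // the current "from" space
    final List<Obj> oldGen = new ArrayList<>();

    /** New objects always start life in Eden. */
    Obj allocate() {
        Obj o = new Obj();
        eden.add(o);
        return o;
    }

    /** One minor GC: copy live objects to the "to" space, promoting the old ones. */
    void minorGc() {
        List<Obj> to = new ArrayList<>(); // the empty "to" space
        for (Obj o : survivor) {
            if (!o.reachable) continue;   // garbage is simply not copied
            o.age++;
            if (o.age >= TENURING_THRESHOLD) oldGen.add(o);
            else to.add(o);
        }
        for (Obj o : eden) {
            if (o.reachable) {
                o.age = 1;                // survived its first minor GC
                to.add(o);
            }
        }
        eden.clear();
        survivor = to; // "to" becomes the new "from"; the old "from" is now empty
    }
}
```

In this model an object that stays reachable sits in the survivor space for 14 collections and is promoted on the 15th, while an unreachable object disappears at the first minor GC after it dies.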
One of the benefits of using a new generation is the reduction of the impact of fragmentation. When an object is garbage collected, it leaves a gap in the memory where it was. We can compact the remaining objects (a stop-the-world scenario) or we can leave them and slot new objects in. By having a generational GC we limit how often this happens in the old generation, as it is generally more stable, which is good for improving latencies by reducing stop-the-world pauses. However, if we do not compact we may find objects cannot fit in the gaps in between, perhaps because they are too large. If this is the case then you will see objects failing to be promoted from the new generation.
Old Generation
Any objects that survive from survivor spaces in the new generation are promoted to the old generation. The old generation is usually much larger than the new generation. When a GC occurs in old gen it is known as a full GC. Full GCs are also stop-the-world and tend to take longer, which is why most JVM tuning occurs here. There are a number of different algorithms available for garbage collection, and it is possible to use different algorithms for new and old gen.
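You can ask a running JVM how its own heap is divided up. A minimal sketch using the standard java.lang.management API (the class name HeapPools is hypothetical). The pool names are VM- and collector-specific, so expect names along the lines of Eden, Survivor and Old Gen on a generational collector, but do not rely on exact strings:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class HeapPools {
    /** Names of the memory pools that make up the heap on this JVM. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long usedKb = usage == null ? 0 : usage.getUsed() / 1024;
            System.out.println(pool.getType() + " " + pool.getName() + ": " + usedKb + " KB used");
        }
    }
}
```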
Serial GC
Designed when computers only had one CPU, it stops the entire application whilst GC occurs. It uses mark-sweep-compact. This means it goes through all of the objects and marks which ones are still reachable, clears out the unmarked ones, and then copies the surviving objects into contiguous space (so it has no fragmentation).
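To make mark-sweep-compact concrete, here is a toy version over an array of slots. It is a sketch of the idea only, under my own assumptions (a heap modelled as an Obj[] with null for free space, roots passed in as a list), and not how any JVM actually implements it:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ToyMarkSweepCompact {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // objects this one references
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Collects `heap` in place (null means a free slot) and returns the live count. */
    static int collect(Obj[] heap, List<Obj> roots) {
        // Mark: flag everything reachable from the roots.
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;   // already visited (handles cycles)
            o.marked = true;
            pending.addAll(o.refs);
        }
        // Sweep and compact: slide marked objects to the front, drop the rest.
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            Obj o = heap[i];
            if (o != null && o.marked) {
                o.marked = false;     // reset for the next collection
                heap[next++] = o;
            }
        }
        // Everything after the live objects is now one contiguous free region.
        for (int i = next; i < heap.length; i++) {
            heap[i] = null;
        }
        return next;
    }
}
```

The whole thing runs while nothing else touches the heap, which is exactly why a real serial collection has to stop the world.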
Parallel GC
Similar to serial, except that it uses multiple threads to perform the GC, so it should be faster.
Concurrent Mark Sweep
CMS GC minimises pauses by doing most of the GC-related work concurrently with the processing of the application. This minimises the amount of time when the application has to completely pause, and so lends itself much better to applications which are sensitive to this. CMS is a non-compacting algorithm, which can lead to fragmentation problems. The CMS collector actually uses a parallel collector for the young generation.
G1GC (Garbage First Garbage Collector)
A concurrent, parallel collector that is viewed as the long-term replacement for CMS and does not suffer from the same fragmentation problems as CMS.
PermGen
The PermGen is where the JVM stores the metadata about classes. It no longer exists in Java 8, having been replaced with Metaspace. Generally the PermGen doesn’t require any tuning beyond ensuring it has enough space, although it is possible to have leaks if classes are not being unloaded properly.
Java Interview Question: Which is better? Serial, Parallel or CMS?
It depends entirely on the application. Each one is tailored to a different set of requirements. Serial is better if you’re on a single CPU, or in a scenario where there are more VMs running on the machine than CPUs. Parallel is a throughput collector and really good if you have a lot of work to do but you’re OK with pauses. CMS is the best of the three if you need consistent responsiveness with minimal pauses.
From the Code
Java Interview Question: Can you tell the system to perform a garbage collection?
This is an interesting question. The answer is both yes and no. We can use the call “System.gc()” to suggest to the JVM that it perform a garbage collection. However, there is no guarantee this will do anything. As Java developers, we don’t know for certain what JVM our code is being run in. The JVM spec makes no guarantees on what will happen when this method is called. There is even a startup flag, -XX:+DisableExplicitGC, which will stop this from doing anything.
It is considered bad practice to use System.gc().
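If you want to see the “suggestion” nature of System.gc() for yourself, a WeakReference makes a handy probe. A minimal sketch (the class name GcHint is made up for this example). The only thing guaranteed is that an object you still hold a strong reference to will not be cleared; whether the second check reports true depends on your JVM and flags, which is rather the point:

```java
import java.lang.ref.WeakReference;

class GcHint {
    /** Requests a collection and reports whether the weakly referenced object was cleared. */
    static boolean clearedAfterGcRequest(WeakReference<?> ref) {
        System.gc(); // only a request; the JVM is free to ignore it
        return ref.get() == null;
    }

    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);

        boolean clearedWhileHeld = clearedAfterGcRequest(ref);
        // `strong` is used after the call, so the object stays strongly reachable.
        System.out.println("held by " + strong + ", cleared? " + clearedWhileHeld);

        strong = null; // now only weakly reachable, so it *may* be collected
        System.out.println("no strong reference, cleared? " + clearedAfterGcRequest(ref));
    }
}
```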
Java Interview Question: What does finalize() do?
finalize() is a method on java.lang.Object, so it exists on all objects. The default implementation does nothing. It is called by the garbage collector when it determines there are no more references to the object. As a result there are no guarantees the code will ever be executed, so it should not be used to execute actual functionality. Instead it is used for clean up, such as releasing file references. It will never be called more than once on an object (by the JVM).
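Because finalize() may run late or never (and has been deprecated since Java 9), interviewers often follow up by asking what you would use instead. One common answer, which is my addition here rather than something covered above, is AutoCloseable with try-with-resources, where the clean up runs deterministically when the block exits. A minimal sketch with a hypothetical TrackedResource class:

```java
class TrackedResource implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    /** Releases the resource; try-with-resources calls this when the block exits. */
    @Override
    public void close() {
        open = false;
    }

    /** Uses a resource in a try-with-resources block and reports its state afterwards. */
    static boolean isOpenAfterBlock() {
        TrackedResource seen = new TrackedResource();
        try (TrackedResource resource = seen) {
            // ... work with `resource` here ...
        }
        return seen.isOpen(); // close() has already run by this point
    }

    public static void main(String[] args) {
        System.out.println("open after block? " + isOpenAfterBlock());
    }
}
```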
Tuning
Java Interview Question: What flags can I use to tune the JVM and GC?
There are textbooks available on tuning the JVM for optimal garbage collection. Nonetheless it’s good to know a few flags for the purpose of an interview.
-XX:+UseConcMarkSweepGC: use the CMS collector for the old gen.
-XX:+UseParallelGC: use parallel GC for new gen.
-XX:+UseParallelOldGC: use parallel GC for old and new gen.
-XX:+HeapDumpOnOutOfMemoryError: create a heap dump when the application runs out of memory. Very useful for diagnostics.
-XX:+PrintGCDetails: log out details of garbage collection.
-Xms512m: sets the initial heap size to 512 MB.
-Xmx1024m: sets the maximum heap size to 1024 MB.
-XX:NewSize and -XX:MaxNewSize: set the initial and maximum size of the new generation.
-XX:NewRatio=3: set the ratio of the old generation to the young generation (here 3:1, so the young generation gets a quarter of the heap).
-XX:SurvivorRatio=10: set the size of Eden space relative to the size of a survivor space (here Eden is 10 times the size of each survivor space).
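After setting flags it is worth checking that the JVM actually picked them up. A minimal sketch using standard APIs (the class name HeapSettings is hypothetical). Bear in mind that the maximum reported by the Runtime may come out a little lower than the -Xmx value, depending on the collector:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class HeapSettings {
    private static final long MEGABYTE = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / MEGABYTE;
    }

    public static void main(String[] args) {
        // The flags the JVM was started with, e.g. -Xms512m or -XX:+UseParallelGC.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("max heap: " + maxHeapMegabytes() + " MB");
        System.out.println("current heap: " + Runtime.getRuntime().totalMemory() / MEGABYTE + " MB");
    }
}
```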
Diagnosis
Whilst all of the questions above are very good to know to show you have a basic understanding of how the JVM works, one of the most standard questions during an interview is this: “Have you ever experienced a memory leak? How did you diagnose it?”. This is a difficult question to answer for most people because, although they may have done it, chances are it was a long time ago and isn’t something they’ve done recently. The best way to prepare is to actually try and write an application with a memory leak and attempt to diagnose it. Below I have created a ridiculous example of a memory leak which will allow us to go step by step through the process of identifying the problem. I strongly advise you to download the code and follow through this process. It is much more likely to be committed to your memory if you actually do this process.
// Main.java
public class Main {
    public static void main(String[] args) {
        TaskList taskList = new TaskList();
        final TaskCreator taskCreator = new TaskCreator(taskList);
        new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    taskCreator.createTask();
                }
            }
        }).start();
    }
}

// TaskList.java (Task and TaskCreator are not shown in this listing)
import java.util.ArrayDeque;
import java.util.Deque;

public class TaskList {
    private static Deque<Task> tasks = new ArrayDeque<Task>();

    public void addTask(Task task) {
        tasks.add(task);
        tasks.peek().execute(); // memory leak! peek() leaves the task in the deque
    }
}
In the above very contrived example, the application executes tasks put onto a deque. However, when we run this we get an OutOfMemoryError! What could it possibly be?
To find out we need to use a profiler. A profiler allows us to look at exactly what is going on in the VM. There are a number of options available. VisualVM (https://visualvm.java.net/download.html) is free and allows basic profiling. For a more complete tool suite there are a number of options, but my personal favourite is YourKit. It has an amazing array of tools to help you with diagnosis and analysis. However, the principles used are generally the same.
I started running my application locally, then fired up VisualVM and selected the process. You can then watch exactly what’s going on in the heap, PermGen etc.
You can see on the heap (top right) the telltale signs of a memory leak. The application sawtooths, which is not a problem per se, but the memory is consistently going up and not returning to a base level. This smells like a memory leak. But how can we tell what’s going on? If we head over to the Sampler tab we can get a clear indication of what is sitting on our heap.
Those object arrays look a bit odd. But how do we know if that’s the problem? VisualVM allows us to take snapshots, like a photograph of the memory at that time. The above screenshot is a snapshot from after the application had only been running for a little bit. The next snapshot a couple of minutes later confirms this:
We can actually compare these directly by selecting both in the menu and selecting Compare.
There’s definitely something funky going on with the array of objects. How can we figure out the leak though? By using the Profile tab. If I go to Profile, and in settings enable “Record allocations stack traces”, we can then find out where the leak has come from.
By now taking a snapshot and showing allocation traces we can see where the object arrays are being instantiated.
Looks like there are thousands of Task objects holding references to object arrays! But what is holding onto these Task items?
If we go back to the “Monitor” tab we can create a heap dump. If we double click on the Object[] in the heap dump it will show us all instances in the application, and in the bottom right panel we can identify where the reference is.
It looks like TaskList is the culprit! If we take a look at the code we can see what the problem is.
tasks.peek().execute();
We’re never clearing the reference after we’ve finished with it! If we change this to use poll() then the memory leak is fixed.
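The difference between the two calls is easy to demonstrate in isolation. In this sketch a plain Runnable stands in for the Task class (an assumption for illustration, as is the class name PeekVersusPoll). With peek() every task stays in the deque, so it remains referenced and can never be collected; with poll() the deque is empty once each task has run.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PeekVersusPoll {
    /** Mirrors the leaky pattern: look at the head, but leave it in the deque. */
    static int sizeAfterPeek(int taskCount) {
        Deque<Runnable> tasks = new ArrayDeque<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(() -> { });
            tasks.peek().run(); // head stays in the deque, so it is still referenced
        }
        return tasks.size();
    }

    /** The fix: poll() removes the head, so the deque no longer references it. */
    static int sizeAfterPoll(int taskCount) {
        Deque<Runnable> tasks = new ArrayDeque<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(() -> { });
            tasks.poll().run(); // head is removed and becomes eligible for GC
        }
        return tasks.size();
    }

    public static void main(String[] args) {
        System.out.println("with peek(): " + sizeAfterPeek(1_000) + " tasks retained");
        System.out.println("with poll(): " + sizeAfterPoll(1_000) + " tasks retained");
    }
}
```

Note that the peek() version has a second bug hiding in it: it only ever executes the very first task, over and over.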
Whilst clearly this is a very contrived example, going through the steps will refresh your memory in case you are asked to explain how you would identify memory leaks in an application. Look for memory continuing to increase despite GCs happening, take memory snapshots and compare them to see which objects may be candidates for not being released, and use a heap dump to analyze what is holding references to them.
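The first of those steps, watching whether used memory keeps climbing, does not strictly need a profiler. A minimal sketch using the standard MemoryMXBean (the class name HeapWatcher and the one-second interval are arbitrary choices for illustration) that you could run on a background thread or adapt into your logging:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

class HeapWatcher {
    /** Bytes of heap currently in use, as reported by the JVM. */
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        // A leak shows up as a baseline that keeps rising across samples,
        // even though individual readings go up and down with each GC.
        for (int i = 0; i < 10; i++) {
            System.out.println("used heap: " + usedHeapBytes() / (1024 * 1024) + " MB");
            Thread.sleep(1_000);
        }
    }
}
```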
Published at DZone with permission of Sam Atkinson, DZone MVB. See the original article here.