Trick the JVM for Maximum Performance With Megamorphic Call Sites
Let's see how we can trick the JVM into getting maximum performance even with megamorphic call sites by using old-fashioned or strange-looking code.
Join the DZone community and get the full member experience.
Join For FreeThe JVM does great stuff to let our Java applications run the fastest way possible. Most of the time, you don't have to care about the internals, but if you really need to get the best performance, you need to take into account how JVM works. It's not the goal of this article to give a detailed introduction to this topic but merely to show how our design can impact performance and how you can work around some issues which arise.
An important aspect of performance optimization is inlining, i.e. the JIT replaces a method call with the code that will be executed by the method. For that to work, the method must be short, and it must be clear what the method is called. This is unfortunately not trivial as basically all methods can be overloaded, so the JVM does some run-time analysis and then makes some predictions. It basically distinguishes between monomorphic call sites, which always call the same method; bimorphic ones, which are optimized for two different targets; and call sites calling with more than two targets, which are called megamorphic, where fewer optimizations are applied. This analysis and the resulting optimization are done for each call site individually.
For our analysis, we use a simple example where we need to execute different calculations.
The first class, SimpleCalc, is implemented without overriding, so this should be the fastest implementation possible and will act as the baseline for the other ones.
class SimpleCalc {
int calc(int n) {
return n + 1;
}
}
All other classes implement the common interface Calc; the classes Calc1-4 are the different operations our application should support. You can find the full source in the gist.
If we run the benchmark using JMH, we get the following results:
Benchmark Mode Cnt Score Units
CallSitePerformance.RunJmh.test0Simple thrpt 5 65395895.716 ops/s
CallSitePerformance.RunJmh.test1Monomorphic thrpt 5 65828331.877 ops/s
CallSitePerformance.RunJmh.test2Bimorphic thrpt 5 56950468.449 ops/s
CallSitePerformance.RunJmh.test3Megamorphic3 thrpt 5 26138713.308 ops/s
CallSitePerformance.RunJmh.test4Megamorphic4 thrpt 5 27125605.963 ops/s
CallSitePerformance.RunJmh.test5MegamorphicEnum thrpt 5 59799107.787 ops/s
CallSitePerformance.RunJmh.test6MegamorphicInstance thrpt 5 59339463.418 ops/s
- Test0Simple: directly calls instance SimpleCalc.
- Test1Monomorphic: only calls instances of Calc1; the call site is therefore monomorphic and as fast as test0Simple.
- Test2Bimorphic: calls instances of both Calc1 and Calc2; the call site is, therefore, bimorphic and about 10% slower than test1Monomorphic.
- Test3Megamorphic3/test4Megamorphic4: call instances of 3 or 4 different classes and are therefore megamorphic — they both have the same performance, but you can see a drop of performance of about 60% compared to test1Monomorphic!
This observation rises the question of whether you can work around this limitation seen for megamorphic call sites — yes, you can; the examples show you two possibilities:
- Class EnumCalc uses an enumeration to handle the different calculation modes, basically the way you would implement such functionality before OOP.
public int calc(int n) {
switch (mode) {
case Calc1:
return n + 1;
case Calc2:
return n + 2;
...
- Class InstanceCalc takes a class implementing Calc as an argument and then manually provides different call sites for each class:
public int calc(int n) {
if (calc instanceof Calc1) {
return calc.calc(n);
} else if (calc instanceof Calc2) {
return calc.calc(n);
} else if ...
Where this code may look strange at first sight, it makes sense if you recall that the JVM does the optimization for each call site individually, so the bunch of if statements allow it to treat each call site as monomorphic, resulting in a good performance. If you look at the results, you can see that both test5MegamorphicEnum and test6MegamorphicInstance are even faster than the bimorphic case — mission accomplished!
Please keep in mind that this kind of optimization will only be applicable to a few high-performance classes. In all other cases, it still pays off to write clear and easy-to-understand OOP code.
Opinions expressed by DZone contributors are their own.
Comments