Java and Branch Prediction
Get your microprocessing performance hat on because we're going to talk about branch prediction in Java and how it affects your code.
Join the DZone community and get the full member experience.
Join For FreeA branch prediction unit is a device that is part of microprocessors that have a pipeline architecture that predicts a conditional jump in an executable program. Branch prediction allows you to reduce the pipeline downtime by preloading and executing instructions that must be executed after the conditional branch instruction is executed. Forecasting branching plays a critical role (the accuracy of the prediction of transitions in modern processors exceeds 90%) because it allows you to optimally use the computing resources of the processor.
Let's consider this situation: There are two arrays with the same length. The first array is sorted and the other array is unsorted. Processing of one of them is faster than the other. As you could guess from the topic, processing the sorted array is faster than the unsorted array. In fact, branch prediction for the unsorted array will fail.
To understand branch prediction, one must first understand instruction pipelining: A processor with an implementation of branch prediction that usually makes correct predictions can minimize the performance penalty from branching. However, if branches are predicted poorly, it may create more work for the processor, such as flushing the incorrect code path from the pipeline that has begun execution before resuming execution at the correct location.
In case the of the sorted array, the branch predictor always takes the true branch, since historically, all its predictions are correct. But with the random unsorted array, the prediction will need to discard the partially executed instructions and start over with the correct branch.
It can be shown using JMH:
package samples.branch_prediction;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(5)
@State(Scope.Benchmark)
public class BranchPredictionBenchmark {
private static final int COUNT = 1024 * 1024;
private byte[] sorted;
private byte[] unsorted;
@Setup
public void setup() {
sorted = new byte[COUNT];
unsorted = new byte[COUNT];
Random random = new Random(1234);
random.nextBytes(sorted);
random.nextBytes(unsorted);
Arrays.sort(sorted);
}
@Benchmark
@OperationsPerInvocation(COUNT)
public void sorted(Blackhole bh1, Blackhole bh2) {
for (byte v : sorted) {
if (v > 0) {
bh1.consume(v);
} else {
bh2.consume(v);
}
}
}
@Benchmark
@OperationsPerInvocation(COUNT)
public void unsorted(Blackhole bh1, Blackhole bh2) {
for (byte v : unsorted) {
if (v > 0) {
bh1.consume(v);
} else {
bh2.consume(v);
}
}
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(".*" + BranchPredictionBenchmark.class.getSimpleName() + ".*")
.build();
new Runner(opt).run();
}
}
The output of benchmark is next:
# Run complete. Total time: 00:01:48
Benchmark Mode Cnt Score Error Units
BranchPredictionBenchmark.sorted avgt 25 4.289 ± 0.754 ns/op
BranchPredictionBenchmark.unsorted avgt 25 9.742 ± 0.466 ns/op
For this benchmark, we used AverageTime BenchmarkMode (see 14 line of the code snapshot). It means that the lower score value is better, and as you see, BranchPredictionBenchmark.sorted is more than two times faster. The full listing of source code is on my GitHub repository.
Opinions expressed by DZone contributors are their own.
Comments