Performance Evaluation of Java ArrayLists

Let's do a performance breakdown of the ArrayList add operation to see how to get the most throughout out of your ArrayLists.

Viraj Salaka

Malith Jayasinghe

Isuru Perera

Srinath Perera

Oct. 18, 17 · Analysis

Likes (12)

Comment

Save

28.0K Views

In this article, we will look at the performance characteristics of the Java ArrayList (java.util.ArrayList) add operation.

An ArrayList has an initial capacity, which is simply the size of the array used to store the elements in the list. When you create an ArrayList you can specify the initial capacity. For example:

ArrayList<Integer> arrayList = new ArrayList<>(100);

In this case, the initial size of the ArrayList will be 100. As you add elements to an ArrayList, its capacity grows automatically. The growing factor is 1.5. If we do not specify an initial capacity then an ArrayList object will be created containing an initial array of size ten.

ArrayList<Integer> arrayList = new ArrayList<>();

Performance Evaluation

In this section, we will provide details of how we conducted our experiments. The performance tests are done using JMH (Java Microbenchmark Hardness). It is a tool that can be used to implement benchmarks correctly for the applications run top of the JVM. JVM's HotSpot VM has the ability to do optimizations like dead code elimination.

The two main performance metrics considered are

Average latency (measured in nanoseconds)
Throughput (measured in number of operations per millisecond)

The scenario implemented is sequentially adding elements to an ArrayList using the add operation. We run the benchmark by varying the following parameters:

Initial capacity (size) of the ArrayList
Final size of the ArrayList (the maximum number of elements used)

The part of the benchmark code is shown below:

…………….

@Benchmark

public void test_ArrListAdd(Blackhole bh) {

        ArrayList < Integer > m;

        m = new ArrayList < > (initialCapacity);

        for (int j = 0; j < finalSize; j++) {

            m.add(j, 0);

        }

        bh.consume(m);

        count += 1;

    }

    ………….

The following are the parameters specified for JMH, which are fixed for all runs.

Number of warmup iterations: 10
Numbers of iterations: 10
Number of forks: 2
Number of threads: 1
Time unit ns (nanoseconds)
Mode: benchmark mode

The performance test was run on a 4-core machine with 8 GB RAM and JDK used was Oracle 1.8_131

Performance Results

This section presents the performance results. These are the results obtained:

Final size	Initial Size	Average Latency (ns)	Throughput (operations/ms)

9	10	79.53	13045.111
	100	92.945	10471.726
	1000	345.138	2957.518
	10000	2066.279	495.057

99	10	675.109	1506.031
	100	440.578	2244.508
	1000	669.46	1378.31
	10000	2542.431	426.257

999	10	5797.814	171.068
	100	5583.3	170.748
	1000	4145.378	232.636
	10000	5995.691	171.921

9999	10	54751.808	18.212
	100	53713.615	17.492
	1000	50556.323	19.876
	10000	41174.069	24.14

Let's now make a few plots so that we can understand the behavior. The following graph shows how the average latency varies with the initial array size when the final size is 99.

Chart

The following graph shows how the throughput varies with the initial array size when the final size is 99.

The following graph shows how the average latency varies with the initial array size when the final size is 999.

The following graph shows how the throughput varies with the initial array size when the final size is 999.

The following graph shows how the average latency varies with the initial array size when the final size is 9999.

The following graph shows how the throughput varies with the initial array size when the used capacity is 9999.

Discussion

Initial array size > Final size

If the initial array size is greater than the final size, there can be a significant degradation in performance.

For example, when we create an array with an initial capacity of 1000 and use only the first 10 elements, we observed an average latency of 345 ns. On the other hand, when we create an array with an initial capacity of 10 and used the first 10 elements, then we get an average latency of 57 ns.

The main reason for the degradation in performance for the larger initial array size case is the cost of initializing the large array itself. When we profile applications, we noticed array initialization taking a significant amount of processing time.

Initial Array Size < Final Size

If the initial size is smaller than the final size, then there can also be a significant degradation in performance.

For example, when we create an array with an initial capacity of 10 and use the first 1000 elements, then we observed an average latency of 5797 ns. On the other hand, when we create an array with an initial capacity of 1000 and use 1000 elements, then we get an average latency of 4145 ns.

The main reason behind the degradation in performance for the smaller initial array size case is the additional processing needed for copying existing elements into a new array with a large size together. When we profile benchmark code, we noticed array copy operation taking a significant amount of processing time. In addition to this, there may be degradation due to the creation of arrays multiple times and initializing them.

Over Specifying vs. Under Specifying Initial Array Size

We note that the degradation in performance, as a result of over specifying the initial size, is less compared to that of under specifying. Let's now discuss this behavior in a bit more detail. Let's assume that the final size of the array list is 1000. The following figure plots the variation in the throughput with increasing ArrayList size.

We note that the maximum throughput is achieved when the initial array list size is 1000 (i.e. when initial size = final size). Let us now assume that TPS_1 and TPS_2 represent the throughput if the initial array size is 500 and 1500 respectively (refer to the figure above). We note that TPS_2 is significantly higher than TPS_1, which validates our claim regarding the performance differences in over specifying vs. under specifying the initial size.

Conclusion

In this article, we have done an in-depth performance analysis of the Java ArrayList add operation.

It is clear from the from the results (considering the add operation of ArrayList) that if the required maximum capacity of the ArrayList is known, we can get the optimal performance (both average latency and throughput) by specifying the initial capacity to the required capacity.

In doing so, we can get 24% to 34% improvement in average latency and 30% to 50% improvement in throughput. In our future work, we hope to consider benchmarking other operations such as insert, remove, and get in addition to the add operation. In our current tests, although we vary the final size between different tests, for a given test scenario final size is fixed. We hope to extend our work to cater for scenarios where the final size varies. In such cases, the final size can come from probability distribution, which represents (actual) access pattern of users.

If you enjoyed this article and want to learn more about Java Collections, check out this collection of tutorials and articles on all things Java Collections.

Data structure Java (programming language) Evaluation

Opinions expressed by DZone contributors are their own.

Related

Trending