Should I Parallelize Java 8 Streams?
What do we need to consider before parallelizing Java streams?
In Java 8, the Streams API makes it easy to iterate over collections, and it's just as easy to parallelize a stream by calling the parallelStream() method. But should we use parallelStream() wherever we can? What are the considerations?
The following ParallelStreamTester class generates collections of different sizes so that we can test the performance of parallel streams against a sequential stream.
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class ParallelStreamTester {

    static int COLLECTION_SIZE = 100000;

    // Builds a collection of COLLECTION_SIZE randomly generated Person objects.
    private static Collection<Person> getPersonCollection() {
        List<Person> personList = new ArrayList<>();
        String[] names = {"David", "Marry", "Satya", "Matt", "Patrick", "Bill", "Mike", "Jake", "Amber", "Dianne"};
        int[] age = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        String[] states = {"NY", "MA", "MO", "CA", "TX", "MN", "WA", "PE", "NE", "NH", "OH"};
        for (int i = 0; i < COLLECTION_SIZE; i++) {
            personList.add(new Person(names[getRandom()], age[getRandom()], states[getRandom()]));
        }
        System.out.println("Generated the collection \n");
        return personList;
    }

    // more code (getRandom() and the Person class are not shown in this excerpt)
Now, consider the following code snippet, which tests the performance of the sequential stream. It counts all of the Person objects who are older than 50, are from "NY" or "TX", and have names that start with "M".
    private static void sequentialStreamPerformance(Collection<Person> persons) {
        long t1 = System.currentTimeMillis();
        long count = persons.stream()
                .filter(x -> x.getState().equals("NY") || x.getState().equals("TX"))
                .filter(x -> x.getAge() > 50)
                .filter(x -> x.getName().startsWith("M"))
                .count();
        long t2 = System.currentTimeMillis();
        System.out.println("Count = " + count + " Normal Stream Takes " + (t2 - t1) + " ms\n");
    }
And for parallel stream performance:
    private static void parallelStreamPerformance(Collection<Person> persons) {
        long t1 = System.currentTimeMillis();
        long count = persons.parallelStream()
                .filter(x -> x.getState().equals("NY") || x.getState().equals("TX"))
                .filter(x -> x.getAge() > 50)
                .filter(x -> x.getName().startsWith("M"))
                .count();
        long t2 = System.currentTimeMillis();
        System.out.println("Count = " + count + " Parallel Stream takes " + (t2 - t1) + " ms\n");
    }
Now, let's run some tests by varying the value of COLLECTION_SIZE. Start with a value of 100 and increase it step by step up to 10,000,000, noting the time taken at each step. Here are the results I observed:
- Sequential streams outperformed parallel streams when the number of elements in the collection was less than 100,000.
- Parallel streams performed significantly better than sequential streams when the number of elements was more than 100,000.
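The experiment described above could be sketched as one self-contained program along the following lines. This is only an illustration under stated assumptions, not the code from the article's repository: the Person class, the generate, countMatches, and timeNanos helpers, the fixed random seed, and the warm-up runs are all hypothetical additions. It also uses System.nanoTime() and a few untimed warm-up passes, since a single cold run timed in milliseconds can be dominated by JIT compilation. The loop stops at 1,000,000 elements to keep memory use modest; raise the bound to cover the full range.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;

public class StreamTimingSketch {

    // Hypothetical stand-in for the article's Person class, which is not shown.
    static final class Person {
        private final String name;
        private final int age;
        private final String state;

        Person(String name, int age, String state) {
            this.name = name;
            this.age = age;
            this.state = state;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getState() { return state; }
    }

    // Builds a list of the requested size; a fixed seed keeps runs repeatable.
    static List<Person> generate(int size, long seed) {
        String[] names = {"David", "Marry", "Satya", "Matt", "Patrick", "Bill", "Mike", "Jake", "Amber", "Dianne"};
        int[] ages = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        String[] states = {"NY", "MA", "MO", "CA", "TX", "MN", "WA", "PE", "NE", "NH", "OH"};
        Random random = new Random(seed);
        List<Person> people = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            people.add(new Person(
                    names[random.nextInt(names.length)],
                    ages[random.nextInt(ages.length)],
                    states[random.nextInt(states.length)]));
        }
        return people;
    }

    // Same three filters as the article, with the stream mode chosen by a flag.
    static long countMatches(Collection<Person> people, boolean parallel) {
        return (parallel ? people.parallelStream() : people.stream())
                .filter(p -> p.getState().equals("NY") || p.getState().equals("TX"))
                .filter(p -> p.getAge() > 50)
                .filter(p -> p.getName().startsWith("M"))
                .count();
    }

    // Runs the pipeline five times untimed, then returns the elapsed nanoseconds of one timed run.
    static long timeNanos(Collection<Person> people, boolean parallel) {
        for (int i = 0; i < 5; i++) {
            countMatches(people, parallel);
        }
        long start = System.nanoTime();
        countMatches(people, parallel);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int size = 100; size <= 1_000_000; size *= 10) {
            List<Person> people = generate(size, 42L);
            System.out.printf("size=%d sequential=%d us parallel=%d us%n",
                    size,
                    timeNanos(people, false) / 1_000,
                    timeNanos(people, true) / 1_000);
        }
    }
}
```

The crossover point this prints will depend on the machine, the number of cores, and the cost of each filter, so the 100,000 figure above is best read as one observation rather than a fixed rule.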
What about synchronization problems when using parallel streams?
If the predicates and functions in the pipeline use a shared resource, we need to make sure that access to it is controlled and thread-safe.
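As a rough illustration of that point, the sketch below contrasts an unsafe pattern (shown only as a comment) with two safer alternatives. The class and method names are hypothetical and chosen just for this example; the library calls themselves (IntStream, Collectors.toList(), AtomicLong) are standard Java 8 APIs.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SharedStateSketch {

    // Unsafe pattern (left as a comment on purpose): a plain ArrayList is not
    // thread-safe, so adding to it from a parallel forEach can lose elements or throw.
    //   List<Integer> evens = new ArrayList<>();
    //   IntStream.range(0, n).parallel().filter(i -> i % 2 == 0).forEach(evens::add);

    // Safer: let the stream build the result, so no shared list is mutated by the lambdas.
    static List<Integer> collectEvens(int n) {
        return IntStream.range(0, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    // If a shared counter is really needed, use a thread-safe type such as AtomicLong.
    static long countEvensWithSharedCounter(int n) {
        AtomicLong counter = new AtomicLong();
        IntStream.range(0, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .forEach(i -> counter.incrementAndGet());
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(collectEvens(10));                       // prints [0, 2, 4, 6, 8]
        System.out.println(countEvensWithSharedCounter(1_000_000)); // prints 500000
    }
}
```

Where possible, the first approach tends to be preferable, because it avoids shared mutable state altogether instead of guarding it.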
A parallel stream has a much higher overhead than a sequential stream, because splitting the work and coordinating the threads takes a significant amount of time. Sequential streams are therefore the sensible default unless there is a measured performance problem to address.
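One way to apply that default in code is a small helper that stays sequential unless the collection is large. The following is a minimal sketch, assuming a size threshold is a reasonable proxy for "worth parallelizing"; the StreamChooser class, the streamFor method, and the PARALLEL_THRESHOLD value (borrowed from the observation above) are hypothetical and should be tuned by measurement.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Stream;

public class StreamChooser {

    // Assumed cut-over point, taken from the article's observation; measure on your own workload.
    static final int PARALLEL_THRESHOLD = 100_000;

    // Returns a sequential stream by default and a parallel one only at or above the threshold.
    static <T> Stream<T> streamFor(Collection<T> items, int threshold) {
        return items.size() >= threshold ? items.stream().parallel() : items.stream();
    }

    public static void main(String[] args) {
        List<String> small = Arrays.asList("NY", "TX", "MA");
        System.out.println(streamFor(small, PARALLEL_THRESHOLD).isParallel()); // prints false
        System.out.println(streamFor(small, 2).isParallel());                  // prints true
    }
}
```

Collection size is only one factor. The cost of each operation and the number of available cores matter as well, so a helper like this is a starting point, not a substitute for profiling.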
The code used in this POC can be found on GitHub.