Stream Summary Statistics
Execute multiple operations on a Java Stream at once to avoid repeated traversal. Note that the Stream becomes invalid after the terminal operation.
To leverage the capabilities of Java Streams, one should first understand two general concepts: the stream and the stream pipeline. A Stream in Java is a sequential flow of data. A stream pipeline, on the other hand, represents a series of steps applied to data, a series that ultimately produces a result.
My family and I recently visited the Legoland Resort in Germany – a great place, by the way – and there, among other attractions, we had the chance to observe in detail a sample of the brick-building process. Briefly, everything starts from the granular plastic that is melted, modeled accordingly, assembled, painted, stenciled if needed, and packed up in bags and boxes. All the steps are part of an assembly factory pipeline.
It is worth noting that a step cannot start until the previous one has completed, and that the number of steps is finite. Moreover, at every step, each Lego element is touched to perform the corresponding operation, and then it moves only forward, never backward, to the next step. The same applies to Java streams.
In functional programming, the steps are called stream operations, and they are of three categories – one that starts the job (source), one that ends it and produces the result (terminal), and a couple of intermediate ones in between.
As a last consideration, intermediate operations transform a stream into another one, but they never run until the terminal operation runs (they are lazily evaluated). Finally, once the result is produced, the stream is no longer valid and cannot be traversed again.
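The two properties described above, lazy intermediate operations and single-use streams, can be observed with a minimal sketch. The class and method names (StreamLifecycleDemo, traceLaziness, secondTraversalFails) are made up for illustration and are not part of the article's project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the article's project.
public class StreamLifecycleDemo {

    // Records when each step runs, to show that intermediate operations are lazy.
    static List<String> traceLaziness() {
        List<String> trace = new ArrayList<>();
        Stream<String> steps = Stream.of("melt", "mold", "paint")
                .peek(trace::add);           // intermediate operation: nothing runs yet
        trace.add("pipeline built");
        steps.collect(Collectors.toList());  // terminal operation: elements pass through peek() now
        return trace;
    }

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTraversalFails() {
        IntStream ages = IntStream.of(31, 32, 33);
        ages.min();                          // first terminal operation consumes the stream
        try {
            ages.max();                      // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(traceLaziness());        // [pipeline built, melt, mold, paint]
        System.out.println(secondTraversalFails()); // true
    }
}
```

The "pipeline built" entry is recorded first because peek() does nothing until the terminal collect() call pulls the elements through.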
Abstract
Starting from the fact that a Java Stream is no longer valid once its terminal operation is done, this article presents a way of computing multiple results in a single stream traversal. It is accomplished by leveraging the Java summary statistics objects (in particular IntSummaryStatistics), which have been available since Java 8 (version 1.8).
Proof of Concept
The small project built to showcase the statistics computation uses the following:
- Java 17
- Maven 3.6.3
- JUnit Jupiter Engine v.5.9.3
As a domain, there is one straightforward entity – a parent.
public record Parent(String name, int age) { }
It is modeled by two attributes: the name and the age. While the name is present only to distinguish the parents, the age is the one of interest here.
The purpose is to be able to compute a few age statistics on a set of parents, that is:
- The total sample count
- The ages of the youngest and the oldest parent
- The age range of the group
- The average age
- The total number of years the parents accumulate
The results are encapsulated into a ParentStats structure, represented as a record as well.
public record ParentStats(long count,
                          int youngest,
                          int oldest,
                          int ageRange,
                          double averageAge,
                          long totalYearsOfAge) { }
In order to accomplish this, an interface is defined.
import java.util.List;

public interface Service {
    ParentStats getStats(List<Parent> parents);
}
For now, it has only one method, which receives a list of Parents as input and returns the desired statistics.
Initial Implementation
As the problem is trivial, an initial, imperative implementation of the service might look as below:
import java.util.List;

public class InitialService implements Service {

    @Override
    public ParentStats getStats(List<Parent> parents) {
        int count = parents.size();
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE; // symmetric with min; any age replaces it
        long sum = 0;                // long, matching ParentStats.totalYearsOfAge

        for (Parent parent : parents) {
            int age = parent.age();
            if (age < min) {
                min = age;
            }
            if (age > max) {
                max = age;
            }
            sum += age;
        }

        return new ParentStats(count, min, max, max - min, (double) sum / count, sum);
    }
}
The code is straightforward, but it focuses on the how rather than on the what; the problem gets lost in the implementation details, and the intent is harder to read.
As the functional style and streams are already part of every Java developer’s practices, the next service implementation would most probably be chosen.
import java.util.List;

public class StreamService implements Service {

    @Override
    public ParentStats getStats(List<Parent> parents) {
        int count = parents.size();

        // Each statistic needs its own stream, hence three traversals.
        int min = parents.stream()
                .mapToInt(Parent::age)
                .min()
                .orElseThrow(RuntimeException::new);

        int max = parents.stream()
                .mapToInt(Parent::age)
                .max()
                .orElseThrow(RuntimeException::new);

        int sum = parents.stream()
                .mapToInt(Parent::age)
                .sum();

        return new ParentStats(count, min, max, max - min, (double) sum / count, sum);
    }
}
The code is more readable now; the downside, though, is the redundant traversal needed to compute all the desired stats: three times in this particular case. As stated at the beginning of the article, once a terminal operation (min, max, sum) is done, the stream is no longer valid, so a new stream is created for each result. It would be convenient to compute the desired statistics without having to loop over the list of parents multiple times.
Summary Statistics Implementation
In Java, there is a series of objects called SummaryStatistics which come in different types: IntSummaryStatistics, LongSummaryStatistics, and DoubleSummaryStatistics.
According to the JavaDoc, IntSummaryStatistics is “a state object for collecting statistics such as count, min, max, sum and average. The class is designed to work with (though does not require) streams”.
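To illustrate the "though does not require" part of the JavaDoc quote, here is a minimal sketch that fills an IntSummaryStatistics by hand through its accept() method, without any stream. The class name ManualStatsDemo and the helper collect() are hypothetical.

```java
import java.util.IntSummaryStatistics;

// Hypothetical demo class, not part of the article's project.
public class ManualStatsDemo {

    // Feeds the values one by one through accept(); no stream is involved.
    static IntSummaryStatistics collect(int... ages) {
        IntSummaryStatistics stats = new IntSummaryStatistics();
        for (int age : ages) {
            stats.accept(age);
        }
        return stats;
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = collect(31, 32, 33);
        System.out.println(stats.getCount());   // 3
        System.out.println(stats.getMin());     // 31
        System.out.println(stats.getMax());     // 33
        System.out.println(stats.getSum());     // 96
        System.out.println(stats.getAverage()); // 32.0
    }
}
```

Because the object only accumulates state, it can be updated from a plain loop, a callback, or a stream alike.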
It is a good candidate for the initial purpose; thus, the following implementation of the Service seems the preferred one.
import java.util.IntSummaryStatistics;
import java.util.List;

public class StatsService implements Service {

    @Override
    public ParentStats getStats(List<Parent> parents) {
        // A single traversal collects count, min, max, and sum at once.
        IntSummaryStatistics stats = parents.stream()
                .mapToInt(Parent::age)
                .summaryStatistics();

        return new ParentStats(stats.getCount(),
                stats.getMin(),
                stats.getMax(),
                stats.getMax() - stats.getMin(),
                stats.getAverage(),
                stats.getSum());
    }
}
There is only one stream of parents, all the statistics are computed in a single traversal, and the code is much more readable this time.
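One behavior worth keeping in mind with this approach is the empty input. While StreamService throws on an empty list (through orElseThrow), an IntSummaryStatistics with no recorded values reports Integer.MAX_VALUE as minimum, Integer.MIN_VALUE as maximum, and 0.0 as average, so a range computed as max minus min silently overflows. The sketch below shows this and one possible guard; EmptyStatsDemo and safeRange are hypothetical names, and returning 0 for an empty range is an assumption, not something the article prescribes.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the article's project.
public class EmptyStatsDemo {

    static IntSummaryStatistics statsOf(int... ages) {
        return IntStream.of(ages).summaryStatistics();
    }

    // Assumed policy: report a range of 0 when no values were recorded.
    static int safeRange(IntSummaryStatistics stats) {
        return stats.getCount() == 0 ? 0 : stats.getMax() - stats.getMin();
    }

    public static void main(String[] args) {
        IntSummaryStatistics empty = statsOf();
        System.out.println(empty.getMin() == Integer.MAX_VALUE); // true
        System.out.println(empty.getMax() == Integer.MIN_VALUE); // true
        System.out.println(empty.getAverage());                  // 0.0
        System.out.println(empty.getMax() - empty.getMin());     // 1, due to int overflow
        System.out.println(safeRange(empty));                    // 0
        System.out.println(safeRange(statsOf(31, 36)));          // 5
    }
}
```

Whether an empty list should produce zeros, an exception, or an Optional result is a design decision for the caller; the point is only that the default values are not meaningful ages.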
In order to check all three implementations, the following abstract base unit test is used.
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

abstract class ServiceTest {

    private Service service;

    private List<Parent> mothers;
    private List<Parent> fathers;
    private List<Parent> parents;

    protected abstract Service setupService();

    @BeforeEach
    void setup() {
        service = setupService();

        // Mothers are aged 31 to 33, fathers 34 to 36.
        mothers = IntStream.rangeClosed(1, 3)
                .mapToObj(i -> new Parent("Mother" + i, i + 30))
                .collect(Collectors.toList());

        fathers = IntStream.rangeClosed(4, 6)
                .mapToObj(i -> new Parent("Father" + i, i + 30))
                .collect(Collectors.toList());

        parents = new ArrayList<>(mothers);
        parents.addAll(fathers);
    }

    private void assertParentStats(ParentStats stats) {
        Assertions.assertNotNull(stats);
        Assertions.assertEquals(6, stats.count());
        Assertions.assertEquals(31, stats.youngest());
        Assertions.assertEquals(36, stats.oldest());
        Assertions.assertEquals(5, stats.ageRange());

        final int sum = 31 + 32 + 33 + 34 + 35 + 36;

        Assertions.assertEquals((double) sum / 6, stats.averageAge());
        Assertions.assertEquals(sum, stats.totalYearsOfAge());
    }

    @Test
    void getStats() {
        final ParentStats stats = service.getStats(parents);
        assertParentStats(stats);
    }
}
As the stats are computed for all the parents, the mothers and the fathers are first put together in the same parents list (we will see later why there were two lists in the first place).
The particular unit test for each implementation is trivial – it sets up the service instance.
class StatsServiceTest extends ServiceTest {

    @Override
    protected Service setupService() {
        return new StatsService();
    }
}
Combining Statistics
In addition to the already used methods (getMin(), getMax(), getCount(), getSum(), getAverage()), IntSummaryStatistics provides a way to combine the state of another similar object into the current one.
void combine(IntSummaryStatistics other)
As we saw in the above unit test, initially, there are two source lists – mothers and fathers. It would be convenient to be able to directly compute the statistics without first merging them.
In order to accomplish this, the Service is enriched with the following method.
default ParentStats getCombinedStats(List<Parent> mothers, List<Parent> fathers) {
    final List<Parent> parents = new ArrayList<>(mothers);
    parents.addAll(fathers);
    return getStats(parents);
}
The first two implementations, InitialService and StreamService, are not of interest here; thus, a default implementation is provided for convenience. It is overridden only by the StatsService.
// Requires java.util.stream.Collector and java.util.stream.Collectors.
@Override
public ParentStats getCombinedStats(List<Parent> mothers, List<Parent> fathers) {
    Collector<Parent, ?, IntSummaryStatistics> collector = Collectors.summarizingInt(Parent::age);

    // Each list is traversed once; the fathers' state is merged into the mothers' one.
    IntSummaryStatistics stats = mothers.stream().collect(collector);
    stats.combine(fathers.stream().collect(collector));

    return new ParentStats(stats.getCount(),
            stats.getMin(),
            stats.getMax(),
            stats.getMax() - stats.getMin(),
            stats.getAverage(),
            stats.getSum());
}
By leveraging the combine() method, the statistics computed from different source lists can be merged directly, without first building a combined list.
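The same idea extends to any number of source lists. The sketch below starts from an empty IntSummaryStatistics and folds each group's statistics into it; CombineDemo and combineAll are hypothetical names, and plain Integer ages are used instead of the Parent record to keep the example self-contained.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the article's project.
public class CombineDemo {

    // Summarizes each group separately, then merges the partial results.
    static IntSummaryStatistics combineAll(List<List<Integer>> groups) {
        IntSummaryStatistics total = new IntSummaryStatistics();
        for (List<Integer> group : groups) {
            IntSummaryStatistics partial = group.stream()
                    .collect(Collectors.summarizingInt(Integer::intValue));
            total.combine(partial);
        }
        return total;
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = combineAll(List.of(
                List.of(31, 32, 33),   // e.g. the mothers' ages
                List.of(34, 35, 36))); // e.g. the fathers' ages
        System.out.println(stats.getCount());   // 6
        System.out.println(stats.getMin());     // 31
        System.out.println(stats.getMax());     // 36
        System.out.println(stats.getSum());     // 201
        System.out.println(stats.getAverage()); // 33.5
    }
}
```

Starting from an empty object works because its initial minimum and maximum are replaced by the first combined values.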
The corresponding unit test is straightforward.
@Test
void getCombinedStats() {
    final ParentStats stats = service.getCombinedStats(mothers, fathers);
    assertParentStats(stats);
}
Having seen the above Collector, the initial getStats() method may be written even more briefly.
@Override
public ParentStats getStats(List<Parent> parents) {
    IntSummaryStatistics stats = parents.stream()
            .collect(Collectors.summarizingInt(Parent::age));

    return new ParentStats(stats.getCount(),
            stats.getMin(),
            stats.getMax(),
            stats.getMax() - stats.getMin(),
            stats.getAverage(),
            stats.getSum());
}
Conclusion
Depending on the data types used, IntSummaryStatistics, LongSummaryStatistics, or DoubleSummaryStatistics are convenient out-of-the-box structures that one can use to quickly compute simple statistics in a single traversal and focus on writing more readable and maintainable code.
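As a brief illustration of the other two variants, the sketch below uses LongSummaryStatistics for values whose sum would not fit into an int and DoubleSummaryStatistics for fractional values. The class OtherStatsDemo, its methods, and the sample values are hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical demo class, not part of the article's project.
public class OtherStatsDemo {

    static LongSummaryStatistics longStats(long... values) {
        return LongStream.of(values).summaryStatistics();
    }

    static DoubleSummaryStatistics doubleStats(List<Double> values) {
        return values.stream()
                .collect(Collectors.summarizingDouble(Double::doubleValue));
    }

    public static void main(String[] args) {
        LongSummaryStatistics longs = longStats(3_000_000_000L, 1L);
        System.out.println(longs.getSum());       // 3000000001
        System.out.println(longs.getMax());       // 3000000000

        DoubleSummaryStatistics doubles = doubleStats(List.of(1.5, 2.5));
        System.out.println(doubles.getSum());     // 4.0
        System.out.println(doubles.getAverage()); // 2.0
    }
}
```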
Published at DZone with permission of Horatiu Dan. See the original article here.