Java 8 Steaming (groupBy Example)
Learn more about Java 8 Streams and how they can be applied to aggregate functions.
Join the DZone community and get the full member experience.
Join For FreeIn this article, I would like to share few scenarios where we will see how the Java 8 Stream and related functions can be useful for reading a file and applying some aggregate functions, which we generally do in our day-to-day SQL/queries on the database side.
So, I am taking an example of employee CSV files, which is shown below:
EMP_NAME,EMP_ID,DEPT_ID,SAL
EMP2,1,10,300
EMP3,2,20,50
EMP4,3,20,25
EMP5,4,11,100
EMP6,5,10,200
EMP7,6,10,200
EMP1,7,10,500
EMP8,8,11,101
I would like to start with a use case that we execute on the database side, generally.
Scenario A
Assuming the above records are present in the Employee table, we get the total salary of employees at the departmental level.
Hint: GROUP BY
enables you to use aggregate functions on groups of data returned from a query, as shown below.
SQL :: SELECT DEPT_ID, SUM(SAL) FROM EMP_TABLE GROUP BY DEPT_ID;
Now, we can see that the only difference is that we have the records in our CSV files instead of the database rows.
So, what we need to do is, basically, take Java objects representing records from the CSV file. Each line in the file can be assumed to have the record that we used in the database.
So, we will start with our resource or plain object, which isEmployee
.
public class Employee {
private String empName;
private long empID;
private long deptID;
private long salary;
// have your setters and getters as well
}
Now, let's have our test program (main function) perform an aggregate function.
Step 1: CSV File
Read the CSV file and map it to the Java object. This method will convert all rows to a List
of employees
private static List<Employee> mapCSV(String empCSVFilePath) {
List<Employee> empList = new ArrayList<Employee>();
try {
File inputF = new File(empCSVFilePath);
InputStream inputFS = new FileInputStream(inputF);
BufferedReader br = new BufferedReader(new InputStreamReader(inputFS));
// Skip the header since its just coloumn name in table and in CSV file but not the data.
empList = br.lines().skip(1).map(csv2EmpObj).collect(Collectors.toList());
br.close();
} catch (IOException e) {
System.err.println(e.getMessage());
}
return empList;
}
Step 2: Mapper Function
Write the mapper function, csv2EmpObj
, to bind all of your rows to a plain Java object:
private static Function<String, Employee> csv2EmpObj = (line) -> {
String[] record = line.split(",");// This can be delimiter which
Employee employee = new Employee();
if (record[0] != null && record[0].trim().length() > 0) {
employee.setEmpName(record[0]);
}
if (record[1] != null && record[1].trim().length() > 0) {
employee.setEmpID(Long.valueOf(record[1]));
}
if (record[2] != null && record[2].trim().length() > 0) {
employee.setDeptID(Long.valueOf(record[2]));
}
if (record[3] != null && record[3].trim().length() > 0) {
employee.setSalary(Long.valueOf(record[3]));
}
return employee;
};
Step 3: Apply Java 8 Stream Features
Keep in mind that the requirement was to get the sum of the salary using groupBy
department (dept_id
).
So, we will first have the streams of the collection (list of employees), which we had from our earlier step, and then, we apply the following steaming API/methods.
First, we have the collect
method. This is basically the result based on the Collector
used (input, functions)
When determining the Collector
to use, your input/accumulator will be dep_id
and the reduction/result would be the sum of the salary.
So, the function will look like below:
public static Map<Long, Long> totalSalaryForDepartment(List<Employee> empList) {
return empList.stream()
.collect(Collectors.groupingBy(
Employee::getDeptID,
Collectors.summingLong(Employee::getSalary)));
}
Step 4: Print It
Your main function of the test program will look like this:
public static void main(String[] args) {
String empFileName = "C:\\tmp\\Employee.csv";
List<Employee> listOFEmployees = process(empFileName); // Step 1 & Step 2.
Map<Long, Long> groupByDeptID = totalSalaryForDepartment(listOFEmployees); // Step 3.
groupByDeptID.entrySet().stream().forEach(System.out::println); // Step 4.
//
System.out.println("Employee with max salary : "+mostExpensiveResource(listOFEmployees));
}
Scenario B
Find the employee with the maximum salary:
public static Employee mostExpensiveResource(List<Employee> employees) {
return employees.stream()
.collect(Collectors.<Employee>maxBy(
Comparator.comparing(Employee::getSalary))).get();
}
// you can tweak to find minimum one as well and there are other scenarios like filtering, average etc.
The best practices have not been followed; this is just an example to showcase day-to-day use cases of an aggregate function through the Java 8 streaming API.
Happy streaming!
Opinions expressed by DZone contributors are their own.
Comments