Java 8 Streaming (groupBy Example)

Learn more about Java 8 Streams and how they can be applied to aggregate functions.

Bhupender G

May. 07, 19 · Tutorial


In this article, I would like to share a few scenarios that show how Java 8 Streams and related functions can be used to read a file and apply aggregate functions, the kind we generally run in our day-to-day SQL queries on the database side.

So, I am taking an example employee CSV file, which is shown below:

EMP_NAME,EMP_ID,DEPT_ID,SAL
EMP2,1,10,300
EMP3,2,20,50
EMP4,3,20,25
EMP5,4,11,100
EMP6,5,10,200
EMP7,6,10,200
EMP1,7,10,500
EMP8,8,11,101

I would like to start with a use case that we generally execute on the database side.

Scenario A

Assuming the above records are present in an employee table (EMP_TABLE), we want to get the total salary of employees at the departmental level.

Hint: GROUP BY enables you to use aggregate functions on groups of data returned from a query, as shown below.

SQL :: SELECT DEPT_ID, SUM(SAL) FROM EMP_TABLE GROUP BY DEPT_ID;

Now, we can see that the only difference is that we have the records in our CSV files instead of the database rows.

So, what we need is a Java object to represent each record in the CSV file. Each line in the file can be treated as one row of the database table.

So, we will start with our resource or plain object, which is Employee.

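public class Employee {

    private String empName;
    private long empID;
    private long deptID;
    private long salary;

    // Getters and setters used by the mapper and the collectors in the later steps
    public String getEmpName() { return empName; }
    public void setEmpName(String empName) { this.empName = empName; }
    public long getEmpID() { return empID; }
    public void setEmpID(long empID) { this.empID = empID; }
    public long getDeptID() { return deptID; }
    public void setDeptID(long deptID) { this.deptID = deptID; }
    public long getSalary() { return salary; }
    public void setSalary(long salary) { this.salary = salary; }

}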

Now, let's have our test program (main function) perform an aggregate function.

Step 1: CSV File

Read the CSV file and map it to the Java object. This method converts all rows into a List of employees:

private static List<Employee> mapCSV(String empCSVFilePath) {
    List<Employee> empList = new ArrayList<>();

    // try-with-resources closes the reader even if reading fails part-way
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(empCSVFilePath)))) {
        // Skip the header: it holds the column names, not data.
        empList = br.lines().skip(1).map(csv2EmpObj).collect(Collectors.toList());
    } catch (IOException e) {
        System.err.println(e.getMessage());
    }

    return empList;
}

Step 2: Mapper Function

Write the mapper function, csv2EmpObj, to bind all of your rows to a plain Java object:

private static Function<String, Employee> csv2EmpObj = (line) -> {
    String[] record = line.split(","); // "," is the field delimiter
    Employee employee = new Employee();
    // split() drops trailing empty fields, so check the length before indexing,
    // and trim each value so stray spaces do not break the numeric parsing.
    if (record.length > 0 && !record[0].trim().isEmpty()) {
        employee.setEmpName(record[0].trim());
    }
    if (record.length > 1 && !record[1].trim().isEmpty()) {
        employee.setEmpID(Long.parseLong(record[1].trim()));
    }
    if (record.length > 2 && !record[2].trim().isEmpty()) {
        employee.setDeptID(Long.parseLong(record[2].trim()));
    }
    if (record.length > 3 && !record[3].trim().isEmpty()) {
        employee.setSalary(Long.parseLong(record[3].trim()));
    }
    return employee;
};
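To see Steps 1 and 2 working together without a file on disk, here is a minimal sketch that runs the same skip-the-header-then-map pipeline over in-memory lines. The class CsvMappingSketch, the stand-in type Emp, and the methods parseLine and parseAll are names I made up for this illustration; they are not part of the program above. Unlike the mapper above, this version rejects malformed rows instead of leaving fields unset, which is just one possible design choice.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvMappingSketch {

    // Hypothetical minimal stand-in for the article's Employee class.
    static class Emp {
        final String name;
        final long id;
        final long deptId;
        final long salary;

        Emp(String name, long id, long deptId, long salary) {
            this.name = name;
            this.id = id;
            this.deptId = deptId;
            this.salary = salary;
        }
    }

    // Converts one CSV data line (NAME,ID,DEPT,SAL) into an Emp.
    static Emp parseLine(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] f = line.split(",", -1);
        if (f.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        return new Emp(f[0].trim(),
                Long.parseLong(f[1].trim()),
                Long.parseLong(f[2].trim()),
                Long.parseLong(f[3].trim()));
    }

    // Skips the header row, then maps every remaining line to an Emp.
    static List<Emp> parseAll(List<String> lines) {
        return lines.stream()
                .skip(1)
                .map(CsvMappingSketch::parseLine)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "EMP_NAME,EMP_ID,DEPT_ID,SAL",
                "EMP2,1,10,300",
                "EMP3,2,20,50");
        List<Emp> emps = parseAll(lines);
        // prints "2 rows, first salary = 300"
        System.out.println(emps.size() + " rows, first salary = " + emps.get(0).salary);
    }
}
```

Failing fast on a bad row makes data problems visible early; if you would rather skip bad rows, a filter before the map step is one way to do it.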

Step 3: Apply Java 8 Stream Features

Keep in mind that the requirement was to get the sum of the salary using groupBy department (dept_id).

So, we first take a stream of the collection (the list of employees) from the earlier step, and then we apply the following streaming API methods.

First, we have the collect method, which reduces the stream to a single result determined by the Collector passed to it.

When choosing the Collector, the classifier (the grouping key) is dept_id, and the downstream reduction is the sum of the salary.

So, the function will look like below:

public static Map<Long, Long> totalSalaryForDepartment(List<Employee> empList) {
    return empList.stream()
            .collect(Collectors.groupingBy(
                    Employee::getDeptID,
                    Collectors.summingLong(Employee::getSalary)));
}
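To check the grouping against the sample data, here is a small self-contained sketch that applies groupingBy with summingLong to the eight rows from the CSV, held in memory. The class SalaryByDeptSketch, the stand-in type Row, and the method totalByDept are hypothetical names for this illustration only. I also pass TreeMap::new as the map factory, which the function above does not do, purely so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SalaryByDeptSketch {

    // Hypothetical stand-in holding only the two columns this query needs.
    static class Row {
        final long deptId;
        final long salary;

        Row(long deptId, long salary) {
            this.deptId = deptId;
            this.salary = salary;
        }

        long getDeptId() { return deptId; }
        long getSalary() { return salary; }
    }

    // DEPT_ID and SAL columns copied from the sample CSV.
    static final List<Row> ROWS = Arrays.asList(
            new Row(10, 300), new Row(20, 50), new Row(20, 25), new Row(11, 100),
            new Row(10, 200), new Row(10, 200), new Row(10, 500), new Row(11, 101));

    // Equivalent of: SELECT DEPT_ID, SUM(SAL) ... GROUP BY DEPT_ID
    static Map<Long, Long> totalByDept(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                Row::getDeptId,                        // classifier: the grouping key
                TreeMap::new,                          // sorted map, so output order is stable
                Collectors.summingLong(Row::getSalary))); // downstream: sum per group
    }

    public static void main(String[] args) {
        // prints {10=1200, 11=201, 20=75}
        System.out.println(totalByDept(ROWS));
    }
}
```

With the two-argument groupingBy used in the function above, the result is a plain Map whose iteration order is not guaranteed, so the departments may print in a different order there.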

Step 4: Print It

Your main function of the test program will look like this:

public static void main(String[] args) {

    String empFileName = "C:\\tmp\\Employee.csv";
    List<Employee> listOFEmployees = mapCSV(empFileName); // Step 1 & Step 2.
    Map<Long, Long> groupByDeptID = totalSalaryForDepartment(listOFEmployees); // Step 3.
    groupByDeptID.entrySet().stream().forEach(System.out::println); // Step 4.

    // Scenario B: print the name, since Employee does not override toString()
    System.out.println("Employee with max salary : "
            + mostExpensiveResource(listOFEmployees).getEmpName());
}

Scenario B

Find the employee with the maximum salary:

public static Employee mostExpensiveResource(List<Employee> employees) {
    return employees.stream()
            .collect(Collectors.<Employee>maxBy(
                    Comparator.comparing(Employee::getSalary))).get();
}

// You can tweak this to find the minimum as well, and there are other scenarios such as filtering and averaging.
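As a rough sketch of those other scenarios, the example below shows an average per department, the lowest-paid employee, and a simple filter, again over the sample rows held in memory. The class OtherAggregatesSketch, the stand-in type Row, and the three method names are hypothetical and exist only for this illustration. Here lowestPaid returns an Optional rather than calling get(), so an empty list does not throw.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class OtherAggregatesSketch {

    // Hypothetical stand-in for the article's Employee class.
    static class Row {
        final String name;
        final long deptId;
        final long salary;

        Row(String name, long deptId, long salary) {
            this.name = name;
            this.deptId = deptId;
            this.salary = salary;
        }

        String getName() { return name; }
        long getDeptId() { return deptId; }
        long getSalary() { return salary; }
    }

    // EMP_NAME, DEPT_ID and SAL columns copied from the sample CSV.
    static final List<Row> ROWS = Arrays.asList(
            new Row("EMP2", 10, 300), new Row("EMP3", 20, 50),
            new Row("EMP4", 20, 25), new Row("EMP5", 11, 100),
            new Row("EMP6", 10, 200), new Row("EMP7", 10, 200),
            new Row("EMP1", 10, 500), new Row("EMP8", 11, 101));

    // Equivalent of: SELECT DEPT_ID, AVG(SAL) ... GROUP BY DEPT_ID
    static Map<Long, Double> averageByDept(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                Row::getDeptId, TreeMap::new,
                Collectors.averagingLong(Row::getSalary)));
    }

    // The minimum counterpart of Scenario B; empty input gives Optional.empty().
    static Optional<Row> lowestPaid(List<Row> rows) {
        return rows.stream().min(Comparator.comparingLong(Row::getSalary));
    }

    // Equivalent of: SELECT EMP_NAME ... WHERE SAL >= threshold
    static List<String> namesEarningAtLeast(List<Row> rows, long threshold) {
        return rows.stream()
                .filter(r -> r.getSalary() >= threshold)
                .map(Row::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(averageByDept(ROWS));            // {10=300.0, 11=100.5, 20=37.5}
        System.out.println(lowestPaid(ROWS).get().getName()); // EMP4
        System.out.println(namesEarningAtLeast(ROWS, 200)); // [EMP2, EMP6, EMP7, EMP1]
    }
}
```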

Best practices have not been followed here; this is just an example to showcase day-to-day use cases of aggregate functions through the Java 8 streaming API.

Happy streaming!
