How to Read a Large CSV File With Java 8 and Stream API
Scenario: you have to parse a large CSV file (~90 MB), that is, read the file and create one Java object for each line. What do you do?
In real life, the CSV file contains around 380,000 lines.
Assumption: you already know the path of the CSV file before using the code below.
The following code will read the file and create one Java object per line.
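import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

private List<YourJavaItem> processInputFile(String inputFilePath) {
    List<YourJavaItem> inputList = new ArrayList<>();
    // try-with-resources closes the reader even if the mapping throws
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(inputFilePath)))) {
        // skip the header of the csv
        inputList = br.lines().skip(1).map(mapToItem).collect(Collectors.toList());
    } catch (IOException e) {
        // FileNotFoundException is a subclass of IOException, so listing both
        // in one multi-catch does not compile; IOException alone covers both
        ....
    }
    return inputList;
}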
Some explanation of the above code might be needed:
lines(): returns a Stream<String> with one element per line of the file, read lazily.
skip(1): skips the first line in the CSV file, which in this case is the header of the file.
map(mapToItem): calls the mapToItem function for each line in the file.
collect(Collectors.toList()): creates a list containing all the items created by the mapToItem function.
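To see the four calls working together without a real file, here is a minimal sketch that feeds the same pipeline from an in-memory string. The class name PipelineSketch, the method firstColumns, and the sample data are made up for illustration. The mapping step simply takes the first column instead of building a YourJavaItem.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

public class PipelineSketch {
    // Hypothetical helper: skips the header row and returns the first
    // column of every remaining line.
    static List<String> firstColumns(String csvText) {
        try (BufferedReader br = new BufferedReader(new StringReader(csvText))) {
            return br.lines()                        // Stream<String>, one element per line
                     .skip(1)                        // drop the header
                     .map(line -> line.split(",")[0]) // stand-in for mapToItem
                     .collect(Collectors.toList());
        } catch (IOException e) {
            // close() on the reader may throw a checked IOException
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "itemNumber,name\n1001,widget\n1002,gadget\n";
        System.out.println(firstColumns(csv)); // prints [1001, 1002]
    }
}
```

Because the stream is lazy, nothing is read until collect() runs, so the reader has to stay open until the terminal operation finishes.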
Now, the mapToItem function looks like this:
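import java.util.function.Function;

// assumed to be a plain comma, as the comment on split() suggests
private static final String COMMA = ",";

private Function<String, YourJavaItem> mapToItem = (line) -> {
    String[] p = line.split(COMMA); // a CSV has comma-separated lines
    YourJavaItem item = new YourJavaItem();
    item.setItemNumber(p[0]); // <-- this is the first column in the csv file
    // split() never returns null elements, but it drops trailing empty fields,
    // so check the array length before reading p[3]
    if (p.length > 3 && !p[3].trim().isEmpty()) {
        item.setSomeProperty(p[3]);
    }
    // more initialization goes here
    return item;
};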
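One detail of String.split deserves a closer look, because it decides whether p[3] exists at all. By default, split discards trailing empty strings, so a line that ends in empty columns yields a shorter array. The sketch below shows this, together with the split(regex, -1) variant that keeps them. SplitSketch and fieldOrEmpty are illustrative names, not part of the original code.

```java
public class SplitSketch {
    // Hypothetical helper: returns the trimmed field at `index`,
    // or "" when the row has fewer columns than expected.
    static String fieldOrEmpty(String[] fields, int index) {
        return index < fields.length ? fields[index].trim() : "";
    }

    public static void main(String[] args) {
        String line = "1001,widget,,"; // the last two columns are empty

        // The default split drops trailing empty strings.
        System.out.println(line.split(",").length);     // prints 2

        // A negative limit keeps them.
        System.out.println(line.split(",", -1).length); // prints 4

        // The length check avoids an ArrayIndexOutOfBoundsException.
        System.out.println(fieldOrEmpty(line.split(","), 3).isEmpty()); // prints true
    }
}
```

Note that a plain split on commas also breaks on quoted fields that contain a comma, such as "Smith, John". If the input can contain those, a dedicated CSV parser is the safer choice.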
Performance Consideration
From the testing I've done, reading a 90 MB CSV file with the approach described above takes around 700 ms when running from inside Eclipse.
It will probably be even faster in production.
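If you want to check the timing on your own machine, a rough sketch like the following could help. It is not the test behind the number above. It generates a synthetic file with made-up columns, uses Files.newBufferedReader instead of FileInputStream, and splits each line instead of building a YourJavaItem. The names TimingSketch, writeSampleCsv, and readRows are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class TimingSketch {
    // Writes a header plus `rows` synthetic data lines to a temporary CSV file.
    static Path writeSampleCsv(int rows) throws IOException {
        Path file = Files.createTempFile("sample", ".csv");
        List<String> lines = new ArrayList<>(rows + 1);
        lines.add("itemNumber,name,price,note");
        for (int i = 0; i < rows; i++) {
            lines.add(i + ",item" + i + ",9.99,note" + i);
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
        return file;
    }

    // Reads the file, skips the header, and splits each remaining line.
    static List<String[]> readRows(Path file) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return br.lines()
                     .skip(1)
                     .map(l -> l.split(","))
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSampleCsv(380_000); // row count taken from the scenario
        long start = System.nanoTime();
        List<String[]> rows = readRows(file);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(rows.size() + " rows in " + elapsedMs + " ms");
        Files.delete(file);
    }
}
```

A single run like this includes JIT warm-up and depends on whether the file is already in the operating system's cache, so treat the result as a ballpark figure rather than a benchmark.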
Not bad. Happy coding!