FileInputStream / FileOutputStream Considered Harmful
Your FileInputStream and FileOutputStream instances might introduce long GC pauses. Fortunately, the Files.newInputStream and Files.newOutputStream methods added in Java 7 can help with your I/O efficiency.
Ok, so say you have been given an array of bytes that you have to write to a file. You’re a Java developer. You have been writing Java code for years. You got this:
public void writeToFile(String fileName, byte[] content) throws IOException {
    try (FileOutputStream os = new FileOutputStream(fileName)) {
        os.write(content);
    }
}
Can you spot the bug?
What about this method to read the files back again?
public byte[] readFromFile(String fileName) throws IOException {
    byte[] buf = new byte[8192];
    try (FileInputStream is = new FileInputStream(fileName)) {
        ByteArrayOutputStream os = new ByteArrayOutputStream(16384);
        int len;
        // read() returns -1 at end of file (including for an empty file);
        // a short read on its own does not mean the end has been reached
        while ((len = is.read(buf)) != -1) {
            os.write(buf, 0, len);
        }
        return os.toByteArray();
    }
}
Spotted the bug yet?
Of course, the bug is in the title of this post! We are using FileInputStream and FileOutputStream.
So what, exactly, is wrong with that?
Have you ever noticed that FileInputStream overrides finalize()? Same goes for FileOutputStream, by the way.
Every time you create either a FileInputStream or a FileOutputStream, you are creating an object with a finalizer. Even if you close it correctly and promptly, it will be put into a special category that only gets cleaned up when the garbage collector does a full GC. Sadly, due to backwards compatibility constraints, this is not something that can be fixed in the JDK anytime soon, as there could be some code out there where somebody has extended FileInputStream / FileOutputStream and is relying on those finalize() methods to ensure the call to close().
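If you want to check this on the JDK you are actually running, a small reflection probe is enough. The following is only a sketch: the class name FinalizeCheck and the helper declaresFinalize are made up for illustration, and the answer it prints depends on your JDK version.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;

// Hypothetical helper class, not part of the JDK or of the original article.
class FinalizeCheck {

    // Returns true if the given class itself declares a no-argument finalize()
    // method (inherited declarations are not counted by getDeclaredMethod).
    static boolean declaresFinalize(Class<?> type) {
        try {
            type.getDeclaredMethod("finalize");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The result depends on the JDK version in use.
        System.out.println("FileInputStream declares finalize():  "
                + declaresFinalize(FileInputStream.class));
        System.out.println("FileOutputStream declares finalize(): "
                + declaresFinalize(FileOutputStream.class));
    }
}
```

If both lines print true, every instance of those classes you create is subject to the finalization behaviour described above.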
Now, that is not an issue for short-lived programs… or for programs that do very little file I/O… but for programs that create a lot of files, it can cause issues. For example, Hadoop found “long GC pauses were devoted to process high number of final references,” resulting from the creation of lots of FileInputStream instances.
The solution (at least if you are using Java 7 or newer) is not too hard, apart from retraining your muscle memory: just switch to Files.newInputStream(...) and Files.newOutputStream(...).
Our code becomes:
public void writeToFile(String fileName, byte[] content) throws IOException {
    try (OutputStream os = Files.newOutputStream(Paths.get(fileName))) {
        os.write(content);
    }
}
public byte[] readFromFile(String fileName) throws IOException {
    byte[] buf = new byte[8192];
    try (InputStream is = Files.newInputStream(Paths.get(fileName))) {
        ByteArrayOutputStream os = new ByteArrayOutputStream(16384);
        int len;
        // read() returns -1 at end of file (including for an empty file);
        // a short read on its own does not mean the end has been reached
        while ((len = is.read(buf)) != -1) {
            os.write(buf, 0, len);
        }
        return os.toByteArray();
    }
}
A seemingly small change that could reduce GC pauses if you do a lot of file I/O!
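For the specific case of whole byte arrays, the same java.nio.file.Files class also offers Files.write and Files.readAllBytes, which avoid the stream handling altogether. Here is a minimal sketch, assuming the file fits comfortably in memory. The class name FileBytes and the appendToFile method are illustrative additions, with appendToFile standing in for the new FileOutputStream(fileName, true) idiom.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical utility class for illustration only.
class FileBytes {

    // With no options given, Files.write creates the file if needed and
    // truncates any existing content, like new FileOutputStream(fileName).
    static void writeToFile(String fileName, byte[] content) throws IOException {
        Files.write(Paths.get(fileName), content);
    }

    // Rough counterpart of new FileOutputStream(fileName, true):
    // create the file if it is missing, otherwise append to it.
    static void appendToFile(String fileName, byte[] content) throws IOException {
        Files.write(Paths.get(fileName), content,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Reads the whole file into memory in one call.
    static byte[] readFromFile(String fileName) throws IOException {
        return Files.readAllBytes(Paths.get(fileName));
    }
}
```

Because Files.readAllBytes loads everything at once, the streaming versions above remain the better fit for large files or when you want to process data incrementally.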
Oh and yeah, we’re making this change in Jenkins.
Published at DZone with permission of Stephen Connolly. See the original article here.