Writing Large JSON Files With Jackson
Let's take a look at how to easily write a large amount of JSON data to a file using everyone's favorite JSON library, Jackson!
Sometimes, you need to export a lot of data to a JSON file. Maybe you need to export all your data to JSON, or satisfy the GDPR "Right to portability," where you effectively need to do the same.
And, as with any big dataset, you can't just load it all into memory and write it to a file. Such an export takes a while, it reads a lot of entries from the database, and you need to be careful not to let it overload the entire system or run out of memory.
Luckily, it's fairly straightforward to do that with the help of Jackson's SequenceWriter and, optionally, piped streams. Here's what it would look like:
private ObjectMapper jsonMapper = new ObjectMapper();
private ExecutorService executorService = Executors.newFixedThreadPool(5);

@Async
public ListenableFuture<Boolean> export(UUID customerId) {
    try (PipedInputStream in = new PipedInputStream();
            PipedOutputStream pipedOut = new PipedOutputStream(in);
            GZIPOutputStream out = new GZIPOutputStream(pipedOut)) {
        Stopwatch stopwatch = Stopwatch.createStarted();
        ObjectWriter writer = jsonMapper.writer().withDefaultPrettyPrinter();
        try (SequenceWriter sequenceWriter = writer.writeValues(out)) {
            sequenceWriter.init(true);
            // store the file on a separate thread while this thread keeps writing
            Future<?> storageFuture = executorService.submit(() ->
                    storageProvider.storeFile(getFilePath(customerId), in));
            int batchCounter = 0;
            while (true) {
                List<Record> batch = readDatabaseBatch(batchCounter++);
                if (batch.isEmpty()) {
                    // no more data; stop reading
                    break;
                }
                for (Record record : batch) {
                    sequenceWriter.write(record);
                }
            }
            // wait for storing to complete
            storageFuture.get();
        }
        logger.info("Exporting took {} seconds", stopwatch.stop().elapsed(TimeUnit.SECONDS));
        return AsyncResult.forValue(true);
    } catch (Exception ex) {
        logger.error("Failed to export data", ex);
        return AsyncResult.forValue(false);
    }
}
The above code does a few things:

- Uses a SequenceWriter to continuously write records. It is initialized with an OutputStream, to which everything is written. This could be a simple FileOutputStream or a piped stream, as discussed below. Note that the naming here is a bit misleading: writeValues(out) sounds like you are instructing the writer to write something now; instead, it configures the writer to use the given stream later. (A minimal, self-contained example follows this list.)
- The SequenceWriter is initialized with true, which means "wrap in array." You are writing many records of the same kind, so they should form an array in the final JSON.
- Uses PipedOutputStream and PipedInputStream to link the SequenceWriter to an InputStream, which is then passed to a storage service. If we were explicitly working with files, there would be no need for that; simply passing a FileOutputStream would do. However, you may want to store the file differently, e.g. in Amazon S3, where the putObject call requires an InputStream from which it reads data and stores it in S3. So, in effect, you are writing to an OutputStream that is directly fed into an InputStream, which, when read from, hands over everything that was written to the OutputStream. (A small piped-stream demo and a hypothetical S3 sketch also follow the list.)
- Storing the file is invoked in a separate thread so that writing to the file does not block the current thread, whose job is to read from the database. Again, this would not be needed if a simple FileOutputStream were used.
- The whole method is marked @Async (Spring) so that it doesn't block the caller; it gets invoked and finishes when ready, using an internal Spring executor service with a limited thread pool.
- The database batch-reading code is not shown here, as it varies depending on the database. The point is that you should fetch your data in batches rather than with one big SELECT * FROM X. (A sketch of one possible implementation appears after the list.)
- The OutputStream is wrapped in a GZIPOutputStream, as text formats like JSON, with their repetitive elements, benefit significantly from compression.
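To make the first point concrete, here is a minimal, self-contained sketch of SequenceWriter on its own, outside the export pipeline. The Person class and the people.json file name are made up for illustration; the Jackson calls (writeValues, init, write) are the same ones used above:

import java.io.File;
import java.util.List;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SequenceWriter;

public class SequenceWriterDemo {

    // Trivial value type, invented for this demo
    static class Person {
        public String name;
        public int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // writeValues(...) only configures the target; nothing is written yet
        try (SequenceWriter seq = mapper.writer().writeValues(new File("people.json"))) {
            seq.init(true); // true = wrap the written values in a JSON array
            for (Person p : List.of(new Person("Alice", 30), new Person("Bob", 25))) {
                seq.write(p); // appends one element; the full dataset is never held as JSON in memory
            }
        } // closing the writer emits the closing ']'
    }
}

Running this produces [{"name":"Alice","age":30},{"name":"Bob","age":25}] in people.json; without init(true), you would get a stream of top-level values instead of one JSON array.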
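The piped-stream trick is also easier to see in isolation. Below is a tiny demo (assuming Java 9+ for readAllBytes): one thread plays the role of the storage service and reads from the PipedInputStream, while the main thread writes into the connected PipedOutputStream. Everything written on one end comes out on the other, with the pipe's internal buffer in between:

import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PipeDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (PipedInputStream in = new PipedInputStream();
                PipedOutputStream out = new PipedOutputStream(in)) {
            // the "storage" side reads from the pipe on another thread
            Future<String> consumer = pool.submit(() ->
                    new String(in.readAllBytes(), StandardCharsets.UTF_8));
            out.write("hello through the pipe".getBytes(StandardCharsets.UTF_8));
            out.close(); // end-of-stream: lets readAllBytes() return
            System.out.println(consumer.get()); // prints "hello through the pipe"
        } finally {
            pool.shutdown();
        }
    }
}

Note that the reader must run on a different thread: if a single thread writes more than the pipe's buffer can hold before reading, it blocks forever.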
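For the S3 case mentioned above, a hypothetical storageProvider.storeFile implementation using the v1 AWS SDK could look roughly like this; the bucket name, client construction, and method signature are all assumptions for the sketch, not code from the article:

import java.io.InputStream;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class S3StorageProvider {

    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    private final String bucket = "customer-exports"; // hypothetical bucket name

    public void storeFile(String path, InputStream in) {
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentType("application/gzip");
        // putObject reads the stream until EOF, i.e. until the writing side
        // closes the pipe; without a known Content-Length the SDK may buffer,
        // so very large exports are better served by a multipart upload
        s3.putObject(bucket, path, in, metadata);
    }
}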
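Finally, since the batch-reading code is left out, here is one hypothetical way to implement readDatabaseBatch with Spring's JdbcTemplate and LIMIT/OFFSET pagination. The table, columns, and Record shape are invented for the sketch:

import java.util.List;

import org.springframework.jdbc.core.JdbcTemplate;

public class RecordBatchReader {

    private static final int BATCH_SIZE = 1000;

    private final JdbcTemplate jdbcTemplate;

    public RecordBatchReader(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Returns an empty list once batchNumber points past the last row,
    // which is exactly the termination condition the export loop checks
    public List<Record> readDatabaseBatch(int batchNumber) {
        return jdbcTemplate.query(
                "SELECT id, payload FROM records ORDER BY id LIMIT ? OFFSET ?",
                (rs, rowNum) -> new Record(rs.getString("id"), rs.getString("payload")),
                BATCH_SIZE, (long) batchNumber * BATCH_SIZE);
    }

    // Minimal stand-in for the article's Record type
    public static class Record {
        public final String id;
        public final String payload;

        public Record(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }
}

One design note: OFFSET pagination rescans the skipped rows and slows down as the offset grows, so for very large tables, keyset pagination (WHERE id > :lastSeenId ORDER BY id LIMIT :batchSize) keeps each batch cheap.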
The main work is done by Jackson's SequenceWriter, and the (kind of obvious) point to take home is: don't assume your data will fit in memory. It almost never does, so do everything in batches and with incremental writes. Hope this helps!
Published at DZone with permission of Bozhidar Bozhanov, DZone MVB.