How to Rasterize PDFs in Java
This article discusses the advantages and disadvantages of rasterizing PDF documents, and it provides a free API solution to help convert vector PDFs to raster format.
Incorporating a method for converting vector PDFs to raster PDFs in any file upload/download application is a great way to expand its utility. This makes it possible for users to share or download smaller and more secure versions of their PDFs on command, a process that particularly benefits signed contracts and invoices, confidential reports, and other sensitive materials.
Further down the page, I've provided a free-to-use API solution to help make this conversion in Java. Before we proceed, however, let's first understand the nature of the operation.
What Is PDF Rasterization?
At a high level, rasterization refers to the process of converting two-dimensional digital content into a pixel-based image; scanning a physical document into a digital format produces the same kind of result. PDF rasterization replaces PDF vector data (shapes described mathematically as individual lines, curves, etc., plotted on a coordinate grid) and PDF text data (plain text objects) with a pixel-based version of that content, rendered at a fixed resolution measured in dots per inch (DPI). This process differs from PDF to PNG, PDF to JPG, and other common PDF-to-image conversions in that the bitmap image is rendered within a new PDF file rather than saved as a standalone image.
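To make the idea concrete, here is a minimal sketch of what "replacing vector data with pixels" means, using only the Java standard library. It is not how the API described later works internally; the class name `RasterSketch`, the 150 DPI setting, and the single diagonal line standing in for page content are all assumptions chosen for illustration. PDF coordinates are measured in points (1/72 inch), so the pixel size of the output follows directly from the page size and the chosen DPI.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Line2D;
import java.awt.image.BufferedImage;

class RasterSketch {
    /** Converts a length in PDF points (1/72 inch) to pixels at the given DPI. */
    static int toPixels(double points, int dpi) {
        return (int) Math.round(points / 72.0 * dpi);
    }

    /** Renders one vector line (hypothetical page content) onto a white bitmap. */
    static BufferedImage rasterize(double widthPt, double heightPt, int dpi) {
        int w = toPixels(widthPt, dpi);
        int h = toPixels(heightPt, dpi);
        BufferedImage image = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            g.setColor(Color.BLACK);
            // Scale so that drawing coordinates are expressed in points.
            g.scale(dpi / 72.0, dpi / 72.0);
            // The line is described by two endpoints; after this call it
            // exists only as coloured pixels in the image.
            g.draw(new Line2D.Double(0, 0, widthPt, heightPt));
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points; 150 DPI is an assumed setting.
        BufferedImage page = rasterize(612, 792, 150);
        System.out.println(page.getWidth() + " x " + page.getHeight() + " pixels");
        // prints "1275 x 1650 pixels"
    }
}
```

Once the line has been drawn, the endpoints are gone: the bitmap only records which pixels are black, which is why the conversion cannot be reversed directly.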
What Are the Pros and Cons of Rasterizing PDFs?
There are a few pros and cons to rasterizing a PDF, and whether to make this conversion depends entirely on the use case. For example, one downside of rasterizing PDF content is the drop in quality that we’ll notice when we zoom in on (i.e., increase the size of) our document contents, which makes this operation a poor choice for documents displaying things like architectural layouts or any other fine-detail, interaction-oriented content. Vector images retain quality in closeup view because they are redrawn from their geometric description at every zoom level, whereas raster images have a fixed number of pixels and appear “pixelated” when they're enlarged, in much the same way any bitmap image does. Additionally, once our content is rasterized, we can't convert it directly back into its original format; we’ll have to rely on OCR (optical character recognition) solutions to extract text components instead, and we won’t be able to access our original image files, hyperlinks, or other multimedia components.
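The loss of quality on zoom comes down to simple arithmetic, sketched below. The helper name `effectiveDpi` and the 150 DPI render resolution are assumptions for illustration only: a page rasterized at a fixed DPI has a fixed number of pixels per inch of page, so magnifying the page spreads those same pixels over more displayed inches.

```java
import java.util.Locale;

class ZoomSketch {
    /** Source pixels available per displayed inch once the page is magnified. */
    static double effectiveDpi(int renderDpi, double zoomFactor) {
        if (zoomFactor <= 0) {
            throw new IllegalArgumentException("zoomFactor must be positive");
        }
        return renderDpi / zoomFactor;
    }

    public static void main(String[] args) {
        int renderDpi = 150; // assumed resolution chosen at rasterization time
        for (double zoom : new double[] {1.0, 2.0, 4.0}) {
            System.out.println(String.format(Locale.ROOT,
                    "%.0f%% zoom: %.1f source pixels per displayed inch",
                    zoom * 100, effectiveDpi(renderDpi, zoom)));
        }
    }
}
```

At 400% zoom, a page rendered at 150 DPI offers only 37.5 source pixels per displayed inch, so each original pixel is stretched across a visible block. A vector page has no such limit because it is redrawn at whatever resolution the viewer requests.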
On the other hand, as a direct consequence of removing text, links, images, and other individual components from the file, rasterization can make our PDF document considerably harder to tamper with. Unauthorized third parties won't be able to edit our original content in PDF editing applications because there won’t be any real text, image, or hyperlink information for them to access. The only changes that can be made are regular image editing operations such as cropping, resizing, filtering, adjusting brightness/colors, and so on. Further, raster formatting can increase the compatibility range of our document by making it viewable outside of PDF editing applications, and, particularly in the case of extremely large PDF files, it can greatly reduce the document's overall file size, which makes our PDF much less costly to store and share.
Demonstration
In the remainder of this article, I’ll briefly demonstrate a free API that we can use to make vector PDF to raster PDF conversions at scale in our applications. This API will process all document data in memory and release that data upon completion of the conversion to ensure document security.
We can easily structure our PDF Rasterization API call using the ready-to-run Java code examples provided further down the page, and we can authorize our request with a free-tier API key to make up to 800 conversions per month.
Our first step is to install the Java SDK. We can install with Maven by first adding a reference to the repository in pom.xml:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
And then adding a reference to the dependency in pom.xml:
<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v4.25</version>
    </dependency>
</dependencies>
We can now add the imports to the top of our file and call the function directly after. We can include our API key in the Apikey.setApiKey line:
// Import classes:
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.EditPdfApi;
import java.io.File;

public class RasterizePdfExample {
    public static void main(String[] args) {
        ApiClient defaultClient = Configuration.getDefaultApiClient();

        // Configure API key authorization: Apikey
        ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
        Apikey.setApiKey("YOUR API KEY");
        // Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
        //Apikey.setApiKeyPrefix("Token");

        EditPdfApi apiInstance = new EditPdfApi();
        File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on.
        try {
            // The response is the rasterized PDF as raw bytes.
            byte[] result = apiInstance.editPdfRasterize(inputFile);
            // Printing a byte[] directly only shows its object reference, so report the size instead.
            System.out.println("Received " + result.length + " bytes");
        } catch (ApiException e) {
            System.err.println("Exception when calling EditPdfApi#editPdfRasterize");
            e.printStackTrace();
        }
    }
}
When the operation is complete, we’ll receive the contents of our new file as a byte array, and we can write those bytes to a new PDF document.
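Writing the returned bytes to disk might look like the sketch below. It uses only the Java standard library; the class name `SaveRasterizedPdf`, the helper names, and the output file name are hypothetical, and the placeholder bytes in `main` stand in for the `byte[]` returned by the rasterize call. The header check is a cheap sanity test based on the fact that PDF files begin with `%PDF-`; it does not validate the document.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SaveRasterizedPdf {
    private static final byte[] PDF_HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the data starts with the standard PDF header bytes. */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_HEADER.length) {
            return false;
        }
        for (int i = 0; i < PDF_HEADER.length; i++) {
            if (data[i] != PDF_HEADER[i]) {
                return false;
            }
        }
        return true;
    }

    /** Writes the bytes to outputPath, creating or overwriting the file. */
    static Path save(byte[] pdfBytes, String outputPath) throws IOException {
        return Files.write(Paths.get(outputPath), pdfBytes);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder standing in for the byte[] returned by the API call.
        byte[] result = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        if (looksLikePdf(result)) {
            Path written = save(result, "rasterized-output.pdf"); // hypothetical file name
            System.out.println("Wrote " + Files.size(written) + " bytes to " + written);
        } else {
            System.err.println("Response does not look like a PDF; not writing it.");
        }
    }
}
```

In a real application, the `save` call would sit inside the `try` block of the API example, directly after the rasterize call returns.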