Reading an HTML File, Parsing It and Converting It to a PDF File With the Pdfbox Library

In this article, we will read an HTML file from a specified folder and replace variables with their actual values.

Erkin Karanlık

Nov. 22, 23 · Tutorial

In this article, we will read an HTML file from a folder we specify, parse it, and replace the variables in its content with their actual values. We will then convert the modified HTML to a PDF file with the "openhtmltopdf-pdfbox" library.

I hope this serves as a reference for anyone who needs to do this conversion in a Java project. An example project is shown below.

First, we create an input folder, from which we will read the input HTML file, and an output folder, to which we will write the PDF file.

Place the HTML file in the input folder. In the HTML, we define a key that will be replaced; in this example, the key is #NAME#. In Java, we will replace this key with a value sent from outside the application.

Plain Text

input folder:  \ConvertHtmlToPDF\input
output folder: \ConvertHtmlToPDF\output
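If the folders do not exist yet, they can also be created from Java. The following is a minimal sketch using `java.nio.file.Files.createDirectories` (Java 11+ for `Path.of`); the `FolderSetup` class and its base-directory argument are hypothetical and not part of the original project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: creates the input and output folders under a base directory.
public class FolderSetup {

    // Creates <baseDir>/input and <baseDir>/output, including any missing parents.
    // Files.createDirectories does not fail if a folder already exists.
    public static Path[] createFolders(Path baseDir) throws IOException {
        Path input = Files.createDirectories(baseDir.resolve("input"));
        Path output = Files.createDirectories(baseDir.resolve("output"));
        return new Path[] {input, output};
    }

    public static void main(String[] args) throws IOException {
        // Assumed base directory, named after the folders used in this article
        Path[] folders = createFolders(Path.of("ConvertHtmlToPDF"));
        System.out.println("Input folder:  " + folders[0].toAbsolutePath());
        System.out.println("Output folder: " + folders[1].toAbsolutePath());
    }
}
```

Calling `createFolders` again is harmless, so it can safely run at every application start.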

HTML

<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE html>
<html lang="tr">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    </meta>
    <title>Convert Html to Pdf</title>
    <style type="text/css">
      body {
        font-family: "Times New Roman", Times, serif;
        font-size: 40px;
      }
    </style>
  </head>
  <body topmargin="0" leftmargin="0" rightmargin="0" bottommargin="0">
    <table width="700" border="0" cellspacing="0" cellpadding="0" style="background-color: #1859AB;
                                 color: white;
                                 font-size: 14px;
                                 border-radius: 1px;
                                 line-height: 1em; height: 30px;">
      <tbody>
        <tr>
          <td>
            <strong style="color:#F8CD00;">   Hello </strong>#NAME#
          </td>
        </tr>
      </tbody>
    </table>
  </body>
</html>
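The template is written as XHTML: it starts with an XML declaration, and even the meta element is explicitly closed. openhtmltopdf expects well-formed XHTML by default, so an unclosed tag such as a bare `<br>` makes the conversion fail. One way to catch such mistakes before rendering is to parse the template with the JDK's built-in XML parser. `XhtmlCheck` below is a hypothetical helper, not part of the original project.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical pre-flight check: is the HTML template well-formed XML?
public class XhtmlCheck {

    public static boolean isWellFormed(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Throw on parse errors instead of printing "[Fatal Error]" to stderr
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (SAXException | IOException e) {
            // Malformed markup, e.g. an unclosed tag or a bare '&'
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the XML declaration must be the very first thing in the file. Even leading whitespace before `<?xml` is a parse error.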
  

Creating a New Project

We create a new Spring Boot project. I am using IntelliJ IDEA.

Controller

To replace the key in the HTML with a value, we will send that value from outside the application, so we write a REST service for it.

We create the "ConvertHtmlToPdfController.java" class under the controller package and add a GET method called "convertHtmlToPdf" to it. The value is passed to this method dynamically as a path variable, as follows.

Java

package com.works.controller;

import com.works.service.ConvertHtmlToPdfService;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("")
public class ConvertHtmlToPdfController {

    private final ConvertHtmlToPdfService convertHtmlToPdfService;

    // Constructor injection of the service implementation
    public ConvertHtmlToPdfController(ConvertHtmlToPdfService convertHtmlToPdfService) {
        this.convertHtmlToPdfService = convertHtmlToPdfService;
    }

    // GET /convertHtmlToPdf/{variableValue}: the path segment replaces #NAME# in the template.
    // The value comes from the URL path only, so @RequestBody is not used here.
    @GetMapping("/convertHtmlToPdf/{variableValue}")
    public ResponseEntity<String> convertHtmlToPdf(@PathVariable("variableValue") String variableValue) {
        try {
            return ResponseEntity.ok(convertHtmlToPdfService.convertHtmlToPdf(variableValue));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

}
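Once the application is running, the endpoint can be called from any HTTP client. The sketch below uses the JDK's `java.net.http.HttpClient` (Java 11+). It assumes the application listens on `localhost` at Spring Boot's default port 8080, and the class and method names are hypothetical. The multi-argument `java.net.URI` constructor quotes characters that are illegal in a path, such as spaces, and `toASCIIString()` percent-encodes non-ASCII characters as UTF-8, so names with Turkish letters can be sent safely.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for the convertHtmlToPdf endpoint (assumes localhost:8080).
public class ConvertHtmlToPdfClient {

    // Builds http://<host>:<port>/convertHtmlToPdf/<value> with the value encoded for a URL path.
    // A '/' inside the value is legal in a path and is NOT escaped, so it would add a segment.
    public static URI buildUri(String host, int port, String value) throws URISyntaxException {
        URI uri = new URI("http", null, host, port, "/convertHtmlToPdf/" + value, null, null);
        // toASCIIString() also percent-encodes non-ASCII characters as UTF-8
        return URI.create(uri.toASCIIString());
    }

    public static void main(String[] args) throws Exception {
        // "Erkin Karanl\u0131k" is written with a Unicode escape for the dotless i
        URI uri = buildUri("localhost", 8080, "Erkin Karanl\u0131k");
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The service returns "success" once the PDF has been written
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

For a quick manual test, opening the same URL in a browser works as well, since the endpoint is a plain GET.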
 
  

Service

Java

package com.works.service.impl;

import com.works.service.ConvertHtmlToPdfService;
import com.works.util.ConvertHtmlToPdfUtil;
import io.micrometer.common.util.StringUtils;
import org.springframework.stereotype.Service;

@Service("convertHtmlToPdfService")
public class ConvertHtmlToPdfServiceImpl implements ConvertHtmlToPdfService {

    private String setVariableValue(String htmlContent, String key, String value) {
        // String.replace substitutes literally; replaceAll would treat '$' and '\' in the value as special
        if (StringUtils.isNotEmpty(value)) {
            htmlContent = htmlContent.replace("#" + key + "#", value);
        } else {
            htmlContent = htmlContent.replace("#" + key + "#", "");
        }

        return htmlContent;
    }

    @Override
    public String convertHtmlToPdf(String variableValue) throws Exception {
        String inputFile = "/convertHtmlToPDF/input/input.html";
        String outputFile = "/convertHtmlToPDF/output/output.pdf";
        String fontFile = "/convertHtmlToPDF/input/times.ttf";

        try {
            String htmlContent = ConvertHtmlToPdfUtil.readFileAsString(inputFile);
            htmlContent = setVariableValue(htmlContent, "NAME", variableValue);
            ConvertHtmlToPdfUtil.htmlConvertToPdf(htmlContent, outputFile, fontFile);

        } catch (Exception e) {
            throw new Exception("convertHtmlToPdf - An error was received in the service : ", e);
        }
        return "success";
    }
}
 
  

The ConvertHtmlToPdfService interface declares the convertHtmlToPdf method, which takes a String variableValue parameter; ConvertHtmlToPdfServiceImpl above implements it.
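The article does not show the ConvertHtmlToPdfService interface itself. Judging from the controller and the implementation, a minimal version could look like the following. This is a reconstruction. In the project it would presumably live in the `com.works.service` package; the package line is omitted here so the snippet compiles on its own.

```java
// Reconstructed service contract (not shown in the original article).
// In the project it would be declared in package com.works.service.
public interface ConvertHtmlToPdfService {

    // Reads the HTML template, replaces #NAME# with variableValue, writes the PDF,
    // and returns a status message ("success" in ConvertHtmlToPdfServiceImpl).
    String convertHtmlToPdf(String variableValue) throws Exception;
}
```

Because the interface has a single method, a test can stub it with a lambda such as `value -> "success"` without starting Spring.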

In the convertHtmlToPdf service method:

The "inputFile" variable holds the path of the input HTML file to read from the input folder.

The "outputFile" variable holds the path of the PDF file to write to the output folder.

The font file can also be loaded from outside the application, for example from the input folder. The "fontFile" variable holds the path of that font file.

The following line passes the input file path to the ConvertHtmlToPdfUtil.readFileAsString method, which reads the HTML file from the input folder:

String htmlContent = ConvertHtmlToPdfUtil.readFileAsString(inputFile);

Java

package com.works.util;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;

public class ConvertHtmlToPdfUtil {

    public static void safeCloseBufferedReader(BufferedReader bufferedReader) throws Exception {
        try {
            if (bufferedReader != null) {
                bufferedReader.close();
            }
        } catch (IOException e) {
            throw new Exception("safeCloseBufferedReader  - the method got an error. " + e.getMessage());
        }
    }

    public static String readFileAsString(String filePath) throws Exception {
        BufferedReader br = null;
        String encoding = "UTF-8";

        try {

            br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), encoding));

            StringBuilder fileContentBuilder = new StringBuilder();
            String line;

            while ((line = br.readLine()) != null) {
                if (fileContentBuilder.length() > 0) {
                    fileContentBuilder.append(System.getProperty("line.separator"));
                }
                fileContentBuilder.append(line);
            }

            return fileContentBuilder.toString();

        } catch (Exception e) {
            // Propagate the error instead of silently returning null
            throw new Exception("readFileAsString - the method got an error. " + e.getMessage(), e);
        } finally {
            safeCloseBufferedReader(br);
        }
    }

    // Renders the given XHTML string to a PDF file at filePath, using the font file at fontFile.
    public static void htmlConvertToPdf(String html, String filePath, String fontFile) throws Exception {
        // try-with-resources closes the stream exactly once, even if rendering fails
        try (OutputStream os = new FileOutputStream(filePath)) {
            final PdfRendererBuilder pdfBuilder = new PdfRendererBuilder();
            pdfBuilder.useFastMode();
            // The second argument is the base URI for relative resources (none here)
            pdfBuilder.withHtmlContent(html, null);
            // fontFile is already the full path to times.ttf, so no extra file name is appended.
            // The font is registered under the family name "Times" used in the template's CSS.
            pdfBuilder.useFont(new File(fontFile), "Times", null, null, false);
            pdfBuilder.toStream(os);
            pdfBuilder.run();
        }
    }

    public static String concatPath(String path, String... subPathArr) {
        for (String subPath : subPathArr) {
            if (!path.endsWith(File.separator)) {
                path += File.separator;
            }
            path += subPath;
        }

        return path;
    }
}
 
  

In the ConvertHtmlToPdfUtil.readFileAsString method, the HTML file is opened as a byte stream with FileInputStream, decoded into characters as UTF-8 by InputStreamReader, and buffered by BufferedReader.

The BufferedReader is then read line by line, as shown in the code block below. The lines are appended to a StringBuilder with the system line separator between them, so the whole HTML content ends up in a single string. When we are done, the safeCloseBufferedReader method closes the reader.

Java

br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), encoding));
StringBuilder fileContentBuilder = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
    if (fileContentBuilder.length() > 0) {
        fileContentBuilder.append(System.getProperty("line.separator"));
    }
    fileContentBuilder.append(line);
}
return fileContentBuilder.toString();
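Since Java 7, the same line-by-line reading can be written with try-with-resources, which closes the reader automatically and makes a separate close helper such as safeCloseBufferedReader unnecessary. The sketch below uses a hypothetical `FileReadSketch` class and keeps the behavior of joining lines with the system line separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical alternative to ConvertHtmlToPdfUtil.readFileAsString.
public class FileReadSketch {

    public static String readFileAsString(Path filePath) throws IOException {
        // The reader is closed automatically when the block exits, normally or with an exception
        try (BufferedReader br = Files.newBufferedReader(filePath, StandardCharsets.UTF_8)) {
            StringBuilder content = new StringBuilder();
            String line;
            // A flag (rather than checking the builder's length) keeps leading empty lines intact
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (!first) {
                    content.append(System.lineSeparator());
                }
                content.append(line);
                first = false;
            }
            return content.toString();
        }
    }
}
```

One behavioral difference is worth knowing. `Files.newBufferedReader` reports malformed UTF-8 input as an `IOException`, whereas `InputStreamReader` silently substitutes replacement characters.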
  

Next, we pass the HTML content to the setVariableValue method. It replaces the key marked as #key# in the HTML (here, #NAME#) with the value sent to the service from outside.

Java

private String setVariableValue(String htmlContent, String key, String value) {
    // String.replace substitutes literally; replaceAll would treat '$' and '\' in the value as special
    if (StringUtils.isNotEmpty(value)) {
        htmlContent = htmlContent.replace("#" + key + "#", value);
    } else {
        htmlContent = htmlContent.replace("#" + key + "#", "");
    }
    return htmlContent;
}
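Two details matter when filling the template. First, `String.replaceAll` treats its first argument as a regular expression and its replacement as a pattern in which `$` and `\` are special, so a value such as `$5` throws an exception; `String.replace` works literally. Second, because openhtmltopdf parses the template as XHTML, a value containing `<` or `&` would break the document unless it is escaped. The sketch below, with a hypothetical `TemplateFiller` class, fills several `#KEY#` placeholders from a map and escapes the values. It is an illustration, not part of the original project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: fills #KEY# placeholders in an XHTML template.
public class TemplateFiller {

    // Escapes the characters that are special in XML text and attribute values.
    public static String escapeXml(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Replaces every #key# with the escaped value; a null value clears the placeholder.
    public static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            String value = entry.getValue() == null ? "" : escapeXml(entry.getValue());
            // String.replace is literal: no regex, no special meaning for '$' or '\'
            result = result.replace("#" + entry.getKey() + "#", value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("NAME", "Tom & Jerry");
        values.put("PRICE", "$5");
        // prints: <td>Hello Tom &amp; Jerry, you owe $5</td>
        System.out.println(fill("<td>Hello #NAME#, you owe #PRICE#</td>", values));
    }
}
```

Keys are replaced one after another, so a value that itself contains another `#KEY#` marker could be substituted by a later key. Escaping `#` as well would prevent that if it matters.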
  

After the replacement, we call the ConvertHtmlToPdfUtil.htmlConvertToPdf method to render the HTML content as a PDF file. The method takes the HTML content, the output file path, and the font file path as inputs, which we pass as follows:

Java

ConvertHtmlToPdfUtil.htmlConvertToPdf(htmlContent, outputFile, fontFile);

Inside the ConvertHtmlToPdfUtil.htmlConvertToPdf method, we first create a new FileOutputStream for the output path. This creates the output.pdf file we specified; the output folder itself must already exist.

Next, we create a PdfRendererBuilder:

Java

final PdfRendererBuilder pdfBuilder = new PdfRendererBuilder();

The PdfRendererBuilder class is in the com.openhtmltopdf.pdfboxout package, which is provided by the openhtmltopdf-pdfbox library. Therefore, we must add this library to the pom.xml file as follows.

pom.xml

XML

<dependency>
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-pdfbox</artifactId>
    <version>1.0.10</version>
</dependency>
  

The complete pom.xml of the project:

XML

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>3.1.1</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.works</groupId>
	<artifactId>convertHtmlToPDF</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>convertHtmlToPDF</name>
	<description>convertHtmlToPDF</description>
	<properties>
		<java.version>17</java.version>
		<spring-cloud.version>2022.0.3</spring-cloud.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<dependency>
			<groupId>com.openhtmltopdf</groupId>
			<artifactId>openhtmltopdf-pdfbox</artifactId>
			<version>1.0.10</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.cloud</groupId>
			<artifactId>spring-cloud-starter-zookeeper-discovery</artifactId>
			<version>4.0.1</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
				<configuration>
					<image>
						<builder>paketobuildpacks/builder-jammy-base:latest</builder>
					</image>
				</configuration>
			</plugin>
		</plugins>
	</build>

</project>
 
  

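After creating the PdfRendererBuilder object, we enable fast mode with useFastMode and pass the HTML content to withHtmlContent (its second argument is the base URI for relative resources, null here). We pass the font file to useFont, which registers it under the family name "Times" used in the template's CSS. We set the output stream with toStream. Finally, we render the PDF with the run method.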

Java

pdfBuilder.useFastMode();
pdfBuilder.withHtmlContent(html, null);
pdfBuilder.useFont(new File(fontFile), "Times", null, null, false);
pdfBuilder.toStream(os);
pdfBuilder.run();
  

After pdfBuilder.run() completes, the output.pdf file should be created in the output folder.
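To check the result programmatically, for example in an integration test, it is enough to confirm that the file exists and starts with the `%PDF-` header that every PDF file begins with. `PdfOutputCheck` is a hypothetical helper, and the path in `main` is assumed to match `outputFile` in the service.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical check that a generated file looks like a PDF.
public class PdfOutputCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the file exists and starts with the "%PDF-" header.
    public static boolean looksLikePdf(Path file) throws IOException {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes (Java 11+) returns fewer bytes if the file is shorter
            byte[] header = in.readNBytes(PDF_MAGIC.length);
            return Arrays.equals(header, PDF_MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location, matching outputFile in ConvertHtmlToPdfServiceImpl
        Path output = Path.of("/convertHtmlToPDF/output/output.pdf");
        System.out.println(output + (looksLikePdf(output) ? " is a PDF" : " is missing or not a PDF"));
    }
}
```

This only verifies the header, not the rendered content. Opening the file in a PDF viewer is still the easiest way to check the layout and the font.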

That completes the HTML-to-PDF conversion with the openhtmltopdf-pdfbox library.


Opinions expressed by DZone contributors are their own.
