How To Perform OCR on a Photograph of a Receipt Using Java
Learn about the challenges of processing physical receipts for digital expensing, and discover an OCR API that helps address them.
The purpose of this article is to demonstrate an API that is specifically designed to perform OCR (Optical Character Recognition) operations on photographs of receipts and extract key business information from them automatically, such as the name and address of the business, the phone number, the receipt total, and much more. Further down the page, I’ve provided code examples and instructions to help you structure an API call in Java.
There are dozens of costs associated with running a business, and efforts to manage those costs vary in complexity. Corporate expenditures such as office rent, salaries, and vendor contracts arrive as recurring, manageable invoices that internal teams (e.g., accounts payable) can handle directly. Employee expenditures such as client dinners, taxi rides, and team outings, on the other hand, require corporate reimbursement, which can only be processed with proof of the employee’s transactions. For the employee, proving such transactions entails presenting a receipt to the business. Along with displaying the all-important total cost of the outing, receipts provide other useful information that the employee’s business can verify, including the name of the venue the employee visited, its website, address, and phone number, and a list of the specific goods or services purchased there. As simple as the receipt-expensing process may appear, however, it often suffers from a major technological deficiency: most businesses have fully digitized their payroll and expensing procedures, yet receipts are still often obtained in hard-copy form. As a result, converting a physical receipt into digital form presents a relevant business technology challenge.
At a high level, Optical Character Recognition refers to the digitization of hard-copy text into data that can be easily stored in, queried from, and transferred out of a database. To achieve digitization, a scanned or photographed image of a document is processed by an OCR API or application, and the characters on the document are singled out by a recognition algorithm. Each page of the document is broken down into various components (blocks of text, tables, images, etc.), and each character from those sections is recreated digitally, either through comparison with a stored set of alphabetical or numerical characters or through recognition of unique shapes and features. The adoption of OCR in businesses across the world has helped to increase workflow efficiency. Not only has it reduced the overhead cost of data entry services, but it has also made data with hard-copy origins more accessible. Further, it has improved the security of this data considerably: once a digital copy exists, the information is far less likely to be lost forever in the case of a fire, robbery, or similarly devastating event.
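To make the "comparison with a stored set of characters" idea concrete, here is a toy sketch of template matching. It assumes each character has already been isolated and reduced to a tiny 3x5 black-and-white grid, which is far simpler than what a production OCR engine does. The `GlyphMatcher` class and its three templates are made up for illustration and are not how the Cloudmersive API is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy illustration of template-based character recognition on 3x5 binary glyphs. */
final class GlyphMatcher {
    // Each template is a 3x5 bitmap flattened row by row; '#' is ink, '.' is background.
    private static final Map<Character, String> TEMPLATES = new LinkedHashMap<>();
    static {
        TEMPLATES.put('0', "###" + "#.#" + "#.#" + "#.#" + "###");
        TEMPLATES.put('1', ".#." + "##." + ".#." + ".#." + "###");
        TEMPLATES.put('7', "###" + "..#" + ".#." + ".#." + ".#.");
    }

    /** Counts the cells where two equally sized bitmaps differ (Hamming distance). */
    static int distance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("bitmaps must have the same size");
        }
        int differences = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                differences++;
            }
        }
        return differences;
    }

    /** Returns the template character whose bitmap is closest to the scanned glyph. */
    static char recognize(String glyph) {
        char best = '?';
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<Character, String> entry : TEMPLATES.entrySet()) {
            int d = distance(glyph, entry.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A "7" with one noisy pixel missing in the top-left corner.
        String noisySeven = ".##" + "..#" + ".#." + ".#." + ".#.";
        System.out.println(recognize(noisySeven)); // prints 7
    }
}
```

Picking the nearest template rather than demanding an exact match is what lets this approach tolerate a small amount of noise in the photographed character.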
Before smartphones, scanning documents using a bulky office scanning device was the most common method of creating a digital copy of a document for OCR purposes (or for object storage). Nowadays, with the abundance of handheld smartphones in circulation and a growing cultural acceptance of remote work, phone cameras have become extremely relevant DIY (Do It Yourself) tools for OCR. Taking crude photos of a document from a personal device and sending those photos directly to a relevant stakeholder is now considered standard practice for many professional transactions, receipt-expensing included.
Receipts are among the documents that benefit most from being photographed for OCR on a personal device. They are rarely collected at convenient times: we often receive them while leaving a taxi, walking out of a restaurant, or leaving a shopping center with bags of goods in our hands. Further, receipts are typically printed on flimsy material and are more prone to damage than most other physical documents if they are not stored carefully. This makes storing and handing over a physical receipt harder than it is for most other documents we receive, which might arrive laminated or tucked neatly within a manila envelope. The ability to take a quick photo of a receipt for OCR and send that photo directly to an expense-processing application reduces the burden on the employee of managing the physical document over an extended period. This convenience improves expense-processing efficiency, since receipts can be funneled through a single entry point and stored both as an object (e.g., an image file such as JPG or PNG) and as searchable text within a separate document or application.
Adding a receipt OCR service to your business is straightforward with the Cloudmersive Receipt OCR API. This API supports dozens of common languages (including English, Arabic, Chinese, and more) and outputs the important information contained within a receipt, including the receipt timestamp, business name, business website, business address, business phone number, receipt items (with a description and price per item), and the receipt total and subtotal. The API can optionally be configured to use advanced recognition or handwriting recognition modes, and you may turn on an optional preprocessing mode that automatically enhances the image before OCR takes place (this corrects for some minor errors that may have occurred while taking the original photo). Below, I will demonstrate how to structure your API call with Java.
To begin, we need to install the SDK package with Maven. To do so, let’s first add a reference to the repository in pom.xml:
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
Next, let’s add a reference to the dependency in pom.xml:
<dependencies>
<dependency>
<groupId>com.github.Cloudmersive</groupId>
<artifactId>Cloudmersive.APIClient.Java</artifactId>
<version>v4.25</version>
</dependency>
</dependencies>
Once installation is finished, we can include the imports at the top of the file:
// Import classes:
import java.io.File;
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.ImageOcrApi;
import com.cloudmersive.client.model.ReceiptRecognitionResult;
Then, we can call the Receipt OCR function using the code examples below. At this stage, you’ll need to lay out a few parameters and decide which optional features you wish to include. Your parameters include the following:
- Your input file to perform the operation on
- Your Cloudmersive API key (this can be obtained by registering a free account on our website, which allows up to 800 API calls per month)
And your optional features include the following:
- Enable Advanced Recognition Mode by setting recognitionMode equal to the string Advanced.
- Enable Handwriting Recognition Mode by setting recognitionMode equal to the string EnableHandwriting.
- Set the language (the default is English) by setting language equal to the preferred language’s three-letter identifier, such as ENG.
- Enable preprocessing (disabled by default) by setting preprocessing equal to the string Advanced.
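Since every optional feature is passed as a plain string, a typo such as "advanced" is easy to miss. The sketch below is one way to catch that before making a call. `ReceiptOcrOptions` is a hypothetical helper of my own and is not part of the Cloudmersive SDK. The allowed values are taken from the options described above. The language check only tests the general shape of a code (for example ENG or ZHO-HANT), not whether the API actually supports it.

```java
import java.util.Set;

/** Hypothetical helper (not part of the Cloudmersive SDK) that validates the optional settings. */
final class ReceiptOcrOptions {
    private static final Set<String> RECOGNITION_MODES = Set.of("Advanced", "EnableHandwriting");
    private static final Set<String> PREPROCESSING_MODES = Set.of("None", "Advanced");

    final String recognitionMode; // null in this helper means "no special recognition mode chosen"
    final String language;
    final String preprocessing;

    ReceiptOcrOptions(String recognitionMode, String language, String preprocessing) {
        if (recognitionMode != null && !RECOGNITION_MODES.contains(recognitionMode)) {
            throw new IllegalArgumentException("Unknown recognitionMode: " + recognitionMode);
        }
        // Shape check only: three capital letters, optionally followed by a suffix such as -HANT.
        if (language == null || !language.matches("[A-Z]{3}(-[A-Z]+)?")) {
            throw new IllegalArgumentException("Language should look like ENG or ZHO-HANT: " + language);
        }
        if (preprocessing == null || !PREPROCESSING_MODES.contains(preprocessing)) {
            throw new IllegalArgumentException("Unknown preprocessing mode: " + preprocessing);
        }
        this.recognitionMode = recognitionMode;
        this.language = language;
        this.preprocessing = preprocessing;
    }

    public static void main(String[] args) {
        ReceiptOcrOptions options = new ReceiptOcrOptions("Advanced", "ENG", "Advanced");
        System.out.println(options.recognitionMode + " / " + options.language + " / " + options.preprocessing);
    }
}
```

The three validated fields can then be handed to the API call as its recognitionMode, language, and preprocessing arguments.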
ApiClient defaultClient = Configuration.getDefaultApiClient();
// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");
ImageOcrApi apiInstance = new ImageOcrApi();
File imageFile = new File("/path/to/inputfile"); // File | Image file to perform OCR on. Common file formats such as PNG, JPEG are supported.
String recognitionMode = "Advanced"; // String | Optional, enable advanced recognition mode by specifying 'Advanced', enable handwriting recognition by specifying 'EnableHandwriting'. Default is disabled.
String language = "ENG"; // String | Optional, language of the input document, default is English (ENG). Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle-French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Haitian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old - Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old-Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish)
String preprocessing = "Advanced"; // String | Optional, preprocessing mode, default is 'None'. Possible values are None (no preprocessing of the image), and 'Advanced' (automatic image enhancement of the image before OCR is applied; this is recommended and needed to handle rotated receipts).
try {
ReceiptRecognitionResult result = apiInstance.imageOcrPhotoRecognizeReceipt(imageFile, recognitionMode, language, preprocessing);
System.out.println(result);
} catch (ApiException e) {
System.err.println("Exception when calling ImageOcrApi#imageOcrPhotoRecognizeReceipt");
e.printStackTrace();
}
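Before spending an API call, it can be worth checking that the input file is actually there and looks like an image. The following is a minimal sketch of such a pre-flight check. `ReceiptImageCheck` is a hypothetical helper, and the list of accepted extensions is my own assumption based on the PNG and JPEG formats mentioned above. The API may well accept other formats too.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical pre-flight check for the image file passed to the OCR call. */
final class ReceiptImageCheck {
    // Assumption: only these extensions are accepted here; the API may support more formats.
    private static final Set<String> EXTENSIONS = Set.of("png", "jpg", "jpeg");

    /** Returns an empty Optional when the file looks usable, otherwise a description of the problem. */
    static Optional<String> problemWith(Path image) {
        if (!Files.isRegularFile(image)) {
            return Optional.of("not a regular file: " + image);
        }
        if (!Files.isReadable(image)) {
            return Optional.of("file is not readable: " + image);
        }
        String name = image.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        String extension = dot < 0 ? "" : name.substring(dot + 1);
        if (!EXTENSIONS.contains(extension)) {
            return Optional.of("unexpected file extension: '" + extension + "'");
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a real receipt photo.
        Path sample = Files.createTempFile("receipt", ".jpg");
        System.out.println(problemWith(sample)); // prints Optional.empty
        Files.delete(sample);
        System.out.println(problemWith(sample).isPresent()); // prints true, the file is gone
    }
}
```

This only inspects the file name and permissions. It does not verify that the contents are a valid image.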
Once you’ve configured your optional features, your API call is complete and ready for testing. Below, I've included an example JSON response model for your reference:
{
"Successful": true,
"Timestamp": "2022-07-14T14:33:18.565Z",
"BusinessName": "string",
"BusinessWebsite": "string",
"AddressString": "string",
"PhoneNumber": "string",
"ReceiptItems": [
{
"ItemDescription": "string",
"ItemPrice": 0
}
],
"ReceiptSubTotal": 0,
"ReceiptTotal": 0
}
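Because OCR can occasionally misread a digit, one simple sanity check on a response like the one above is to confirm that the item prices add up to the reported subtotal, and to flag the receipt for manual review when they do not. The sketch below uses a hypothetical `Item` class as a stand-in for an entry of ReceiptItems rather than the SDK's own model classes, and the sample values are invented for illustration.

```java
import java.math.BigDecimal;
import java.util.List;

/** Sketch of a consistency check between "ReceiptItems" and "ReceiptSubTotal". */
final class ReceiptCheck {
    /** Hypothetical stand-in for one entry of "ReceiptItems"; not the SDK's model class. */
    static final class Item {
        final String description;
        final double price;

        Item(String description, double price) {
            this.description = description;
            this.price = price;
        }
    }

    /** Adds up the item prices using BigDecimal to avoid binary floating-point drift. */
    static BigDecimal sumOfItems(List<Item> items) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Item item : items) {
            sum = sum.add(BigDecimal.valueOf(item.price));
        }
        return sum;
    }

    /** True when the item prices add up to the reported subtotal, within half a cent. */
    static boolean itemsMatchSubtotal(List<Item> items, double receiptSubTotal) {
        BigDecimal difference = sumOfItems(items)
                .subtract(BigDecimal.valueOf(receiptSubTotal))
                .abs();
        return difference.compareTo(new BigDecimal("0.005")) < 0;
    }

    public static void main(String[] args) {
        // Invented sample values, not real API output.
        List<Item> items = List.of(new Item("Espresso", 3.50), new Item("Bagel", 4.25));
        System.out.println(itemsMatchSubtotal(items, 7.75)); // prints true
        System.out.println(itemsMatchSubtotal(items, 9.75)); // prints false
    }
}
```

A mismatch does not necessarily mean the OCR result is wrong (a receipt may include discounts or fees that are not listed as items), so treating it as a prompt for review rather than a hard failure is probably the safer choice.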
To further enhance the quality of your images for OCR, I recommend researching preprocessing APIs in depth and testing those that may resolve your most common issues. For example, OCR photos often show a document at a slight angle on top of a dark background. With effective preprocessing services, you may be able to detect and correct those angles automatically, helping your OCR application detect text with the highest possible degree of accuracy.