Getting Started With OCR docTR on Ubuntu

Explore docTR, an open-source Optical Character Recognition (OCR) solution. Walk through what is needed to install it on Ubuntu as well as a test example.

Sylvain Kalache

May. 08, 24 · Tutorial

Likes (1)

Comment

Save

2.8K Views

In this tutorial, I'll explore how to set up and utilize docTR, the open-source OCR (Optical Character Recognition) solution of the document parsing API startup Mindee. I’ll go through what you need to install docTR on Ubuntu. It accepts PDFs, images, and even a website URL as an input. In this example, I will parse a grocery store receipt. Let’s get started.

Setting Up docTR on Ubuntu

docTR is compatible with any Linux distribution, macOS, and Windows. It is also available as a Docker image. I will use Ubuntu 22.04 LTS (Jammy Jellyfish) for this tutorial. Hardware-wise, you don’t need anything specific, but if you want to do extensive testing, I recommend using a GPU instance; OVHcloud offers affordable options, with servers starting at less than a dollar per hour.

Let’s start by installing Python. At the time of writing, docTR requires Python 3.8 (or higher).

    Shell
   
   sudo apt install -y  python3

To avoid messing with system libraries, let’s use a virtual environment.

    Shell
   
   sudo apt install -y python3.10-venv
python3 -m venv testing-Mindee-docTR

Then we install the OpenGL Mesa 3D Graphics Library, used for the computer vision part of docTR.

    Shell
   
   sudo apt install -y libgl1-mesa-glx

We install pango, which is a text layout engine library.

    Shell
   
   sudo apt-get install -y libpango-1.0-0 libpangoft2-1.0-0

Then, we install pip so that we can install docTR.

    Shell
   
   sudo apt install -y python3-pip

Finally, we install docTR within our virtual environment. This version is specifically for PyTorch. If you choose to use TensorFlow, change the command accordingly.

    Shell
   
   testing-Mindee-docTR/bin/pip3 install "python-doctr[torch]"

Using docTR

Now that docTR is installed, let’s start playing with it. In this example, I will test it with a grocery store receipt. You can download the receipt using the command below.

    Shell
   
   wget "https://media.istockphoto.com/id/889405434/vector/realistic-paper-shop-receipt-vector-cashier-bill-on-white-background.jpg?s=612x612&w=0&k=20&c=M2GxEKh9YJX2W3q76ugKW23JRVrm0aZ5ZwCZwUMBgAg=" -O receipt.jpeg

Create a testing-docTR.py file and insert the following code into it.

    Python
   
   from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load the grocery receipt
doc = DocumentFile.from_images("receipt.jpeg")

# Load the OCR model
model = ocr_predictor(pretrained=True)

# Perform OCR
result = model(doc)

# Display the OCR result
print(result.export())

Note that docTR uses a two-stage approach:

First, it performs text detection to localize words.
Then, it conducts text recognition to identify all characters in the word.

The ocr_predictor function accepts additional parameters to select the text detection and recognition architecture. For simplicity, I used the default ones in this example. You can find information about other models on the docTR documentation.

Reading a Receipt Using docTR

Now you just need to run your Python script:

    Shell
   
   testing-Mindee-docTR/bin/python3 testing-docTR.py

You will get an output such as the one below:

    JSON
   
   {"pages": [{"page_idx": 0, "dimensions": [612, 612], "orientation": {"value": null, "confidence": null}, "language": {"value": null, "confidence": null}, "blocks": [{"geometry": [[0.44140625, 0.1201171875], [0.548828125, 0.14453125]], "lines": [{"geometry": [[0.44140625, 0.1201171875], [0.548828125, 0.14453125]], "words": [{"value": "RECEIPT", "confidence": 0.9695481061935425, "geometry": [[0.44140625, 0.1201171875], [0.548828125, 0.14453125]]}]}], "artefacts": []}]}]}

Note that I drastically shortened the JSON output for readability and only kept the part showing the “RECEIPT” word.

Here is the JSON structure you’d be looking at without truncating the result. I have expanded the part of the tree that I kept in the JSON output. docTR will provide a bunch of information about the document but the important part is about how it breaks down the document into lines, and for each line, provides an array containing the words it detected along with the degree of confidence. Here, we can see it spotted the word RECEIPT with a confidence of 96%.

docTR offers an efficient OCR solution that simplifies text recognition processes. Depending on the document type, you may need to change the text detection and text recognition architectures to improve accuracy. Comprehensive docTR documentation is available here.

Considerations When Using docTR

Deploying docTR entails certain complexities. First, you must create a dataset and train docTR to achieve satisfactory accuracy. This means dealing with data annotation on many images. Since OCR systems typically serve as backend services for other apps, it may be necessary to integrate docTR via an API and scale it according to the app’s needs. docTR does not provide this out of the box, but there are many open-source technologies that can help facilitate this step.

Conclusion

Document processing technologies have come a long way since the advent of OCR tools, which are limited to character recognition. Intelligent Document Processing (IDP) platforms represent the next step; they utilize OCR (such as docTR) along with additional layers of intelligence like table reconstruction, document classification, and natural language understanding, to achieve better accuracy and precision.

Additionally, for those seeking a scalable IDP solution without the complexities of data collection and model training, I recommend trying out Mindee’s latest solution, docTI. This training-free IDP solution leverages Large Language Models (LLMs) to eliminate the need for data collection, annotation, and the model training process. You can use the free-tier plan, configure an instance, and start querying the API in minutes.

API Data set Document Python (language) ubuntu

Opinions expressed by DZone contributors are their own.

Related

Trending