---
title: Interactive bounding boxes OCR with Tesseract
type: guide
tier: all
order: 55
hide_menu: true
hide_frontmatter_title: true
meta_title: Interactive bounding boxes OCR in Label Studio with a Tesseract backend
meta_description: Tutorial for how to use Label Studio and Tesseract to assist with your OCR projects
categories:
- Computer Vision
- Optical Character Recognition
- Tesseract
---
Use an OCR engine for interactive ML-assisted labeling, facilitating faster
annotation for layout detection, classification, and recognition
models.
This example uses Tesseract for OCR, but only minimal adaptation is needed to connect other OCR
engines or models.

Tested against Label Studio 1.10.1, with basic support for both Label Studio
Local File Storage and S3-compatible storage, including an example storage setup that uses
MinIO.
Before you begin:
* Ensure Git is installed.
* Ensure Docker Compose is installed. For macOS and Windows users,
  we suggest using Docker Desktop.
You must also install the Label Studio ML backend.
This tutorial uses the tesseract example.
Launch Label Studio. You can follow the guide from the official documentation or use the following commands:
```
docker run -it \
    -p 8080:8080 \
    -v `pwd`/mydata:/label-studio/data \
    heartexlabs/label-studio:latest
```
Optionally, you may enable local file serving in Label Studio:
```
docker run -it \
    -p 8080:8080 \
    -v `pwd`/mydata:/label-studio/data \
    --env LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true \
    --env LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/data/images \
    heartexlabs/label-studio:latest
```
If you're using local file serving, be sure to get a copy of the API token from
Label Studio to connect the model.
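Before wiring the token into the backend, it can help to confirm that Label Studio accepts it. The following is a minimal sketch, not part of the tutorial's code: the helper names are hypothetical, and it assumes legacy token authentication (an `Authorization: Token ...` header) and uses the `/api/projects` endpoint only as a convenient authenticated call.

```python
import urllib.request


def build_authenticated_request(host, token, path="/api/projects"):
    """Build a GET request that carries the API token (hypothetical helper)."""
    url = host.rstrip("/") + path
    # Assumption: legacy token authentication, sent as "Token <value>".
    return urllib.request.Request(url, headers={"Authorization": "Token " + token})


def token_is_accepted(host, token, timeout=10.0):
    """Return True if Label Studio answers the authenticated request with HTTP 200."""
    try:
        request = build_authenticated_request(host, token)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Covers connection failures, HTTP errors such as 401, and malformed URLs.
        return False
```

For example, `token_is_accepted("http://localhost:8080", "<your token>")` should return `True` once Label Studio is running and the token is valid.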
Create a new project for Tesseract OCR. In the project Settings, set up the Labeling Interface.
Fill in the following template code. It's important to specify `smart="true"` in `RectangleLabels`.
```
<View>
  <!-- Assumption: the image is read from the "ocr" key of each task; change $ocr to match your data -->
  <Image name="image" value="$ocr"/>
  <RectangleLabels name="bbox" toName="image" strokeWidth="1" smart="true">
    <Label value="Label1" background="green"/>
    <Label value="Label2" background="blue"/>
    <Label value="Label3" background="red"/>
  </RectangleLabels>
  <TextArea name="transcription" toName="image"
            editable="true" perRegion="true" required="false"
            maxSubmissions="1" rows="5" placeholder="Recognized Text"
            displayMode="region-list"/>
</View>
```
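To make the role of the two control tags concrete, here is a rough sketch of the results a backend could return for one recognized region with this template. It assumes the usual Label Studio result layout, in which coordinates are percentages of the image size and the box and its per-region text share one `id`. The function names are hypothetical, and this is not the code from `tesseract.py`.

```python
import uuid


def to_percent(box_px, image_width, image_height):
    """Convert an (x, y, width, height) pixel box to percentages of the image size."""
    x, y, w, h = box_px
    return {
        "x": 100.0 * x / image_width,
        "y": 100.0 * y / image_height,
        "width": 100.0 * w / image_width,
        "height": 100.0 * h / image_height,
        "rotation": 0,
    }


def build_region(box_px, image_size, label, text):
    """Return the two linked results (box and transcription) for one region."""
    image_width, image_height = image_size
    region_id = uuid.uuid4().hex[:10]  # the shared id ties the text to its box
    geometry = to_percent(box_px, image_width, image_height)
    common = {
        "id": region_id,
        "to_name": "image",  # matches toName in the template
        "original_width": image_width,
        "original_height": image_height,
    }
    box = {**common, "from_name": "bbox", "type": "rectanglelabels",
           "value": {**geometry, "rectanglelabels": [label]}}
    transcription = {**common, "from_name": "transcription", "type": "textarea",
                     "value": {**geometry, "text": [text]}}
    return [box, transcription]
```

The `from_name` values must match the `name` attributes in the template (`bbox` and `transcription`); if you rename the tags, the results need to change accordingly.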
Download the Label Studio Machine Learning backend repository.
```
git clone https://github.com/humansignal/label-studio-ml-backend
cd label-studio-ml-backend/label_studio_ml/examples/tesseract
```
Configure the parameters in the `example.env` file:
```
LABEL_STUDIO_HOST=http://host.docker.internal:8080
LABEL_STUDIO_ACCESS_TOKEN=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_ENDPOINT=http://host.docker.internal:9000
MINIO_ROOT_USER=
MINIO_ROOT_PASSWORD=
MINIO_API_CORS_ALLOW_ORIGIN=*
```
Depending on your data ingestion method, several configurations are possible:
Local File Storage
If you opted to use Label Studio Local File Storage, be sure to set the LABEL_STUDIO_HOST and LABEL_STUDIO_ACCESS_TOKEN variables.
S3-Compatible Storage (MinIO or AWS S3)
Configure the backend and the MinIO server by editing the `MINIO_ROOT_USER` and `MINIO_ROOT_PASSWORD` variables, and set the
`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variables to those same values. You may optionally connect to your
own AWS cloud storage by setting the `AWS_*` variables to your AWS credentials instead. Note that you may need to modify the
`tesseract.py` file to match your particular infrastructure configuration.
Note: If you're using this method, remove
`LABEL_STUDIO_ACCESS_TOKEN` from the `example.env` file or leave it empty.
Other remote storage
If you host your images on any other public storage with HTTP or HTTPS access, leave the default `example.env` file unchanged.
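The three configurations above can be summarized as a small pre-flight check on `example.env`. This is only a sketch of the rules as stated in this guide: the function names and the mode names (`local`, `s3`, `remote`) are hypothetical, and the backend itself performs no such validation as far as this guide describes.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def check_env(env, mode):
    """Return a list of problems for mode 'local', 's3', or 'remote'."""
    problems = []
    if mode == "local":
        for key in ("LABEL_STUDIO_HOST", "LABEL_STUDIO_ACCESS_TOKEN"):
            if not env.get(key):
                problems.append(key + " must be set for Local File Storage")
    elif mode == "s3":
        if env.get("LABEL_STUDIO_ACCESS_TOKEN"):
            problems.append("LABEL_STUDIO_ACCESS_TOKEN should be empty for S3-compatible storage")
        pairs = (("AWS_ACCESS_KEY_ID", "MINIO_ROOT_USER"),
                 ("AWS_SECRET_ACCESS_KEY", "MINIO_ROOT_PASSWORD"))
        for aws_key, minio_key in pairs:
            if not env.get(aws_key):
                problems.append(aws_key + " must be set for S3-compatible storage")
            elif env.get(minio_key) and env[aws_key] != env[minio_key]:
                # Only compared when the MinIO variable is set (the MinIO example).
                problems.append(aws_key + " should equal " + minio_key)
    elif mode != "remote":
        raise ValueError("unknown mode: " + repr(mode))
    return problems
```

Reading the file with `parse_env(open("example.env").read())` and passing the result to `check_env` gives an empty list when the settings are consistent with the chosen mode.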
Start the Tesseract backend and MinIO servers:
```
docker compose up
```
If you're using the Label Studio Local File Storage option, upload images
directly to Label Studio using the Label Studio interface.
If you're using MinIO for task storage, log into the MinIO control panel at
http://localhost:9001. Create a new bucket, making a note of the name, and
upload your tasks to MinIO.
Set the visibility of the tasks to be public.
Further configuration of your cloud storage is beyond the scope of this
tutorial, and you will want to configure your storage according to your
particular needs.
If you're using MinIO, go to the Cloud Storage page in the project settings.
Add your source S3 storage by connecting to the S3 Endpoint
http://host.docker.internal:9000, using the bucket name from the previous
step, and the Access Key ID and Secret Access Key as configured in the previous
steps. For the MinIO example, uncheck Use pre-signed URLs. Check the
connection and save the storage.
From the project settings, select the Model page and click Connect Model.
Add the URL http://host.docker.internal:9090 and save the model as an ML backend.
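If the connection fails, a quick check that the backend is reachable can narrow things down. The sketch below assumes the backend exposes the `/health` route that Label Studio ML backends serve and that it reports a `status` of `UP`; the helper names are hypothetical. From the host machine the backend is typically reached at `http://localhost:9090` (assuming the compose file publishes that port), while `host.docker.internal` is the address as seen from inside the Label Studio container.

```python
import json
import urllib.request


def health_url(base_url):
    """Return the health-check URL for a backend base URL."""
    return base_url.rstrip("/") + "/health"


def is_backend_up(base_url, timeout=5.0):
    """Return True if the backend answers /health with a status of 'UP'."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # Covers refused connections, HTTP errors, bad URLs, and non-JSON bodies.
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"
```

Calling `is_backend_up("http://localhost:9090")` after `docker compose up` has finished starting should return `True` if the backend is ready to be connected.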
To use this functionality, activate Auto-Annotation and use the Autodetect rectangle tool to draw boxes.
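Conceptually, each box you draw is sent to the backend as a region whose coordinates are percentages of the image size, and the backend has to turn that into a pixel crop before running OCR on it. The following is a minimal sketch of that conversion under those assumptions; `region_to_crop_box` is a hypothetical helper, not the function used in `tesseract.py`.

```python
def region_to_crop_box(value, image_width, image_height):
    """Convert percent-based x/y/width/height into a pixel (left, top, right, bottom) box."""
    left = value["x"] * image_width / 100.0
    top = value["y"] * image_height / 100.0
    right = left + value["width"] * image_width / 100.0
    bottom = top + value["height"] * image_height / 100.0
    # Round to whole pixels and clamp to the image bounds.
    return (
        max(0, round(left)),
        max(0, round(top)),
        min(image_width, round(right)),
        min(image_height, round(bottom)),
    )
```

With Pillow and pytesseract installed, the resulting tuple could be passed to `Image.crop(...)` and the cropped image to `pytesseract.image_to_string(...)` to obtain the text for that region.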

Reference links:
- https://labelstud.io/blog/improve-ocr-quality-for-receipt-processing-with-tesseract-and-label-studio
- https://labelstud.io/blog/release-130.html