---
title: Classify text with a BERT model
type: guide
tier: all
order: 35
hide_menu: true
hide_frontmatter_title: true
meta_title: BERT-based text classification
meta_description: Tutorial on how to use BERT-based text classification with your Label Studio project
categories:
- Natural Language Processing
- Text Classification
- BERT
- Hugging Face
---
The NewModel is a BERT-based text classification model designed to work with Label Studio. It uses the Hugging Face Transformers library to fine-tune a BERT model on the labeled data from Label Studio. Once connected to Label Studio, the fine-tuned model makes predictions on new data.
Before you begin, you must install the Label Studio ML backend.
This tutorial uses the bert_classifier example.
Start the ML backend on http://localhost:9090 with the prebuilt image:
docker-compose up
Then validate that the backend is running:
$ curl http://localhost:9090/
{"status":"UP"}
Then connect Label Studio to the backend at http://localhost:9090.
Warning! The ML backend currently loads models dynamically from huggingface.co, so you may need to set the HF_TOKEN environment variable. Because of this dynamic loading, the first prediction request can be slow. If Label Studio times out (for example, no predictions are visible when you open a task), check the ML backend logs for errors and refresh the page after a few minutes.
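The curl check shown earlier can also be scripted, which may be handy before pointing Label Studio at the backend. The following is a minimal sketch that assumes the health endpoint returns the `{"status":"UP"}` body shown in this guide; the function names `is_backend_up` and `check_backend` are hypothetical helpers, not part of the ML backend.

```python
import json
import urllib.request


def is_backend_up(body: str) -> bool:
    """Return True if a health response body reports status "UP"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def check_backend(url: str = "http://localhost:9090/", timeout: float = 5.0) -> bool:
    """Fetch the health endpoint and report whether the backend is up."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # Connection refused, HTTP errors, and timeouts all count as "not up".
        return False
    return is_backend_up(body)


if __name__ == "__main__":
    print("backend up:", check_backend())
```

Note that a healthy response only shows that the server is reachable. It does not show that the model has finished loading, so the first prediction may still be slow.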
To build the ML backend from source, you have to clone the repository and build the Docker image:
docker-compose build
To run the ML backend without Docker, you have to clone the repository and install all dependencies using pip:
python -m venv ml-backend
source ml-backend/bin/activate
pip install -r requirements.txt
Then you can start the ML backend:
label-studio-ml start ./dir_with_your_model
In project Settings > Labeling Interface > Browse Templates > Natural Language Processing > Text Classification, you can find the default labeling configuration for text classification in Label Studio. This configuration includes a single <Choices> output and a single <Text> input.
Feel free to modify the set of labels in the <Choices> tag to match your specific task, for example:
<View>
  <Text name="text" value="$text" />
  <Choices name="label" toName="text" choice="single" showInLine="true">
    <Choice value="label one" />
    <Choice value="label two" />
    <Choice value="label three" />
  </Choices>
</View>
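To see how the labels in this configuration relate to a prediction, the sketch below reads the `<Choices>` tag with the Python standard library and wraps a predicted label in the result structure that Label Studio uses for `choices` predictions. The helpers `read_choices` and `build_prediction` are hypothetical and only illustrate the mapping; the bert_classifier example may read the configuration differently.

```python
import xml.etree.ElementTree as ET

LABEL_CONFIG = """
<View>
  <Text name="text" value="$text" />
  <Choices name="label" toName="text" choice="single" showInLine="true">
    <Choice value="label one" />
    <Choice value="label two" />
    <Choice value="label three" />
  </Choices>
</View>
"""


def read_choices(config_xml: str):
    """Return (from_name, to_name, labels) for the first <Choices> tag."""
    root = ET.fromstring(config_xml)
    choices = root.find(".//Choices")
    if choices is None:
        raise ValueError("labeling config has no <Choices> tag")
    labels = [choice.get("value") for choice in choices.findall("Choice")]
    return choices.get("name"), choices.get("toName"), labels


def build_prediction(config_xml: str, label: str, score: float) -> dict:
    """Wrap one predicted label in a Label Studio style prediction dict."""
    from_name, to_name, labels = read_choices(config_xml)
    if label not in labels:
        raise ValueError(f"unknown label: {label!r}")
    return {
        "result": [
            {
                "from_name": from_name,  # name of the <Choices> tag
                "to_name": to_name,      # name of the <Text> tag it annotates
                "type": "choices",
                "value": {"choices": [label]},
            }
        ],
        "score": score,
    }


if __name__ == "__main__":
    print(build_prediction(LABEL_CONFIG, "label two", 0.9))
```

If you rename the `<Choices>` or `<Text>` tags, `from_name` and `to_name` in the predictions must change with them, which is why the sketch reads them from the configuration rather than hard-coding them.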
Parameters can be set in docker-compose.yml before running the container.
The following common parameters are available:

- BASIC_AUTH_USER - Specify the basic auth user for the model server
- BASIC_AUTH_PASS - Specify the basic auth password for the model server
- LOG_LEVEL - Set the log level for the model server
- WORKERS - Specify the number of workers for the model server
- THREADS - Specify the number of threads for the model server
- BASELINE_MODEL_NAME: The name of the baseline model to use for training. Default is bert-base-multilingual-cased.

The following parameters are available for training:

- LABEL_STUDIO_HOST (required): The URL of the Label Studio instance. Default is http://localhost:8080.
- LABEL_STUDIO_API_KEY (required): The API key for the Label Studio instance.
- START_TRAINING_EACH_N_UPDATES: The number of labeled tasks to download from Label Studio before starting training. Default is 10.
- LEARNING_RATE: The learning rate for the model training. Default is 2e-5.
- NUM_TRAIN_EPOCHS: The number of epochs for model training. Default is 3.
- WEIGHT_DECAY: The weight decay for the model training. Default is 0.01.
- FINETUNED_MODEL_NAME: The name of the fine-tuned model. Default is finetuned_model. Checkpoints will be saved under this name.

Note: The LABEL_STUDIO_API_KEY is required for training the model. You can find the API key in Label Studio on the Account & Settings page.
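As an illustration of how these parameters and their defaults fit together, the sketch below collects them from environment variables into one object. The variable names and default values come from the lists above; the `TrainingConfig` class and its methods are hypothetical and are not necessarily how the bert_classifier example reads its settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TrainingConfig:
    label_studio_host: str
    label_studio_api_key: Optional[str]
    baseline_model_name: str
    finetuned_model_name: str
    start_training_each_n_updates: int
    learning_rate: float
    num_train_epochs: int
    weight_decay: float

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "TrainingConfig":
        """Read each parameter, falling back to the documented default."""
        return cls(
            label_studio_host=env.get("LABEL_STUDIO_HOST", "http://localhost:8080"),
            label_studio_api_key=env.get("LABEL_STUDIO_API_KEY"),
            baseline_model_name=env.get(
                "BASELINE_MODEL_NAME", "bert-base-multilingual-cased"
            ),
            finetuned_model_name=env.get("FINETUNED_MODEL_NAME", "finetuned_model"),
            start_training_each_n_updates=int(
                env.get("START_TRAINING_EACH_N_UPDATES", "10")
            ),
            learning_rate=float(env.get("LEARNING_RATE", "2e-5")),
            num_train_epochs=int(env.get("NUM_TRAIN_EPOCHS", "3")),
            weight_decay=float(env.get("WEIGHT_DECAY", "0.01")),
        )

    def require_api_key(self) -> str:
        """Return the API key, or fail early if training cannot proceed."""
        if not self.label_studio_api_key:
            raise ValueError("LABEL_STUDIO_API_KEY is required for training")
        return self.label_studio_api_key


if __name__ == "__main__":
    print(TrainingConfig.from_env())
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a set of overrides before writing them into docker-compose.yml.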
The ML backend can be customized by adding your own models and logic inside the ./bert_classifier directory.