---
title: TimelineLabels ML Backend for Label Studio
type: guide
tier: all
order: 51
hide_menu: true
hide_frontmatter_title: true
meta_title: TimelineLabels ML Backend for Label Studio
meta_description: Tutorial on how to use an example ML backend for Label Studio with TimelineLabels
categories:
- Computer Vision
- Video Classification
- Temporal Labeling
- LSTM
---
This documentation provides a clear and comprehensive guide on how to use the TimelineLabels model
for temporal multi-label classification of video data in Label Studio.
By integrating an LSTM neural network on top of YOLO's classification capabilities —
specifically utilizing features from YOLO's last layer — the model handles temporal labeling tasks.
Users can easily customize neural network parameters directly within the labeling configuration
to tailor the model to their specific use cases or use this model as a foundation for further development.
In trainable mode, you begin by annotating a few samples by hand. Each time you click Submit, the model retrains on the annotation you just provided. Once the model starts predicting your trained labels on new tasks, it automatically populates the timeline with its predictions. You can validate or correct these labels, and updating them retrains the model again, helping you improve it iteratively.
Tip: If you're looking for a more advanced approach to temporal classification, check out the VideoMAE model. While we don't provide an example backend for VideoMAE, you can integrate it as your own ML backend.
Before you begin, you need to install the Label Studio ML backend.
This tutorial uses the YOLO example. See the main README for detailed instructions on setting up the YOLO-models family in Label Studio.
```xml
<View>
  <TimelineLabels name="label" toName="video"
    model_trainable="true"
    model_classifier_epochs="1000"
    model_classifier_sequence_size="16"
    model_classifier_hidden_size="32"
    model_classifier_num_layers="1"
    model_classifier_f1_threshold="0.95"
    model_classifier_accuracy_threshold="0.99"
    model_score_threshold="0.5"
  >
    <Label value="Ball touch" background="red"/>
    <Label value="Ball in frame" background="blue"/>
  </TimelineLabels>
  <Video name="video" value="$video" height="700" frameRate="25.0" timelineHeight="200" />
</View>
```
IMPORTANT: You must set the `frameRate` attribute in the `<Video>` tag to the correct value, and all of your videos should have the same frame rate. Otherwise, the submitted annotations will be misaligned with the videos.
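To see why a wrong `frameRate` misaligns annotations, consider how a frame index maps to a timestamp. This is a minimal illustration; `frame_to_time` is a hypothetical helper, not part of the backend:

```python
def frame_to_time(frame: int, fps: float) -> float:
    """Convert a 1-based frame index to a timestamp in seconds."""
    return (frame - 1) / fps

# Frame 251 of a 25-fps video starts at exactly 10.0 s.
print(frame_to_time(251, 25.0))  # 10.0

# If the config wrongly declares 30 fps, the same frame maps to ~8.33 s,
# so every submitted region drifts relative to the real video.
print(round(frame_to_time(251, 30.0), 2))  # 8.33
```

The drift grows with the frame index, which is why a mismatched frame rate shifts later regions more than earlier ones.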
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_trainable` | bool | False | Enables trainable mode, allowing the model to learn from your annotations incrementally. |
| `model_classifier_epochs` | int | 1000 | Number of training epochs for the LSTM neural network. |
| `model_classifier_sequence_size` | int | 16 | Size of the LSTM sequence in frames. Adjust to capture longer or shorter temporal dependencies; 16 frames is about 0.64 s at a 25 fps frame rate. |
| `model_classifier_hidden_size` | int | 32 | Size of the LSTM hidden state. Modify to change the capacity of the LSTM. |
| `model_classifier_num_layers` | int | 1 | Number of LSTM layers. Increase for a deeper LSTM network. |
| `model_classifier_f1_threshold` | float | 0.95 | F1 score threshold for early stopping during training. Set to prevent overfitting. |
| `model_classifier_accuracy_threshold` | float | 1.00 | Accuracy threshold for early stopping during training. Set to prevent overfitting. |
| `model_score_threshold` | float | 0.5 | Minimum confidence threshold for predictions. Labels with confidence below this threshold are disregarded. |
| `model_path` | string | None | Path to a custom YOLO model. See the section "Your own custom models". |
Note: You can customize the neural network parameters directly in the labeling configuration by adjusting the attributes in the `<TimelineLabels>` tag.
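As an illustration of how `model_score_threshold` behaves, per-frame labels whose predicted probability falls below the threshold are dropped. This is a hedged sketch; `apply_score_threshold` is a hypothetical name, and the actual filtering lives inside the backend's conversion code:

```python
def apply_score_threshold(frame_probs, labels, threshold=0.5):
    """Keep only labels whose per-frame probability reaches the threshold."""
    return [
        [label for label, p in zip(labels, probs) if p >= threshold]
        for probs in frame_probs
    ]

labels = ["Ball touch", "Ball in frame"]
probs = [[0.9, 0.2], [0.4, 0.7], [0.6, 0.55]]
print(apply_score_threshold(probs, labels, threshold=0.5))
# [['Ball touch'], ['Ball in frame'], ['Ball touch', 'Ball in frame']]
```

Raising the threshold trades recall for precision: fewer spurious timeline regions, but short or low-confidence events may be missed.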
In the simple mode, the model uses pre-trained YOLO classes to generate predictions without additional training.
Set `model_trainable="false"` in the labeling config, or omit it, since `false` is the default.

```xml
<View>
  <Video name="video" value="$video" height="700" frameRate="25.0" timelineHeight="200" />
  <TimelineLabels name="label" toName="video" model_trainable="false">
    <Label value="Ball" predicted_values="soccer_ball"/>
    <Label value="tiger_shark" />
  </TimelineLabels>
</View>
```
The trainable mode enables the model to learn from your annotations incrementally.
It uses the pre-trained YOLO classification model with a custom LSTM neural network on top to capture temporal dependencies in video data. The LSTM model is trained from scratch, so it requires about 10–20 well-annotated videos of roughly 500 frames each (~20 seconds at 25 fps) to start making meaningful predictions.
To enable it, set `model_trainable="true"` in the labeling config. The neural network parameters can be customized through the `model_classifier_*` attributes on the `TimelineLabels` tag. The `partial_fit()` method allows the model to train incrementally with each new annotation.

Note: The `predicted_values` attribute in the `<Label>` tag has no effect for trainable models.
Example:
```xml
<View>
  <Video name="video" value="$video" height="700" frameRate="25.0" timelineHeight="200" />
  <TimelineLabels name="label" toName="video"
    model_trainable="true"
    model_classifier_epochs="1000"
    model_classifier_sequence_size="16"
    model_classifier_hidden_size="32"
    model_classifier_num_layers="1"
    model_classifier_f1_threshold="0.95"
    model_classifier_accuracy_threshold="0.99"
    model_score_threshold="0.5">
    <Label value="Ball in frame"/>
    <Label value="Ball touch"/>
  </TimelineLabels>
</View>
```
The trainable mode uses a custom implementation of the temporal LSTM classification model.
The model is trained incrementally with each new annotation submit or update,
and it generates predictions for each frame in the video.
It uses a pre-trained YOLO classification model (e.g., `yolov8n-cls.pt`) to extract features from video frames (see `utils/neural_nets.py::cached_feature_extraction()`). You can load your own YOLO models using the steps described in the main README. However, your model should have an architecture similar to the `yolov8-cls` models; see `utils/neural_nets.py::cached_feature_extraction()` for more details.
The cache is located in `/app/cache_dir` and stores the intermediate features extracted from the last layer of the YOLO model. It is used for incremental training on the fly and to speed up predictions.
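The backend's actual cache is built on joblib's `Memory`; as a rough stdlib sketch of the same idea (results keyed by video path and extraction parameters, persisted to disk so repeated requests skip the expensive YOLO pass), something like this could work. All names here are illustrative, not the backend's real code:

```python
import hashlib
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # the real backend uses /app/cache_dir

def cached(extract_fn):
    """Cache extraction results on disk, keyed by video path + params."""
    def wrapper(video_path, **params):
        key = hashlib.sha256(
            repr((video_path, sorted(params.items()))).encode()
        ).hexdigest()
        cache_file = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(cache_file):
            with open(cache_file, "rb") as f:
                return pickle.load(f)
        result = extract_fn(video_path, **params)
        with open(cache_file, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper

calls = []

@cached
def extract_features(video_path, layer="last"):
    calls.append(video_path)          # stand-in for an expensive YOLO pass
    return [0.1, 0.2, 0.3]

extract_features("video.mp4")
extract_features("video.mp4")         # second call is served from the cache
print(len(calls))                     # 1 — the expensive function ran once
```

This is why the first prediction on a video is slow while subsequent training and prediction passes over the same video are fast.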
Training happens in `partial_fit()`. Annotated regions are first converted to per-frame label arrays with `utils/converter.py::convert_timelinelabels_to_probs()`, and the feature sequence is split into `model_classifier_sequence_size` chunks. The model uses a binary cross-entropy loss (`BCEWithLogitsLoss`) with positive class weights to address class imbalance. See `timeline_labels.py::fit()` for more details.

```xml
<View>
  <TimelineLabels name="videoLabels" toName="video">
    <Label value="Ball touch" background="red"/>
    <Label value="Ball in frame" background="blue"/>
  </TimelineLabels>
  <Video name="video" value="$video" height="700" timelineHeight="200" frameRate="25.0" />
</View>
```
Connect the model backend: create a new project, go to Settings > Model, and add the YOLO backend.
1. Navigate to the `yolo` folder of this repository in your terminal.
2. Update your `docker-compose.yml` file.
3. Execute `docker compose up` to run the backend.
4. Connect this backend to your Label Studio project in the project settings. Make sure that Interactive Preannotations is OFF (this is the default).
Use the `<TimelineLabels>` control tag to label time intervals where the ball is visible in the frame.

If the model is not performing well, consider modifying the LSTM and classifier training parameters in the labeling config. These parameters start with the `model_classifier_` prefix.
The model will be reset after changing these parameters:
- model_classifier_sequence_size
- model_classifier_hidden_size
- model_classifier_num_layers
- New labels added or removed from the labeling config
So you may need to update annotations (click Update) to see improvements.
If you want to modify more parameters, you can do so directly in the code in `utils/neural_nets.py::MultiLabelLSTM`.
If you need to reset the model completely, remove the model file from `/app/models`.
See `timeline_labels.py::get_classifier_path()` for the model path; it usually starts with the `timelinelabels-` prefix.
To debug the model, run it with the `LOG_LEVEL=DEBUG` environment variable (see `docker-compose.yml`), then check the logs in the Docker console.
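A typical way such a `LOG_LEVEL` variable is honored in a Python service (a sketch of the common pattern, not the backend's exact code):

```python
import logging
import os

os.environ["LOG_LEVEL"] = "DEBUG"             # normally set in docker-compose.yml
level_name = os.environ["LOG_LEVEL"].upper()
logging.basicConfig(level=getattr(logging, level_name), force=True)

logger = logging.getLogger("timeline_labels")
logger.debug("feature extraction started")    # emitted only when LOG_LEVEL=DEBUG
print(logging.getLevelName(logger.getEffectiveLevel()))  # DEBUG
```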
There are two main functions to convert the TimelineLabels regions to label arrays and back:
- utils/converter.py::convert_timelinelabels_to_probs() - Converts TimelineLabels regions to label arrays
- utils/converter.py::convert_probs_to_timelinelabels() - Converts label arrays to TimelineLabels regions
Each row in the label array corresponds to a frame in the video.
The label array is a binary matrix where each column corresponds to a label.
If the label is present in the frame, the corresponding cell is set to 1, otherwise 0.
For example: [ [0, 0, 1], [0, 1, 0], [1, 0, 0] ]
This corresponds to `label3` in frame 1, `label2` in frame 2, and `label1` in frame 3.
See tests/test_timeline_labels.py::test_convert_probs_to_timelinelabels() for more examples.
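A simplified sketch of what the region conversion does (the real logic lives in `utils/converter.py`; `probs_to_regions` is a hypothetical name): consecutive frames where a label's cell is 1 are merged into one region with start and end frames.

```python
def probs_to_regions(label_matrix, label_names):
    """Merge consecutive 1s per label column into (label, start, end) regions.
    Frame indices are 1-based, matching Label Studio's timeline."""
    regions = []
    for col, name in enumerate(label_names):
        start = None
        for frame, row in enumerate(label_matrix, start=1):
            if row[col] and start is None:
                start = frame                       # region opens
            elif not row[col] and start is not None:
                regions.append((name, start, frame - 1))  # region closes
                start = None
        if start is not None:                        # region runs to the last frame
            regions.append((name, start, len(label_matrix)))
    return regions

matrix = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(probs_to_regions(matrix, ["label1", "label2", "label3"]))
# [('label1', 3, 3), ('label2', 2, 2), ('label3', 1, 1)]
```

The inverse direction (regions back to a binary matrix) simply sets the cells covered by each region to 1.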
This guide provides an in-depth look at the architecture and code flow of the TimelineLabels ML backend for Label Studio. It includes class inheritance diagrams and method call flowcharts to help developers understand how the components interact. Additionally, it offers explanations of key methods and classes, highlighting starting points and their roles in the overall workflow.
The following diagram illustrates the class inheritance hierarchy in the TimelineLabels ML backend.
```mermaid
classDiagram
    class ControlModel
    class TimelineLabelsModel
    class TorchNNModule
    class BaseNN
    class MultiLabelLSTM
    ControlModel <|-- TimelineLabelsModel
    TorchNNModule <|-- BaseNN
    BaseNN <|-- MultiLabelLSTM
```
- `ControlModel`: Base class for control tags in Label Studio.
- `TimelineLabelsModel`: Inherits from `ControlModel` and implements the specific functionality for the `<TimelineLabels>` tag.
- `torch.nn.Module`: Base class for all neural network modules in PyTorch.
- `BaseNN`: Custom base class for neural networks; inherits from `torch.nn.Module`.
- `MultiLabelLSTM`: Inherits from `BaseNN`; implements an LSTM neural network for multi-label classification.

The following flowchart depicts the method calls during the prediction process.
```mermaid
flowchart TD
    A[TimelineLabelsModel.predict_regions]
    A --> B{Is self.trainable?}
    B -->|Yes| C[create_timelines_trainable]
    B -->|No| D[create_timelines_simple]
    C --> E[cached_feature_extraction]
    C --> F[Load classifier using BaseNN.load_cached_model]
    C --> G[classifier.predict]
    C --> H[convert_probs_to_timelinelabels]
    C --> I[Return predicted regions]
    D --> J[cached_yolo_predict]
    D --> K[Process frame results]
    D --> L[convert_probs_to_timelinelabels]
    D --> M[Return predicted regions]
```
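The prediction dispatch above can be sketched in Python as follows (a hypothetical simplification; the real class in `timeline_labels.py` does far more work in each branch):

```python
class TimelineLabelsModelSketch:
    """Illustrative stand-in for TimelineLabelsModel's prediction dispatch."""

    def __init__(self, trainable: bool):
        self.trainable = trainable

    def predict_regions(self, video_path):
        # Route to the LSTM-backed path or the plain YOLO-classes path.
        if self.trainable:
            return self.create_timelines_trainable(video_path)
        return self.create_timelines_simple(video_path)

    def create_timelines_trainable(self, video_path):
        # Real version: cached_feature_extraction -> classifier.predict
        #               -> convert_probs_to_timelinelabels
        return f"trainable predictions for {video_path}"

    def create_timelines_simple(self, video_path):
        # Real version: cached_yolo_predict -> convert_probs_to_timelinelabels
        return f"simple predictions for {video_path}"

print(TimelineLabelsModelSketch(trainable=True).predict_regions("v.mp4"))
# trainable predictions for v.mp4
```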
The following flowchart shows the method calls during the training process.
```mermaid
flowchart TD
    N[TimelineLabelsModel.fit]
    N --> O{Event is 'ANNOTATION_CREATED' or 'ANNOTATION_UPDATED'?}
    O -->|Yes| P[Extract task and regions]
    P --> Q[Get model parameters]
    Q --> R[Get video path]
    R --> S[cached_feature_extraction]
    S --> T[Prepare features and labels]
    T --> U[Load or create classifier]
    U --> V[classifier.partial_fit]
    V --> W[classifier.save]
    W --> X[Return True]
    O -->|No| Y[Return False]
```
File: timeline_labels.py
The `TimelineLabelsModel` class extends the `ControlModel` base class and implements functionality specific to the `<TimelineLabels>` control tag.

Key methods:

- `is_control_matched(cls, control)`: Class method that checks whether the provided control tag matches the `<TimelineLabels>` tag.
- `create(cls, *args, **kwargs)`: Class method that creates an instance of the model, initializing attributes such as `trainable` and `label_map`.
- `predict_regions(self, video_path)`: Main method called during prediction. Determines whether to use the simple or trainable prediction method based on the `trainable` attribute.
- `create_timelines_simple(self, video_path)`: Uses pre-trained YOLO classes for prediction without additional training; calls `cached_yolo_predict` to get predictions from the YOLO model.
- `create_timelines_trainable(self, video_path)`: Uses the custom-trained LSTM neural network for prediction; calls `cached_feature_extraction` to extract features from the video.
- `fit(self, event, data, **kwargs)`: Called when new annotations are created or updated. Handles the incremental training of the LSTM model: extracts features and labels from the annotated video and calls `partial_fit` on the classifier to update the model parameters.
- `get_classifier_path(self, project_id)`: Generates the file path for storing the classifier model based on the project ID and model name.
File: neural_nets.py
The `BaseNN` class serves as a base class for neural network models, providing common methods for saving, loading, and managing label mappings.

Key methods:

- `set_label_map(self, label_map)`: Stores the label mapping dictionary.
- `get_label_map(self)`: Retrieves the label mapping dictionary.
- `save(self, path)`: Saves the model to the specified path using `torch.save`.
- `load(cls, path)`: Class method that loads a saved model from the specified path.
- `load_cached_model(cls, model_path)`: Loads a cached model if it exists; otherwise returns `None`.
File: neural_nets.py
The `MultiLabelLSTM` class inherits from `BaseNN` and implements an LSTM neural network for multi-label classification.

Key methods:

- `__init__(...)`: Initializes the neural network layers and parameters, including input size, hidden layers, dropout, and optimizer settings.
- `forward(self, x)`: Defines the forward pass of the network; reduces input dimensionality using a fully connected layer.
- `preprocess_sequence(self, sequence, labels=None, overlap=2)`: Prepares input sequences and labels for training by splitting and padding them.
- `partial_fit(self, sequence, labels, ...)`: Trains the model incrementally on new data; pre-processes the input sequence.
- `predict(self, sequence)`: Generates predictions for a given input sequence; splits the sequence into chunks.
Prediction request: When a prediction is requested for a video, `TimelineLabelsModel.predict_regions(video_path)` is called.

Determine mode:

- Trainable mode (`self.trainable == True`): Calls `create_timelines_trainable(video_path)`, which extracts features with `cached_feature_extraction`, loads the classifier via `BaseNN.load_cached_model`, runs `classifier.predict(yolo_probs)`, and converts the output with `convert_probs_to_timelinelabels`.
- Simple mode (`self.trainable == False`): Calls `create_timelines_simple(video_path)`, which obtains predictions with `cached_yolo_predict` and converts them with `convert_probs_to_timelinelabels`.

Event trigger: The `fit(event, data, **kwargs)` method is called when an annotation event occurs (e.g., `ANNOTATION_CREATED` or `ANNOTATION_UPDATED`).
Event handling:

- Parameter extraction: Retrieves the model parameters from the control tag attributes, such as epochs, sequence size, hidden size, and thresholds.
- Data preparation: Extracts features with `cached_feature_extraction`, then loads the classifier via `BaseNN.load_cached_model` or creates a new `MultiLabelLSTM` model.
- Training: Calls `classifier.partial_fit(features, labels, ...)` to train the model incrementally, then saves it with `classifier.save(path)`.

Cached prediction and feature extraction:
- `cached_yolo_predict(yolo_model, video_path, cache_params)`: Uses joblib's `Memory` to cache YOLO predictions, avoiding redundant computations.
- `cached_feature_extraction(yolo_model, video_path, cache_params)`: Extracts features from the YOLO model by hooking into the penultimate layer and caches the results.
Data Conversion Functions:
- `convert_probs_to_timelinelabels(probs, label_map, threshold)`: Converts probability outputs to timeline labels suitable for Label Studio.
- `convert_timelinelabels_to_probs(regions, label_map, max_frame)`: Converts annotated regions back into a sequence of probabilities for training.
The TimelineLabels ML backend integrates seamlessly with Label Studio to provide temporal multi-label classification capabilities for video data. The architecture leverages pre-trained YOLO models for feature extraction and enhances them with an LSTM neural network for capturing temporal dependencies.
Understanding the class hierarchies and method flows is crucial for developers looking to extend or modify the backend. By following the starting points and execution flows outlined in this guide, developers can navigate the codebase more effectively and implement custom features or optimizations.
Note: For further development or contributions, please refer to the README_DEVELOP.md file, which provides additional guidelines and architectural details.