---
title: TimelineLabels ML Backend for Label Studio
type: guide
tier: all
order: 51
hide_menu: true
hide_frontmatter_title: true
meta_title: TimelineLabels ML Backend for Label Studio
meta_description: Tutorial on how to use an example ML backend for Label Studio with TimelineLabels
categories:
    - Computer Vision
    - Video Classification
    - Temporal Labeling
    - LSTM
image: "/guide/ml_tutorials/yolo-video-classification.png"
---

# TimelineLabels Model for Temporal Video Multi-Label Classification in Label Studio

This guide explains how to use the TimelineLabels model for temporal multi-label classification of video data in Label Studio. The model handles temporal labeling tasks by running an LSTM neural network on top of YOLO's classification capabilities, specifically using features from YOLO's last layer. You can customize the neural network parameters directly within the labeling configuration to tailor the model to your specific use case, or use this model as a foundation for further development.

In trainable mode, you begin by annotating a few samples by hand. Each time you click **Submit**, the model retrains on the new annotation you've provided. Once the model begins predicting your trained labels on new tasks, it automatically populates the timeline with the labels it has predicted. You can validate or correct these labels, and updating them retrains the model again, helping you improve it iteratively.
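The architecture described above can be sketched as follows. This is an illustrative example, not the backend's actual code: the feature size, label count, and class name `TimelineLSTM` are assumptions, and the 16-frame window and hidden size of 32 mirror the defaults listed later in this guide.

```python
# Illustrative sketch: a multi-label LSTM classifier that consumes per-frame
# feature vectors (e.g. embeddings taken from YOLO's last layer) and emits an
# independent sigmoid probability per label for every frame in the window.
import torch
import torch.nn as nn

class TimelineLSTM(nn.Module):
    # feature_size=1280 is a placeholder; the real size depends on the
    # YOLO classification backbone you use.
    def __init__(self, feature_size=1280, hidden_size=32, num_layers=1, num_labels=3):
        super().__init__()
        self.lstm = nn.LSTM(feature_size, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, features):
        # features: (batch, sequence_size, feature_size)
        out, _ = self.lstm(features)
        # Sigmoid (not softmax) because timeline labels are multi-label:
        # several labels can be active on the same frame.
        return torch.sigmoid(self.head(out))

model = TimelineLSTM()
frames = torch.randn(1, 16, 1280)  # one 16-frame window of YOLO features
probs = model(frames)
print(probs.shape)                 # torch.Size([1, 16, 3])
```

Per-frame probabilities above a score threshold can then be merged into contiguous timeline regions, which is the form the `TimelineLabels` tag expects.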
**Tip:** If you're looking for a more advanced approach to temporal classification, check out the [VideoMAE model](https://huggingface.co/docs/transformers/en/model_doc/videomae). While we don't provide an example backend for VideoMAE, you can [integrate it as your own ML backend](https://labelstud.io/guide/ml_create).

## Installation and quickstart

Before you begin, you need to install the [Label Studio ML backend](https://github.com/HumanSignal/label-studio-ml-backend/blob/master/README.md#quickstart). This tutorial uses the [YOLO example](https://github.com/HumanSignal/label-studio-ml-backend/tree/master/label_studio_ml/examples/yolo). See the [main README](https://github.com/HumanSignal/label-studio-ml-backend/blob/master/label_studio_ml/examples/yolo/README.md#quick-start) for detailed instructions on setting up the YOLO-models family in Label Studio.

## Labeling configuration

```xml
<View>
  <!-- Illustrative configuration; replace the label values with your own -->
  <TimelineLabels name="label" toName="video">
    <Label value="Movement" />
    <Label value="Ball touch" />
  </TimelineLabels>
  <Video name="video" value="$video" frameRate="25.0" />
</View>
```

IMPORTANT: You must set the **`frameRate`** attribute in the `Video` tag to the correct value, and all your videos should have the same frame rate. Otherwise, the submitted annotations will be **misaligned** with the videos.

## Parameters

| Parameter | Type | Default | Description |
|---------------------------------------|--------|---------|-------------------------------------------------------------------------------------------------------------------------|
| `model_trainable` | bool | False | Enables the trainable mode, allowing the model to learn from your annotations incrementally. |
| `model_classifier_epochs` | int | 1000 | Number of training epochs for the LSTM neural network. |
| `model_classifier_sequence_size` | int | 16 | Size of the LSTM sequence in frames. Adjust to capture longer or shorter temporal dependencies; 16 frames is about 0.64 seconds at a 25 fps frame rate. |
| `model_classifier_hidden_size` | int | 32 | Size of the LSTM hidden state. Modify to change the capacity of the LSTM. |
| `model_classifier_num_layers` | int | 1 | Number of LSTM layers. Increase for a deeper LSTM network. |
| `model_classifier_f1_threshold` | float | 0.95 | F1 score threshold for early stopping during training. Set to prevent overfitting. |
| `model_classifier_accuracy_threshold` | float | 1.00 | Accuracy threshold for early stopping during training. Set to prevent overfitting. |
| `model_score_threshold` | float | 0.5 | Minimum confidence threshold for predictions. Labels with confidence below this threshold will be disregarded. |
| `model_path` | string | None | Path to the custom YOLO model. See more in the section [Your own custom models](https://github.com/HumanSignal/label-studio-ml-backend/blob/master/label_studio_ml/examples/yolo/README.md#your-own-custom-yolo-models) |

**Note:** You can customize the neural network parameters directly in the labeling configuration by adjusting the attributes in the `TimelineLabels` tag.

## Using the model

### Simple mode

In simple mode, the model uses pre-trained YOLO classes to generate predictions without additional training.

- **When to Use**: Quick setup without the need for custom training. It starts generating predictions immediately.
- **Configuration**: Set `model_trainable="false"` in the labeling config (or omit it, as `false` is the default).
- **Example**:

```xml
<View>
  <!-- Illustrative configuration; label and class names are examples only -->
  <TimelineLabels name="label" toName="video" model_trainable="false">
    <Label value="Ball" predicted_values="soccer_ball" />
  </TimelineLabels>
  <Video name="video" value="$video" frameRate="25.0" />
</View>
```

### Trainable mode

The trainable mode enables the model to learn from your annotations incrementally. It uses the pre-trained YOLO classification model with a custom LSTM neural network on top to capture temporal dependencies in video data. The LSTM model is trained from scratch, so it requires about 10-20 well-annotated videos of roughly 500 frames each (~20 seconds) to start making meaningful predictions.

- **When to Use**: When you need custom labels or better accuracy than simple mode provides.
- **Configuration**: Set `model_trainable="true"` in the labeling config.
- **Training Process**:
  - Start annotating videos using the `TimelineLabels` tag.
  - After submitting the first annotation, the model begins training.
  - The `partial_fit()` method allows the model to train incrementally with each new annotation.
- **Requirements**: Approximately 10-20 annotated tasks are needed to achieve reasonable performance.

**Note**: The `predicted_values` attribute in the `
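The incremental update that runs on each **Submit** can be sketched as below. This is a hypothetical illustration, not the backend's actual `partial_fit()` implementation: the helper name, tensor shapes, and F1/accuracy computation are assumptions, while the epoch count and early-stopping thresholds mirror the `model_classifier_*` parameters documented above.

```python
# Hypothetical sketch of a partial_fit-style update on one annotated video.
# Early stopping on F1/accuracy thresholds keeps the LSTM from overfitting
# a single clip before the next annotation arrives.
import torch
import torch.nn as nn

def partial_fit(model, optimizer, features, targets,
                epochs=1000, f1_threshold=0.95, accuracy_threshold=1.00):
    """Train on one annotation until a stopping threshold is reached.

    features: (1, num_frames, feature_size) float tensor of YOLO embeddings
    targets:  (1, num_frames, num_labels) float tensor of 0/1 timeline labels
    """
    loss_fn = nn.BCELoss()  # multi-label: one binary loss per label per frame
    for _ in range(epochs):
        optimizer.zero_grad()
        probs = model(features)
        loss = loss_fn(probs, targets)
        loss.backward()
        optimizer.step()

        # Micro-averaged F1 and frame-level accuracy on the training clip.
        preds = (probs.detach() > 0.5).float()
        tp = (preds * targets).sum()
        precision = tp / preds.sum().clamp(min=1)
        recall = tp / targets.sum().clamp(min=1)
        f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-8)
        accuracy = (preds == targets).float().mean()

        if f1 >= f1_threshold and accuracy >= accuracy_threshold:
            break  # early stopping to prevent overfitting
    return model
```

Because each call trains only on the newly submitted annotation, prediction quality improves gradually as more tasks are labeled, which is why roughly 10-20 annotated tasks are needed before results become reasonable.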