---
title: Sync data from external storage
short: Add project storage
type: guide
tier: all
order: 151
order_enterprise: 151
meta_title: Cloud and External Storage Integration
meta_description: "Label Studio Documentation for integrating Amazon AWS S3, Google Cloud Storage, Microsoft Azure, Redis, and local file directories with Label Studio."
section: "Import & Export"
---
Integrate popular cloud and external storage systems with Label Studio to collect new items uploaded to buckets, containers, databases, or directories, and to return the annotation results so that you can use them in your machine learning pipelines.
When working with an external cloud storage connection, keep the following in mind:
You can add source storage connections to sync data from an external source to a Label Studio project, and add target storage connections to sync annotations from Label Studio to external storage. Each source and target storage setup is project-specific. You can connect multiple buckets, containers, databases, or directories as source or target storage for a project.
Label Studio does not automatically sync data from source storage. If you upload new data to a connected cloud storage bucket, sync the storage connection using the UI to add the new labeling tasks to Label Studio without restarting. You can also use the API to set up or sync storage connections. See Label Studio API and locate the relevant storage connection type.
Task data synced from cloud storage is not stored in Label Studio. Instead, the data is accessed using presigned URLs. You can also secure access to cloud storage using VPC and IP restrictions for your storage. For details, see Secure access to cloud storage.
If you set the import method to "Files", the Label Studio backend only needs LIST permissions and won't download any data from your buckets.
If you set the import method to "Tasks", the Label Studio backend requires GET permissions to read JSON files and convert them to Label Studio tasks.
When your users access labeling, the backend will attempt to resolve URI (e.g., s3://) to URL (https://) links. URLs will be returned to the frontend and loaded by the user's browser. To load these URLs, the browser will require HEAD and GET permissions from your Cloud Storage. The HEAD request is made at the beginning and allows the browser to determine the size of the audio, video, or other files. The browser then makes a GET request to retrieve the file body.
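To illustrate this HEAD-then-GET pattern, here is a minimal Python sketch that uses a throwaway local HTTP server in place of your cloud storage (the file name and contents are illustrative):

```python
# Sketch: how a browser-style client fetches media after URI resolution.
# It first issues a HEAD request to learn the file size, then a GET for the body.
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path

def head_then_get(url: str):
    """Return (content_length_from_HEAD, body_from_GET)."""
    head_req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head_req) as resp:
        size = int(resp.headers["Content-Length"])  # browser learns media size here
    with urllib.request.urlopen(url) as resp:       # then fetches the actual bytes
        body = resp.read()
    return size, body

if __name__ == "__main__":
    # Stand-in for cloud storage: serve a temp directory over HTTP.
    root = Path(tempfile.mkdtemp())
    (root / "1.jpg").write_bytes(b"fake image bytes")
    handler = lambda *a, **kw: http.server.SimpleHTTPRequestHandler(*a, directory=str(root), **kw)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    size, body = head_then_get(f"http://127.0.0.1:{server.server_port}/1.jpg")
    print(size, body == b"fake image bytes")
    server.shutdown()
```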
Source storage functionality can be divided into two parts:
* Sync - when Label Studio scans your storage and imports tasks from it.
* URI resolving - when the Label Studio backend asks Cloud Storage to resolve URI links (e.g., s3://bucket/1.jpg) into HTTPS URLs (e.g., https://aws.amazon.com/bucket/1.jpg). This way, users' browsers can load the media.
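For illustration, URI resolving starts by splitting the URI into a bucket and object key; the sketch below shows that step in Python (the presigning call itself is provider-specific and only indicated in a comment):

```python
# Sketch: split a storage URI like "s3://bucket/1.jpg" into (bucket, key).
# Scheme names here are assumptions matching the URI styles mentioned in the text.
from urllib.parse import urlparse

def parse_storage_uri(uri: str):
    """Split 's3://bucket/path/1.jpg' into ('bucket', 'path/1.jpg')."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "gs", "azure-blob"):
        raise ValueError(f"unsupported scheme: {parsed.scheme}")
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = parse_storage_uri("s3://bucket/1.jpg")
# With boto3, the backend would then presign along these lines:
# s3.generate_presigned_url("get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=3600)
print(bucket, key)
```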

!!! info
The "Treat every bucket object as a source file" option was renamed and reintroduced as the "Import method" dropdown.
Label Studio Source Storages feature an "Import method" dropdown. This setting enables two different methods of loading tasks into Label Studio.
When set to "Tasks", tasks in JSON, JSONL/NDJSON or Parquet format can be loaded directly from storage buckets into Label Studio. This approach is particularly helpful when dealing with complex tasks that involve multiple media sources.

You may put multiple tasks inside the same JSON file, but not mix task formats inside the same file.
{% details Example with bare tasks %}
`task_01.json`

```json
{ "image": "s3://bucket/1.jpg", "text": "opossums are awesome" }
```

`task_02.json`

```json
{ "image": "s3://bucket/2.jpg", "text": "cats are awesome" }
```

Or:

`tasks.json`

```json
[
  { "image": "s3://bucket/1.jpg", "text": "opossums are awesome" },
  { "image": "s3://bucket/2.jpg", "text": "cats are awesome" }
]
```
{% enddetails %}
{% details Example with tasks, annotations and predictions %}
`task_with_predictions_and_annotations_01.json`

```json
{ "data": { "image": "s3://bucket/1.jpg", "text": "opossums are awesome" }, "annotations": [...], "predictions": [...] }
```

`task_with_predictions_and_annotations_02.json`

```json
{ "data": { "image": "s3://bucket/2.jpg", "text": "cats are awesome" }, "annotations": [...], "predictions": [...] }
```

Or:

`tasks_with_predictions_and_annotations.json`

```json
[
  { "data": { "image": "s3://bucket/1.jpg", "text": "opossums are awesome" }, "annotations": [...], "predictions": [...] },
  { "data": { "image": "s3://bucket/2.jpg", "text": "cats are awesome" }, "annotations": [...], "predictions": [...] }
]
```
{% enddetails %}
{% details Example with JSONL %}
`tasks.jsonl`

```json
{ "image": "s3://bucket/1.jpg", "text": "opossums are awesome" }
{ "image": "s3://bucket/2.jpg", "text": "cats are awesome" }
```
{% enddetails %}
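If you generate these files programmatically, a small script can also guard against mixing task formats in one file. A minimal sketch (the file name is illustrative):

```python
# Sketch: write a JSONL task file like the examples above, then verify that
# every line is a self-contained JSON task with the same set of keys
# (one file must not mix task formats).
import json
import tempfile
from pathlib import Path

def write_jsonl(path: Path, tasks: list) -> None:
    with path.open("w") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")

def validate_jsonl(path: Path) -> list:
    tasks = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
    key_sets = {frozenset(t) for t in tasks}
    if len(key_sets) > 1:
        raise ValueError("mixed task formats in one file")
    return tasks

path = Path(tempfile.mkdtemp()) / "tasks.jsonl"  # illustrative location
write_jsonl(path, [
    {"image": "s3://bucket/1.jpg", "text": "opossums are awesome"},
    {"image": "s3://bucket/2.jpg", "text": "cats are awesome"},
])
print(len(validate_jsonl(path)))  # 2
```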
In Label Studio Enterprise and Starter Cloud editions, Parquet files can also be used to import tasks in the same way as JSON and JSONL.
When set to "Files", Label Studio automatically lists files from the storage bucket and constructs tasks. This is only possible for simple labeling tasks that involve a single media source (such as an image, text, etc.).
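Conceptually, the "Files" method maps each listed object to a one-field task. A simplified sketch, where the `image` data key is an assumption that depends on your labeling config:

```python
# Sketch of what the "Files" import method does: each listed object key
# becomes a one-field task pointing at the object's URI.
def tasks_from_listing(bucket: str, keys: list, data_key: str = "image") -> list:
    """Build one task per object key, e.g. {"image": "s3://bucket/1.jpg"}."""
    return [{data_key: f"s3://{bucket}/{key}"} for key in keys]

tasks = tasks_from_listing("bucket", ["1.jpg", "2.jpg"])
print(tasks)  # [{'image': 's3://bucket/1.jpg'}, {'image': 's3://bucket/2.jpg'}]
```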

There are two secure mechanisms by which Label Studio fetches media data from cloud storage: via pre-signed URLs and via proxy. Which one is used depends on whether Use pre-signed URLs is toggled on or off when setting up your source storage. Pre-signed URLs are used by default; proxy storage is enabled when Use pre-signed URLs is OFF.
{% details See more details %}
When pre-signed URLs are enabled, your browser receives an HTTP 303 redirect to a time-limited S3/GCS/Azure presigned URL. This is the default behavior.
The main benefit of using pre-signed URLs is that your media files stay isolated from the Label Studio network as much as possible.

The permissions required for this are already included in the cloud storage configuration documentation below.
When in proxy mode, the Label Studio backend fetches objects server-side and streams them directly to the browser.

This has multiple benefits, particularly when you want to keep your storage private and avoid exposing presigned URLs to users' browsers.
To allow proxy storage, you need to ensure your permissions include the following:
{% details AWS S3 %}
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```
{% enddetails %}
{% details Google Cloud Storage %}
- `storage.objects.get` - Read object data and metadata
- `storage.objects.list` - List objects in the bucket (if using a prefix)
{% enddetails %}
{% details Azure Blob Storage %}
Add the Storage Blob Data Reader role, which includes:
- Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read
- Microsoft.Storage/storageAccounts/blobServices/containers/blobs/getTags/action
{% enddetails %}
!!! note Note for on-prem deployments
Large media files are streamed in sequential 8 MB chunks, which are split into different GET requests. This can result in frequent requests to the backend to get the next portion of data and uses additional resources.
You can configure this using the following environment variables:
* `RESOLVER_PROXY_MAX_RANGE_SIZE` - Defaults to 8 MB, and defines the largest chunk size returned per request.
* `RESOLVER_PROXY_TIMEOUT` - Defaults to 20 seconds, and defines the maximum time uWSGI workers spend on a single request.
{% enddetails %}
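As an illustration of the chunking math (not Label Studio's actual implementation), the Range headers for a file streamed in sequential 8 MB chunks can be computed like this:

```python
# Sketch: compute the HTTP Range headers needed to stream a file in
# RESOLVER_PROXY_MAX_RANGE_SIZE-byte chunks (8 MB by default).
MAX_RANGE = 8 * 1024 * 1024

def range_headers(total_size: int, chunk: int = MAX_RANGE) -> list:
    """Return the Range header values for sequential GET requests."""
    return [
        f"bytes={start}-{min(start + chunk, total_size) - 1}"
        for start in range(0, total_size, chunk)
    ]

# A 20 MB video needs three requests: two full 8 MB chunks plus the remainder.
print(range_headers(20 * 1024 * 1024))
```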
When annotators click Submit or Update while labeling tasks, Label Studio saves annotations in the Label Studio database.
If you configure target storage, annotations are sent to target storage after you click Sync for the configured target storage connection. The target storage receives a JSON-formatted export of each annotation. See Label Studio JSON format of annotated tasks for details about how exported tasks appear in target storage.
You can also delete annotations in target storage when they are deleted in Label Studio. See Set up target storage connection in the Label Studio UI for more details.
To use this type of storage, you must have PUT permission; DELETE permission is optional.
Connect your Amazon S3 bucket to Label Studio to retrieve labeling tasks or store completed annotations.
For details about how Label Studio secures access to cloud storage, see Secure access to cloud storage.
Before you set up your S3 bucket or buckets with Label Studio, configure access and permissions. These steps assume that you're using the same AWS role to manage both source and target storage with Label Studio. If you only use S3 for source storage, Label Studio does not need PUT access to the bucket.
!!! note
A session token is only required in case of temporary security credentials. See the AWS Identity and Access Management documentation on Requesting temporary security credentials.
Replace `<your_bucket_name>` with your bucket name:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<your_bucket_name>",
        "arn:aws:s3:::<your_bucket_name>/*"
      ]
    }
  ]
}
```

!!! note
"s3:PutObject" is only needed for target storage connections, and "s3:DeleteObject" is only needed for target storage connections in Label Studio Enterprise where you want to allow deleted annotations in Label Studio to also be deleted in the target S3 bucket.
3. Set up cross-origin resource sharing (CORS) access to your bucket, using a policy that allows GET access from the same host name as your Label Studio deployment. See Configuring cross-origin resource sharing (CORS) in the Amazon S3 User Guide. Use or modify the following example:

```json
[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": [
      "x-amz-server-side-encryption",
      "x-amz-request-id",
      "x-amz-id-2"
    ],
    "MaxAgeSeconds": 3000
  }
]
```
After you configure access to your S3 bucket, do the following to set up Amazon S3 as a data source connection:
Use `.*` as the file filter regex to collect all objects, and specify the bucket region (for example, `us-east-1`). After adding the storage, click Sync to collect tasks from the bucket, or make an API call to sync import storage.
After you configure access to your S3 bucket, do the following to set up Amazon S3 as a target storage connection:
Specify the bucket region (for example, `us-east-1`). After adding the storage, click Sync to send annotations to the bucket, or make an API call to sync export storage.
On April 7th, 2025, new storage connections will require an update to the AWS principal in your IAM role policy.
If you set up your IAM role prior to April 7th, 2025, and you have already been using it with Label Studio, you must add the following to your principal list before you can set up new storage connections in Label Studio projects:
"arn:aws:iam::490065312183:role/label-studio-app-production"
For example:

(See step 3 below.)
Adding the new principal ensures you can create new connections. Keeping the old principal ensures that pre-existing storage connections can continue to load data.
Existing S3 IAM role-based-access storages added to Label Studio will continue to work as is without any changes necessary. This change is only required if you are setting up new connections.
On July 7th 2025, we will no longer support the legacy IAM user, and all policies should be updated to the new IAM role.
You can also create a storage connection using the Label Studio API.
- See Create new import storage then sync the import storage.
- See Create export storage and after annotating, sync the export storage.
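As a sketch of the API flow, the following builds the requests used to create and then sync an S3 import storage. The host, token, endpoint paths, and payload field names are assumptions based on the storage API described above; check the API reference for your version before relying on them:

```python
# Sketch: build authorized POST requests for creating and syncing an S3
# import storage. BASE and TOKEN are placeholders you must supply.
import json
import urllib.request

BASE = "https://app.humansignal.com"  # or your own Label Studio host
TOKEN = "YOUR_API_TOKEN"              # placeholder

def s3_import_payload(project_id: int, bucket: str, regex: str = ".*") -> dict:
    return {
        "project": project_id,
        "bucket": bucket,
        "regex_filter": regex,   # which object keys to import
        "use_blob_urls": True,   # "Files" import method: one task per object
    }

def api_post(path: str, payload=None) -> urllib.request.Request:
    """Build an authorized POST request for the Label Studio API."""
    data = json.dumps(payload).encode() if payload is not None else b""
    return urllib.request.Request(
        BASE + path,
        data=data,
        headers={"Authorization": f"Token {TOKEN}", "Content-Type": "application/json"},
        method="POST",
    )

# With a live server, you would send the requests, for example:
# storage = json.load(urllib.request.urlopen(api_post("/api/storage/s3", s3_import_payload(1, "my-bucket"))))
# urllib.request.urlopen(api_post(f"/api/storage/s3/{storage['id']}/sync"))
```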
To maximize security and data isolation behind a VPC, restrict access to the Label Studio backend and internal network users by setting IP restrictions for storage, allowing only trusted networks to perform task synchronization and generate pre-signed URLs. Additionally, establish a secure connection between storage and users' browsers by configuring a VPC private endpoint or limiting storage access to specific IPs or VPCs.
Read more about Source storage behind your VPC.
!!! warning
These example bucket policies explicitly deny access to any requests outside the allowed IP addresses. Even the user that entered the bucket policy can be denied access to the bucket if the user doesn't meet the conditions. Therefore, make sure to review the bucket policy carefully before saving it. If you get accidentally locked out, see How to regain access to an Amazon S3 bucket.
Helpful Resources:
- AWS Documentation: VPC Endpoints for Amazon S3
- AWS Documentation: How to Configure VPC Endpoints
Go to your S3 bucket and then Permissions > Bucket Policy in the AWS management console. Add the following policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAccessUnlessFromSaaSIPsForListAndGet",
      "Effect": "Deny",
      "Principal": {
        "AWS": "arn:aws:iam::490065312183:role/label-studio-app-production"
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME",
        "arn:aws:s3:::YOUR_BUCKET_NAME/*"
      ],
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": [
            // IP ranges for app.humansignal.com from the documentation
            "x.x.x.x/32",
            "x.x.x.x/32",
            "x.x.x.x/32"
          ]
        }
      }
    },
    // Optional
    {
      "Sid": "DenyAccessUnlessFromVPNForGetObject",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": "YOUR_VPN_SUBNET/32"
        }
      }
    }
  ]
}
```
Dynamically import tasks and export annotations to Google Cloud Storage (GCS) buckets in Label Studio. For details about how Label Studio secures access to cloud storage, see Secure access to cloud storage.
To connect your GCS bucket with Label Studio, set up the following:
- Enable programmatic access to your bucket. See Cloud Storage Client Libraries in the Google Cloud Storage documentation for how to set up access to your GCS bucket.
- Set up authentication to your bucket. Your account must have the Service Account Token Creator and Storage Object Viewer roles and storage.buckets.get access permission. See Setting up authentication and IAM permissions for Cloud Storage in the Google Cloud Storage documentation.
- If you're using a service account to authorize access to the Google Cloud Platform, make sure to activate it. See gcloud auth activate-service-account in the Google Cloud SDK: Command Line Interface documentation.
- Set up cross-origin resource sharing (CORS) access to your bucket, using a policy that allows GET access from the same host name as your Label Studio deployment. See Configuring cross-origin resource sharing (CORS) in the Google Cloud User Guide. Use or modify the following example:

```shell
echo '[
  {
    "origin": ["*"],
    "method": ["GET"],
    "responseHeader": ["Content-Type","Access-Control-Allow-Origin"],
    "maxAgeSeconds": 3600
  }
]' > cors-config.json
```

Replace YOUR_BUCKET_NAME with your actual bucket name in the following command to update CORS for your bucket:

```shell
gsutil cors set cors-config.json gs://YOUR_BUCKET_NAME
```
In the Label Studio UI, do the following to set up the connection:
Use `.*` as the file filter regex to collect all objects. In the Google Application Credentials field, add a JSON file with the GCS credentials you created to manage authentication for your bucket.
On-prem users: Alternatively, you can use the GOOGLE_APPLICATION_CREDENTIALS environment variable and/or set up Application Default Credentials, so that users do not need to configure credentials manually. See Application Default Credentials for enhanced security below.
After adding the storage, click Sync to collect tasks from the bucket, or make an API call to sync import storage.
If you use Label Studio on-premises with Google Cloud Storage, you can set up Application Default Credentials to provide cloud storage authentication globally for all projects, so users do not need to configure credentials manually.
The recommended way to do this is by using the GOOGLE_APPLICATION_CREDENTIALS environment variable. For example:

```shell
export GOOGLE_APPLICATION_CREDENTIALS=json-file-with-GCP-creds-23441-8f8sd99vsd115a.json
```
<div class="enterprise-only">
### Google Cloud Storage with Workload Identity Federation (WIF)
You can also use Workload Identity Federation (WIF) pools with Google Cloud Storage.
Unlike with application credentials, WIF allows you to use temporary credentials. Each time you make a request to GCS, Label Studio connects to your identity pool to request temporary credentials.
For more information about WIF, see [Google Cloud - Workload Identity Federation](https://cloud.google.com/iam/docs/workload-identity-federation).
#### Service account permissions
Before you begin, you will need a service account that has the following permissions:
- Bucket: **Storage Admin** (`roles/storage.admin`)
- Project: **Service Account Token Creator** (`roles/iam.serviceAccountTokenCreator`)
- Project: **Storage Object Viewer** (`roles/storage.viewer`)
See [Create service accounts](https://cloud.google.com/iam/docs/service-accounts-create?hl=en) in the Google Cloud documentation.
#### Create a Workload Identity Pool
There are several methods you can use to create a WIF pool.
<details>
<summary>Using Terraform</summary>
<br>
An example script is provided below. Ensure all required variables are set:
* GCP project variables:
* `var.gcp_project_name`
* `var.gcp_region`
* SaaS provided by HumanSignal:
* `var.aws_account_id` = `490065312183`
* `var.aws_role_name` = `label-studio-app-production`
Then run:

```shell
terraform init
terraform plan
terraform apply
```
Once applied, you will have a functioning Workload Identity Pool that trusts the Label Studio AWS IAM Role.
```hcl
## Variables

/* AWS variables are so that AWS-hosted Label Studio resources can reach out to request credentials */

variable "gcp_project_name" {
  type        = string
  description = "GCP Project name"
}

variable "gcp_region" {
  type        = string
  description = "GCP Region"
}

variable "label_studio_gcp_sa_name" {
  type        = string
  description = "GCP Label Studio Service Account Name"
}

variable "aws_account_id" {
  type        = string
  description = "AWS Project ID"
}

variable "aws_role_name" {
  type        = string
  description = "AWS Role name"
}

variable "external_ids" {
  type        = list(string)
  default     = []
  description = "List of external ids"
}

## Outputs

output "GCP_WORKLOAD_ID" {
  value = google_iam_workload_identity_pool_provider.label-studio-provider-jwt.workload_identity_pool_id
}

output "GCP_WORKLOAD_PROVIDER" {
  value = google_iam_workload_identity_pool_provider.label-studio-provider-jwt.workload_identity_pool_provider_id
}

## Main

provider "google" {
  project = var.gcp_project_name
  region  = var.gcp_region
}

resource "random_id" "random" {
  byte_length = 4
}

locals {
  aws_assumed_role = "arn:aws:sts::${var.aws_account_id}:assumed-role/${var.aws_role_name}"
  external_id_condition = (
    length(var.external_ids) > 0
    ? format("(attribute.aws_role == \"%s\") && (attribute.external_id in [%s])",
        local.aws_assumed_role,
        join(", ", formatlist("\"%s\"", var.external_ids))
      )
    : format("(attribute.aws_role == \"%s\")", local.aws_assumed_role)
  )
}

resource "google_iam_workload_identity_pool" "label-studio-pool" {
  workload_identity_pool_id = "label-studio-pool-${random_id.random.hex}"
  project                   = var.gcp_project_name
}

resource "google_iam_workload_identity_pool_provider" "label-studio-provider-jwt" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.label-studio-pool.workload_identity_pool_id
  workload_identity_pool_provider_id = "label-studio-jwt-${random_id.random.hex}"
  attribute_condition                = local.external_id_condition
  attribute_mapping = {
    "google.subject"        = "assertion.arn"
    "attribute.aws_account" = "assertion.account"
    "attribute.aws_role"    = "assertion.arn.contains('assumed-role') ? assertion.arn.extract('{account_arn}assumed-role/') + 'assumed-role/' + assertion.arn.extract('assumed-role/{role_name}/') : assertion.arn"
    "attribute.external_id" = "assertion.external_id"
  }
  aws {
    account_id = var.aws_account_id
  }
}

data "google_service_account" "existing_sa" {
  account_id = var.label_studio_gcp_sa_name
}

resource "google_service_account_iam_binding" "label-studio-sa-oidc" {
  service_account_id = data.google_service_account.existing_sa.name
  role               = "roles/iam.workloadIdentityUser"
  members = [
    "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.label-studio-pool.name}/attribute.aws_role/${local.aws_assumed_role}"
  ]
}
```
Replace the bracketed variables ([PROJECT_ID], [POOL_ID], [PROVIDER_ID], etc.) with your own values.
Make sure you escape quotes or use single quotes when necessary.
```shell
gcloud iam workload-identity-pools create [POOL_ID] \
  --project=[PROJECT_ID] \
  --location="global" \
  --display-name="[POOL_DISPLAY_NAME]"
```
Where:
- `[POOL_ID]` is the ID that you want to assign to your WIF pool (for example, `label-studio-pool-abc123`). Note this because you will need to reuse it later.
- `[PROJECT_ID]` is the ID of your Google Cloud project.
- `[POOL_DISPLAY_NAME]` is a human-readable name for your pool (optional, but recommended).

Create the provider for AWS.
This allows AWS principals that have the correct external ID and AWS role configured to impersonate the Google Cloud service account. This is necessary because the Label Studio resources making the request are hosted in AWS.
```shell
gcloud iam workload-identity-pools providers create-aws [PROVIDER_ID] \
--workload-identity-pool="[POOL_ID]" \
--account-id="490065312183" \
--attribute-condition='attribute.aws_role=="arn:aws:sts::490065312183:assumed-role/label-studio-app-production"' \
--attribute-mapping="google.subject=assertion.arn,attribute.aws_account=assertion.account,attribute.aws_role=assertion.arn,attribute.external_id=assertion.external_id"
```
Where:
- `[PROVIDER_ID]` is a provider ID (for example, `label-studio-app-production`).
- `[POOL_ID]`: The pool ID you provided in step 1.

Grant the service account that you created earlier the iam.workloadIdentityUser role.
```shell
gcloud iam service-accounts add-iam-policy-binding [SERVICE_ACCOUNT_EMAIL] \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/[PROJECT_NUMBER]/locations/global/workloadIdentityPools/[POOL_ID]/attribute.aws_role/arn:aws:sts::490065312183:assumed-role/label-studio-app-production"
```
Where:
- `[SERVICE_ACCOUNT_EMAIL]` is the email associated with your GCS service account (for example, `my-service-account@[PROJECT_ID].iam.gserviceaccount.com`).
- `[PROJECT_NUMBER]`: Your Google project number. This is different from the project ID. You can find the project number with the following command: `gcloud projects describe $PROJECT_ID --format="value(projectNumber)"`
- `[POOL_ID]`: The pool ID you provided in step 1.

Before setting up your connection in Label Studio, note what you provided for the following variables (you will be asked to provide them):

- `[POOL_ID]`
- `[PROVIDER_ID]`
- `[SERVICE_ACCOUNT_EMAIL]`
- `[PROJECT_NUMBER]`
- `[PROJECT_ID]`

Before you begin, ensure you are in the correct project:

From the Google Cloud Console, navigate to IAM & Admin > Workload Identity Pools.
Click Get Started to enable the APIs.
Under Create an identity pool, complete the following fields:
- **Name**: an ID for the pool (for example, `label-studio-pool-abc123`). Note this ID because you will need it again later.

Under Add a provider pool, complete the following fields:
- **Provider display name**: `Label Studio App Production` (you can use a different display name, but you need to ensure that the corresponding provider ID is still `label-studio-app-production`).
- **Provider ID**: `label-studio-app-production`.
- **AWS account ID**: `490065312183`.

Under Configure provider attributes, enter the following:
`attribute.aws_role=="arn:aws:sts::490065312183:assumed-role/label-studio-app-production"`
- `google.subject = assertion.arn`
- `attribute.aws_role = assertion.arn.contains('assumed-role') ? assertion.arn.extract('{account_arn}assumed-role/') + 'assumed-role/' + assertion.arn.extract('assumed-role/{role_name}/') : assertion.arn` (this might be filled in by default)
- `attribute.aws_account = assertion.account`
- `attribute.external_id = assertion.external_id`
Click Save.
Go to IAM & Admin > Service Accounts and find the service account you want to allow AWS (Label Studio) to impersonate. See Service account permissions above.
From the Principals with access tab, click Grant Access.

In the New principals field, add the following:
principalSet://iam.googleapis.com/projects/[PROJECT_NUMBER]/locations/global/workloadIdentityPools/[POOL_ID]/attribute.aws_role/arn:aws:sts::490065312183:assumed-role/label-studio-app-production
Where:
- `[PROJECT_NUMBER]` - Replace this with your Google project number. This is different from the project ID. To find the project number, go to IAM & Admin > Settings.
- `[POOL_ID]` - Replace this with the pool ID (the Name you entered in step 3 above, e.g. `label-studio-pool-abc123`).

Under Assign Roles, use the search field in the Role drop-down menu to find the Workload Identity User role.

Click Save.
Before setting up your connection in Label Studio, note the following (you will be asked to provide them): the pool ID, the provider ID (for example, `label-studio-app-production`), the service account email, and the project number.

From your Label Studio project, go to Settings > Storage to add your source or target storage.
Select the GCS (WIF auth) storage type and then complete the following fields:
After adding the storage, click Sync to collect tasks from the bucket, or make an API call to sync import storage.
Google Cloud Storage offers bucket IP filtering as a powerful security mechanism to restrict access to your data based on source IP addresses. This feature helps prevent unauthorized access and provides fine-grained control over who can interact with your storage buckets.
Read more about Source storage behind your VPC.
Common Use Cases:
- Restrict bucket access to only your organization's IP ranges
- Allow access only from specific VPC networks in your infrastructure
- Secure sensitive data by limiting access to known IP addresses
- Control access for third-party integrations by whitelisting their IPs
For public network sources:

```json
{
  "mode": "Enabled",
  "publicNetworkSource": {
    "allowedIpCidrRanges": [
      "xxx.xxx.xxx.xxx",    // Your first IP address
      "xxx.xxx.xxx.xxx",    // Your second IP address
      "xxx.xxx.xxx.xxx/xx"  // Your IP range in CIDR notation
    ]
  }
}
```

For VPC network sources:

```json
{
  "mode": "Enabled",
  "vpcNetworkSources": [
    {
      "network": "projects/PROJECT_ID/global/networks/NETWORK_NAME",
      "allowedIpCidrRanges": [
        RANGE_CIDR
      ]
    }
  ]
}
```
Apply the IP filtering rules to your bucket using the following command:

```shell
gcloud alpha storage buckets update gs://BUCKET_NAME --ip-filter-file=IP_FILTER_CONFIG_FILE
```

To remove IP filtering rules when no longer needed:

```shell
gcloud alpha storage buckets update gs://BUCKET_NAME --clear-ip-filter
```
Read more about GCS IP filtering
Connect your Microsoft Azure Blob storage container with Label Studio. For details about how Label Studio secures access to cloud storage, see Secure access to cloud storage.
You must set two environment variables in Label Studio to connect to Azure Blob storage:
- AZURE_BLOB_ACCOUNT_NAME to specify the name of the storage account.
- AZURE_BLOB_ACCOUNT_KEY to specify the secret key for the storage account.
Configure the specific Azure Blob container that you want Label Studio to use in the UI. In most cases involving CORS issues, the GET permission (*/GET/*/Access-Control-Allow-Origin/3600) is necessary within the Resource Sharing tab:

In the Label Studio UI, do the following to set up the connection:
Use `.*` as the file filter regex to collect all objects. The account name and account key default to the `AZURE_BLOB_ACCOUNT_NAME` and `AZURE_BLOB_ACCOUNT_KEY` environment variables. Task media URIs take the form `azure-blob://container-name/image.jpg`. Set the import method to "Tasks" if you have multiple JSON/JSONL/Parquet files in the container with tasks. After adding the storage, click Sync to collect tasks from the container, or make an API call to sync import storage.
You can also create a storage connection using the Label Studio API.
- See Create new import storage then sync the import storage.
- See Create export storage and after annotating, sync the export storage.
You can also store your tasks and annotations in a Redis database. You must store the tasks and annotations in different databases. You might want to use a Redis database if you find that relying on a file-based cloud storage connection is slow for your datasets.
Currently, this configuration is only supported if you host the Redis database in the default mode, with the default IP address.
Label Studio does not manage the Redis database for you. See the Redis Quick Start for details about hosting and managing your own Redis database. Because Redis is an in-memory database, data saved in Redis does not persist. To make sure you don't lose data, set up Redis persistence or use another method to persist the data, such as using Redis in the cloud with Microsoft Azure or Amazon AWS.
Label Studio only supports string values for Redis databases, which should represent Label Studio tasks in JSON format.
For example:
```
'ls-task-1': '{"image": "http://example.com/1.jpg"}'
'ls-task-2': '{"image": "http://example.com/2.jpg"}'
...
```

```shell
> redis-cli -n 1
127.0.0.1:6379[1]> SET ls-task-1 '{"image": "http://example.com/1.jpg"}'
OK
127.0.0.1:6379[1]> GET ls-task-1
"{\"image\": \"http://example.com/1.jpg\"}"
127.0.0.1:6379[1]> TYPE ls-task-1
string
```
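The same entries can be prepared programmatically; the sketch below serializes tasks to the string values Label Studio expects (the redis-py calls are shown commented out so no server is required, and the key names are illustrative):

```python
# Sketch: prepare string-valued Redis entries for Label Studio tasks.
import json

def redis_task_entries(tasks: dict) -> dict:
    """Serialize each task dict to the JSON string Label Studio expects."""
    return {key: json.dumps(data) for key, data in tasks.items()}

entries = redis_task_entries({
    "ls-task-1": {"image": "http://example.com/1.jpg"},
    "ls-task-2": {"image": "http://example.com/2.jpg"},
})
# With redis-py installed and a server running:
# import redis
# r = redis.Redis(db=1)
# for key, value in entries.items():
#     r.set(key, value)
```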
In the Label Studio UI, do the following to set up the connection:
The host defaults to `localhost`. Use `.*` as the file filter regex to collect all objects. After adding the storage, click Sync to collect tasks from the database, or make an API call to sync import storage.
You can also create a storage connection using the Label Studio API.
- See Create new import storage then sync the import storage.
- See Create export storage and after annotating, sync the export storage.
If you have local files that you want to add to Label Studio from a specific directory, you can set up a specific local directory on the machine where Label Studio is running as source or target storage. Label Studio steps through the directory recursively to read tasks.
Add these variables to your environment setup:
- LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
- LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/home/user (or LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=C:\\data\\media for Windows).
Without these settings, Local storage and URLs in tasks that point to local files won't work. Keep in mind that serving data from the local file system can be a security risk. See Set environment variables for more about using environment variables.
In the Label Studio UI, do the following to set up the connection:

Specify an Absolute local path to the directory with your files. The local path must be an absolute path and include the LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT value.
For example, if LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/home/user, then your local path must be /home/user/dataset1. For more about that environment variable, see Run Label Studio on Docker and use local storage.
!!! note
If you are using Windows, ensure that you use backslashes when entering your Absolute local path.
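To sanity-check a path before entering it in the UI, you can verify that it falls under the document root. A small sketch (paths are illustrative):

```python
# Sketch: check that an "Absolute local path" sits inside
# LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT, as the UI requires.
from pathlib import Path

def is_under_document_root(local_path: str, document_root: str) -> bool:
    try:
        # relative_to raises ValueError when local_path is outside the root
        Path(local_path).resolve().relative_to(Path(document_root).resolve())
        return True
    except ValueError:
        return False

print(is_under_document_root("/home/user/dataset1", "/home/user"))  # True
print(is_under_document_root("/etc/passwd", "/home/user"))          # False
```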
Use `.*` as the file filter regex to collect all objects. After adding the storage, click Sync to collect tasks from the directory, or make an API call to sync import storage.
In cases where your tasks have multiple or complex input sources, such as multiple object tags in the labeling config or a HyperText tag with custom data values, you must prepare tasks manually.
In these cases, repeat the steps above to create local storage, but skip the optional steps. Your Absolute local path must point to a directory with the files (not tasks) that you want to reference from tasks. The directory can also contain other directories or files; you will reference them inside the tasks themselves.
Differences from the instructions above:
- 7. File Filter Regex - leave empty (because you will specify the files inside your tasks)
- 8. Import method - select "Tasks" (because you will specify file references inside your JSON task definitions)
Your window will look like this:
Click Add Storage, but do not sync (do not click the Sync Storage button) after creating the storage, to avoid automatically creating tasks from the storage files.
When referencing your files within a task, adhere to the following guidelines:
* "Absolute local path" must be a sub-directory of LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT (see step 6).
* All file paths must begin with /data/local-files/?d=.
* In the following example, the first directory is dataset1. For instance, if your tasks mix data types, with audio files 1.wav and 2.wav in an audio folder and image files 1.jpg and 2.jpg in an images folder, construct the paths as follows:
```json
[{
  "id": 1,
  "data": {
    "audio": "/data/local-files/?d=dataset1/audio/1.wav",
    "image": "/data/local-files/?d=dataset1/images/1.jpg"
  }
},
{
  "id": 2,
  "data": {
    "audio": "/data/local-files/?d=dataset1/audio/2.wav",
    "image": "/data/local-files/?d=dataset1/images/2.jpg"
  }
}]
```
There are several ways to add your custom tasks: the API, the web interface, or another storage. The simplest is to use the Import button on the Data Manager page. Drag and drop your JSON file into the window, then click the blue Import button.

This video tutorial demonstrates how to set up Local Storage from scratch and import JSON tasks in a complex task format that are linked to the Local Storage files.
You can also create a storage connection using the Label Studio API.
- See Create new import storage then sync the import storage.
- See Create export storage and after annotating, sync the export storage.
If you're using Label Studio in Docker, you need to mount the local directory that you want to access as a volume when you start the Docker container. See Run Label Studio on Docker and use local storage.