---
title: RAG Retrieval
type: templates
category: LLM Fine-tuning
cat: llm-fine-tuning
order: 906
is_new: t
meta_title: Create a ranked dataset for building a RAG system for LLMs with Label Studio
meta_description: Create a ranked dataset for building a RAG system for LLMs with Label Studio for your machine learning and data science projects.
---

This template provides you with a workflow to rank the quality of large language model (LLM) responses. Using this template, you can compare the quality of the responses from different LLMs and rank a dynamic set of items with a handy drag-and-drop interface. This enables the following use cases:

1. Categorize the LLM responses by different types: relevant, irrelevant, biased, offensive, etc.
2. Compare and rank the quality of the responses from different models.
3. Rank contextual items for retrieval-augmented generation (RAG) based chatbots and in-context learning.
4. Build [the preference model for RLHF](https://github.com/HumanSignal/RLHF).
5. Evaluate the results of semantic search.
6. [LLM routing](https://betterprogramming.pub/unifying-llm-powered-qa-techniques-with-routing-abstractions-438e2499a0d0).

Looking for a model to get started with the fine-tuning process? Check out [our guide on the Label Studio Blog](https://labelstud.io/blog/five-large-language-models-you-can-fine-tune-today/).

## How to create the dataset

Collect a prompt and a list of items you want to display in each task in the following JSON format:

```json
{
  "prompt": "What caused the ancient library of Alexandria to be destroyed?",
  "items": [
    {
      "id": "llm_1",
      "title": "LLM 1",
      "body": "Wars led to library's ruin."
    },
    {
      "id": "llm_2",
      "title": "LLM 2",
      "body": "Library's end through various wars."
    },
    {
      "id": "llm_3",
      "title": "LLM 3",
      "body": "Ruin resulted from library wars."
    }
  ]
}
```

Collect dataset examples and store them in a `dataset.json` file.

## How to configure the labeling interface

The `LLM Ranker` template includes the following labeling interface in XML format:

```xml
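<!-- Example configuration sketch: the tag and bucket names follow the element
     descriptions and the export format shown on this page; the style rule is
     a placeholder you can customize. -->
<View>
  <Style>
    .htx-ranker-item { padding: 8px; }
  </Style>
  <Text name="prompt" value="$prompt"/>
  <List name="items" value="$items" title="LLM Results"/>
  <Ranker name="rank" toName="items">
    <Bucket name="relevant_results" title="Relevant results"/>
    <Bucket name="biased_results" title="Biased results"/>
  </Ranker>
</View>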
```

The configuration includes the following elements:

- `<Text>` - the tag that displays the prompt. The `value` attribute should be set to the name of the prompt element, i.e. `prompt` in this case.
- `<List>` - the tag that displays the list of items. The `value` attribute should be set to the name of the list element (in this case, `items`).
- `<Ranker>` - the tag that enables ranking of the items in the list. The `toName` attribute should be set to the name of the list element.
- `<Bucket>` - the tag that creates a bucket for the ranked items. Each bucket represents a high-level category, and items are ranked inside that category. The `name` attribute should be set to the name of the bucket.

Items can be styled in the `<Style>` tag by using the `.htx-ranker-item` class.

## Starting your labeling project

!!! info Tip
    Need a hand getting started with Label Studio? Check out our [Zero to One Tutorial](https://labelstud.io/blog/zero-to-one-getting-started-with-label-studio/).

1. Create a new project in Label Studio.
2. Go to **Settings > Labeling Interface > Browse Templates > Generative AI > LLM Ranker**.
3. Save the project.

Alternatively, you can create a project using our Python SDK:

```python
import label_studio_sdk

ls = label_studio_sdk.Client('YOUR_LABEL_STUDIO_URL', 'YOUR_API_KEY')
project = ls.create_project(title='LLM Ranker', label_config='...')
```

## Import the dataset

To import your dataset, go to `Import` in the project settings and upload the dataset file `dataset.json`.

Using the Python SDK, import the dataset with input prompts into Label Studio using the `PROJECT_ID` of the project you've just created. Run the following code:

```python
from label_studio_sdk import Client

ls = Client(url='', api_key='')
project = ls.get_project(id=PROJECT_ID)
project.import_tasks('dataset.json')
```

If you want to create prelabeled data (for example, a ranked order of the items produced by an LLM), you can import the dataset with pre-annotations:

```python
project.import_tasks([{
    "data": {"prompt": "...", "items": [...]},
    "predictions": [{
        "type": "ranker",
        "value": {
            "ranker": {
                "_": ["llm_2", "llm_1"],
                "biased_results": ["llm_3"],
                "relevant_results": []
            }
        },
        "to_name": "items",
        "from_name": "rank"
    }]
}])
```

Under the `"value"` group, you can specify different bucket names. Note that `"_"` is a special key that represents the original, non-categorized list.

## Export the dataset

Labeling results can be exported in JSON format. To export the dataset, go to `Export` in the project settings and download the file.

Using the Python SDK, you can export the dataset with annotations from Label Studio:

```python
annotations = project.export_tasks(export_type='JSON')
```

The annotation output in `"value"` is expected to contain the following structure:

```json
"value": {
  "ranker": {
    "_": ["llm_2", "llm_1"],
    "biased_results": ["llm_3"],
    "relevant_results": []
  }
}
```

where:

- `"_"` is a special key that represents the original, non-categorized list (the same as in the import pre-annotations example above).
- `"biased_results"` and `"relevant_results"` are the names of the buckets defined in the labeling interface.

## Related tags

- [Ranker](/tags/ranker.html)
- [List](/tags/list.html)
- [Text](/tags/text.html)