---
title: Multi-Turn Chat Evaluation
type: templates
category: LLM Evaluations
cat: llm-evaluations
order: 972
meta_title: Multi-Turn Chat Evaluation Template
meta_description: Use the SDK to create a dynamic template for evaluating multi-turn chats.
date: 2025-01-21 10:49:57
---

This template uses the example available here: [Multi-turn Chat Labeling: Evaluating Virtual Assistant Conversations](https://github.com/HumanSignal/label-studio-examples/blob/main/multi-turn-chat/Readme.md)

You can use this example to evaluate multi-turn chat conversations in Label Studio, identifying areas to enhance your virtual assistant’s performance and user experience.

For this example, you will need the following:

- A Label Studio instance
- The Label Studio SDK (`pip install label-studio-sdk`)
- Python 3.8+ with pandas

## Labeling configuration

In this example, the labeling configuration is dynamically generated. This is necessary because each chat has a different number of turns (questions and responses).

To build your own template XML, follow the steps outlined in the following notebook: [**Evaluating Virtual Assistant Conversations.ipynb**](https://github.com/HumanSignal/label-studio-examples/blob/main/multi-turn-chat/Evaluating%20Virtual%20Assistant%20Conversations.ipynb)

However, here is an example of the general shape of the configuration for a 5-turn chat. (This listing is abridged and illustrative: turns 2-5 repeat the pattern shown for turn 1, and the choice values and `$`-prefixed field names are placeholders, not the notebook's exact output.)

```xml
<View>
  <View style="display: flex;">
    <!-- Left column: the whole conversation -->
    <View style="width: 50%; padding-right: 1em;">
      <Header value="Full Conversation"/>
      <Paragraphs name="conversation" value="$messages"
                  layout="dialogue" nameKey="role" textKey="content"/>
    </View>
    <!-- Right column: per-turn questions -->
    <View style="width: 50%;">
      <Collapse>
        <Panel value="Turn 1">
          <Paragraphs name="turn1_prg" value="$turn1"
                      layout="dialogue" nameKey="role" textKey="content"/>
          <Header value="User intent in this turn"/>
          <Choices name="turn1_intent" toName="turn1_prg" choice="multiple">
            <Choice value="Ask a question"/>
            <Choice value="Make a request"/>
            <Choice value="Other"/>
          </Choices>
          <Header value="Does the response address the intent?"/>
          <Choices name="turn1_addresses_intent" toName="turn1_prg" choice="single">
            <Choice value="Yes"/>
            <Choice value="No"/>
          </Choices>
          <Header value="Is the response accurate and helpful?"/>
          <Choices name="turn1_accurate" toName="turn1_prg" choice="single">
            <Choice value="Yes"/>
            <Choice value="No"/>
          </Choices>
          <Header value="Implied action of the response"/>
          <Choices name="turn1_action" toName="turn1_prg" choice="multiple">
            <Choice value="Answer"/>
            <Choice value="Clarify"/>
            <Choice value="Other"/>
          </Choices>
        </Panel>
        <!-- Panels for turns 2-5 repeat the same pattern,
             with turn2_prg, turn3_prg, and so on -->
      </Collapse>
    </View>
  </View>
</View>
```

## About the labeling configuration

#### Paragraphs

```xml
<Paragraphs name="conversation" value="$messages" layout="dialogue" nameKey="role" textKey="content"/>
```

This displays the entire conversation in one column under “Full Conversation” using a Paragraphs tag. It shows each message (with its role and content) as a dialogue.

The other column organizes the annotation questions by turn. Each “Turn” is inside a collapsible `<Collapse>` component and has its own `<Paragraphs>` tag. For example (the `$turn1` value field is illustrative):

```xml
<Paragraphs name="turn1_prg" value="$turn1" layout="dialogue" nameKey="role" textKey="content"/>
```

This lets you see only the subset of the conversation relevant to that turn.

#### Choices

For each turn, there are multiple Choices blocks, each focusing on a different question:

1. The user’s intent in this turn (multiple choice).
2. Whether the assistant’s response addresses that intent (single choice).
3. Whether the assistant’s response is accurate/helpful (single choice).
4. The implied “action” of the assistant’s response (multiple choice).

The `toName` attributes (for instance, `toName="turn1_prg"`) tie each set of choices to that turn’s Paragraphs object, so each question is specifically linked to the text of that turn.

## Related tags

- [Paragraphs](/tags/paragraphs.html)
- [Choices](/tags/choices.html)
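Because the number of turns varies from chat to chat, the template XML is generated programmatically rather than written by hand. The sketch below illustrates the idea in Python; the function name `build_config` and the `$turnN` field names are our own assumptions for this example, not the notebook's exact code.

```python
# Sketch: dynamically build a Label Studio labeling config for an
# n-turn chat. One collapsible Panel (with its own Paragraphs and
# Choices) is emitted per turn.

TURN_PANEL = """\
    <Panel value="Turn {n}">
      <Paragraphs name="turn{n}_prg" value="$turn{n}"
                  layout="dialogue" nameKey="role" textKey="content"/>
      <Choices name="turn{n}_intent" toName="turn{n}_prg" choice="multiple">
        <Choice value="Ask a question"/>
        <Choice value="Make a request"/>
      </Choices>
    </Panel>"""


def build_config(num_turns: int) -> str:
    """Return labeling-config XML with one panel per chat turn."""
    panels = "\n".join(
        TURN_PANEL.format(n=i) for i in range(1, num_turns + 1)
    )
    return f"""<View>
  <Paragraphs name="conversation" value="$messages"
              layout="dialogue" nameKey="role" textKey="content"/>
  <Collapse>
{panels}
  </Collapse>
</View>"""


print(build_config(5))
```

The resulting string can then be supplied as the project's labeling configuration when creating the project with the Label Studio SDK, as shown in the notebook.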