Evaluations let you create test scenarios to verify your agents behave correctly. Think of them as automated tests for your AI agents.

Creating an Evaluation

Navigate to Analytics → Evaluations → Create Eval

Step 1: Name Your Eval

Give it a descriptive name like “Lead Qualification Flow” or “Product FAQ Test”.

Step 2: Define Test Messages

Create a mock conversation with user messages:
User: "What products do you sell?"
User: "How much does the steel beam cost?"
User: "I'd like to place an order"
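The mock conversation above can be thought of as an ordered list of user turns that the eval replays against the agent. As an illustrative sketch only (the platform's internal representation may differ), it might look like:

```python
# Hypothetical sketch: the mock conversation represented as a list of
# user messages an eval runner could replay in order.
test_messages = [
    {"role": "user", "content": "What products do you sell?"},
    {"role": "user", "content": "How much does the steel beam cost?"},
    {"role": "user", "content": "I'd like to place an order"},
]

for turn, msg in enumerate(test_messages, start=1):
    print(f"Turn {turn}: {msg['content']}")
```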

Step 3: Set Evaluation Criteria

Choose how to judge the agent’s response:
Type         Use Case
Exact Match  Response must contain specific text
Regex        Response matches a pattern
AI Judge     AI evaluates if response meets criteria

Step 4: Run the Eval

Select a target:
  • Assistant: Test a single agent
  • Squad: Test a workflow with multiple agents

Evaluation Types

Exact Match

Check if response contains specific content:
Criteria: "steel beam"
Pass if: Agent response contains "steel beam"
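In other words, an exact-match check is a simple substring test. A minimal sketch (whether the platform's check is case-sensitive is an assumption here; this version ignores case):

```python
def exact_match(response: str, criteria: str) -> bool:
    """Pass if the agent's response contains the criteria text.

    Comparison is case-insensitive in this sketch; the platform's
    actual behavior may differ.
    """
    return criteria.lower() in response.lower()

print(exact_match("Our steel beam starts at $120 per unit.", "steel beam"))  # True
print(exact_match("We only sell lumber.", "steel beam"))                     # False
```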

Pattern Match

Pattern matching lets you check if the agent’s response includes text in a specific format, such as a dollar amount.
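For example, a regex criterion for a dollar amount could look like the following sketch (the exact pattern syntax your criteria accept is an assumption; this uses standard Python regular expressions):

```python
import re

# Matches dollar amounts such as "$120", "$1,250.00".
price_pattern = re.compile(r"\$\d{1,3}(,\d{3})*(\.\d{2})?")

def pattern_match(response: str) -> bool:
    """Pass if the response contains a dollar amount anywhere."""
    return bool(price_pattern.search(response))

print(pattern_match("The steel beam costs $1,250.00 per unit."))  # True
print(pattern_match("Let me check the price for you."))           # False
```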

AI Judge

Use AI to evaluate complex criteria:
Criteria: "Agent should politely decline to discuss competitor pricing and redirect to our products"
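Conceptually, an AI judge passes the criteria and the agent's response to a model and asks for a verdict. This is an illustrative sketch only, not the platform's implementation: `ask_model` is a placeholder for whatever LLM call is made under the hood, injected here so the flow can be shown without a real API.

```python
def ai_judge(response: str, criteria: str, ask_model) -> bool:
    """Ask a model whether the response satisfies the criteria.

    `ask_model` is a hypothetical callable standing in for the real
    LLM call; it takes a prompt string and returns the model's reply.
    """
    prompt = (
        f"Criteria: {criteria}\n"
        f"Agent response: {response}\n"
        "Answer PASS if the response meets the criteria, otherwise FAIL."
    )
    verdict = ask_model(prompt).strip().upper()
    return verdict.startswith("PASS")

# Stub model used purely for demonstration:
print(ai_judge(
    "I can't speak to competitor pricing, but here's what we offer...",
    "Agent should politely decline to discuss competitor pricing",
    lambda prompt: "PASS",
))  # True
```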

Running Evaluations

Manual Run

  1. Open an eval
  2. Select target agent or workflow
  3. Click Run
  4. View results

Results

Status  Meaning
Pass    All criteria met
Fail    One or more criteria failed
Error   Technical issue during test

Best Practices

Test Key Scenarios

Create evals for:
  • Common customer questions
  • Lead qualification flows
  • Edge cases and error handling
  • Tool invocations

Run After Changes

Re-run evals after updating:
  • System prompts
  • Knowledge base documents
  • Tool configurations

Start Simple

Begin with a few critical test cases:
1. Basic greeting and response
2. Product information query
3. Lead capture flow
4. Transfer/escalation request

Viewing Results

Eval Detail Page

Each run shows:
  • Pass/fail status
  • Full conversation transcript
  • Which criteria passed or failed
  • Timestamp and duration

Stats Overview

Track across all evals:
  • Total pass rate
  • Recent run history
  • Failing tests that need attention
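The total pass rate is simply the fraction of recent runs that passed. A small sketch over a sample run history (the status names come from the Results table; the sample data is made up for illustration):

```python
# Sample run history for illustration only.
runs = ["Pass", "Pass", "Fail", "Pass", "Error"]

pass_rate = runs.count("Pass") / len(runs)
needs_attention = [i for i, status in enumerate(runs) if status != "Pass"]

print(f"Pass rate: {pass_rate:.0%}")        # Pass rate: 60%
print(f"Runs needing attention: {needs_attention}")
```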

Troubleshooting

If an eval fails:
  • Review the full transcript to see what the agent said
  • Check whether the criteria are too strict
  • Verify the agent has the necessary knowledge and tools

If an eval errors:
  • The agent may be stuck in a loop
  • Check tool configurations
  • Simplify the test scenario

Next Steps

  • Voice Agents: Configure agents
  • Testing: Manual testing guide
  • Call Logs: Review real conversations
  • Workflow Squads: Test workflows