Creating an Evaluation
Navigate to Analytics → Evaluations → Create EvalStep 1: Name Your Eval
Give it a descriptive name like “Lead Qualification Flow” or “Product FAQ Test”.Step 2: Define Test Messages
Create a mock conversation with user messages:Step 3: Set Evaluation Criteria
Choose how to judge the agent’s response:| Type | Use Case |
|---|---|
| Exact Match | Response must contain specific text |
| Regex | Response matches a pattern |
| AI Judge | AI evaluates if response meets criteria |
Step 4: Run the Eval
Select a target:- Assistant: Test a single agent
- Squad: Test a workflow with multiple agents
Evaluation Types
Exact Match
Check if response contains specific content:Pattern Match
Pattern matching lets you check if the agent’s response includes text in a specific format, such as a dollar amount.AI Judge
Use AI to evaluate complex criteria:Running Evaluations
Manual Run
- Open an eval
- Select target agent or workflow
- Click Run
- View results
Results
| Status | Meaning |
|---|---|
| Pass | All criteria met |
| Fail | One or more criteria failed |
| Error | Technical issue during test |
Best Practices
Test Key Scenarios
Create evals for:- Common customer questions
- Lead qualification flows
- Edge cases and error handling
- Tool invocations
Run After Changes
Re-run evals after updating:- System prompts
- Knowledge base documents
- Tool configurations
Start Simple
Begin with a few critical test cases:Viewing Results
Eval Detail Page
See for each run:- Pass/fail status
- Full conversation transcript
- Which criteria passed or failed
- Timestamp and duration
Stats Overview
Track across all evals:- Total pass rate
- Recent run history
- Failing tests that need attention
Troubleshooting
Eval keeps failing
Eval keeps failing
- Review the full transcript to see what the agent said
- Check if criteria are too strict
- Verify agent has necessary knowledge/tools
Timeout errors
Timeout errors
- Agent may be stuck in a loop
- Check tool configurations
- Simplify the test scenario
Next Steps
Voice Agents
Configure agents
Testing
Manual testing guide
Call Logs
Review real conversations
Workflow Squads
Test workflows