Evaluation


Why evaluate

An AI that handles easy calls well can still fail on the hard ones, where failure costs the most. Evaluation finds those failures before customers do.

Build an adversarial suite

Most test cases should be hard: ambiguous requests, callers pushing for more than they're owed, and unexpected turns mid-call.
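
As a rough illustration, the suite can be kept as tagged cases so that the share of hard cases is easy to check. This is a minimal sketch, not a prescribed format: the `EvalCase` type, the category labels, and the sample cases are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical labels for the three kinds of hard case named above.
HARD_CATEGORIES = {"ambiguous_request", "overreach", "mid_call_turn"}


@dataclass(frozen=True)
class EvalCase:
    name: str
    category: str      # one of HARD_CATEGORIES, or "routine"
    opening_line: str  # what the simulated caller says first


def hard_fraction(cases):
    """Return the share of cases whose category counts as hard."""
    if not cases:
        return 0.0
    hard = sum(1 for case in cases if case.category in HARD_CATEGORIES)
    return hard / len(cases)


# Illustrative cases only; a real suite would be drawn from your own calls.
SUITE = [
    EvalCase("vague-billing", "ambiguous_request", "Something is wrong with my bill."),
    EvalCase("extra-refund", "overreach", "I want a refund for all of last year."),
    EvalCase("topic-switch", "mid_call_turn", "Actually, forget that. Cancel my account."),
    EvalCase("opening-hours", "routine", "What time do you open tomorrow?"),
]

# "Most test cases should be hard": fail loudly if the suite drifts toward easy cases.
assert hard_fraction(SUITE) > 0.5
```

Tagging each case with a category also makes it possible to report failures per category rather than as one overall number.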

What to measure

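Track resolution, escalation, and re-contact rates together. Any one alone can be gamed: for example, an agent can raise its resolution rate by closing calls too early, and the cost only shows up later as re-contact. Read together, the three are much harder to game.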
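
One way to report the three numbers side by side is sketched below. The `CallOutcome` fields and the definition of each rate (a simple fraction of all calls) are assumptions made for this example; your own definitions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallOutcome:
    # Hypothetical per-call flags; how each is determined is up to you.
    resolved: bool     # the call was marked resolved
    escalated: bool    # the call was handed to a human
    recontacted: bool  # the caller came back about the same issue


def summarize(outcomes):
    """Return all three rates together so none is read in isolation."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("no outcomes to summarize")
    return {
        "resolution_rate": sum(o.resolved for o in outcomes) / total,
        "escalation_rate": sum(o.escalated for o in outcomes) / total,
        "recontact_rate": sum(o.recontacted for o in outcomes) / total,
    }


# Illustrative data: resolution looks strong, but half the callers came back.
outcomes = [
    CallOutcome(resolved=True, escalated=False, recontacted=True),
    CallOutcome(resolved=True, escalated=False, recontacted=True),
    CallOutcome(resolved=True, escalated=False, recontacted=False),
    CallOutcome(resolved=False, escalated=True, recontacted=False),
]
summary = summarize(outcomes)
```

In this made-up sample the resolution rate alone would look healthy, while the re-contact rate shows that many of those resolutions did not hold.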

Re-run on every change

Treat the evaluation suite like a test suite: run it whenever you change knowledge, actions, or approved patterns.
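
A simple way to notice that a re-run is due is to fingerprint the three inputs and compare against the fingerprint recorded at the last evaluation. The sketch below assumes, purely for illustration, that knowledge, actions, and approved patterns can each be represented as JSON-serializable data; the function names and sample values are hypothetical.

```python
import hashlib
import json


def config_fingerprint(knowledge, actions, approved_patterns):
    """Hash the three inputs so that any change yields a different fingerprint."""
    payload = json.dumps(
        {
            "knowledge": knowledge,
            "actions": actions,
            "approved_patterns": approved_patterns,
        },
        sort_keys=True,  # stable key order, so equal content hashes equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_rerun(last_evaluated, current):
    """True when the configuration differs from the one last evaluated."""
    return last_evaluated != current


# Illustrative values only.
before = config_fingerprint(
    knowledge=["Refunds are available within 30 days."],
    actions=["issue_refund"],
    approved_patterns=["apologize_then_resolve"],
)
after = config_fingerprint(
    knowledge=["Refunds are available within 14 days."],  # knowledge changed
    actions=["issue_refund"],
    approved_patterns=["apologize_then_resolve"],
)

if needs_rerun(before, after):
    print("Configuration changed: re-run the evaluation suite.")
```

A check like this could run in whatever pipeline publishes configuration changes, so that the suite is triggered automatically rather than remembered by hand.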