Configuration

Evaluation

Why evaluate

An AI that works on easy calls can still fail where it matters. Evaluation finds those failures before customers do.

Most test cases should be hard: ambiguous requests, callers pushing for more than they're owed, and unexpected turns mid-call.
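One way to keep the suite honest about its mix is to tag each case and measure the share of hard ones. The sketch below is illustrative only: the `EvalCase` type, the category names, and `hard_share` are hypothetical and not part of any product API.

```python
from dataclasses import dataclass

# Hypothetical category labels mirroring the three kinds of hard case above.
HARD_CATEGORIES = {"ambiguous", "overreach", "unexpected_turn"}


@dataclass
class EvalCase:
    case_id: str
    category: str  # one of HARD_CATEGORIES, or "easy"


def hard_share(cases: list[EvalCase]) -> float:
    """Return the fraction of cases tagged with a hard category."""
    if not cases:
        raise ValueError("the suite has no cases")
    hard = sum(case.category in HARD_CATEGORIES for case in cases)
    return hard / len(cases)


suite = [
    EvalCase("c1", "ambiguous"),
    EvalCase("c2", "overreach"),
    EvalCase("c3", "unexpected_turn"),
    EvalCase("c4", "easy"),
]
share = hard_share(suite)
print(f"hard cases: {share:.0%}")
```

A check such as `share > 0.5` could then flag a suite that has drifted toward easy calls; the threshold is an assumption to tune, not a recommendation from this guide.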

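Track resolution, escalation, and re-contact together. Any one alone can be gamed: a high resolution rate, for instance, means little if the same callers come back. Read together, the three show whether calls were actually handled well.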
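As a minimal sketch of reading the three numbers side by side, the snippet below computes them from a list of call records. The `CallOutcome` fields and the `summarize` helper are assumptions made for illustration; real call data will likely be shaped differently.

```python
from dataclasses import dataclass


@dataclass
class CallOutcome:
    resolved: bool      # the call was marked resolved
    escalated: bool     # the call was handed to a human
    recontacted: bool   # the caller came back about the same issue


def summarize(calls: list[CallOutcome]) -> dict[str, float]:
    """Return resolution, escalation, and re-contact rates as fractions."""
    if not calls:
        raise ValueError("no calls to summarize")
    total = len(calls)
    return {
        "resolution": sum(c.resolved for c in calls) / total,
        "escalation": sum(c.escalated for c in calls) / total,
        "recontact": sum(c.recontacted for c in calls) / total,
    }


calls = [
    CallOutcome(resolved=True, escalated=False, recontacted=True),
    CallOutcome(resolved=True, escalated=False, recontacted=True),
    CallOutcome(resolved=True, escalated=False, recontacted=False),
    CallOutcome(resolved=False, escalated=True, recontacted=False),
]
report = summarize(calls)
print(report)
```

In this made-up sample, resolution looks healthy at 75%, but half the callers came back, which is the kind of gap a single metric would hide.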

Treat the evaluation suite like a test suite: run it whenever you change knowledge, actions, or approved patterns.
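Following the test-suite analogy, a re-run can be compared against the previous one to surface cases that used to pass and no longer do. This is a sketch under assumptions: the case IDs, the pass/fail dictionaries, and `newly_failing` are hypothetical, and how a case is judged to pass is left to your own evaluation setup.

```python
def newly_failing(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return IDs of cases that passed in the baseline run but fail now.

    A case missing from the current run is treated as failing.
    """
    return sorted(
        case_id
        for case_id, passed in baseline.items()
        if passed and not current.get(case_id, False)
    )


# Hypothetical results before and after a change to knowledge or actions.
baseline = {"ambiguous-refund": True, "pushy-upgrade": True, "topic-switch": False}
current = {"ambiguous-refund": True, "pushy-upgrade": False, "topic-switch": True}

broken = newly_failing(baseline, current)
print(broken)
```

An empty list means the change broke nothing that was previously working; anything else is worth reviewing before the change ships.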