Chapter 18
Metacognition: Evaluation and Monitoring in Agents
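Metacognition is the monitoring and regulation of one's own thinking. In the brain, the anterior cingulate cortex (ACC) and prefrontal cortex (PFC) detect errors, evaluate outcomes, and refine strategies. Agents need an equivalent: continuous evaluation and monitoring that catches drift (gradual degradation as inputs or the environment change) and keeps their behavior reliable.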
Core Monitoring Mechanisms
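- Performance tracking: accuracy, latency, and resource use (tokens, compute, cost).
- A/B testing: comparing alternative strategies on the same workload.
- Compliance and safety audits: checking actions and outputs against policy.
- Drift detection: noticing when changes in inputs or the environment degrade performance.
- Anomaly detection: flagging unexpected behaviors, such as unusual tool calls.
- Learning progress assessment: measuring whether the agent's skills improve over time.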
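Three of these mechanisms (performance tracking, drift detection, and anomaly detection) can share one small component. The sketch below is a minimal illustration under stated assumptions: `AgentMonitor`, its tool allowlist, `baseline_accuracy`, and `drift_tolerance` are hypothetical names, the baseline is assumed to come from an earlier reference period, and "drift" here is just a threshold on rolling accuracy, one of many possible tests.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class AgentMonitor:
    """Hypothetical monitor: rolling performance, drift, and tool-call anomalies."""
    allowed_tools: set[str]          # tools the agent is expected to use (assumed allowlist)
    baseline_accuracy: float         # accuracy from a reference period (assumed known)
    drift_tolerance: float = 0.10    # alert if windowed accuracy falls this far below baseline
    window: int = 50                 # number of recent tasks kept for rolling metrics
    alerts: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Bounded windows: only the most recent `window` tasks are retained.
        self._correct = deque(maxlen=self.window)
        self._latencies = deque(maxlen=self.window)

    def record(self, correct: bool, latency_s: float, tools_called: list[str]) -> None:
        # Performance tracking: store the outcome and latency of one task.
        self._correct.append(correct)
        self._latencies.append(latency_s)
        # Anomaly detection: any tool outside the allowlist raises an alert.
        for tool in tools_called:
            if tool not in self.allowed_tools:
                self.alerts.append(f"anomaly: unexpected tool call {tool!r}")
        # Drift detection: once the window is full, compare rolling accuracy to baseline.
        # Note: this alerts on every record while the window stays below threshold.
        threshold = self.baseline_accuracy - self.drift_tolerance
        if len(self._correct) == self.window and self.accuracy() < threshold:
            self.alerts.append(
                f"drift: accuracy {self.accuracy():.2f} below baseline {self.baseline_accuracy:.2f}"
            )

    def accuracy(self) -> float:
        return sum(self._correct) / len(self._correct) if self._correct else 0.0

    def mean_latency(self) -> float:
        return mean(self._latencies) if self._latencies else 0.0


# Usage with a tiny window so drift shows up quickly; tool names are made up.
monitor = AgentMonitor(allowed_tools={"search", "calculator"}, baseline_accuracy=0.9, window=4)
for ok in [True, False, False, True]:
    monitor.record(correct=ok, latency_s=1.2, tools_called=["search"])
monitor.record(correct=True, latency_s=0.8, tools_called=["shell"])
print(monitor.accuracy(), monitor.alerts)
```

A fixed-size window keeps memory constant and makes the drift check sensitive to recent behavior; in production the alerts would more likely feed a dashboard or pager than a list.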
Evaluation in Practice
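Evaluation in practice combines several signals: accuracy, latency, token usage and cost, helpfulness (often scored with an LLM-as-a-judge), and trajectory analysis, which inspects the path the agent took (reasoning steps, tool calls, and decisions) rather than only its final answer.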
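Trajectory analysis can be made concrete by comparing the tool calls an agent actually made against an expected sequence at different levels of strictness. This is a minimal sketch, not any particular framework's API: the function names and tool names are hypothetical, and real evaluators may also score arguments, reasoning steps, or partial credit.

```python
from collections import Counter


def exact_match(actual: list[str], expected: list[str]) -> bool:
    """Same tool calls, in the same order, with nothing extra."""
    return actual == expected


def in_order_match(actual: list[str], expected: list[str]) -> bool:
    """Expected calls appear in order; extra calls in between are tolerated."""
    it = iter(actual)
    # `step in it` advances the iterator, so this is a subsequence check.
    return all(step in it for step in expected)


def any_order_match(actual: list[str], expected: list[str]) -> bool:
    """Every expected call occurs (counting repeats), in any order."""
    missing = Counter(expected) - Counter(actual)  # keeps only positive counts
    return not missing


def trajectory_score(actual: list[str], expected: list[str]) -> dict[str, bool]:
    return {
        "exact": exact_match(actual, expected),
        "in_order": in_order_match(actual, expected),
        "any_order": any_order_match(actual, expected),
    }


expected = ["search", "fetch_page", "summarize"]
actual = ["search", "calculator", "fetch_page", "summarize"]
print(trajectory_score(actual, expected))
# → {'exact': False, 'in_order': True, 'any_order': True}
```

Reporting all three levels separates "the agent took a detour" from "the agent skipped a required step", which a single pass/fail score would hide.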
Engineering Principle
- Unit tests: verify individual behaviors in isolation.
- Evalsets: collections of scenarios run against the agent end to end.
- Dashboards and logs: support systematic monitoring and auditing.
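The three practices can be wired together in a few lines: a unit test pins down one behavior, an evalset runs a batch of scenarios and reports an aggregate pass rate, and that report is emitted as a structured log line a dashboard could ingest. Everything here is a hypothetical sketch: `Scenario`, `run_evalset`, and `toy_agent` are illustrative names, the agent is a stub, and real evalset formats in agent frameworks differ.

```python
import json
import logging
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    name: str
    prompt: str
    check: Callable[[str], bool]  # hypothetical pass/fail predicate on the agent's answer


def run_evalset(agent: Callable[[str], str], evalset: list[Scenario]) -> dict:
    """Run every scenario and summarize results for logging or dashboards."""
    failures = [s.name for s in evalset if not s.check(agent(s.prompt))]
    passed = len(evalset) - len(failures)
    return {"passed": passed, "total": len(evalset),
            "pass_rate": passed / len(evalset), "failures": failures}


def toy_agent(prompt: str) -> str:
    # Stand-in for a real agent: answers one arithmetic question, otherwise declines.
    return "4" if prompt == "What is 2 + 2?" else "I don't know."


def test_toy_agent_arithmetic() -> None:
    # pytest-style unit test: pytest collects functions whose names start with test_.
    assert toy_agent("What is 2 + 2?") == "4"


evalset = [
    Scenario("arithmetic", "What is 2 + 2?", lambda a: a.strip() == "4"),
    Scenario("capital", "What is the capital of France?", lambda a: "Paris" in a),
]
report = run_evalset(toy_agent, evalset)

# Structured log line: one JSON record per evalset run, easy to chart over time.
logging.basicConfig(level=logging.INFO)
logging.info("evalset_report %s", json.dumps(report))
```

Keeping the report as plain JSON means the same record can feed an audit log, a CI gate (for example, failing the build when `pass_rate` drops), and a trend dashboard.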
Conclusion
Evaluation transforms agents from black boxes into transparent, auditable systems that can adapt and improve over time.