How to Evaluate AI Agents: Observability, Rubrics vs. Ground Truth, Regression Testing | DailyDevLists