Description

This episode explores new developments in AI testing infrastructure. We look at OpenAI’s security agent that found hundreds of vulnerabilities, Microsoft’s Agent 365 governance platform for enterprise AI agents, Revefi’s observability tools for multi-model systems, and a new human benchmarking platform for evaluating image generation quality.

These signals highlight how testing AI systems now requires automated analysis, observability, and structured human evaluation frameworks.

Research & References

https://openai.com/index/codex-security-now-in-research-preview/
https://thehackernews.com/2026/03/openai-codex-security-scanned-12.html
https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/new-and-improved-agent-evaluations-computer-use-and-advanced-maker-training/
https://www.windowscentral.com/artificial-intelligence/microsoft-copilot/microsoft-365-copilot-wave-3-announcement
https://owasp.org/www-project-top-10-for-large-language-model-applications/
https://www.honeycomb.io/resources/getting-started/what-is-llm-observability

Companion newsletter and episode notes: https://daily.testingeducation.org/