In this deep-dive session, Google Cloud and Arize AI break down what it really takes to build, evaluate, and operate AI agents in production. From evaluation frameworks and context engineering to observability, governance, and real-world deployment, this talk covers the full lifecycle of modern AI agents. If you’re building agents (or planning to), this is packed with practical insights you won’t want to miss.

⏱️ Timestamps:
00:00 – Welcome & overview of the event
02:01 – Agenda: building → evaluating → deploying agents
03:31 – Evaluation + optimization loop explained
06:52 – Eval framework: build, test, analyze
08:06 – Generating test data with simulations
09:50 – LLM-as-a-judge & evaluation metrics
13:08 – Context engineering fundamentals
15:21 – From demo agents → production challenges
18:51 – Model improvements & tradeoffs
19:55 – Tool hardening for reliability
22:35 – Context compaction (reduce tokens & cost)
25:05 – Multi-agent design & functional isolation
29:26 – Handling failures with circuit breakers
32:26 – Arize: observability & continuous improvement
45:29 – Demo: evaluating & improving agents
01:09:31 – Running agents in production (Google Cloud)
01:22:09 – Why governance & monitoring matter
01:31:03 – Tracing, debugging & observability deep dive
01:43:56 – Final takeaways & closing

#AI #LLM #Agents #MachineLearning #MLOps #GenAI #GoogleCloud #ArizeAI