This is the final video in the AI Backend Workflows series. You will learn how to add observability and automated quality evaluations to your AI application in Python — so you always know when something is going wrong before your users tell you.

We cover:
→ Langfuse tracing — auto-log every LLM call with one decorator
→ Golden dataset — build a Q&A test suite for your system
→ DeepEval — run automated quality metrics before every deploy
→ pytest + CI/CD — block deploys when quality drops
→ Cost monitoring — alert when token spend exceeds your budget

By the end you will have automatic tracing, a passing eval suite, and CI/CD integration that treats AI quality like code quality.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WHAT YOU WILL LEARN
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✔ Why AI systems degrade silently (and what to do about it)
✔ What observability means in an AI context
✔ What a trace and a span are
✔ What a golden dataset is and how to build one
✔ What CI/CD means and how to integrate evals into it
✔ What P95 latency means and why it matters
✔ What pytest parametrize does
✔ What Prometheus and Grafana are
✔ Common errors and exactly how to fix them

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
NEW WORDS EXPLAINED IN THIS VIDEO
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Observability, Trace, Span, Golden dataset, Evals (evaluations), CI/CD, Baseline score, AnswerRelevancyMetric, GEval, Threshold, P95 latency, Tail latency, Regression, pytest parametrize, Prometheus, Grafana, @observe() decorator, langfuse_context

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PREREQUISITES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
→ Video 00 — Backend AI Workflows — https://youtu.be/atrWZPghDhg?si=Gn5Bqkd_Xse81rEn
→ Video 01 — Basic LLM API Call — https://youtu.be/CSqOC2aKTx4?si=C0J2C0F5t2WQ2wke
→ Video 02 — Streaming — https://youtu.be/RtcjFueBB3M?si=sP1sqaX1HPc6y8OH
→ Video 03 — Data Ingestion — https://youtu.be/QJLRJC0KwzQ
→ Video 04 — RAG — https://youtu.be/n8teHOv1OWA
→ Video 05 — Caching — https://youtu.be/eydzBBOvPj0
→ Video 06 — Tool Calling (required — agents build directly on this) — https://youtu.be/uE3aD9U-sKY
→ Video 07 — Agentic Loop — https://youtu.be/yohIVCINx7A
→ Video 08 — Async Job Queue — https://youtu.be/YN3HASb7kmU
→ Video 09 — Guardrails — https://youtu.be/aHAA01VAzxA

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TIMESTAMPS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
0:00 — Why AI degrades silently — the core problem
1:30 — The observability + evals workflow
3:30 — Key concepts — tracing, golden dataset, cost monitoring
5:30 — Project setup — Langfuse free account
6:30 — traced_llm.py — one decorator, full tracing
8:30 — golden_dataset.py — building your test suite
10:00 — run_evals.py — measuring quality with DeepEval
12:00 — test_evals.py — blocking deploys on regression
14:00 — main.py — production FastAPI with tracing
15:30 — Common errors and fixes
17:30 — Key takeaways + series wrap-up

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOOLS USED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Langfuse — https://langfuse.com
DeepEval — https://docs.confident-ai.com
FastAPI — https://fastapi.tiangolo.com
Anthropic — https://console.anthropic.com

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
COMPLETE SERIES — AI Backend Workflows
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
01 — Basic LLM API Call
02 — Streaming
03 — Data Ingestion
04 — RAG
05 — Caching
06 — Tool Calling
07 — Agentic Loop
08 — Async Job Queue
09 — Guardrails
10 — Observability + Evals ← you are here (final video)

#Python #LLMObservability #Langfuse #DeepEval #AIBackend #Evals #CI #MachineLearning #FastAPI #Anthropic #Tutorial
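BONUS — a taste of the golden-dataset idea (8:30). This is a minimal, hypothetical sketch only: the questions, keywords, and pass/fail check are placeholders, not the video's actual golden_dataset.py code, which uses DeepEval metrics instead of keyword matching.

```python
# Minimal golden-dataset sketch. Every question and keyword here is
# a made-up placeholder; the video's version scores answers with
# DeepEval metrics (e.g. AnswerRelevancyMetric) rather than keywords.
GOLDEN_DATASET = [
    {
        "question": "What is a trace?",
        "expected_keywords": ["request", "span"],
    },
    {
        "question": "What is P95 latency?",
        "expected_keywords": ["95", "latency"],
    },
]

def passes(answer: str, case: dict) -> bool:
    # Crude keyword-containment check, purely illustrative.
    text = answer.lower()
    return all(kw in text for kw in case["expected_keywords"])

print(passes("A trace groups every span for one request.", GOLDEN_DATASET[0]))  # → True
```

The value of a golden dataset is that it is versioned alongside your code, so any prompt or model change can be re-scored against the same fixed Q&A pairs.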
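BONUS — what "P95 latency" means, in code. A small stdlib-only illustration using the nearest-rank percentile method; the latency samples are made up:

```python
import math

def percentile(samples, p):
    # Nearest-rank method: the ceil(p% * n)-th smallest sample.
    ordered = sorted(samples)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[k]

# 20 hypothetical request latencies (ms): mostly fast, with a slow tail.
latencies_ms = [100] * 10 + [200] * 5 + [300, 350, 400, 900, 2500]

print(percentile(latencies_ms, 50))  # → 100 (the typical request feels fast)
print(percentile(latencies_ms, 95))  # → 900 (1 in 20 users waits ~a second)
```

This is why averages mislead: the mean here is dragged around by one 2500 ms outlier, while P95 tells you what your slowest real users actually experience.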
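BONUS — the cost-monitoring idea in miniature. The per-token prices and budget below are assumptions for illustration only (check your provider's current pricing page); the point is the shape of the check, not the numbers:

```python
# Cost-monitoring sketch: flag when estimated token spend passes a
# budget. Prices and budget are ASSUMED values, not real pricing.
PRICE_PER_1K_INPUT = 0.003   # USD per 1K input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.015  # USD per 1K output tokens (assumed)
DAILY_BUDGET_USD = 5.00      # alert threshold (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    # Estimated USD cost of one LLM call from its token counts.
    return (input_tokens / 1000 * PRICE_PER_1K_INPUT
            + output_tokens / 1000 * PRICE_PER_1K_OUTPUT)

# 800 hypothetical requests of ~1200 input / 400 output tokens each.
spend = sum(request_cost(1200, 400) for _ in range(800))
if spend > DAILY_BUDGET_USD:
    print(f"ALERT: spend ${spend:.2f} exceeds budget ${DAILY_BUDGET_USD:.2f}")
```

In production you would feed real token counts from your traces (Langfuse records them per call) into a metric and alert from Prometheus/Grafana instead of a print statement.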