This is the final video in the AI Backend Workflows series. You will learn how to add observability and automated quality evaluations to your AI application in Python — so you always know when something is going wrong before your users tell you.

We cover:
→ Langfuse tracing — auto-log every LLM call with one decorator
→ Golden dataset — build a Q&A test suite for your system
→ DeepEval — run automated quality metrics before every deploy
→ pytest + CI/CD — block deploys when quality drops
→ Cost monitoring — alert when token spend exceeds your budget

By the end you will have automatic tracing, a passing eval suite, and CI/CD integration that treats AI quality like code quality.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WHAT YOU WILL LEARN
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✔ Why AI systems degrade silently (and what to do about it)
✔ What observability means in an AI context
✔ What a trace and a span are
✔ What a golden dataset is and how to build one
✔ What CI/CD means and how to integrate evals into it
✔ What P95 latency means and why it matters
✔ What pytest parametrize does
✔ What Prometheus and Grafana are
✔ Common errors and exactly how to fix them

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
NEW WORDS EXPLAINED IN THIS VIDEO
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Observability, Trace, Span, Golden dataset, Evals (evaluations), CI/CD, Baseline score, AnswerRelevancyMetric, GEval, Threshold, P95 latency, Tail latency, Regression, pytest parametrize, Prometheus, Grafana, @observe() decorator, langfuse_context

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PREREQUISITES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
→ Video 00 — Backend AI Workflows — https://youtu.be/atrWZPghDhg?si=Gn5Bqkd_Xse81rEn
→ Video 01 — Basic LLM API Call — https://youtu.be/CSqOC2aKTx4?si=C0J2C0F5t2WQ2wke
→ Video 02 — Streaming — https://youtu.be/RtcjFueBB3M?si=sP1sqaX1HPc6y8OH
→ Video 03 — Data Ingestion — https://youtu.be/QJLRJC0KwzQ
→ Video 04 — RAG — https://youtu.be/n8teHOv1OWA
→ Video 05 — Caching — https://youtu.be/eydzBBOvPj0
→ Video 06 — Tool Calling (required — agents build directly on this) — https://youtu.be/uE3aD9U-sKY
→ Video 07 — Agentic Loop — https://youtu.be/yohIVCINx7A
→ Video 08 — Async Job Queue — https://youtu.be/YN3HASb7kmU
→ Video 09 — Guardrails — https://youtu.be/aHAA01VAzxA

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TIMESTAMPS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
0:00 — Why AI degrades silently — the core problem
1:30 — The observability + evals workflow
3:30 — Key concepts — tracing, golden dataset, cost monitoring
5:30 — Project setup — Langfuse free account
6:30 — traced_llm.py — one decorator, full tracing
8:30 — golden_dataset.py — building your test suite
10:00 — run_evals.py — measuring quality with DeepEval
12:00 — test_evals.py — blocking deploys on regression
14:00 — main.py — production FastAPI with tracing
15:30 — Common errors and fixes
17:30 — Key takeaways + series wrap-up

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOOLS USED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Langfuse — https://langfuse.com
DeepEval — https://docs.confident-ai.com
FastAPI — https://fastapi.tiangolo.com
Anthropic — https://console.anthropic.com

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
COMPLETE SERIES — AI Backend Workflows
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
01 — Basic LLM API Call
02 — Streaming
03 — Data Ingestion
04 — RAG
05 — Caching
06 — Tool Calling
07 — Agentic Loop
08 — Async Job Queue
09 — Guardrails
10 — Observability + Evals ← you are here (final video)

#Python #LLMObservability #Langfuse #DeepEval #AIBackend #Evals #CI #MachineLearning #FastAPI #Anthropic #Tutorial
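BONUS — a taste of the golden-dataset idea (8:30). This is a minimal, hypothetical sketch only: the questions, keywords, and pass/fail check are placeholders, not the video's actual golden_dataset.py code, which uses DeepEval metrics instead of keyword matching.

```python
# Minimal golden-dataset sketch. Every question and keyword here is
# a made-up placeholder; the video's version scores answers with
# DeepEval metrics (e.g. AnswerRelevancyMetric) rather than keywords.
GOLDEN_DATASET = [
    {
        "question": "What is a trace?",
        "expected_keywords": ["request", "span"],
    },
    {
        "question": "What is P95 latency?",
        "expected_keywords": ["95", "latency"],
    },
]

def passes(answer: str, case: dict) -> bool:
    # Crude keyword-containment check, purely illustrative.
    text = answer.lower()
    return all(kw in text for kw in case["expected_keywords"])

print(passes("A trace groups every span for one request.", GOLDEN_DATASET[0]))  # → True
```

The value of a golden dataset is that it is versioned alongside your code, so any prompt or model change can be re-scored against the same fixed Q&A pairs.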
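BONUS — what "P95 latency" means, in code. A small stdlib-only illustration using the nearest-rank percentile method; the latency samples are made up:

```python
import math

def percentile(samples, p):
    # Nearest-rank method: the ceil(p% * n)-th smallest sample.
    ordered = sorted(samples)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[k]

# 20 hypothetical request latencies (ms): mostly fast, with a slow tail.
latencies_ms = [100] * 10 + [200] * 5 + [300, 350, 400, 900, 2500]

print(percentile(latencies_ms, 50))  # → 100 (the typical request feels fast)
print(percentile(latencies_ms, 95))  # → 900 (1 in 20 users waits ~a second)
```

This is why averages mislead: the mean here is dragged around by one 2500 ms outlier, while P95 tells you what your slowest real users actually experience.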
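BONUS — the cost-monitoring idea in miniature. The per-token prices and budget below are assumptions for illustration only (check your provider's current pricing page); the point is the shape of the check, not the numbers:

```python
# Cost-monitoring sketch: flag when estimated token spend passes a
# budget. Prices and budget are ASSUMED values, not real pricing.
PRICE_PER_1K_INPUT = 0.003   # USD per 1K input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.015  # USD per 1K output tokens (assumed)
DAILY_BUDGET_USD = 5.00      # alert threshold (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    # Estimated USD cost of one LLM call from its token counts.
    return (input_tokens / 1000 * PRICE_PER_1K_INPUT
            + output_tokens / 1000 * PRICE_PER_1K_OUTPUT)

# 800 hypothetical requests of ~1200 input / 400 output tokens each.
spend = sum(request_cost(1200, 400) for _ in range(800))
if spend > DAILY_BUDGET_USD:
    print(f"ALERT: spend ${spend:.2f} exceeds budget ${DAILY_BUDGET_USD:.2f}")
```

In production you would feed real token counts from your traces (Langfuse records them per call) into a metric and alert from Prometheus/Grafana instead of a print statement.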