This video builds a customer support LLM application and adds observability using MLflow with FastAPI and Streamlit. It outlines an architecture with a backend agent, a FastAPI layer, a Streamlit frontend, middleware for request/response logging, and a separate notebook-based evaluation workflow that keeps evaluation out of the production request path. The project is scaffolded as a Python package, key dependencies are installed, and a support agent with an FAQ lookup tool is implemented. Prompts are registered, versioned, and loaded from the MLflow prompt registry, and FastAPI startup configures MLflow tracking against a SQLite store. A custom middleware logs metrics such as status code, latency, endpoint, method, and error flags into MLflow. Finally, an evaluation dataset is created and MLflow GenAI evaluation is run with LLM-judge scorers (correctness and guidelines), showing how traces and results surface hallucinations and quality issues.

GitHub repo: https://github.com/kokchun/youtube_demos/tree/main/fastapi_mlflow_pydanticai

#mlflow #pydanticai #fastapi

Chapters:
00:00 LLMOps Overview
00:32 Architecture Diagram
02:14 Middleware Logging
03:16 Evaluation Workflow
04:43 Project Setup with uv
06:13 Scaffold Folders and Files
07:48 Install Dependencies
08:50 Constants and Models
09:53 Build Support Agent
12:44 FastAPI Endpoints
14:37 Run and Test API
16:01 MLflow Prompt Registry
17:30 Register Prompts
20:32 View Prompts in UI
21:20 Load Prompts in Agent
22:51 FastAPI Lifespan Setup
27:36 Fix Tracking Import Order
28:40 Prompt Versioning Iteration
30:06 Monitoring in MLflow UI
30:40 Comparing Bot Versions
31:04 Debugging Traces and Rate Limits
32:57 Building MLflow Logging Middleware
38:45 Running Services and Testing Requests
40:44 Registering Middleware and Viewing Metrics
44:17 Quick Streamlit Frontend
48:12 Pulling Traces for Monitoring
49:58 Creating an Evaluation Dataset
52:59 LLM Judge Scorers and Evaluate
58:45 Fixing Judge Model and Reviewing Results
01:02:02 Wrap Up and Next Steps