In this video, Jules Damji demonstrates how to implement systematic experiment tracking for LLM applications using MLflow. The session focuses on moving beyond ad-hoc testing by capturing structured metadata, comparing model configurations, and analyzing the cost-performance trade-offs of different Large Language Models. You will learn how to use mlflow.openai.autolog to automatically capture traces, token usage, and latency, as well as how to use the fluent APIs for custom telemetry.

Key Learning Objectives

🔹 Automated Instrumentation: Using MLflow autologging to capture model parameters, inputs/outputs, and latency without manual boilerplate.
🔹 Custom Metadata Logging: Using mlflow.log_text, log_metric, and log_param for data points not captured by autologging.
🔹 Cost Optimization: Leveraging MLflow 3.10+ features to calculate and compare token costs across different models (e.g., GPT-4o vs. GPT-4o-mini).
🔹 Systematic Experimentation: Organizing runs into hierarchical structures and using tags for programmatic searching and filtering.
🔹 Hyperparameter Impact: Analyzing how settings such as temperature affect output creativity, token count, and execution cost.

Next in the Series: The following video dives deeper into Observability and Tracing to inspect nested, multi-step agentic operations.
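The custom-telemetry pattern described above can be sketched with MLflow's fluent API. The experiment name, run name, and logged values below are illustrative placeholders, not taken from the video:

```python
import mlflow

# Hypothetical experiment name; runs are grouped under it in the MLflow UI.
mlflow.set_experiment("llm-prompt-experiments")

with mlflow.start_run(run_name="temperature-sweep") as run:
    # Values not captured by autologging can be logged explicitly.
    mlflow.log_param("temperature", 0.9)          # configuration under test
    mlflow.log_metric("output_tokens", 412)       # observed token count
    mlflow.log_metric("cost_usd", 0.00042)        # estimated run cost
    # Arbitrary text (e.g., the raw prompt) is stored as a text artifact.
    mlflow.log_text("Summarize MLflow tracing in one sentence.", "prompt.txt")
    # Tags make runs searchable and filterable later.
    mlflow.set_tag("model", "gpt-4o-mini")
```

Because these calls write to the active run, hand-logged parameters and metrics appear alongside any autologged traces for the same run.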
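To make the cost-comparison idea concrete, here is a minimal, MLflow-independent sketch of per-run cost arithmetic from token counts. The per-million-token prices are assumptions for illustration only and should be replaced with current published pricing:

```python
# Assumed per-1M-token prices in USD (illustrative, not authoritative).
PRICES = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one run from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Compare the same workload across both models.
usage = {"input_tokens": 1_200, "output_tokens": 400}
for model in PRICES:
    print(f"{model}: ${run_cost(model, **usage):.6f}")
```

Logging the resulting number as a metric (e.g., `cost_usd`) lets MLflow chart cost against quality across runs.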
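Tag-based searching works by passing a filter string to `mlflow.search_runs`. The small helper below is a hypothetical convenience, not an MLflow API; it builds such a filter string from a dictionary of tags:

```python
def build_filter(tags: dict) -> str:
    """Build an MLflow filter string (for mlflow.search_runs) from tags.

    Hypothetical helper for illustration; keys are sorted so the output
    is deterministic.
    """
    return " AND ".join(f"tags.{k} = '{v}'" for k, v in sorted(tags.items()))

print(build_filter({"model": "gpt-4o-mini", "task": "summarization"}))
# → tags.model = 'gpt-4o-mini' AND tags.task = 'summarization'
```

The resulting string would be used as `mlflow.search_runs(filter_string=...)` to pull back only the runs matching those tags for programmatic comparison.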