Join us for a practical webinar on LLM evaluation frameworks and strategies for measuring the quality, reliability, and performance of AI applications, including chatbots, AI agents, and RAG systems.

💡 What we'll cover:
• Hallucinations, prompt sensitivity, and hidden failure modes
• Human evaluation vs. automated evaluation
• Benchmark testing and regression workflows
• Evaluating chatbots, AI agents, summarization, and RAG systems
• Introduction to RAGAS and key LLM evaluation metrics
• Measuring faithfulness, relevance, groundedness, and latency
• Monitoring LLM applications in production

🛠 Hands-on exercise included: Participants will evaluate a small LLM/RAG assistant using structured rubrics and compare human evaluation with automated RAGAS scores, along the lines of the sketch below.

Perfect for AI engineers, developers, data scientists, and technical leaders working with LLM applications and AI systems.
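To give a feel for the hands-on portion, here is a minimal sketch of scoring a few question–answer–context samples with RAGAS and placing the automated scores next to human rubric ratings. It assumes the ragas 0.1-style `evaluate` API, the Hugging Face `datasets` package, and an OpenAI API key configured for the judge model; the sample data and human scores are purely illustrative and are not part of the webinar materials.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Illustrative outputs from a small RAG assistant (hypothetical data).
samples = {
    "question": [
        "What does the warranty cover?",
        "How do I reset my password?",
    ],
    "answer": [
        "The warranty covers manufacturing defects for 12 months.",
        "Use the 'Forgot password' link on the login page.",
    ],
    "contexts": [
        ["The warranty covers manufacturing defects for a period of 12 months."],
        ["Password resets are initiated via the 'Forgot password' link."],
    ],
}

# Human rubric ratings collected during the exercise (hypothetical, 1-5 scale).
human_scores = [5, 4]

# RAGAS uses an LLM judge (OpenAI by default) to compute each metric per sample.
result = evaluate(Dataset.from_dict(samples), metrics=[faithfulness, answer_relevancy])

# Put automated metric scores and human ratings side by side for comparison.
df = result.to_pandas()
df["human_rubric"] = human_scores
print(df[["question", "faithfulness", "answer_relevancy", "human_rubric"]])
```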