
AI Agents 8 - Evaluation, Cost and Scalability
Prof. Ghassemi Lectures and Tutorials
The source is an extensive playbook for training world-class Large Language Models (LLMs), using the development of SmolLM3, a 3B multilingual reasoning model, as a case study. It covers the full lifecycle of model creation, beginning with the question of why one should train a custom model rather than use an existing one. It then gives systematic guidance on architectural decisions (such as attention mechanisms, positional encoding, and MoE configurations), on data curation and mixture design, and on selecting training hyperparameters using scaling laws and ablations. Finally, it covers post-training, including supervised fine-tuning and preference optimization, and shares insights into managing the infrastructure required for multi-trillion-token training runs. https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook #llm #largelanguagemodels #training #playbook #ai #learn #huggingface
Category: YouTube - AI & Machine Learning
Feed: YouTube - AI & Machine Learning
Featured Date: October 31, 2025
Quality Rank: #2