How does an AI find the right answer from thousands of documents without reading all of them? Embeddings. And we're building one from scratch.

Sign up for my FREE weekly newsletter, where I spill my unfiltered thoughts on the latest AI news, cool research, and projects I'm building: https://www.onchainaigarage.com/

🦐 Follow Tonbi on X for real-time AI x blockchain updates! https://x.com/tonbistudio

Episode 4 of the ML Engineering series covers embeddings: the vectors of numbers that represent meaning and power every semantic search and RAG system. We break down what embeddings are, how cosine similarity measures closeness, why you need specialized embedding models, and how vector databases like FAISS make search instant at scale. Then in the work section, we build a complete semantic search engine over 1,154 One Piece episode synopses and wire it into a full RAG pipeline with the Qwen 2.5 3B model, comparing results with and without retrieval.

• Understand embeddings from concept to code: vectors, mean pooling, cosine similarity, and why similar meaning produces nearby vectors.
• Build a working semantic search engine with FAISS that finds relevant episodes by meaning, not keywords, even from vague queries like "the big whale at the entrance of the Grand Line."
• See RAG in action: without retrieval the LLM hallucinates wrong answers; with it, the model pulls correct episodes from your vector database.

Scampi & Tonbi are a human-AI duo building onchain projects in public. Tonbi brings taste, judgment, and domain expertise. Scampi brings tireless research, coding, and shrimp energy.
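Want a preview before watching? Semantic search boils down to embedding each document once, embedding the query, and ranking by cosine similarity. Here's a minimal stdlib-only sketch with hand-made toy vectors (my illustration, not code from the video): in the episode, the real 384-dimensional embeddings come from all-MiniLM-L6-v2 via sentence-transformers, and FAISS does the ranking at scale.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for model output; with a real embedding
# model, sentences with similar meaning land on nearby vectors.
episodes = {
    "Laboon the whale blocks the Grand Line entrance": [0.9, 0.1, 0.2],
    "Luffy competes in the Davy Back Fight games":     [0.1, 0.8, 0.3],
    "The crew reaches a winter island":                [0.2, 0.3, 0.9],
}
query = [0.8, 0.2, 0.1]  # e.g. "the big whale at the entrance of the Grand Line"

# Brute-force nearest neighbor: FAISS performs this same ranking, just fast at scale.
best = max(episodes, key=lambda title: cosine_similarity(query, episodes[title]))
print(best)  # the Laboon episode ranks highest
```

The brute-force loop is fine for a handful of documents; the reason the video reaches for FAISS is that comparing one query against 1,154 (or millions of) vectors needs an index, not a linear scan.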
🦐 Tonbi: https://x.com/tonbistudio
💻 Tonbi's GitHub: https://github.com/tonbistudio
Portfolio: https://www.tonbistudio.com

Resources:
Sentence Transformers Library: https://www.sbert.net/
FAISS (Facebook AI Similarity Search): https://github.com/facebookresearch/faiss
Hugging Face Hub: https://huggingface.co/
all-MiniLM-L6-v2 Model: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

Timestamps:
0:00 - Intro: Embeddings and what we're building
3:03 - Vector arithmetic: King - Man + Woman = Queen
4:58 - Pooling and cosine similarity explained
8:06 - Embedding models and vector databases (FAISS)
10:42 - What is RAG and why not just fine-tune?
14:29 - Work: Embedding 1,154 One Piece episodes
17:30 - Semantic search demo: keyword vs. meaning-based
20:49 - Full RAG pipeline with Qwen 2.5 3B
21:53 - Without RAG vs. with RAG: hallucination vs. correct answers

Coming Next: Next episode we dive into fine-tuning: baking knowledge directly into the model's weights. This is a big one with a lot to cover, and potentially a spinoff series if there's interest!

Have you built a RAG pipeline or semantic search engine? What embedding model are you using? Drop your setup in the comments! If this helped you understand embeddings, please like, subscribe, and hit the bell for more ML engineering builds! 🦐✨

#Embeddings #RAG #SemanticSearch #FAISS #MachineLearning #MLEngineering #Transformers #ClaudeCode #VectorDatabase #LLM #OnePiece #AIagents