In this video we move beyond text embeddings and explore multimodal AI in .NET. We'll:

- Understand how CLIP maps images and text into the same vector space
- Run image embeddings locally using ONNX Runtime
- Build a simple image search scenario
- Connect it to a lightweight RAG-style chat
- Discuss the journey from local experimentation to Azure AI scale

Running locally helps you understand embeddings deeply. From there, moving to Azure AI Search or Azure OpenAI becomes a natural next step. Three short, hedged code sketches after the chapter list illustrate the embedding, search, and chat steps.

🔗 Resources

📦 Code & Samples
- Main Repo: https://github.com/elbruno/elbruno.localembeddings
- Image Samples: https://github.com/elbruno/elbruno.localembeddings/blob/main/samples/README_IMAGES.md
- NuGet Package: https://www.nuget.org/packages/ElBruno.LocalEmbeddings.ImageEmbeddings

📚 Model & Technology References
- OpenAI CLIP research: https://openai.com/research/clip
- CLIP Paper: https://arxiv.org/abs/2103.00020
- Model on Hugging Face: https://huggingface.co/openai/clip-vit-base-patch32
- ONNX: https://onnx.ai
- ONNX Runtime: https://github.com/microsoft/onnxruntime

🚀 Next Experiments
- Combine text + image embeddings
- Store vectors in Azure AI Search
- Compare CLIP vs Azure Vision embeddings
- Integrate with Microsoft Agent Framework

If you're building AI in .NET, my advice is: start local, understand the fundamentals, then scale in Azure.

⏱️ Chapters
00:00 Introduction – From Text to Image Embeddings
01:10 Why Multimodal Matters
02:40 What is CLIP?
04:30 What is ONNX & ONNX Runtime?
06:20 Downloading the Model
07:40 Walking Through the Code
10:30 Running Image Search
13:20 RAG Chat Demo
16:00 Language Limitations
18:20 From Local to Azure
20:00 Reflections & Next Steps
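🧪 Code Sketches

To give a feel for what the code walkthrough covers, here is a minimal sketch of producing a CLIP image embedding with ONNX Runtime in C#. It assumes a CLIP vision encoder exported to ONNX (such as clip-vit-base-patch32) and uses ImageSharp for pixel access; the real implementation lives in the repo linked above, and details like the exact export, output shape, and preprocessing are assumptions here. The per-channel mean/std values are CLIP's published normalization constants.

```csharp
// Sketch: embed one image with a CLIP vision encoder exported to ONNX.
// Input/output names are read from the model instead of hardcoded, since
// they vary between exports (often "pixel_values" / "image_embeds").
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;

static float[] EmbedImage(InferenceSession session, string imagePath)
{
    // CLIP preprocessing: resize to 224x224, scale pixels to [0,1],
    // then normalize each channel with CLIP's mean/std constants.
    float[] mean = { 0.48145466f, 0.4578275f, 0.40821073f };
    float[] std  = { 0.26862954f, 0.26130258f, 0.27577711f };

    using var image = Image.Load<Rgb24>(imagePath);
    image.Mutate(x => x.Resize(224, 224));

    var input = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
    for (int y = 0; y < 224; y++)
        for (int x = 0; x < 224; x++)
        {
            var p = image[x, y];
            input[0, 0, y, x] = (p.R / 255f - mean[0]) / std[0];
            input[0, 1, y, x] = (p.G / 255f - mean[1]) / std[1];
            input[0, 2, y, x] = (p.B / 255f - mean[2]) / std[2];
        }

    var inputName = session.InputMetadata.Keys.First();
    using var results = session.Run(new[]
    {
        NamedOnnxValue.CreateFromTensor(inputName, input)
    });

    // Take the first output as the embedding vector (export-dependent).
    return results.First().AsEnumerable<float>().ToArray();
}
```

Usage is one session per model, reused across images: `var session = new InferenceSession("clip-vision.onnx"); var vector = EmbedImage(session, "cat.jpg");`.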
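Once every image has a vector, the search step is just a nearest-neighbor scan using cosine similarity. A rough sketch (the repo may structure this differently; the `index` dictionary maps image paths to the vectors produced above):

```csharp
// Sketch: brute-force image search over precomputed embeddings.
using System;
using System.Collections.Generic;
using System.Linq;

static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
}

// Rank every indexed image against the query and keep the best matches.
static IEnumerable<(string Path, float Score)> Search(
    Dictionary<string, float[]> index, float[] query, int top = 5) =>
    index.Select(kv => (kv.Key, CosineSimilarity(kv.Value, query)))
         .OrderByDescending(r => r.Item2)
         .Take(top);
```

Because CLIP places text and images in the same vector space, the query vector can come from the text encoder just as easily as from another image, which is what makes "describe what you want" photo search work.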
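For the RAG-style chat step, the idea is to embed the user's question, run it through the search above, and hand the top matches to a chat model as grounding context. A hedged sketch assuming the Microsoft.Extensions.AI `IChatClient` abstraction; the video may wire the chat differently, and the context format here is made up for illustration:

```csharp
// Sketch: answer a question grounded in the top image search results.
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

static async Task<string> AskAboutImages(
    IChatClient chat, IEnumerable<(string Path, float Score)> hits, string question)
{
    // Inject the matched image paths and scores as system-prompt context.
    var context = string.Join("\n", hits.Select(h => $"- {h.Path} (score {h.Score:F2})"));
    var response = await chat.GetResponseAsync(new[]
    {
        new ChatMessage(ChatRole.System,
            $"Answer using only these matched images:\n{context}"),
        new ChatMessage(ChatRole.User, question)
    });
    return response.Text;
}
```

The same pattern scales up naturally: swap the in-memory dictionary for Azure AI Search and the local chat client for an Azure OpenAI deployment, and the rest of the flow stays the same.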