Google just released šš²šŗš¶š»š¶ ššŗšÆš²š±š±š¶š»š“ š®, its first fully multimodal embedding model - now also available in Weaviate. The model maps text, images, videos, audio, and PDFs into a šš¶š»š“š¹š² šš»š¶š³š¶š²š± š²šŗšÆš²š±š±š¶š»š“ šš½š®š°š². This means you can query with text and retrieve relevant videos, search with an image and find related documents, or use any other combination - all with the same model.

In this video, I walk through building a šŗšš¹šš¶šŗš¼š±š®š¹ š£šš š„šš š½š¶š½š²š¹š¶š»š². We embed each PDF page as an image using Gemini Embedding 2, add it to Weaviate, then query with text to retrieve the relevant PDF page images. These images are passed to Gemini Flash, which generates answers using the document context. The dataset has "needles" hidden in the documents, so when we ask "what's the secret flower?", the pipeline needs multimodal understanding of both text and images to answer correctly. A minimal code sketch of the pipeline is included at the end of this description.

Check out the model release blog: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-embedding-2/
PDF RAG notebook: https://github.com/weaviate/recipes/blob/main/weaviate-features/model-providers/google/multimodal_pdf_rag_gemini.ipynb

00:00 - Intro
00:31 - Multimodal embedding models
01:31 - Google's Gemini Embedding 2
02:12 - PDF RAG architecture overview
03:00 - Building a multimodal PDF RAG pipeline
04:08 - Conclusion

ā¬ā¬ā¬ā¬ā¬ā¬ā¬ā¬ā¬ā¬ā¬ā¬ CONNECT WITH US ā¬ā¬ā¬ā¬ā¬ā¬ā¬ā¬ā¬ā¬ā¬ā¬
- Visit http://weaviate.io/
- Star us on GitHub: https://github.com/weaviate/weaviate
- Stay updated and subscribe to our newsletter: https://newsletter.weaviate.io/
- Try out Weaviate Cloud for free here: https://console.weaviate.cloud/

Got a question?
- Forum: https://forum.weaviate.io/

Connect with us on
- Twitter: https://twitter.com/weaviate_io
- LinkedIn: https://www.linkedin.com/company/weaviate-io/
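
Minimal sketch of the pipeline, for reference. This is an illustration under assumptions, not the notebook's exact code: `gemini_embed` is a hypothetical placeholder for the Gemini Embedding 2 call, the `PdfPage` collection name and its properties are illustrative, and it uses the Weaviate Python client v4 with client-side ("bring your own") vectors. See the linked notebook for the working implementation.

```python
import base64
import io

import weaviate
from weaviate.classes.config import Configure, DataType, Property
from pdf2image import convert_from_path  # renders PDF pages to PIL images (needs poppler)


def gemini_embed(content) -> list[float]:
    # Hypothetical placeholder: call Gemini Embedding 2 with an image or a
    # text query and return a vector from its single unified embedding space.
    # The real API call is shown in the linked notebook.
    raise NotImplementedError("replace with the actual Gemini Embedding 2 call")


client = weaviate.connect_to_local()

# Bring-your-own-vectors collection: one object per PDF page image.
pages = client.collections.create(
    name="PdfPage",
    vectorizer_config=Configure.Vectorizer.none(),
    properties=[
        Property(name="source", data_type=DataType.TEXT),
        Property(name="page_number", data_type=DataType.INT),
        Property(name="image", data_type=DataType.BLOB),
    ],
)

# 1) Embed each PDF page as an image and add it to Weaviate.
for i, page in enumerate(convert_from_path("document.pdf")):
    buf = io.BytesIO()
    page.save(buf, format="PNG")
    img_b64 = base64.b64encode(buf.getvalue()).decode()
    pages.data.insert(
        properties={"source": "document.pdf", "page_number": i, "image": img_b64},
        vector=gemini_embed(page),  # image and text share one embedding space
    )

# 2) Query with text and retrieve the most relevant page images.
hits = pages.query.near_vector(
    near_vector=gemini_embed("what's the secret flower?"),
    limit=3,
)
for obj in hits.objects:
    print(obj.properties["source"], obj.properties["page_number"])

# 3) The retrieved page images are then passed to Gemini Flash to generate
#    an answer from the document context (generation step omitted here).

client.close()
```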