[Music]
Hi, I'm Alice.
I'm Lucas. We're product managers at
Google DeepMind.
And today we are incredibly excited to introduce EmbeddingGemma, our state-of-the-art embedding model designed for mobile-first AI. EmbeddingGemma is a 308-million-parameter text embedding model designed to power generative AI experiences directly on your hardware.
Embeddings are numerical representations of data. This model transforms text like messages, emails, or notes into a vector of numbers that represents meaning in a high-dimensional space, which a generative model can then use for downstream tasks.
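To make "a vector of numbers that represents meaning" concrete, here is a toy sketch with made-up 4-dimensional vectors (real EmbeddingGemma embeddings have 768 dimensions, and would come from the model rather than a hand-written table). Cosine similarity is the standard way to compare such vectors: semantically related texts score close to 1.0.

```python
import math

# Made-up 4-dimensional vectors for illustration only; a real embedding
# model like EmbeddingGemma produces 768-dimensional vectors.
embeddings = {
    "fix the damaged floorboards": [0.9, 0.1, 0.3, 0.0],
    "call a carpenter for repairs": [0.8, 0.2, 0.4, 0.1],
    "recipe for chocolate cake":    [0.0, 0.9, 0.1, 0.8],
}

def cosine_similarity(a, b):
    """Similarity of two vectors: near 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = embeddings["fix the damaged floorboards"]
for text, vec in embeddings.items():
    print(f"{cosine_similarity(query, vec):.2f}  {text}")
```

In this sketch the floorboard query scores much higher against the carpenter note than against the cake recipe, which is exactly the property downstream search and retrieval rely on.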
EmbeddingGemma is small, fast, and efficient. Thanks to quantization-aware training, you can run the model with less than 200 megabytes of RAM while preserving state-of-the-art quality.
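Quantization-aware training happens inside the model's training process, so there is nothing to quantize yourself; but the memory arithmetic behind the RAM savings can be illustrated with a simple post-hoc int8 round trip (a hedged sketch of the general idea, not the model's actual scheme): each value stored in 1 byte instead of 4, at the cost of a small rounding error.

```python
def quantize_int8(vec):
    """Map floats into the int8 range [-127, 127] with a single scale factor.
    Illustrative only: real quantization-aware training bakes this into
    the model's weights during training."""
    scale = max(abs(x) for x in vec) / 127 or 1.0
    return [round(x / scale) for x in vec], scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized values."""
    return [x * scale for x in q]

vec = [0.5, -1.0, 0.25, 0.125]
q, scale = quantize_int8(vec)
approx = dequantize(q, scale)
# 768 float32 values take 768 * 4 = 3072 bytes; 768 int8 values take 768 bytes.
print(max(abs(a - b) for a, b in zip(vec, approx)))  # small rounding error
```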
It generates embeddings of 768 dimensions, but thanks to Matryoshka Representation Learning (MRL), you can customize the model's output dimensions and go down to 128. Based on the same technology and research that powers our Gemini embedding models, EmbeddingGemma brings that state-of-the-art capability in a smaller, more lightweight model.
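The practical upshot of MRL is that you can simply keep a prefix of the full vector. A minimal sketch of that truncation step (the 768 values here are synthetic stand-ins for a real embedding):

```python
import math

def truncate_embedding(vec, dim):
    """Matryoshka-style shortening: keep the first `dim` values, then
    re-normalize to unit length so cosine similarity still behaves."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Synthetic stand-in for a 768-dimensional embedding from the model.
full = [math.sin(i) for i in range(768)]
small = truncate_embedding(full, 128)
print(len(small))  # 128
```

Smaller vectors mean a smaller index and faster similarity search, which is the usual reason to trade 768 dimensions down to 128 on constrained hardware.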
Think high-quality semantic search, fast and relevant information retrieval, or customized classification and clustering, just to name a few opportunities. EmbeddingGemma achieves the best score on the comprehensive Massive Text Embedding Benchmark (MTEB), the gold standard for text embedding evaluation, among models under 500 million parameters. Trained across 100-plus languages, EmbeddingGemma brings proven performance to instantly connect with diverse and global audiences. We've engineered EmbeddingGemma specifically for on-device performance, ensuring efficient computation and a minimal memory footprint even on resource-constrained hardware. EmbeddingGemma facilitates on-device embedding of local documents, so sensitive user data never leaves the device. And because it works offline, frontier search and retrieval features work regardless of connectivity. Together with our generative models like Gemma 3n, you can build powerful mobile-first generative AI experiences and efficient retrieval-augmented generation (RAG) pipelines. This means your applications can now leverage user context from data to provide more personalized and helpful responses, such as understanding that you need your carpenter's number for help with damaged floorboards.
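The retrieval half of such a RAG pipeline can be sketched as follows. The `embed` function here is a crude bag-of-letters placeholder, not EmbeddingGemma, but the retrieval step has the same shape either way: embed the query, score every stored note by similarity, and hand the best matches to a generative model as context.

```python
import math

def embed(text):
    """Hypothetical stand-in for an embedding model: a normalized
    bag-of-letters vector. In a real pipeline this would be the model's
    768-dimensional embedding of `text`."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(query, documents, top_k=1):
    """Core retrieval step of a RAG pipeline: rank documents by cosine
    similarity to the query (dot product of unit vectors)."""
    q = embed(query)
    scored = sorted(
        documents,
        key=lambda d: sum(a * b for a, b in zip(q, embed(d))),
        reverse=True,
    )
    return scored[:top_k]

notes = [
    "Carpenter: Sam, 555-0172, fixed the deck last spring",
    "Dentist appointment on Tuesday at 3pm",
    "Grocery list: milk, eggs, flour",
]
context = retrieve("who can repair my damaged floorboards", notes)
print(context[0])
```

Even this crude similarity function surfaces the carpenter note first; with real embeddings the match is semantic rather than letter-based, which is what lets a query about floorboards find a note that never uses the word "floorboards" at all.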
Here's an example of what EmbeddingGemma can power. What you're seeing is how a user can use EmbeddingGemma to query previously opened articles or other web pages. The model embeds each page as it's opened, in real time. Then, with a browser extension that uses EmbeddingGemma, the user can ask a question to retrieve the contextually relevant articles. And because the embeddings are created on-device, all of this happens without leaving the user's hardware.
And it's designed with customization in mind. Fine-tune EmbeddingGemma for your domain or a particular language. It works across popular tools and platforms such as Hugging Face and Kaggle. Check out our notebook examples, part of the Gemma Cookbook, to get started. Our next generation of on-device embedding models is here, and it's open for everyone. It's small, fast, and efficient. Download EmbeddingGemma and get started building right now.
You can find links in the description below. We can't wait to see what EmbeddingGemma unlocks for you.
[Music]
Discover EmbeddingGemma, a state-of-the-art 308-million-parameter text embedding model designed to power generative AI experiences directly on your hardware. Ideal for mobile-first AI, EmbeddingGemma brings powerful capabilities to your applications, enabling features like semantic search, information retrieval, and custom classification, all while running efficiently on-device. In this video, Alice Lisak and Lucas Gonzalez from the Gemma team introduce EmbeddingGemma and explain how it works. Learn how you can run this model on less than 200MB of RAM with quantization, customize its output dimensions with Matryoshka Representation Learning (MRL), and build powerful offline AI features.

Resources:
Learn about EmbeddingGemma → https://developers.googleblog.com/en/introducing-embeddinggemma
EmbeddingGemma documentation → https://ai.google.dev/gemma/docs/embeddinggemma
Gemma Cookbook → https://github.com/google-gemini/gemma-cookbook
Quickstart RAG notebook → https://github.com/google-gemini/gemma-cookbook/blob/main/Gemma/%5BGemma_3%5DRAG_with_EmbeddingGemma.ipynb
Discover Gemma models → https://deepmind.google/models/gemma

Chapters:
0:00 - Intro
0:26 - Model overview
1:18 - Model features
2:29 - RAG
2:54 - Website embedding demo
3:23 - Tools and platforms
3:41 - Conclusion

Subscribe to Google for Developers → https://goo.gle/developers

Speakers: Alice Lisak, Lucas Gonzalez
Products Mentioned: Google AI, Gemma, Generative AI