Are you tired of spending time on serving infrastructure when you just want to deploy an open model? What if you could go from model weights to a scalable, production-ready API quickly? In this new series, we are going to answer this question and show you how to serve open models on Vertex AI. So, let's get started. Hi everyone, my name is Ivan Nardini, and welcome to this first video in our developer's guide to serving open models on Vertex AI. In this series, we are going to provide a complete roadmap with practical code for every serving option on Vertex AI.
We will start with the simplest serverless APIs, cover self-deployed models, and go all the way to high-performance custom containers using vLLM, both on GPUs and TPUs, and even show you how to run benchmarks. But before we dive into the how, we need to understand the what and the why. Today's goal is to give you a clear decision framework, and by the end of this video, you will have a map to navigate all the serving options so you can confidently choose the right strategy for your project. So, you need to serve an open model. The first thing to ask is: how much control do you need versus how much simplicity do you want? This decision tree will be our guide for the entire series. We will walk through each path today at a high level, and in future videos we will dive deep into each of them with hands-on code. If your goal is maximizing simplicity, maybe for rapid prototyping, or if you just don't want to touch infrastructure, then the fully managed path is for you.
This is Vertex AI Model as a Service, or MaaS. We provide popular models as serverless, pay-as-you-go APIs. You just find the model, enable the API, and you get an endpoint you can start calling immediately. The trade-off is less control, but you get unbeatable speed to value. This will be the focus of our first hands-on video in this series.
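To give you a taste of how simple this path is, here is a minimal sketch of calling a MaaS model through its OpenAI-compatible endpoint with the openai Python SDK. The project ID, region, and model ID are placeholders; check Model Garden for the models currently offered as a service and the exact endpoint version.

```python
import google.auth
import google.auth.transport.requests
import openai

# Use Application Default Credentials to obtain an access token.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

PROJECT = "your-project-id"  # placeholder
REGION = "us-central1"       # placeholder

# MaaS models are exposed through an OpenAI-compatible chat completions API.
client = openai.OpenAI(
    base_url=(
        f"https://{REGION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT}"
        f"/locations/{REGION}/endpoints/openapi"
    ),
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct-maas",  # illustrative MaaS model ID
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Notice there is no infrastructure in that snippet at all: no machines, no containers, just an authenticated API call.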
Now, what if MaaS is a bit too restrictive? You want a balance between ease of use and flexibility. Then the one-click deployment path is your sweet spot. With Model Garden self-deploy models, you can choose from a huge list of curated open models, but the key difference is that you choose the hardware. This gives you direct control over performance and cost. So, it's a fantastic alternative, and we will dedicate a future episode to showing you exactly how to configure and deploy these models.
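As a quick preview of that episode, here is a rough sketch of what self-deployment looks like with the Vertex AI SDK's Model Garden module. This module is in preview, so the model ID and hardware settings below are illustrative; always check the current SDK documentation for the exact API surface.

```python
import vertexai
from vertexai import model_garden  # preview module; verify it exists in your SDK version

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

# Pick a self-deployable model from the Model Garden catalog (illustrative ID).
model = model_garden.OpenModel("meta/llama3_1@llama-3.1-8b-instruct")

# Inspect which machine and accelerator combinations the model supports.
for option in model.list_deploy_options():
    print(option)

# Deploy it yourself, choosing the hardware: this is the key difference from MaaS.
endpoint = model.deploy(
    machine_type="g2-standard-12",  # illustrative hardware choice
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
print(endpoint.resource_name)
```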
Finally, for those who need full control and performance, or who have just fine-tuned a model and want to deploy it, your path leads to container-based serving. Here we have two powerful options. First, using pre-built optimized containers that leverage backends like vLLM or SGLang. You get great performance without building the container yourself. The trade-off here is that the deployment parameters you can tweak are limited to what those containers expose.
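To make that concrete, here is a sketch of what deploying with a pre-built vLLM container looks like, following the pattern from the Model Garden deployment notebooks. The container image tag, routes, and vLLM flags below are illustrative assumptions; the arguments you pass here are exactly the deployment parameters the container exposes.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")  # placeholders

# Register the model with a pre-built vLLM serving container.
# The image tag is a placeholder; look up the current one in Model Garden.
model = aiplatform.Model.upload(
    display_name="llama-3-8b-vllm",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/"
        "pytorch-vllm-serve:<current-tag>"
    ),
    serving_container_args=[
        "python", "-m", "vllm.entrypoints.api_server",
        "--host=0.0.0.0",
        "--port=8080",
        "--model=meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model
        "--tensor-parallel-size=1",
    ],
    serving_container_ports=[8080],
    serving_container_predict_route="/generate",
    serving_container_health_route="/ping",
)

# Deploy to an endpoint on hardware you choose.
endpoint = model.deploy(
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
```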
That's why, for full flexibility and control, you can serve with your own custom container. This is where you can package any model, use any framework, and bake in any custom logic you need. This approach offers total control, and later in this series we will have videos showing you how to build these containers from scratch, for both GPUs and TPUs.
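As a preview of those episodes, here is a minimal sketch of the serving contract a custom container must honor: Vertex AI passes the port, health route, and prediction route to your server through environment variables, and your server answers health checks and prediction requests. The FastAPI app and the echo logic are stand-ins for whatever framework and model you choose.

```python
import os
from fastapi import FastAPI, Request

app = FastAPI()

# Vertex AI injects these variables into your custom container.
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")

@app.get(HEALTH_ROUTE)
def health():
    # Return 200 once your model is loaded and ready to serve.
    return {"status": "healthy"}

@app.post(PREDICT_ROUTE)
async def predict(request: Request):
    body = await request.json()
    # Replace this echo with any model, framework, or custom logic you need.
    predictions = [{"echo": instance} for instance in body["instances"]]
    return {"predictions": predictions}

if __name__ == "__main__":
    import uvicorn
    # Vertex AI sets AIP_HTTP_PORT; default to 8080 for local testing.
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", 8080)))
```

From there, you build the image, push it to Artifact Registry, and upload it as a Vertex AI model, which we will cover step by step later in the series.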
So, that's our map. These are the core strategies for serving open models on Vertex AI. At this point, you have a better understanding of the landscape of options. Remember, the goal isn't to find the single best way, but the best way for your project's needs. And this framework is your guide for the rest of our journey together. Now you have the map, and in our next episode, we will start exploring and getting hands-on. We will start with the simplest path, Model as a Service. We will walk you through, step by step, how to find, enable, and call a serverless open model API from your application. If you don't want to miss that, or the rest of the series where we will cover self-deployment, custom vLLM containers, and more, don't forget to subscribe to the Google Cloud Developer channel and turn on notifications. And let me know in the comments which part of this series you are most excited about. Connect with me on social media, and until next time, happy building.
Navigate the landscape of open model serving on Vertex AI with this essential developer's guide. We introduce a practical roadmap that helps you choose between maximum simplicity and full control for your deployments. Discover how to leverage Vertex AI's Model as a Service for rapid prototyping, or self-deployed models with Model Garden for balanced flexibility. This video ensures you can confidently select the best serving option for your project.

Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

#GoogleCloud #AIAgents #ADK #MCP

Speakers: Ivan Nardini
Products Mentioned: AI Infrastructure, Vertex AI