Everyone is using MCP servers these
days. They're connecting AI agents to
databases, Kubernetes clusters, GitHub
repos, cloud resources, you name it. MCP
is becoming the standard way to give AI
tools access to external systems. But
here's the question. How should you
actually run those MCP servers? The
documentation typically shows you one
way, running it locally with NPX. But is
that secure? Is it scalable? Can your
team share it? What about production?
There are actually multiple deployment
options, each with different tradeoffs.
So, here's what we're going to do. I
will show you four different ways to
deploy MCP servers from the dead simple
to the enterprise ready. We'll look at
local execution with NPX, Docker
containers, Kubernetes deployments, and
operator managed resources. Plus, I will
cover a few notable cloud platforms like
Fly.io, Cloudflare Workers, and AWS
Lambda. For each approach, I will show
you exactly how it works, what problems
it solves, and what new problems it
creates. We will explore all of this
through practical examples. I will use
my DevOps AI Toolkit as the MCP server we
are deploying, not to sell you on the
project, but because I need a real MCP
server to demonstrate those patterns, and
this one works with Kubernetes, vector
databases, and all the complexity we'll
need to see. Everything you will learn
applies to any MCP server you want to
run. Here's another challenge. AI
agents, even when they're wrapped inside
the MCP protocol, need to authenticate
with web services and access APIs. You
could hardcode credentials, but that's
risky. Environment variables, well
still not ideal. Building your own
credential injection system, well, now
we've got another problem to solve.
That's where the sponsor of this video
comes in. 1Password and Browserbase
launched an integration for secure agentic
authentication. You connect your 1Password
vault to Browserbase in under three
minutes. When your agent needs
credentials, they're injected just in time
at runtime. The agent doesn't see them,
log them, or store them. The injection
includes human-in-the-loop authorization:
when an agent tries to authenticate, you
get a push notification for approval.
Credentials are automatically mapped to
the right services, and there's an audit
trail that logs item IDs and timestamps
without exposing the actual secrets. If you want
to learn more, check out
browserbase.com.
And now back to MCP deployments.
Let's start with the simplest way to run
MCP servers, executing them directly on
your local machine. This is typically
the default approach you will find in
most MCP documentation. Whether it's NPX
for JavaScript servers, Python for
Python-based ones, or whatever runtime the
server needs, it doesn't matter. You
just run the command directly. It's
what's usually documented often because
authors assume you will figure out how
to transform it into something better or
because they were too lazy to show you
the alternative. One of the two. Let me
show you what this looks like. Here's
the configuration file that tells Claude
how to run my dot-ai MCP server locally.
I'm using my own MCP server for this demo
since that's the one I'm working on in
parallel with this video. But the same,
and I repeat, the same principles apply to
any MCP server. That's pretty
straightforward. We're telling Claude to
use npx to run the MCP
server, passing it the package name and
the environment variables it needs.
Notice that Qdrant URL over there. That's
our vector database dependency that needs
to be running separately. We already
started it during the setup, but in a
real-world scenario, you would need to
manage that yourself. The point is that
there isn't always only the MCP server;
sometimes the MCP server has dependencies
of its own.
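To make that concrete, here is a minimal sketch of what such a configuration might look like, using the standard MCP client config format. The package name and the Qdrant URL are illustrative assumptions, not values from the video, so check the project's docs for the real ones.

```json
{
  "mcpServers": {
    "dot-ai": {
      "command": "npx",
      "args": ["-y", "@vfarcic/dot-ai"],
      "env": {
        "QDRANT_URL": "http://localhost:6333"
      }
    }
  }
}
```

The env block is where that separately running Qdrant instance gets wired in; nothing in this file starts Qdrant for you.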
Now, let's fire up Claude with this
configuration and see it in action. Once
Claude starts up, let's
test if the MCP server is working by
asking it to list the patterns available
in that MCP server. There we go.
Perfect. The MCP server is running
locally, and Claude can communicate with
it. We can see the patterns that are
stored in Qdrant. Everything works as
expected. Now, this local approach might
seem great at first, but there are some
serious trade-offs you need to consider.
This is typically the default and
sometimes the only documented way to run
an MCP server, but that does not mean
it's the best approach for your
situation. First of all, adding
dependencies is a pain in the ass. See
how we need Qdrant running separately?
In a more complex setup, you might need
multiple services, databases, or other
dependencies. Now, agents typically
start the MCP server when they launch
and shut it down when they exit.
But if you wrap everything in a script
to handle those dependencies, you might
end up with zombie processes. The agent
might kill the script but leave all the
child processes running. Before you know
it, you'll get a bunch of orphan
services eating up your resources. Then
there's the installation requirement.
You need node and npx installed on your
machine for this JavaScript-based MCP. If
you're using a Python MCP, you need
Python. Ruby MCP, you need Ruby. Your
machine starts to become a mess of
different runtimes and package managers.
But here's the biggest issue. There's
absolutely no isolation. This MCP server
has direct access to your entire laptop.
It can read your files, access your
network, and do whatever the hell it
wants. Sure, you might trust the MCP
server code, but do you trust all its
dependencies? All their dependencies?
It's a security nightmare waiting to
happen. Look, this might be the easiest
approach aside from needing the right
runtime installed. It might be what's
documented everywhere, but let's be
honest, it's potentially the worst way
to run MCP servers. You've got
dependency management issues, process
life cycle problems, and zero, I repeat
zero isolation. There has to be a better
way, right? Let me show you some
alternatives that address those
problems.
Now, let's try a better approach.
Running MCP servers in Docker
containers. And here's the thing. At the
end of the day, an MCP server isn't that
different from any other server. Sure,
it uses stdio instead of HTTP for
communication, but it's still just a
server that needs to run somewhere. And
how do we typically run servers locally
these days? Docker. It gives us
isolation, better dependency management,
and we don't need to pollute our
machines with various runtimes. Let me
show you how this works. The
configuration is slightly different. Now,
instead of running npx directly, we are
telling Claude to use Docker Compose. See
the difference? We're using docker compose
run as the command. The Docker Compose
file handles all the complexity: starting
Qdrant, setting up networking between
containers, managing volumes, all that
stuff. The --rm flag ensures containers
are cleaned up when we're done, and
--remove-orphans takes care of any
leftover containers from previous runs.
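As a sketch, the two pieces might look like this. The client config hands the whole lifecycle to Docker Compose, and the Compose file bundles the MCP server with Qdrant. Service names, the image reference, and the variables are assumptions for illustration, not the exact files from the video.

```json
{
  "mcpServers": {
    "dot-ai": {
      "command": "docker",
      "args": ["compose", "run", "--rm", "--remove-orphans", "dot-ai"]
    }
  }
}
```

```yaml
services:
  qdrant:
    image: qdrant/qdrant              # vector database dependency
    volumes:
      - qdrant-data:/qdrant/storage
  dot-ai:
    image: ghcr.io/vfarcic/dot-ai     # illustrative image reference
    environment:
      - QDRANT_URL=http://qdrant:6333 # containers talk over the Compose network
    depends_on:
      - qdrant
volumes:
  qdrant-data: {}
```

Because the agent runs docker compose run on every launch, depends_on brings Qdrant up automatically, and the flags clean up after each session.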
Now, let's fire up Claude with this
Docker-based configuration. This time,
let's test it with a different command.
Instead of listing patterns, we'll ask
for Kubernetes capabilities that the MCP
server has discovered just to vary a
bit. There we go. Excellent. The Docker
based MCP server is working perfectly.
It discovered a bunch of capabilities
from our Kubernetes cluster. Now notice
how everything just works without us
having to manage Qdrant separately or
worry about process life cycles. Docker
compose handles all of that for us. So
what can we gain with this Docker
approach? First and foremost, we got
proper isolation. The MCP server runs in
its container with controlled access to
resources. Unless you do something silly
like mounting your entire file system or
running containers in privileged mode,
you're much safer than with the direct
local approach. Everything runs in
containers, which means you don't need
to install node, npx, python, or
whatever runtime the MCP needs. Just
Docker and you're good to go. All the
dependencies are bundled together in the
Compose file: Qdrant, the MCP server, the
networking between them, all defined in
one place and managed as a
unit. But here's the thing, it's still
running locally. This is still a single
user setup on your machine. Now, you
might be thinking, can't I just run
Docker on a remote server? Sure, you
could expose Docker's API over the
network, but that opens up a whole can
of security worms. You would need to
manage TLS certificates, authentication,
network access. It gets complicated fast,
and you're basically reinventing
infrastructure that already exists.
There's a better way to go truly remote
with proper multi-user support, high
availability, and all the enterprise
features you might need. Let me show you
what that looks like.
Time to go truly remote. Here's where we
make a fundamental shift. Think about
it. Running MCP servers locally is like
local development. Everyone spins up
their own instance, manages their own
dependencies, deals with their own
problems. But what if we could run MCP
servers like production services? Deploy
them once properly and let the entire
team or company connect to them. That's
exactly what Kubernetes gives us. So
this isn't just about containerization
anymore. It's about turning MCP servers
into shared organizational
infrastructure. Instead of every
developer running their own instance of
various MCP servers, we deploy them once
to Kubernetes and everyone connects to
the same properly managed services.
Whether it's my dot-ai server or your custom
MCP for internal tools or that third
party MCP for cloud resources, the
approach is the same. So, let's deploy an
MCP server to Kubernetes using Helm. I'm
using my dot-ai server as the example. But
as I already said, everything you see here
applies to any MCP server you want to run
in production.
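As a rough sketch, that deployment could look something like this. The chart reference, namespace, and value keys are hypothetical placeholders; the real ones come from the project's Helm chart documentation.

```sh
helm upgrade --install dot-ai oci://ghcr.io/vfarcic/dot-ai \
  --namespace dot-ai --create-namespace \
  --set ingress.host=dot-ai.example.com   # hypothetical value key
```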
Did you notice something important here?
I am running these commands, not Claude.
This is a fundamental shift. In the previous
examples, the agent was responsible for
spinning up MCP servers when it started
and shutting them down when it stopped.
The configuration told the agent how to
launch the server. Now, we've separated
those responsibilities. A human, GitOps,
or your CI/CD pipeline deploys
the MCP servers to Kubernetes just like
any other production service. The agents
just connect to them. They don't manage
their life cycle anymore. Let's see what
Kubernetes created for us.
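A command along these lines would list what the chart produced (the namespace is the same assumption as before):

```sh
kubectl --namespace dot-ai get deployments,statefulsets,services,ingresses
```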
There we go. Perfect. We've got the MCP server running
as a Deployment, Qdrant as a StatefulSet
for persistent storage, Services for
internal communication, and an Ingress
exposing it all at a specific URL. This is
proper production infrastructure running
independently of any agent. Now let's
look at how agents connect to this
remote MCP server. Look at this
configuration carefully. We are not
telling Claude to run the MCP server
anymore. We are telling it to connect
directly to the already running MCP server
at that URL using HTTP transport. The type
http tells Claude to use HTTP transport to
communicate with the remote MCP server,
and the URL points to our Kubernetes
endpoint. This allows Claude to
communicate directly with the MCP server
running in Kubernetes over HTTP. This is a
clean, direct connection without any local
bridge processes or protocol translation.
Unlike the default, which is stdio, the
agent speaks HTTP directly to the remote
MCP server.
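Here is a minimal sketch of that remote configuration; the hostname and path are assumptions standing in for whatever your Ingress exposes.

```json
{
  "mcpServers": {
    "dot-ai": {
      "type": "http",
      "url": "http://dot-ai.example.com/mcp"
    }
  }
}
```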
So, let's connect to that remote MCP
server and check its status to make sure
everything's working.
Excellent. We're connected to the remote
MCP server running in Kubernetes. Notice
that it shows the Kubernetes version,
Qdrant is connected, and all the services
are healthy. This is the same MCP server
we deployed, but now multiple users can
connect to it
simultaneously. So what have we achieved
with this Kubernetes approach? We got
real isolation through Kubernetes
namespaces, RBAC, and network policies.
The MCP server can't access your laptop
anymore. It's confined to its name space
with only the permissions you explicitly
granted. Everything still runs in
containers. So there's nothing special
to install on developer machines. But
now we also get all the Kubernetes
goodies: high availability if you want
it, autoscaling, security policies,
audit logs, the whole enterprise package. Most
importantly, this is truly remote and
multi-user. You deploy the MCP server
once and your entire team connects to
it. Everyone shares the same patterns,
the same capabilities, the same
configuration. It's like the difference
between everyone running their own
database locally versus connecting to a
shared production database. Now to be
frank, the setup is more complex than
local deployment, but that's the nature
of production infrastructure. You need
the Kubernetes cluster, Helm charts,
controllers, and so on and so forth. But
here's the thing. You do this setup once
and everyone benefits. It's
infrastructure, not something each
developer needs to figure out. Still,
there's another approach, using
Kubernetes operators, that claims
to simplify MCP server management. Let
me show you what happens when you add
ToolHive to the mix.
Stacklok created an operator called
ToolHive that promises to simplify MCP
server management. The idea is that
ToolHive manages MCP servers as
Kubernetes custom resources with
additional features like permission
profiles and resource management. Let's
see if it actually delivers on that
promise. ToolHive treats MCP servers as
first-class Kubernetes citizens. Instead
of deploying standard Deployments and
Services, you create an MCPServer resource
and the operator takes care of the rest.
Or at least, that's the theory.
So let's deploy the same MCP server
using ToolHive instead of standard
Kubernetes resources. And notice the
deployment-method parameter set to
toolhive in the Helm command. That's important.
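Sketched as a command, that might look like the following. The deploymentMethod value key is my guess at the chart's parameter name, so treat it as a placeholder.

```sh
helm upgrade --install dot-ai oci://ghcr.io/vfarcic/dot-ai \
  --namespace dot-ai --create-namespace \
  --set deploymentMethod=toolhive   # hypothetical value key
```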
And there we go. There's our MCPServer
custom resource instead of a regular
Deployment. Now, let's dig deeper
and see what this custom resource
actually contains. That over there is a
lot of YAML. The key things to notice:
the container spec is embedded in a pod
template spec, the transport is set to
streamable HTTP, and the status shows
the proxy URL.
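Stripped down, the shape of that resource is roughly this. The apiVersion, field names, and image are paraphrased from what's on screen rather than copied, so treat them as approximations of ToolHive's actual schema.

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1   # approximate API group/version
kind: MCPServer
metadata:
  name: dot-ai
spec:
  transport: streamable-http
  podTemplateSpec:                           # container spec embedded here
    spec:
      containers:
        - name: mcp
          image: ghcr.io/vfarcic/dot-ai      # illustrative image reference
# status.url: the proxy URL the operator populates once the server runs
```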
ToolHive took our MCP server definition
and created the necessary pods and
services. But wait, notice the proxy mode
set to SSE, that is, server-sent events.
And here's the problem: SSE transport is
deprecated in the MCP protocol, replaced
by streamable HTTP in the 2025-03-26 spec
revision. So ToolHive is using a
deprecated transport mode.
Not exactly filling me with confidence
about this operator's features. But let's
continue. Let's see all the resources
that were created both by ToolHive and
by our Helm chart. Look at all those
resources. The Helm chart created the
MCPServer custom resource, Qdrant, the
Ingress, and other supporting resources.
Then the ToolHive operator saw that
MCPServer resource and created additional
pods and services, like the MCP proxy. So
we've got resources
created by Helm and resources created by
ToolHive based on what Helm created.
It's layers upon layers of abstraction.
Now here's the interesting part. Let's
see how the client connects to this
ToolHive-managed MCP server and whether it's
any different from our standard
Kubernetes approach. Hey, notice that
the configuration is identical to our
standard Kubernetes deployment. We're
using HTTP transport to connect directly
to the MCP server. ToolHive's proxy
supports both stdio and HTTP transports,
but HTTP is clearly the better choice.
According to ToolHive's own performance
testing, stdio transport has severe
performance limitations. In their tests
with 50 concurrent requests, only two
succeeded. The stdio implementation is
unsuitable for production use. So by
using HTTP transport directly, we get
better performance and eliminate the
complexity of stdio to HTTP translation
that would otherwise be needed. So
let's connect to the MCP server anyway
just to verify it works. And there we
go. It works. The MCP server is running
and accessible. But let's be honest
about what we actually achieved here.
The end result is almost the same as
when using standard Kubernetes
resources, except now some of them are
created by the operator. We still need
the ingress. We still need to manage
secrets. We still need supporting
resources. The ToolHive custom resource
didn't eliminate complexity. It just
moved it around. That's why I had to
wrap everything in a helm chart anyway.
So what was the theoretical advantage of
ToolHive? The main selling point was
supposed to be better MCP server life
cycle management through Kubernetes
operators. But in practice, it's just
another layer of abstraction that doesn't
solve the fundamental deployment
challenges. The stdio transport it
supports has catastrophic performance
issues, which is why we're using HTTP
transport anyway. At the end of the
day, running MCP servers is just like
running any other HTTP server. You
deploy them, you expose them through an
ingress and connect to them over HTTP.
ToolHive adds a custom resource
abstraction on top. But I'm honestly not
seeing the value. It's using deprecated
SSE mode and it adds another layer of
complexity without clear benefits. So if
you prefer using operators and custom
resources, sure, ToolHive is an option.
But given its limitations and the fact
that it doesn't actually deliver on its
main promise, I would stick with
standard Kubernetes deployments. At
least those are straightforward and
don't pretend to solve problems they
can't actually solve. Let me show you a
few other deployment options before we
wrap it up.
So before we wrap up, let me quickly run
through some other ways to deploy MCP
servers.
There's a whole ecosystem emerging
around MCP deployment. And while I
focused on Kubernetes because it's
vendor agnostic and production ready,
you should know what else is out there.
Fly.io takes an interesting approach.
They run MCP servers as tightly isolated
VMs, which they call Fly Machines. You
can deploy with a simplified MCP launch
command, and they handle authentication
and routing for you. They support both
single-tenant (each user gets their own
app) and multi-tenant patterns. It's
pretty sleek if you're already in their
ecosystem. Then there is Cloudflare
Workers, which went all in on MCP. They
provide OAuth
authentication out of the box, zero
egress fees, and CPU-based billing that's
perfect for streaming connections. You
can deploy MCP servers as edge functions
that run close to your users. Their
workers MCP tooling handles the protocol
translation for you. If you're looking
for truly serverless MCP, this is
probably your best bet. AWS Lambda offers
the AWS Serverless MCP Server. It works,
technically, but the
developer experience is rough. Cold
starts are painful and the stdio
transport has serious performance issues
on Lambda. Then there is Vercel, which
lets you add MCP endpoints directly to
your Next.js apps using the MCP handler
package. If you already have Next.js
applications on Vercel, this is the path
of least resistance. But watch out for
their egress charges and memory-based
billing for idle connections.
Railway keeps things simple. It's a
deployment platform that just works
without needing platform engineers.
Deploy your MCP server like any other
app. Nothing fancy, but sometimes that's
exactly what you need. Then there is
Podman's MCP server implementation, which
deserves a mention even though it's not
really a deployment solution. It's an
MCP server that lets AI agents manage
Podman containers on your local machine.
Think of it like the Docker MCP server,
but for Podman users. Still local, and it
still has all the same limitations we
discussed earlier. And here's the thing
about all those alternatives: they each
have their niche. Fly.io is great for
multi-tenant isolation. Cloudflare excels
at edge deployment with minimal latency.
AWS Lambda, well, it exists. Vercel makes
sense if you're already there. But they
all have one problem: vendor lock-in. You
pick Cloudflare, you're stuck with
Cloudflare. You pick AWS, you're stuck
with AWS. Most of them are still figuring
out MCP. The implementations are evolving,
the performance varies widely, and the
developer experience ranges from decent to
painful. That's why I keep coming back to
Kubernetes. It's vendor agnostic. You can
run it anywhere: AWS, Google Cloud, Azure,
on premises, or that server under your
desk. The deployment patterns are mature,
the tooling is solid, and you're not
betting your infrastructure on a single
vendor's interpretation of MCP. Unless
you're a small company with just a few
apps, or you're already deeply committed
to a specific cloud vendor,
Kubernetes remains the most flexible
option for production MCP servers. But
hey, at least now you know what's out
there. Choose what works for your
situation, not what I tell you to use.
Don't do that.
All right, let's take a step back and
look at what we've covered. We've gone
through a journey of MCP server
deployment options: from the simplest to
the most complex, from local to remote,
from vendor-specific to vendor-agnostic.
We started with local NPX execution, the
simplest approach, sure, but it comes with
zero isolation, dependency hell, and
security nightmares. Your MCP server has
full access to your machine, and managing
dependencies becomes your personal
problem. It's what the documentation shows
you, but that doesn't make it right for
production. Then we moved to Docker
locally. Better isolation, true.
Everything in containers, great.
Dependencies bundled together. But it's
still a single user setup on your
machine. Fine for development, not so
much for team collaboration. Next came
Kubernetes with standard deployments.
This is where things get serious. Proper
production infrastructure, multi-user
access, high availability, all the
enterprise features you would expect.
Deploy once, everyone connects. It's
like the difference between running a
database on your laptop versus having a
proper database server. We also tried
Kubernetes with the ToolHive operator, which
promised to simplify MCP server
management but ended up adding
complexity without clear benefits. It
uses the deprecated SSE mode and adds
another layer of abstraction without
solving fundamental deployment
challenges. Finally, we looked at cloud
platform options. Check them yourself.
So, which one should you actually use?
If you're developing an MCP server
itself, Docker makes sense. You need to
test locally, iterate quickly, and see
immediate results. But let's be clear
this is for MCP server developers, not
MCP server users. For any team or
company setting, the local options are
ridiculous. Why would every developer
spin up their own instance of the same
MCP servers? That's like asking everyone
to run their own Jira or Slack instance.
You want shared services that everyone
connects to. That means Kubernetes or
one of the cloud platforms. For
production workloads, Kubernetes is the
answer. Unless you have a damn good
reason to avoid it. It's vendor
agnostic, battle tested, and gives you
all the operational capabilities you
need. Deploy each MCP server once and
your entire organization can use it.
That's how infrastructure should work.
Now, if you're already committed to a
specific cloud vendor, the native
solutions might make sense. Got
everything on Cloudflare? Use Workers.
Deep in AWS? Maybe Lambda will work for
you eventually, but understand that
you're trading flexibility for
convenience. The key insight here is
that MCP servers are just servers.
They're not special snowflakes. They
need to be deployed, exposed, and
accessed like any other service. And
just like you don't run your own copy of
every microservice in your company,
you shouldn't run your own copy of every
MCP server. Share the infrastructure,
share the costs, share the maintenance
burden. Now, if you want to experiment
with those approaches, especially the
Kubernetes deployments I've been
showing, check out my DevOps AI Toolkit
project. Why not? It's the MCP server
I've been using throughout these
examples. Star the repo if you find it
useful. Open issues if something is not
working (most likely it's not). Submit
PRs if you want to contribute. All in
all, the MCP ecosystem is still young
and we are all figuring this out
together. The more we share what works
and what doesn't, the better these
deployment patterns will become. So
don't just consume, contribute. Help
make MCP deployment less painful for the
next person who comes along. Thank you
for watching. See you in the next one.
Cheers.
Discover the four main ways to deploy MCP servers, from simple local execution to enterprise-ready Kubernetes clusters. This comprehensive guide explores the trade-offs between NPX local deployment, Docker containerization, Kubernetes production setups, and cloud platform alternatives like Fly.io and Cloudflare Workers. You'll see practical demonstrations of each approach using a real MCP server, learning about security implications, scalability challenges, and team collaboration benefits. The video covers why local NPX execution creates security risks and dependency nightmares, how Docker provides better isolation but remains single-user, and why Kubernetes offers the best solution for shared organizational infrastructure. We also examine the ToolHive operator's limitations and explore various cloud deployment options with their respective vendor lock-in considerations. Whether you're developing MCP servers or deploying them for your team, this guide will help you choose the right deployment strategy for your specific needs.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Sponsor: Browserbase 🔗 https://browserbase.com
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#MCP #ModelContextProtocol

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Transcript and commands: https://devopstoolkit.live/ai/mcp-server-deployment-guide-from-local-to-production
🔗 Model Context Protocol: https://modelcontextprotocol.io

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ BlueSky: https://vfarcic.bsky.social
➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬
🎤 Podcast: https://www.devopsparadox.com/
💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Model Context Protocol (MCP) Deployment
01:40 Browserbase (sponsor)
02:50 MCP Local NPX Deployment
06:32 MCP Docker Container Deployment
09:23 MCP Kubernetes Production Deployment
14:09 MCP ToolHive Kubernetes Operator
19:15 Alternative MCP Deployment Options
22:46 Choosing the Right MCP Deployment