Everyone is using MCP servers these
days. They're connecting AI agents to
databases, Kubernetes clusters, GitHub
repos, cloud resources, you name it. MCP
is becoming the standard way to give AI
tools access to external systems. But
here's the question. How should you
actually run those MCP servers? The
documentation typically shows you one
way, running it locally with NPX. But is
that secure? Is it scalable? Can your
team share it? What about production?
There are actually multiple deployment
options, each with different tradeoffs.
So, here's what we're going to do. I
will show you four different ways to
deploy MCP servers from the dead simple
to the enterprise ready. We'll look at
local execution with NPX, Docker
containers, Kubernetes deployments, and
operator managed resources. Plus, I will
cover a few notable cloud platforms like
Fly.io, Cloudflare Workers, and AWS
Lambda. For each approach, I will show
you exactly how it works, what problems
it solves, and what new problems it
creates. We will explore all of this
through practical examples. I will use
my DevOps AI Toolkit as the MCP server we
are deploying, not to sell you on the
project, but because I need a real MCP
server to demonstrate those patterns, and
this one works with Kubernetes, vector
databases, and all the complexity we'll
need to see. Everything you will learn
applies to any MCP server you want to
run. Here's another challenge. AI
agents, even when they're wrapped inside
the MCP protocol, need to authenticate
with web services and access APIs. You
could hardcode credentials, but that's
risky. Environment variables, well
still not ideal. Building your own
credential injection system, well, now
we've got another problem to solve.
That's where the sponsor of this video
comes in. 1Password and Browserbase
launched an integration for secure agentic
authentication. You connect your 1Password
vault to Browserbase in under three
minutes. When your agent needs
credentials, they're injected just in time
at runtime. The agent doesn't see them,
log them, or store them. The injection
includes human-in-the-loop authorization:
when an agent tries to authenticate, you
get a push notification for approval.
Credentials are automatically mapped to
the right services, and there's an audit
trail that logs item IDs and timestamps
without exposing the actual secrets. If you want
to learn more, check out
browserbase.com.
And now back to MCP deployments.
Let's start with the simplest way to run
MCP servers, executing them directly on
your local machine. This is typically
the default approach you will find in
most MCP documentation. Whether it's NPX
for JavaScript servers, Python for
Python-based ones, or whatever runtime the
server needs, it doesn't matter. You
just run the command directly. It's
what's usually documented often because
authors assume you will figure out how
to transform it into something better or
because they were too lazy to show you
the alternative. One of the two. Let me
show you what this looks like. Here's
the configuration file that tells Claude
how to run my dot-ai MCP server locally.
I'm using my own MCP server for this demo
since that's the one I'm working on in
parallel with this video. But the same,
and I repeat, the same principles apply to
any MCP server. That's pretty
straightforward. We're telling Claude to
use npx to run the MCP
server, passing it the package name and
the environment variables it needs.
Notice that Qdrant URL over there. That's
our vector database dependency that needs
to be running separately. We already
started it during the setup, but in a
real-world scenario, you would need to
manage that yourself. The point is that
there isn't always only the MCP server;
sometimes the MCP server has dependencies
of its own.
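To make that concrete, here is a minimal sketch of what such a configuration might look like, using the standard MCP client config format. The package name and the Qdrant URL are illustrative assumptions, not values from the video, so check the project's docs for the real ones.

```json
{
  "mcpServers": {
    "dot-ai": {
      "command": "npx",
      "args": ["-y", "@vfarcic/dot-ai"],
      "env": {
        "QDRANT_URL": "http://localhost:6333"
      }
    }
  }
}
```

The env block is where that separately running Qdrant instance gets wired in; nothing in this file starts Qdrant for you.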
Now, let's fire up Claude with this
configuration and see it in action. Once
Claude starts up, let's
test if the MCP server is working by
asking it to list the patterns available
in that MCP server. There we go.
Perfect. The MCP server is running
locally, and Claude can communicate with
it. We can see the patterns that are
stored in Qdrant. Everything works as
expected. Now, this local approach might
seem great at first, but there are some
serious trade-offs you need to consider.
This is typically the default and
sometimes the only documented way to run
an MCP server, but that does not mean
it's the best approach for your
situation. First of all, adding
dependencies is a pain in the ass. See
how we need Qdrant running separately?
In a more complex setup, you might need
multiple services, databases, or other
dependencies. Now, agents typically
start the MCP server when they launch
and shut it down when they exit.
But if you wrap everything in a script
to handle those dependencies, you might
end up with zombie processes. The agent
might kill the script but leave all the
child processes running. Before you know
it, you'll get a bunch of orphan
services eating up your resources. Then
there's the installation requirement.
You need node and npx installed on your
machine for this JavaScript-based MCP. If
you're using a Python MCP, you need
Python. Ruby MCP, you need Ruby. Your
machine starts to become a mess of
different runtimes and package managers.
But here's the biggest issue. There's
absolutely no isolation. This MCP server
has direct access to your entire laptop.
It can read your files, access your
network, and do whatever the hell it
wants. Sure, you might trust the MCP
server code, but do you trust all its
dependencies? All their dependencies?
It's a security nightmare waiting to
happen. Look, this might be the easiest
approach aside from needing the right
runtime installed. It might be what's
documented everywhere, but let's be
honest, it's potentially the worst way
to run MCP servers. You've got
dependency management issues, process
life cycle problems, and zero, I repeat
zero isolation. There has to be a better
way, right? Let me show you some
alternatives that address those
problems.
Now, let's try a better approach.
Running MCP servers in Docker
containers. And here's the thing. At the
end of the day, an MCP server isn't that
different from any other server. Sure,
it uses stdio instead of HTTP for
communication, but it's still just a
server that needs to run somewhere. And
how do we typically run servers locally
these days? Docker. It gives us
isolation, better dependency management,
and we don't need to pollute our
machines with various runtimes. Let me
show you how this works. The
configuration is slightly different. Now,
instead of running npx directly, we are
telling Claude to use Docker Compose. See
the difference? We're using docker compose
run as the command. The Docker Compose
file handles all the complexity: starting
Qdrant, setting up networking between
containers, managing volumes, all that
stuff. The --rm flag ensures containers
are cleaned up when we're done, and
--remove-orphans takes care of any
leftover containers from previous runs.
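As a sketch, the two pieces might look like this. The client config hands the whole lifecycle to Docker Compose, and the Compose file bundles the MCP server with Qdrant. Service names, the image reference, and the variables are assumptions for illustration, not the exact files from the video.

```json
{
  "mcpServers": {
    "dot-ai": {
      "command": "docker",
      "args": ["compose", "run", "--rm", "--remove-orphans", "dot-ai"]
    }
  }
}
```

```yaml
services:
  qdrant:
    image: qdrant/qdrant              # vector database dependency
    volumes:
      - qdrant-data:/qdrant/storage
  dot-ai:
    image: ghcr.io/vfarcic/dot-ai     # illustrative image reference
    environment:
      - QDRANT_URL=http://qdrant:6333 # containers talk over the Compose network
    depends_on:
      - qdrant
volumes:
  qdrant-data: {}
```

Because the agent runs docker compose run on every launch, depends_on brings Qdrant up automatically, and the flags clean up after each session.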
Now, let's fire up Claude with this
Docker-based configuration. This time,
let's test it with a different command.
Instead of listing patterns, we'll ask
for Kubernetes capabilities that the MCP
server has discovered just to vary a
bit. There we go. Excellent. The Docker
based MCP server is working perfectly.
It discovered a bunch of capabilities
from our Kubernetes cluster. Now notice
how everything just works without us
having to manage Qdrant separately or
worry about process life cycles. Docker
compose handles all of that for us. So
what can we gain with this Docker
approach? First and foremost, we got
proper isolation. The MCP server runs in
its container with controlled access to
resources. Unless you do something silly
like mounting your entire file system or
running containers in privileged mode,
you're much safer than with the direct
local approach. Everything runs in
containers, which means you don't need
to install node, npx, python, or
whatever runtime the MCP needs. Just
Docker and you're good to go. All the
dependencies are bundled together in the
Compose file: Qdrant, the MCP server, the
networking between them, all defined in
one place and managed as a
unit. But here's the thing, it's still
running locally. This is still a single
user setup on your machine. Now, you
might be thinking, can't I just run
Docker on a remote server? Sure, you
could expose Docker's API over the
network, but that opens up a whole can
of security worms. You would need to
manage TLS certificates, authentication,
network access. It gets complicated fast,
and you're basically reinventing
infrastructure that already exists.
There's a better way to go truly remote
with proper multi-user support, high
availability, and all the enterprise
features you might need. Let me show you
what that looks like.
Time to go truly remote. Here's where we
make a fundamental shift. Think about
it. Running MCP servers locally is like
local development. Everyone spins up
their own instance, manages their own
dependencies, deals with their own
problems. But what if we could run MCP
servers like production services? Deploy
them once properly and let the entire
team or company connect to them. That's
exactly what Kubernetes gives us. So
this isn't just about containerization
anymore. It's about turning MCP servers
into shared organizational
infrastructure. Instead of every
developer running their own instance of
various MCP servers, we deploy them once
to Kubernetes and everyone connects to
the same properly managed services.
Whether it's my dot-ai server or your custom
MCP for internal tools or that third
party MCP for cloud resources, the
approach is the same. So, let's deploy an
MCP server to Kubernetes using Helm. I'm
using my dot-ai server as the example. But
as I already said, everything you see here
applies to any MCP server you want to run
in production.
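As a rough sketch, that deployment could look something like this. The chart reference, namespace, and value keys are hypothetical placeholders; the real ones come from the project's Helm chart documentation.

```sh
helm upgrade --install dot-ai oci://ghcr.io/vfarcic/dot-ai \
  --namespace dot-ai --create-namespace \
  --set ingress.host=dot-ai.example.com   # hypothetical value key
```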
Did you notice something important here?
I am running these commands, not Claude.
This is a fundamental shift. In the previous
examples, the agent was responsible for
spinning up MCP servers when it started
and shutting them down when it stopped.
The configuration told the agent how to
launch the server. Now, we've separated
those responsibilities. A human, GitOps,
or your CI/CD pipeline deploys
the MCP servers to Kubernetes just like
any other production service. The agents
just connect to them. They don't manage
their life cycle anymore. Let's see what
Kubernetes created for us.
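A command along these lines would list what the chart produced (the namespace is the same assumption as before):

```sh
kubectl --namespace dot-ai get deployments,statefulsets,services,ingresses
```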
There we go. Perfect. We've got the MCP server running
as a Deployment, Qdrant as a StatefulSet
for persistent storage, Services for
internal communication, and an Ingress
exposing it all at a specific URL. This is
proper production infrastructure running
independently of any agent. Now let's
look at how agents connect to this
remote MCP server. Look at this
configuration carefully. We are not
telling Claude to run the MCP server
anymore. We are telling it to connect
directly to the already running MCP server
at that URL using HTTP transport. The type
http tells Claude to use HTTP transport to
communicate with the remote MCP server,
and the URL points to our Kubernetes
endpoint. This allows Claude to
communicate directly with the MCP server
running in Kubernetes over HTTP. This is a
clean, direct connection without any local
bridge processes or protocol translation.
Unlike the default, which is stdio, the
agent speaks HTTP directly to the remote
MCP server.
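Here is a minimal sketch of that remote configuration; the hostname and path are assumptions standing in for whatever your Ingress exposes.

```json
{
  "mcpServers": {
    "dot-ai": {
      "type": "http",
      "url": "http://dot-ai.example.com/mcp"
    }
  }
}
```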
So, let's connect to that remote MCP
server and check its status to make sure
everything's working.
Excellent. We're connected to the remote
MCP server running in Kubernetes. Notice
that it shows the Kubernetes version,
Qdrant is connected, and all the services
are healthy. This is the same MCP server
we deployed, but now multiple users can
connect to it
simultaneously. So what have we achieved
with this Kubernetes approach? We got
real isolation through Kubernetes
namespaces, RBAC, and network policies.
The MCP server can't access your laptop
anymore. It's confined to its name space
with only the permissions you explicitly
granted. Everything still runs in
containers. So there's nothing special
to install on developer machines. But
now we also get all the Kubernetes
goodies: high availability if you want
it, autoscaling, security policies,
audit logs, the whole enterprise package. Most
importantly, this is truly remote and
multi-user. You deploy the MCP server
once and your entire team connects to
it. Everyone shares the same patterns,
the same capabilities, the same
configuration. It's like the difference
between everyone running their own
database locally versus connecting to a
shared production database. Now to be
frank, the setup is more complex than
local deployment, but that's the nature
of production infrastructure. You need
the Kubernetes cluster, Helm charts,
controllers, and so on and so forth. But
here's the thing. You do this setup once
and everyone benefits. It's
infrastructure, not something each
developer needs to figure out. Still,
there's another approach, using
Kubernetes operators, that claims
to simplify MCP server management. Let
me show you what happens when you add
ToolHive to the mix.
Stacklok created an operator called
ToolHive that promises to simplify MCP
server management. The idea is that
ToolHive manages MCP servers as
Kubernetes custom resources with
additional features like permission
profiles and resource management. Let's
see if it actually delivers on that
promise. ToolHive treats MCP servers as
first-class Kubernetes citizens. Instead
of deploying standard Deployments and
Services, you create an MCPServer resource
and the operator takes care of the rest.
Or at least, that's the theory.
So let's deploy the same MCP server
using ToolHive instead of standard
Kubernetes resources. And notice the
deployment-method parameter set to
toolhive in the Helm command. That's important.
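Sketched as a command, that might look like the following. The deploymentMethod value key is my guess at the chart's parameter name, so treat it as a placeholder.

```sh
helm upgrade --install dot-ai oci://ghcr.io/vfarcic/dot-ai \
  --namespace dot-ai --create-namespace \
  --set deploymentMethod=toolhive   # hypothetical value key
```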
And there we go. There's our MCPServer
custom resource instead of a regular
Deployment. Now, let's dig deeper
and see what this custom resource
actually contains. That over there is a
lot of YAML. The key things to notice:
the container spec is embedded in a pod
template spec, the transport is set to
streamable HTTP, and the status shows
the proxy URL.
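Stripped down, the shape of that resource is roughly this. The apiVersion, field names, and image are paraphrased from what's on screen rather than copied, so treat them as approximations of ToolHive's actual schema.

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1   # approximate API group/version
kind: MCPServer
metadata:
  name: dot-ai
spec:
  transport: streamable-http
  podTemplateSpec:                           # container spec embedded here
    spec:
      containers:
        - name: mcp
          image: ghcr.io/vfarcic/dot-ai      # illustrative image reference
# status.url: the proxy URL the operator populates once the server runs
```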
ToolHive took our MCP server definition
and created the necessary pods and
services. But wait, notice the proxy mode
set to SSE, that is, server-sent events.
And here's the problem: SSE transport is
deprecated in the MCP protocol, replaced
by streamable HTTP in the 2025-03-26 spec
revision. So ToolHive is using a
deprecated transport mode.
Not exactly filling me with confidence
about this operator's features. But let's
continue. Let's see all the resources
that were created both by ToolHive and
by our Helm chart. Look at all those
resources. The Helm chart created the
MCPServer custom resource, Qdrant, the
Ingress, and other supporting resources.
Then the ToolHive operator saw that
MCPServer resource and created additional
pods and services, like the MCP proxy. So
we've got resources
created by Helm and resources created by
ToolHive based on what Helm created.
It's layers upon layers of abstraction.
Now here's the interesting part. Let's
see how the client connects to this
ToolHive-managed MCP server and whether it's
any different from our standard
Kubernetes approach. Hey, notice that
the configuration is identical to our
standard Kubernetes deployment. We're
using HTTP transport to connect directly
to the MCP server. ToolHive's proxy
supports both stdio and HTTP transports,
but HTTP is clearly the better choice.
According to ToolHive's own performance
testing, stdio transport has severe
performance limitations. In their tests
with 50 concurrent requests, only two
succeeded. The stdio implementation is
unsuitable for production use. So by
using HTTP transport directly, we get
better performance and eliminate the
complexity of stdio to HTTP translation
that would otherwise be needed. So
let's connect to the MCP server anyway
just to verify it works. And there we
go. It works. The MCP server is running
and accessible. But let's be honest
about what we actually achieved here.
The end result is almost the same as
when using standard Kubernetes
resources, except now some of them are
created by the operator. We still need
the ingress. We still need to manage
secrets. We still need supporting
resources. The ToolHive custom resource
didn't eliminate complexity. It just
moved it around. That's why I had to
wrap everything in a helm chart anyway.
So what was the theoretical advantage of
ToolHive? The main selling point was
supposed to be better MCP server life
cycle management through Kubernetes
operators. But in practice, it's just
another layer of abstraction that doesn't
solve the fundamental deployment
challenges. The stdio transport it
supports has catastrophic performance
issues, which is why we're using HTTP
transport anyway. At the end of the
day, running MCP servers is just like
running any other HTTP server. You
deploy them, you expose them through an
ingress and connect to them over HTTP.
ToolHive adds a custom resource
abstraction on top. But I'm honestly not
seeing the value. It's using deprecated
SSE mode and it adds another layer of
complexity without clear benefits. So if
you prefer using operators and custom
resources, sure, ToolHive is an option.
But given its limitations and the fact
that it doesn't actually deliver on its
main promise, I would stick with
standard Kubernetes deployments. At
least those are straightforward and
don't pretend to solve problems they
can't actually solve. Let me show you a
few other deployment options before we
wrap it up.
So before we wrap up, let me quickly run
through some other ways to deploy MCP
servers.
There's a whole ecosystem emerging
around MCP deployment. And while I
focused on Kubernetes because it's
vendor agnostic and production ready,
you should know what else is out there.
Fly.io takes an interesting approach.
They run MCP servers as tightly isolated
VMs, which they call Fly Machines. You
can deploy with a simplified MCP launch
command, and they handle authentication
and routing for you. They support both
single-tenant (each user gets their own
app) and multi-tenant patterns. It's
pretty sleek if you're already in their
ecosystem. Then there is Cloudflare
Workers, which went all in on MCP. They
provide OAuth
authentication out of the box, zero
egress fees, and CPU-based billing that's
perfect for streaming connections. You
can deploy MCP servers as edge functions
that run close to your users. Their
workers MCP tooling handles the protocol
translation for you. If you're looking
for truly serverless MCP, this is
probably your best bet. AWS Lambda offers
the AWS Serverless MCP Server. It works,
technically, but the
developer experience is rough. Cold
starts are painful and the stdio
transport has serious performance issues
on Lambda. Then there is Vercel, which
lets you add MCP endpoints directly to
your Next.js apps using the MCP handler
package. If you already have Next.js
applications on Vercel, this is the path
of least resistance. But watch out for
their egress charges and memory-based
billing for idle connections.
Railway keeps things simple. It's a
deployment platform that just works
without needing platform engineers.
Deploy your MCP server like any other
app. Nothing fancy, but sometimes that's
exactly what you need. Then there is
Podman's MCP server implementation, which
deserves a mention even though it's not
really a deployment solution. It's an
MCP server that lets AI agents manage
Podman containers on your local machine.
Think of it like the Docker MCP server,
but for Podman users. Still local, and it
still has all the same limitations we
discussed earlier. And here's the thing
about all those alternatives: they each
have their niche. Fly.io is great for
multi-tenant isolation. Cloudflare excels
at edge deployment with minimal latency.
AWS Lambda, well, it exists. Vercel makes
sense if you're already there. But they
all have one problem: vendor lock-in. You
pick Cloudflare, you're stuck with
Cloudflare. You pick AWS, you're stuck
with AWS. Most of them are still figuring
out MCP. The implementations are evolving,
the performance varies widely, and the
developer experience ranges from decent to
painful. That's why I keep coming back to
Kubernetes. It's vendor agnostic. You can
run it anywhere: AWS, Google Cloud, Azure,
on premises, or that server under your
desk. The deployment patterns are mature,
the tooling is solid, and you're not
betting your infrastructure on a single
vendor's interpretation of MCP. Unless
you're a small company with just a few
apps, or you're already deeply committed
to a specific cloud vendor,
Kubernetes remains the most flexible
option for production MCP servers. But
hey, at least now you know what's out
there. Choose what works for your
situation, not what I tell you to use.
Don't do that.
All right, let's take a step back and
look at what we've covered. We've gone
through a journey of MCP server
deployment options: from the simplest to
the most complex, from local to remote,
from vendor-specific to vendor-agnostic.
We started with local NPX execution, the
simplest approach, sure, but it comes with
zero isolation, dependency hell, and
security nightmares. Your MCP server has
full access to your machine, and managing
dependencies becomes your personal
problem. It's what the documentation shows
you, but that doesn't make it right for
production. Then we moved to Docker
locally. Better isolation, true.
Everything in containers, great.
Dependencies bundled together. But it's
still a single user setup on your
machine. Fine for development, not so
much for team collaboration. Next came
Kubernetes with standard deployments.
This is where things get serious. Proper
production infrastructure, multi-user
access, high availability, all the
enterprise features you would expect.
Deploy once, everyone connects. It's
like the difference between running a
database on your laptop versus having a
proper database server. We also tried
Kubernetes with the ToolHive operator, which
promised to simplify MCP server
management but ended up adding
complexity without clear benefits. It
uses the deprecated SSE mode and adds
another layer of abstraction without
solving fundamental deployment
challenges. Finally, we looked at cloud
platform options. Check them yourself.
So, which one should you actually use?
If you're developing an MCP server
itself, Docker makes sense. You need to
test locally, iterate quickly, and see
immediate results. But let's be clear
this is for MCP server developers, not
MCP server users. For any team or
company setting, the local options are
ridiculous. Why would every developer
spin up their own instance of the same
MCP servers? That's like asking everyone
to run their own Jira or Slack instance.
You want shared services that everyone
connects to. That means Kubernetes or
one of the cloud platforms. For
production workloads, Kubernetes is the
answer. Unless you have a damn good
reason to avoid it. It's vendor
agnostic, battle tested, and gives you
all the operational capabilities you
need. Deploy each MCP server once and
your entire organization can use it.
That's how infrastructure should work.
Now, if you're already committed to a
specific cloud vendor, the native
solutions might make sense. Got
everything on Cloudflare? Use Workers.
Deep in AWS? Maybe Lambda will work for
you eventually, but understand that
you're trading flexibility for
convenience. The key insight here is
that MCP servers are just servers.
They're not special snowflakes. They
need to be deployed, exposed, and
accessed like any other service. And
just like you don't run your own copy of
every microservice in your company,
you shouldn't run your own copy of every
MCP server. Share the infrastructure,
share the costs, share the maintenance
burden. Now, if you want to experiment
with those approaches, especially the
Kubernetes deployments I've been
showing, check out my DevOps AI Toolkit
project. Why not? It's the MCP server
I've been using throughout these
examples. Star the repo if you find it
useful. Open issues if something is not
working (most likely it's not). Submit
PRs if you want to contribute. All in
all, the MCP ecosystem is still young
and we are all figuring this out
together. The more we share what works
and what doesn't, the better these
deployment patterns will become. So
don't just consume, contribute. Help
make MCP deployment less painful for the
next person who comes along. Thank you
for watching. See you in the next one.
Cheers.
Discover the four main ways to deploy MCP servers, from simple local execution to enterprise-ready Kubernetes clusters. This comprehensive guide explores the trade-offs between NPX local deployment, Docker containerization, Kubernetes production setups, and cloud platform alternatives like Fly.io and Cloudflare Workers. You'll see practical demonstrations of each approach using a real MCP server, learning about security implications, scalability challenges, and team collaboration benefits. The video covers why local NPX execution creates security risks and dependency nightmares, how Docker provides better isolation but remains single-user, and why Kubernetes offers the best solution for shared organizational infrastructure. We also examine the ToolHive operator's limitations and explore various cloud deployment options with their respective vendor lock-in considerations. Whether you're developing MCP servers or deploying them for your team, this guide will help you choose the right deployment strategy for your specific needs.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Sponsor: Browserbase 🔗 https://browserbase.com
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#MCP #ModelContextProtocol

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Transcript and commands: https://devopstoolkit.live/ai/mcp-server-deployment-guide-from-local-to-production
🔗 Model Context Protocol: https://modelcontextprotocol.io

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬
If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬
➡ BlueSky: https://vfarcic.bsky.social
➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬
🎤 Podcast: https://www.devopsparadox.com/
💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Model Context Protocol (MCP) Deployment
01:40 Browserbase (sponsor)
02:50 MCP Local NPX Deployment
06:32 MCP Docker Container Deployment
09:23 MCP Kubernetes Production Deployment
14:09 MCP ToolHive Kubernetes Operator
19:15 Alternative MCP Deployment Options
22:46 Choosing the Right MCP Deployment