Hey guys, welcome to another Kubernetes hands-on training. I've been looking into the Model Context Protocol by Anthropic for quite some time now, and I decided to make a video tutorial on it to solidify my knowledge and share it with the community. Today I'm going to build something that connects two cutting-edge technologies: the MCP protocol and Kubernetes operators. I hope you have enough time to follow along. Let's dive right in.
[Music]
In this tutorial, we're going to build a Kubernetes operator for managing MCP servers. Before I dive deep into the details, I need to cover some basics. First, we need to talk about the MCP protocol, its architecture, and the problem it solves: why do AI applications use MCP servers to connect to their data sources and tools? Then we're going to look at the MCP server operator, which aims to automate the life cycle of our MCP servers. By the way, I forgot to mention: this readme, all the instructions and text in it, and the resulting code that we will be building together will be put in a repository. I'll put the link to that repository in the description of the video.
So now let's talk about the MCP protocol. What is MCP? MCP, or the Model Context Protocol, is an open-source standard developed by Anthropic that enables AI applications to connect to external services, tools, and data sources that already exist and are running somewhere, in the cloud, on-prem, or on some server, and provide textual context, text generated out of those tools and services, to the AI application or the language model. You can think of it as USB for AI integrations.
So now let's talk about what problem MCP solves. Before MCP was introduced by Anthropic, when people wanted to connect their AI applications, and the language models embedded in those applications, to their existing tools or systems, they needed to build a custom integration between the AI application and the tool. For every pair of AI application and tool, there had to be a custom integration. On the left side, you can think of the GPT models from OpenAI, Gemini from Google, Claude from Anthropic, and so on. On the right side, you have your APIs, your custom tools that you want to connect to your language model; you want to implement some sort of automation for those APIs with the help of language models. If you had m AI applications and n tools on the right side, you needed to build m × n custom integrations. What MCP does is reduce the complexity of this problem to an m + n problem. You still have the applications or different language models on the left side, but this time you build one standardized, universally accessible MCP server for your tool, and then you can connect it to either Gemini or OpenAI.
Now let's talk about the MCP architecture. Similar to web applications, MCP follows a client-server architecture, but we have three main building blocks. We have the host, the AI application so to say; you can think of Claude Desktop, Copilot in VS Code, or Cursor chat. Then we have the MCP client, which is a software component inside your AI application that maintains a connection to an MCP server and receives context from it. And then we have the MCP server itself, which provides context, tools, and capabilities to MCP clients. Those capabilities, or types of context, provided by the MCP server are called resources, tools, and prompts. So what are those? Resources, as the name suggests, are static data or content that give the application or language model more useful, relevant information and help it generate more useful content. Then we have tools, probably the most important primitive of the three, which allow the AI application to perform actions by executing them. Then we have prompts, which, as you can guess, are sets of predefined templates for interacting with the AI. Let's say your application is about solving certain things in a specific scope, in finance, in law, in medicine; you can define templates so that whenever one is needed, the AI application knows what a good to-do list would be to get a certain task done. It helps the AI approach problems more systematically.
All right, let's now talk about why MCP matters for Kubernetes. Before we answer this, let's zoom out a bit and first answer why MCP matters at all.
As we talked about, MCP enables AI applications to call tools, which, as we saw, are basically wrappers around our APIs, and do things on our behalf, which is pretty amazing. But if we now want to apply this to Kubernetes, what does that mean? What problem does Kubernetes have that we want to apply MCP to? Kubernetes is also a ton of APIs. It's a pretty complex orchestration platform; it has a lot of moving parts and it evolves continuously. A lot of those APIs are built as abstractions to hide the complexity of the underlying infrastructure from the developers or users of those APIs. But in the end, this bag of APIs is pretty complex and imposes a huge cognitive load; it makes Kubernetes not easy to handle or digest, so to say. If we build the right primitives in our MCP server for the Kubernetes API, we can make MCP query our Kubernetes cluster state, deploy and manage workloads, and so on.
Think of workloads, pods for example. If you want to create a pod, you need to call an API, the Pod API from the core API group. If you want to create a deployment, or do anything with a deployment, you need to call the Deployment API from the apps API group. All of your operations have to go through the Kubernetes API; all of your operations are API calls. So we can make AI do these API calls for us. If we build the right MCP server, we can query the cluster state, we can troubleshoot issues within our cluster with the help of AI, and basically automate operations based on the events and what happens in our cluster. We're going to start by building a very simple MCP server for Kubernetes, which can operate on or troubleshoot pods. Then we're going to make that MCP server a first-class citizen in the Kubernetes world, which means it's going to turn into a Kubernetes API itself: we're going to build an operator for it and let Kubernetes manage its life cycle. So
now let's look at a sample resource that we're going to define for our Kubernetes MCP server. Here I'm using TypeScript and the MCP TypeScript SDK. The server construct is imported from the MCP TypeScript SDK; it has a method called registerResource that we can use to register our resources. A good example of a resource, when it comes to implementing an MCP server for Kubernetes, is the cluster information. We basically give it a name and designate a URI for our resource in our MCP server; whenever the AI application wants to query the cluster information, it requests this URI from our MCP server. Then we have the metadata block: the MCP server tells the AI application what this resource is about and what the output format is. We use the MIME type application/json to tell the application that the output is in JSON format. And whenever the application requests this resource from our MCP server, this content is returned: we return to the AI application the number of nodes, the number of pods, and the version of our cluster. It is important to note that resources are read-only. When the AI application calls them, they don't have any side effects on the state, in our case the cluster state, and they should also be idempotent, which means you can call them as many times as you want. The outcome in our case might change, because the number of pods or nodes can change, Kubernetes is a dynamic environment, but calling the resource does not itself cause a different result; that's what makes it idempotent.
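To make the shape of such a resource concrete, here is a minimal sketch in plain TypeScript. It only mirrors the structure described above: the k8s:// URI, the field names, and the sample numbers are illustrative assumptions, and a real server would register this via the SDK's registerResource and query the live Kubernetes API instead of returning fixed values.

```typescript
// Minimal sketch of a "cluster-info" MCP resource in plain TypeScript.
// The URI scheme (k8s://) and the returned fields are illustrative; a real
// server would register this via the MCP SDK and query the Kubernetes API.
interface ResourceContent {
  uri: string;
  mimeType: string;
  text: string;
}

const clusterInfoResource = {
  name: "cluster-info",
  uri: "k8s://cluster/info",
  metadata: {
    description: "Basic information about the Kubernetes cluster",
    mimeType: "application/json",
  },
  // Read callback: in a real server this would call the Kubernetes API;
  // here we return fixed sample numbers to show the response shape.
  read(): { contents: ResourceContent[] } {
    const info = { nodeCount: 3, podCount: 12, version: "v1.29.0" };
    return {
      contents: [
        {
          uri: this.uri,
          mimeType: this.metadata.mimeType,
          text: JSON.stringify(info),
        },
      ],
    };
  },
};

const result = clusterInfoResource.read();
console.log(result.contents[0].text);
```

Note that the read callback takes no input and mutates nothing, which is exactly the read-only, idempotent contract described above.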
So now let's look at the MCP tools. This is a sample tool. As you saw with resources, there is no input; we just query state. But for tools, we do have inputs. Again we have the server construct; we call the registerTool method, give it a name and some metadata, and this time we expect some input from the AI application. This input has a schema: the name, image, and namespace all need to be strings. Whenever we want to create a pod, the bare minimum input that we expect is the name of the pod, the image that the pod will be running, and the namespace in which the pod needs to be created. The rest will be assumed by our MCP server: our create-pod tool takes care of putting in the right security context, the right resource requests and limits, and so on.
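As a rough illustration of that input contract, here is a plain-TypeScript stand-in for the schema check (the real server would express this with Zod and hand it to registerTool; the field names follow the description above, the error messages are my own):

```typescript
// Plain-TypeScript stand-in for the create_pod tool's input schema
// (a real server would express this with Zod and pass it to registerTool).
interface CreatePodInput {
  name: string;
  image: string;
  namespace: string;
}

// Validate the raw input from the AI application: name and image are
// required strings; namespace falls back to "default" when omitted.
function parseCreatePodInput(raw: Record<string, unknown>): CreatePodInput {
  if (typeof raw.name !== "string" || raw.name.length === 0) {
    throw new Error("create_pod: 'name' must be a non-empty string");
  }
  if (typeof raw.image !== "string" || raw.image.length === 0) {
    throw new Error("create_pod: 'image' must be a non-empty string");
  }
  const namespace =
    typeof raw.namespace === "string" && raw.namespace.length > 0
      ? raw.namespace
      : "default";
  return { name: raw.name, image: raw.image, namespace };
}

console.log(JSON.stringify(parseCreatePodInput({ name: "web", image: "httpd" })));
```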
Whenever the tool is called, a Kubernetes API client, in TypeScript in our case, is used to do the actual job: to actually call the Kubernetes API and create the pod. As we see here, the Kubernetes client creates a namespaced pod with the right image and the right name, and in the end the tool returns content of type text, not application/json; it just tells the AI application that it created a pod with this name in this namespace. As we see, tools have side effects: they mutate the state of our cluster. And here's an example of a prompt, very similar. Prompts also have a name, metadata, and arguments that are fed into the prompt. For prompts, MCP does not call them inputs; it calls them args, or arguments. But very similar to inputs, for this troubleshoot-pod prompt we have a pod name and a namespace as the arguments that get substituted into a generic prompt. So, in the case of troubleshooting, whenever the AI application decides to troubleshoot a certain pod, it already has a predefined prompt, a to-do list for how to troubleshoot a pod and where to look. This prompt basically says: if you want to troubleshoot the pod with this name in this namespace, check the status and events, the container logs, the resource requests and limits, and so on. It gives a very nice predefined to-do to the AI application, so it can approach the problem systematically.
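A prompt like this boils down to a template with arguments substituted in. A minimal sketch, where the argument names (podName, namespace) and the checklist wording are illustrative assumptions:

```typescript
// Sketch of a "troubleshoot-pod" prompt template. The argument names
// and the checklist text are illustrative, not the repository's exact prompt.
interface PromptArgs {
  podName: string;
  namespace: string;
}

function troubleshootPodPrompt({ podName, namespace }: PromptArgs): string {
  return [
    `Troubleshoot the pod "${podName}" in namespace "${namespace}".`,
    "1. Check the pod status and recent events.",
    "2. Check the container logs.",
    "3. Check resource requests and limits.",
    "4. Suggest a fix based on what you find.",
  ].join("\n");
}

console.log(troubleshootPodPrompt({ podName: "web", namespace: "default" }));
```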
All the MCP messages exchanged between the AI application and our MCP server are sent over the JSON-RPC 2.0 protocol. So MCP is a protocol itself; under the hood it uses JSON-RPC 2.0, a stateless remote procedure call protocol that uses JSON as its data format. The transports supported by the MCP protocol as of now are, first, stdio, which means standard input/output: the AI application and the MCP server both run on the same machine, the MCP server writes to its standard output, and the AI application reads from that same standard I/O and receives the messages. Another option is HTTP with server-sent events (SSE), which was initially introduced by Anthropic together with the MCP protocol: you deploy your MCP server just like a typical HTTP web server somewhere, you provide a URL to the AI applications, the clients of that MCP server, and the AI applications configure themselves to call the remote MCP server. The third option, a more modern replacement for HTTP with server-sent events, is streamable HTTP, which allows streaming the generated text from the MCP server to the AI application, similar to what we see in modern AI chat environments like ChatGPT or Claude.
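Whatever the transport, the messages themselves use JSON-RPC 2.0 framing. A small sketch of what a request looks like on the wire (the method name "tools/call" comes from the MCP specification; the id and the tool arguments here are arbitrary examples):

```typescript
// Sketch of the JSON-RPC 2.0 framing MCP uses. The "tools/call" method
// name is from the MCP spec; the id and arguments are example values.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

function makeRequest(
  id: number,
  method: string,
  params?: Record<string, unknown>
): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method, params };
}

// e.g. an AI application asking the server to run the create_pod tool:
const req = makeRequest(1, "tools/call", {
  name: "create_pod",
  arguments: { name: "web", image: "httpd", namespace: "default" },
});
console.log(JSON.stringify(req));
```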
Next, let's talk about the concept of a session in MCP. Whenever the AI application wants to connect to an MCP server, it calls the initialize method of the MCP server, and the MCP server negotiates, or communicates, its capabilities to the AI application; it basically says, for example, "I have these tools, resources, and prompts." The AI application then lists what the MCP server is capable of, and that becomes part of its knowledge. At the time of active usage, whenever the language model or the AI application decides to call the MCP server for tools, resources, or prompts, the call or read operations are performed. And in the end there is a close, or clean-up, operation that closes the session between the application and the MCP server. The MCP protocol is a stateful protocol, which means the client and server maintain a session, and the context exchanged between the AI application and the MCP server belongs to that session.
Here are some real-world MCP implementations that have already been built by the community: there are MCP servers for GitHub, for automated code operations, for Postgres, Slack, Kubernetes, and for file-system operations. In the case of Kubernetes, there are a lot of APIs; you can really think of it as a huge bag of APIs, and it doesn't make much sense to have one MCP server for all of Kubernetes, because then we would provide too much context to the language model, and research shows that too much context makes language models perform worse. So there are Kubernetes MCP servers for the vanilla Kubernetes APIs, and MCP servers for third parties: there's an MCP server for Argo CD, for Flux, and so on. Usually the third-party tools that extend the Kubernetes API build an MCP server to make it easier to manage their own API, so that you can, for example, create Argo Rollouts with the help of AI, or perform Argo CD operations with the help of the MCP server built for Argo CD. So now, enough with the theory; let's get into the hands-on part.
Great, now let's build our first MCP server. Before we do that, we need to make sure that our dev environment is properly set up for development: we need Node, npm, and a simple Kubernetes cluster like kind or Minikube; I'm using kind. You also need to make sure that your Node version is 18 or later. I have all the commands that we need to check or install things. To make my life easier, I have installed an extension on Cursor (you can also install it on VS Code) which helps me run the commands easily by clicking on this run button here. If I search for markdown, this is the extension that I have installed; you can install it on your editor too. Let's go back to the code. I'm going to run this block of bash and see the results. You see I have Node version 20 and I have access to a kind cluster, which was created five minutes ago. Everything looks good, so let's move on to the next part.
In the next part, we want to create the structure of our repository. We're going to create a directory called workspace, put an MCP lab directory in there, and initialize a Node module by running this. If I do that, I have my workspace, the MCP lab directory, and package.json. There's a shortcoming of this extension: if I run cd commands in it, it runs them in a temporary bash session and they do not persist, so I have to manually cd into that directory.
Cool. Now we need to install the dependencies. The dependencies we need are the MCP SDK, the Zod library for schema validation, and the Kubernetes client library in TypeScript. We also need Express, because we're going to build an HTTP-streamable version of our MCP server. And since we are using TypeScript, we need the type packages of those libraries for our application to compile properly. I'm going to click on this run button here as well and make sure I install all the dependencies properly.
Moving on, we're going to create a boilerplate package.json for our Node module, create some common handlers for our MCP server, and then build our MCP servers in two different transport modes: an MCP server that works over stdio, and an MCP server that is HTTP-streamable. For that, the most important part of this package.json is the scripts section; we have separated the compilation and start commands of the different MCP server types. So if we run npm run dev, we compile and run our stdio dev server, and with the corresponding dev script for HTTP, we compile the HTTP-streamable MCP server. I'm going to simply run this command again to make sure that I have the package.json that I need.
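The scripts section could look roughly like this; the script names and output file paths are assumptions pieced together from the transcript (dist as the output directory, one entry point per transport), not the exact repository contents:

```json
{
  "scripts": {
    "build": "tsc",
    "dev": "tsc && node dist/k8s-mcp-server.js",
    "dev:http": "tsc && node dist/k8s-mcp-http-server.js"
  }
}
```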
Then I also need a TypeScript config. I'm not going to focus on the different fields in the TypeScript configuration; that is beyond the scope of this tutorial. What's important for us is the root directory and the output directory: for compiling, the TypeScript compiler looks at whatever is in the source directory, and the outcome of the compilation is stored in the dist directory. So I'm going to run this command and create our tsconfig.
Awesome. Now we have all the files that we need for Node and TypeScript, and we need to create our repository, or project, structure. As I said, we need a source directory that contains basically all our sources, and we can also create directories for examples or for some configurations that we want to apply to our code. Let's simply run this command. Now we see we have the configuration, the examples, and the source directories: basically the skeleton of our project. To make sure that we have all the dependencies installed and everything looks good, we can run this block: we do npm install on this package.json and check the versions of the installed packages to verify the installation was successful. Also, to check the connectivity to the Kubernetes API and the basic permissions, we can run this block of code. I have simply created my kind cluster by doing kind create cluster, which means the credentials, the kubeconfig I get, have basic access to nodes, pods, and so on, and our MCP server is going to use the default kubeconfig of our kind cluster. So it makes sense to check that our credentials have access to the Pods and Nodes APIs. I'm going to run this, and you see that I have access to a cluster: I can get the pods, I can get the nodes, and so on. So we did set up everything in our environment. I'm not going to run this block of echo commands, but basically we installed Node.js, TypeScript, and the MCP SDK, and we made sure that our Kubernetes cluster is accessible and up and running. Perfect.
Now let's build the handlers, the common reusable code for our MCP server. We need the Server class from the MCP SDK and the Kubernetes client in TypeScript; we're going to import everything and use the k8s alias for it. We also need the schemas for the primitives that our MCP server is going to support. We need to be able to list resources, read resources, and list tools: the AI application first needs to know about all the possible tools, resources, and prompts, and then it also needs to be able to read the resources, call tools, or get prompts. That's why we import all the schemas from the types of our MCP SDK. Then we need to initialize the Kubernetes client: we create a new instance, and when we call loadFromDefault, it uses the default kubeconfig that exists on my machine, the config that has the API server address in it, the context set, and the credentials that are going to be used to talk to a certain API server. So we simply load it. Then we instantiate a Kubernetes client for the core group, to create pods and so on, and a client for the apps group, for deployments.
All right, now let's look into the resource handlers. As we said, the MCP server supports three different primitives: the resources, the tools, and the prompts. For our MCP server to support these, we need to register handlers for the different types of primitives and the different methods that can be run on them. The AI application might reach out to the MCP server and ask for the list of resources that the MCP server supports; whenever that's the case, the server needs to have a proper handler for it. So when the AI application asks the MCP server "what are the resources that you provide?", our MCP server replies: these are the resources, I have two resource types. It basically just returns some metadata about the resources that it supports: our MCP server can tell us about the nodes in the cluster, and it can tell us about the pods running in the default namespace. The MIME type is application/json, so the AI application knows that whenever it reads these resources it should expect JSON-formatted text that contains information about the cluster nodes or the pods in the default namespace. If the AI application wants to actually read those resources, not just list them, but actually read information from our cluster, we need a request handler for reading resources. One of the things that are specific to resources are the URIs that we specify to uniquely designate them. We use the k8s:// scheme and clusters/nodes, for example, for the nodes. If the URI requested by the AI application is this one, we ask the Kubernetes client to list the nodes, then use the map function to create a properly formatted nodes JSON object; we stringify this nodes object, put the number of nodes into the JSON string as well, and return it to the AI application. In case of error, we return the error message that occurred during the operation. For the resource that reads pods, the URI starts with /namespaces, then comes the name of the namespace, and then /pods.
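A plain-TypeScript sketch of how such a resource URI might be parsed; the k8s:// scheme and path layout follow the description above, but the exact parsing logic in the real handler may differ:

```typescript
// Parse resource URIs of the form k8s://namespaces/{namespace}/pods,
// falling back to the "default" namespace when none is given.
// The URI scheme and layout follow the transcript; details are illustrative.
function namespaceFromPodUri(uri: string): string | null {
  const match = uri.match(/^k8s:\/\/namespaces\/([^/]+)\/pods$/);
  if (match) return match[1];
  // A bare pods URI is treated as a request for the default namespace.
  if (uri === "k8s://pods") return "default";
  return null; // not a pods resource URI
}

console.log(namespaceFromPodUri("k8s://namespaces/kube-system/pods"));
console.log(namespaceFromPodUri("k8s://pods"));
```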
If there is no namespace provided by the application, the MCP server assumes that the AI application wants to know about the default namespace. Very similar: it just forwards this information to the Kubernetes client, lists the pods of the namespace that was extracted from the URI, again creates a nice map, stringifies the generated map, returns it to the AI application, and in case of error propagates the error back to the AI application. I'm just going to run this for now. Very simple: as you see, the core logic is done by the Kubernetes API client, and MCP is just a wrapper on an existing API. I'm going to run this now, and now we have the handlers file in our handlers directory. Now let's talk about the tool handlers.
Similar to resources, the application wants to list the tools that the MCP server provides. Whenever such a request comes in, the MCP server should have the proper handler for it and should return some metadata about the tools that it supports. Most important is the input schema: the MCP server should let the AI application know, if it wants to call a certain tool, what inputs it should provide, and this is the context that is returned. The MCP server tells the application: if you want to create a pod, you need to provide at least name, image, and namespace, and name and image are absolutely required; if you do not provide the namespace, I'm going to go ahead and create the pod in the default namespace. We have another tool, get pod logs, which simply looks at the logs of a certain pod in a certain namespace; again, the required input is the name of the pod. Then delete pod, list pods, and so on. Very simple, just some metadata about the tools that the MCP server supports.
Now, if the AI application wants to actually call the tools that the MCP server provides, it needs to send its request using the CallToolRequest schema, and the MCP server extracts the incoming input from the request. For example, if create pod is called, the server builds a default pod manifest; you see these resource requests, limits, and so on are hardcoded.
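A sketch of what building that default manifest might look like; the field layout follows the Kubernetes Pod spec, but the hardcoded request/limit values here are my own illustrative defaults, not necessarily the repository's:

```typescript
// Sketch of building a default pod manifest from the tool input.
// The hardcoded resource requests/limits are illustrative defaults;
// the field layout mirrors the Kubernetes Pod spec.
interface CreatePodInput {
  name: string;
  image: string;
  namespace: string;
}

function buildPodManifest({ name, image, namespace }: CreatePodInput) {
  return {
    apiVersion: "v1",
    kind: "Pod",
    metadata: { name, namespace },
    spec: {
      containers: [
        {
          name,
          image,
          // Sensible defaults so the caller only has to supply name/image:
          resources: {
            requests: { cpu: "100m", memory: "128Mi" },
            limits: { cpu: "500m", memory: "256Mi" },
          },
        },
      ],
    },
  };
}

const manifest = buildPodManifest({
  name: "web",
  image: "httpd",
  namespace: "default",
});
console.log(JSON.stringify(manifest.metadata));
```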
The server gives this generated manifest to the Kubernetes client, and the Kubernetes client creates a namespaced pod and returns the result of that operation as text to the AI application: either success or failure. Similar to creating pods, we have tools for getting pod logs, deleting pods, listing pods, and so on. Very simple; you can just go ahead and look at the logic of the handler for the tools. I'm going to simply run this block to add it to our handlers file.
So now let's talk about the prompt handlers. Very similar: the AI application should be able to list the existing prompts supported by the MCP server. The MCP server tells the AI application that it is capable of troubleshooting a pod or optimizing resources and returns the proper metadata; most importantly, it tells the application which arguments are needed for each prompt. If the AI application actually decides to use those prompts, the request comes in with the GetPromptRequest schema, and, for example, for troubleshooting a pod we have this well-defined prompt that can be used to troubleshoot pods: there are certain places to look into to troubleshoot a pod, and this nice prompt, or to-do list, is given to the application upon request. The optimize-resources prompt is very similar: to optimize resources, you need to look into certain places and then make some suggestions. And then we have this helper function for setting up all of these handlers: handlers for listing and reading the resources, listing and calling the tools, and listing and getting the prompts. I'm going to simply run this block of code to add all the handlers to our handlers file. If we scroll down to the bottom of this file, we should have everything we need for our MCP server: all the reusable blocks.
All right, we created the handlers for our different MCP server primitives, the resources, tools, and prompts, and we also have a helper function for setting them up on an MCP server: this function gets a server instance as an input and goes ahead and registers those handlers on that server. Now we want to create our stdio MCP server first and test it locally. We need to import the Server class from the MCP SDK, the stdio transport, and our helper function for setting up the handlers from our handlers file. We have a function here for creating the MCP server: we create a new instance of the Server, give it a name and a version, and for now the capabilities are empty. Then, once we call setup MCP handlers and pass our server instance to it, that function attaches the handlers that we created to our server instance. In our main function, we create this MCP server instance, create an instance of the stdio transport, connect the server to the stdio transport, and then log the successful creation of the MCP server; if there's an error, we write something to the console and exit the process. I'm going to create this server file now in the source directory; it's called k8s-mcp-server. And now we need to compile it. I can simply do npm run build, which calls the TypeScript compiler for me, and the outcome of this command generates the JavaScript files out of the TypeScript that we have. Most importantly, we have this k8s-mcp-server JavaScript file.
Now I want to test the stdio MCP server that I have created. I'm already using Cursor; Cursor is a well-known AI application, or AI host, which can be configured to work with custom MCP servers. If you want to let Cursor use your custom MCP server, you need to let Cursor know about it, what command is needed to run it, and so on. So you need a configuration similar to this for your MCP servers.
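Cursor reads such server definitions from an mcp.json file. A sketch of what the entry might look like; the compiled file path and the kubeconfig location are assumptions, not the exact values from the video:

```json
{
  "mcpServers": {
    "kubernetes": {
      "command": "node",
      "args": ["dist/k8s-mcp-server.js"],
      "env": {
        "KUBECONFIG": "/home/user/.kube/config"
      }
    }
  }
}
```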
If you want to manually navigate to this configuration in Cursor, you need to switch to the agents tab, create a new chat, go to the chat settings in the chat window, and under tools and integrations you can add a custom MCP. I'm just going to simply copy and paste this into mcp.json, and with that I let Cursor know that I have an MCP server called kubernetes, which can be run by doing node on this file, and which takes this kubeconfig file and puts it in the KUBECONFIG environment variable. Now, if I look at my Cursor settings, I see that it immediately identified my MCP server and realized that it is exposing four tools and two prompts.
Amazing. So let's just uh do some demo
for now. I'm going to make this terminal
a bit bigger and
open up K9S to look at my Kubernetes
cluster. I have created a kind cluster.
My cluster is empty. There's nothing in
it. If I go to the default name space,
no pods running. Uh I'm going to uh test
this by asking AI which is now equipped
with my MCP server. Um what are the pods
or list the pods?
And
when it tries to list the pods, it is
going to look at the uh MCP server. It
is going to call the list pods and it is
going to tell me zero pods found in the
default name space which is exactly
matching the set of our cluster. So now
I want to ask it um create a pod with
HTTPD image. And now um it tries to use
the MCP server. It realizes that there's
a create pod tool in the MCP exposed by
the MCP server. It calls it. It creates
the HTTPD pod and then calls the list
pods and uh tells me that the HTTP pod
is created. Awesome. So everything
pretty straightforward. AI is basically
um interacting with Kubernetes cluster
with the help of our MCP server. Now let
us do something interesting and create a
faulty pod. So create a pod with image
ngx2
which does not exist. So then it will
create a pod again with the help of the
create pod tool but then it faces the
like the pod fails with the error image
pool but the mcb server creates it. Now
if we want to ask it to troubleshoot the faulty pod for us, it is going to look at the state of our cluster: it gets and lists the pods and troubleshoots the pod for us. It realizes that it is running a wrong image, goes ahead and calls the delete pod tool, and then calls create pod. Within the tools it has at its disposal, it decided to go with delete pod first and then create pod; there was no tool for setting the image or patching things. So the AI application solved the issue using the existing tools available to it, and now we see that it fixed the issue for us. That is a demo of how an AI application can interact with a Kubernetes cluster with the help of an MCP server for the pod API of the Kubernetes core group.
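The troubleshooting flow above works because the model can only compose the tools the server advertises. As a rough sketch (the tool names here are modeled on the demo and are assumptions, not the exact code), a "set image" operation has to be expressed as a delete followed by a create when only those three tools exist:

```typescript
// Hypothetical tool names based on the demo; the real server may differ.
type Tool = "list_pods" | "create_pod" | "delete_pod" | "set_image";

// Given the tools the MCP server advertises, plan how to replace a pod's
// image. With a set_image tool the plan is one step; without it, the model
// must fall back to delete + recreate, as seen in the demo.
function planImageFix(available: Tool[]): Tool[] {
  if (available.includes("set_image")) return ["set_image"];
  if (available.includes("delete_pod") && available.includes("create_pod")) {
    return ["delete_pod", "create_pod"];
  }
  throw new Error("no tool combination can fix the image");
}

console.log(planImageFix(["list_pods", "create_pod", "delete_pod"]));
```

This is the same fallback the AI performed in the demo: no patch tool was available, so it composed delete and create.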
So now let's move on to the next part. All right, we created the stdio version of our MCP server, and now we want to create the HTTP streamable version of it. We need a web framework; we're going to use Express. We also need a random UUID generator from the Node crypto package. I'll get to the reason we need a random UUID generator in a second, but for now let's continue. We import the Server class (as we saw in the stdio version) from the MCP SDK, the streamable HTTP server transport from the SDK, our handler setup function from our handlers file, and the type for checking whether an incoming request from the application is an initialization request. We create our Express application and assign the port: we either read it from the environment variable or, if it's not set, just go with 3001. We enable JSON for the Express application and create an empty map of session IDs to transports. We're going to store all the transports in a map keyed by session ID, and this ID is exactly that unique identifier we talked about, randomly generated after a session is established. This is how we define our map, and this is how we create our MCP server: again, a server instance using the Server class from the SDK, with the name and the version. We declare the capabilities as empty for now and then call the setup MCP handlers function to attach those primitives to our MCP server instance.
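Concretely, the port fallback and the session map can be sketched like this. This is a minimal stdlib-only sketch: the real server stores StreamableHTTPServerTransport instances from the MCP SDK as the map values, and the interface below is just a stand-in.

```typescript
import { randomUUID } from "node:crypto";

// Stand-in for the SDK transport type; the real code stores
// StreamableHTTPServerTransport instances here.
interface SessionTransport {
  sessionId: string;
}

// Port comes from the environment, falling back to 3001 as in the video.
const port = Number(process.env.PORT ?? 3001);

// One transport per established session, keyed by the randomly
// generated session ID.
const transports = new Map<string, SessionTransport>();

function openSession(): SessionTransport {
  const t: SessionTransport = { sessionId: randomUUID() };
  transports.set(t.sessionId, t);
  return t;
}

const session = openSession();
console.log(port, transports.has(session.sessionId));
```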
Then we create a simple health check endpoint for the MCP server that returns some useful JSON to whoever calls it. This is something we can use, for example, as a health check endpoint in Kubernetes if we want to deploy our MCP server as a pod. Then comes the main, most important endpoint of our HTTP MCP server: the /mcp endpoint. If the client uses the POST method, it is about client-to-server communication, the initiation of the communication: a POST comes from the AI application, from the MCP client, to the server.
So here we need to read the headers of the request. If there is already a session ID coming along in the request header, it is extracted, and the server checks whether the session ID is empty or not, and also checks whether there is already a transport for that session in our map of transports. We look up that transport if it exists; otherwise, we create a brand-new transport for that session, give it an ID using the random UUID generator, and register event handlers. When the session is initialized, the transport that was created here is added to the map of transports that we maintain in our MCP server. We also register the onclose event handler on the transport: when the session is closed, like when the client wants to actually close the session, the transport is removed from our map of transports.
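The lookup-or-create logic for a POST request can be sketched as follows. It is a simplified stand-in (plain objects instead of the SDK's StreamableHTTPServerTransport, and the session ID assigned eagerly rather than via the SDK's onsessioninitialized callback), but the session-map bookkeeping mirrors what was just described:

```typescript
import { randomUUID } from "node:crypto";

interface FakeTransport {
  sessionId?: string;
  close(): void;
}

const transports = new Map<string, FakeTransport>();

// If the request carries a known mcp-session-id header, reuse that
// transport; otherwise create a new one and register it in the map.
// On close, drop it from the map, as the onclose handler does.
function getOrCreateTransport(sessionIdHeader?: string): FakeTransport {
  if (sessionIdHeader && transports.has(sessionIdHeader)) {
    return transports.get(sessionIdHeader)!;
  }
  const t: FakeTransport = {
    close() {
      if (t.sessionId) transports.delete(t.sessionId);
    },
  };
  t.sessionId = randomUUID();
  transports.set(t.sessionId, t);
  return t;
}
```

Calling getOrCreateTransport twice with the same ID returns the same object, and close() removes it from the map, matching the onclose handler described above.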
Here we create the server and connect it to the transport that we just looked up or created. If something goes wrong, we return an error message to the application. Otherwise, we have the transport and the application running, and we can let our transport handle the requests received by our Express web application. That was the AI-to-MCP-server communication path. For the other direction, server-to-client notifications, or server-sent events, we need a GET handler for our /mcp endpoint. How is that going to work?
Imagine the MCP server wants to notify the AI application of something that happened inside the server: in the case of a Kubernetes MCP server, for example, some events in the cluster or some logs showing up, something we are interested in. The AI application, or the MCP client, calls the GET endpoint; it will already have established a session, and the MCP server uses the existing transport to send or propagate messages from the server to the client. So this method is used to look up an existing session for server-sent events; if such a session exists, we look it up and let it handle the request. And when the session is about to be terminated, when the AI application wants to actually close the session, it calls the /mcp endpoint with the DELETE method, as you can guess. We extract the transport and let that transport handle the delete request; the transport then goes ahead and removes itself from the map of transports when the session deletion request is handled in the onclose event handler. Now here we have the main function. We start the application by listening to the port that we specified, print some logs, nothing special. I'm going to go ahead
and create the MCP server. Now we have the MCP server in here. If I want to test it, just a quick check whether I can compile it and run it with node, or import it with node, and see that I get green check marks.
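To recap the routing just described, the /mcp endpoint's behavior by HTTP method can be condensed into a small dispatch table. This is a sketch of the semantics, not the actual handler code:

```typescript
// POST: client-to-server messages (including session initialization).
// GET: server-to-client notifications over an existing session (SSE).
// DELETE: session termination.
function mcpAction(method: string, hasSession: boolean): string {
  switch (method) {
    case "POST":
      return hasSession ? "handle-message" : "initialize-session";
    case "GET":
      return hasSession ? "open-sse-stream" : "reject-no-session";
    case "DELETE":
      return hasSession ? "terminate-session" : "reject-no-session";
    default:
      return "method-not-allowed";
  }
}

console.log(mcpAction("POST", false)); // initialize-session
```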
If we want to test the endpoints that we created for our HTTP MCP server, we can also run this script. It simply runs the start HTTP script that we have for our Node application and uses curl to do health check testing. Very simple. Let's do this again. We see that the health check is working; we're receiving the expected JSON.
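The health endpoint's payload can be as simple as the following; the field names here are plausible assumptions for illustration, not necessarily the exact JSON the video's server returns:

```typescript
// Build the health-check payload; the fields are assumptions
// for illustration, not the exact JSON from the video.
function healthPayload(activeSessions: number) {
  return {
    status: "ok",
    transport: "streamable-http",
    activeSessions,
    timestamp: new Date().toISOString(),
  };
}

console.log(JSON.stringify(healthPayload(0)));
```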
And that was it for the manual testing of the MCP server. If you want to test our MCP server in a different way, Anthropic has actually built a very nice tool called Inspector, which you can use to troubleshoot or inspect your MCP servers. You can simply start Inspector by running this command and passing your MCP server as the argument; that is for stdio. For HTTP, you can run the inspector command and pass the URL of your MCP server, so you can inspect your MCP server simply by passing the URL. I'm going to go ahead and test my stdio and HTTP MCP servers using Inspector with
the script, just to see how it looks. We see that Inspector managed to hit our /mcp endpoint and get some results, and it also managed to hit the health check endpoint; everything seems to be working as expected. We can also configure our AI applications to use this HTTP-based MCP server, but I'm not going to go over that for now. Now let's look at the Inspector in a bit more detail. I have some scripts in my package.json, the inspector and inspector HTTP scripts, for starting the Inspector against the stdio or the HTTP MCP servers. Let me just do npm run inspector http, and the Inspector UI starts. It establishes a session.
I have to choose the transport type and the URL of my MCP server, and if I press connect, I see that the initialize method was called by the AI application, which in this case is the Inspector environment. Now I have an established session to my MCP server, and here I can, for example, list resources: I see I have cluster nodes, namespace pods, and so on. I can list the prompts, and I can list the tools. You can also click on these and see the metadata, get information about the input schema, and so on: basically all the details you need to work with your MCP server. It is a pretty cool tool by Anthropic. So if I now close this, we can also see that all my interactions with the Inspector have been logged to the console. If I now go back to the readme, we can continue with a different testing setup. If you want to test your MCP servers programmatically, you can definitely do that. Here's a sample test file. It
just creates a client. Instead of manually using Cursor, Claude, or Inspector to connect to your MCP server, you can do it programmatically and query the tools and the prompts, hit the endpoints, and see what kind of response you get from your MCP server. Let me just quickly create this test file. I have this test client; now, if I compile it and run it, I see that my test is actually successful in terms of calling the MCP server and figuring out what tools and what prompts it offers. So we talked about the MCP server, its integration with different AI applications, and simple ways of testing it. The next thing is to containerize this binary that we are building. We are going to build the container image for the HTTP MCP server.
We need a Dockerfile for that. In the end, it's a simple Express Node application in TypeScript, and I'm going to use this Dockerfile to containerize the service. So I'm just going to run this script, and now I have the Dockerfile. I can also create a .dockerignore to be on the safe side. These are the commands that I can run to build the image if I want. I can also test it: run the container in daemon mode, in the background, do the port forwarding, and hit the health check and MCP endpoints to see whether I get the expected results. So let's just do that: build the container, start it, wait for some time, then hit the endpoints and stop the session. And now we see that we get a green check mark; these tests were successful.
This part is interesting: you need to provide a valid session ID to call a certain method of your MCP server. The ID I provided was not valid; that's why you see in the response that no valid session ID was provided. For an actual test, if there's already a session established with the MCP server, you can use that session's ID to perform this testing. If you want, we can also push the container image that we built to some registry.
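For reference, a typical multi-stage Dockerfile for a TypeScript Express service looks roughly like this. It's a generic sketch (file names such as dist/http-server.js are assumptions), not necessarily the exact Dockerfile from the repository:

```dockerfile
# Build stage: compile TypeScript to JavaScript
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: production dependencies only
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production PORT=3001
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
EXPOSE 3001
# Entry point file name is an assumption; adjust to your build output.
CMD ["node", "dist/http-server.js"]
```

The two-stage split keeps the TypeScript toolchain out of the runtime image, which matters once the image is loaded into a cluster.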
Now we want to move on to the next part and make our MCP server a first-class Kubernetes citizen, which basically means we're going to create a custom resource definition and an operator that manages the life cycle of our MCP servers. We created the MCP server, and now we want to truly make it a first-class citizen of Kubernetes by extending the Kubernetes API: creating a custom resource definition for our MCP servers and a controller that watches that custom resource and handles the reconciliation events of this declarative API that we create for it.
For designing the API, we first need to focus on the desired state: the container image that we want to run for our MCP server, the transport type, some port and networking configuration, the number of replicas for our MCP server in case we want to scale it up or down, the resource requests and limits that we want to assign to the pods running the MCP servers, and some MCP-server-specific configuration, say environment variables that we want to pass to our MCP server. The status part, the observed state, has a field for the current phase of the MCP server resource (pending, ready, or failed, depending on what happens during the reconciliation), the number of ready replicas, the endpoint that is eventually reachable in the case of HTTP or streamable HTTP MCP servers, and some conditions, which are detailed information about the observed state, written to the status of the resource by the controller. I'm going to use the Kubebuilder project for creating this API for our MCP servers. If you are new to Kubebuilder or Kubernetes operators in general, please check out the Kubebuilder crash course that I've already created on my YouTube channel; I'll put the link in the description. Also, I'm not going to spend too much time on the details of this controller; I'll go through the most important parts related to MCP servers. So
I'm going to now run this block of code and create my MCP operator directory. Now we have an MCP operator directory next to our MCP lab, which contains all the MCP server code. Again, we need to cd into the right directory, and then we have the project. Now we want to create the API. The API group is going to be mcp, and the version is going to be v1, so the fully qualified domain name of our API is going to be mcp.example.com, and the kind is going to be MCPServer. We tell Kubebuilder to create a controller for us that watches our custom resource, and the custom resource definition that we're going to need for our MCP server. So if I run this now, I should see that there is an empty boilerplate MCPServer API created, which does not have the fields that we want; we're going to implement those next. Now
let's look at the API type that we want to implement. As I said, we want to have the image, the transport, the port, the replicas, the config, the resource configuration for the pods, some security context for those pods, the service account that those pods (those MCP servers) will be assigned, and a kubeconfig secret, which is interesting: if our MCP server is supposed to manage the state of a remote cluster, the kubeconfig of that remote cluster needs to be stored in this kubeconfig secret. And then the environment variables that our MCP server needs. The list type is just the list of MCPServer resources, used when we query, say with kubectl get mcpservers, to get the list of our MCP resources. The kubeconfig secret ref determines the information about the secret holding the kubeconfig our MCP server needs: the name of the secret, the namespace the secret lives in, and the key of the kubeconfig
in the secret. In the status, the observed state, we need the phase, which as we said is going to be an enum of pending, ready, and failed, plus terminating in case we delete the MCP custom resource; the conditions, some details around the observed state; the number of ready replicas and the total number of replicas; the endpoint to the MCP server; and the observed generation, which reflects the generation of the most recently observed MCP server. The last transition time is the last time the condition changed, like something happened to our custom resource for the MCP server.
You can read the comments; they explain what each line is about. We have the phase, the different phases that we define as constants, strings that can then be set on the status of the resource, and, most importantly, the condition types. Depending on how the reconciliation loop goes, our controller is going to create a deployment, a service, and a config map for our MCP server. If all of this goes well, the Ready condition is written to the status subresource, which basically means that our MCP server custom resource is ready and our MCP server is ready and usable. Then we have the finalizer; the finalizer suffix is /finalizer.
If you're new to finalizers: a finalizer is a piece of logic that handles the cleanup phase when a custom resource is being deleted. So if things need to be gracefully shut down, if sessions need to be closed, or if resources need to be freed, the finalizer takes care of that. The MCPServer construct will have the spec and status (the desired and observed state), the list type, and some helper methods that the controller uses if it wants to quickly read some fields from the desired state; a helper method for setting conditions, which we're going to use in our controller; and a helper method for setting the resources for the pods depending on the transport type. If the transport type is stdio, it is for testing purposes; it's not that important and it's not going to be deployed into production, so we go with a minimal configuration. For HTTP and streamable HTTP we have higher requirements. And that's pretty much it. So I'm going to run this and update our existing MCP server types.
And now we have the MCPServer type. Now let's look at the controller. As we know, the most important function in the controller is the reconcile function. We have all the necessary Kubebuilder markers on top of our reconcile function: our controller has full access to anything that has to do with our API. It can do get, list, watch, create, update, patch, and delete on mcp.example.com resources; it has these permissions on the status subresource; and on the finalizer it can add or remove finalizers. This controller can also manage some subresources: deployments, services, config maps, and the events of those resources. With these markers, we just tell Kubebuilder what role-based access needs to be assigned to our controller, or to the controller manager running our controller. And this
is the reconcile function. When there is a request for reconciliation, the API client reads the MCPServer resource from the API. If it's not found, we just ignore it and log that it was probably deleted. If it is not being deleted, that is, if the deletion timestamp is null, and our custom resource does not contain a finalizer, we add a finalizer to it and update the custom resource with the finalizer added. If the deletion timestamp is not null, it means that we asked the Kubernetes API to delete this MCP server; if our custom resource already contains a finalizer, we need to handle the deletion, for example gracefully shut down the pods, and then remove the finalizer after the deletion has been handled properly.
And here we have a function that sets the default resources for the MCP server if some fields from the spec are missing; a function that validates the MCP server configuration and requeues the reconciliation request with some sort of backoff; and some helper functions that we're going to add to our controller logic later. These basically reconcile a config map, a service, and a deployment: they create those resources or update them depending on what has been given to us via the desired state, and then update the status with the results of the reconciliations.
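Putting the pieces together, the reconcile flow described above follows the standard Kubebuilder shape. In pseudocode (a summary of the narration, not the literal Go code):

```
Reconcile(req):
  cr = get MCPServer(req.NamespacedName)
  if not found: return ok            # resource deleted, nothing to do
  if cr.deletionTimestamp == null:
      if finalizer missing: add finalizer, update cr
  else if finalizer present:
      handleDeletion(cr)             # graceful shutdown, scale to zero
      remove finalizer, update cr
      return ok
  setDefaults(cr)
  if not validate(cr): requeue with backoff
  reconcileConfigMap(cr)
  reconcileService(cr)
  reconcileDeployment(cr)
  updateStatus(cr)                   # phase, ready replicas, endpoint
  return ok
```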
And here's the handle deletion function of our controller, which is called during finalizer handling at the time of deleting the resource. It sets the phase of the status to terminating, sets a condition explaining that it is terminating because the MCP server is being deleted, updates the status, and then calls another function that actually performs the graceful shutdown. The graceful shutdown applies different configuration depending on the transport type and eventually scales the deployment down to zero to shut down the MCP servers for us. You can see the different grace periods we configure for the different transport types.
Then we have a bunch of helper functions that are used by our controller, either to detect misconfiguration and return from the reconciliation loop with an error, or to update the status depending on how the reconciliation goes. It is pretty self-explanatory; I'm not going to spend too much time on it. I'm going to scroll down to the most important function left in this code block, which sets up our controller with the controller manager. We tell the controller manager that this controller is going to watch MCPServer resources in our mcp v1 API group, and that this controller owns the deployments, the services, and the config maps that it manages. This is how we tell the manager what resources we are watching and what resources we can manage with our controller. I'm going to go ahead and
run this to create the core logic for the controller. If I now open my controller directory, I have the MCPServer controller, which has some errors because some helper functions for reconciling the managed resources (the config map, the service, and the deployment) are missing. I'm going to add them next. For managing resources, we need a reconciliation function for deployments, for services, and for config maps. I'm going to create those by scrolling down to the bottom of the file and adding them. Now we have the reconcile function for deployments, and the services and the config map each get their own reconcile function as well. And now we see that the controller does not have any syntax or compile errors. Now it's a good
time to try building and installing our API on our Kubernetes cluster. What I'm going to do is run this block and generate the manifests. All right, the build was successful. If I now look at config/crd/bases, we see our custom resource definition that was generated after I ran make generate. All the things that we defined for our spec and for our status are in here, with all the necessary validation that was implemented by those Kubebuilder markers. Now let us
move on to the next part. We're going to create some sample custom resources in the config/samples directory. We're going to add the samples for the MCPServer custom resource in here: a basic one and an advanced one. The basic one just has the image, the transport type, the port, the replicas, and some config, and the advanced one is more verbose: it has resource configuration for the pods, a security context, and some environment variables. I'm going to run that.
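A basic sample along the lines just described might look like this. The apiVersion matches the group and version created earlier; the exact field names in the spec are assumptions based on the walkthrough, not necessarily the repository's manifest:

```yaml
apiVersion: mcp.example.com/v1
kind: MCPServer
metadata:
  name: basic-mcp-server
spec:
  image: k8s-mcp-server:latest   # image loaded into the kind cluster later
  transport: http                # e.g. stdio | http | streamable-http
  port: 3001
  replicas: 1
  config:
    env:
      - name: LOG_LEVEL
        value: info
```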
And now we have the basic and advanced custom resources, which we can use to test our operator. To install the Kubernetes custom resource definition for our MCP server, we can do make install; once we do this, the CRD is installed on our Kubernetes cluster. We see it is created: if we do kubectl get crd and grep for mcp, we see mcpservers.mcp.example.com.
Now I want to run the operator and see whether it can reconcile the sample custom resources that we have. The controller is up and running and waiting for reconciliation events. If I open a new tab and create, for example, the basic MCP server, I see some logs in the controller. Let me open a new terminal, run k9s, and search for MCP servers: I see my basic MCP server is pending, and if I look at the reason, it is telling me that the deployment is not ready. If I look at the deployment, I see the basic MCP server deployment is not ready, and if I look at the replica set behind this deployment, still nothing valid. Let me look at the pods: I see the pod is failing with ErrImagePull. So it basically
means our kind cluster has not loaded our MCP server image yet. I have also put that in the documentation: if you're using kind or minikube, you need to make sure that you load the Docker image of your HTTP server; that's the command for it. The image we built for the MCP server, if you look at the basic sample, is k8s-mcp-server:latest. So I load the image into my kind cluster by running this command; k8s-mcp-server is the name of our image, and it is what is used in the basic custom resource. I've already deleted the custom resource. If I create it again,
okay, I first need to be in the MCP operator directory. If I create the MCPServer resource again and look at the custom resource, now I see that my basic MCP server is running; the pod is running. If I look at the MCPServer resources, I have the basic MCP server. If I look at the services, I see I have my basic MCP server service, which points to the pod, and the container shows me the logs of the MCP server that I have running. I can also look at the status of my custom resource: if I describe it and scroll down, I see, for example, the endpoint. This is the fully qualified domain name of my MCP server, and later, if I expose this endpoint via an ingress or a gateway, AI applications will be able to call the MCP server that is remote to them. All right, so that's it.
We covered everything. Once again, we made it to the end of the tutorial. I appreciate your dedication and your patience in following along with me. As always, if you liked the video, please give it a like or share it with your friends, subscribe to my YouTube channel, and leave your thoughts in the comment section. Let me know what you think about the MCP protocol, specifically in the context of Kubernetes. Until next time, take care.
[Music]
Model Context Protocol (MCP) is revolutionizing how AI applications integrate with external systems and data sources. In this comprehensive tutorial, you'll learn how to build production-ready MCP servers and create a Kubernetes operator to manage them at scale. MCP provides a standardized way for LLMs to access context from your systems - think of it as "USB for AI integrations". Instead of building M×N integrations between AI apps and tools, MCP transforms it into an M+N problem with a unified protocol.

#kubernetes #mcp #operator #ai #llm #cloudnative #platformengineering #kubebuilder #programming

➡ GitHub: https://github.com/aghilish/k8s-mcp
➡ Follow me on GitHub: https://github.com/aghilish
➡ Follow me on LinkedIn: https://linkedin.com/in/aghilish/
➡ Kubebuilder Crash Course: https://youtu.be/azJsyLjvHsI

(0:00) Intro
(01:31) What is Model Context Protocol?
(04:04) MCP Architecture & Core Primitives
(04:40) Resources, Tools, and Prompts
(09:03) Building Your First MCP Server
(14:40) Stdio vs HTTP Transport
(19:45) Kubernetes Integration
(57:10) MCP Server Operator Design
(58:40) Creating the Operator with Kubebuilder
(01:01:40) Custom Resource Definition
(01:05:16) Reconcile Loop Implementation
(01:13:18) Deploy and Test the Operator
(01:17:23) Outro

By the end of this tutorial, you'll have:
✅ Built MCP servers that integrate LLMs with Kubernetes
✅ Created a complete operator for managing MCP server deployments
✅ Understood production considerations for AI infrastructure

This tutorial bridges the gap between AI tooling and cloud-native infrastructure!