Hey everyone, thanks for joining us for
the next session of our Python and MCP
from local to production series. My name
is Anna. I'll be your producer for this
session. I'm an event planner for
Reactor joining you from Redmond,
Washington.
Before we start, I do have some quick
housekeeping.
Please take a moment to read our code of
conduct.
We seek to provide a respectful
environment for both our audience and
presenters.
While we absolutely encourage engagement
in the chat, we ask that you please be
mindful of your commentary, remain
professional, and on topic.
Keep an eye on that chat. We'll be
dropping helpful links and checking for
questions for our presenter to answer
live.
Our session is being recorded. It will
be available to view on demand right
here on the Reactor channel.
With that, I'd love to turn it over to
our speaker for today, Pamela. Thanks so
much for joining.
>> Hello. Hello everyone. Welcome to our
Python plus MCP series. This is a
three-part series. This is part two of
the series. If you're just joining today
and you missed yesterday, you can
totally go back and watch the recording
on YouTube of yesterday's talk and catch
up. Everything is recorded and available
for you. All the slides, all the code,
so that you can watch after, rewatch,
share, whatever you want to do. One sec.
So, I see we've got people joining from
all over. Someone said they're joining
from Japan, and I think it's 3:00 a.m.
there. So, kudos to people for joining
from so many different time zones. That
is really fun to just see everybody come
together in order to learn MCP, which is one of the most exciting new technologies of this year, and I think we're going to hear a lot about it in 2026 as well. So yesterday we talked about how to build a basic MCP server in Python using the FastMCP package, and how to run that server locally and use it locally, both from an agent like GitHub Copilot and from programmatic agents. Today we're going to talk about how to deploy Python MCP servers to the cloud. And of course we're going to
be using Azure as our cloud since we're
from Microsoft, but a lot of these
concepts can apply to other clouds as
well. And then tomorrow we'll talk about
authentication. So definitely join
tomorrow if you're interested in user
authentication.
So let's talk about deploying MCP
servers to the cloud. You can grab these
slides from this URL if you want to have
a copy of the slides and follow along
with them and we'll share that URL in
the chat here.
Uh for those of you who don't know me,
my name is Pamela Fox. I'm a Python
cloud advocate. That means my job is to
figure out how to use Python with
Microsoft technologies. And these days
it's a lot of generative AI and related technologies like agents, RAG, and of course now MCP.
So in our topic today about deploying,
we're going to talk about how we can take a FastMCP app, containerize it with Docker, deploy it to Azure Container Apps, add some observability using OpenTelemetry (exporting to observability platforms like Azure Application Insights and Logfire), and then do private networking with Azure virtual networks.
All of the code for today is in this
repo here, the aka.ms Python MCP demos repo. This is our main repo for this series. So if you haven't bookmarked this repo yet, please do. You can fork it, you can star it, you can clone it. You definitely want to keep track of it somehow, and everything that we're showing today is from this repo. Now, for today's session, because we are deploying to a cloud, you can't just follow along locally. You actually do need an Azure subscription in order to be able to deploy. So if you do have an Azure subscription, you can follow the readme instructions for deploying. And if you don't have an Azure subscription, this could be a good time to get an Azure free trial. We'll put a link to the free trial in the chat as well, and you can try deploying on the free trial. Now, sometimes a free trial does have some limitations. So if you have any issues deploying with a free trial, just let me know and hopefully I can figure them out for you.
All right. So, let's talk about cloud
deployment of our MCP servers.
So first let me recap the MCP architecture that we talked about yesterday. We're building an MCP server. Our MCP server exposes a bunch of tools. It might also expose prompts and resources and a few other parts of MCP, but the heart of the MCP server is really exposing a bunch of tools that can then be used by other applications. So that's our MCP server that we're building here. And then on this side we have MCP clients that live inside MCP hosts.
Right? So in this case, this MCP host could be something like Claude Desktop. It could be ChatGPT. It could be GitHub Copilot in VS Code. Those are all applications that act as an MCP host and use an MCP client in order to interact with MCP servers. So this
client can connect to the MCP server
using the MCP protocol in order to say,
"Hey, what tools do you have?" Okay,
great. Thank you. Hey, I want to call
this tool. Okay, great. Here's a
response. Right? So, it's just going
back and forth between the MCP client
and MC server in order to use everything
that that MCP server has. Now, our MCP
host could also be a programmatic agent
like we've got agents written in lang
chain and agent framework. I want to put
dantic, you know, whatever your favorite
agent framework is. All of those tend
to, you know, most of them have the
ability to act as MC clients as well so
that they can communicate with servers.
So that's our architecture.
Now when we're running an MCP server, we
have these two options for the
transport, meaning how are we going to
do that communication between the
clients and the servers. So in yesterday's session, we started off with stdio, standard input/output, and that's where we actually ran the server using a terminal command, something like uv run server.py. That would run the server, and then all the communication would happen over standard in and standard out. That can work pretty well when you're just running servers locally that are using a Python script or a JavaScript package to give you some functionality.
But the other option is HTTP, specifically streamable HTTP. With this option, we actually set up a server at a particular port. So yesterday we set up localhost:8000/mcp as our URL, and all the communication between the MCP client and the MCP server happened over HTTP to that locally running MCP server URL. That's generally going to be the more production-ready choice, because we get all the benefits of HTTP: we know how to set up HTTP servers, we know many ways of protecting HTTP servers, and we know how to expose HTTP servers in different ways to other services. So generally I'm going to be recommending HTTP as the transport to use when you're taking your server into production.
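(For reference, here's a minimal sketch of the two transports, assuming the FastMCP 2.x API; the server name and tool are illustrative, not the repo's exact code.)

```python
# Minimal sketch (assuming the FastMCP 2.x API): the same server can run
# over either transport depending on how you call run().
from fastmcp import FastMCP

mcp = FastMCP("Expenses")

@mcp.tool
def add_expense(amount: float, category: str) -> str:
    """Record an expense (illustrative tool)."""
    return f"Logged {amount} under {category}"

if __name__ == "__main__":
    # stdio (the default): the host launches this script and talks over stdin/stdout.
    # mcp.run()

    # Streamable HTTP: serve the MCP endpoint at http://localhost:8000/mcp
    mcp.run(transport="http", host="127.0.0.1", port=8000)
```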
So when we do that, how the protocol actually works is that the MCP client sends an HTTP request to the /mcp endpoint. In this case, it's doing a POST request and saying, hey, I want to call a tool, and this is using JSON-RPC. JSON-RPC basically just means you send a bunch of JSON that has all the information inside it about what method you're calling on the other server. So it says: okay, this is JSON-RPC, the method we're calling is tools/call, and here are the parameters that we're sending to that method. We POST this JSON to the MCP server, and then the MCP server responds and says: okay, great, I got your method request, I have called the tool, and here is the result. And now the client can process it. So this is what's going to actually happen when we expose this MCP server over HTTP.
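(To make the shapes concrete, here's an illustrative request/response pair as Python dicts. Note this is illustrative only: a real MCP client first performs the initialize handshake and passes session headers, and the tool name and arguments here are hypothetical.)

```python
# Illustrative JSON-RPC 2.0 shapes for an MCP tools/call exchange.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "add_expense",                       # hypothetical tool name
        "arguments": {"amount": 40, "category": "food"},
    },
}

response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Logged 40 under food"}],
        "isError": False,
    },
}
```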
So when we want to run a FastMCP server in production, we're going to make an app based off the FastMCP instance and then run it using a production-level server. Let me actually show the code for that. Going into this repo, we have this file here, deployed_mcp.py. I'll point out a few things. Here we import FastMCP; that's what we're using to make our server.
And then down here is where we create the FastMCP server instance. So we say, okay, we're making a FastMCP server. We've got some middleware, which we'll talk about later for OpenTelemetry, but really it's just an MCP server with a name. And then as you go down, you can see we decorate and say, okay, this is going to be a tool. We hang that off the MCP server with the mcp.tool decorator. We've got another tool here, get expenses data, and I've got a prompt here. So all of that is part of this FastMCP app. And for production, I've also added a custom route. This is my health endpoint. It's a general best practice, when you're putting servers into production, to have a health endpoint, so you have a common way of checking whether the server is alive and working. So we can add this custom route here. It's not part of the MCP protocol, but we can attach it and say: hey, when you go to /health, we're just going to return back that everything looks good. All right. So
then the important thing here is where we say: give me the HTTP app for this FastMCP instance. What this does is return a Starlette application. Now, if you haven't heard of Starlette (and most people actually haven't), it's incredibly popular, but most people don't realize they're using it. So Starlette is a Python framework for creating ASGI applications. [laughter] Now, ASGI apps are async applications, right? And generally our recommendation in this modern age of networking and generative AI is to use async web frameworks.
So Starlette makes it easy for you to make an async app. Now, many people have not heard of Starlette, but in fact many people are using Starlette, because FastAPI is the most popular Python web framework these days, and FastAPI is built on top of Starlette. So if you have used FastAPI, then you have actually used Starlette. FastAPI just adds a bit of additional functionality on top of Starlette, but all of the async routes, all that functionality, comes from Starlette. Similarly, FastMCP is built on top of Starlette. So Starlette is this lovely async framework that's at the core of many of these frameworks that sit on top of it. So when we're using FastMCP, we can specifically say: hey, we need to get that Starlette ASGI app back out. That's what this does here. It says, okay, give me the Starlette application, because I need something I can run.
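(Here's a condensed sketch of what deployed_mcp.py is doing, assuming the FastMCP 2.x API; the names are illustrative, not the repo's exact code.)

```python
# A condensed sketch of the pattern described above (FastMCP 2.x API assumed).
from fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import PlainTextResponse

mcp = FastMCP("Expenses")

@mcp.tool
def add_expense(amount: float, category: str) -> str:
    """Record an expense (illustrative tool)."""
    return f"Logged {amount} under {category}"

@mcp.custom_route("/health", methods=["GET"])
async def health(request: Request) -> PlainTextResponse:
    # Not part of the MCP protocol -- just a liveness probe for production.
    return PlainTextResponse("OK")

# Returns the underlying Starlette ASGI application, exposed as a
# module-level variable so an ASGI server like uvicorn can find it.
app = mcp.http_app()
```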
So we get that app, and here we're making it a global variable. Once we have that exposed, then we can run that ASGI app, and there we want to use a production-level server. Typically with async apps, we're going to use uvicorn to run the app. So here we say: okay, I'm just going to cd into the servers folder and run uvicorn. And this is the file, and this is the variable name. So it says: look inside this file, this Python module, and find this variable. And then we're going to just run this on localhost, just to show you, and we'll do port 8000 like we did yesterday. All right. So uvicorn is a server that can run ASGI apps, and it's what we're going to use for production. You can also use it locally as well, when you're testing out your servers locally.
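(For reference, uvicorn can be invoked from the command line or programmatically; the module and variable names here are assumed from the demo.)

```python
# Running the ASGI app with uvicorn, either from the command line --
#   uvicorn deployed_mcp:app --host 127.0.0.1 --port 8000
# -- or programmatically:
import uvicorn

if __name__ == "__main__":
    uvicorn.run("deployed_mcp:app", host="127.0.0.1", port=8000)
```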
So here we can see that now it is running on port 8000. We get a "not found", but that's intentional. If we go to /health, we can see, ta-da, it's healthy. It's up. It's working. So that is how we're preparing our app for production. We've got an ASGI app, and we can run that ASGI app using uvicorn, which is a production-ready server. The next step is figuring out where we're going to actually put this application. Where are we going to deploy it?
So we're going to show how we can deploy on Azure, because we're from Microsoft, but there are other options as well besides Azure, obviously, and hopefully a lot of these concepts apply across the clouds. For Azure, we've got this range of options for where to deploy Python applications. These are all possible places, all valid places that you could deploy an MCP server. And there's a spectrum here of how much control and flexibility you get versus how many managed features you get. It's really up to you as a developer to decide: do I need more control, or do I want to take advantage of some of these more managed features? You give up some flexibility, but hey, you get the managed features. So on the most managed side, we have Azure Functions, and they can be a great fit for MCP servers. They're very good at scaling up quickly and responding to variable load. So Azure Functions can be a really good fit; there are lots of example MCP servers written for Azure Functions, and that team has made it very easy to bring your MCP servers to Functions. With Functions, if you're going to deploy there, you need your actual server file, like we just showed, plus a requirements file.
We're using pyproject.toml. Let me show our pyproject.toml. So pyproject.toml is a standard way of defining all the requirements for a Python module. Here you can see all the dependencies that we have. The really important one is fastmcp, but we've got some other dependencies here as well. So as long as we have a pyproject.toml and our server Python file, that can often tell an environment everything it needs to know. For Functions, you also need a host.json, which describes how the function is going to work. Another option is App Service. App Service and Functions are similar; they actually share similar infrastructure. And App Service, if it sees a pyproject.toml, will automatically build your Python package for you just based off of it. So all you need for App Service is actually the Python file and the pyproject.toml.
Then there's Azure Container Apps, and that is a great way of deploying containerized applications using Dockerfiles. There we're just going to add on a Dockerfile, and the Dockerfile is going to describe how to set up that Python package. That's actually what we're going to be doing today, because that's my favorite platform for deploying. And on the most extreme end of flexibility, we have Kubernetes. With Kubernetes, you need a little more infrastructure setup. You're going to also add in a Docker Compose file, and you might need a little more infra besides that as well. So that's the range of options.
Today we are going to focus on container
apps and use that as our platform.
So now let's dig into deploying on
container apps.
The first thing we need is to containerize our FastMCP server.
For that we need a Dockerfile. So here on the slide I have a simplified version of a Dockerfile, and I'll step through this, and then I'll show you what our real one looks like (it's a little more complicated). This is a Dockerfile. It starts off with a base image. Here we tell it to start with Python 3.13; that's the Python version that we're using for our environment. Then we create a directory to hold everything, and then we install the requirements. We're using the pyproject.toml in order to install the requirements. We also have a lock file from uv. So we copy over both that pyproject.toml and the uv.lock file, make sure we have uv on the system, and then run uv sync. uv sync will check the lock file and make sure that everything in that lock file is available on this system.
Then we're going to copy the actual code into the container. We're going to expose port 8000, because that's what we're running the server on. And then we actually start up the app. We're using that uvicorn command again and saying: hey, uvicorn, run this app on port 8000. Then whatever is running on port 8000 will get exposed to anyone who's trying to access port 8000 from outside the container; they'll be able to go to port 8000, and it will map to the process running here. So that is a simplified version of the Dockerfile.
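(Here's a sketch of the simplified Dockerfile just described; the paths and module name are assumptions, not the repo's exact file.)

```dockerfile
# Sketch of the simplified Dockerfile described above (names assumed).
FROM python:3.13

WORKDIR /app

# Copy the dependency manifests first so Docker can cache the install layer.
COPY pyproject.toml uv.lock ./
RUN pip install uv && uv sync --frozen

# Copy the application code into the image.
COPY . .

# The MCP server listens on port 8000.
EXPOSE 8000

CMD ["uv", "run", "uvicorn", "deployed_mcp:app", "--host", "0.0.0.0", "--port", "8000"]
```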
Now, if we look at the real Dockerfile, it is a little more complicated. This might be more what yours looks like for actual production. First, we split it up into two stages. We've got a build stage, and that's what sets up all the Python package requirements, and then we have the final stage, and that's what copies the code over and runs it. And in this case I actually used a shell script, an entrypoint script, for running it, because I did want easy customization of exactly which module it's going to run, because tomorrow we change it to a different module. So you can get a little bit fancier with your Dockerfile. I mostly split it up into two stages both so I could reduce the image size and so that I could take advantage of Docker's caching abilities, to cache the dependencies separately from the code. So there you go. We have a Dockerfile. You can use the simple one or the more complex one, but the point is to have a consistent way of installing those packages, copying the code over, and then running that FastMCP server.
So now we need to create our
infrastructure on Azure for this
container to run. So we're using Azure Container Apps. When you use Azure Container Apps, you first create a Container Apps environment, and that's basically a virtual network that contains all your container apps. You can actually put multiple container apps inside that environment. Then that container app needs to run an image, and that image could come from any registry. It could come from the Docker registry, a GitHub registry, a public registry, a private registry. In this case I'm using Azure Container Registry. So I build the image, upload it to the container registry, and then the container app pulls from that container registry. But you could pull an image from anywhere.
Now, if you wanted to actually try this out with the repo, you have to check out the repo, and then there are instructions in the readme for deploying to Azure. We use the Azure Developer CLI in order to make it really easy to deploy. You just run azd auth login and then azd up, and it will create all this infrastructure and the container for you. And the reason it's able to do that is because we have an infra folder, and that infra folder has a bunch of Bicep files. Bicep is what's known as infrastructure as code. It's similar to Terraform, if you've worked with Terraform. These Bicep files just declare everything that we need. So it says: okay, we need Cosmos, we need Container Apps, we need OpenAI. Everything that we might need is all declared in these Bicep files, and all you have to do is run azd up and it will create all of that for you.
So I've already run that, so I can show you over here. When I ran that, it created a bunch of resources here in my Azure account. You can see the container registry. I've got some App Insights for storing the logs. I've got the Container Apps environment. If I click on this Container Apps environment, we can see it actually does have a couple of container apps. There's one just for an auth thing that we're showing tomorrow, so ignore that right now. Then there's the actual server; that's the one that has that Dockerfile that I just showed. And we also have a container app that has an agent in it, because I'm going to show how you could have an agent in one container that's calling a server in another container. That could be a really common architecture: putting your servers in one container, and then your agents in other containers, and then they're all communicating just inside that environment.
All right. So now all of that is
deployed. So let's actually use it,
right? So if we click on this server,
uh we can see the deployed URL.
And if we go to that URL, we'll see "not found". But if we go to /health, we see it's healthy. All right, so it is running there. So what I'm going to do is go to mcp.json, which currently just has all of our local MCP servers. Now I'm going to say "add server". So I click this little button down here: add server, HTTP. And for the URL, I need to put in the Container Apps URL and then /mcp, because the actual MCP server is at that /mcp endpoint, and we need to tell it exactly the endpoint of that MCP server. So we're doing /mcp, and give it a name, and then it just adds it here. It's actually quite simple. You could also just write this yourself; we just have the URL and the type.
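(For reference, a VS Code mcp.json entry along these lines; the server name and URL are placeholders.)

```json
{
  "servers": {
    "expenses-aca": {
      "type": "http",
      "url": "https://<your-container-app>.azurecontainerapps.io/mcp"
    }
  }
}
```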
So then I can start that server. Okay, it is running, and here when I go to my Copilot chat I can make sure it's enabled. I can see it's enabled here, and here you can see it has two tools. One of them is add expense and the other one is get expenses data. So this one is a tool; yesterday we had it as a resource, and this time I decided to make it a tool.
All right. So then we're going to say something like: okay, yesterday I bought a $40 avocado toast with fried egg on top using my Amex.
All right. So now GitHub Copilot is going to see that we have this MCP server, and hopefully it's going to decide to call a tool from that server. And it's working on a plan here. This is GPT-5 I'm using right now, so we get to see its whole thought process here. It looks like it did decide to use the MCP server. Okay, so you can see it says it's going to run add expense. It gives the name of the server that we just added, so that matches, and it says it's adding a new expense to Cosmos DB. So in production we are using Cosmos DB to store the data. Locally we were using a CSV, but obviously we don't want to use a CSV in production, so instead we use Cosmos DB. You can see here we get all the information, we add it to the Cosmos container, and boom, then it'll be in the container. So let's hit allow.
So it is running that tool now. And oh, that's very nice of it: it decided to then call the other tool in order to verify that it worked. This is the beauty of MCP servers: I gave it some tools, and it decided to both use add expense and get expenses data in order to double-check that it really, truly worked. That's actually the first time I've seen it do that. It just depends on which agent you're using.
All right, so it looked like it worked. So now, to verify it worked... well, it already verified, but what I am going to do is go to the Cosmos DB. Let me make sure this is the right one. Okay, so here's the Cosmos DB account. I can go to the Data Explorer, look at the items container, and then click on one of these. Here we can see the last one was food, and it says amount $40, avocado toast with fried eggs on top. So we can indeed see that it successfully added a new item to this Cosmos container using that production server.
>> [snorts]
>> Yes, yesterday I bought a lot of pizza.
Today I'm getting fancier with this
avocado toast.
I saw there were a couple of questions. One question was about this infra, all the infra that set all this up, like the Cosmos. The question was: did a human write this, or did I generate it? I wrote it. [laughter] Well, I will say one thing: I am using what's called AVM, Azure Verified Modules. The Azure Verified Modules try to use the best practices, the best security practices, by default. So I do recommend using Azure Verified Modules. If you know Terraform, they're similar to Terraform modules. They are officially maintained by Microsoft employees, they have the best practices baked in, and then you just have to override what's particular to your setup. So for this Cosmos one, I did add on which containers I wanted. I said: okay, I want to add this container, and I'm using category as the partition key.
So there are ways of generating Bicep, but my preference is to use the Azure Verified Modules and then just customize them with what I need. And I also recommend just taking other people's Bicep. If you're looking for Bicep, go to my GitHub repo. You'll see all of these things that have a check mark in the AZD column; that means they all have lovingly handwritten Bicep in them. And I recommend just taking Bicep that already works. That's what I do: I learn from my past self and I take from my past self.
The other question was, once again, why are we using HTTP instead of standard input/output? Because it's true: if you look at a lot of the MCP servers that are in the registries, a lot of them are standard input/output, and I think there are a few reasons for that. One is that at the beginning, standard input/output was the only way of writing MCP servers. So that's what everybody used, because that was the only way to do it; that was our first option. Then they added SSE, which was an HTTP way of doing it, but it was kind of hard to do. And then they added streamable HTTP. So streamable HTTP is actually the most recent transport option, and that's why people have maybe been slower to adopt it. The second reason is that if it's a local script, then it doesn't cost people any money to put out an MCP server that's just a local standard input/output one; you're just going to download the package, and that doesn't cost them any money. When you're doing a deployed endpoint, that does cost money, because people are actually sending requests to your server. So there is a cost involved there, and it has to be worth it to you to actually want to do a deployed server.
So there are reasons for it. There are also ways that you can basically convert: if you have a stdio server, you can turn it into an HTTP server really easily. There are packages that do that; I can't remember the name right now, but I could find it during office hours. So it should in theory be quite easy to switch between transports. If there's a server that's only available over a particular transport and you need it in a different way, you can basically put a wrapper in front of it. But it's a great point. You might decide to use stdio instead of HTTP, but I think that for production, HTTP is actually the nicer option.
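(One possible way to do that wrapping, assuming FastMCP's proxy support; the stdio server command here is hypothetical, and this isn't necessarily the package the speaker had in mind.)

```python
# Sketch: wrap a stdio-only MCP server so it's reachable over HTTP,
# assuming FastMCP's as_proxy() support. The backend command is hypothetical.
from fastmcp import FastMCP

proxy = FastMCP.as_proxy(
    {
        "mcpServers": {
            "local": {"command": "uv", "args": ["run", "some_stdio_server.py"]}
        }
    },
    name="stdio-to-http wrapper",
)

if __name__ == "__main__":
    proxy.run(transport="http", port=8000)
```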
And then the question was: how did I set the destination to Cosmos DB? You can look at deployed_mcp.py. There's quite a lot of setup in here that I kind of skipped over. I'm setting all these environment variables on the container. So if you look at my container variables, you'll see that there are quite a lot of settings on it, like the Cosmos DB account here. All these environment variables are set on the container, and I set those in the actual Bicep. The Bicep sets it up so that they're on the container once it deploys. And then I can make a connection to that Cosmos DB container, and inside the actual tools I can create items, and in this one I query the items from the container.
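(A rough sketch of that pattern, assuming the azure-cosmos and azure-identity SDKs; the environment variable, database, and container names are illustrative, not the repo's exact code.)

```python
# Sketch: an MCP tool body writing to Cosmos DB via environment-configured settings.
import os
import uuid

from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential

# Endpoint comes from an env var set on the container by the Bicep (name assumed).
client = CosmosClient(
    os.environ["COSMOSDB_ENDPOINT"],
    credential=DefaultAzureCredential(),
)
container = client.get_database_client("expenses").get_container_client("items")

def add_expense(amount: float, category: str, description: str) -> dict:
    # "category" is the partition key, matching the Bicep container definition.
    item = {
        "id": str(uuid.uuid4()),
        "amount": amount,
        "category": category,
        "description": description,
    }
    return container.create_item(body=item)
```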
All right. Okay, so let's keep going. We covered a few questions there. So we saw how we could use the deployed server from VS Code. Now, how can we use that from our actual programmatic AI agents? Yesterday we showed that we've got some examples using agent framework packages. We have the Microsoft Agent Framework example here, which sets up an agent that has access to the MCP server based off a URL, and then we have a really similar LangChain example as well. Now, for both of these, we can customize them really simply to point at our deployed server by just changing the MCP server URL. And I've actually already changed that in my .env file as part of my deployment process. So if I just run this agent today, it should just work. So let me close this one. I'm going to run the Agent Framework HTTP example with uv.
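(For the LangChain side, here's a minimal sketch of pointing a programmatic agent at the deployed URL, assuming the langchain-mcp-adapters client; the URL is a placeholder and this is not the repo's exact example.)

```python
# Sketch: surface a deployed MCP server's tools to a LangChain agent.
import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient

async def main():
    client = MultiServerMCPClient(
        {
            "expenses": {
                "transport": "streamable_http",
                "url": "https://<your-container-app>.azurecontainerapps.io/mcp",
            }
        }
    )
    # Each MCP tool is surfaced as a LangChain tool the agent can call.
    tools = await client.get_tools()
    print([t.name for t in tools])

asyncio.run(main())
```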
Uh oh, I'm in the folder. Let me just move up a folder. Okay, so: uv run the agents agent framework HTTP script. All right. So it should run against our deployed MCP server URL, because I've set that in my .env here, and it's loading in that .env. And then we could even... let me see if I can look at the logs right now, and we might be able to see the logs happening as it gets queried. Let's see. Listening. Okay. Here we go. Oh, we missed it in the logs. That's what happens if you look at historical logs, but we're going to show OpenTelemetry very soon.
Oh, here we go. Okay, so here we go. We can see a log there, and we ran the agent. And the interesting thing is that this agent actually messes up the first time it tries to call the tool. It picks the wrong value for some of it and gets an error, but then it corrects itself, because this is an agent, right? An agent is able to call tools in a loop. So it makes the first tool call, realizes that it didn't quite use the right arguments, and then corrects itself and makes the second tool call here. So it says that it logged this 1200 gadget. So now we could check Cosmos DB again. Let me do a refresh over here of Cosmos.
Where is... oh, there it is. Okay. Refresh. Refresh. I'll just go and click on that. Okay. Load more. Wait, I'm going to go here. Just want it to refresh. Okay. [laughter] All right, now I see it. Okay. So here is the new item: this is the laptop that it just bought. So there we've run it. This is running the agent locally,
interacting with that MCP server URL.
Now, we could also run the agent from the container. In our agents folder we also have a Dockerfile, and this Dockerfile, you can see, very simply just calls the Agent Framework file, so it should just immediately run it. That's actually what it already did when it deployed earlier: we can see that earlier, when it deployed, it also ran that purchase. So once you have a deployed server, you can use it from various places. You might be using it from internal agents, maybe ones that are running based off of webhooks or cron jobs or something like that, and they could be running inside other containers, like in this situation here. You could run it from a desktop application like VS Code or Claude Desktop. You could run it from a Python script. There are many different options for how you could interact with that deployed MCP server.
So now let's talk about observability, and how we can actually see what our MCP servers are doing. We are using OpenTelemetry for our observability. OpenTelemetry is a standard that says how applications should emit observability data. It says how they emit traces, metrics, and logs; those are the three main types of data that get exported with OpenTelemetry. A trace is composed of spans, and it basically shows the timeline of what's happening when a request is being processed by a service. So it says: okay, we got this tool call request, and then we queried Cosmos, and then we got this response back from Cosmos. All of that would be a trace of spans that show what's happening and all the dependencies, and that's what I often find the most useful. Then there are metrics, and those are numeric measurements that can be helpful to understand how the system is doing: things like, what's your CPU usage, what's your latency, that sort of thing.
And then finally, there are logs. When we're using Python and we're doing logger.info or logger.warning, those are all logs, and we can export those in a way that's compatible with OpenTelemetry as well. So those are the three main kinds of observability data that we want out of our applications.
So for an MCP server, what do we want to see traces from? Remember, our MCP server is in fact a Starlette ASGI application that's processing route requests, like to /mcp and /health. So we want to see every request that goes to that application. Then that app is using FastMCP to process tools, resources, and prompts. So we want to know: hey, did you get a tool call? Did you get a resource call? Did you get a prompt call? We want to know specifically what part of our server is being interacted with. And then our tools happen to call the Azure SDK, because we're using Cosmos DB here. So we'd also like to see, for Cosmos DB, what's going on with those calls. So that's the flow of what we want to see in our traces.
Now, for each of these kinds of traces, we need to have instrumentation that knows how to observe what's going on with that layer in the stack and then export OpenTelemetry traces. At the top level, for Starlette, there is a package called opentelemetry-instrumentation-starlette. We install that; it's in our pyproject.toml. And then inside the application we call that instrumentor and say: hey, wrap this app in this instrumentation. When we do that, it's going to output traces for every single route request, and these all follow a standard across OpenTelemetry that says when you get a route request, it's going to be tagged with the http.route attribute.
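(The wiring looks roughly like this; "app" is the ASGI app returned by mcp.http_app() in the earlier sketch.)

```python
# Wrap the Starlette ASGI app with the OpenTelemetry Starlette instrumentation.
from opentelemetry.instrumentation.starlette import StarletteInstrumentor

StarletteInstrumentor().instrument_app(app)
```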
Next, we want to get traces from FastMCP. For this, what we did is we actually wrote our own middleware, and we just attached it to our FastMCP instance and said: hey, every time you get a request, you're going to call this middleware. This middleware is actually in the repo, so you can check it out and see how we wrote it. Basically, every time it gets a request to do something, it outputs a span. And here you can see the kinds of things that are in the spans. It says: oh hey, here's the tool call I got, here are the arguments that were sent. And these are also all following standard conventions [clears throat] for how MCP servers should trace their tool calls.
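(A rough sketch of what such middleware might look like, assuming the FastMCP 2.x middleware interface; the repo's real middleware is more complete, and the span/attribute names here are illustrative.)

```python
# Sketch: tracing middleware that emits one span per MCP tool call.
from fastmcp.server.middleware import Middleware, MiddlewareContext
from opentelemetry import trace

tracer = trace.get_tracer("mcp-server")

class TracingMiddleware(Middleware):
    async def on_call_tool(self, context: MiddlewareContext, call_next):
        # Emit a span tagged with the tool name, then continue the chain.
        with tracer.start_as_current_span("mcp.tool_call") as span:
            span.set_attribute("mcp.tool.name", context.message.name)
            return await call_next(context)

# Attached at construction time, e.g.:
#   mcp = FastMCP("Expenses", middleware=[TracingMiddleware()])
```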
Finally, we want traces for the calls to Cosmos DB. Now, when we use the Azure SDK, by default it likes to export to OpenTelemetry. We do set this setting just to make sure that it's using OpenTelemetry, but as long as we do that, then we're going to see traces from all those Cosmos requests as well.
So now we're exporting these traces, but where are they going to go? We need some place to actually view all these traces once we have them exported. For Microsoft, we have Azure Application Insights, and that's a managed, hosted application; it's in the portal, and we're going to be using that. There are lots of other options for observability platforms. Another one that I really like is Logfire. It's very popular with the Python community, and I'm also going to show you that, because it's quite fun to set up as well. So you've got a ton of options, lots of things that are OpenTelemetry compliant. Once you've got everything instrumented, you can use the OpenTelemetry platform of your choice in order to see those traces.
So in order to export to Azure App Insights, we're going to install another package, azure-monitor-opentelemetry, and then use this function, configure_azure_monitor. That just looks at this Application Insights environment variable and sets up everything based off of it. This is the easiest way to export to App Insights. You can do a whole lot more manual work if you need to, but if this works, you should use it, because it's one line and it'll set everything up for you.
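(That one-liner, for reference; configure_azure_monitor reads the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable when no connection string is passed.)

```python
# Export all OpenTelemetry data to Azure Application Insights.
from azure.monitor.opentelemetry import configure_azure_monitor

configure_azure_monitor()
```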
So now, what does it actually look like? Let me show you the logs. Here in App Insights I've already opened up a trace so you can see, and this shows you: okay, we got a request, we got a POST to /mcp, and that POST to /mcp tried to call the add expense tool here. We can see all the properties, and we can see the first time it did actually fail, because our agent called it in the wrong way, so it got back this validation error. And that's interesting; that means maybe I need to give my agent more instructions to be better. But then it does recover, and then it calls add expense again, and this time it passes in the right arguments, and so it's able to successfully go through. It calls the Cosmos container, makes a bunch of requests to my Cosmos account, and puts the item in there. So that is a full trace that shows all these spans that come from the different telemetry exporters that we have.
Now, we also have a dashboard. I set up this dashboard based off of the logs we have. So then we can just kind of monitor this dashboard and say: oh okay, here are our failures, here's the performance. I made a bunch of custom dashboard charts here just using KQL, Kusto Query Language, which is a query language for your logs. So you can set all this up with App Insights, and this is all actually in the Bicep, so if you deploy from the repo, all of this will be set up for you automatically. So that is App Insights.
Now, I did also want to show Logfire, because I think it's a really cool platform. With Logfire, we're going to add the package for Logfire, and then it also has a configure function to call, and we're going to set this Logfire token in our environment variables, so it's able to use that in order to send the requests.
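(Again a one-liner; logfire.configure() picks up the LOGFIRE_TOKEN environment variable.)

```python
# Export the same OpenTelemetry data to Logfire instead.
import logfire

logfire.configure()
```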
And then when I do that, I can use the very pretty Logfire dashboard here. So here I am going to focus on earlier today, when I called my Logfire agent. And we can see here... let me make that a little bit bigger. Okay.
So here we get the request from Starlette, and then we see this add expense tool call, and that's the one that has the validation error, and then it was successful, and then it called the Cosmos DB stuff. So it's very similar to App Insights. All these observability platforms have their own way of approaching things, but what I wanted to show is: as long as we use OpenTelemetry, it becomes much easier to view our observability data in any platform that is compliant with OpenTelemetry. So that is what I would recommend: use OpenTelemetry.
All right. Okay. So that was
observability. We have 13 minutes left
and I do want to talk about private
networking. Uh, so I'm going to go into
private networking now. Uh, if there are
questions that I've missed, we do have a
full hour for office hours after this
live stream. So, we can talk about more
things in that office hours as well for
anything um, you know, any questions
that aren't aren't getting answered in
the chat. So, now let's talk about
private networking because here's the
thing. Everything I've been showing you
so far was with a publicly available
endpoint. So, in fact, any of you could
have taken my URL and logged expenses
with it, right? You know, with your $300
of pizza or whatever, right? Like, as
long as you had that URL, you actually
could have logged an expense. And that's
that's kind of sketchy, right? And sometimes people rely on security through obscurity: basically, oh, you don't know my URL, so it's fine, it's not going to happen. But it's very risky to have a public endpoint if you don't mean for it to be a public endpoint. This is a stat we had where 84% of the attack paths and security issues are because of internet exposure. So if you have a publicly available endpoint, you really should think carefully. You really need to want to have that public endpoint, and otherwise you need to protect it. You can either make it
private, which is what we're going to
show today, or you can add an
authentication, which is what we're
going to show tomorrow. So today, I'm
going to show how can we make that
endpoint be a private endpoint and not
have the risk of this public URL that
anybody could use.
So, does a private endpoint make sense for you? A private MCP endpoint? There are two situations where you might consider having your MCP server at a private endpoint. The first one is if you're making an MCP server for internal employees at your company, and those employees are already working on a virtual network, and they have a VPN or some sort of way of working inside that network. Maybe they're actually on-premises, and that's how they're inside the network, but there's some way of getting into that network that is particular to your employees. Then it could make a lot of sense to put that MCP server on, or accessible via, that same virtual network, so that all the internal employees can use it and nobody outside the company can access it.
That's the first situation. The second situation is maybe your MCP server isn't even designed for use by humans. Maybe your MCP server is only designed for use by other agents, and those agents are all internal to your systems, like agents that are just running in cron jobs doing background analytics. In that case, it doesn't make sense to expose your MCP server to the whole world. Instead, you set up a virtual network that has your MCP server and your agent, and they just communicate over internal networking with each other, and nobody needs to be able to actually go inside that network.
So the way that we're going to set up that situation on Azure is that we've already got this Container Apps environment, and that actually is a virtual network by default, but I'm setting up a new virtual network so that I can add some more things to it. What I have is this whole thing as a virtual network, and inside that virtual network I've got two subnets. The first subnet is the one that contains the Container Apps environment with its apps, and I've got what's called an NSG, a network security group. You can attach it to a subnet in order to set up rules and say: only allow traffic over HTTP, only allow traffic at these ports. So it's just a way of adding more security to your subnet in terms of what can go in and out of it. So we've got two subnets with NSGs that are all inside the same virtual network. And then everything communicates using what's called a private link. This MCP server uses a private link in order to communicate with the private endpoint of Cosmos DB. There's also something called a private DNS zone that resolves that private endpoint in the virtual network. So there's a lot of infrastructure that goes into this. Once again, I've already written that infrastructure for you. It's all in the Bicep, so you can just deploy it and it'll all get set up. Everything is communicating over these private links to private endpoints, using the private DNS zones. So that is the architecture that we're setting up.
If you do want to try it out in the repo, you can follow the instructions in the readme for deploying with private networking. You basically just need to set some azd environment variables and run azd up, and it'll set everything up. So I can show you what it looks like once we have everything set up in the portal. There's quite a lot of infrastructure involved, a lot of resources involved, with private networking, but it does make everything more secure once you have it all set up. So here, this is my private resource group. In this one you can see we've got private endpoints. We've got... okay, a bunch of things; let me sort by type. So we've got network interfaces, network security groups, we've got all these private DNS zones, and then we've got all these private endpoints. And then we have the virtual network itself.
And sometimes the topology graph is cute to look at, and sometimes it's kind of crazy, but here is the topology of the virtual network. I think my slide made it a little clearer, but this is just to show there is a lot involved with a virtual network. It can be a little bit of a pain to set up at first, but hopefully I've made that easy for you by writing the Bicep.
So now we have that deployed. I'll go back and actually show you how I know that it's all private. I'm going to open the Container Apps environment and look at the networking. Here you can see that public network access is disabled, so it is going to block all the incoming traffic from the public internet. No one can get into this environment at all. And we can see it is associated with a virtual network, with this specific subnet, and that it itself has a private endpoint. So all of this indicates that it is completely private. So that's all private now.
And then we can look at our apps, and we see that we've got the agent app and we have the server app. If we go to the agent application, we can check the logs and see what it's doing. We can see it's working here. We can look at the historical logs, and somewhere in the historical logs we'll see a call to the server, where it actually made that request.
So that's the general idea. We get it all set up, and then the agent calls the server. Now, in this one, the agent only called the server once, when the server first started up. In reality, if you had this sort of setup, you would either have a cron job, which runs the agent on a schedule, doing something every hour or every day, or you would have some sort of hook-based system where every time there's a new GitHub release, or every time something is added to the blob, you're going to do this. So generally, it's either going to be a scheduled thing or a trigger-based thing.
Okay, so that's it. The demo for it isn't very exciting, because basically the point is: yay, we can't access it, yay, it's secure. But the exciting thing is the fact that it is actually all set up and can't be accessed.
Now, we have another example that sets up a kind of similar infrastructure. This is our AI travel agents sample repo, and this one actually sets up MCP servers in different languages. It sets up a JavaScript one, a Python one, a Java one, and that's kind of an interesting idea: having multiple languages for services. This is something you can do easily once you're using something like Container Apps; you just say, oh, we have a Python container, we have a Java container, we have a JavaScript container. So that might be an interesting sample to check out as well.
All right. So we focused a lot on Container Apps today. There are other deployment options, which I mentioned at the beginning, so just to talk briefly about them: Azure Functions does make it very easy to bring MCP servers, and they have two different ways of doing it. You can either use the Azure Functions MCP bindings, where you're actually writing your function code very specifically for Azure Functions using the Azure Functions package. That's good because it's very built into Azure Functions, so it's going to work really well. However, they also added support for being able to bring a FastMCP server and just use that with Azure Functions, and that might be more exciting to you if you want to be able to deploy to multiple places, like maybe you want it on Azure Functions and also Container Apps and also FastMCP Cloud. So Azure Functions does support both of those options.
I found this example from Anthony Chu on the Functions team, which is pretty fun. It deploys an Azure Functions MCP server for weather, but the thing that's really fun is that it also uses the OpenAI Apps SDK, so it displays a widget inside ChatGPT when you use it. So this is getting into MCP apps: the idea that MCP servers can produce not just data but also UI widgets, actually full interactive applications. So if you're interested in that, do check out Anthony's repo. It's pretty fun.
Now, I also mentioned Azure Kubernetes Service. With this, it's similar to Container Apps, except we also need to set up multiple services ourselves. We can use a docker-compose.yaml, so we'd say: okay, we have our MCP service here, we have our agent service here, here's the Dockerfile for each of them, this is how we're going to connect them together, the port they communicate over. And then put all of that on Kubernetes, and then you manage all the scaling for Kubernetes. We do have an example that uses Kubernetes. This is the Zava shop; it's a pretend retail shop, and it uses MCP servers for a lot of the functionality, and then has agents that do analysis based off those MCP servers. There's one that looks at finances and tries to get insights based off the finances and gives you a report back. So if you're interested in Kubernetes, check out that repo. It's got all the infrastructure for it.
Okay, and that brings me to the final minute. We have just made it, and I know there are lots of questions that we didn't get to, but that is why we have office hours right after. So basically, once we close down this stream, I'm going to hop into Discord and start up the stage in Discord, so if you have a question that didn't get answered here, just copy and paste it into Discord, and we'll try to answer it there. As a reminder, all of these sessions are being recorded, and all the slides and code are available. You can go to this aka.ms resources link here in order to get all the resources, and we'll keep that updated. And then after the series, if you want to keep learning, my colleagues have this MCP for Beginners repo that has a bunch of tutorials about how to write MCP servers. [snorts] So there we go.
We're going to close up the stream for today and go over to Discord. Thank you so much for joining today and for all the great questions and comments in the chat. Even though I didn't get to all the questions, hopefully we can get to them in the Discord. I hope you join us tomorrow for talking about authentication, which is super interesting with MCP: how we can make MCP servers where we're actually doing things on behalf of the user. Really some cool stuff there. So hope to see you in the Discord or tomorrow. Bye everyone.
Thank you all for joining and thanks
again to our speakers.
This session is part of a series. To
register for future shows and watch past
episodes on demand, you can follow the
link on the screen or in the chat.
We're always looking to improve our
sessions and your experience. If you
have any feedback for us, we would love
to hear what you have to say. You can
find that link on the screen or in the
chat. And we'll see you at the next one.
>> All right. [music]
In our second session of the Python + MCP series, we're deploying MCP servers to the cloud! We'll walk through the process of containerizing a FastMCP server with Docker and deploying to Azure Container Apps, and also demonstrate a FastMCP server running directly on Azure Functions. Then we'll explore private networking options for MCP servers, using virtual networks that restrict external access to internal MCP tools and agents.

Explore the repo: https://aka.ms/PythonMCP/repo

This session is part of a series; learn more here: https://aka.ms/Python/MCP/25

#MicrosoftReactor #learnconnectbuild