In this video, I'm going to explain what
MCP sampling is and how to build
sampling into your MCP tools using
Python. This is part of my MCP concepts
demo series, and you can see a link to
the playlist with all of my videos from
this series in the video description.
Sampling is when your MCP server makes a request to an LLM. The way this is done is that the server passes a message to the client. So the server might ask Claude Desktop, for example: "Hey, I have a sampling create-message request." The client then passes that on to an LLM that the user configures, or that the client has set up somehow; I'll show you that in an example. The LLM returns its generation to the client, and the client presents it back to the server so it can continue working. In this way the server can access LLMs without needing to configure them itself. It just goes to the client and says, "Hey, whatever LLM you're using, whatever you have set up, can you run this prompt for me and give me back the answer?" The server can then continue processing before returning the final result of whatever tool call it's handling. And sampling has been around since the first version of the spec. If I go to November 5th, 2024, I can see sampling is already in there as a client feature. In the latest version we now have just one additional client feature, elicitation, which I spoke about in a previous video linked in that playlist. Now
let's see this as a demo and talk about
how to implement it with Python. And if you're learning AI engineering, you should subscribe to my newsletter. You can go to zazencodes.com and sign up for free. I have a few bonuses for you when you sign up, and you'll get one email a week like this one, so you can decide whether you're interested in watching that video and learning about that particular topic. Here's the demo for
now. Ignore this agent; we're going to focus on this file right here, server.py. This code is all open source in my zazencodes-season-2 repo right here on GitHub; you can download it and check it out yourself. I'm going to open up Vim and we'll look at this CSV quick plotter server. The purpose of this server (it's just a quick demo server) is to take in a CSV file and create a plot based on the data inside of that CSV. And to do that plot creation, it's going to use LLM sampling. FastMCP is the framework I'm using to build this MCP server. It's got this FastMCP object that I'm instantiating in order to get mcp here.
And then this decorator is how I'll define tools; working with this server is very similar to FastAPI. Okay, so this is the tool that I'm defining and that we're going to demonstrate now. It's called analyze_csv_and_plot. It's got csv_path, which is a string, and then it's got this Context object, which is dynamically injected by FastMCP. It's not something that the user is going to see; we won't see it or be able to set it when we're calling this function. It's something FastMCP provides. And down here is where I'm doing the sampling. What we do is submit an LLM request and hold on that request: I've got an await here, so the script is going to stop at this point until it's got a response, and then it will continue on and do something with that response. This sampling request itself has a user prompt (the message), a system prompt, and you can set max tokens and temperature. And there are other things we can do through the FastMCP implementation
with this. If I go back to the user prompt, that's this value right here. We're dynamically generating a prompt, something to evaluate, and the idea with sampling is that the prompt depends on what's come before it in the workflow. In this case I want to profile some data, so I've done some data profiling up here.
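The video doesn't show the internals of profile_dataframe beyond shape, sample rows, and summaries, and the real one presumably works on a pandas DataFrame. As a rough stand-in, here's a stdlib-only sketch of that kind of profiler (the function name and output format are invented):

```python
import csv
import io
from statistics import mean

def profile_csv(text: str, sample_rows: int = 3) -> str:
    """Summarize a CSV as text: shape, column names, sample rows, numeric stats."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return "empty file"
    cols = list(rows[0].keys())
    lines = [
        f"shape: {len(rows)} rows x {len(cols)} columns",
        "columns: " + ", ".join(cols),
        f"sample rows: {rows[:sample_rows]}",
    ]
    for col in cols:
        try:
            vals = [float(r[col]) for r in rows]
        except (TypeError, ValueError):
            continue  # non-numeric column: skip the stats line
        lines.append(f"{col}: min={min(vals)} mean={mean(vals):.2f} max={max(vals)}")
    return "\n".join(lines)
```

Whatever form the profile takes, the point is that its text output is what gets pasted into the sampling prompt.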
I'm reading a CSV and calling this profile_dataframe function, which gets the shape of the data, some example rows, and summaries. Based on this profile, I want to create a chart.
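The overall flow of the tool (build a prompt from the profile, sample the client's LLM, pull the code out of the reply, then execute it) can be sketched like this. It's illustrative, not the repo's code: the sample parameter stands in for FastMCP's ctx.sample call, and the canned fake LLM at the bottom just lets the sketch run standalone.

```python
import asyncio
import re

def extract_code(response_text: str) -> str:
    # Prefer a fenced ```python block; otherwise assume the whole reply is code.
    match = re.search(r"```(?:python)?\n(.*?)```", response_text, re.DOTALL)
    return match.group(1) if match else response_text

async def analyze_and_plot(profile: str, sample) -> dict:
    prompt = (
        "Here is a profile of a CSV file:\n\n" + profile +
        "\n\nWrite Python that builds a matplotlib figure named fig."
    )
    # The server pauses here until the client's LLM returns a generation.
    reply = await sample(prompt)
    code = extract_code(reply)
    namespace = {}
    # Executing arbitrary LLM output: dangerous outside a demo!
    exec(compile(code, "<llm-code>", "exec"), namespace)
    return namespace

# Canned "LLM" so the sketch runs without a real client:
async def fake_llm(prompt: str) -> str:
    return "```python\nfig = 'pretend-figure'\n```"

ns = asyncio.run(analyze_and_plot("3 rows x 2 columns", fake_llm))
```

In the real server the exec'd code builds a matplotlib figure that then gets encoded and returned; the compile step mainly gives nicer tracebacks by attaching a fake filename.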
So I come down here and say: write Python that builds the chart, and then I'm going to execute that code. I go through the response, find the code, and execute it. The number of times that I've executed arbitrary LLM-generated code on this channel is getting ridiculous, but it's just so cool to me; it's such a cool use case. So I compile it using this compile command and then exec it right here. This is generally dangerous and bad practice, but we're doing it for a fun demonstration, so it's fine. In order to demonstrate this,
I'm going to be using something called fast-agent. This is just an MCP client that supports sampling. It works in the terminal, and it's a little bit tricky to set up, so I thought I'd show you how to do that. This is the readme file with my instructions, so you can try this out yourself. I'm going to run uv venv, which will set up Python 3.13 inside a virtual environment. That's just created this .venv directory right here, and now I can install things into it. So if I run uv pip install fast-agent-mcp: I've been loving uv because it's just super cool. It's super fast.
So it installed a bunch of stuff in here, and I can prove that to you. If I look inside of .venv (I'll be careful not to show too much), these are all the libraries. The agent package is called mcp-agent, so when I see mcp_agent, that's the library; that's what was installed, along with all of its dependencies. This here is the script I'm about to run with fast-agent. I'm not going to go through it (it's out of scope for this video), but you can see the source code, and I have a different video on fast-agent if you're interested in using it. So I'm going to say uv run agent. This will spin up my agent, and we're going to start doing some stuff; we're going to try this tool out. Okay, so it's spun up, and
you can see it has access to one MCP server. I can ask something like "what tools do you have," and this is the tool: quickplot's analyze_csv_and_plot, with a description. We're going to try to call this tool, and I want to demonstrate the sampling. So, hopping back to my readme, the command is "plot <full path to CSV>". The CSV file I want to use is called cereal.csv; you can see it right over here. I'm going to generate some plot, some visualization, of all these cereals. I want the full path to this file, so I'm going to grab it like this. And I'll just say: plot this thing, cereal.csv.
Okay, let me zoom in and make this big so we can see what's going on. It already called the whole thing. It's logging that it's sampling the data, and it produced this data profile. Notice how this goes into the sampling agent; do you see how it's running on Claude 3.5 Haiku down there? But if I go up: this already worked, so fast. Oh my god. Here's what I just said, and the output of the tool. First we see the initial prompt that I asked, and we can see the model that we're using: GPT-5 mini. Then there's some output from the plotter, where we see that the assistant requested a tool call; these are the arguments, and this is the name of the tool. Over here with fast-agent, now we trigger the sampling: as this tool executes, something called a sampling agent spawns down here. So we get this sampling agent starting to work, and it's using Claude Haiku. Notice how we were using GPT before, but now we're using Claude Haiku with our sampling agent; I'll explain why that's the case. And this sampling agent is giving output. This is the prompt that we provided, to be executed by the LLM. Coming down through the prompt, we see a bunch of output from the sampling agent; this is output that Haiku is providing us. It goes back to my MCP server, and we generate all of this Python code. The server executes that, and then this line down here shows the server returning the result to the client. So the plotter sends this back to the client, and we see this image content just appear. When I talk about the client, I mean fast-agent, my terminal right here. So this gets sent from the MCP server back to my terminal, and it just dumps this whole blob of JSON data which represents the PNG image. Fast-agent is unable to render this for me; or at least, I think it actually can, I just didn't hook it up to do that. But it does understand this data: it can read this image data and figure out what's going on in here. So it's describing that to me: top left we have a histogram, top center we have a correlation heatmap, top right we have box plots. And it's done that just by looking at this image data and using its multimodal capabilities. By "it" I mean GPT-5, because this is running on GPT-5. So GPT-5 looks at this data, figures out what it is, and explains it to me. Now
I'd love to see this image. However, when I was testing this I wasn't able to get that working with fast-agent. Again, I know there's a way to do it; I just couldn't figure it out. So this data is kind of lost to us in this example, but we're going to run it again with VS Code Insiders, which also supports sampling right now, and there we'll be able to see this image get rendered for us. I'll note as well that I could change the behavior of the server to, say, save the image to a file on my computer. I'm also going to show you how we actually return this image content, because that's pretty cool, but that's separate from the video topic, so I'll show it at the end. Right now I'm going to go to VS Code and give this a shot. Here's
the server that I want to access, and it's at this directory. I'm going to need this path for VS Code; let me show you. We're going to go to VS Code Insiders and configure this MCP server, the CSV quick plotter. If I open up the command palette, I have this "open user configuration" command. Let's see what that is. This file lives in my user directory for VS Code Insiders, and it's mcp.json. You can download this, by the way; it isn't some gated product. You can just download VS Code Insiders and use it. And here's how I've set up an image generator demo. It was called CSV quick plotter. Here's my Python: that's the virtual environment where I've installed FastMCP and everything I need to run this server. And this here is the server itself. So this is how I can set it up. However, it's not called test; let me go back. It's called mcp-sampling, CSV quick plotter. So: CSV quick plotter. Okay.
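For reference, a user-level mcp.json entry along the lines described here looks roughly like this (the server name and paths are placeholders, not copied from the repo):

```json
{
  "servers": {
    "csv-quick-plotter": {
      "type": "stdio",
      "command": "/path/to/venv/bin/python",
      "args": ["/path/to/server.py"]
    }
  }
}
```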
And if I look at "list servers," I can see CSV quick plotter is stopped. I'm going to hit enter on that and say "start server." Down here I can see that this server is working and it's discovered one tool, which means when I open up the chat (this button up here) and type #, I should be able to see something called analyze_csv_and_plot. This is the function that I want to call from my quick plotter. But first I want to go back into the server settings for CSV quick plotter and see how it's configured with respect to sampling. And look at this: there's a "set model the server can use via MCP sampling" option. I'm going to hit enter on that. Right now it's set to GPT-4.1. I could also say auto or something else; let's leave it as GPT-4.1. Then, down here in the chat, I'm going to be using Claude Sonnet, so we'll see whether that's indicated in the logging. So I'm just going to accept that and say: plot <full path>/cereal.csv.
And I've got to specify which tool I'm going to use with Copilot; that's how you do it here: you have to explicitly tag the tool that you want to call. Here we're seeing our tool call request, with the csv_path populated. "The MCP server CSV quick plotter has issued a request to make a language model call. Do you want to allow it to make requests during this call?" I'm going to say allow in this session. I can see this indicator running. This has been going for five or ten seconds, and I'm not seeing any logs as it goes like I did with fast-agent. Oh, there was an error trying to analyze the plot. Let's see; I'm not really sure why. But let me try to change my sampling model. If I look at this "configure model access"... and "show sampling requests": look at this. This is the actual log of what the sampling request said.
I passed in the data and all that stuff, and down here I'm actually seeing the response as well. This was the code that was returned, and it's trying to do all these plots. It's possible it just had an error in here; I'm thinking some string is not defined. It's probably just a syntax error in the generated code, so I want to try this again. But while I'm here, what I'm going to do is open "list servers" down here and configure model access to something different. Sonnet 3.5 is a really good model; I'm going to pick that one instead, and we'll try that out. And from here, I'm going to run the exact same prompt, but in a new session. When I look at the log from this request, here's where I can see Copilot GPT-4.1.
I want to see Sonnet making that request, to make sure that my configuration was updated. Okay, it ran successfully this time, and it's outputting the information about the plots and what they show. But can I actually see these? Here's the data that was returned, and it's this binary data. But look at this: I can actually download the plot itself. Save to file.png. Let's open this up. Okay, amazing.
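That downloadable blob is MCP image content: base64-encoded PNG bytes plus a mime type. Here's a stdlib sketch of that bundling (the dict shape follows MCP's content types; the helper name is invented, and I'll walk through the actual server code for this at the end):

```python
import base64
import io

def png_to_image_content(png_bytes: bytes) -> dict:
    # Shape mirrors MCP's ImageContent: a type tag, a mime type, and base64 data.
    return {
        "type": "image",
        "mimeType": "image/png",
        "data": base64.b64encode(png_bytes).decode("ascii"),
    }

# With matplotlib you'd fill the buffer via fig.savefig(buf, format="png");
# a stand-in byte string keeps this sketch dependency-free.
buf = io.BytesIO()
buf.write(b"\x89PNG\r\n\x1a\n")  # the 8-byte PNG file signature
content = png_to_image_content(buf.getvalue())
```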
This data set, by the way, is this one: 80 Cereals from Kaggle. So, for example, American Home Food Products, General Mills, Kellogg's: those are these bars right here. That's Kellogg's, that's General Mills, this one's Post, Quaker Oats. And this chart here, I guess, is the number of cereals per brand. We also have a distribution of ratings; there's some sort of rating in this data set. I don't know exactly what it measures (it looks like even the model doesn't know), but the model really honed in on these ratings. So we have a distribution of ratings, and then "the highest rated foods have low calories"; that chart obviously catches my eye a little bit. Just as a quick analysis: let's look at the first column right here, which is calories, so anything that's red will have a higher correlation with calories. The reddest thing I see is weight: the heavier the portion size, the more calories it has. That makes some sense. Sugars is also in here, and fat is another one. And calories are inversely correlated with fiber: the more fiber content, the lower the caloric content. Totally makes sense. All right.
So, I think it's really super cool that this image was made available to us, and I want to complete the demonstration by showing how I actually implemented that. We're going to go back into the code and talk through it. Going back into the server: I hope I haven't missed anything, and it's all open source, but where the server returns this image, I'm doing some specific things. Down here, FastMCP has some utilities for images, including this Image class, which I've imported as FMImage; it's like a helper. If I look at where I'm using it, this is where I'm encoding the figure. I take in a figure, which is a matplotlib object at this point; if I come up, it's "from matplotlib.figure import Figure". So I bring that figure into this function, and this is where I return this ImageContent object right here. It's from mcp.types; it's a type that MCP expects, in the sense that clients anticipate you might give them an image, and then we do give them an image. So this encode-figure function returns ImageContent, and what I do is return that from the tool call. So we are returning an MCP type of ImageContent, and the way I'm actually doing that is by bundling it up with this FMImage helper from FastMCP. I do it in a very straightforward way: we have this buffer of bytes, and we write the file to that buffer. So instead of calling savefig or anything else that interacts with the file system, we can use BytesIO: we pretend the buffer is a file, something we can write to, and save the figure to that object. Then we get the binary data of this file-like object (called buf) as data; this is a bytes object, and we can pass it into FMImage, explain that it's a PNG, and call its to-image-content method, which returns ImageContent, like our MCP client expects. Then we can return it. And just to show you, if I come up, this is all part of the tool call. So the final result of this tool call is to return an actual image, which to me is the right way to think about this. And it's super cool that VS Code Insiders with Copilot has actually implemented that, so I can literally download this file and look at it. And the last thing I'll
also show is the MCP server's sampling log. If I list servers, look at this entry and say "show sampling requests": now I can see two requests in this file, two requests in the last seven days. Okay. And at some point, if I scroll down (I guess they just append these together), here's the second request, and... okay, it's still using GPT-4.1. So when I ran it a second time, it used the same model; it didn't pick up the model change. I think in order to do that, I would need to shut down VS Code and respawn it. But I do like that this demonstrated we didn't actually need to update the model, because GPT-4.1 is perfectly capable. It's just that, you know, sometimes it goes wrong. That's the end
of the demonstration, and if you enjoyed this video, please give me a like and consider subscribing. By the time you see this video, I think more MCP clients will have become available that support sampling and the other features I'm talking about in my MCP series of videos, so that's just something to keep in mind. If you want to access that playlist, you can see it in the video description. Otherwise, I'm going to link a few related videos, things I spoke about in this one. In particular, I will link you to a video on either elicitation or pagination, and another one on fast-agent, where I talk about how to use it in more detail than I went into in this video. Thank you again for watching, and namaste.
AI ENGINEER ROADMAP [ 🚀 learn AI Engineering in 2025 ] ► https://zazencodes.com/
NEWSLETTER [ 🍰 weekly video email ] ► https://zazencodes.com/newsletter
MCP TUTORIAL PLAYLIST ► https://www.youtube.com/playlist?list=PLTPHo6vRHQ8rw8dRK1kdqzULzGB6YlxTa
CODE DEMO [ ⭐ source code for this video ] ► https://github.com/zazencodes/zazencodes-season-2/tree/main/src/mcp-sampling
0:18 - How MCP Sampling Works
1:46 - MCP Sampling with FastMCP
4:34 - MCP Sampling Python Demo
9:53 - MCP Sampling VS Code Demo
15:55 - Image I/O with FastMCP