In this video, I'm going to explain what
MCP sampling is and how to build
sampling into your MCP tools using
Python. This is part of my MCP concepts
demo series, and you can see a link to
the playlist with all of my videos from
this series in the video description.
Sampling is when your MCP server makes a request to an LLM. The way this is done is that the server passes a message to the client. So the server might ask Claude Desktop, for example: "Hey, I have a sampling create-message request." The client then passes that on to an LLM that the user configures, or that the client has set up somehow; I'll show you that in an example. The LLM returns its generation to the client, and the client presents it back to the server so it can continue working. In this way the server can access LLMs without needing to configure them itself. It just goes to the client and says, "Hey, whatever LLM you're using, whatever you have set up, can you run this prompt for me and give me back the answer?" The server can then continue processing before returning the final result of whatever tool call it's handling. And sampling has been around since the first version of the spec. If I go to November 5th, 2024, I can see sampling is already in there as a client feature. In the latest version we now have just one additional client feature, elicitation, which I spoke about in a previous video linked in that playlist. Now
let's see this as a demo and talk about
how to implement it with Python. And if you're learning AI engineering, you should subscribe to my newsletter. You can go to zazencodes.com and sign up for free. I have a few bonuses for you when you sign up, and you'll get one email a week like this one, so you can decide whether you're interested in watching that video and learning about that particular topic. Here's the demo for
now. Ignore this agent; we're going to focus on this file right here, server.py. This code is all open source in my zazencodes-season-2 repo right here on GitHub; you can download it and check it out yourself. I'm going to open up Vim and we'll look at this CSV quick plotter server. The purpose of this server (it's just a quick demo server) is to take in a CSV file and create a plot based on the data inside of that CSV. And to do that plot creation, it's going to use LLM sampling. FastMCP is the framework I'm using to build this MCP server. It's got this FastMCP object that I'm instantiating in order to get mcp here.
And then this decorator is how I'll define tools; working with this server is very similar to FastAPI. Okay, so this is the tool that I'm defining and that we're going to demonstrate now. It's called analyze_csv_and_plot. It's got csv_path, which is a string, and then it's got this Context object, which is dynamically injected by FastMCP. It's not something that the user is going to see; we won't see it or be able to set it when we're calling this function. It's something FastMCP provides. And down here is where I'm doing the sampling. What we do is submit an LLM request and hold on that request: I've got an await here, so the script is going to stop at this point until it's got a response, and then it will continue on and do something with that response. This sampling request itself has a user prompt (the message), a system prompt, and you can set max tokens and temperature. And there are other things we can do through the FastMCP implementation
with this. If I go back to the user prompt, that's this value right here. We're dynamically generating a prompt, something to evaluate, and the idea with sampling is that the prompt depends on what's come before it in the workflow. In this case I want to profile some data, so I've done some data profiling up here.
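The video doesn't show the internals of profile_dataframe beyond shape, sample rows, and summaries, and the real one presumably works on a pandas DataFrame. As a rough stand-in, here's a stdlib-only sketch of that kind of profiler (the function name and output format are invented):

```python
import csv
import io
from statistics import mean

def profile_csv(text: str, sample_rows: int = 3) -> str:
    """Summarize a CSV as text: shape, column names, sample rows, numeric stats."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return "empty file"
    cols = list(rows[0].keys())
    lines = [
        f"shape: {len(rows)} rows x {len(cols)} columns",
        "columns: " + ", ".join(cols),
        f"sample rows: {rows[:sample_rows]}",
    ]
    for col in cols:
        try:
            vals = [float(r[col]) for r in rows]
        except (TypeError, ValueError):
            continue  # non-numeric column: skip the stats line
        lines.append(f"{col}: min={min(vals)} mean={mean(vals):.2f} max={max(vals)}")
    return "\n".join(lines)
```

Whatever form the profile takes, the point is that its text output is what gets pasted into the sampling prompt.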
I'm reading a CSV and calling this profile_dataframe function, which gets the shape of the data, some example rows, and summaries. Based on this profile, I want to create a chart.
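The overall flow of the tool (build a prompt from the profile, sample the client's LLM, pull the code out of the reply, then execute it) can be sketched like this. It's illustrative, not the repo's code: the sample parameter stands in for FastMCP's ctx.sample call, and the canned fake LLM at the bottom just lets the sketch run standalone.

```python
import asyncio
import re

def extract_code(response_text: str) -> str:
    # Prefer a fenced ```python block; otherwise assume the whole reply is code.
    match = re.search(r"```(?:python)?\n(.*?)```", response_text, re.DOTALL)
    return match.group(1) if match else response_text

async def analyze_and_plot(profile: str, sample) -> dict:
    prompt = (
        "Here is a profile of a CSV file:\n\n" + profile +
        "\n\nWrite Python that builds a matplotlib figure named fig."
    )
    # The server pauses here until the client's LLM returns a generation.
    reply = await sample(prompt)
    code = extract_code(reply)
    namespace = {}
    # Executing arbitrary LLM output: dangerous outside a demo!
    exec(compile(code, "<llm-code>", "exec"), namespace)
    return namespace

# Canned "LLM" so the sketch runs without a real client:
async def fake_llm(prompt: str) -> str:
    return "```python\nfig = 'pretend-figure'\n```"

ns = asyncio.run(analyze_and_plot("3 rows x 2 columns", fake_llm))
```

In the real server the exec'd code builds a matplotlib figure that then gets encoded and returned; the compile step mainly gives nicer tracebacks by attaching a fake filename.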
So I come down here and say: write Python that builds the chart, and then I'm going to execute that code. I go through the response, find the code, and execute it. The number of times that I've executed arbitrary LLM-generated code on this channel is getting ridiculous, but it's just so cool to me; it's such a cool use case. So I compile it using this compile command and then exec it right here. This is generally dangerous and bad practice, but we're doing it for a fun demonstration, so it's fine. In order to demonstrate this,
I'm going to be using something called fast-agent. This is just an MCP client that supports sampling. It works in the terminal, and it's a little bit tricky to set up, so I thought I'd show you how to do that. This is the readme file with my instructions, so you can try this out yourself. I'm going to run uv venv, which will set up Python 3.13 inside a virtual environment. That's just created this .venv directory right here, and now I can install things into it. So if I run uv pip install fast-agent-mcp: I've been loving uv because it's just super cool. It's super fast.
So it installed a bunch of stuff in here, and I can prove that to you. If I look inside of .venv (I'll be careful not to show too much), these are all the libraries. The agent package is called mcp-agent, so when I see mcp_agent, that's the library; that's what was installed, along with all of its dependencies. This here is the script I'm about to run with fast-agent. I'm not going to go through it (it's out of scope for this video), but you can see the source code, and I have a different video on fast-agent if you're interested in using it. So I'm going to say uv run agent. This will spin up my agent, and we're going to start doing some stuff; we're going to try this tool out. Okay, so it's spun up, and
you can see it has access to one MCP server. I can ask something like "what tools do you have," and this is the tool: quickplot's analyze_csv_and_plot, with a description. We're going to try to call this tool, and I want to demonstrate the sampling. So, hopping back to my readme, the command is "plot <full path to CSV>". The CSV file I want to use is called cereal.csv; you can see it right over here. I'm going to generate some plot, some visualization, of all these cereals. I want the full path to this file, so I'm going to grab it like this. And I'll just say: plot this thing, cereal.csv.
Okay, let me zoom in and make this big so we can see what's going on. It already called the whole thing. It's logging that it's sampling the data, and it produced this data profile. Notice how this goes into the sampling agent; do you see how it's running on Claude 3.5 Haiku down there? But if I go up: this already worked, so fast. Oh my god. Here's what I just said, and the output of the tool. First we see the initial prompt that I asked, and we can see the model that we're using: GPT-5 mini. Then there's some output from the plotter, where we see that the assistant requested a tool call; these are the arguments, and this is the name of the tool. Over here with fast-agent, now we trigger the sampling: as this tool executes, something called a sampling agent spawns down here. So we get this sampling agent starting to work, and it's using Claude Haiku. Notice how we were using GPT before, but now we're using Claude Haiku with our sampling agent; I'll explain why that's the case. And this sampling agent is giving output. This is the prompt that we provided, to be executed by the LLM. Coming down through the prompt, we see a bunch of output from the sampling agent; this is output that Haiku is providing us. It goes back to my MCP server, and we generate all of this Python code. The server executes that, and then this line down here shows the server returning the result to the client. So the plotter sends this back to the client, and we see this image content just appear. When I talk about the client, I mean fast-agent, my terminal right here. So this gets sent from the MCP server back to my terminal, and it just dumps this whole blob of JSON data which represents the PNG image. Fast-agent is unable to render this for me; or at least, I think it actually can, I just didn't hook it up to do that. But it does understand this data: it can read this image data and figure out what's going on in here. So it's describing that to me: top left we have a histogram, top center we have a correlation heatmap, top right we have box plots. And it's done that just by looking at this image data and using its multimodal capabilities. By "it" I mean GPT-5, because this is running on GPT-5. So GPT-5 looks at this data, figures out what it is, and explains it to me. Now
I'd love to see this image. However, when I was testing this I wasn't able to get that working with fast-agent. Again, I know there's a way to do it; I just couldn't figure it out. So this data is kind of lost to us in this example, but we're going to run it again with VS Code Insiders, which also supports sampling right now, and there we'll be able to see this image get rendered for us. I'll note as well that I could change the behavior of the server to, say, save the image to a file on my computer. I'm also going to show you how we actually return this image content, because that's pretty cool, but that's separate from the video topic, so I'll show it at the end. Right now I'm going to go to VS Code and give this a shot. Here's
the server that I want to access, and it's at this directory. I'm going to need this path for VS Code; let me show you. We're going to go to VS Code Insiders and configure this MCP server, the CSV quick plotter. If I open up the command palette, I have this "open user configuration" command. Let's see what that is. This file lives in my user directory for VS Code Insiders, and it's mcp.json. You can download this, by the way; it isn't some gated product. You can just download VS Code Insiders and use it. And here's how I've set up an image generator demo. It was called CSV quick plotter. Here's my Python: that's the virtual environment where I've installed FastMCP and everything I need to run this server. And this here is the server itself. So this is how I can set it up. However, it's not called test; let me go back. It's called mcp-sampling, CSV quick plotter. So: CSV quick plotter. Okay.
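For reference, a user-level mcp.json entry along the lines described here looks roughly like this (the server name and paths are placeholders, not copied from the repo):

```json
{
  "servers": {
    "csv-quick-plotter": {
      "type": "stdio",
      "command": "/path/to/venv/bin/python",
      "args": ["/path/to/server.py"]
    }
  }
}
```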
And if I look at "list servers," I can see CSV quick plotter is stopped. I'm going to hit enter on that and say "start server." Down here I can see that this server is working and it's discovered one tool, which means when I open up the chat (this button up here) and type #, I should be able to see something called analyze_csv_and_plot. This is the function that I want to call from my quick plotter. But first I want to go back into the server settings for CSV quick plotter and see how it's configured with respect to sampling. And look at this: there's a "set model the server can use via MCP sampling" option. I'm going to hit enter on that. Right now it's set to GPT-4.1. I could also say auto or something else; let's leave it as GPT-4.1. Then, down here in the chat, I'm going to be using Claude Sonnet, so we'll see whether that's indicated in the logging. So I'm just going to accept that and say: plot <full path>/cereal.csv.
And I've got to specify which tool I'm going to use with Copilot; that's how you do it here: you have to explicitly tag the tool that you want to call. Here we're seeing our tool call request, with the csv_path populated. "The MCP server CSV quick plotter has issued a request to make a language model call. Do you want to allow it to make requests during this call?" I'm going to say allow in this session. I can see this indicator running. This has been going for five or ten seconds, and I'm not seeing any logs as it goes like I did with fast-agent. Oh, there was an error trying to analyze the plot. Let's see; I'm not really sure why. But let me try to change my sampling model. If I look at this "configure model access"... and "show sampling requests": look at this. This is the actual log of what the sampling request said.
I passed in the data and all that stuff, and down here I'm actually seeing the response as well. This was the code that was returned, and it's trying to do all these plots. It's possible it just had an error in here; I'm thinking some string is not defined. It's probably just a syntax error in the generated code, so I want to try this again. But while I'm here, what I'm going to do is open "list servers" down here and configure model access to something different. Sonnet 3.5 is a really good model; I'm going to pick that one instead, and we'll try that out. And from here, I'm going to run the exact same prompt, but in a new session. When I look at the log from this request, here's where I can see Copilot GPT-4.1.
I want to see Sonnet making that request, to make sure that my configuration was updated. Okay, it ran successfully this time, and it's outputting the information about the plots and what they show. But can I actually see these? Here's the data that was returned, and it's this binary data. But look at this: I can actually download the plot itself. Save to file.png. Let's open this up. Okay, amazing.
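That downloadable blob is MCP image content: base64-encoded PNG bytes plus a mime type. Here's a stdlib sketch of that bundling (the dict shape follows MCP's content types; the helper name is invented, and I'll walk through the actual server code for this at the end):

```python
import base64
import io

def png_to_image_content(png_bytes: bytes) -> dict:
    # Shape mirrors MCP's ImageContent: a type tag, a mime type, and base64 data.
    return {
        "type": "image",
        "mimeType": "image/png",
        "data": base64.b64encode(png_bytes).decode("ascii"),
    }

# With matplotlib you'd fill the buffer via fig.savefig(buf, format="png");
# a stand-in byte string keeps this sketch dependency-free.
buf = io.BytesIO()
buf.write(b"\x89PNG\r\n\x1a\n")  # the 8-byte PNG file signature
content = png_to_image_content(buf.getvalue())
```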
This data set, by the way, is this one: 80 Cereals from Kaggle. So, for example, American Home Food Products, General Mills, Kellogg's: those are these bars right here. That's Kellogg's, that's General Mills, this one's Post, Quaker Oats. And this chart here, I guess, is the number of cereals per brand. We also have a distribution of ratings; there's some sort of rating in this data set. I don't know exactly what it measures (it looks like even the model doesn't know), but the model really honed in on these ratings. So we have a distribution of ratings, and then "the highest rated foods have low calories"; that chart obviously catches my eye a little bit. Just as a quick analysis: let's look at the first column right here, which is calories, so anything that's red will have a higher correlation with calories. The reddest thing I see is weight: the heavier the portion size, the more calories it has. That makes some sense. Sugars is also in here, and fat is another one. And calories are inversely correlated with fiber: the more fiber content, the lower the caloric content. Totally makes sense. All right.
So, I think it's really super cool that this image was made available to us, and I want to complete the demonstration by showing how I actually implemented that. We're going to go back into the code and talk through it. Going back into the server: I hope I haven't missed anything, and it's all open source, but where the server returns this image, I'm doing some specific things. Down here, FastMCP has some utilities for images, including this Image class, which I've imported as FMImage; it's like a helper. If I look at where I'm using it, this is where I'm encoding the figure. I take in a figure, which is a matplotlib object at this point; if I come up, it's "from matplotlib.figure import Figure". So I bring that figure into this function, and this is where I return this ImageContent object right here. It's from mcp.types; it's a type that MCP expects, in the sense that clients anticipate you might give them an image, and then we do give them an image. So this encode-figure function returns ImageContent, and what I do is return that from the tool call. So we are returning an MCP type of ImageContent, and the way I'm actually doing that is by bundling it up with this FMImage helper from FastMCP. I do it in a very straightforward way: we have this buffer of bytes, and we write the file to that buffer. So instead of calling savefig or anything else that interacts with the file system, we can use BytesIO: we pretend the buffer is a file, something we can write to, and save the figure to that object. Then we get the binary data of this file-like object (called buf) as data; this is a bytes object, and we can pass it into FMImage, explain that it's a PNG, and call its to-image-content method, which returns ImageContent, like our MCP client expects. Then we can return it. And just to show you, if I come up, this is all part of the tool call. So the final result of this tool call is to return an actual image, which to me is the right way to think about this. And it's super cool that VS Code Insiders with Copilot has actually implemented that, so I can literally download this file and look at it. And the last thing I'll
also show is the MCP server's sampling log. If I list servers, look at this entry and say "show sampling requests": now I can see two requests in this file, two requests in the last seven days. Okay. And at some point, if I scroll down (I guess they just append these together), here's the second request, and... okay, it's still using GPT-4.1. So when I ran it a second time, it used the same model; it didn't pick up the model change. I think in order to do that, I would need to shut down VS Code and respawn it. But I do like that this demonstrated we didn't actually need to update the model, because GPT-4.1 is perfectly capable. It's just that, you know, sometimes it goes wrong. That's the end
of the demonstration, and if you enjoyed this video, please give me a like and consider subscribing. By the time you see this video, I think more MCP clients will have become available that support sampling and the other features I'm talking about in my MCP series of videos, so that's just something to keep in mind. If you want to access that playlist, you can see it in the video description. Otherwise, I'm going to link a few related videos, things I spoke about in this one. In particular, I will link you to a video on either elicitation or pagination, and another one on fast-agent, where I talk about how to use it in more detail than I went into in this video. Thank you again for watching, and namaste.
AI ENGINEER ROADMAP [ 🚀 learn AI Engineering in 2025 ] ► https://zazencodes.com/
NEWSLETTER [ 🍰 weekly video email ] ► https://zazencodes.com/newsletter
MCP TUTORIAL PLAYLIST ► https://www.youtube.com/playlist?list=PLTPHo6vRHQ8rw8dRK1kdqzULzGB6YlxTa
CODE DEMO [ ⭐ source code for this video ] ► https://github.com/zazencodes/zazencodes-season-2/tree/main/src/mcp-sampling
0:18 - How MCP Sampling Works
1:46 - MCP Sampling with FastMCP
4:34 - MCP Sampling Python Demo
9:53 - MCP Sampling VS Code Demo
15:55 - Image I/O with FastMCP