The Vercel AI SDK is a TypeScript-first toolkit for building AI features.
It streamlines text generation,
embeddings, and structured outputs. In
this course, you will learn how to use
the Vercel AI SDK to create and ship a
customer support agent that makes
autonomous decisions to either answer
questions based on your support docs or
search the web in real time. Mayo from Scrimba developed this course.
Welcome to the course on building a
customer support AI agent with the Vercel AI SDK and OpenAI. Now, over the past
couple of years, we've seen how AI has
transformed how we work. And one of the
most exciting recent developments is the
rise of AI agents. These allow the large language model to not just be static, but to trigger different actions and make more autonomous decisions, thereby giving us more productive
results. In this course, you're going to
dive deep into the various strategies to
help you build a production ready AI
agent. We will be focusing on the use
case of customer support to make this
learning process easier for you. Now,
we've seen the use case of AI as a
chatbot where you ask a question and you
get a quick response. But as I said, AI
can go much further now with these
agentic capabilities because of the
ability to take action and call APIs and
get into databases and do things like
booking hotels online, interacting with
websites, debugging code, performing
deep research. And so now businesses are
able to use these AI agents to automate
complex tasks, synthesize information,
and deliver long-form insights. Now,
one of the most popular and powerful use
cases is customer support. Why? Because
customer support is at the heart of
every company. This is where complaints,
bugs, and issues surface, and solving
them well is essential to keeping
customers loyal and happy. AI agents can
make a huge difference here because they
can first of all help to escalate and
resolve issues quickly. Second of all,
they automate responses to basic
queries. So the human experts only focus
their valuable time on the complex cases
that the model cannot confidently
resolve. Thirdly, they help to
personalize support by retrieving
details from past conversations or user
profiles. Again, this is very useful for
customer retention. And finally, they
can help the company analyze customer
interactions, uncover trends, problems,
and even new product ideas. So in this
course you will learn how to build
exactly this type of AI agent that
doesn't just generate text but
intelligently routes queries, retrieves real-time information, and reduces hallucination. So first, you will learn
how to create logic so that the large
language model will intelligently
classify and route the user's query
based on their intent. Secondly, you
will learn how to build this AI agent
that doesn't just know when, but
also how to retrieve relevant real-time
information. You will learn the basics
of using the popular Vercel AI SDK, which
will help you to simplify your workflow
and speed up your development process.
You'll also learn the integration of
retrieval augmented generation for more
accurate answers. And building this customer support AI agent will, in the end, leave you with an application that doesn't just do Q&A over documents but can also perform web search and retrieve real-time information from the web. And
so by the end of this course, you will
have a working customer support AI agent
that is a powerful real world
application with actual real world use
cases to dramatically improve the
support experience for the users of
whatever applications you're working on.
Now this is what it's going to look
like. Here we are asking a basic question: how do I join the Scrimba Discord? And it's going to be trained on Scrimba's help documentation.
The results are shown below where the
question is asked and the user is able
to view the sources, so they can see which documents or specific pages the answer was retrieved from. And
likewise, web search showing the other
side of the autonomous decision the
agent can make where the agent is able
to go to the web to retrieve information
because it identifies that the answer is not available in the customer support docs. Now, before we dive in, here's what you should already know. First of all,
you should know how to use the OpenAI
API, how to create an account, retrieve
the API key, and just work with APIs in
general. Secondly, you should be
comfortable with just crafting basic
prompts. If you've done it in the ChatGPT interface, that is fine. Third, you
should be comfortable with the concept
of retrieval augmented generation. Now,
we are going to go through a brief
refresher in this course, but I highly
recommend the Scrimba course if you are a complete beginner at what retrieval
augmented generation is. Fourth, the
implementation of basic function calls.
You should be familiar with the idea of
function calling. And fifth, creating
basic Express servers for API routes. Again, Scrimba provides a course that
you can go through as a refresher. But
don't worry if you're not an expert.
I will walk you through the essentials
as we go along in this course. Now, this
is just a brief warning. This course is
quite advanced. Okay? So the code is
fairly complex and the challenges I've
set are really going to make you think
and work. So at the end of each scrim,
make sure you take time to play with the
code I have presented and really make
sure you understand it before moving on
to the subsequent challenge. With that
being said, my name is Mayo Oshin. I'm going to be your teacher in this course. I'm an AI engineering educator, and I'm also the co-author of Learning LangChain, published by O'Reilly. If you want to get my thoughts on technology, AI, and entrepreneurship, you can follow me, Mayo Oshin. With that being said, if
you're ready to start building powerful
AI agents that can make a real impact,
let's get started.
As discussed earlier, you should already
have a basic idea of how retrieval
augmented generation works and how it's
used to create this question answer
dynamic between a large language model
and a user. Let's quickly recap. So what is RAG? RAG is simply a mechanism where
we store information somewhere. This
could be a database or the web. And the
user asks a question. We retrieve the
most relevant pieces of information,
pass it to the large language model's context, and the large language model is able to give a final answer that's
contextually relevant. Now, why is this
important? This is important because
large language models are limited to
data they're trained on. So, if you ask
a question about something outside its training data, it may hallucinate or say it
doesn't know. But with retrieval augmented generation, or RAG, we can provide fresh
contextual information so the model can
give accurate answers it otherwise
wouldn't know. So for example, if I ask
the question, who won the most recent FIFA Club World Cup? The large language model may not know, but with retrieval augmented generation, you can retrieve the fact that Chelsea Football Club beat Paris Saint-Germain, and then this is passed into the context of the model so it can generate
an accurate response. So where does this
external data come from? Well, this is
pulled from places like the web, databases, or local text files. And in this
course specifically, we'll be relying on
a vector database. This is a special
kind of database designed to work with
embeddings. Now, embeddings are simply
numerical representations of words,
phrases, or even images. So if you plot them in this multi-dimensional space, as you can see in the diagram, semantically similar items will end up closer
together. And when you ask a question, what's actually happening under the hood is that we convert the text into these embeddings, then we compare them against one another in the database, and the closest ones are retrieved as context and passed along so the model can use them. Now, for the sake of this
course, we are going to be using
OpenAI's latest embedding model, text-embedding-3-small. It's fast, it's
accurate, and it's perfect for tasks
like search and recommendations, and in our
case, retrieving support documents
relevant to answer the user's question.
So, here's what to do. First of all,
visit openai.com.
Create an account if you haven't done
so. Get an API key and then go back to Scrimba. Now, when you go back to Scrimba, in the bottom right corner, click on the small gear icon and add a new entry for your OpenAI API key. This
is where you're going to paste the API
key and save. So, at this point, you
should understand what RAG is, why it's important, have a basic grasp of embeddings, and have your OpenAI key
ready to go. So, with that being said,
let's move on to the next lesson where
we'll run our first function to generate
embeddings and see what they look like
in action.
So, now we are in the code editor. I've
already pre-installed the relevant
libraries. At this point, you should
have already saved your API keys in the
environment settings at the bottom right.
And so if you go to the config, we've
essentially imported this OpenAI client
from the OpenAI library. Everything's
going to be pre-installed when you press
run. So it would install everything in
package.json, which is just essentially
this OpenAI library. And then it will
attempt to run start as well, which will
trigger this file. So in the config, it
checks if you've saved the environment
variable, the API keys, and if you
haven't, it's going to throw an error.
We create the client and then we import
into the main file. Now, what we have here is OpenAI's embeddings function, which allows you to create an embedding based on input text. All we're doing is passing the text we want converted to an embedding as the input, and then we console.log the results we get back. So if I click run and open the terminal, we get the results. So this is the programmatic
version of what embeddings look like.
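For reference, the call being described here looks roughly like this. It's a sketch, and the way the client is created and imported from the config file is an assumption; the scrim's layout may differ.

```js
// Sketch of the embeddings call described in this lesson.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const embedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "The quick brown fox jumps over the lazy dog.",
});

// Log the full response object returned by the API.
console.log(embedding);
```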
Essentially, embedding.data gives you an
object that includes these different
values. And it's important to see how all these
unique numbers have been assigned to
this block of text. And in the case of
this particular embeddings model, there
are 1,536 dimensions. So here you can see the rows continue on and on, all the way down until you hit 1,536
different items in the array. And so
this is essentially what the embeddings
look like for this block of text. Now
obviously, if I change the text (and whenever you make a change, just press Ctrl+S or Cmd+S to save the changes). Click run
and you'll see a different group of
numbers are generated because the text
is different. And of course, if you eventually want to access just the embedding itself, which we will later on so we can pass it somewhere else, you essentially need to grab the first item of the data array (there is only one object inside) and then its embedding property: embedding.data[0].embedding. So if I save with Ctrl+S, that should give us the value. There we go. So now we have the full array of embedding values, and we can assign this to a variable and pass it somewhere else. So you can play
around here changing the different text
just to have a feel for it. This is
essentially the function that creates
the embeddings. See you in the next
lesson.
Okay, so as discussed in previous
lessons, a big part of retrieval augmented generation is the ability to store data
that you need somewhere else. Data that
will be relevant to answering the user's
question. And at runtime, we convert the
question into embeddings. We retrieve by
comparing the embeddings to the
embeddings of stored data in a database.
and then we pull the relevant documents
back. In this course, we are going to
use Supabase as the vector store, or vector database, that will store the embeddings of the documents you eventually want to retrieve based on the user's question. So, Supabase is essentially an open-source Postgres database, but most importantly, it has the pgvector extension. This is just an extension that allows us to work with embeddings at scale. Go to supabase.com. Once you're on supabase.com, follow the instructions
to sign up and create an account. And
then it's going to prompt you to create
a new organization. Just create a free
plan, personal, and just give your
organization a name. Once you give your
organization a name, you want to go in
the middle of the web page, click on
create new project, and then you'll be
given an interface similar to this.
Choose a region, such as East US or wherever is closest to you. Give a
database password you can remember and a
project name. Maybe just call it
Scrimba customer support AI agent. Once
you've created the project and you've
created the organization, you're going
to see an interface like this. And what
you want to do is scroll all the way
down to the bottom left. Click on the
gear icon so you can go to the settings.
When you're in the settings under
project settings, move your mouse all
the way down to API keys. Click on API
keys. And then you're going to see this
interface with what's known as your public key, which is safe to use in the browser, and your service role key, which essentially allows you to bypass row-level security and access any data in your database. For the sake of this course, we're going to take both of them. So, make sure to copy both the public key and the service role key. So,
you click reveal and also copy that as
well. Once you have both of these
values, you want to come back, click on
the gear icon in the bottom right of
your scrim, and you're going to get the
interface here. You've already saved your OpenAI API key. Now save your Supabase project URL as the value for the Supabase URL entry, and then for the Supabase service key entry, place the value of the service role key. So at this point you've set up the Supabase URL and the Supabase service role key. Now let's jump into the actual
code. So now at this point you should
have your OpenAI API key and your Supabase service role key. And so this will allow you to create the Supabase client so you can access the database.
As you can see, we've stored some dummy
text and this text can be replaced with
your own documents. But for the sake of
demonstration, I'm just showing how we
can take this, embed it, and then store
it somewhere that we can retrieve later
on. So, different topics, Magna Carta,
Apollo, the history of computing, and so
on and so forth. Here we have the logic
for upserting the information from this text into the vector store. And here we
have the main function. Now before we
proceed, you need to go to
created_tables.sql
and copy this SQL command which will
help you to enable PG vector extension
and also create a documents table. You
will pass in the content, any metadata
and the embeddings associated with the
content. Remember the model we're using
is 1536 dimensions. So that is what we
are going to pass in here. So copy the command and go back to your Supabase UI. Now, previously you were in the
settings section. Click on this icon and
this will open the SQL editor. When the
editor is open it's going to have a
button for you to create a new command
essentially and just paste what you just
copied in here and then click run. So
once you've run it inside the Supabase editor, it's going to create a table and
it's going to create this extension. And
so you will be able to see a table like
this. Now, you will initially have an
empty table until we finish this
exercise here. But you should see a
documents table here. Once you've completed that task, and before you run the command here, I'll just walk through what's going on. Essentially, your
source documents are in this directory.
So we're going to fetch it. We're using text-embedding-3-small as the model.
Your table name is documents as per what
we stored here. Here is an option. It's
just a toggle I created that allows you
each time you run this script to delete
all the contents of your database. It's just good for the sake of
demonstration because then each time we
rerun this or you rerun this, you don't
have to think about merging content from
a previous run. If you turn this to
false, then every new run will just add
new content to the documents table
without removing previous content. So
all we're doing here is going into the
path, fetching the text, and what we're
going to do is embed the file content.
So you remember in the previous lessons,
we discussed using the embeddings
function to create embeddings based on
content. So now we're just simply
repeating that exercise except we just
fetch the file content from the text
file. We pass it here as input and then
we create objects that have the content alongside the embeddings and metadata, which includes the source file name. And so this is
very useful because when you later on
retrieve the relevant documents, you're
not just retrieving the content, you're
also retrieving associated metadata
which you can then use to basically see
where the information came from. So once we're done creating the embeddings, all we're doing down here is using the Supabase client to point to the table and run an insert command, inserting all the collected documents, and that's it essentially. So if you go to index and you've set everything up with your API keys, all you need to do is click run. As you can see in the terminal, we ingest with the source as text.txt, and you can see the embedding attached. We've uploaded one document to the Supabase documents table.
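For reference, the core of this ingestion step looks roughly like the sketch below. The environment variable names and table columns follow the lesson's description, but the file path and helper structure are assumptions.

```js
import { createClient } from "@supabase/supabase-js";
import OpenAI from "openai";
import { readFile } from "node:fs/promises";

const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_SERVICE_KEY);
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Read the source document and embed its full content.
const content = await readFile("documents/text.txt", "utf8");
const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: content,
});

// Insert one row with the content, its metadata, and the embedding vector.
const { error } = await supabase.from("documents").insert([
  {
    content,
    metadata: { source: "text.txt" },
    embedding: response.data[0].embedding,
  },
]);
if (error) console.error(error);
```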
So then check your Supabase documents table and make sure this text is there. You should see a row that contains this text as the content, metadata that should be source: text.txt, and then the
embeddings. See you in the next lesson.
All right. So based on the previous
lessons, you should have already been
able to create embeddings, understand
what the role of the vector store is. We will visit splitting shortly in another lesson, but essentially we've taken the documents, embedded them, and stored them in Supabase. So you should
have already done that in the previous
lesson and the document should be in
your documents table in Supabase. Now
in this lesson, we're going to go over
this step, retrieval. The user asks a
question, we convert the user's question
into embeddings. We go into the vector store, which is going to perform its similarity calculations to see which embeddings are most similar to the user's query. We
retrieve the relevant portions and then
we pass the prompt, the query the user
provided plus the context of relevant
docs in a final prompt that's sent to
the model to generate an answer. So this
is essentially what we're going to work
on in this lesson. So jumping into the
code, I will just walk through the
changes we've made. We still have the
documents folder in config. We still
have superbase client and OpenAI. We've
added a few things in the utils which
I'll come to shortly. But one of them is
the prompt that's going to be sent to
the model which is going to pass in the
user's question alongside the context.
This context is what we are essentially
trying to retrieve from the vector store
based on the user's question. So
essentially, to take away the magic, this is what we're going to send to the model to generate an answer. And so our job is to solve for this equation, because we already know what the query is and we already know what the rest of the prompt is. We just need to fetch the context.
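A rough sketch of what a prompt helper like the one described here might look like; the exact wording in the course's utils file may differ.

```js
// Hypothetical RAG prompt template: the query is known, and the context is
// whatever we retrieve from the vector store.
function getRagPrompt(query, context) {
  return `Answer the user's question using only the context below.
If the answer is not in the context, say you don't know.

Context:
${context}

Question: ${query}`;
}
```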
And so we have this new logic called retrieveSimilarDocs that will essentially help us retrieve the relevant documents from the vector store.
But before we do that, please go to
created_tables.sql.
And now I have new logic. Now in the
previous lesson, you went to the Supabase editor to create the documents table. This time, just copy this whole thing, go back to the editor, and run it again. What is this doing? Essentially, this is a database function that is going to help us find the relevant embeddings that are most similar to the
user's query. So you have other
variables here which we'll be able to
pass in later on. We'll be able to say
okay how many results should we return?
What is the threshold? Right? So these embedding similarities fall between 0 and 1, where 1 means the two embeddings are very similar to each other and 0 means they're dissimilar. So maybe you don't want to pass
in too much polluted content into the
models context. So you can adjust your
match threshold. I've set 0.3 as
default. And then here is just an
optional filter that allows you to maybe
filter based on metadata. And this is
what you're going to get back from this
match documents function. Cosine similarity is essentially the calculation I discussed that is performed to see which embeddings are as close as possible to the user's query. Remember, the user's query is going to be converted to an embedding.
We're going to the vector store to also
see which embeddings are similar to the
user's query. So essentially this is a
database function that once you've
created it, we can then create this
function here. So Superbase allows for
the ability to make remote procedure
calls to database functions. Remember
you created this match documents
database function. So we can make a
remote procedure call here and then we
can pass in the query embedding as we
defined here the match count, match
threshold and filter. In this case, I'm
only going to pass the query embedding
and the match count. And to get the query embedding, it's just what we've covered previously: you're creating an embedding based on the query passed in as a parameter. So the query comes in, we generate the embedding response, and then we pass the embedding into the remote procedure call to trigger the database function that was created. And in the constants file, we've got a fixed variable to fetch the five most similar documents.
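Put together, the retrieval helper being described looks roughly like this sketch. The RPC parameter names follow the match_documents function described above, while the client setup and constant names are assumptions.

```js
import OpenAI from "openai";
import { createClient } from "@supabase/supabase-js";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_SERVICE_KEY);

const EMBEDDING_MODEL = "text-embedding-3-small";
const MATCH_COUNT = 5; // fetch the top five similar documents

async function retrieveSimilarDocs(query) {
  // Embed the user's query with the same model used at ingestion time.
  const embeddingResponse = await openai.embeddings.create({
    model: EMBEDDING_MODEL,
    input: query,
  });
  const queryEmbedding = embeddingResponse.data[0].embedding;

  // Remote procedure call to the match_documents database function.
  const { data: documents, error } = await supabase.rpc("match_documents", {
    query_embedding: queryEmbedding,
    match_count: MATCH_COUNT,
  });
  if (error) throw error;

  return documents; // [{ id, content, metadata, similarity }, ...]
}
```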
Now, we have also defined a new variable called the answer model. Why? Because in the previous lessons, all we were doing was retrieving the relevant documents; we didn't do anything with them. This time, we're going to take an extra step. So if
you come back here, if you just focus on
retrieving the documents which is
calling the retrieve similar docs based
on this question which is covered in
this document here, this text.txt. If I
click run, what we're going to see in
the terminal is we should see that we
fetch the content and then this is what
we expect. We expect an ID. We expect
the content which will be the entire
document because remember we embedded
the entire text. We did not split the
text into chunks. So there's only one
block of text. And then we've got the
metadata. And this is the cosine similarity I discussed previously. As you can see, it's roughly 0.5, or 50/50, in terms of how similar it is to the user's
question. And that's because this text
contains both the relevant information
to answer the question which is right
here in 1843 but it also contains a
bunch of other irrelevant information
that has nothing to do with the
question. So that's why the similarity
score is relatively low. So now we've
retrieved the docs. The next step is to
essentially create the prompt. And as I
discussed previously, we are going to
take these relevant docs, access the
content property. That's what this
function is doing. So we get the full
string and then we pass in the context
and the query to construct this final
prompt that the model is going to use.
Remember, every time you make changes,
just control S or command S. Once we've
created the prompt, we are now going to
send the prompt to the model and then
console.log the response from the model.
So now we are going to press run just to
generate the output. And so we're going
to send the entire prompt to the model.
Here we go: in 1843, Ada Lovelace published notes describing algorithms; she's often called the first computer programmer. So that is essentially coming from this block here. It's providing an answer based on what it saw in the text.
And this is very different to just
simply retrieving the relevant docs.
You're now able to go into the vector store, retrieve relevant information,
pass it as context in the final prompt,
send the prompt to the model, and
generate a final answer. So that is it
in a nutshell. You can play around with
this and when you're ready, I'll see you
in the next lesson where we're going to
discuss how to split this into chunks.
So you can have more relevant chunks of
text, embed those chunks and retrieve
the relevant chunks as opposed to in
this case we just retrieve one big block
of text. But before we do that, let's
have a challenge.
So at this point, you should have
embeddings of documents stored in your
Superbase. And in this lesson, we'll be
doing a short exercise where you're
going to recreate building the retrieval
mechanism we've discussed in previous
lessons. You take a user's query as we
have here. You embed that query using
OpenAI's embeddings function down here.
Then you retrieve the relevant docs from
Superbase calling the RPC match
documents function. And then you return
the relevant docs. When you have the
relevant retrieved documents, we already automatically combine them into a single string. You create the prompt using the
get rag prompt function from here. And
then you pass in that prompt into
OpenAI's model to generate a final
response. And then you can see the
relevant constants here, including the model, the similarity match count, and the embeddings model, all ready to be used in the functions provided. So in
the end when you run this function you
should be able to see the final
generated answer based on the user's
question. Now two things for
housekeeping here. Number one, make sure
when you're making changes, you press
command S if you are on Mac or control S
if you're on Windows to save your
changes. And the second thing is make
sure the question can be answered based
on the context in the documents that
you've stored in your Superbase. So for
example, if in Supabase you've stored content about the Great Fire of London from the previous lesson, then just keep this. If, however, you don't have that and you have something else from a prior lesson, just make sure you change the query so that you can fetch relevant documents from the vector store. Okay, with that being said, go
ahead with the exercise and I will come
back to provide a solution shortly. Good
luck.
Hopefully you didn't find that too
challenging. Let us start down here by
first of all getting the embeddings
response. Remember OpenAI provides
embeddings function and all we're doing
here is passing the embeddings model
name and we've got the input which is
the user's query. So done and dusted. We
just now need to access the embedding from the response; that should be data[0].embedding. So now we should have the embedding corresponding to the user's query. Down here, you just want to get the documents and the match error from calling supabase.rpc, and remember the database function is called match_documents. And then we want to pass in the parameters: first, the embedding of the user's query, and then the match count, how many relevant docs to return, which is limited to five as was set previously. Okay. So now we have the relevant docs, and we want to return the relevant documents similar to the user's
query. Now further up here we want to
take the retrieved docs; this is the context. Now we want to pass in the query and the retrieved docs to construct the final prompt that will be used by the model, and then finally construct the response. So the AI response is going to come from calling OpenAI's responses.create, and we're going to pass in the model, which is gpt-4o in this case (you can change that later on).
The input is the prompt. Further down
here, we're just going to console.log response.output_text, and that should be it in a nutshell. And now, let's click run and see what we generate
in the terminal. Voila. The Great Fire
of London destroyed some 13,200 homes.
That's exactly what we expect based on
the relevant documents. We can see that
that information is contained here in
this chunk. Okay, fantastic. Well, great job so far. You've covered embeddings, a recap on using the vector store, retrieval, and generating answers using OpenAI. And now, in the next lesson, we're going to move on to exploring the Vercel AI SDK, a library that will help to greatly simplify and abstract key logic for building agentic apps. See you soon.
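For reference, the end-to-end flow from this solution looks roughly like the sketch below. The query text is just an example, and retrieveSimilarDocs and getRagPrompt stand in for the helpers described in these lessons.

```js
import OpenAI from "openai";
// retrieveSimilarDocs and getRagPrompt are the helpers sketched earlier
// in these lessons (names assumed).
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const query = "How many homes did the Great Fire of London destroy?";

// 1. Retrieve the most similar documents from Supabase.
const docs = await retrieveSimilarDocs(query);

// 2. Join their content into a single context string.
const context = docs.map((doc) => doc.content).join("\n\n");

// 3. Build the final prompt and send it to the model.
const prompt = getRagPrompt(query, context);
const aiResponse = await openai.responses.create({
  model: "gpt-4o",
  input: prompt,
});

console.log(aiResponse.output_text);
```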
Okay, so in the previous lesson you have
learned how to essentially construct a
retrieval mechanism that will take a
user's question, create some embeddings
of the question, go into the vector store and compare the user's embedding against what's in the vector store, retrieve relevant documents, create the prompt with that context, and send it to the model to generate an answer. Now we want to
explore briefly an intermediate step as
shown previously of text splitting. And
essentially what is this? Now imagine we
have this long document. This is just a
passage of historical information on the Great Fire of London in 1666.
Now, if you simply pass everything into
the vector store and embed all of this as
one chunk, what's going to happen is
when you ask a question, as we've seen
in previous situations, the vector store
is literally just going to return this
entire document. And this can be
problematic for various reasons. First
of all, maybe the user's question could only be answered by a particular paragraph. And so by giving the
model the entire context of the entire
passage, you're essentially increasing
the odds that it misses out on crucial
details that may be required to answer
the question. It could be as simple as
this sentence right here, but the model
might get confused or conflate it with all
the other information around that text.
The second challenge with simply just
grabbing the entire block of text is
that your model might not be able to
handle the full context window. As we know, models are limited by a context window, the number of tokens or characters they can take in for any generation. Now, obviously, in this case this is a relatively small passage or chunk of text, but you could have, for example, a PDF of hundreds of pages and hundreds of thousands of characters, and you would not be able to pass everything into the same context window. So in
order to overcome this issue, we need to
find a way to essentially split our
document into various chunks, embed each
chunk and store each of them in separate
rows in the table in Supabase. So the
only adjustment or the main adjustment
I've made here is to introduce this new function, which is a very basic text splitter. It's going to take the text, a chunk size you provide (how many characters to split the text into), and an overlap (how many characters should overlap between each chunk). This is very basic and only for demonstration purposes. As a matter of fact, for most of this course we are not going to use text splitting, but it is important for you to understand how it works and why it matters.
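A very basic character splitter in the spirit of the one described above might look like this; the exact implementation in the scrim may differ.

```js
// Split text into chunks of `chunkSize` characters, with `overlap`
// characters shared between consecutive chunks.
function splitText(text, chunkSize = 2000, overlap = 100) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap;
  }
  return chunks;
}
```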
So once we've created this text-splitting utility function, the upsert documents logic has been updated. So if
you scroll further down, you'll see that
instead of just simply fetching the entire text and embedding that, we first split the document into chunks, and we have stored in the constants a chunk size of 2,000 characters, an overlap of 100, a similarity count of five, and a threshold of 0.5. And so when we upsert this
document, we are essentially going to
chunk. Then after we create the chunks,
we are then going to embed each chunk.
So the only difference between this
example and the previous time was
previously we just took the entire
document and we embedded it. And this
time we are taking each chunk and we're
embedding each chunk and we are adding a
row for each chunk. So essentially, you will end up with a table with multiple rows, each row representing up to 2,000 characters of the block of text and the associated embeddings. So I'm going to
run this function now. Go to your
index.js
and you can see the question how many
houses were damaged, and it's a very specific question about a corpus of text whose answer can only be found in one sentence.
So, we need precision. And so, this is a
good use case for splitting. So, I'm
going to click run in the top right.
Let's see what we find. And you can see
we've now embedded, we read 5,988
characters. We split into four chunks.
And the different chunks were embedded
into the vector store. Okay. So, now you
have four rows. You can verify on your end that, instead of that one row with one block of text, you now have different rows, one for each chunk. So I'm going to uncomment this whole block here. This is just a
repeat of everything we learned in the
retrieval lesson: we're going to go into the vector store and pass in the user's query. But you're going to notice something
different this time. Actually before I
generate this, let me just demonstrate
what I mean. So now we're going to run
again. And notice that we have fetched
more than one chunk of text.
Right? In the previous lessons, you saw
how it was just one block. But now you
can see it's an array of multiple
objects and each object represents a
chunk of text that was embedded. And
each chunk has a similarity score
assigned to it saying how relevant it is
to the question. And you can see that
based on the query provided, we said how
many houses were damaged during the
great fire of London. And so you can see
that this chunk already has the answer
to the question because the answer to
the question is 13,200 houses. And so
this chunk has already succeeded in
fetching the relevant portion. So
imagine you had thousands and thousands
and thousands and thousands of
characters or words and what the
embeddings approach has done is allowed
you to just quickly retrieve this small
chunk. So we can precisely provide this
context to the model. You can see the similarity scores are all above the match threshold that was set in the constants file. So you can see we have an array of different retrieved docs. Okay. Now we
want to just complete the remaining
steps as we did before. So just
uncomment this, and essentially we're creating the prompt with the chunks of text retrieved, which is all of this text combined together as the context string. We then essentially pass
in this prompt to the model and generate
response. So let's click on run and see
what happens. So now we're running and
waiting for the response. And according
to the context provided, approximately
13,200 houses were destroyed. And that's
exactly what we expected based on the
information here in this sentence right
here or paragraph rather. So now you've
seen how we've been able to basically take a relatively large corpus of text, split it into chunks, embed those
chunks, and now we're able to have more
precise answers to the question. So I
hope this gives a good overview on
dealing with inserting data into your
database as embeddings, retrieving them,
and answering questions. In the next
lesson, we're going to explore the Vercel AI SDK and discuss how it can streamline and make it much easier to build agentic apps.
So, as discussed earlier in this course,
we would later on interact with the Vercel AI SDK. Now, what is the Vercel AI SDK? It is essentially an open-source library by Vercel, the same team behind Next.js, that provides a unified interface to interact with LLM providers. You can plug and play
different providers. With the same
interface, you can access very useful
functions that are used for building AI
apps and agents for generating text,
generating embeddings, generating
structured output, generating different
steps that an agent can take, and it
just simplifies the entire process. So
it's a very useful library for us to use
especially in this course. For example,
you have a very simple generate text
interface that you can import from the
SDK. You provide your model and a
prompt. That's it. If I want to swap
OpenAI with Claude or any other model,
it's very easy to do so as opposed to
going through each of their
documentations and finding their own
different styles of generating an AI
response. The same thing for embeddings.
You can simply import embed from the Vercel AI SDK and use that unified interface.
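Here is a minimal sketch of those two unified interfaces, using the AI SDK with its OpenAI provider; the prompt and input text are just examples.

```js
import { generateText, embed } from "ai";
import { openai } from "@ai-sdk/openai";

// Text generation through the unified interface.
const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Write a brief poem about the sea.",
});
console.log(text);

// Embeddings through the unified interface.
const { embedding } = await embed({
  model: openai.textEmbeddingModel("text-embedding-3-small"),
  value: "Hello world",
});
console.log(embedding.length); // 1536 dimensions for this model
```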
It doesn't matter what the model is, you
can still get the same outcome. So now
if we jump into the code, the config is
still the same package.json, we've
installed the AI SDK both here and here.
And then finally, we have imported the
OpenAI model that we're going to use as
our AI model. So the key difference now
between this and previous lessons is you
are not using OpenAI's interface
anymore. You're using the generic
interfaces provided by the AI SDK. For
example, here we want to generate text.
We provided the OpenAI model and the
prompt is to write a brief poem. So if I run
this, we should see a brief poem that's
been generated. The same applies to
embedding. So let me comment this out
and uncomment this. To generate
an embedding, you import the embed function from the ai package. You call openai.textEmbeddingModel and pass in the embedding model name. The value is just
whatever you want to embed. So this text
will be converted to embeddings. I'll
click run again and we see the text has
been converted to embeddings. You will
see we have a full 1536 items
representing the embedding for the text.
So all we've done essentially is taken
the same concepts but instead of hard
coding it to OpenAI's way of doing it,
we have these generic interfaces. And
this is going to be very useful for us
moving forward as we begin to build AI
agents that may potentially want to use
one or more models depending on the
strengths and capabilities each of them
have. So, with the basics covered, I'll see you in the next lesson, where we dive deeper into the Vercel AI SDK.
All right. In this lesson, we're just
going to do a brief exercise to recap
the basics of using the Vercel AI SDK. The challenge is that you are going to essentially replicate the logic from before, but with a twist. So the goal
is to generate text and embeddings using
the Vercel AI SDK. You are going to
implement the generate text interface in
here. Pass in a prompt asking the model to create a recipe for your
favorite meal. And then you're going to
return the generated text from the model
that is the recipe. Afterwards, you're going to have that value assigned to textToEmbed. And textToEmbed is
going to be passed into the generate
embeddings function which is ready to
take that text. Then we're going to
embed that text using the embed
interface. So that's the objective of
this challenge. I'm going to give you a
couple of minutes to do this. Go ahead, and once you're done, I will provide a solution.
All right, hopefully you didn't find that
too challenging. So let's kick this off.
So the first thing here if you remember
from the previous lesson is we want to
create an interface using the Vercel AI SDK's generateText. So we're going to have the text extracted from await generateText, and inside here we are going to pass in the model, which is the AI model already defined up here (gpt-4o), and then the prompt: create a recipe for making pizza. Whatever your favorite dish is, the idea is to use that; so, say, pepperoni pizza. Okay, so now we're going to send
this off to the model. It's going to
give us back the text and I'm going to
console.log and say generated text which
we'll pass in here and then finally
return the text. Okay, so that way you
can uncomment this, and now textToEmbed should be the value of the text. So
that's phase one and then phase two is
use the embed interface to generate the embedding. We will already have textToEmbed passed in as a parameter into this function. So all we have to do is extract the embedding from await embed, where the model is openai.textEmbeddingModel with the embedding model name passed in, and the value is going to be textToEmbed. So this is what you want to embed, which we've passed in,
which is the generated recipe of your
favorite dish. And then here we're just
going to console.log the embedding
generated. Okay. So now we're going to
run and see what happens. We can see the generated text: here's a simple recipe for making delicious homemade pizza, with pizza dough, sugar, olive oil, salt, pizza toppings, and so on, and it's got instructions. We
take all of this text and we generate
the embeddings from the text and that's
it in a nutshell. So now you've got the
basics of how to use the Vercel AI SDK to
generate text and generate embeddings.
In the previous lesson we discussed
briefly the basics of using the Vercel AI SDK to
generate text and generate embeddings.
But one of the more useful use cases
especially when building agentic AI apps, is the ability to
generate structured output. You give the model instructions to give you back structured output, which you can then use for something else. Maybe pass it into
another function that will call an API to retrieve data, or pass it into another model call in a chain so it can do something
else. Now, one of the useful interfaces
that the Vercel AI SDK provides to do this is
the generate object interface. This is
an interface that allows us to instruct
the model to generate structured output
in JSON format based on a predefined
schema. By schema, we simply mean you're defining the properties you want and their data types: do you want a string, an array, an object, and so on. And so
for example down here we have a function
called basic structured output using
this generate object interface to
essentially instruct the model to
provide a structure JSON output of a
recipe for pizza with the schema of
object that contains name ingredients
and the steps for creating the recipe
and each property is defined by a data type: name is a string, ingredients is an array of objects with a name and an amount, and so on and so forth. And we're using this z, which is imported from a library called Zod. Zod is a popular library used to define schemas and structured interfaces. So we're simply using Zod to help us define the schema. z.object() means you're creating a schema for an object; you're basically saying, I want an object. z.array() means you want an array, and z.array() with z.object() inside means an array of objects, with name and amount as strings. So when we tell the model to generate a pizza recipe, we expect back a structured JSON object that contains the name, ingredients, and steps.
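The recipe example being described looks roughly like this sketch; the prompt wording is an approximation of the one in the scrim.

```js
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const { object } = await generateObject({
  model: openai("gpt-4o"),
  // The schema the model's JSON output must conform to.
  schema: z.object({
    name: z.string(),
    ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
    steps: z.array(z.string()),
  }),
  prompt: "Generate a pizza recipe.",
});

console.log(object);
```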
So let's run this and see what happens. Okay, so
we're now running this. We've sent the
instruction to the model. Please
generate pizza recipe and give us back
structured output. And here we go: the name, which is a string, as we expected; the ingredients, which is an array of objects with name and amount as strings, exactly what we expected; and the steps, which is an array of strings. And
that's literally how you can instruct
the model to give you back a JSON in a
structured output. You can see how this
can be very useful to then instruct or
pass on to another step in your program
to invoke something else and create a
chain of prompts or a chain of actions
that's more agentic by nature. Another
use case for structured outputs is
classification. Now in this example, we
gave a plain instruction and left the
model to exercise some form of
creativity. But classification is
usually binary. So in this case, we want
the model to classify the customer
review using either positive or
negative. So if you pay close attention,
we're using an enum type. An enum is just a special data type that constrains a variable to a set of predefined constants. You can have two, three, or more enum values. But in this
case, we want the model to follow a schema and provide a JSON that contains the reason for why it's deciding positive or
negative and whether the type is
positive or negative. We also provide
describe. Describe is just a useful property added to the Zod schema to help guide the model to know what you're looking for. So here I'm
just further emphasizing that this is
the sentiment of the customer review.
And so again, we use the generate object interface and provide the model, the schema name, the description of your schema, the shape of your schema (what properties you expect back and the types of those properties), and finally the prompt.
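That classification call looks roughly like this sketch, assuming positive/negative labels as in the lesson; the review text here is just a stand-in.

```js
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const { object: sentiment } = await generateObject({
  model: openai("gpt-4o"),
  schemaName: "sentiment",
  schemaDescription: "Sentiment of a customer review",
  schema: z.object({
    reasoning: z.string().describe("Why this sentiment was chosen"),
    type: z.enum(["positive", "negative"]).describe("The sentiment of the customer review"),
  }),
  prompt: "Classify this customer review: 'The app worked exactly as I expected.'",
});

console.log(sentiment);
```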
Let's run this and see what happens. Okay, so here we go: the reasoning is that the statement indicates satisfaction as the app performed as anticipated, and the type is positive. And so we were able to
classify the text provided by giving the
model instructions using the generate
object interface to give us a structured
output and providing the enums of what
we expect. And so you can start to see
how this can be very useful for
classifying customer reviews,
classifying customer requests, customer
service. Structured outputs give us the ability to provide more structure so
that we can take and manipulate data
types to do something down the line. So,
in the next lesson, we're going to go
over a brief exercise so that you can
put this into action and begin to put
this in memory as well.
So, in the previous lesson, you learned
how to create basic structured outputs
using the Vercel AI SDK. In this lesson,
we're going to do a brief exercise just
to get you used to how the interface of
generate object works and how Zod works.
So, the first goal is to build a Zod
schema. You're going to place a simple
sandwich order. All the instructions are
down here, including what each property should be and what the data type of each of them should be, with the steps provided as well. Once you've
completed that for this function, you
want to uncomment here and run. After
you've run, the next exercise is to
build a Zod schema for classifying a
short message. Again, all the
instructions have been provided here.
how to classify a short user's message
based on what's being sent, what the
message is. So this is the message that
we are using and you will complete this
as well. So I'm going to give you a
couple minutes to work on this and once
you're done I will provide a solution.
Good luck.
So hopefully that wasn't too difficult
but we will go through everything slowly
right now. So let's start off with the
sandwich order. As you can see per the
instructions, you can see the different
data types as expected for the required
fields to fill into Z.Object which you
will then pass in as the sandwich
schema. So let us begin. The first thing
is the size. So we want size and we want
an enum. Remember, an enum is a constrained data type. So we can just
pass in small, medium and large. And you
can also provide a description just to
help the model as well. Overall size of
the sandwich. The next is bread. So as
we said up here, bread is a string. So
that's pretty straightforward: z.string() and describe. This is the type of bread.
Example, wheat, white, etc. The next
data type is toasted. That's a boolean
type. So toasted is z.boolean(), describe, and
this is whether the sandwich will be
toasted. We have toppings array of
strings. So the data type has been
provided to you here already. So you just pass in toppings as z.array of strings, and then .min(1), meaning at least one is provided. I think this is complaining because we didn't add a comma there. And describe: one or more toppings like tomato, potato, lettuce, pickles. And finally the notes. So it's
just an optional string. So you're going
to do Zstring. Then you're going to pass
in optional. So you're just letting know
it's optional. and optional free text
notes like cut in half. And so the
prompt is here as we have provided
previously for you. We have the sandwich
order, a simple sandwich order, the
schema has been provided and the
console.log.
So we're going to scroll up here,
uncomment this and run and see what
happens. Okay, so here's the structured
output returned by the model. We can see
the size is small. The bread is
sourdough as we have the different
choices here. Okay, so small sourdough bread; toasted is a boolean that says true; toppings, an array of strings, exactly what we wanted; and notes, which is cut in half. So that is the basic structured
output. Moving on to the next exercise
about classifying the short user
message and completing the message classification schema that has been provided to you here. So let's start off with
reasoning. So reasoning is a string and
let's say describe and this is a brief
explanation for why this label fits or
however you want to describe that. So, to classify the user's message, we're providing a reasoning, and then we're going to provide the label. The label is also an enum. Let's put the comma here so it stops complaining. And this enum is going to take values like compliment and complaint.
So we've got the reasoning and then
we've got the label and let's also
provide describe so that the model has a
better description of what we want to
achieve here. So: high-level category of the message. So this is the message schema
and then you got the message here. You
could also have a group of messages. So
you could have a loop for basically an
array of messages and then for each
message you can loop over and generate
the object for that particular message.
So now we have the generate object. Now
classify the user's message below. That's the prompt sent to the model. And now we
uncomment classification structured
output exercise and comment out the basic structured output exercise. And now
we're going to go up and run and we
should see the label is complaint. The
reason is the message expresses
dissatisfaction with the app due to
technical issue. So there we go. So
you've been able to implement two types
of use cases for structured outputs. one
where you're able to essentially provide
a prompt to the model to construct a
JSON with the properties and data types you want, and a second type which is more focused on constrained classification. So I hope this was useful; keep practicing,
and I'll see you in the next lesson.
So in the previous lessons we've covered
the basics of the Vercel AI SDK: generating text, generating embeddings, and structured outputs, and how you can effectively provide an interface that
allows us to instruct the model to
provide a JSON in the structure that we
want. But where is this all pointing to?
Where does this all go to? Well, it goes
to eventually building agents that are
able to execute what's known as tool
calls. Now what's a tool and what's tool
calling? If you've never used function
calling before, essentially tools are
just actions that the model can invoke.
So we take an instruction, we give it to
the model, the model provides some sort
of structured output and then we take
that structured output to invoke a
function. We pass that in as a parameter
of some sorts and then we're able to
invoke another function. And so the
results of these actions can then be
reported back to the model and then the
model can then generate a final
response. So, how does this work in a
nutshell? Let's use this basic weather
example to illustrate what's going on.
Remember, we've already covered using the generate object interface in the previous lesson for structured outputs. And so, what we want to do now is use the generate text interface provided by the Vercel AI SDK. What this expects is a model,
tools, and a prompt. Now, we've already
covered defining and constructing schemas in the previous
lesson. And so essentially we provide
the model which is already defined here.
Tools are the functions that you want to
invoke based on the structured outputs
the model generates. Remember in the
previous lesson we learned how to
provide a schema to the model and how
that schema would be generated as JSON.
Well, all of this logic is just
encapsulating defining the schema in
this tool. So this tool will contain
a description (you want to get the weather for a location) and the schema that you want the model to enforce: you want the model to provide the location as a string in an object. And we have this execute
function. This execute function is then
called passing in the location into the
parameter to invoke this function. The
results of this function will then be
displayed. Okay, I'm going to run this
because I know it might not be too clear
right now, but hopefully this example
will make it make sense. So remember,
all a tool is is a function or an action
that the model can invoke. It is
essentially a way of taking structured
outputs generated by the model and then
extracting that to then pass in as a
parameter as we've done here. So we can
invoke a function. So we have the tool
call here and the tool call has an ID
and the tool name is weather. Okay, we
are trying to get the weather of a
location. The user provided a prompt: what is the weather in New York? The model sees this prompt and creates a structured output with New York as the location. The location is then passed into the execute function, and what's returned is the location and a temperature that's just a random number, for demonstration. And so we can see here this is
the tool call which has the input as the
location New York because we provided a
schema as we learned in the previous
lesson and now we have the tool result.
This is the result of calling this
function. So all of this is happening
under the hood. First the model
generates the schema, then it's passed
in as a parameter. The functions invoked
and that's why we have the output as
location and temperature is this random
value. Okay, so that is the difference between a tool call, which is more of just the structured input, and the tool result, which includes the result of invoking the function. So you can imagine that this could be an API call to your database or any other place. The key point here is that the parameter used to invoke the function was a structured output generated by the model from the prompt that you provided.
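The single weather tool being described looks roughly like this sketch. The schema property is named inputSchema in current AI SDK releases (older ones call it parameters), and the temperature is a made-up random value, as in the demo.

```js
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const { toolCalls, toolResults } = await generateText({
  model: openai("gpt-4o"),
  tools: {
    weather: tool({
      description: "Get the weather for a location",
      inputSchema: z.object({ location: z.string() }),
      // Invoked with the structured output the model generated.
      execute: async ({ location }) => ({
        location,
        temperature: 50 + Math.floor(Math.random() * 30),
      }),
    }),
  },
  prompt: "What is the weather in New York?",
});

console.log(toolCalls);
console.log(toolResults);
```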
Okay, now what if you want to provide multiple tools? Sometimes, to get the results we
want it's not sufficient to just provide
one tool. So let's say in this case we
want a situation where we have two
tools. We want to first get the weather
in a location and then get the tourist
attractions in the location. Now all of
this should be familiar to you because
the schema is defined in a way familiar
from the structured outputs lesson. We
want to tell the model that we want an
object that contains city as a string.
We also want to execute this function with the city extracted from the input schema passed in as the parameter, and then we want to return these attractions. These are
hard-coded values. And so if I go
further up here, let's run. And when I
ask the question, what is the weather in
New York? And what are the best
attractions to visit? The model is going
to see this. First of all, the first
tool, which is the weather tool, is going to use its schema to extract New York as the location, pass that in here, and
return this value. For the second part, which is the best attractions to visit, the
model's going to see that and realize it
needs to invoke the city attractions
tool. Looking at the descriptions, it's
going to extract New York as the city
and it's going to return New York and
the attractions. Let's see what happens.
Okay, this time instead of one tool call
and one tool result, we see tool call
one which has the location New York and
tool call 2 which has city New York and
we also see the tool results for both of them. First we have the output location New York with temperature 43, and the second tool result is the city attractions tool result, which is New York and the values here. So you
see how we've gone from generating
structured outputs to now being able to
take those generated structured outputs
and be able to use them to invoke
functions so we can get back data. Now
of course all of this would not be
complete without a way to generate the
final response from the model. Remember
the whole point behind retrieval
augmented generation is the ability to
fetch context outside of the model's
training data so we can pass it back to
the model in the prompt and generate a
more context relevant answer to the
user's query. So the whole point behind
this is not to be left alone with these tool calls and tool results. Perhaps, like I said, you've called an API and retrieved the relevant information, but now we need to
pass that relevant information to the
model so we can generate a final answer
and so this is where this last step
comes in and versel AI SDK provides
these useful interfaces as we've seen
previously the generate text the tool
interface and so on and now we are able
to use what's called the stop when so As
it says here, by default, tool calls
will just return the results of
executing the function, which is nice,
but we need to take that to the model to
summarize the tool results. What the
stop when property does is it basically
tells the model to loop over itself
again and take x number of steps beyond
just generating the tool results. So
after the tool results are returned, we
can then tell the stop when to basically
take another step to provide the tool
results to the model and that would tell
it to generate the results. So the way
to think about stop when is just how
many times to invoke the model in the
process that we're currently running. So
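As a rough sketch, reusing the two tools from the earlier snippets (stepCountIs is the AI SDK 5 helper for this kind of stopping condition):

```js
import { generateText, stepCountIs } from "ai";

const result = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "What is the weather in New York? And what are the best attractions to visit?",
  tools: { getWeather, getCityAttractions },
  // Without this, the run ends once the tool results exist. Allowing up to three
  // steps lets the SDK feed the tool results back so the model writes a summary.
  stopWhen: stepCountIs(3),
});

console.log(result.text);  // the final, summarized answer
console.log(result.steps); // per-step tool calls and tool results
```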
Here we have three steps: in the first steps, the model creates the structured outputs and the execute functions are called, which we've seen; and the third step we want is for the model to summarize the results from the data returned by invoking these two tools. That's why the number three has been passed here, and then we have some extra logic to extract that. So let me run
this. And you can see we've essentially
extracted the tool results and then the
final result generated by the model is
what we can see here. So let's scroll back: we've logged the tool calls and, as expected, location New York and city New York. The model has seen the tool results and summarized them in a more human way: the current temperature is 72 degrees, and it lists the tourist attractions. So this is not too different from what we've learned already: storing embeddings, retrieving relevant documents, passing them as context, and having the model generate an answer. The difference this time is that by using tools we can provide custom functions. Those functions could call the vector store to retrieve embeddings of relevant documents, call an API, or call your database. It gives you a lot more flexibility in where you retrieve the context you pass to the model, and you can see how this lets you build more powerful, complex, multi-step agentic applications. So in the next
lesson, we're going to go over some
exercises to help you have a better
understanding of how tool calling works.
In the previous lesson, you learned the basics of using the Vercel AI SDK for tool calling. As I mentioned before, tool calling is essentially taking structured output from the model and invoking a function that you define based on the inputs constructed from that structured output. In this exercise, you're going to practice this. The goal is to implement a single tool that fetches a grocery item from some in-memory data, mimicking what would happen if you fetched the price from a database; then a situation where you have more than one tool, adding a delivery ETA tool as well; and then using stopWhen in the third case. For each of these, you
have all the comments. So you would
start with number one, uncomment, run
it, number two, read the instructions,
follow the todos, and number three as
well. So you should be able to go
through all three of them. It should
take you a couple of minutes to get
through them. Make sure to command S or
control S to save your changes and then
click run to run and see what happens.
So best of luck. Take a couple minutes,
do this, and I'll be back with the
solution shortly.
Now, let's begin with the first one. The first challenge is to create a price lookup tool that takes an item string and returns the item's price using the price table. The first step is to define the tool with a Zod schema, and we already have the tool here, so now we define the schema. z.object is already open, so you just pass the string type into this input schema. Remember, we want the model to return an object with an item property, the item being whatever we want to search for, in this case milk, so we want milk to be passed in here. The next thing to do is implement the remaining portion, the execute function. All you have to do is uncomment this section: we take the item in, look up its price in the price table (lowercasing the item just to be safe, and falling back to null if it isn't found), and return the item together with its price.
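A condensed sketch of that solution (the table values mirror the prices mentioned in this exercise; the tool name is illustrative):

```js
import { tool } from "ai";
import { z } from "zod";

// In-memory price table standing in for a database lookup.
const PRICE_TABLE = { milk: 1.59, bread: 2.49, eggs: 3.29, banana: 0.59 };

const priceLookup = tool({
  description: "Look up the price of a grocery item",
  inputSchema: z.object({ item: z.string() }),
  execute: async ({ item }) => ({
    item,
    // Lowercase to be safe, fall back to null if the item is not in the table.
    price: PRICE_TABLE[item.toLowerCase()] ?? null,
  }),
});
```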
And so now we can pass in price lookup
into tools. So we just need to uncomment
this. And price lookup is being passed
in as a tool. So when we ask how much milk costs, it's going to extract milk as the item, invoke the function, and return the tool result, which is whatever this function returns. With that said, let's uncomment the first one, save, and run. Okay, as expected, we've got the tool call with milk as the item, which becomes the input to the function. As we discussed, the model extracted milk as the item per the schema, and the tool result from the price lookup now includes the milk item and its price, 1.59, which corresponds to what we have here. Okay,
so that's the case where we have one
tool call. Let's move over to the case
where we have two tool calls and we want
to introduce a delivery ETA that takes
address string and returns a pretend
estimated time of arrival. So here are
all the instructions again and we're
just going to follow the sequence: the tool interface, the description, the input schema as we've seen, and here's the solution (hopefully you didn't peek at it before attempting the previous one). Now we've got this deliveryEta tool, which estimates delivery time, so we need to pass in address as z.string(): the address is going to be a string, and that's the JSON shape we want the model to enforce. Inside execute we pass in the address, and the ETA is a random number assigned by this expression; then here we just remove the placeholder and pass in the address. So we
should get back from this function an
address and estimated ETA minutes which
is the random number generated here. Then, based on the question, we should see milk extracted, and we should also see an address extracted and returned by this deliveryEta function. If we run this (making sure we first uncomment it), we should see the tool calls and tool results. So
let's see what the model called. The first item it extracted was milk; we expect that, because we defined item in the schema, so it saw milk and made a tool call for it. It also made a tool call for bread, because we also put bread in the prompt. And finally there's a tool call for the address, which was extracted from the prompt. Now we have the tool results: milk with the price 1.59, bread with the price 2.49, and finally the address with the estimated minutes calculated by the tool function further down. So I hope you're getting the hang of this. Essentially, all we're doing is sending a prompt to the model, giving the model a schema so it can construct a structured output, taking the inputs from that structured output, passing them as parameters into a function, invoking the function with those parameters, and then returning the results back to the model to generate a final answer, which is the third and final exercise we're going to do right now. Because as I said
before, having the tool results is
extremely useful. But if you want to
complete the process, we want the model
to essentially summarize the results
like we would do with RAG. You might not just want to retrieve the relevant documents from the vector store; you also want to synthesize them with the prompt so you generate a final answer. And so now we move to the last challenge. We want the model to call the tools, receive the results, and then summarize them in a final turn using stopWhen. So we've got the priceLookup tool and the deliveryEta tool, which, as we covered, take the item and the address, and now we want to use stepCountIs to set the number of steps. Remember, the model is invoked to generate the tool calls once, then twice, and now we want a third step where the model generates the final result from the tool results. The model is invoked once (step one), twice (step two), and then a third time, which is why we define stopWhen with a step count of three. The prompt is: I want to buy eggs and bananas, use tools to check prices and tell me the total cost, and estimate delivery time to 221 Baker Street. So we have the tools defined, the delivery ETA, and the results provided as well. So
we are going to call this and see what
the results are. And there we go. Final
summary. The total cost for eggs is
$3.29.
As we can see in the dummy database,
banana is 0.59. So the total cost is
3.88.
The estimated delivery time to the
address is 26 minutes. And this was
exactly what the user had asked for
based on the prompt. I hope you found
this useful. You can see how we've gone from a basic prompt, to generating embeddings, to retrieval from the vector store, to exploring the ability to use tools to extract information by invoking functions. And you can see how this leads to multi-step runs, where a model can be involved in multiple steps to reach a final outcome, sometimes involving one or more functions based on inputs provided by the model, with the model at the end summarizing the information. So I'd recommend you just
keep practicing so you get a good feel
for this and once you're more
comfortable you can go to the next
lesson where we're going to start
piecing this together in building the
final customer support AI agent.
So in previous lessons you've learned how to use the Vercel AI SDK to create structured outputs, generate embeddings, generate text, and we've also introduced tool calling. Now we're going to move closer towards the final project of building a customer support AI agent. The first thing we're going to do is create this basic agent routing architecture. As you can see on the screen, unlike the usual retrieval flow we covered earlier in this course, where you embed the documents, the user asks a question, we retrieve the relevant docs and answer the question, always following that one path, we are now going to introduce two branches. It's no longer one direction: we give the model the opportunity to classify whether the user's question requires retrieval. If it does require retrieval, we continue doing what you've learned so far in the earlier parts of the course. If it doesn't, we just answer the question directly using whatever the model has been trained on. So let's go into the code.
Now, what I've done here is introduce a couple of things. The first is replacing all the primitive OpenAI library functions with the Vercel AI SDK. So when we ingest, or upsert, the documents, we use embed and the structure the Vercel AI SDK expects. The same goes for retrieval: we embed using embed from the Vercel AI SDK instead of the OpenAI primitive, because in the config we export a client created with createOpenAI from the SDK rather than from the OpenAI library. These are the key changes, which of course lead us to using generateText, the interface provided by the Vercel AI SDK. We've also introduced
some new variables in the constants
file. For example, the classification
model is what model we're going to use
for classification to decide whether to
perform retrieval or not to perform
retrieval. So in this case, I'm using
the same model for both. But you're
going to find a lot of use cases where
you might want to use different models
for each of them depending on the
strength of the model. In here we've also got the knowledge base description, which I'll come to in a second once I finish the quick tour. We've got all the different
functions I'm going to showcase here
which I'm going to uncomment one by one.
And then we've got a prompts file. Now
this prompts file contains all the
prompts used in this example from the
prompt used to retrieve to the prompt
used to classify and so on and so forth.
I'll come back to this in a second. The other introduction here is the docs folder. In the docs folder I have extracted, as markdown, the various pages that Scrimba has in its help desk: it has multiple web pages providing FAQs and answers to common support questions. So I've scraped all those web pages and converted them into markdown, which is friendly for models and embeddings. When we run this ingest documents function, we're really just looping through each of these, extracting the text and embedding it, and then each embedding and its text gets inserted into the database. So I'm going to embed and insert into my database snippets of some of Scrimba's help documents. Once that's done, we'll perform a basic retrieval, and then a classify-and-retrieve, which is more agentic as per the diagram I showed
you earlier. Okay, so let's start with ingest documents. In the previous retrieval lesson, you learned how to do this without the Vercel AI SDK; as I said, the only difference is that I've swapped in the embed interface. There is no splitting and chunking here, because we're just looping through each file and none of them contains anything particularly long. So we clear all the contents of the database, and then for each file we loop over it, embed it, insert it, and attach a metadata property with the source set to the file name. This is very useful down the line when you want to inspect which file a chunk came from, or display it in a UI. And then we insert all of them.
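A minimal sketch of that ingest loop; the folder, table name, and embedding model are assumptions based on the course's Supabase setup rather than the exact file, and `openai` and `supabase` are the clients configured elsewhere:

```js
import fs from "node:fs/promises";
import path from "node:path";
import { embed } from "ai";

const docsDir = "./docs"; // assumed folder of markdown help articles
for (const file of await fs.readdir(docsDir)) {
  const content = await fs.readFile(path.join(docsDir, file), "utf8");

  // Embed the whole file (no chunking needed here since the files are short).
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-small"), // illustrative embedding model
    value: content,
  });

  await supabase.from("documents").insert({
    content,
    embedding,
    metadata: { source: file }, // lets us trace a chunk back to its file later
  });
}
```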
So let's run this and go back to index.js.
So here we go: ingesting documents from the directory, we loop over each of them, read in their contents, and we've successfully uploaded nine documents to the Supabase documents table, as per what we did in the retrieval lesson. And this is what one of the objects looks like: the metadata points to the actual document, and then of course we have an embedding property, the text, and the metadata for each of these files, all of which is put into the database.
So we're familiar with this. Comment this out, and let's jump over to basic retrieval. I've got a question here: how do I access the Scrimba Discord? What we expect to happen is for this function to kick off: retrieve similar docs takes the query, embeds it, and sends it to Supabase via a remote procedure call to the match documents database function, which we created earlier in this course, to retrieve the relevant docs for that query. When we get the relevant documents back, we pass them to the combine documents function, which just extracts the text and joins it all together. We create the RAG prompt, which you saw in the previous retrieval lesson, essentially saying "below is the context, answer the question based on the context provided", and then we generate the text again using the Vercel AI SDK interface and console.log the generated text.
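Sketched out, that basic retrieval path looks roughly like this; the match_documents parameter names are assumed to match the database function created earlier in the course, and `openai` and `supabase` are the configured clients:

```js
import { embed, generateText } from "ai";

async function retrieveSimilarDocs(query) {
  // Embed the query with the same model used for the documents.
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-small"),
    value: query,
  });

  // Remote procedure call to the match_documents function in Supabase.
  const { data, error } = await supabase.rpc("match_documents", {
    query_embedding: embedding,
    match_count: 3,
  });
  if (error) throw error;
  return data; // [{ id, content, metadata, similarity }, ...]
}

const question = "How do I access the Scrimba Discord?";
const docs = await retrieveSimilarDocs(question);
const context = docs.map((d) => d.content).join("\n\n"); // combine documents

const { text } = await generateText({
  model: openai("gpt-4o-mini"), // illustrative model id
  prompt: `Below is some context. Answer the question based only on it.\n\nContext:\n${context}\n\nQuestion: ${question}`,
});
console.log(text);
```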
Now, in the docs we've got one support document about linking your Scrimba account to Discord and another on how to join Discord. So the expected behavior is that we retrieve the relevant docs from the vector store and the model provides a relevant answer to the user. So let's run again. There you go: to access Discord, use this link, which is in alignment with what we saw in the doc.
And as you can see, it looked at the relevant docs. We've got a similarity score of 0.48 for the chunk "can I download the code in a scrim file", so it picked that up. It also picked up the chunk on linking your Scrimba account to Discord, which I showed you as well, and all of this content was passed as context. And finally, "how do I join the Scrimba Discord", which had the highest similarity score. If you remember the first retrieval lesson where we covered match thresholds, you can actually set a threshold at the point of retrieval: there's another property that lets you set a minimum, so you could retrieve only documents with a similarity score greater than or equal to a certain number. That way we might cut off the less relevant chunk and only keep the top two. But I left everything in so you can see that what we expect to happen is exactly what happens. So
if we go back now we want to go to the
last section. This is the classify and
retrieve function. So what's going on
here? This is a bit more involved than
what you saw previously. So I'm going to
take this slowly. So essentially the
main step is that rather than just send
the query directly to do the embeddings
in the retrieval, we want the model to
exercise some form of agency. We want to
classify the prompt first. So we want to
give the model the prompt and then let
it classify whether or not it's a
general question. If it's a general
question, we let the model just answer
the question directly using its own
training. Else if it's a retrieval, then
let it continue with the process. Embed
the query and call the remote procedure
call to get the relevant docs and so on
and so forth. I've also introduced a fallback, for the situation where we don't actually retrieve any relevant documents; in that case we just send the question to the model and generate an answer directly. Provided we do get relevant documents back, we continue through the process: combine the documents, build the RAG prompt, and so on. Something else that's pretty interesting here is that we also map over the retrieved sources and give each one a type as well as where it came from, which puts more structure into the retrieved docs, so you get both the answer and the sources it was generated from. And if we run into some sort of error, we again just default to generating a general answer. That is the classification flow in a nutshell.
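Put together, the routing reads roughly like the sketch below; the helper names (classifyQuestion, combineDocuments, the prompt builders, `model`) are assumptions describing the course's files rather than their exact code:

```js
async function classifyAndRetrieve(question) {
  const route = await classifyQuestion(question); // "retrieval" or "general"

  if (route === "general") {
    // Off-topic: answer directly from the model's training data.
    const { text } = await generateText({ model, prompt: getGeneralPrompt(question) });
    return { answer: text, sources: null };
  }

  const docs = await retrieveSimilarDocs(question);
  if (!docs || docs.length === 0) {
    // Fallback: nothing relevant came back, so answer generally instead.
    const { text } = await generateText({ model, prompt: getFallbackPrompt(question) });
    return { answer: text, sources: [] };
  }

  const context = combineDocuments(docs);
  const { text } = await generateText({ model, prompt: getRagPrompt(context, question) });
  return {
    answer: text,
    sources: docs.map((d) => ({
      type: "knowledge_base",
      source: d.metadata?.source,
      similarity: d.similarity,
    })),
  };
}
```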
Now let's go a bit deeper into the prompts that drive this behavior.
First of all, the classification prompt. This generates a prompt for classifying the question as retrieval or general. We pass in the question and the knowledge base description, which lives in the constants: Scrimba, an online platform for learning to code. And this is the prompt: classify the user's question; the goal is to use a customer support knowledge base about Scrimba; if the question is about Scrimba itself, or it's coding-related and technical, and so on, and if the question is clearly off topic, respond "general", otherwise respond "retrieval". Then we append the question, leave "Classification:" open for the model to complete, and send that off to the model. So this is the prompt used to classify. As I said previously, you could instead express this as a structured output instruction to the model, an enum that returns either "general" or "retrieval", but I'm keeping it simple here for the sake of your understanding.
Then we've got the general prompt, which is just a direct prompt: answer the following question concisely, with the question passed in. We've got a fallback prompt, and we've got the RAG prompt you're already familiar with: you're a helpful assistant, answer the user's question based on the provided context, and so on. So it's pretty straightforward: all we're doing is classifying the question, generating either a retrieval prompt or a general prompt, and sending that to the model to provide a final answer. A few
other things to note here are the use of maxOutputTokens and temperature. Here we're limiting the amount of text returned: we really just want the model to say "retrieval" or "general", and 20 tokens is enough for that. Temperature controls the randomness of the output, how much creativity you want the model to have; in this case I want zero, because I just want "retrieval" or "general" and don't want the model to express any creativity whatsoever. That is the reason these two properties are set to these values.
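The classification call itself is tiny; a sketch, assuming AI SDK 5's maxOutputTokens option, with the constant and prompt helper names assumed rather than taken from the course code:

```js
const { text } = await generateText({
  model: openai(CLASSIFICATION_MODEL), // assumed constant from the constants file
  prompt: getClassificationPrompt(question, KNOWLEDGE_BASE_DESCRIPTION),
  maxOutputTokens: 20, // just enough for a one-word label
  temperature: 0,      // deterministic: we want "retrieval" or "general", nothing creative
});

const route = text.trim().toLowerCase().includes("retrieval") ? "retrieval" : "general";
```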
With that being said, let's jump back to the main section. I'm asking the same question, and the behavior I expect to see is a classification of retrieval, and that's exactly what happened. How do I access the Scrimba Discord? We classify the question, it's classified as retrieval, we perform retrieval, generate the embeddings, retrieve the chunks, and generate the RAG answer, and that's the answer we get. Now you can see
the retrieved docs here; these are the source documents, so you can see the ID, content, metadata, similarity score, and type, which were all defined in the agentic retrieval file. Now let's say I
completely change topic. What is the
capital
of France? Okay. Now what we expect is
different behavior where the
classification should be general. So
let's run this. And now we're answering
a general question and the generated
answer is Paris. The retrieved docs are null because we never went to the vector store. So you can see how we've gone from asking a query and always performing retrieval, a one-dimensional approach, to introducing more agency, where the model classifies queries and can route in different directions depending on what the query is. So in the next lesson, we're going to go deeper into this so you can see how this routing builds up to the final customer support AI agent.
In the previous lesson you learned how to route the agent's outcomes in different directions, either towards retrieval or towards answering the question generally. In this lesson, we're going to do a brief challenge where you refactor the agentic retrieval logic to introduce structured outputs, which you learned a couple of lessons ago, using the Vercel AI SDK. So we're going to go to the file where the agentic retrieval logic is and follow the instructions to use the generateObject interface to create structured outputs, so that we can classify whether the query is retrieval or general. So what you're going to do is hop over to the agentic retrieval file. We have the classification prompt there already, and the decision is already assigned to the classification extracted from the object. All you have to do is fill in the gaps: construct the schema property using Zod and everything you've learned, including the enum types from the structured outputs lesson, and then also add a property for the prompt sent to the model. Once you've done that, make sure to Control S or Command S to save your changes and then run. I'll give you a couple of minutes to do that and then I'll come back with the solution. Good luck.
So, let's work through this slowly. We know we've got the model, the schema name, and the schema description already, so the next thing we need is the schema itself. Remember, we're using Zod, so we need to construct the object we expect to get back. The first thing I usually recommend is a reasoning property, so you can see what reason the model gives; let's describe it as a brief reasoning for the classification choice, why it chooses retrieval or general. Next we define type as z.enum and pass in "retrieval" and "general" as the enum values, and we can add a describe to help the model: is the question general, or does it require knowledge base retrieval? And finally, number two, add a property for the prompt sent to the model: we go down here, we already have the classification prompt constructed, add a comma, and voilà, that should be that.
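The finished classification, sketched with generateObject (the constant and the prompt helper are assumed names):

```js
import { generateObject } from "ai";
import { z } from "zod";

const { object } = await generateObject({
  model: openai(CLASSIFICATION_MODEL),
  schemaName: "classification",
  schemaDescription: "Classify whether a question needs knowledge base retrieval",
  schema: z.object({
    reasoning: z
      .string()
      .describe("Brief reasoning for the classification choice"),
    type: z
      .enum(["retrieval", "general"])
      .describe("Is the question general, or does it require knowledge base retrieval?"),
  }),
  prompt: getClassificationPrompt(question, KNOWLEDGE_BASE_DESCRIPTION),
});

const classification = object.type; // "retrieval" or "general"
```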
If we go back to index.js, everything has been imported and is ready to go. Click run, and the question should be classified as retrieval. As you can see, the reasoning here is that the question is about Scrimba's Discord, which is directly related to the Scrimba platform and user support. We've got the type classified as retrieval, and we generate the answer as expected. So practice this a couple of times and you're ready. I'll see you in the next lesson.
In previous lessons, you learned how to use the OpenAI SDK to generate responses based on a prompt. That was using OpenAI's built-in Responses API, and the Vercel AI SDK wraps around it to provide a simple abstraction over that functionality. But aside from just generating text, the Responses API also provides built-in tools like file search, web search, and computer use for interacting with web pages. You can invoke such a tool directly through OpenAI's built-in functionality, get the latest results from a web search, and pass those results on to the model; you can see an example of the data returned by the web search tool. Let's dive into the
code. As you can see, we've imported openai from the Vercel AI SDK's OpenAI provider, and we have the generateText interface for generating text, as covered previously. Then we've got the model and the question: what is the latest OpenAI large language model? As of today, GPT-5 is the latest large language model by OpenAI. First of all, we're using openai.responses and passing in the model; secondly, and most importantly, we're passing in tools with a web search preview entry set to openai.tools.webSearchPreview() with empty parentheses, so nothing is passed inside; that's just the expected syntax. What happens under the hood is that the Vercel AI SDK interacts directly with the OpenAI API to invoke the tool, and the model then generates an answer based on the tool results. So essentially this is an abstraction over a lot of the functionality we've discussed previously with tool calling, returning the tool results, and having the model generate the answer.
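A sketch of that call; the model id is illustrative, and the web_search_preview key follows how the AI SDK's OpenAI provider names this built-in tool:

```js
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const { text, sources } = await generateText({
  model: openai.responses("gpt-4o-mini"), // illustrative Responses API model id
  prompt: "What is the latest OpenAI large language model?",
  tools: {
    // Built-in OpenAI web search tool; no options passed, as in the lesson.
    web_search_preview: openai.tools.webSearchPreview({}),
  },
});

console.log(text);    // answer grounded in fresh web results
console.log(sources); // URLs, titles, and ids of the pages that were consulted
```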
It's just been abstracted and simplified. So if we click run, we can see the text: as of September 2025, OpenAI's latest large language model is GPT-5, and so on. So this is the
text generated as a result of context
from the web search passed onto the
model. If we scroll further down and look at sources, you'll see what was fetched from the web: the source type, URL, id, and title. The contents of those URLs were scraped and passed as context; the model saw the scraped contents of these web pages along with the question, and generated a final answer from them. So this is essentially the same mechanism as we saw in retrieval: we're still performing similar behavior, going somewhere, retrieving context, and passing that context back into the final prompt that's sent to the model. And so this is
essentially how we can incorporate a
tool for web search. And you can see how
we can build on this and augment
pre-existing
agents with multiple tools and add one
of those tools as web search. And so
we're going to see how this works in the
next lesson.
So in the previous lesson you learned about using the web search tool provided by the Vercel AI SDK with OpenAI under the hood: the OpenAI Responses API being used as a tool to retrieve information from the web in real time. Now, in this lesson, we're going to look at how to combine the entire retrieval mechanism, where you go to the vector store and retrieve relevant information to answer the question, with the web search tool, so the agent can decide which one to use based on the question. Now, what is new?
based on the question. Now, what is new?
First of all, we have taken the entire
logic for retrieving and instead of
having a standalone file where we run
the retrieval, we've converted it into a
tool. Remember, a tool is essentially an
interface provided by VEL AI SDK that
allows us to define what exactly we want
to invoke based on structured output
returned by the model. Here we will
provide a tool that is going to retrieve
relevant information about Scriber based
on the user's query. And so that query
is then going to be passed in to do
everything else you've learned in this
course. We embed the query, then we
query the Superbase database, and then
we return the retrieve docs. So
essentially everything you learned about
retrieval is now wrapped inside this
async function in execute in this tool
interface. Now if we go to web search
retrieval agent, we have two tools. The first is the knowledge base tool: the entire vector store retrieval mechanism converted into a tool. We also have the web search tool, which we covered in the previous lesson. We pass in both tools, and then we've got stopWhen, also covered in previous lessons, with a maximum step count of three, so the model can iterate up to three times: generate the tool results, pass the tool results into context, and then generate a final response.
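Pulled together, the agent call looks roughly like this; knowledgeBaseTool, the constants, the tool key names, and the prompt helper are assumed names standing in for the course's files:

```js
import { generateText, stepCountIs } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await generateText({
  model: openai.responses(TOOL_CALLING_MODEL), // assumed constant
  system: getRetrievalWebSearchPrompt(KNOWLEDGE_BASE_DESCRIPTION), // assumed prompt helper
  prompt: question,
  tools: {
    knowledgeBaseSearch: knowledgeBaseTool, // vector-store retrieval wrapped as a tool
    web_search_preview: openai.tools.webSearchPreview({}), // real-time web search
  },
  stopWhen: stepCountIs(3), // tool step(s) plus a final summarizing step
});

console.log(result.text);  // the final answer
console.log(result.steps); // which tools were called, with their results
```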
We've also introduced a new prompt function, the retrieval web search prompt. Here we pass in the knowledge base description: Scrimba, an online platform for learning to code. And if you look at the prompt, it essentially says you're a helpful assistant whose primary goal is to answer the user's question accurately, and then provides conditions: use the knowledge base search for Scrimba questions, use web search if you need real-time information, and answer directly if you already know the answer. So it's just a prompt to encourage the model to pick the appropriate tool based on what's required. All right. Further
down, we just have some logic to extract the sources. If it's a web-search-based tool call, we extract the web title and URL. But if we went through the knowledge base retrieval step, then we access the steps property inside the result, which holds the steps the model took when invoking each tool. We check whether a step was the knowledge base search and, if it was, do some extra extraction to eventually get the retrieved documents that were returned by the knowledge base tool. Once we have the retrieved documents, we extract the fields we had embedded and inserted into the database: the content, the metadata, and the similarity score, which can be used later for filtering out retrieved docs that fall below a certain threshold. Once we're done, we just return the answer, the sources, and the tool used.
So with that being said, let's run this. Here are the response sources: we can see the similarity scores and the documents retrieved from the vector store, and then the generated answer here, all based on that context. But most importantly, look at this: it identified that this question required retrieval, called the knowledge base tool, and that's how it got the relevant context and answered the question. Now, if I switch this up and, instead of the retrieval-based query, ask the web search query, "what is the latest OpenAI large language model", and pass that to the web search retrieval agent, let's see what happens. So I'm
going to click run again. Okay, so here
we go. So we receive the question, we
generate the text and then we see web
search sources were found and we've got
the full generated answer by the model.
And then if we go further down, we can
see the retrieve docs, which is the
different URLs that were scraped to get
content that was then passed as context
to the model. So you could see how we've
gone from learning about vex store
retrievalss as a standalone action. Then
we went further on to learn about
structured outputs, the faceli SDK,
talked about web search retrieval using
the responses API. And so with that
being said, in the next lesson we are
just going to do a brief overview and
exercise so that you can put this into
action.
So in the previous lesson you learned
about using tools for web search and
retrieval together. So the model is able
to decide based on the nature of the
user's query whether to use web search
or retrieval to answer the question. So
now I want you to take this challenge in
the web search retrieval agent file.
You're going to implement the missing
tools logic to add the tools and then
complete the empty generateText call. So go in and, from memory, try to add the two tools, one for the knowledge base and one for web search preview, and then fill in the generateText properties: all the properties involved, including the model, the prompt, and so on. I'm not going to list them all, so just try your best from memory, and I'll be back in a couple of minutes with the full solution. Good luck.
All right, I'm back. So let's work through this. The first thing we need is the knowledge base search; this was the name we provided in the previous lesson, and we pass in the knowledge base tool, which is imported from here. As for web search, we want to make sure we use the exact web search preview key and assign it openai.tools.webSearchPreview(), opened and left empty; that's the exact syntax the Vercel AI SDK expects. So now we've
fill in the generate text property so we
can include a tool. So the first thing
is we need to pass in the model
responses and then you want to pass in
the tool calling model. You've already
defined this up here. The next thing we
want to do is pass in the tools so the
model has access to the tools. And then
stop when remember you want to step
count the number of steps for the model
to iterate over and generate results and
tool results as well as the final
response. We also pass in a system
prompt. This is just kind of to give the
model general instructions to follow.
You can either pass it as part of a
string in the prompt but we just
separated it for good abstraction and we
pass in the knowledgebased description
and finally we pass the prompt. This is
the question that's passed in as a
parameter in here. So we save all of
that back to index.js run the function
and voila we get the sources URL and the
full text as well. So that's it for this
exercise. Keep practicing and I'll see
you in the next lesson.
We have now come to the final lesson: an overview of the keystone project, the customer support AI agent. As discussed, it is grounded in Scrimba's help center articles, which have been embedded and stored in the vector store as we've covered in previous lessons. So, for example, if I ask the question "How do I access the Scrimba Discord?", we should see the agent access the knowledge base tool as covered in the previous lessons, retrieve the relevant documents, and display the answer to the user. And of course, we can click to view sources and see the different chunks or documents that were retrieved, the similarity scores, and so on and so forth. So this is a UI display of a lot
of the concepts that we've discussed
previously. If I ask a question that requires real-time information from the web, because the support documents don't contain it, I can do the same thing: how do I resolve the error "node module not found"? Again, it's a similar concept, but this time, rather than using the knowledge base tool, it goes ahead and performs a web search, and you can see it's able to answer the question based on web search information. So you see the model's choice between answering the question with the retrieval tool, going to the vector store, or using web search. Now,
before we walk through the actual code, let's look at the architecture. Essentially, at a very high level, the user asks a question, and we have some way of telling the model: check whether this requires retrieval. If it doesn't require retrieval of any form, whether real-time or from the vector store, just answer the question directly. If the question does require retrieval, then we want to know whether real-time information is needed, because if not, we can check the vector store to see if there's sufficient context to answer the question. If, however, it requires real-time information, like the question I asked, or if I ask about the latest version of a package or library I want to use, then it goes to the web to get the latest documentation for that particular package. And so this is the architecture we have in place. So if we go back,
essentially I just want to walk through the key changes and new things that have been introduced that you're not familiar with from previous lessons. The first thing, as you can see on the screen, is that we have an Express server, which has been installed here. This Express server serves as the API route: when the Ask button is clicked in the UI, the client-side JavaScript sends the text to this API route as a question, and then we use the web search retrieval agent, which I'll show you in a second. It's listening on port 3000; that's what the server.js file is doing.
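A minimal sketch of what such a server.js might look like; the route path, static folder, and module name are assumptions, not the course's exact file:

```js
import express from "express";
import { webSearchRetrievalAgent } from "./webSearchRetrievalAgent.js"; // assumed module

const app = express();
app.use(express.json());
app.use(express.static(".")); // serves index.html, style.css, client.js

app.post("/api/ask", async (req, res) => {
  try {
    const { question } = req.body;
    // Returns { answer, sources, toolUsed } as described above.
    res.json(await webSearchRetrievalAgent(question));
  } catch (err) {
    res.status(500).json({ error: "Something went wrong" });
  }
});

app.listen(3000, () => console.log("Listening on port 3000"));
```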
Config is pretty much as you've seen previously. Then we have constants, holding the constants you saw in previous lessons, including the similarity match count, which is the number of retrieved documents to return from the vector store, and the models used to answer, classify, and so on. We also have an index.html, which is the HTML of what you can see here, the style.css for the design, and client.js, the JavaScript that listens for clicks on Ask; when it gets a click, it handles showing the sources and sends a POST request with the question to server.js, and server.js returns the answer, with a bit of processing going on there as well. The key new
introduction to walk through is the web search retrieval agent. So what's going on here? As we covered in the previous lessons, we have two tools, the knowledge base tool and the web search preview tool, and we pass those tools to the generateText interface from the Vercel AI SDK, alongside the number of steps to take and the prompt, which comes from here. You can see it instructs when to use the knowledge base tool, when to use the web search tool, and when to answer directly, handing off the different sources based on the tool called, and then we return the answer, the sources, and the tool used. And this is the knowledge base tool, which we covered in previous lessons, in a separate file that handles embedding the question and then fetching the similar documents from Supabase. And so that's it in a
nutshell. This is the complete working of a customer support AI agent that applies some level of intelligence: based on the tools provided and on the prompt, it decides which tool to execute, or whether to skip the tools entirely and just answer the question directly. So you can
see how you can expand on this. You can
add more tools for different things.
Maybe another tool for a particular API
that can retrieve information. Maybe
another tool that performs calculations,
maybe more niche tools that are specific
to different capabilities. You can
expand on this. So, don't stop here.
There's so much more you can do to build
on what you've learned in this course so
far. Use your own support documents or
even another completely different use
case, but using the similar concepts
you've learned throughout the course.
So, I hope you've enjoyed this course so
far. Keep practicing and good luck.
Right, congratulations on finishing this
course. Let's recap what we've learned
so far. You've learned how to create and store document embeddings of support documents in Supabase as a vector store. Then you learned how to build a retrieval mechanism to do question answering over those documents, retrieving the relevant documents from Supabase where they're embedded. Then you explored the basics of the Vercel AI SDK's APIs to generate embeddings, generate text, construct structured outputs, and execute tool calls based on whatever actions you want the agent to take. You also learned how to use OpenAI's web search tool and function calling, integrating web search as a tool the agent can use when it needs real-time information. Finally, you learned how to build a customer support AI agent that can intelligently classify a user's query and determine whether to answer the question generally, perform retrieval, or perform a web search. Now, with that being
said, my suggestion for a stretch goal
is to refactor the final project for
your use case. Perhaps you don't want to do the customer support use case, so take the concepts of tool calling with the Vercel AI SDK and apply them to something else. Maybe you want to build
your own travel agent. Maybe you want to
build your own coding agent or an agent
that will do something else that's very
useful. You can apply a lot of the
principles you've learned in this course
to do that as well. So with that being
said, I wish you the best of luck and
thank you for taking this course.
Cheers.