Thank you for the introduction.
Um, ideally it was not going to be me talking today. I'm here as a replacement for Ana; I don't want to steal her thunder. These are her slides. She's still here with us.
>> She's alive.
>> Yes, yes, she is.
So the idea for today is to think about what kind of challenges we face when doing what we traditionally call knowledge work, and how we can use AI to solve those challenges.
And to do that, I'm going to take this example probably to its extreme and think about an invoice.
We all use invoices in our daily lives, most often when we buy a coffee from our local coffee shop, and we might then proceed to throw them away. But for a lot of business applications, keeping track of invoices is quite important.
And one use case that we see again and again is the problem of invoice reconciliation. That is: I have a contract that says this company is going to provide these services over, let's say, a year, and every month they're going to send me an invoice. Now, most of the time there are humans involved in creating invoices, so an invoice might not be exactly what I expect from my contract, and I want to be able to check those invoices to make sure that they are what we agreed upon. This is one example that we see a lot in the industry, where we have lots of providers, lots of contracts, lots of invoices, and we want to make sure that all these invoices make sense. So how do we approach a problem like this with AI?
And so it's not trivial that I can just throw in the contract, throw in the image and say, is this correct? We would love that to be the case, but there are some nuances that make this process a little bit harder.
So let me run through an example of how this may work. Here I have uploaded a couple of contracts and a couple of invoices. One of those contracts might specify the price or the delivery date for particular items, but they don't match the invoice, and I want to be able to very quickly highlight what's going on. So that's sort of what it looks like to do invoice reconciliation. And if you're interested in looking at the demo more closely, we are having a demo later today, so do come to the demo.
So what makes this problem slightly harder than it seems? Why can't I just throw my contract and my invoice at ChatGPT and ask, is this fine? It turns out that most LLMs are text-based. We are now getting multimodal LLMs, but PDFs are very complex structures made for human understanding, where layouts, images, and plots are very easy for a human to understand but not as easy for an LLM.
And so there's a lot of work that needs to be done to extract all this information from these documents before they can be fed to an LLM. And that's a crucial part of getting these applications right: making sure that whatever you're feeding your large language model is exactly what it needs to solve the problem. And that's where LlamaIndex comes in.
So how might we solve this problem? This is a very rough approximation; I'm going to show a more complete example in a second. The idea is that we might first upload a contract, and then that contract is parsed. I do this every day, so I'm very familiar with it, but the idea is that we have a PDF and we're going to extract text from it, and we often end up with a markdown representation, which is basically plain text, of that contract. That's what I mean by parsing. For this particular application I might need to find specific clauses in that contract, and contracts might be very large, so I need to look up different parts of the text. This can be achieved, for example, using RAG. And so for these particular applications, once we parse the contract, once we have this text representation of it, we are going to throw it into an index, into a vector database. We have indexes in LlamaCloud for this. With that, we have the first step of this process ready, which is processing the contract.
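As a rough illustration of this first step, a minimal sketch of parsing a contract and putting it into a vector index might look like the following. It assumes API keys for LlamaCloud and an embedding provider are configured, and "contract.pdf" is just a placeholder file name; it's an illustrative sketch, not the exact code behind the demo.

```python
from llama_parse import LlamaParse
from llama_index.core import VectorStoreIndex

# Parse the PDF into a markdown (plain-text) representation of the contract.
parser = LlamaParse(result_type="markdown")
documents = parser.load_data("contract.pdf")  # placeholder file name

# Throw the parsed contract into a vector index so specific clauses
# can be looked up later with RAG.
index = VectorStoreIndex.from_documents(documents)

# Example lookup of a clause that matters for reconciliation.
query_engine = index.as_query_engine()
print(query_engine.query("What unit price and delivery date were agreed for item X?"))
```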
And again, contracts that look like this might have an address somewhere at the top, they might have a table. Again, these are complex structures that are easy for humans, but not necessarily so much for LLMs. And this is what a parsed contract might look like. You can see the table there at the end, and you can see how nicely it comes out.
Then, once we have the contract, the next step is the invoice. What might I do with the invoice? The first step is always parsing again: we need to get all this information as text so that we can do more interesting things with it. But for an invoice we actually need a lot more detail. We need to know, for example, the name of the company that issued the invoice. We might need to get all the line items and exactly what the amount was for each line item.
So for this we have a product, Extract, that I'm going to mention in a second, which allows us to basically solve this whole step at once.
So what is LlamaIndex and what do we do? We provide, at the very core level, the building blocks that are necessary for doing knowledge work with AI: parsing, extraction, and classification of documents. All of these are essential tools that you will need eventually if you're building applications that involve documents.
You might know us better for our open-source framework that builds upon these tools to build complex AI applications and AI agents. And we are starting to release agent templates that allow you to very quickly prototype complex agentic applications that use documents as their main input and transform them into whatever you need to build, for example invoice reconciliation. So the idea is that, for the example that I showed, you should be able to get started with one of the templates and in a very few minutes get up and running with a proper agentic application that ingests real documents.
So, about the core building blocks: I think that LlamaParse is really something special. It's the building block that allows you to go from PDF to text, and there is a lot of optimization behind the scenes to make sure that we understand layouts, that we understand images, that we understand tables, that we understand charts. And so, regardless of how complex your document is, there's a good chance that you're going to get much better results by parsing it first and then using an LLM.
Now, if what you need is structured output (for example, for an invoice I need line items: what was charged, how much it was, when it was), then I usually want JSON output for my task. And here is where Extract shines. Extract allows you to specify a schema, the things that I want to get; it ingests your document, parses it behind the scenes, and then gives you the structured output that is essentially what you want.
And so here you can see we have a filing from Nvidia. I might only care about certain specific fields, and so I can just specify that as a schema and I get those values only.
And it's very easy to get started. You can build your own schemas, but we have a few that are prefilled for you to use. We also have a mode that automatically suggests a schema for you. So again, it's very easy to get started.
If you're a developer, you can also go the programming route, and you can do this by specifying a Pydantic schema here. The key is that these description fields here are what the LLM will use behind the scenes to understand what's going on. So they're really, really important, and if you skimp on them, results might not be as good. But again, it's just a couple of lines to get started. Also, if you go the code route, we are going to share the slides after the talk, but there are a couple of QR codes here. This one takes you to a tutorial on how to use Extract for these filings.
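For reference, a sketch of what the code route could look like with the LlamaExtract client and a Pydantic schema, where the field descriptions are what guide the LLM. The schema fields and file name here are invented for illustration, and the exact client calls may differ slightly from the current SDK.

```python
from pydantic import BaseModel, Field
from llama_cloud_services import LlamaExtract

# Hypothetical invoice schema; the descriptions tell the LLM what to look for.
class LineItem(BaseModel):
    description: str = Field(description="What was charged on this line")
    amount: float = Field(description="Amount charged for this line item")

class Invoice(BaseModel):
    company_name: str = Field(description="Name of the company issuing the invoice")
    invoice_date: str = Field(description="Date the invoice was issued")
    line_items: list[LineItem] = Field(description="All line items on the invoice")

extractor = LlamaExtract()
agent = extractor.create_agent(name="invoice-extractor", data_schema=Invoice)
result = agent.extract("invoice.pdf")  # placeholder file name
print(result.data)  # structured output matching the Invoice schema
```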
Now, if you want to build AI applications, agentic applications, at some point you might need to start putting all these building blocks together. In the ideal world, we just write agents, or function agents, where we specify a list of tools, and the agent automatically knows what tool to call and in what order to get the right output, and that works great.
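As a sketch of that ideal case, here is roughly what a function agent with a list of tools can look like. The tool names and bodies are hypothetical stand-ins; a real reconciler would wire them to Parse, Extract, and the contract index.

```python
import asyncio
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

# Hypothetical tools; real ones would call Parse/Extract and query the contract index.
def lookup_contract_terms(provider: str) -> str:
    """Return the agreed terms for a provider from the indexed contract."""
    return "Unit price 10 USD, delivery within 30 days."

def get_invoice_fields(path: str) -> str:
    """Return the extracted fields of an invoice as JSON."""
    return '{"company_name": "ACME", "total": 120.0}'

agent = FunctionAgent(
    tools=[lookup_contract_terms, get_invoice_fields],
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="You reconcile invoices against contracts.",
)

async def main():
    # The agent decides which tool to call and in what order.
    response = await agent.run("Does the ACME invoice match the ACME contract?")
    print(response)

asyncio.run(main())
```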
In practice, we sometimes have specific logic that we want to make sure is executed in the right order, and some of those decisions will be agentic. We might have agents that solve particular tasks, but we want a specific flow for our application. And here is where agent workflows shine. One of our core open-source contributions is the idea of agent workflows, which allow you to build any application by specifying different steps and how those steps connect. The idea is that this is event-driven and built for agents; most of our agent abstractions are built on top of this, and it's a great way to build agentic applications. This is what the simplest workflow might look like. We can define steps; these steps just print and do nothing else. Steps then communicate by emitting events: the first step emits an event, step two expects that event, and when it sees it fire from the first step, it will execute. This way you can build a very distributed application by specifying when each thing should run.
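A minimal sketch of such a two-step workflow, using the open-source workflow primitives; the step and event names are made up for illustration.

```python
from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent, Event

# The event that step one emits and step two waits for.
class FirstStepDone(Event):
    pass

class SimplestWorkflow(Workflow):
    @step
    async def step_one(self, ev: StartEvent) -> FirstStepDone:
        print("step one")       # this step just prints and does nothing else
        return FirstStepDone()  # emitting this event is what triggers step_two

    @step
    async def step_two(self, ev: FirstStepDone) -> StopEvent:
        print("step two")       # runs only once FirstStepDone has fired
        return StopEvent(result="done")

# result = await SimplestWorkflow().run()
```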
This is really useful when you're processing documents, where one task might take a minute, for example, and another task needs to fire only when it is done. The workflow manages all that for you without blocking.
We also have tools to visualize these workflows, so if you run this workflow, you might get a visualization such as this one.
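For the visualization, there is a small utility in the llama-index-utils-workflow package that renders the steps and events of a workflow class to an HTML diagram; a sketch, continuing the example above:

```python
from llama_index.utils.workflow import draw_all_possible_flows

# SimplestWorkflow is the class from the sketch above; the filename is arbitrary.
draw_all_possible_flows(SimplestWorkflow, filename="simplest_workflow.html")
```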
Back to our invoice reconciler. If I
were to implement this as a workflow,
this is what this workflow might look
like. I have a step that uploads a file.
I might have a classification step, where a classification model decides if this is an invoice or a contract. If it's an invoice, I might emit an invoice event and start the part of the workflow that deals with invoices: I'm going to parse the invoice, I'm going to extract it, and then I'm going to go back to the contract and reconcile it. If I have a contract, I'm going to get the name from the contract, index it, and emit an event saying that contract was indexed.
And when I'm reconciling an invoice here, basically what I have is the extracted invoice and the extracted contract, and I'm going to use a structured-output LLM. This is not necessarily an agent. I say: this is what my output should look like; from the parsed invoice and the parsed contract, try to find if there are any discrepancies. And this step could be as complex as you want it to be.
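A sketch of how that reconcile step could use a structured-output LLM call rather than an agent. The output schema, the prompt, and the contract_text / invoice_fields variables are placeholders for whatever the earlier steps produced, not the exact ones in the template.

```python
from pydantic import BaseModel, Field
from llama_index.core.prompts import PromptTemplate
from llama_index.llms.openai import OpenAI

# Hypothetical output schema for the reconciliation step.
class ReconciliationReport(BaseModel):
    matches_contract: bool = Field(description="Whether the invoice matches the contract terms")
    discrepancies: list[str] = Field(description="One entry per mismatch found")

RECONCILE_PROMPT = PromptTemplate(
    "Contract terms:\n{contract}\n\n"
    "Extracted invoice:\n{invoice}\n\n"
    "Compare the invoice against the contract and report any discrepancies."
)

contract_text = "..."   # placeholder: text retrieved from the contract index
invoice_fields = "..."  # placeholder: structured fields extracted from the invoice

llm = OpenAI(model="gpt-4o")
report = llm.structured_predict(
    ReconciliationReport, RECONCILE_PROMPT,
    contract=contract_text, invoice=invoice_fields,
)
print(report.discrepancies)
```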
We are also building tools to make it as easy as possible to deploy agents. So if you build an agentic application with a workflow, we want to make sure that you can deploy it very fast. In this case we developed a tool, llamactl, which is the one that has all these agentic templates. So if you type llamactl init, it will very quickly allow you to generate one of these templates, and then you can deploy it or run it locally, but you also get all the code. So if you actually want to make any changes to the application, change the schema, change the UI, everything is there for you to customize.
We are going to be doing a demo of exactly this, how it works and how you can get started deploying an agent, at the demo later. I think it's at 5:30, and the demo booth is over there.
Finally, we wrote about how to use LlamaIndex with Weights & Biases. You can find the link to that blog post here if you're interested.
I say "we", it was her, but again, I'm giving her talk, so I get to claim this too. If you're interested in when to use Parse, when to use Extract, what OCR is, and why we might need parsing and not just OCR on a document, these are two interesting blog posts that will be linked with the talk. If you're interested in getting started with agents, this is your main QR code; it will take you to a demo of how to properly build these applications.
And this is it. As I mentioned, we have our demo and we have swag. So if you're interested in one of these very cute hats, or have any questions about LlamaAgents, feel free to approach me or Mutasa over there; just look for the hats, and we're happy to exchange one of these hats for your time.
All right.
Thank you.
In this session from Fully Connected London, Diego Kiedanski, Founding AI Engineer at LlamaIndex, covers how common knowledge work is being automated with the latest AI technology. Most human knowledge remains locked in complex documents and file types like PDFs, tables, and content with irregular layouts that still hold valuable context. Diego explores parsing and extracting technologies that work alongside AI agents to make truly automated knowledge work a reality. He also introduces LlamaAgents, a new framework that allows you to serve and deploy these assistants at scale.