So decision agents are an essential component in agentic AI if you're going to solve large,
complex problems. The challenge is that if you're building agentic AI, you're going to have
complex decisions that need to be made autonomously, and those decisions are not a great fit
for large language models, even though large language models are the key technology in
agentic AI. So you need to build decision agents in your agentic framework, but using a
technology other than large language models. So why aren't large language models a good
fit? Well, let's think about some of the things they are famous for. They're famous for
inconsistency. They might do the same thing every day and then suddenly one day do something
different. Well, that's not great. If you're trying to make a decision, you really need people to be
treated the same and not vary day to day, minute by minute, just because the LLM feels like doing
something different. They are notoriously black boxes, very bad at describing why they did
something. And it turns out often you need to explain to people why you made a certain decision,
why they didn't get the job, why they didn't get the loan. And so you need some transparency in all
of this. And large language models are not good at that, even when you ask them to explain themselves.
They have a bit of a reputation for lying about how they decided what they decided. And then
the final one is that in many business decisions, there's history. You have a
database of information that tells you what you should do, what might be fraudulent, what might be
problematic. You need to be able to process that data and turn it into analytic insight. And large
language models are not very good at that either. So for all of these reasons, you're just not going
to use a large language model to build a decision agent. So what you can do is you're going to use a
decision platform or a business rules management system for this. And let's reiterate some of
the key value propositions for this kind of technology. This is well-established automation
technology that's been in use for a long time, so we know what the benefits are. What are the
requirements for a decision that we get out of using one of these platforms? First and foremost, we're
going to get consistency. So if I use one of these platforms, it's going to make the same decision
the same way every time. It's going to give me complete control over exactly how the decision is
made, and once I've defined it, every customer who gets that decision made against them is going to get
it made the same way. So I get ruthless consistency. The second thing they're really good
at is transparency. Not only do I have a formal definition of how this works, what the
steps and rules are that I'm following; I can explain that to someone. I can show that to
someone, and I can log it. So I can have a complete transparent log of exactly how this decision was
made. I have this transparency that I need for a decision. The third thing they give me is
agility. Now agility is important because the way I make a decision is subject to change without
notice. Competitors change their behavior. The market changes. The regulations change. There's a
court case. There's all sorts of drivers changing the behavior of the way you make a decision. And
if you can't do that quickly, if you have to wait for there to be new data or new documents, or
have to retrain something, that's going to take too long. So you have to be able to respond
more quickly. The other thing about decisions is that there's a tremendous amount of
domain knowledge in them. So programmers often find it really, really hard to correctly build
decision agents. And so, you really want to be able to engage people who have the domain knowledge.
That means you're going to need some kind of low-code environment, some way to engage experts
in managing the behavior of a decision agent while still being able to manage it as a
programmatic component in your agentic AI framework. And then lastly, you need this way of
embedding analytics that we were talking about, where I can take analytic insight, I can turn
historical data into analytic insight. And I can embed that analytic insight in my decision so
that I can make the decision more precise and more accurate. Now, all of
these are classic benefits of using a decision platform. But let's just reiterate why an LLM
is a tough call for these things. If I have an LLM, well, it's not really consistent. It's hard
to make an LLM do the same thing every time. This is a feature, not a bug: that variation, that
randomness, is part of what makes them so powerful, but it makes it very hard for them to be
consistent. They're definitely not
transparent, right? They're very opaque about how they did things. Even attempts to get them to
explain themselves are problematic. And if I go to a customer or a regulator and say, hey, I have this
black box that's been explained by this other black box, that doesn't really induce confidence.
They can also be hard to change. Their behavior is easy to get set up; you don't have to code
it, you just provide information to it. But it's then hard to change without presenting new
data to it and retraining it. You can't just tell it to stop doing X or stop doing Y. If you've
watched some of the news around attempts to block particular agents or make agents behave in a
certain way, you'll have seen that if you try to just code something in quickly, you get very
strange behavior. They're also quite complex, requiring specialized AI skills to build and
manage. And as I said, they're just no good at
structured data. They're not good at building predictive models out of historical data, and
using that historical data to improve the precision of your decision-making. They're good at
reading documents and text. They're not good at structured data. So we're not going to use a large
language model. We're going to have to use something else. So what are we going to use
instead? What technology can we use to do it? So let's go back to the scenario I talked about in
the previous video, where I talked about a bank that needed to lend money to a person. So I
want to lend you money. And to do that, I have an agentic AI framework
that manages that whole complexity. And as part of that, I have two decision agents. I
identified one as an eligibility agent, to say: are you eligible for a loan? And another one to
say: can I actually lend you the money? Which is what banks call origination. You want to
borrow this amount of money for this actual thing. Can I lend it to you? If so, what's the
rate? What's the price of this? So I have these two decision agents. Now, we've been
building this kind of autonomous agent using decision technologies for a really long time. So
there's a couple of things that need to be true of a decision agent. First of all, they need
to be stateless and side effect-free. So what does stateless mean? It means that you want them just
to respond to whatever data they're given at the moment they're given the data. Here's the data,
here's the decision. Here's the data, here's the decision. They don't remember state. That's why we
had, if you remember, a workflow agent whose job it was to remember the state. So the workflow tracks
the state, and it gathers the data for us that we need, and it passes that back and forth to
these agents. So it says, okay, at this point in the process, I've got this set of data about
this person, about this application, about this loan. Are they eligible? Yes or no. And you get an
answer back. And similarly with the origination decision. So they're managing the state. They're
managing all of that. And that scales better. It keeps the decision agents simpler. It makes it
much easier to check that you're not using things like personal information or health information
inside the decision when you don't need to. So it's just a much cleaner interface. But why side
effect-free? Why is it important that your decision agents don't do anything, they just make
decisions? Well, you want to be able to reuse them. Let's think about eligibility. Well, I might be
using it in the context of a workflow for originating a loan. That's one use case for it. But
I might have other processes, other workflows that do other things that send you letters or that, you
know, tell a call center rep, or put you into a marketing campaign, and so on. So I still need to
know if you're eligible, but I'm going to do something completely different if you are
eligible. And so by separating that, by not having the action be part of the decision agent, I get to
reuse it in lots of different circumstances. So I have these stateless, side effect-free agents. Okay.
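To make that concrete, a stateless, side effect-free decision agent is essentially a pure function: data in, decision out, no stored state and no actions. Here's a minimal sketch; the field names and thresholds are illustrative assumptions, not a real eligibility policy:

```python
# Minimal sketch of a stateless, side effect-free eligibility decision.
# All field names and thresholds are illustrative assumptions.

def decide_eligibility(application: dict) -> dict:
    """Pure function: the same input always yields the same decision.
    It takes no actions and stores no state; the workflow agent
    owns state and acts on the result."""
    reasons = []
    if application["age"] < 18:
        reasons.append("applicant under 18")
    if application["annual_income"] < 20_000:
        reasons.append("income below minimum")
    if application["credit_score"] < 580:
        reasons.append("credit score below minimum")
    return {
        "eligible": not reasons,
        "reasons": reasons,  # transparency: why the decision came out this way
    }

applicant = {"age": 34, "annual_income": 55_000, "credit_score": 612}
print(decide_eligibility(applicant))
# {'eligible': True, 'reasons': []}
```

Because it's a pure function, any workflow can call it for any purpose, and the reasons list doubles as the transparent decision log.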
So how do I build one of these? What does that look like? What technology do I need to build a
stateless, side effect-free decision agent that has these characteristics? Well, we use what's
called a business rules management system or decision platform. So decision platforms are
software stacks designed to build, you know, historically speaking, decision services that can
then be wrapped into decision agents. So what does a decision platform have? Well, it has a number of
software components. First and foremost, it's got a couple of editors. It's got typically like an IDE
or a technical editor and a low-code editor in which you can write logic, business rules, decision
logic. So you can lay out the actual rules, the logic that has to be followed to make a
particular decision. And those two editors generally are then linked to a single repository.
Now, this might be something like Git, but it might also be a more managed repository,
specialized for business rules and decision technology and available to your low-code editor,
so that you can have version control and branching and all those things. This varies by
platform, but they all have the concept of a repository in which you can do branching,
versioning, and all the kinds of repository things you need to do to make sure you have a
current version of the rules. And
you can do development work and have multiple people working and all that good stuff. Now, once
it's in this repository, and because it's a decision platform focused only on decision-making
logic that is stateless and side effect-free, you can do a lot more testing and validation of
the logic, so you can validate that the logic is correct. So you often have a set of tools that
look at the rules in the repository and validate them. Is the logic complete? Are you missing a
criterion that you're not checking? Do you have overlapping ranges? All that kind of stuff. And
it's much easier to check that in the context of a decision platform, because the
logic is written in a more declarative, less programmatic way, and it's managed as a set of
assets that can be checked. So you typically have a set of validation tools so that the logic you
write is more robust. And then obviously you're going to need to test it. Now, testing tools
can be as simple as passing in a JSON object and seeing if you get the result you're expecting,
using a UI like Swagger. But they can also be a lot more sophisticated. Some of the decision
platforms have very robust test suites, where you can load up very large numbers of test
transactions, run them through, check the results against expected results, confirm you've
passed all the tests, and do all of this in a low-code way so that the non-programmers who are
providing their domain expertise can also test it to make sure they haven't broken anything.
Now, when it comes to decisions, testing is
necessary but not sufficient. Because within those decisions, within those business rules,
there are going to be thresholds, places where you make choices as a business or an
organization as to what that threshold should be. There's no hard rule that this is a good
threshold and that's a bad one.
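A threshold choice like this can be explored by running historical data through the rules under both candidate settings and comparing the outcomes. A minimal sketch, with made-up loan amounts and a hypothetical maximum-loan threshold:

```python
# Sketch of threshold impact simulation: compare decision outcomes
# under a current and a proposed threshold. All data is illustrative.

historical_loans = [
    {"amount": 15_000}, {"amount": 32_000},
    {"amount": 48_000}, {"amount": 75_000},
]

def approve(loan: dict, max_amount: int) -> bool:
    # Stand-in for the full rule set; only the threshold varies here.
    return loan["amount"] <= max_amount

def simulate(max_amount: int) -> int:
    """Count approvals across the historical book for one threshold setting."""
    return sum(approve(loan, max_amount) for loan in historical_loans)

current, proposed = 40_000, 60_000
print(f"approvals at {current}: {simulate(current)}")    # approvals at 40000: 2
print(f"approvals at {proposed}: {simulate(proposed)}")  # approvals at 60000: 3
```

There's no pass/fail here; the point is seeing the difference between the two settings before committing to one.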
But the choice is going to make a difference. So take loans. How much am I willing to lend you
to buy a boat? Well, there isn't a right or wrong answer that I can write a test case for. But
the business can change that threshold, and it has an impact. I need to be
able to track what that impact is. And so generally, we have some kind of impact tool that
takes a bunch of historical data and loads it in and then runs a set of simulations. So very
similar to a test engine, but with a different perspective. Instead of saying this broke, this
didn't break, it says, here's the difference. If you make that change to that rule, the results look
like this. And if you make this change to this rule, the results look like that. So you can see
what the impact of a change is going to be before you make it. With a lot of these tools, you
have to deploy a test version before you can do these things, but several actually allow you to
do testing and simulation on rules you haven't deployed yet, that are just in your repository,
and manage all of that essentially under the covers so that you can do it inside your
development environment. So they provide a lot of tools to make sure you have the rules correct
before you deploy them. Now, once you have them correct, you obviously need
to deploy them. So you've got a deployment engine that deploys them as a service. So now I've got my
rules service deployed, my decision service deployed. And it's going to execute those rules.
It's got the code and the engine that it needs to execute those rules. So when you pass in data, it's
going to give you an answer. Now in this case, obviously, I'm going to expose it as an agent.
So I've probably got some kind of MCP (Model Context Protocol) server that exposes these
decision services as tools, and then those can be wrapped into agents and exposed in my agent
framework. So what are these agents going to do?
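Conceptually, that tool wrapping has a simple shape: the server advertises each decision service as a named tool with a description and input schema, and dispatches tool calls to the deployed service. A stdlib-only sketch of that shape (this is not a real MCP SDK; the decision service is stubbed in-process, and all names are illustrative):

```python
import json

# Stub for a deployed decision service; in reality this would be
# an HTTP call to the decision platform's REST endpoint.
def eligibility_service(payload: dict) -> dict:
    return {"eligible": payload.get("credit_score", 0) >= 580}

# The tool description an MCP-style server would advertise to agents.
TOOLS = {
    "check_eligibility": {
        "description": "Decide whether an applicant is eligible for a loan.",
        "input_schema": {
            "type": "object",
            "properties": {"credit_score": {"type": "integer"}},
        },
        "handler": eligibility_service,
    }
}

def call_tool(name: str, arguments: dict) -> str:
    """Dispatch a tool call to the wrapped decision service."""
    result = TOOLS[name]["handler"](arguments)
    return json.dumps(result)

print(call_tool("check_eligibility", {"credit_score": 640}))
# {"eligible": true}
```

A real MCP server adds the protocol transport on top, but the tool-listing and dispatch idea is the same.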
The origination agent is going to say, here's my data packet. It's going to come in to my decision
service, and I'm going to get a response back, which then goes back to my agent. So I can
quickly package up my rules as decision services. I can reuse rules and logic, package them up
in multiple services, deploy those services, and wrap them as agents using MCP. And now I've
got a whole series of decision agents that I'm managing from this repository. The technology
is really good at things like propagating a rule change to the engine, and handling in-flight
transactions so that an in-flight transaction doesn't get broken if you change the rules. All
of that kind of constant update is handled very effectively. So what this lets me do is it
lets me build these rules, build these decision agents in a very robust way, and then deploy them
as a service that I can then use to support my agentic framework by exposing them as agents. So
yeah, this handles, if you like, most of what's going on in an agent. If you think about these agents,
this is all very prescriptive. So this is really describing how I write rules. How do I
describe the rules, the logic that prescribes how this decision is made. But many decisions
have a probabilistic component too. If it's likely that this is James, we'll do one thing; if
it's likely that someone is impersonating James, we'll do something different. If it's likely
that this is a legitimate transaction, we'll do certain
things. So these are probabilistic elements that are typically built using predictive analytics,
machine learning from my historical data. So in an agentic framework, what does that look
like? Well, typically what I'm going to do is also deploy these machine learning components.
So I might have a prediction, for instance, of fraud. How likely is it that this person is the
person who's applying for this loan? I might have another one around credit risk. How likely are
they to pay us back? And I might have a third one, which is payoff risk. How likely are they to pay
us off early? And all of these agents are used by my origination agent as part of the
origination decision. So I need to be able to consume this. So how do I build those agents? Well,
I'm gonna use a machine learning platform to do that. I'm gonna use machine learning technology to do
it. And generally, with machine learning, you're going to do some kind of analysis. This might
be supervised, in the sense that there is a human user directing it, or it might be
unsupervised, where you're really just using the algorithm and letting it see what it finds in
your data. Which, of course, means you've got to have data. Generally for machine learning you
have a lot of data, so you have multiple databases that have to be combined, merged, and
managed. And you're going to do
something called feature engineering. So you're going to engineer a set of features. And features
are predictive characteristics of one kind or another, things that seem interesting. They can
be very simple: if you have a date of birth, you can come up with an age. They might classify
something: which age range is this customer in, less than 20, 20 to 30, 30 to 40, and so on,
because the range seems important. But they can get quite sophisticated. They can say
things like, how often have you been more than 30 days late in the last 180 days on a payment for a
bill? Well, that has to be calculated from all this data. So there's a lot of work to not just merge
this data, but calculate these features from it. And then I'm going to feed that data and my
features that I've created into my analysis, run these machine learning algorithms, neural networks,
regression models, decision tree analytics, all sorts of different analytic techniques to see if
I can find patterns or classifications or make predictions based on the historical data that
I've got. If it's supervised, I'm telling it what I'm looking for: can you tell me which
features will predict that this person will pay off the loan early? And if it's unsupervised,
I'm more looking for things like: is there anything unusual in here? What counts as an unusual
pattern of data? Because that might be indicative of a new kind of fraud, for instance. The
supervised analyses are generally driven by a data scientist or a machine learning engineer;
the unsupervised ones are generally kicked off and allowed to do their own thing. And then I'm
going to go
ahead and deploy these as endpoints that can be consumed by these agents. Now, we used to do a
lot of analytics in batch. So we would run these kinds of analyses and then update the database
with a bunch of scores. Today we're much more likely to deploy them as individual REST
endpoints that I can pass a JSON object to, to score and get a result back. And obviously once I
do that, once I have an endpoint, I can use MCP again, and I can deploy those as tools that I can
make available to my analytic agent. I now have analytic agents talking to deployed endpoints. And
those endpoints run essentially an algorithm that's been built from my historical data. So
they're not analyzing the historical data at runtime. What they're doing is they're using the
results of that analysis to say, okay, here's a formula that takes this data and calculates a
payoff risk for this customer. So I can see how likely it is this customer is going to pay it off
early and use that as part of my pricing. So I have all these analytic agents deployed into my
agentic AI framework. And then my decision service is going
to consume the results of those, those predictions, those probabilistic models as part of how it
makes the business decision to originate you or not. Now, these two types of technologies,
decision platforms and machine learning platforms, are quite separate from large language
models. But that doesn't mean they can't be enhanced with large language models. And there are
two areas in particular where we see a lot of work. One of them is this idea of a large
language model for ingestion. If I've got documents, if I've got brochures, if I've got
a recorded conversation, however I've recorded a bunch of data, large language models are
really good at extracting the data I need from it. So if I've got an origination
decision and it needs to know, for instance, details of the boat you want to borrow money for,
and I've got a brochure about that boat, then I can ingest that using a large language
model, feed it directly into my origination agent as input data. So this gives me tremendous
opportunity for making it much, much easier to supply the data I need. Often these decision
agents need a lot of data, and so being able to consume documents and turn them into data is
very effective. The other place we've seen really good uses is in explaining results. If you
think about it: I invoke this decision agent, and one of the things it's going to do is log
how it made the decision. It's going to create essentially a detailed log of how it made the
decision. How much detail goes in there is up to you, but you can look quite precisely at how
the decision was made. Which rules fired? How was the decision made? Now, that log is great for
you. It's great for long-term improvement, great for understanding how your engine worked, how
your decision agent behaved. It's not necessarily great for explaining it to a
human being, a call center rep or a customer. So one of the other use cases for LLMs is to take this
log data and turn it into an explanation. So now I can explain how that decision was
made. And I can ingest textual data that you give me. So I can use these LLMs to make it
easier to interact with my decision agent, both in and out. Now, there's one last step I wanted
to add, which is: how do I make these things learn, if I want them to get better over time?
What does that look like? How do I get my results onto an upward trajectory? Well, there's a
couple of things to say about that. It really varies
depending on the kind of agent you have. Many of the analytic agents will learn on their own
behalf. The unsupervised ones in particular will take new data and continually update
themselves as it comes in. So as you run them, they make predictions, they make scores, and new
data results in new scores. And so they constantly update their algorithms. Typically you have
some guardrails on that, so they can't change too much without telling someone. But you allow
them to essentially run experiments on their own data and experiment internally so that their
predictions evolve. So those kinds of agents, agents built on unsupervised analytic
techniques, are inherently learning. But other kinds of analytic
agents don't learn quite the same way. So you typically then have some data scientist doing
new analysis with new data and proposing a new model. They might do this every month, every
quarter, every week: they review the data up until yesterday, see what data has changed since
the last time they built the model, rerun the model, and see if the algorithm is noticeably
different. If it is, they typically will deploy a new endpoint. And that gets version
controlled just like any other code. So any analytic agent can
learn. It might learn automatically, but it might also learn because the data scientists are
responsible for keeping it up to date over time. But what about decision agents? Decision
agents don't really learn, right? The whole point of a decision agent is that it's concrete,
that it's got this hardened definition of how it behaves. And so you don't really want it
randomly changing its behavior. So there's a couple of things you can do. First of all, you can,
in the rule repository, you can code multiple versions. So you can put in: here's the old
version of the rules, here's the new version of the rules. And then put a rule in that says some
people get one, some people get the other. And I get to experiment to see which one works better.
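That routing can itself be written as a rule. A sketch, assuming we hash a customer ID so each customer consistently lands in the same group; the traffic share, names, and thresholds here are illustrative:

```python
import hashlib

CHALLENGER_SHARE = 10  # percent of traffic routed to the new rule version

def route_version(customer_id: str) -> str:
    """Deterministically assign a customer to champion or challenger.
    Hashing (rather than a random choice) keeps each customer in the
    same group on every decision, preserving consistency."""
    bucket = int(hashlib.sha256(customer_id.encode()).hexdigest(), 16) % 100
    return "challenger" if bucket < CHALLENGER_SHARE else "champion"

def decide(customer_id: str, application: dict) -> dict:
    version = route_version(customer_id)
    # Both rule versions live in the same repository; logging the
    # version is what lets someone compare outcomes later.
    threshold = 580 if version == "champion" else 600
    return {"version": version,
            "eligible": application["credit_score"] >= threshold}

print(decide("cust-001", {"credit_score": 590}))
```

From the outside this still looks like one agent; the version field in the log is what closes the loop.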
So I can run what's called A/B or champion challenger testing by writing rules in my
decision agent so that it looks like one agent to the outside world, but it's got these two
versions that it's running comparisons for so I can learn. And then I can have somebody look at
the log, see which one works better, and close the loop by adding more rules back into the rule
repository. The other thing I can do is start to think about the
overall agent and how the overall agent works. And this starts to get more involved. Because if
you think about it: if I want to improve my origination agent, what does it mean to make better
origination decisions? It means I lend money to people who pay me back. But they don't pay me
back all at once, right? The whole point of a loan is that you might pay me back over many
years. And so I can't really tell how good you're going to be at paying it back until some time
passes. So I
can't do a real-time feedback loop because it's nonsense, right? The idea that I'm going to find
out in real time whether this was a good loan decision is just silly, right? So I have to get a
log of how I made the decision, which scores and predictions I used, and what version
everything was, and store all that in my log. And then I need to wait some period of time, and
then somebody needs to come back, look at all this data, given this log data, the versions of
the analytics I used, and the results I got out of this origination decision as processed
through my workflow, and actually do that analysis work. So that requires a process
and structure that you can follow. You can do it with agentic AI, but you have to be a little bit
more thoughtful about how you would do it. It's not enough just to rely on the individual agents
to learn about their bit of the problem. Someone has to own the framework as a whole, the solution
as a whole, and systematically learn from how well that works.
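That closing-the-loop analysis is, at its core, a join: decision logs on one side, outcomes observed much later on the other, grouped by the rule or model version that was in effect. A minimal sketch with made-up records:

```python
# Sketch of the delayed feedback loop: join decision logs with
# outcomes observed much later, grouped by rule/model version.
# All records are illustrative.

decision_log = [
    {"app_id": 1, "version": "v1", "approved": True},
    {"app_id": 2, "version": "v1", "approved": True},
    {"app_id": 3, "version": "v2", "approved": True},
]

# Outcomes only known months or years after the decision was made.
outcomes = {1: "repaid", 2: "defaulted", 3: "repaid"}

def repayment_rate_by_version(log, outcomes):
    stats = {}
    for entry in log:
        if not entry["approved"]:
            continue  # no repayment outcome to learn from if we never lent
        good = outcomes[entry["app_id"]] == "repaid"
        totals = stats.setdefault(entry["version"], [0, 0])  # [repaid, total]
        totals[0] += good
        totals[1] += 1
    return {v: repaid / total for v, (repaid, total) in stats.items()}

print(repayment_rate_by_version(decision_log, outcomes))
# {'v1': 0.5, 'v2': 1.0}
```

The point is that someone has to own this analysis for the solution as a whole; no individual agent sees enough to do it on its own.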
How do decision agents complement LLMs in automation? Blue Polaris Executive Partner, James Taylor, explores designing decision agents with DMN, LLMs, and machine learning models to create transparent, efficient systems.