Whether you're building an experimental prototype for your own personal use or creating an
application to power an entire organization, there are key components of the AI technology stack
that you must get right to build AI systems that do more than just generate answers and actually
solve real, meaningful problems. Say, for instance, I'm building an AI-powered application to help drug
discovery researchers understand and analyze the latest scientific papers in their domain. Maybe it
starts with a model that I recently heard about that is supposed to be better
at highly complex tasks, like those of a PhD researcher. The model is an important layer of the
stack, but it's just one piece of the puzzle. There's also the infrastructure that that model
will run on, because not all LLMs (large language models) can run on
standard enterprise CPU-based servers, and not all are small enough to run on a laptop. So it
matters what infrastructure you have access to and how you choose to deploy it. Next is data
because in this example, the whole point is to help scientists understand the latest papers in
their field. And models typically have a knowledge cutoff date. So if we want to talk about papers
from, say, the past three months, that means we have to provide the AI system with extra data.
That will be the data layer. Next would be the orchestration layer, because a complex task like
this is probably going to require more than simply providing a large prompt to the AI system and
getting a single output out. Instead, we'll want to break that user query up into different parts:
plan how the AI solution is going to actually tackle this problem and what data it needs, then do
the summarization and create an answer, and maybe even review that answer. Finally is the
application layer, because at the
end of the day, there's a user using this tool. So there will have to be an interface that defines
what the inputs will be and what the outputs will be. It might not be as simple as text in and text
out. And there's also the issue of integrations. So, will the actual results of this be something
that's integrated into other tools that this user uses? It's important to understand
all the layers of the AI stack, whether you're building a solution from scratch or using
solutions which might manage several of these layers for you as a service. This is because
across the stack, from the hardware, all the way up to the user interface level, the choices you make
will have important implications for your solution's quality, its speed, its cost and its
safety. When it comes to infrastructure, LLMs generally require AI-specific hardware,
specifically GPUs, and these can be deployed in one of three ways. The first would be on premise,
that is, assuming you have the means and resources to buy this kind of infrastructure yourself.
Second option would be cloud, and that would allow you to rent this capacity and be able to scale it
up or down as needed. Finally would be local, which usually means on your
laptop. Not all laptops can support LLMs of different sizes, but there are certainly LLMs on
the smaller end of the range that can be run on the kind of GPUs available in a standard laptop.
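As a rough way to reason about what hardware a model needs, you can estimate the memory the weights alone will occupy from the parameter count and the numeric precision. This is a back-of-the-envelope sketch, not a formula from the video; the 7B example is illustrative and real deployments also need headroom for activations and the KV cache.

```python
# Rough rule of thumb: memory for weights ≈ parameter count × bytes per weight.
# Real systems need extra headroom (activations, KV cache), so treat this as a floor.

def estimated_memory_gb(num_params_billions: float, bytes_per_weight: float) -> float:
    """Approximate memory footprint of a model's weights, in gigabytes."""
    return num_params_billions * 1e9 * bytes_per_weight / 1e9

# A hypothetical 7B-parameter model:
print(estimated_memory_gb(7, 2))    # 16-bit weights: ~14 GB, beyond most laptop GPUs
print(estimated_memory_gb(7, 0.5))  # 4-bit quantized: ~3.5 GB, feasible on many laptops
```

This is why quantization is often what makes "local" deployment possible: shrinking the bytes per weight can move a model from data-center hardware down to a consumer GPU.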
The next layer is models. So AI builders have plenty of choice when it comes to what model they
can use. One dimension to consider is whether the model is open
or proprietary. Another dimension is the model
size. So we have large language models; we also have small language models that might
be lighter weight and able to fit on more modest hardware, but might not have exactly
the same thinking capacity as a large language model and instead be specialized for more
specific things. Finally is specialization,
which sometimes goes hand in hand with size. Some models might perform better on things like
reasoning or tool calling or generating code. Others might have different language strengths
than others. There are plenty of models to choose from, with over 2 million already in model
catalogs like Hugging Face, that can serve any mix of these different needs that an AI builder might
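To make those three selection dimensions concrete, here is a minimal sketch of filtering a model catalog by openness, size, and specialization. The catalog entries and field names are invented placeholders, not real model listings or a real catalog API.

```python
# Toy model catalog; "params_b" is parameter count in billions,
# "skills" is the model's specialization. All entries are invented.
catalog = [
    {"name": "model-a", "open": True,  "params_b": 7,   "skills": {"code"}},
    {"name": "model-b", "open": False, "params_b": 175, "skills": {"reasoning"}},
    {"name": "model-c", "open": True,  "params_b": 3,   "skills": {"reasoning", "code"}},
]

def pick_models(catalog, open_only=False, max_params_b=None, required_skill=None):
    """Filter candidate models along the three dimensions: openness, size, specialization."""
    results = []
    for m in catalog:
        if open_only and not m["open"]:
            continue
        if max_params_b is not None and m["params_b"] > max_params_b:
            continue
        if required_skill is not None and required_skill not in m["skills"]:
            continue
        results.append(m["name"])
    return results

# An open model under 10B parameters that is specialized for reasoning:
print(pick_models(catalog, open_only=True, max_params_b=10, required_skill="reasoning"))
# → ['model-c']
```

In practice the same filtering happens through a catalog's search interface, but the trade-off logic is the same: constraints on openness and size narrow the field, and specialization picks the winner.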
have. The next layer of the stack is data. This breaks up into a few different components,
so the first would be data sources themselves to supplement the model's knowledge.
This could also include the pipelines to do any processing,
pre-processing, post-processing of that data, as well as vector
databases you may use, or retrieval systems for retrieval-augmented generation,
also known as RAG. The vector database is where that external data is actually vectorized
into embeddings and saved, so your model can retrieve that context more quickly and augment its
answers with additional knowledge that the base model does not have. That's important because base
models are usually trained on publicly available information, which might not always be complete to
accomplish the task that you have. You might need to supplement with additional data. The next layer
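The retrieval step at the heart of RAG can be sketched in a few lines: documents live as embedding vectors, the query is embedded the same way, and the nearest documents are pulled in as extra context. In a real system an embedding model produces the vectors and a vector database does the search; here the tiny hand-made vectors and paper names are purely illustrative.

```python
import math

# Pretend embeddings for three recent papers (invented; real embeddings
# come from an embedding model and have hundreds of dimensions).
docs = {
    "paper-1": [0.9, 0.1, 0.0],
    "paper-2": [0.1, 0.8, 0.1],
    "paper-3": [0.8, 0.2, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, docs, k=2):
    """Return the k document ids most similar to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

# A query whose embedding points mostly along the first dimension:
print(retrieve([1.0, 0.0, 0.0], docs))  # → ['paper-1', 'paper-3']
```

The retrieved passages are then prepended to the prompt, which is how the system answers questions about papers published after the model's knowledge cutoff.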
is orchestration, because building an AI system that does something more complex than
just generating text or answering questions requires breaking the initial user input down
into smaller tasks. Those can start with things like thinking,
using the model's reasoning ability to plan out how it will tackle the problem. That
can also include things like execution, where the model does tool calling
or function calling, as well as steps like
reviewing, where an LLM can actually provide its own critique of the initial generated
responses and initiate feedback loops to even improve those responses. This layer
is very quickly evolving, with new protocols like MCP and new architectures for how to best
orchestrate increasingly complex tasks.
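The plan, execute, and review steps described above can be sketched as a skeletal orchestration loop. The `llm_plan`, `llm_summarize`, and `llm_review` functions are stand-ins for real model calls, and the tool registry is an invented example of function calling; a production orchestrator would be far more dynamic.

```python
def search_papers(topic):
    """Stand-in for a real retrieval tool the model can call."""
    return f"3 recent papers on {topic}"

TOOLS = {"search_papers": search_papers}  # the function-calling registry

def llm_plan(user_query):
    # A real planner would be a model call; here the plan is hard-coded.
    return [("search_papers", user_query), ("summarize", None)]

def llm_summarize(context):
    return f"Summary of: {context}"

def llm_review(draft):
    # A reviewer model could critique the draft and trigger a revision loop;
    # here it simply accepts drafts that already look like summaries.
    return draft if draft.startswith("Summary") else llm_summarize(draft)

def orchestrate(user_query):
    context = ""
    for step, arg in llm_plan(user_query):        # thinking / planning
        if step in TOOLS:
            context = TOOLS[step](arg)            # execution / tool calling
        elif step == "summarize":
            context = llm_summarize(context)      # generation
    return llm_review(context)                    # review / feedback

print(orchestrate("protein folding"))
# → Summary of: 3 recent papers on protein folding
```

Even this toy loop shows the key idea: the user's query is decomposed into steps, and each step can route to a tool, a model call, or a critique pass rather than one monolithic prompt.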
Next is the application layer. The most widely used AI systems do follow a pretty simple design of text in and text
out. But as we use these tools in our work and lives, there are important features that become critical
for the actual usability of AI, and these factors make up the application layer.
First factor is interfaces.
The most classic interface is text in and text out,
but there are other modalities that can be very valuable for certain tasks too, like image, audio,
numerical data sets, and plenty of other custom data formats.
Also, in the interface,
it's really important to keep in mind the ability to do things like revisions or
citations so that when the user sees what the model comes up with, they have the ability to edit
that or inquire on it further. The second consideration is integrations, and that
comes in two forms: allowing other tools that the user uses to
actually send inputs to the AI system, and taking the
model outputs and automating how they get integrated into some of the tools that they use
in their day-to-day work. Altogether, these layers of the AI stack, from the
hardware to the models, the data you use, how you orchestrate it, and the application and the
usability of it, matter because when we have a clear understanding of how they fit together, we
can see what's truly possible and make practical choices to design AI systems that are reliable,
effective and aligned to our real-world needs.
Lauren McHugh Olende explains how LLMs, vector databases, and orchestration layers integrate with AI hardware to power real-world systems.