All right, why don't we go ahead and get started? This is the Prometheus maintainer track. I'm David Ashpole, and I work for Google.
I wear a lot of hats. As you may notice, I have one of the OpenTelemetry maintainer shirts on, but I'm also a Prometheus team member, which is why I'm here giving this talk today. Unfortunately, my co-presenter, who did most of the work on these slides, wasn't able to make it because he's in Europe, so you get to listen to me for half an hour today instead.
All right. If you've been to a
Prometheus talk before, you may
recognize this. Has anyone heard of
Prometheus before? Show of hands.
All right, keep them up if you've used
Prometheus before.
Keep them up if you use Prometheus in
production.
And last but not least, raise your hand if you know what the plural of Prometheus is. It's Promethei. Go listen to that talk, it's really funny, actually.
All right, for those in the audience who haven't heard of Prometheus before: Prometheus was originally developed at SoundCloud, inspired by Borgmon from Google. It's a monitoring and alerting system, and it was the second project to join the CNCF after Kubernetes.
Roughly speaking, Prometheus is broken down into an ecosystem of exporters. Your custom application may have a Prometheus exporter endpoint on it, but there are also common ones like node_exporter or kube-state-metrics that expose metrics in Prometheus format. Then there's the Prometheus server, which is in charge of discovering the endpoints that need to be scraped, doing the actual scraping of that metric data, storing it in a time series database, and making it available for querying and alerting using PromQL, which is Prometheus's query language.
Alerting is very important so that you know whether production is on fire or not. Prometheus is depended on all around the world for making sure that production systems everywhere are available.
Work on Prometheus originally started in 2012, and it has gone through a lot of iterations since then. One of the biggest milestones was the project's graduation, which means the CNCF deems it to have reached a high level of maturity, with strong governance, a large contributor base, and lots of people using it in production.
But we've been busy. If you've been paying attention for the last year or so, Prometheus 3.0 was a big deal and came with a lot of new features. And now, in November 2025, we've most recently shipped the 3.7 release. So that's where we are today.
So what am I going to cover today? We've done a lot of work on OpenTelemetry compatibility. That includes deltas, it includes keeping the original OpenTelemetry names with dots, and even improvements to resource handling and scope handling. I have a couple of updates on native histograms: they're finally stable, hooray, and there's a cool new thing called native histograms with custom buckets. What are those? And finally, there have been some improvements to PromQL. So, without further ado, let's talk about OpenTelemetry.
The Prometheus server supports OTLP at the standard OTLP path, so since 3.0 Prometheus has actually supported pushing metrics to it, which to me is kind of crazy, and you can do that in OTLP format. But we've gotten a lot of feedback, we've done some surveys, and there are a lot of improvements that we've been trying to make to the OpenTelemetry experience of using Prometheus.
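To make that concrete, here's a minimal sketch of pushing OTLP metrics into Prometheus from an OpenTelemetry Collector. The host name is made up, and the flag and path reflect my reading of the current docs, so verify them for your version:

```yaml
# Start Prometheus with the OTLP receiver enabled, e.g.:
#   prometheus --web.enable-otlp-receiver
# Then point a Collector's OTLP/HTTP exporter at the /api/v1/otlp path.
receivers:
  otlp:
    protocols:
      http:

exporters:
  otlphttp/prometheus:
    # Hypothetical host; the exporter appends /v1/metrics automatically.
    endpoint: http://prometheus.example.com:9090/api/v1/otlp

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp/prometheus]
```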
First, you may think that Prometheus and OpenTelemetry are enemies or competitors, but that's not actually the case. A while ago Prometheus published a big blog post committing to supporting OpenTelemetry. OpenTelemetry has made many different design decisions, I'll say, and there's definitely been friction between the communities, but we and the CNCF are working hard to make sure the projects work well together and can be used in a variety of mixed ways. It's still a work in progress, though, and sometimes you need to turn on experimental features to have things work as intended right now. So it's not quite there yet, but we've been making a lot of improvements, and there's a nice guide on the Prometheus website telling you exactly what to turn on, what not to, and what some of the trade-offs are that you have to make.
Okay. One of the most requested features since 3.0 is being able to keep your original OpenTelemetry metric names. OpenTelemetry is built around the idea of semantic conventions: a metric should have one name and one name only, and should have exactly this set of labels, that kind of thing. A lot of people had asked to be able to keep that original structure even when sending it to something like Prometheus, where most people expect underscores sprinkled everywhere, unit suffixes, and so on. So we've introduced a new translation strategy option, and one of the options added recently is no translation. There are some gotchas, though, which I'll get into. The translation that happened before this includes changing dots to underscores and adding type and unit suffixes.
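As a rough sketch, the knob lives under the otlp block in prometheus.yml; the option and value names are as I understand the current docs, so double-check before copying:

```yaml
otlp:
  # Default behavior translates dots to underscores and appends type/unit
  # suffixes such as _total and _seconds:
  #   translation_strategy: UnderscoreEscapingWithSuffixes
  # To keep the original OpenTelemetry names, dots and all:
  translation_strategy: NoTranslation
```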
So let's talk about type and unit. Prometheus has always strongly recommended including a type hint in your metric name, like _total, and including the unit in the name as well. So Prometheus metrics are typically named something like http_request_duration_seconds, plus _total, I guess, if it were a counter. But OpenTelemetry doesn't do that, right? OpenTelemetry uses http.request.duration. And if you're looking at that in a YAML file, good luck, hopefully you can remember what unit it was. To partially address that, one feature we've worked on is called type and unit labels, and we do recommend turning it on if you're ingesting OpenTelemetry metrics via the OTLP endpoint.
What this does is add type and unit as labels on the metric, so that if you end up ingesting, say, one metric that's in seconds and a different one in milliseconds, you can actually disambiguate them later at query time. The hope is also that this will be used to provide richer user interface experiences by making type and unit more readily available. But we're still looking for feedback, so if you love it or hate it, make your voice known.
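If I have the flag and label names right (treat them as assumptions and check your version), the feature looks something like this at query time:

```promql
# With --enable-feature=type-and-unit-labels, each series carries its
# metadata as reserved labels, so two series that differ only in unit can
# be told apart in a selector:
{"http.server.request.duration", __type__="histogram", __unit__="s"}
```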
Prometheus is also working on support for OpenTelemetry delta metrics. It's in the super, super early stages. When we ingest delta counters, we mark them as unknown, because they're obviously not the same as traditional Prometheus cumulative counters. But this is going to be an area that I think improves slowly over time, with things like adding start timestamp support, and there's obviously going to be a lot of work on the storage and query layer required for this as well. The very beginnings of this have already started, and if you want to play around with it, I'd say it's in a state where, if you do happen to have delta metrics, they're usable in some cases but have some pretty severe limitations.
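As a rough illustration of why delta samples are already usable in some cases: each sample is itself the increase over its reporting interval, so a plain sum over the window approximates the total increase without the cumulative-counter assumptions baked into rate(). The metric name here is hypothetical:

```promql
# Hypothetical delta counter ingested via OTLP; summing the raw delta
# samples over the window approximates the increase in that window.
sum by (job) (sum_over_time({"app.requests.errors"}[5m]))
```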
Another massive area of feedback has been dealing with OpenTelemetry resource. OpenTelemetry sticks a ton of metadata about where a process is running into resource attributes, and not just one or two resource attributes but sometimes 20 or 50. Dealing with these in the Prometheus data model can be a little tricky. If you promote them all, you've added tons and tons of labels to all of your metrics, it clutters up your UI, and that's not really a great solution. But if you've been following the mapping up until this point, you can also take them and stick them in a different metric called target_info and make everyone write a nice long join query. That also has some problems, so instead of picking one of the two approaches, we've actually been trying to make both approaches a little better supported.
First, if you want to take the approach of promoting resource attributes into your Prometheus labels, we've added better support for that. We now essentially have an allow list and a deny list. If you want to add certain resource attributes whenever they're present, you can use promote resource attributes. And if you want to add all resource attributes except for a few problematic ones, you can do that by promoting all of them and then ignoring specific resource attributes. Separately, there's one special option for the service resource attributes, keep identifying resource attributes, and you can learn more on the website. There are actually some pretty good best practices there, so if you're looking for the copy-paste-and-forget version of this, it's probably right there.
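In prometheus.yml, that ends up looking roughly like this; the allow-list and keep-identifying options are documented today, while the promote-all/ignore pair is newer, so treat the exact names as assumptions and check your version:

```yaml
otlp:
  # Allow list: copy these resource attributes onto every sample whenever
  # they are present.
  promote_resource_attributes:
    - service.instance.id
    - k8s.namespace.name
    - k8s.pod.name

  # Or the deny-list approach: promote everything except the noisy ones.
  # promote_all_resource_attributes: true
  # ignore_resource_attributes:
  #   - process.command_args

  # Keep the identifying service.* attributes around as labels instead of
  # only folding them into job/instance.
  keep_identifying_resource_attributes: true
```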
Second, we do put all the resource attributes in a metric called target_info, but we've gotten a lot of complaints over the past year or two that this is basically unusable for most people. You can see the query you have to write here is quite long compared to the thing you were actually trying to do, which is get your HTTP server duration over the last two minutes. So that's a pain. To try to make that experience a little better, and make the queries more readable and ergonomic, we've introduced the info function. The idea is to make joining with an info metric a simpler concept across Prometheus generally. And you can see it actually does simplify things quite a bit, and it's maybe a little easier to tell what's going on, which is that we're trying to add the Kubernetes cluster name to the metric we're querying. That has to be enabled with the feature flag for experimental PromQL functions.
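The slide isn't reproduced here, but the shape of the two queries is roughly this; metric and label names are illustrative, and info() requires --enable-feature=promql-experimental-functions:

```promql
# The classic approach: a manual many-to-one join against target_info.
rate(http_server_request_duration_seconds_count[2m])
  * on (job, instance) group_left (k8s_cluster_name)
  target_info

# With the experimental info() function: ask for the data label you want
# from the matching info series instead of spelling out the join.
info(
  rate(http_server_request_duration_seconds_count[2m]),
  {k8s_cluster_name=~".+"}
)
```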
And finally for this section, we've added a little better support for OpenTelemetry scope. If you've ever been debugging a metric and wondered, who the heck defined this, that's what OpenTelemetry scope is meant to answer. It usually tells you the package name where the metric is defined, and it also includes the version. It can be helpful for catching regressions: if a metric broke, or if you're seeing, as in the earlier example, one metric in seconds and another in milliseconds, you can figure out who the culprit was that defined their metric without using base units. These are still opt-in; there's a promote scope metadata option you can turn on to get them as labels.
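In config terms that's another opt-in under the otlp block; the option name and the resulting otel_scope_* labels are my understanding of the current docs, so treat this as a sketch:

```yaml
otlp:
  # Adds scope metadata such as otel_scope_name and otel_scope_version as
  # labels on ingested series, so you can see at query time which
  # instrumentation library produced a metric.
  promote_scope_metadata: true
```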
All right, let's talk about native histograms. Native histograms are an awesome feature that comes with Prometheus 3.0. For more details on exactly how they work and why they're awesome, some of the previous talks are probably going to be better than this one. But today I'm going to focus on something called native histograms with custom buckets, which, when I first heard it, sounded kind of bizarre: isn't the whole point of native histograms that they have exponential buckets?
All right, so we'll start with an overview of the classic Prometheus histogram as it exists today. This example has six individual series: four of them have the _bucket suffix with the le label, meaning less than or equal, and each of those series holds the count of observations that were less than or equal to the threshold. Most people are pretty familiar with this. There's also a _sum and a _count series.
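The slide isn't shown here, but a classic histogram in the text exposition format looks something like this, with made-up values:

```text
# Four cumulative buckets plus _sum and _count: six series in total.
http_request_duration_seconds_bucket{le="0.1"}  3
http_request_duration_seconds_bucket{le="0.5"}  7
http_request_duration_seconds_bucket{le="1"}    9
http_request_duration_seconds_bucket{le="+Inf"} 10
http_request_duration_seconds_sum 4.2
http_request_duration_seconds_count 10
```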
Native histograms, on the other hand, even though they're full of information and very dense, oftentimes with something like 100 buckets, are stored as a single data point, a single complex sample. And it turns out that, especially in the TSDB and the query layer, that is actually really, really efficient. So the logical next step is for someone to say, hey, why don't we store all histograms as a complex sample type? Why not take our regular classic histograms, turn them into this complex type, and store and query them that way, so we get all those efficiency gains? That's what was implemented, and it's called native histograms with custom buckets.
There's mostly one catch to this: if you're familiar with native histograms, the buckets aren't individual series anymore, so you can't query for them like that. There's no series in a native histogram with the _bucket suffix, for example. Instead, you access the fields of that complex histogram type using functions on it. So I might use the histogram_count function to get the count of a native histogram. When, or if, you migrate from classic histograms to native histograms with custom buckets, you'll need to change your queries as well, to query using these new native histogram functions.
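Concretely, the before-and-after looks roughly like this, with an illustrative metric name:

```promql
# Classic histogram: quantiles and counts come from the _bucket and
# _count series.
histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
rate(http_request_duration_seconds_count[5m])

# Native histogram (including custom-bucket ones): the bare metric name is
# a single complex sample, and you use histogram_* functions on it.
histogram_quantile(0.9, sum(rate(http_request_duration_seconds[5m])))
histogram_count(rate(http_request_duration_seconds[5m]))
```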
The folks who worked on this thought through a lot of different migration cases, and there are a couple of config options you can use to write just the classic ones, both, or just the new native histogram custom bucket versions. So it's definitely something worth trying out, and there are some real benefits to using it.
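As a sketch, the migration knobs live in the scrape config; the option names below are how I understand recent releases, so verify them before relying on this:

```yaml
scrape_configs:
  - job_name: my-app
    # Convert scraped classic histograms into native histograms with
    # custom buckets (NHCB).
    convert_classic_histograms_to_nhcb: true
    # During migration, keep ingesting the classic series as well so
    # existing dashboards and recording rules keep working.
    always_scrape_classic_histograms: true
```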
All right, for fun, let's look at a bunch of the features all in one view. Here we have a metric that's using UTF-8 support, so it's got some dots in it. It's a native histogram, it's being queried on the new Prometheus 3.0 user interface, and it was ingested via the OTLP endpoint. So this is actually showing off a bunch of the cool new features all in one view.
And finally, let's talk about some of the changes to PromQL. These are all new feature additions, so it's a great time to offer feedback on them, play around with them, and find and report bugs. First, this apparently had been open for a while but is finally getting looked at: you can now use plus or minus in durations. So I can express 9 minutes and 2 seconds by writing 9 minutes plus 2 seconds instead of having to write 542 seconds. It doesn't yet apply to the @ and offset modifiers, but hopefully that will come in the future.
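For example, something like the query below; duration expressions are experimental, and I believe they sit behind --enable-feature=promql-duration-expr, though treat that flag name as an assumption:

```promql
# Equivalent to rate(http_requests_total[542s]), just far more readable.
rate(http_requests_total[9m + 2s])
```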
Next, it turns out it's quite expensive to query for the timestamps of metrics, and Prometheus doesn't have a general way to say, I'd like to query for a bunch of timestamps and do math on them. There's a timestamp function, but it has a lot of performance issues, from what I've read. So instead of working towards a generic solution for now, the most common use cases of querying for timestamps have been addressed with new functions: timestamp of min, max, or last over time. If you've encountered this, there are now fixes, but if your use case isn't met, it's probably a good idea to raise your voice too.
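The new functions look roughly like this; the ts_of_* names match my reading of recent release notes, are experimental, and should be double-checked:

```promql
# Unix timestamp (in seconds) of the largest sample value in the last hour,
# e.g. "when did memory usage peak?"
ts_of_max_over_time(process_resident_memory_bytes[1h])

# Timestamp of the most recent sample in the window.
ts_of_last_over_time(up[5m])
```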
All right, finally, I think this is actually really cool. There's now support for a little more control over how extrapolation works in PromQL. There are two new keywords: one is anchored and the other is smoothed.
Look at how PromQL currently deals with missing data. On the left of the slide, you can see what Prometheus does today: if there's missing data, it shows you the two pieces it knows about, and if you measure the increase, for example, it shows you the portions it knows about, which add up to about six in this case. If you use the smoothed keyword, Prometheus does a lot more interpolation. It ignores staleness markers, for example, and just linearly interpolates between the points that it knows. So if you've ever looked at a Prometheus dashboard with intermittent or missing data, you'll just see something like a scatter plot, which isn't very useful. Smoothed is potentially more useful in situations like that, because the query engine will draw smooth lines between all your points and make it a little more usable. It's maybe harder to see that data is missing, but it's probably closer to what the original data was supposed to be.
There is a big catch with this, though: it requires a data point before the query window and a data point after the query window to look correct. That's mostly important for recording rules or alerts, because let's say you had a constant line going across; it doesn't have a point in the future if you're querying all the way up to now, and so you'll see it drop off. That's the case in some other monitoring products as well. But if you're evaluating an alert on data that has dropped off because there's no data in the future yet, you might get the wrong alert value. There's good documentation on exactly how to handle that if you still want to use smoothed in rules. Just be careful using this: it's great for dashboards and probably something you'll want to play around with, but you have to be careful when looking at times really close to the current time, where you might not have the most recent data yet.
Now, let's talk about anchored. Show of hands again: who has ever queried an error metric only to be confused about how you got 3.75 errors in an interval? That's a pretty common piece of confusion. How can you query over a bunch of integers and get a float? That's where the new anchored keyword comes in. It doesn't do any fancy extrapolation, and it doesn't do any interpolation. It just gives you exactly what the change was between the values that are in the time series database. So if the last value is three or four and the next value is nine, it's not going to try to draw lines all the way between them.
This avoids that kind of over-extrapolation; it's generally making fewer assumptions. It can be quite useful for business metric use cases where you don't really want an overestimate, or guessing about what comes next. And if you're querying with increase on integer data, you will only get integers back. In many ways, that can make dashboards on things like error counts much easier to actually understand.
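To make the 3.75 example concrete, this is the kind of query that surprises people today: with default extrapolation it can return fractions even though the underlying counter only ever increments by whole numbers, which is exactly what anchored is meant to avoid. The metric name is illustrative, and I'm not showing the new keyword's syntax here since it's still experimental:

```promql
# Default behavior: increase() extrapolates to the window boundaries, so
# an integer-valued error counter can come back as something like 3.75.
increase(http_server_errors_total[5m])
```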
All right, there have also been some recent changes in governance. We added a big batch of new team members, which was a good process and, I think, well deserved for them. Here are the people who were added recently, and as you can see, there's actually quite a good mix of companies represented, which is very exciting for the project.
We're working on a 2.0 of the governance structure and trying to mimic other CNCF projects more in terms of project structure. Right now Prometheus has just a group of team members, myself included, but we're moving to a model that's more similar to other projects, where there will be contributors, members, maintainers, and then a smaller steering committee that makes the hard decisions.
All right, there's a ton of stuff happening that's going to be part of the next KubeCon talk, in EU or North America next year, and you can be a part of it. There's much to do with delta support. There's OpenMetrics, where we're working on a 2.0, which is extremely exciting. And there's a bunch of other really useful features that we need feedback on, and that we'd also appreciate contributor help with. So if anything excites you, get involved, talk to me afterwards, and I'll try to connect you with the right people.
Slack is also a great place to leave feedback. If you saw a neat thing today that you love, or that you hate, let us know. There's a variety of channels for different topics. All right, I'm happy to take questions now. I think we've got five minutes.
[applause]
Is this turned on? It's on. Okay.
>> Yeah, do you plan to do anything about the initial zero problem?
>> Say that one more time?
>> The initial zero problem: when a metric first comes in, it's not set to zero, and then something gets reported, but when you run the rate or increase function it doesn't return anything because there was no zero.
>> I'm not familiar with that specific problem. I do know that a lot of the weirdness around querying when a metric initially started is, in some respects, addressed by adding created or start timestamps, because then we actually have a point at a timestamp with a zero value. But again, I'm not super familiar with that exactly, so I'm not sure whether that will fix it or not.
>> Okay.
>> Hi, I have a question not related to the topics in the talk, but mainly regarding WAL corruption in Prometheus. I know a lot of these WAL corruption issues have been fixed over various versions, but we still keep hitting them occasionally, and the only way to fix them is to go in and just rm -rf the WAL directory. So is there a plan to add something to promtool, or for Prometheus to automatically handle this, or to provide some tooling around it?
>> I'm unfortunately not an expert in the TSDB. I encourage you to open an issue or look for existing ones; it sounds like a useful feature. Prometheus sometimes will solve problems like that itself, and sometimes it expects an operator or some wrapper, something managing it, to solve it. So I'm not sure, I don't know.
>> Okay, thank you.
>> Yeah, you're welcome.
All right. Thank you everyone for coming. Thank you for coming all this way out to C11.
Prometheus Intro, Deep Dive, and Open Q+A - Owen Williams, Grafana Labs & David Ashpole, Google
As the 2nd oldest project in the CNCF, you have probably heard about Prometheus before. Nevertheless, the project maintainers will give you an introduction from the very beginning, followed by a deep dive into the exciting new features that have been released recently or are in the pipeline. You will learn about many opportunities to use Prometheus, and maybe we can even tempt you to contribute to the project yourself.