Hello, good morning, Berlin.
Hopefully, it's fair to assume that if you're here at a Datadog conference, you're either building applications or helping maintain the reliability of applications. And many of you also use data to make decisions.
In fact, Maxim noted this earlier: you could argue that in this era, where everyone, all your competitors, also has access to those latest models, it's the data that sets you apart. And by that we really mean your knowledge, your expertise, your subject matter expertise, the context that you bring into your offerings, your applications, your services. That is your differentiated advantage. That's what makes your offerings valuable and special. So shouldn't we make sure that data is in a good state?
Let's make this even more concrete with some examples that our customers have shared at recent conferences.

One of those is Ramp. They're an automated expense management platform, but, as they talk about a lot, they also use data as a springboard for innovation. To make that even more specific: one of the offerings they're working on now, which they recently spoke about, uses that expense data, combined with upstream third-party data sources and the latest models, to give their customers pages for tracking their software vendors, where they can look at the latest growth, retention, and adoption of those vendors. It's really an innovative new offering that you can imagine getting some eyeballs. Now imagine putting that offering out there and your end users seeing incorrect data values: your customers' trust in your brand would quickly be lost. So Ramp is working hard to make sure data is clean throughout, and we'll look at what that can look like.
Another example is Mux. They provide APIs for video streaming, another sophisticated team with an innovative offering. But even then, all it took was one bug in an upstream application to make some reports about their users look as if their users had suddenly dropped off, sending a shock through the business that was completely unnecessary. And who can't relate to this? This is not a new problem. You look at a business dashboard, the data doesn't look right, and you either instantly lose trust in that dashboard or, worse, you use the data, make a bad decision, and then your reputation is impacted.
Finally, a third example, from some personal experience of mine. At a previous product offering I worked on before Datadog, we were delivering really critical data to our customers through our SaaS application, data that was used in official government filings and financial reports. So if we weren't really on top of, in that case, our Spark pipelines, we knew about it from our customers. I saw in that case just how important, how critical, this data is when it impacts a business's reputation or a human being's reputation. I started to take it a lot more seriously.
But why is this hard? Why is this a problem in the first place? A couple of reasons.

First, what we see is that a lot of the issues that occur in data actually come from upstream application changes, from well-intentioned software engineers who don't know how this data is used downstream in data pipelines or business dashboards. These teams are operating independently of each other. Luckily, this is starting to change, and AI is only accelerating that change, with both teams realizing that the dependencies actually run in both directions: applications produce the data that goes downstream into a warehouse, but that data then feeds into models that get used back in those applications. So the appetite for more collaboration is increasing.
Another challenge: our tooling, our observability, traditionally doesn't focus on this. What we would normally see in an observability platform is things like: my Java service isn't throwing errors, so it seems fine. My consumers are up and running. My Postgres database is performing. And yet, under the hood, silent failures are happening. Something in the data contents itself isn't being caught.
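As a minimal sketch of that gap, here is the kind of content-level check that service-level monitoring alone would never fire on. It uses an in-memory SQLite table purely for illustration; the table and column names (orders, unit_price) and the threshold are hypothetical:

```python
# A content-level data check that service health metrics would miss:
# the writing service returns no errors, yet the rows it produces are
# silently missing a critical value.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, unit_price REAL)")
# The upstream service "succeeded" on every insert, but a recent change
# means unit_price is no longer populated.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.99), (2, None), (3, None), (4, None)],
)

# Content-level check: alert if the null rate on a critical column
# exceeds a threshold, regardless of how healthy the service looks.
null_rate = conn.execute(
    "SELECT AVG(CASE WHEN unit_price IS NULL THEN 1.0 ELSE 0.0 END) FROM orders"
).fetchone()[0]

if null_rate > 0.05:
    print(f"Silent failure: {null_rate:.0%} of unit_price values are NULL")
```

The service writing these rows reports no errors at all; only a check on the data itself reveals the problem.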
A third challenge: these systems are just getting more complex. Take this overly simplistic example diagram, where we have one very simple application service feeding down into a data pipeline, batch and streaming, that then goes into models and reports that then go back into applications. In reality, we know each of these hops consists of multiple components, to say the least, that are changing regularly, often daily. And at each step of the way, we have multiple different teams responsible for these technologies.
So, what can we do about it? First, let's start catching data issues that matter. Let's help the teams owning this data go from reacting to angry Slack messages to being proactive, by catching issues and knowing about them before anyone else. But putting manual checks on everything the data should look like doesn't feel feasible. Luckily, machine learning can help us here.
We've actually been working on this for quite a while, and you can see the impact of some of the models we're working on: catching an anomaly in what is actually a critical pricing data point in this example scenario. In this case, I'm pulling the metric in with custom SQL so that Datadog knows about it. But now I'm wondering: okay, but does this matter?
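To illustrate the general idea (this is a simple rolling z-score sketch, not Datadog's actual models), an anomaly like that pricing data point stands out sharply against its recent history; the series below is made up for illustration:

```python
# Flag a daily metric value that deviates sharply from its recent history.
from statistics import mean, stdev

daily_avg_price = [104.2, 103.9, 105.1, 104.7, 104.4, 105.0, 61.3]  # last value looks wrong

window = daily_avg_price[:-1]   # recent history
latest = daily_avg_price[-1]    # today's value
mu, sigma = mean(window), stdev(window)
z = (latest - mu) / sigma

if abs(z) > 3:
    print(f"Anomaly: today's value {latest} is {z:.1f} standard deviations "
          f"from the recent mean {mu:.1f}")
```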
So, to get a bit more context, I flip to a view of lineage, and I see that this dataset is indeed used not only by some business dashboards and some downstream datasets, but also by my RAG application, my vector database. Okay, this matters enough to keep looking at it. But what happened upstream? I'm able to see not just the data lineage, in terms of what data feeds into this data, but also the applications changing the data along the way: in this case, an error in a Spark pipeline.
Drilling further, I'm able to see exactly the various stages of how that data was transformed, and how that Spark application looked and ran for a specific execution. And in this case, all of the really funky things that can go wrong in Spark weren't actually the problem. The problem was simply that the data wasn't being populated upstream.
Looking even further upstream, because Datadog is also observing my application, I see that it's this Kafka stream where a producer introduced a new feature and changed the schema, my consumer didn't know how to process it, and my Spark pipeline didn't get the data. Lots of steps with lots of technologies, but I've just gone from being reactive, from not even knowing about this issue, to knowing about it before anyone else, and I've been able to connect the dots between how this is all happening.
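A minimal sketch of that failure mode, with plain dicts standing in for Kafka messages and hypothetical field names:

```python
# A producer changes its message schema; a consumer written against the
# old schema silently drops the records it can no longer parse.
import json

# Old schema: flat "amount" field the consumer expects.
old_message = json.dumps({"order_id": 1, "amount": 19.99})
# New schema after a producer-side feature: amount moved into a nested object.
new_message = json.dumps({"order_id": 2, "pricing": {"amount": 21.50, "currency": "EUR"}})

def consume(raw: str):
    """Consumer logic written against the old schema."""
    record = json.loads(raw)
    try:
        return {"order_id": record["order_id"], "amount": record["amount"]}
    except KeyError:
        # Record is skipped: the consumer stays "up and running", the
        # service dashboards stay green, but downstream data stops arriving.
        return None

print(consume(old_message))  # {'order_id': 1, 'amount': 19.99}
print(consume(new_message))  # None -> the Spark pipeline never sees this data
```

Every service involved looks healthy: the producer publishes successfully, the consumer keeps running, and the gap only shows up in the data.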
One of the ways Datadog is able to do this is that we're bringing data observability right next to application observability, which is what a lot of our customers already use us for. Another way we're able to do this is by using open source standards like OpenLineage, which we're also contributing to heavily upstream, if you look at the PRs in the OpenLineage project.
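For reference, an OpenLineage run event is essentially a small document describing which job run read and wrote which datasets. A minimal sketch, shown as a plain dict; the namespaces, job and dataset names, runId, and producer URL are hypothetical, and real emitters would typically go through an OpenLineage client library rather than building raw events:

```python
import json
from datetime import datetime, timezone

# Illustrative OpenLineage-style run event: a job run that read one
# dataset and wrote another.
run_event = {
    "eventType": "COMPLETE",  # e.g. START / COMPLETE / FAIL
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "producer": "https://example.com/my-spark-integration",
    "run": {"runId": "0194f2a6-7c3e-7a10-9b2f-2f2b9a6f1d42"},
    "job": {"namespace": "spark", "name": "daily_pricing_aggregation"},
    "inputs": [{"namespace": "kafka", "name": "orders_topic"}],
    "outputs": [{"namespace": "warehouse", "name": "analytics.daily_prices"}],
}

print(json.dumps(run_event, indent=2))
```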
But I think what we see is that, in order to catch these data issues for any team that cares about them, which increasingly is everyone, we need more than a few simple data checks. We need to think about this holistically, and we talk about that as bringing observability across the data lifecycle: catching issues in the data, but tracing them back to where the change actually happened, whether that's in the data pipelines, in our event streams, or in our upstream databases and application services.
Datadog Data Observability is currently in preview. A big thank you to all of our design partners so far. I'm really looking forward to discussing this with those of you I'm able to talk with today here in Berlin. Thank you very much.
While observability practices have evolved in recent years, they have largely focused on application services and infrastructure. Yet it is data that powers our applications, businesses, and AI models. When data issues occur, the consequences can be far-reaching, from poor product experiences to billing errors to misinformed AI outcomes. In this session, Jonathan Morin, Group Product Manager at Datadog, shares real-world examples of incidents and explains how data observability can address them, helping teams detect issues earlier, reduce costly downtime, and restore trust in their data.