Hello, good morning, Berlin.
Hopefully, it's fair to assume that if you're here at a Datadog conference, you're either building applications or helping maintain the reliability of applications. And many of you also use data to make decisions.
In fact, Maxim noted this earlier: you could argue that in this era, where everyone, all your competitors, also has access to those latest models, it's the data that sets you apart. And by that we really mean your knowledge, your expertise, your subject matter expertise, the context that you bring into your offerings, your applications, your services. That is your differentiated advantage. That's what makes your offerings valuable and special. So shouldn't we make sure that data is in a good state?
Let's make this even more concrete with some examples that our customers have shared at recent conferences.

One of those is Ramp. They're an automated expense management platform, but, as they talk about a lot, they also use data as a springboard for innovation. To make that even more specific: one of the offerings they're working on now, which they recently spoke about, uses that expense data, combined with upstream third-party data sources and the latest models, to give their customers pages for tracking their software vendors, where they can look at the latest growth, retention, and adoption of those vendors. It's really an innovative new offering that you can imagine getting some eyeballs. Now imagine putting that offering out there and your end users seeing incorrect data values: your customers' trust in your brand would quickly be lost. So Ramp is working hard to make sure data is clean throughout, and we'll look at what that can look like.
Another example is Mux. They provide APIs for video streaming, another sophisticated team with an innovative offering. But even then, all it took was one bug in an upstream application to make some reports about their users look as if their users had suddenly dropped off, sending a shock through the business that was completely unnecessary. And who can't relate to this? This is not a new problem. You look at a business dashboard, the data doesn't look right, and you either instantly lose trust in that dashboard or, worse, you use the data, make a bad decision, and then your reputation is impacted.
Finally, a third example, from some personal experience of mine. At a previous product offering I worked on before Datadog, we were delivering really critical data to our customers through our SaaS application, data that was used in official government filings and financial reports. So if we weren't really on top of, in that case, our Spark pipelines, we knew about it from our customers. I saw in that case just how important, how critical, this data is when it impacts a business's reputation or a human being's reputation. I started to take it a lot more seriously.
But why is this hard? Why is this a problem in the first place? A couple of reasons.

First, what we see is that a lot of the issues that occur in data actually come from upstream application changes, from well-intentioned software engineers who don't know how this data is used downstream in data pipelines or business dashboards. These teams are operating independently of each other. Luckily, this is starting to change, and AI is only accelerating that change, with both teams realizing that the dependencies actually run in both directions: applications produce the data that goes downstream into a warehouse, but that data then feeds into models that get used back in those applications. So the appetite for more collaboration is increasing.
Another challenge: our tooling, our observability, traditionally doesn't focus on this. What we would normally see in an observability platform is things like: my Java service isn't throwing errors, so it seems fine. My consumers are up and running. My Postgres database is performing. And yet, under the hood, silent failures are happening. Something in the data contents itself isn't being caught.
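As a minimal sketch of that gap, here is the kind of content-level check that service-level monitoring alone would never fire on. It uses an in-memory SQLite table purely for illustration; the table and column names (orders, unit_price) and the threshold are hypothetical:

```python
# A content-level data check that service health metrics would miss:
# the writing service returns no errors, yet the rows it produces are
# silently missing a critical value.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, unit_price REAL)")
# The upstream service "succeeded" on every insert, but a recent change
# means unit_price is no longer populated.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.99), (2, None), (3, None), (4, None)],
)

# Content-level check: alert if the null rate on a critical column
# exceeds a threshold, regardless of how healthy the service looks.
null_rate = conn.execute(
    "SELECT AVG(CASE WHEN unit_price IS NULL THEN 1.0 ELSE 0.0 END) FROM orders"
).fetchone()[0]

if null_rate > 0.05:
    print(f"Silent failure: {null_rate:.0%} of unit_price values are NULL")
```

The service writing these rows reports no errors at all; only a check on the data itself reveals the problem.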
A third challenge: these systems are just getting more complex. Take this overly simplistic example diagram, where we have one very simple application service feeding down into a data pipeline, batch and streaming, that then goes into models and reports that then go back into applications. In reality, we know each of these hops consists of multiple components, to say the least, that are changing regularly, often daily. And at each step of the way, we have multiple different teams responsible for these technologies.
So, what can we do about it? First, let's start catching data issues that matter. Let's help the teams owning this data go from reacting to angry Slack messages to being proactive, by catching issues and knowing about them before anyone else. But putting manual checks on everything the data should look like doesn't feel feasible. Luckily, machine learning can help us here.
We've actually been working on this for quite a while, and you can see the impact of some of the models we're working on: catching an anomaly in what is actually a critical pricing data point in this example scenario. In this case, I'm pulling the metric in with custom SQL so that Datadog knows about it. But now I'm wondering: okay, but does this matter?
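To illustrate the general idea (this is a simple rolling z-score sketch, not Datadog's actual models), an anomaly like that pricing data point stands out sharply against its recent history; the series below is made up for illustration:

```python
# Flag a daily metric value that deviates sharply from its recent history.
from statistics import mean, stdev

daily_avg_price = [104.2, 103.9, 105.1, 104.7, 104.4, 105.0, 61.3]  # last value looks wrong

window = daily_avg_price[:-1]   # recent history
latest = daily_avg_price[-1]    # today's value
mu, sigma = mean(window), stdev(window)
z = (latest - mu) / sigma

if abs(z) > 3:
    print(f"Anomaly: today's value {latest} is {z:.1f} standard deviations "
          f"from the recent mean {mu:.1f}")
```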
So, to get a bit more context, I flip to a view of lineage, and I see that this dataset is indeed used not only by some business dashboards and some downstream datasets, but also by my RAG application, my vector database. Okay, this matters enough to keep looking at it. But what happened upstream? I'm able to see not just the data lineage, in terms of what data feeds into this data, but also the applications changing the data along the way: in this case, an error in a Spark pipeline.
Drilling further, I'm able to see exactly the various stages of how that data was transformed, and how that Spark application looked and ran for a specific execution. And in this case, all of the really funky things that can go wrong in Spark weren't actually the problem. The problem was simply that the data wasn't being populated upstream.
Looking even further upstream, because Datadog is also observing my application, I see that it's this Kafka stream where a producer introduced a new feature and changed the schema, my consumer didn't know how to process it, and my Spark pipeline didn't get the data. Lots of steps with lots of technologies, but I've just gone from being reactive, from not even knowing about this issue, to knowing about it before anyone else, and I've been able to connect the dots between how this is all happening.
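A minimal sketch of that failure mode, with plain dicts standing in for Kafka messages and hypothetical field names:

```python
# A producer changes its message schema; a consumer written against the
# old schema silently drops the records it can no longer parse.
import json

# Old schema: flat "amount" field the consumer expects.
old_message = json.dumps({"order_id": 1, "amount": 19.99})
# New schema after a producer-side feature: amount moved into a nested object.
new_message = json.dumps({"order_id": 2, "pricing": {"amount": 21.50, "currency": "EUR"}})

def consume(raw: str):
    """Consumer logic written against the old schema."""
    record = json.loads(raw)
    try:
        return {"order_id": record["order_id"], "amount": record["amount"]}
    except KeyError:
        # Record is skipped: the consumer stays "up and running", the
        # service dashboards stay green, but downstream data stops arriving.
        return None

print(consume(old_message))  # {'order_id': 1, 'amount': 19.99}
print(consume(new_message))  # None -> the Spark pipeline never sees this data
```

Every service involved looks healthy: the producer publishes successfully, the consumer keeps running, and the gap only shows up in the data.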
One of the ways Datadog is able to do this is that we're bringing data observability right next to application observability, which is what a lot of our customers already use us for. Another way we're able to do this is by using open source standards like OpenLineage, which we're also contributing to heavily upstream, if you look at the PRs in the OpenLineage project.
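For reference, an OpenLineage run event is essentially a small document describing which job run read and wrote which datasets. A minimal sketch, shown as a plain dict; the namespaces, job and dataset names, runId, and producer URL are hypothetical, and real emitters would typically go through an OpenLineage client library rather than building raw events:

```python
import json
from datetime import datetime, timezone

# Illustrative OpenLineage-style run event: a job run that read one
# dataset and wrote another.
run_event = {
    "eventType": "COMPLETE",  # e.g. START / COMPLETE / FAIL
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "producer": "https://example.com/my-spark-integration",
    "run": {"runId": "0194f2a6-7c3e-7a10-9b2f-2f2b9a6f1d42"},
    "job": {"namespace": "spark", "name": "daily_pricing_aggregation"},
    "inputs": [{"namespace": "kafka", "name": "orders_topic"}],
    "outputs": [{"namespace": "warehouse", "name": "analytics.daily_prices"}],
}

print(json.dumps(run_event, indent=2))
```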
But I think what we see is that, in order to catch these data issues for any team that cares about them, which increasingly is everyone, we need more than a few simple data checks. We need to think about this holistically, and we talk about that as bringing observability across the data lifecycle: catching issues in the data, but tracing them back to where the change actually happened, whether that's in the data pipelines, in our event streams, or in our upstream databases and application services.
Datadog Data Observability is currently in preview. A big thank you to all of our design partners so far. I'm really looking forward to discussing this with those of you I'm able to talk with today here in Berlin. Thank you very much.
While observability practices have evolved in recent years, they have largely focused on application services and infrastructure. Yet it is data that powers our applications, businesses, and AI models. When data issues occur, the consequences can be far-reaching, from poor product experiences to billing errors to misinformed AI outcomes. In this session, Jonathan Morin, Group Product Manager at Datadog, shares real-world examples of incidents and explains how data observability can address them, helping teams detect issues earlier, reduce costly downtime, and restore trust in their data.