What's up, everybody?
I'm Ned Bellavance, and welcome to a special sponsored
video about the future of Observability.
A few weeks ago, I attended Kubecon in Atlanta,
and while the focus was on Kubernetes, like the name implies, there was a lot
of buzz and activity around Observability.
The day zero Observability Day event seemed to be the best attended
of the pre-conference activities with two separate tracks and rooms.
There were dozens of talks about observability-related topics,
and the Expo floor had a large concentration of vendors
specifically aimed at observability.
I guess that shouldn't be a surprise.
As applications have become more complicated and multi-tier,
trying to monitor and track their status has also become more complicated.
Simply firing up Nagios and setting some thresholds is not nearly enough.
You need tracing, correlation, fault detection, and application integration.
The explosion of AI on the scene only makes this whole thing more fraught.
But at the same time, it also might solve some of the problems.
To dig deeper into the world of observability,
I sat down with Drew Flowers and Jacob Yackenovich from IBM Instana Observability.
We talked about the challenges of application performance monitoring,
how AI is changing what observability means and what Instana is doing to embrace
AI while keeping humans in the loop.
We're here to talk about observability, and I can clearly see the two of you.
And we can see you.
We're already ahead of the game.
We're already observing everything.
I want to start with you, Drew, and find out how you got
into observability in the first place, because it's not something you go
to school for and you're like, When I get older, I want to be an observationalist.
I want to hang out with charts and graphs all day long.
Really, it boils down to I spent 17 years in various data center jobs,
engineering at first, then architecture.
And in the last job that I had, when I joined the company,
their monitoring situation was in what you would call a bit of a state.
We had just a basic SolarWinds operation, but it was designed as a private cloud.
So we were delivering SaaS operations to utility companies,
but simply monitoring these data centers, mostly from a purely hardware standpoint.
What is my ESXi host telling me?
What is my VM telling me?
What is my network, my storage telling me?
But the problem is that fundamentally, when you're delivering a software as
a service platform, it really isn't the hardware that truly
matters at the end of the day.
It's the application.
How is that behaving?
How is that interacting with the customers?
And that was a complete gap in our operation.
Okay, so you could see something's wrong with this host maybe,
or this VM is X amount utilized, but that didn't mean anything
from an application standpoint.
You couldn't connect the two dots.
It didn't mean anything when a customer calls me and says, I'm trying to run this
report against my metering system, and it's not responding.
Okay.
So that was the fundamental problem we had to solve at first.
And the only way to really do that is through a more application-centric
approach, which we didn't have. Okay.
So of course, me being the squeaky wheel in the room that says this isn't working,
what do you think happens?
You speak up, now it's your problem.
It's your job now.
So I got assigned the task of how do we fix this?
And at least when it came to Instana and Observability, that's where
my journey with this company truly began.
I had a POC for every tool you can imagine, from AppDynamics
to New Relic to Datadog to Dynatrace.
And what we were discovering was that either the product put too much burden
on our developer team to be able to implement,
or if it was automatic, like, say, Dynatrace is,
it was just too expensive of a solution for us to justify because we weren't
the largest operation under the sun.
And that's how I wound up here, through an advertisement,
believe it or not, on Reddit.
It actually worked.
It was this little robot that said, Instana, solve all your problems,
automatic application monitoring.
And at my wits end, I took a demo call.
And as I sat there and listened to the pitch, I saw a lot
of things that just made sense. It clicked.
What they were trying to pitch me didn't seem to have any marketing fluff to it.
Okay.
I want to back up to something you said about the different potential
observability tools that were out there.
And in my experience, one of the main things a lot of them asked
you to do was have your developers go and update their code in some way to add
additional instrumentation so this service, whatever it was,
could collect that information and make those correlations.
Is that what you were indicating?
Was that an issue with some of the products you looked at?
Oh absolutely.
I mean, at the time, there was a little bit of development towards automation.
There were only a couple of companies.
At that point, it was really your Instana and your Dynatrace that were
the only two that I could find.
But one was very cost prohibitive because they're on the bleeding edge.
But when you looked at the other products, they were still reliant on a lot
of manual instrumentation.
They may provide you some libraries, but those libraries had
to be injected into the code.
On top of that, anything that you were doing custom, now you're talking about
entry and exit spans, instrumented by hand.
Our development staff just said this was too much.
We need to focus on building product.
So it was unviable in our architecture.
And then you take the complication
of Kubernetes and Docker, where now it's not just about the manual instrumentation.
It's also about the fact that you have all of these native agents
that are designed per language.
And because of that, now you're talking about, I need to put
the Java agent in the Java container, the .NET agent in the .NET
container, on and on and on. I see.
You're in more of a polyglot situation.
You have all these different code bases, all that need to be instrumented
slightly differently.
That's going to be really difficult for your development team to manage.
And if you're bringing in stuff from other development teams or outside,
that's not going to be instrumented at all.
I mean, even abstracted from the development team as
the operations guy, guess who has to manage all those
agents in each individual container?
Well, you did raise your hand.
Okay, so that all makes sense to me, your journey, how you got
from there to here today.
I personally don't know a lot about Instana, and my assumption when I started
doing some reading was, you're going to need to add some stuff
to the software or install an agent on every machine or
something along those lines.
But let me back up and ask you, what was involved in actually getting
Instana up and running in that POC?
So that's the funny thing.
If you go around our LinkedIn, and I think it might even be on our
YouTube, shortly after the acquisition, I made a video that literally said,
How long does it take to install Instana?
Because we get this question all the time.
You guys advertise on ease of use.
You advertise on automation.
How automatic is this thing?
There's a video out there that shows me end-to-end doing the whole thing
in one minute and 27 seconds.
Okay, so that's pretty fast.
What are you doing in that roughly 90 seconds?
Copy and pasting a curl command and getting a drink.
And it's the drink that takes the longest. Exactly.
Because I'm just wasting time while I'm letting the script do its thing.
And that's really the nature of it.
You're talking about an architectural change, where before,
what we refer to as a sensor, which is a component of our agent,
it's an automatically deployed thing.
And we chose the term sensor very explicitly, because when you talk about
language agents, that denotes manual effort.
You need to put this agent in place.
A sensor, by definition, is a bespoke piece of technology that is
designed to autonomously complete a job.
Okay.
So in our structure, you have the agent framework that exists
in Kubernetes as a DaemonSet, or as a process on a host.
And as that agent is running scans of the host, it's looking at process
trees, it's looking at the file system, it's identifying what is
actively running on this system.
As it identifies these technologies, it's dynamically deploying these sensors
that attach to each process individually.
So the automation is taking that former manual labor that you used to have to do,
and it's eliminating it through automation.
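To make the discovery step Drew describes a little more concrete, here is a minimal Java sketch of scanning a host's process table and deciding which sensor to attach. The technology matching and sensor names are invented for illustration and are not Instana's actual logic.

```java
import java.util.Locale;

// A toy discovery pass over the host's process table, in the spirit of the
// agent scan described above. The technology matching is a naive substring
// check on the command line, purely illustrative and not Instana's logic.
public class DiscoverySketch {
    public static void main(String[] args) {
        ProcessHandle.allProcesses()
                .filter(ph -> ph.info().command().isPresent())
                .forEach(ph -> {
                    String cmd = ph.info().commandLine()
                            .orElse(ph.info().command().get())
                            .toLowerCase(Locale.ROOT);
                    String sensor = cmd.contains("java") ? "jvm-sensor"
                            : cmd.contains("postgres") ? "postgresql-sensor"
                            : cmd.contains("nginx") ? "nginx-sensor"
                            : null;
                    if (sensor != null) {
                        // A real agent would attach the matching sensor to the
                        // process here; this sketch only prints the decision.
                        System.out.printf("pid %d -> deploy %s%n", ph.pid(), sensor);
                    }
                });
    }
}
```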
And it's a custom agent for each type of process.
So this one's running a JVM, or is it a different sensor?
The sensor has its own thing.
So you have the agent, and then say in the example you use, you have a Java system.
It's a zero touch implementation for Java tracing.
And this is what blew my mind, because our old app stack in my last
job was not a microservice architecture.
It was an app box with 37 JVMs and then a database server.
Very typical of the time.
But because of the nature of the automation,
there are two APIs under Java that it uses, JDK attach and JDK instrument.
Because of that, the sensor is able to hook the JVM,
grab the metrics, interact with the bytecode,
inject the instrumentation, and it was doing this without
requiring me to restart the JVM.
That was the moment I became sold because when I installed this on my app box,
within two, three minutes, my dashboards are now populating
with traces from all 37 Java services, completely automatically
with zero effort from myself.
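For readers curious about the zero-restart part, here is a minimal sketch of the standard JDK Attach API the conversation references. It is a generic illustration of attaching to a running JVM and loading an instrumentation agent, not Instana's sensor code, and the agent jar path is a placeholder.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;

// Requires a JDK (the jdk.attach module). Pass the target JVM's PID as args[0].
public class AttachSketch {
    public static void main(String[] args) throws Exception {
        // List the JVMs currently running on this host.
        for (VirtualMachineDescriptor vmd : VirtualMachine.list()) {
            System.out.println(vmd.id() + "  " + vmd.displayName());
        }

        // Attach to a running JVM and load an instrumentation agent jar,
        // with no restart of the target process. The agent's agentmain()
        // then receives an Instrumentation handle and can retransform
        // already-loaded classes to inject tracing.
        VirtualMachine vm = VirtualMachine.attach(args[0]);
        try {
            vm.loadAgent("/path/to/tracing-agent.jar"); // placeholder path
        } finally {
            vm.detach();
        }
    }
}
```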
I want to go back to where we were in the deployment process.
So I have this curl command.
It's setting up an agent, and then that agent is deploying
sensors based off of processes.
What happens then?
Sensors report data back to the agent.
The agent is more or less a perpetual Kafka stream to the back-end.
So when you think about agents, one of the major problems
in observability, especially, is that they're taxing on
the operation of the VM.
They require more resources.
They especially need more RAM. Yes.
This is a byproduct of, effectively, agents that spool data in memory
and then fire it off in bulk. Okay.
We switched this around to be a Kafka stream.
And within our laboratory test environments,
the way that this works is that when a sensor reports data back to the agent,
the time it takes for that data to go from the agent to the back-end and be
visualized, in ideal network conditions, is usually about three seconds.
Okay.
So not only are you getting much faster telemetry than you typically
would in a standard bulk send environment, but the flexibility that provides allows
our agent to be extremely lightweight while collecting 100% of every trace.
We do not sample it at the agent or the back-end level,
and we collect metrics every single second, so it's near real time without
significantly impacting the host. Okay.
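As a rough illustration of the streaming pattern described here, the sketch below sends each metric sample to a Kafka topic as soon as it is collected, instead of buffering in memory and flushing in bulk. The broker address, topic name, and payload format are assumptions for the example, and it uses the standard Apache Kafka Java client rather than anything Instana-specific.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Stream each one-second metric sample as soon as it is collected, instead of
// buffering samples in memory and flushing them in bulk.
public class MetricStreamer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "backend.example.com:9092"); // placeholder endpoint
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                String sample = "{\"metric\":\"jvm.heap.used\",\"value\":" + (100 + i) + "}";
                // Fire-and-forget send keeps almost nothing resident in memory.
                producer.send(new ProducerRecord<>("host-metrics", "host-01", sample));
                Thread.sleep(1000); // one-second collection interval
            }
        }
    }
}
```

The trade-off being described is memory pressure versus send frequency: many small sends keep the agent light, at the cost of a steady stream to the back-end.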
And what do you get at the end?
So I've instrumented it.
I've deployed the agent, I got the sensors, it's streaming
the stuff to the back-end.
What am I seeing How is it all getting connected together to give me
observability into my applications and environment?
Well, that automation that I talk about is not purely about taking away the manual
effort that's required to deploy things.
There is actually a piece of technology under the hood that is core to our data
model, and we call it the dynamic graph. Okay.
The idea is that the agents in their automation,
yes, they're helping you day to day deal with the maintenance of the tool,
but because they're running that discovery in perpetuity,
every single change that occurs in the environment is reported to our back-end.
And in real-time, the back-end is dynamically mapping everything.
It's not just about application dependency maps and how this service
connects to that service.
It's about, underneath that, the host and the agent installed on top of it,
the containers that are operating on the host, the process in the container,
the technology in the process, the service the technology serves out,
and the endpoints that are being served by the service.
So Instana has, in perpetuity, a real-time map of the entire environment
from infrastructure to the edge. Okay.
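To picture the layered mapping Drew lays out, here is a toy Java sketch of a dependency graph that chains host, container, process, technology, service, and endpoint. The layer names and example entities are invented for illustration; this is not Instana's dynamic graph schema.

```java
import java.util.ArrayList;
import java.util.List;

// A toy layered dependency graph: each entity keeps a list of the entities it
// hosts, so a change at any layer can be walked from infrastructure to edge.
public class DynamicGraphSketch {
    enum Layer { HOST, CONTAINER, PROCESS, TECHNOLOGY, SERVICE, ENDPOINT }

    static class Node {
        final Layer layer;
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(Layer layer, String name) {
            this.layer = layer;
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return child;
        }
    }

    public static void main(String[] args) {
        Node host = new Node(Layer.HOST, "node-01");
        Node container = host.add(new Node(Layer.CONTAINER, "orders-pod"));
        Node process = container.add(new Node(Layer.PROCESS, "java (pid 4711)"));
        Node tech = process.add(new Node(Layer.TECHNOLOGY, "Spring Boot"));
        Node service = tech.add(new Node(Layer.SERVICE, "orders-service"));
        service.add(new Node(Layer.ENDPOINT, "POST /orders"));
        print(host, 0);
    }

    static void print(Node node, int depth) {
        System.out.println("  ".repeat(depth) + node.layer + ": " + node.name);
        node.children.forEach(child -> print(child, depth + 1));
    }
}
```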
And that's really the key.
When you take that mapping, it allows us to apply artificial
intelligence and machine learning in a variety of ways for root cause analytics.
But it also gives us the capability of dynamically filtering dashboards
that we provide to you out of the box.
Because from an observability standpoint, a lot of vendors very much
focus on custom dashboards.
Datadog, for example, beautiful dashboards, right?
But guess what? You got to set all that up yourself.
Right.
And you have to maintain all of that yourself.
Say you go in and you change a namespace or a pod name, that can
break your entire dashboard.
Within Instana, we actually flip this around.
We still allow customization, but we ask organizations,
where does that customization make sense?
And instead, we want you to consume telemetry data.
So everything from end-user monitoring to infrastructure to application
dependency maps and API mapping is provided completely out of the box.
And one of the core differences between us and a lot of the other products
in the space is that we can genuinely begin solving problems within the first 10
minutes of an agent installation because these dashboards are auto-populated.
They're delivered to you, and it's a consumption philosophy.
Consume telemetry.
Don't build it.
So within our systems, we're applying everything from base
machine learning statistical models to agentic AI, to causal AI, to generative
AI, all the way down through the stack.
But when we're applying these against our data, we're trying to answer
a question that, honestly, I don't feel enough people are asking,
both customers and other organizations providing these services:
Where is the value?
Oftentimes, there's this race to just simply say, We have AI.
But no one is standing back and saying, What is this really doing
for me on a day-to-day basis?
How is this changing my life?
Jacob, you and I were talking about that earlier, before we had a chance
to start filming, about finding the value of AI, because this happened in haste.
Yes.
The AI boom happened, ChatGPT happened, and suddenly every company executive was
demanding, Well, we need to add AI to our products immediately.
And so we did.
And I was like, Okay, we spent a lot of money on that project.
Was any of it worth it?
It's important to find out if the AI juice is worth the squeeze.
And what we're seeing are predominantly three primary approaches to bringing
agentic large language models, or generally useful artificial intelligence,
into the context of an application.
The first is what we call an add-on style, where you've got an application today
and you want to add a chat experience to your application.
At the end of it, the user who is using that application can still perform
the regular workload of the given application. If they use
the chat experience, great. It's an add-on.
But they can continue to function and get the value the company is
trying to deliver as they normally would, which is why we see it as an add-on.
Whipped cream on top of your ice cream.
It's great if it's there.
It's fine. No big deal.
That's an add-on style.
The next predominant approach that we see our customers use is another
one called a blocking style.
Now, this one's a little different.
This is where as part of the overall application workflow,
I am injecting an agent-based or a large language model solution into the work
stream of the given application or service.
Think of a clearinghouse for checking to see if there's fraudulent activity
for a given transaction of an application.
Now, you cannot complete the application workflow without going through a large
language model or an agentic-based solution to do so.
Right.
If you see that particular, what is effectively a microservice that is
generating content, start to hallucinate, show bias,
or be erroneous in the content that it's generating,
now you're interrupting the workflow of the overall application.
So it's not about latency, error rate, performance or throughput anymore as one
of the initial sets of golden signals.
Now it's about wrong.
If the large language model or the agents that are producing that content are wrong,
that is a new golden signal, the new representation of being down.
For example, let's just
make up a hypothetical example.
You want to book a flight from, say, London to Madrid.
And as part of that response, you see a recipe for pineapple chocolate
chip cookies as the response to that.
It might sound interesting to taste, but that's not...
Regardless, that's not why a particular user went to that application.
They went to buy a flight. Right.
So if that particular system is now generating content that is erroneous
or faulty in some way, it is wrong.
And the user, what are you going to do if a flight reservation system gives you
a recipe for a weird tasting food?
You're going to go to a different site.
I'm going to go to a different company.
So maybe if I need weird recipes, I'll come back.
But generally speaking, I'm going to go someplace else.
Yeah, I'm done. Exactly.
So understanding that we have a brand new technology involved in the overall
application, understanding what's going on,
what's being generated, what the prompts are,
what the evaluations of those prompts are, is a new era of observability:
understanding, as users are taking advantage of these applications,
what's going on inside from that particular point of view.
The third area is another one that's more of an emerging characteristic where
the application is an agent and the agent is an application, and there's no
discerning differentiation whatsoever.
That's more of a four or five-ish year timeline,
but we're predominantly seeing the first two characteristics,
either an add-on style or a blocking style, in the uses of artificial
intelligence in our customers' applications.
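To make the "wrong is the new down" idea from the blocking style concrete, here is a minimal Java sketch that treats an off-task response as a failure signal alongside the classic golden signals. The keyword heuristic and the flight-booking strings are deliberately simplistic assumptions; a real evaluation would use dedicated evaluators or a judge model.

```java
import java.util.List;

// Treat an off-task answer as a failure signal alongside latency and errors.
// The keyword heuristic is deliberately simplistic; real systems would use
// dedicated evaluators or a judge model.
public class ResponseEvaluator {
    private static final List<String> TASK_TERMS =
            List.of("flight", "departure", "arrival", "fare", "booking");

    static boolean isOnTask(String response) {
        String lower = response.toLowerCase();
        return TASK_TERMS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        String good = "Found 3 flights from London to Madrid, fares from 89 EUR.";
        String bad = "Here is a recipe for pineapple chocolate chip cookies.";
        for (String response : List.of(good, bad)) {
            // Emit a pass/fail verdict that a back-end could aggregate the
            // same way it aggregates an error rate.
            System.out.println((isOnTask(response) ? "OK    " : "WRONG ") + response);
        }
    }
}
```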
Yeah, especially in that initial rush, it was easy to bolt on a chatbot,
which is like what everybody did, and everybody hated the new chatbot.
Not all the time.
So there was one time that I was trying to do something on a website, and the
chatbot was like, Oh, can I help you?
And I was like, Actually, you might be able to.
And I was like, How do I return this item?
And it was like, Oh, here's our page for returning the item.
I was like, A chatbot was helpful?
Stop the presses.
Running a hack.
I love the new AI revolution. All right.
You showed me the returns.
But I mean, generally speaking, that was not great.
But the second use case takes a little longer to implement,
but makes a lot of sense. It does.
Especially if you're doing fraud detection or something else that requires
a little more context and processing.
But yeah, if it's wrong, and it's being consistently wrong,
that means your application is down. Correct.
Exactly. Correct.
And the way to grab that and understand it from the
perspective of the application,
it's not through traditional metrics, events, logs, and telemetry data.
It's getting into what's being generated inside the agent or inside the large
language model, understanding what's being prompted in there,
what are the evaluations coming back, and understanding exactly what's
happening inside those assessments.
The problem that we're seeing now is it's great to provide that information,
but if you've got millions of application transactions
on a given day, that's dozens of millions of prompts and responses, and human
beings can't look through all of those.
You might be able to peck at some of them, but in aggregate, you need a computer
to help you figure that out.
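One way to capture that prompt-and-response detail is to attach it to the same traces the rest of the stack already emits. The sketch below uses the OpenTelemetry Java API to wrap a model call in a span and record the prompt, the response, and an evaluation verdict as attributes. The attribute names and the generate() stand-in are assumptions for illustration, not a prescribed convention or Instana's implementation.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

// Wrap a model call in a span and attach the prompt, the response, and an
// evaluation verdict as attributes so they travel with the rest of the trace.
// Without an SDK configured, GlobalOpenTelemetry falls back to a no-op.
public class LlmSpanSketch {
    private static final Tracer TRACER =
            GlobalOpenTelemetry.getTracer("llm-observability-sketch");

    // Stand-in for the real model call.
    static String generate(String prompt) {
        return "Found 3 flights from London to Madrid.";
    }

    public static void main(String[] args) {
        String prompt = "Book a flight from London to Madrid";
        Span span = TRACER.spanBuilder("llm.completion").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            String response = generate(prompt);
            span.setAttribute("llm.prompt", prompt);           // attribute names are
            span.setAttribute("llm.response", response);       // illustrative only
            span.setAttribute("llm.evaluation.on_task", true); // verdict from an evaluator
        } finally {
            span.end();
        }
    }
}
```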
Are you saying the solution to my AI problem is AI?
A set of analytics. Let's go that way.
Fair.
That's for monitoring the AI that other folks are using.
Correct.
Are you also leveraging AI more in the existing products to assist SREs?
Yes.
One of the things that we found often with Instana is it is so
comprehensive in the representation of what is going on and why systems are
behaving the way that they're behaving.
When operators, site reliability engineers, developers,
IT operations teams, executives, want to understand
what's going on and why, that can be a little overwhelming in terms
of understanding exactly the answer to that question.
And when thousands of dollars per minute are on the line,
you need to answer that question as definitively as you possibly can with the
highest degree of confidence behind it.
So what we've done is define a set of algorithms and analytics,
including artificial intelligence and generative AI and several different
large language models in an agentic-based solution.
So this isn't just one ring to rule them all.
This is the right type of analytic for the right type of data for the right
purpose. And the outcome that we're trying to accomplish is to look through all
of our insight and then showcase it.
Here is something that doesn't smell right.
Here is an issue or here is an incident that Instana has detected based on its
analysis of metrics, events, log data or telemetry data or
a combination of any of them, and then showcase to us as people,
Here's where you need to pay attention.
Here's the evidence that I found and gathered.
Here's the probable cause of the issue that I've identified.
Here's the explainability.
Here is the reasoning.
Hit a button to summarize it so that you can see it in plain English.
And then once you agree with that particular hypothesis,
here are the suggested actions to take.
And by the way, if you need help or if you want to say, for example,
I agree with the initial hypothesis, but I want to ask some more questions.
Then it makes sense to introduce a dialog where you can interact with Instana
through chat experiences as well.
Let me continue the conversation from the preliminary hypothesis.
That's a differentiator, in my opinion, because there's a couple of competitors
out there that think that you should just sit in front of a dialog box in an open
text and say, Do I have a problem? Question mark.
Yes, you have a problem. That's a toy.
You don't have time for toys in operations.
You're working on tools to be able to solve what's going
on because seconds matter.
So this is more to continue the conversation with Instana's data set
in English and decide whether or not you agree with the preliminary hypothesis
that's being formed by asking a series of follow-on questions.
But we're giving you our homework first and allowing you from there to ask
additional probing questions.
And then once you're finally in agreement with it, the last but certainly not least
thing that we do is, here's the set of actions that we suggest that you do.
Sometimes there aren't any.
And in those circumstances, especially if it's early hours or off-duty
time, when you're an operator or in site reliability engineering,
you're trying to figure out what action to take. Or I might remember what to do,
but not exactly the particular sequence or steps to do it.
We have another layer of artificial intelligence there to say,
help me generate an action, be it a knowledge base article or
a Python script or an Ansible playbook, so that I can take
the context of the incident, put it into a knowledge base article or
take those individual knowledge base article steps and turn them into Ansible
playbooks or Python scripts, where as an expert, I'm utilizing
the technology that says, yes, correct.
I can look through the sequence of the steps that Instana
is helping me generate. As the expert, I see exactly that.
I wouldn't have remembered it during off-duty hours or three
o'clock in the morning.
Sleep inertia is a very real thing. To be able to go from
sleeping state to waking state,
it's about 20-some minutes,
if you're lucky, depending on how you went to sleep the night before.
But the idea is I'm going to help you accelerate and look
at that information and say, yes, I see exactly the steps that I wouldn't
have remembered to do in that particular sequence, but I can take my args, parms,
and keys that I know, put them into the script that was
generated, and my confidence is higher that what we're about to go do
is going to rectify the situation because as the expert, this helped me figure out
exactly the steps I needed to take to rectify the situation
as quickly as possible.
So it's very much a human
still in the loop as part of the whole process.
Absolutely.
Much akin to the way a mathematician uses a calculator.
This is going to help you accelerate the math that you're trying to come up with.
But in this case, the math is the operational state
of this very sophisticated environment that's your business.
It's also touching on something that, honestly, being somebody who's out
in the field interacting with our customers pretty much every day,
it touches on a real concern that especially SREs have.
When you bring AI into the conversation, oftentimes you get the question of,
Is this thing designed to replace me?
There's a certain level of trust that has to be built into an AI system.
And one of the aspects that I think we've really done from, in my opinion,
just more of an ethical standpoint is, as Jacob said, we're very adamant
about keeping the human in the loop.
These are not systems that are designed to replace a human being.
They're designed to amplify their ability to solve problems, whether that's through
automatic generation of scripts or pointing them in the right
direction faster.
And when you look around the space, especially when you talk to various
business leaders, that's not a philosophy that's necessarily
shared across the board.
But when you're trying to engage an engineering team,
when you're trying to get them on board with your philosophy behind observability,
that's really the key.
They want to know that this thing is not going to put them out of a job.
And realistically, we shouldn't be putting human beings out of the job because
these systems, depending on which research you look at, even now with the latest
models, you're looking at anywhere between 3 and 37% of hallucinations being generated.
So these systems, no matter how good they're getting,
they're still not 100% trustworthy.
And that requires that human being to validate the data.
And for us, that's a big reason why I'm very proud of the fact that anything we do
with AI, if you hit the investigate for me button with our agentic systems, it
doesn't just tell you, here's the problem.
It says, here's the problem, and here is exactly how I came to this conclusion.
It's about transparency.
And by creating that transparency, we have a much easier time coming
to market with our customers and building trust because they see the reasoning,
they see the factuality, they see that it's not inherently
designed to put them out of a job.
And that helps us accelerate our trust building exercise with these companies.
And as part of that trust, it extends into an understanding
of explainability and details in our user experience so that people know
whether or not they're looking at fact that's coming from the observed
environment or whether or not it was generated content from our analytics.
So throughout our experience, when there are generative AI capabilities,
it shows this is generated by generative AI.
Or when we talk about that we've identified probable root cause,
probable being the keyword there.
We don't know for sure.
We're not going to get ahead of our skis. We want a human being to verify this as
an initial hypothesis, but here's all the reasoning
and explainability, as Drew mentioned, to help the user get to the conclusion.
Sure, if you want to sift through the entire data set that's
throughout Instana, go ahead.
But we're saying this is a rapid accelerant to being able to get
to that initial hypothesis.
And if you agree with it, take the next steps so much faster than it
would be to try to diagnose and get everybody in the war room pointing
sideways at everybody else.
You know, not my problem, somebody else's. Instead, get to that central
understanding of what's happening and why.
The keyword is transparency to me.
Being able to know where it got this information from, why it came up with this
line of reasoning, and you being able to investigate it.
Because ultimately, at the end of the day,
if the system is down, if the application is not responding,
it doesn't matter if you or the Instana AI agent made the change,
you're the one who gets blamed for it.
You're the one on the hook and responsible for the health
and the well-being of the system.
You can't just blame it on AI and throw up your hands.
At the end of the day, it doesn't matter what automation tool
you're using, there is still a human being that will be held accountable.
Automation is amazing, and AI can definitely help.
But I love how Drew and Jacob stressed the importance of people
as part of the solution.
And speaking of people, big thanks to both Drew and Jacob
for being excellent humans and taking the time out of their busy
schedule to chat with me.
Also, thanks to IBM for sponsoring this video.
It's partnerships like this that allow me to keep producing quality content for you.
And of course, thanks to you for watching and sharing your thoughts.
If you enjoyed the video, please give it a like and a subscribe
if you think I've earned it.
Until next time, stay healthy, stay safe out there, and be
kind to your fellow humans. Bye for now.
AI is changing everything and observability is no exception. Join me for a deep dive into modern observability with Drew and Jacob from IBM Instana.

We kick things off with Drew's journey from traditional data center engineering to discovering Instana and transforming how his team understood their applications. From monitoring gaps to Kubernetes complexity, he shares the real-world problems that pushed him toward automated, application-centric observability.

The conversation then unpacks how Instana actually works: lightweight agents, automatic sensors, zero-touch Java instrumentation, real-time dependency mapping, and strict data-privacy design born from European compliance standards. We explore on-prem vs SaaS deployments, air-gapped environments, and Instana's "Dynamic Graph," which continuously maps your entire stack from infrastructure to services.

Finally, we dive into where observability is heading in the age of AI. Jacob breaks down how companies are embedding AI into applications, what "wrong" outputs mean for system reliability, and why new telemetry signals, like prompts, drift, and hallucinations, are critical to monitor. The team also shares how Instana uses AI internally to accelerate root-cause analysis while keeping humans firmly in the loop. It's a fast, insightful look at the future of keeping complex systems running smoothly.

Timestamps:
00:00 Intro
01:40 How Drew got into observability
03:43 Discovering Instana
07:42 How Instana Works
10:52 Security and Data Privacy
15:17 To SaaS or not to SaaS?
19:26 AI in Observability
26:37 Observing AI Prompts
29:22 Using AI inside Instana
34:03 Human-in-the-Loop
37:59 Final Thoughts