My name is Yossi Weizman. I'm a principal security researcher at Microsoft. Michael Kachinsky is my partner in this research. Unfortunately, Michael couldn't attend, so I will represent both of us.
And this is the agenda for today. We will talk about default misconfigurations. We'll start with an overview of default misconfigurations. Then we'll talk about a real-world incident that actually started our entire research. Then we'll talk about how we identified misconfigurations at large scale using a scanner we developed. Then we'll talk about containerized AI applications, we'll share statistics, and we'll talk about how we can secure our environments from this type of risk. Okay, so let's start.
So let's start by talking about
misconfigurations.
So in the past, many cloud native applications weren't very secure by default, and in general, security wasn't always the first priority. Let's see some examples that some of you may be familiar with. The first example is Helm. As many of you know, in its earlier versions Helm came with a server-side component called Tiller. This component had permissive cluster-wide permissions in order to manage the deployments in the cluster, and Tiller didn't enforce authentication, so any pod in the cluster could talk to it. If an attacker got even limited access to the cluster, they could leverage Tiller to get full control. It was a serious security risk, and that's why in later versions Tiller was removed and Helm was changed.
Another example is the old Kubernetes dashboard, which was a web interface for managing the cluster. Depending on the exact version, it sometimes came without authentication. Also in this case, once you had initial access to the cluster, you could use the dashboard for privilege escalation, and in some cases users even exposed the dashboard to the internet, which really helped the attackers in those cases. The last example is Kubeflow, which is a very popular framework for building ML pipelines on Kubernetes. Here too, in the past, installations of this framework didn't include built-in authentication, and back then we saw quite a few Kubeflow instances that were exposed to the internet without authentication and were quickly exploited, including in a very large-scale campaign that we observed.
And what's common to all of those applications is that on the one hand they operated with very high privileges, but on the other hand they did not offer built-in authentication. That's why workload owners were basically only one mistake away from being compromised.
And here are some screenshots of blogs and reports of active attacks that originated in misconfigurations.
Now, cloud native applications became more secure over time, there is no doubt about it. The examples that we just saw don't really apply anymore. But apparently such issues still exist. And what if I told you that you can still find even more severe cases? Applications that are completely exploitable with no mistake from the user's side. Workloads that can be exploited just by installing them with their default settings. And this is our focus for today: cloud native applications that are misconfigured out of the box.
Okay, so let's talk very briefly about Helm chart defaults. I assume that all of you are familiar with Helm, the package manager for Kubernetes. What's important for this session is that Helm charts come with default values, which are the default settings of the deployment that the maintainer configured, in the values.yaml file. If you don't override them, that's going to be the configuration of your workloads. And sometimes, as we are going to see, these default values have surprises.
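To make that concrete, here is a hypothetical values.yaml fragment. The keys are made up for illustration and don't come from any specific chart; they just show how whatever the maintainer ships becomes your configuration unless you explicitly override it, for example with --set or your own values file at install time.

```yaml
# Hypothetical values.yaml fragment (not from any specific chart):
# whatever the maintainer puts here is what you get unless you override it.
service:
  type: LoadBalancer   # workload is exposed externally by default
auth:
  enabled: false       # no authentication unless explicitly turned on
```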
Okay, so let's see real-world examples.
So just before we talk about our first finding, let's take a second to ask what turns a misconfiguration into an exploitable one, because there are many misconfigurations. Usually, it's a combination of two things: the first is internet exposure, and the second is very weak authentication or no authentication at all, as we just saw in the news recently. Misconfigurations are basically everywhere. So now let's move on to the first finding.
Okay.
So we'll talk about Apache Pinot. In late 2024, we identified suspicious network activity on multiple Kubernetes clusters. The activity happened at the same time and it was identical, so we understood that there was a connection between the clusters, and we found that they were all running Apache Pinot. Apache Pinot is a data store that is used for low-latency analytics and real-time querying. We said: interesting.
We dived into it, and what we saw is that all of those clusters exposed the controller of Apache Pinot externally through a load balancer. The controller is the main brain of the framework: it handles tables, schemas, and the overall orchestration. It makes sense that if someone gets access to the controller, they can take full control over the application. But the questions are: why do so many clusters expose it, and even if it's exposed, how can malicious actors take advantage of it?
So let's see how you deploy Apache Pinot in Kubernetes. In the official documentation, Apache Pinot refers to the Helm chart in the official GitHub repo. Let's see what happens when you install the Helm chart with the out-of-the-box settings.
You can see that we get two load balancer services: one for the controller that we just talked about, and one for the broker, which is the component that handles the queries. That's the section of the Helm chart that is in charge of this exposure, and as you can see, if you don't explicitly disable it, the Helm chart will deploy a load balancer service.
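The defaults involved look roughly like the sketch below. This is a paraphrase of the pattern rather than an exact copy of the Apache Pinot chart, so treat the key names as approximate: unless the external exposure is disabled, a LoadBalancer Service is created for both components.

```yaml
# Rough sketch of the kind of defaults involved (key names approximate,
# not copied verbatim from the Apache Pinot chart).
controller:
  external:
    enabled: true        # exposes the controller outside the cluster
    type: LoadBalancer
broker:
  external:
    enabled: true        # exposes the broker outside the cluster
    type: LoadBalancer
```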
And the interesting part is that there are no default authentication mechanisms. So if you use the official Helm chart with the default configuration and don't override it, you simply open your data store to everyone on the internet. Here you can see the controller dashboard. You can control the application, you can query the data, you can do basically whatever you want. And that's exactly the pattern that we discussed earlier: exposure to the internet plus weak or non-existent authentication.
So what can we learn from this first case? First, default misconfigurations are still a thing. There are still popular applications that come severely insecure by default. Second, detecting those issues at scale is sometimes hard. Understanding the configuration of each and every workload is difficult. When you manage hundreds of applications across multiple clusters, it's really challenging to identify such cases when they occur.
Okay. So at this point we had to choose: we could either write a blog post about the finding, or go down the rabbit hole and start digging through more Helm charts on GitHub. You can probably guess what we decided.
Yeah, so we decided to look for exploitable misconfigurations at large scale. We scanned both GitHub and Artifact Hub, looking through Helm charts, YAML manifests, and anything else that might expose services. We searched for manifests that use LoadBalancer or NodePort. Then we sorted the results by popularity to prioritize the ones that really matter in the wild. And finally, we deployed the most interesting workloads in a test cluster to check their exploitability.
So first we took the naive approach: we just started to look for YAML files that expose load balancers using GitHub code search. This isn't enough, of course, because you cannot really sort the results based on the number of stars and popularity, so we couldn't tell what's interesting and what is not. Then we tried to apply some more filters. We searched specifically for the values files of Helm charts, and we also filtered out some noisy patterns like test and tutorial environments. But we were still limited: GitHub code search shows only up to 1,000 results, and there are many more potential repositories.
So we decided to write a script that splits the GitHub queries into small chunks. For example, one thing that we did was split the queries based on the port numbers: we searched for load balancers with port numbers that start with one, then port numbers that start with two, and so on. That way we could collect many more results. Then we sorted the results based on the number of stars to identify the interesting repos that are actually popular.
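The sharding idea looks roughly like this. The query strings below are hypothetical illustrations of the approach, not necessarily the exact queries our scanner used.

```yaml
# Illustrative sketch of the query-sharding idea: one broad GitHub code
# search is split into smaller searches by the leading digit of the port
# number, so each shard stays under the 1,000-result cap.
shards:
  - 'path:**/values.yaml "type: LoadBalancer" /port: 1\d+/'
  - 'path:**/values.yaml "type: LoadBalancer" /port: 2\d+/'
  - 'path:**/values.yaml "type: LoadBalancer" /port: 3\d+/'
  # ...one shard per leading digit; results are merged, de-duplicated,
  # and then sorted by repository star count.
```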
So here are the results. That's what we did: we scanned more than 12,000 Helm charts in almost 7,000 unique repositories. We then deployed and tested more than 300 popular applications that received at least 500 stars.
And here are some key insights.
So the first thing is that we still see popular applications with hard-coded default credentials. If they're not explicitly modified, they go all the way to production. Some applications still come out of the box without any authentication mechanism at all, like we saw before, not even a simple one. And some applications allow self-registration; we are going to see an example shortly. Now, those are some severe misconfigurations that come out of the box, and real-world data shows that users rely heavily on default configurations. That's what makes this issue so dangerous.
Okay, so now let's deep dive into two examples that we found. We'll start with Meshery. Meshery is an open source tool that lets users visually design cloud native applications. With Meshery you get a cool drag-and-drop interface that lets you create templates of resources, share them with others, and deploy them on your cluster, and you can also deploy services from cloud providers and integrate them there. So deploying Meshery in Kubernetes means that it can manage the resources in the cluster. For installation, Meshery's documentation refers to the Helm chart they created, and our GitHub scanner noticed that the Meshery application is exposed by default with a load balancer and can be accessed from outside the cluster.
So when you access the external IP address, you reach this login page. So far it seems okay. Maybe default internet exposure is not ideal, especially not for management interfaces, but at least we have authentication. The thing is that by default, when you press sign up, you can simply create a new user and then log into the application. And when you log in, you are greeted by this dashboard that contains all the information about the cluster and the deployed applications.
And when we navigate to the drag-and-drop interface, we can deploy a new container in the cluster. So here we created a pod that shares the PID and network namespaces with the host, and when we click deploy, a new pod is running in the cluster with our configuration.
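A pod spec along the lines of the sketch below is all it takes. This is a minimal hand-written example of such a pod, not the exact manifest that Meshery generated in the demo.

```yaml
# Minimal example of a pod that shares the host's PID and network
# namespaces; names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: host-namespaces-demo
spec:
  hostPID: true        # see every process on the node
  hostNetwork: true    # share the node's network stack
  containers:
    - name: shell
      image: busybox
      command: ["sleep", "infinity"]
```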
Here you can see the permissions of the application. Not surprisingly, Meshery has cluster-admin privileges, so once you get access to the dashboard, you can do essentially anything in the cluster.
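In practice, cluster-admin privileges mean a binding like the sketch below; the subject names here are hypothetical and not copied from the Meshery chart.

```yaml
# Sketch of a cluster-admin binding for an application's service account
# (names are hypothetical). Any workload running with such a binding can
# do anything in the cluster through the API server.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: app-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: app-service-account
    namespace: default
```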
You can also deploy Meshery using the CLI that they provide, in addition to the Helm chart that we saw. With the CLI we see the same issue: the application is exposed by default to the internet and allows registration. So to summarize, gaining access to the Meshery dashboard effectively gives you cluster-admin permissions, and depending on the applications in the cluster, it might also allow lateral movement to different workloads.
Okay, another example is Prometheus and Grafana. I guess you're all familiar with both of them. Prometheus is an open source monitoring system that collects and stores metrics: it gathers data from different sources and stores it in a time-series database. Grafana turns this data into real-time interactive dashboards. So Prometheus does the collecting and Grafana makes it human friendly.
While we looked at the results from the GitHub scanner, we came across an interesting chart. This Helm chart, under the Prometheus community repo, deploys Grafana and Prometheus. When we deployed it, it seemed like the password to the Grafana dashboard is stored inside a Kubernetes secret. But when we looked at the Kubernetes secret, we were surprised to see that the password to Grafana isn't such a big secret. You can see here the hard-coded password, and as expected, the out-of-the-box configuration in the values file of the Helm chart includes the hard-coded secret. You can see it here: prom-operator.
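The relevant values entry looks roughly like this; it's a sketch of the pattern rather than a verbatim copy of the chart.

```yaml
# Sketch of the pattern: a hard-coded admin password in the chart's default
# values, which ends up in the Grafana admin Secret unless the user
# overrides it at install time.
grafana:
  adminUser: admin
  adminPassword: prom-operator   # the default you get if you don't override it
```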
Actually, there is already a pull request to fix it, just sitting there waiting to be merged.
By searching on Shodan, we can find tens of thousands of exposed Prometheus and Grafana instances. By design, Prometheus assumes it's running in a trusted internal network, so it does not require authentication by default. Grafana does require a password by default, but even if only a very small portion of the instances are misconfigured, it's still thousands of exposed services.
Okay, so we talked about default misconfigurations and about the scanner we developed, and now, like probably any other session at KubeCon, we are going to talk about AI. So first, it's already clear that Kubernetes has become the underlying compute for AI applications. We see it in real-world data, and we can also see it here in this survey by the CNCF.
At the beginning of the presentation, we talked about the early days of Kubernetes, if you remember, in which security wasn't always the first priority. In AI applications we actually see a quite similar pattern: organizations are in a race to implement and integrate AI solutions, and that speed often comes at the expense of security. It's just faster to integrate AI applications without thinking too much about security, and we are going to see some examples.
So first, MCP servers. Probably many of you are familiar with MCP. MCP is a protocol that allows AI agents to discover and invoke tools, which can provide access to external data sources or capabilities. You can run MCP servers locally or remotely, and we see many MCP servers running in Kubernetes. MCP supports authorization, it supports an OAuth flow, so tools can be invoked securely in the context of the user. But MCP does not enforce it, and wrong implementations of MCP servers, especially remote MCP servers, can lead to severe consequences.
If the MCP server has access to sensitive information and isn't properly designed, users can remotely access that server and invoke the tools. Here are some things that we saw: remote MCP servers without authorization that seem to expose data from ticketing systems, HR data, and also private code repositories.
And in general, we see a large number of containerized AI applications that are deployed in a misconfigured way. It's not always a default misconfiguration; sometimes users just skip basic security principles simply for convenience, like we said, to get things deployed fast or to avoid dealing with complex setups. Let's see a few examples of things that we actually see in the wild. We have seen exposed MLflow instances, which allow anyone to manage the MLflow workloads, including exfiltrating data, uploading malicious models, and more.
We have seen users that expose Mage instances, which is a data pipeline platform, as you can see in the screenshot. Mage allows you to run a shell inside its container, and this container has a privileged service account mounted to it. So once you have access to Mage, you are basically cluster admin. Up until recently, the Helm installation of Mage exposed the application by default to the internet without authentication. After we reported the issue, they immediately added authentication by default, which is better.
Another example is kagent, which is a platform for building AI agents on Kubernetes that automate cloud native tasks. By default, kagent is accessible only internally, which is good, but we see cases in which users manually expose kagent externally to the internet for convenience. Since the out-of-the-box installation of kagent doesn't come with authentication, exposing kagent to the internet means cluster takeover and potentially also lateral movement. In such cases, attackers can also steal the API keys that kagent uses to connect to AI services like OpenAI or Azure AI Foundry.
Another example that we saw is NVIDIA RAG Blueprint deployments, which we have observed exposed to the internet, as well as Google ADK assistant instances. These are just a few examples of misconfigurations we actually see in the wild, and they demonstrate how, in the race to implement AI in the organization, security is sometimes just left behind.
Okay, so now let's talk about statistics of what we saw. We performed a large-scale analysis of thousands of Kubernetes clusters, and this is what we saw: 40% of the clusters have at least one external-facing load balancer. 20% of the clusters are exposed by a Helm chart that creates a new load balancer. And 3.5% of the exposed applications also have a privileged service account attached to them; those are maybe the most critical workloads. Now, the fact that they're exposed does not mean that they're misconfigured. It could be fine. But in the case of a misconfiguration, they can lead to cluster takeover.
So let's talk about mitigations and detections. As we said before, exploitable misconfigurations are usually a combination of internet exposure and weak authentication. Let's begin with internet exposure. We should always map the external surface of our organization. Obviously, some applications should be internet facing; that's by design and it's fine. But sometimes this exposure is actually a misconfiguration, and there is an interesting pattern that we see over and over: users manually change the service configuration from ClusterIP to LoadBalancer, often for debugging and testing purposes, and then just keep it as is. You can identify such cases by monitoring the Kubernetes audit log and seeing when users change the service configuration. And if someone changed the service type from ClusterIP to LoadBalancer, it's something that you at least want to check.
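As a starting point, an audit policy along the lines of the sketch below records the request bodies for Service changes, so a type switch to LoadBalancer shows up in the audit log and can be alerted on. This is a minimal sketch using the standard audit.k8s.io/v1 Policy API, not the exact setup we use.

```yaml
# Minimal audit policy sketch: record full request/response bodies for
# Service create/update/patch, so a change of spec.type from ClusterIP to
# LoadBalancer is visible in the audit log.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: RequestResponse
    verbs: ["create", "update", "patch"]
    resources:
      - group: ""               # core API group
        resources: ["services"]
```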
Now let's talk about authentication in cloud native applications. This is often configured in the application manifests, so look for problematic settings like allowing anonymous access, and also look for usage of default credentials or easy-to-guess ones.
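For example, a Grafana values override like the one sketched below is exactly the kind of setting to look for. This is a hypothetical snippet for illustration, not one taken from a specific chart in the talk.

```yaml
# Hypothetical example of a problematic setting: Grafana configured to
# allow anonymous access, so anyone who can reach the UI is signed in
# automatically with the given role.
grafana.ini:
  auth.anonymous:
    enabled: true
    org_role: Viewer
```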
And here are some community tools that can help us harden the cluster and prevent those risks. First, Kyverno and OPA (Open Policy Agent) both allow you to author policies that gate deployments to the cluster. You can use them to prevent deployments with misconfigurations like the ones we saw. Then we have KubeLinter, which you can integrate into your CI/CD pipeline to identify misconfigurations and risky configurations before they are deployed to the cluster. And TruffleHog, which is also a useful tool that can identify secrets; you can integrate it into the CI/CD pipeline as well to catch secrets before they reach the cluster.
Here's an example from TruffleHog, which identifies secrets, and another example is applying a policy to the creation of load balancers. This example is from Kyverno.
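A Kyverno policy in that spirit can look like the following sketch, which blocks Services of type LoadBalancer cluster-wide. It is based on the common community sample, not copied from the slide shown in the talk.

```yaml
# Sketch of a Kyverno ClusterPolicy that denies Services of type
# LoadBalancer; adapted from the well-known community sample policy.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-loadbalancer
spec:
  validationFailureAction: Enforce
  rules:
    - name: no-loadbalancer-services
      match:
        any:
          - resources:
              kinds: ["Service"]
      validate:
        message: "Services of type LoadBalancer are not allowed without an exception."
        pattern:
          spec:
            type: "!LoadBalancer"
```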
Okay. So as we said, it's always difficult to identify the exploitation of such misconfigurations, and it's difficult to have a generic rule that can identify such exploitation, but still, we have noticed something interesting.
Many containers have predictable activity, and that's not new or surprising: containers usually run a single application and therefore behave quite consistently over time. But it turns out that most of them are very consistent. Our large-scale analysis showed that on average, containers run 4.5 different processes, and less than 1% of containers run a new process after their initial bootstrap. So by tracking the behavior of containers, for example by using eBPF tools, we can try to identify anomalies in container activity and potentially identify exploitation of misconfigurations.
And here are some well-known open source projects that can help us do it: Tracee from Aqua, Falco from Sysdig, Tetragon, and Inspektor Gadget.
Here is an example of Inspektor Gadget. You can use it to track the running processes in a container to find such anomalies. You can also use Inspektor Gadget to monitor the DNS queries of the containers. In the bottom half you see a container that accesses a mining pool; for this demonstration it's just using nslookup. And in the upper window you can see the actual DNS request in the gadget that shows this activity.
As we just said, many containers are very consistent, so for many of them, any abnormal or unusual activity should raise a flag.
Okay, so let's start to wrap up what we have seen today.
First, while cloud native security has definitely improved over the years, we can still find quite a few issues, including in popular applications. Second, what we've seen in real-world data is that this issue affects organizations of all sizes, from small companies to large enterprises. Everyone gets exploited by misconfigurations.
As we have seen throughout the session, in some cases the misconfigurations come out of the box. It's not even about a mistake that somebody makes; sometimes it's just relying on the default settings. That's why it's crucial to really understand what we deploy and not just trust the default configurations that come out of the box. We have seen cases in which misconfigurations were exploited less than 10 minutes after deployment. That's what makes this so dangerous: the exploitation can be that fast.
And lastly, more than 50% of the attacks that we see against containerized applications originate in misconfigurations, not software vulnerabilities. So even if your organization has the best vulnerability scanner with 100% coverage, and even if you automatically block vulnerable applications, you've still only done half of the job, because misconfigurations account for more than 50% of the active exploitations.
Okay, so we saw that default misconfigurations still happen in 2025, and I hope the session gave you some insights about this problem, and hopefully also some tools to mitigate the risk. Thank you very much. I'll be down here to take questions and talk to you. Thank you.
You Deployed What?! Data-Driven Lessons on Unsafe Helm Chart Defaults - Yossi Weizman, Microsoft

Most breach post-mortems start with "Which CVE?" However, ours usually end with "There wasn't one." We analyzed 10B Kubernetes audit events and scanned over 3,000 clusters to map compromise paths that rely solely on insecure defaults shipped in widely trusted Helm charts. The pattern is painfully consistent: a world-reachable Service/Ingress, authentication set to "off by default," and a pod that has permissions to go wild. We'll chain those three defaults against Apache Pinot, Selenium Grid, and Meshery, all without a single vulnerability. To flip the script, we'll walk through hardening the same workloads using existing community tools like OPA Gatekeeper, Kyverno, Pod Security Admission, and GitHub Actions to enforce guardrails before someone in your organization deploys an "official" Helm chart.