My name is Yossi Weizman. I'm a principal security researcher at Microsoft. Michael Kachinsky is my partner in this research. Unfortunately, Michael couldn't attend, so I will represent both of us.
And this is the agenda for today. We will talk about default misconfigurations. We'll start with an overview of default misconfigurations. Then we'll talk about a real-world incident that actually started our entire research. Then we'll talk about how we identified misconfigurations at large scale using a scanner we developed. Then we'll talk about containerized AI applications, we'll share statistics, and we'll talk about how we can secure our environments from this type of risk. Okay, so let's start.
So let's start by talking about
misconfigurations.
So in the past, many cloud native applications weren't very secure by default, and in general, security wasn't always the first priority. Let's see some examples that some of you may be familiar with. The first example is Helm. As many of you know, in its earlier versions Helm came with a server-side component called Tiller. This component had permissive cluster-wide permissions in order to manage the deployments in the cluster, and Tiller didn't enforce authentication, so any pod in the cluster could talk to it. If an attacker got even limited access to the cluster, they could leverage Tiller to get full control. It was a serious security risk, and that's why in later versions Tiller was removed and Helm was changed.
Another example is the old Kubernetes dashboard, which was a web interface for managing the cluster. Depending on the exact version, it sometimes came without authentication. Also in this case, once you had initial access to the cluster, you could use the dashboard for privilege escalation, and in some cases users even exposed the dashboard to the internet, which really helped the attackers in those cases. The last example is Kubeflow, which is a very popular framework for building ML pipelines on Kubernetes. Here too, in the past, installations of this framework didn't include built-in authentication, and back then we saw quite a few Kubeflow instances that were exposed to the internet without authentication and were quickly exploited, including in a very large-scale campaign that we observed.
And what's common to all of those applications is that on the one hand they operated with very high privileges, but on the other hand they did not offer built-in authentication. That's why workload owners were basically only one mistake away from being compromised.
And here are some screenshots of blogs and reports of active attacks that originated in misconfigurations.
Now, cloud native applications became more secure over time, there is no doubt about it. The examples that we just saw don't really apply anymore. But apparently such issues still exist. And what if I told you that you can still find even more severe cases? Applications that are completely exploitable with no mistake from the user's side. Workloads that can be exploited just by installing them with their default settings. And this is our focus for today: cloud native applications that are misconfigured out of the box.
Okay, so let's talk very briefly about Helm chart defaults. I assume that all of you are familiar with Helm, the package manager for Kubernetes. What's important for this session is that Helm charts come with default values, which are the default settings of the deployment that the maintainer configured, in the values.yaml file. If you don't override them, that's going to be the configuration of your workloads. And sometimes, as we are going to see, these default values have surprises.
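To make that concrete, here is a hypothetical values.yaml fragment. The keys are made up for illustration and don't come from any specific chart; they just show how whatever the maintainer ships becomes your configuration unless you explicitly override it, for example with --set or your own values file at install time.

```yaml
# Hypothetical values.yaml fragment (not from any specific chart):
# whatever the maintainer puts here is what you get unless you override it.
service:
  type: LoadBalancer   # workload is exposed externally by default
auth:
  enabled: false       # no authentication unless explicitly turned on
```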
Okay, so let's see real-world examples.
So just before we talk about our first finding, let's take a second to ask what turns a misconfiguration into an exploitable one, because there are many misconfigurations. Usually, it's a combination of two things: the first is internet exposure, and the second is very weak authentication or no authentication at all, as we just saw in the news recently. Misconfigurations are basically everywhere. So now let's move on to the first finding.
Okay.
So we'll talk about Apache Pinot. In late 2024, we identified suspicious network activity on multiple Kubernetes clusters. The activity happened at the same time and it was identical, so we understood that there was a connection between the clusters, and we found that they were all running Apache Pinot. Apache Pinot is a data store that is used for low-latency analytics and real-time querying. We said: interesting.
We dived into it, and what we saw is that all of those clusters exposed the controller of Apache Pinot externally through a load balancer. The controller is the main brain of the framework: it handles tables, schemas, and the overall orchestration. It makes sense that if someone gets access to the controller, they can take full control over the application. But the questions are: why do so many clusters expose it, and even if it's exposed, how can malicious actors take advantage of it?
So let's see how you deploy Apache Pinot in Kubernetes. In the official documentation, Apache Pinot refers to the Helm chart in the official GitHub repo. Let's see what happens when you install the Helm chart with the out-of-the-box settings.
You can see that we get two load balancer services: one for the controller that we just talked about, and one for the broker, which is the component that handles the queries. That's the section of the Helm chart that is in charge of this exposure, and as you can see, if you don't explicitly disable it, the Helm chart will deploy a load balancer service.
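The defaults involved look roughly like the sketch below. This is a paraphrase of the pattern rather than an exact copy of the Apache Pinot chart, so treat the key names as approximate: unless the external exposure is disabled, a LoadBalancer Service is created for both components.

```yaml
# Rough sketch of the kind of defaults involved (key names approximate,
# not copied verbatim from the Apache Pinot chart).
controller:
  external:
    enabled: true        # exposes the controller outside the cluster
    type: LoadBalancer
broker:
  external:
    enabled: true        # exposes the broker outside the cluster
    type: LoadBalancer
```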
And the interesting part is that there are no default authentication mechanisms. So if you use the official Helm chart with the default configuration and don't override it, you simply open your data store to everyone on the internet. Here you can see the controller dashboard. You can control the application, you can query the data, you can do basically whatever you want. And that's exactly the pattern that we discussed earlier: exposure to the internet plus weak or non-existent authentication.
So what can we learn from this first case? First, default misconfigurations are still a thing. There are still popular applications that come severely insecure by default. Second, detecting those issues at scale is sometimes hard. Understanding the configuration of each and every workload is difficult. When you manage hundreds of applications across multiple clusters, it's really challenging to identify such cases when they occur.
Okay. So at this point we had to choose: we could either write a blog post about the finding, or go down the rabbit hole and start digging through more Helm charts on GitHub. You can probably guess what we decided.
Yeah, so we decided to look for exploitable misconfigurations at large scale. We scanned both GitHub and Artifact Hub, looking through Helm charts, YAML manifests, and anything else that might expose services. We searched for manifests that use LoadBalancer or NodePort. Then we sorted the results by popularity to prioritize the ones that really matter in the wild. And finally, we deployed the most interesting workloads in a test cluster to check their exploitability.
So first we took the naive approach: we just started to look for YAML files that expose load balancers using GitHub code search. This isn't enough, of course, because you cannot really sort the results based on the number of stars and popularity, so we couldn't tell what's interesting and what is not. Then we tried to apply some more filters. We searched specifically for the values files of Helm charts, and we also filtered out some noisy patterns like test and tutorial environments. But we were still limited: GitHub code search shows only up to 1,000 results, and there are many more potential repositories.
So we decided to write a script that splits the GitHub queries into small chunks. For example, one thing that we did was split the queries based on the port numbers: we searched for load balancers with port numbers that start with one, then port numbers that start with two, and so on. That way we could collect many more results. Then we sorted the results based on the number of stars to identify the interesting repos that are actually popular.
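The sharding idea looks roughly like this. The query strings below are hypothetical illustrations of the approach, not necessarily the exact queries our scanner used.

```yaml
# Illustrative sketch of the query-sharding idea: one broad GitHub code
# search is split into smaller searches by the leading digit of the port
# number, so each shard stays under the 1,000-result cap.
shards:
  - 'path:**/values.yaml "type: LoadBalancer" /port: 1\d+/'
  - 'path:**/values.yaml "type: LoadBalancer" /port: 2\d+/'
  - 'path:**/values.yaml "type: LoadBalancer" /port: 3\d+/'
  # ...one shard per leading digit; results are merged, de-duplicated,
  # and then sorted by repository star count.
```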
So here are the results. That's what we did: we scanned more than 12,000 Helm charts in almost 7,000 unique repositories. We then deployed and tested more than 300 popular applications that received at least 500 stars.
And here are some key insights.
So the first thing is that we still see popular applications with hard-coded default credentials. If they're not explicitly modified, they go all the way to production. Some applications still come out of the box without any authentication mechanism at all, like we saw before, not even a simple one. And some applications allow self-registration; we are going to see an example shortly. Now, those are some severe misconfigurations that come out of the box, and real-world data shows that users rely heavily on default configurations. That's what makes this issue so dangerous.
Okay, so now let's deep dive into two examples that we found. We'll start with Meshery. Meshery is an open source tool that lets users visually design cloud native applications. With Meshery you get a cool drag-and-drop interface that lets you create templates of resources, share them with others, and deploy them on your cluster, and you can also deploy services from cloud providers and integrate them there. So deploying Meshery in Kubernetes means that it can manage the resources in the cluster. For installation, Meshery's documentation refers to the Helm chart they created, and our GitHub scanner noticed that the Meshery application is exposed by default with a load balancer and can be accessed from outside the cluster.
So when you access the external IP address, you reach this login page. So far it seems okay. Maybe default internet exposure is not ideal, especially not for management interfaces, but at least we have authentication. The thing is that by default, when you press sign up, you can simply create a new user and then log into the application. And when you log in, you are greeted by this dashboard that contains all the information about the cluster and the deployed applications.
And when we navigate to the drag-and-drop interface, we can deploy a new container in the cluster. So here we created a pod that shares the PID and network namespaces with the host, and when we click deploy, a new pod is running in the cluster with our configuration.
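A pod spec along the lines of the sketch below is all it takes. This is a minimal hand-written example of such a pod, not the exact manifest that Meshery generated in the demo.

```yaml
# Minimal example of a pod that shares the host's PID and network
# namespaces; names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: host-namespaces-demo
spec:
  hostPID: true        # see every process on the node
  hostNetwork: true    # share the node's network stack
  containers:
    - name: shell
      image: busybox
      command: ["sleep", "infinity"]
```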
Here you can see the permissions of the application. Not surprisingly, Meshery has cluster-admin privileges, so once you get access to the dashboard, you can do essentially anything in the cluster.
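In practice, cluster-admin privileges mean a binding like the sketch below; the subject names here are hypothetical and not copied from the Meshery chart.

```yaml
# Sketch of a cluster-admin binding for an application's service account
# (names are hypothetical). Any workload running with such a binding can
# do anything in the cluster through the API server.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: app-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: app-service-account
    namespace: default
```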
You can also deploy Meshery using the CLI that they provide, in addition to the Helm chart that we saw. With the CLI we see the same issue: the application is exposed by default to the internet and allows registration. So to summarize, gaining access to the Meshery dashboard effectively gives you cluster-admin permissions, and depending on the applications in the cluster, it might also allow lateral movement to different workloads.
Okay, another example is Prometheus and Grafana. I guess you're all familiar with both of them. Prometheus is an open source monitoring system that collects and stores metrics: it gathers data from different sources and stores it in a time-series database. Grafana turns this data into real-time interactive dashboards. So Prometheus does the collecting and Grafana makes it human friendly.
While we looked at the results from the GitHub scanner, we came across an interesting chart. This Helm chart, under the Prometheus community repo, deploys Grafana and Prometheus. When we deployed it, it seemed like the password to the Grafana dashboard is stored inside a Kubernetes secret. But when we looked at the Kubernetes secret, we were surprised to see that the password to Grafana isn't such a big secret. You can see here the hard-coded password, and as expected, the out-of-the-box configuration in the values file of the Helm chart includes the hard-coded secret. You can see it here: prom-operator.
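The relevant values entry looks roughly like this; it's a sketch of the pattern rather than a verbatim copy of the chart.

```yaml
# Sketch of the pattern: a hard-coded admin password in the chart's default
# values, which ends up in the Grafana admin Secret unless the user
# overrides it at install time.
grafana:
  adminUser: admin
  adminPassword: prom-operator   # the default you get if you don't override it
```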
Actually, there is already a pull request to fix it, just sitting there waiting to be merged.
By searching on Shodan, we can find tens of thousands of exposed Prometheus and Grafana instances. By design, Prometheus assumes it's running in a trusted internal network, so it does not require authentication by default. Grafana does require a password by default, but even if only a very small portion of the instances are misconfigured, it's still thousands of exposed services.
Okay, so we talked about default misconfigurations and about the scanner we developed, and now, like probably any other session at KubeCon, we are going to talk about AI. So first, it's already clear that Kubernetes has become the underlying compute for AI applications. We see it in real-world data, and we can also see it here in this survey by the CNCF.
At the beginning of the presentation, we talked about the early days of Kubernetes, if you remember, in which security wasn't always the first priority. In AI applications we actually see a quite similar pattern: organizations are in a race to implement and integrate AI solutions, and that speed often comes at the expense of security. It's just faster to integrate AI applications without thinking too much about security, and we are going to see some examples.
So first, MCP servers. Probably many of you are familiar with MCP. MCP is a protocol that allows AI agents to discover and invoke tools, which can provide access to external data sources or capabilities. You can run MCP servers locally or remotely, and we see many MCP servers running in Kubernetes. MCP supports authorization, it supports an OAuth flow, so tools can be invoked securely in the context of the user. But MCP does not enforce it, and wrong implementations of MCP servers, especially remote MCP servers, can lead to severe consequences.
If the MCP server has access to sensitive information and isn't properly designed, users can remotely access that server and invoke the tools. Here are some things that we saw: remote MCP servers without authorization that seem to expose data from ticketing systems, HR data, and also private code repositories.
And in general, we see a large number of containerized AI applications that are deployed in a misconfigured way. It's not always a default misconfiguration; sometimes users just skip basic security principles simply for convenience, like we said, to get things deployed fast or to avoid dealing with complex setups. Let's see a few examples of things that we actually see in the wild. We have seen exposed MLflow instances, which allow anyone to manage the MLflow workloads, including exfiltrating data, uploading malicious models, and more.
We have seen users that expose Mage instances, which is a data pipeline platform, as you can see in the screenshot. Mage allows you to run a shell inside its container, and this container has a privileged service account mounted to it. So once you have access to Mage, you are basically cluster admin. Up until recently, the Helm installation of Mage exposed the application by default to the internet without authentication. After we reported the issue, they immediately added authentication by default, which is better.
Another example is kagent, which is a platform for building AI agents on Kubernetes that automate cloud native tasks. By default, kagent is accessible only internally, which is good, but we see cases in which users manually expose kagent externally to the internet for convenience. Since the out-of-the-box installation of kagent doesn't come with authentication, exposing kagent to the internet means cluster takeover and potentially also lateral movement. In such cases, attackers can also steal the API keys that kagent uses to connect to AI services like OpenAI or Azure AI Foundry.
Another example that we saw is NVIDIA RAG Blueprint deployments, which we have observed exposed to the internet, as well as Google ADK assistant instances. These are just a few examples of misconfigurations we actually see in the wild, and they demonstrate how, in the race to implement AI in the organization, security is sometimes just left behind.
Okay, so now let's talk about statistics of what we saw. We performed a large-scale analysis of thousands of Kubernetes clusters, and this is what we saw: 40% of the clusters have at least one external-facing load balancer. 20% of the clusters are exposed by a Helm chart that creates a new load balancer. And 3.5% of the exposed applications also have a privileged service account attached to them; those are maybe the most critical workloads. Now, the fact that they're exposed does not mean that they're misconfigured. It could be fine. But in the case of a misconfiguration, they can lead to cluster takeover.
So let's talk about mitigations and detections. As we said before, exploitable misconfigurations are usually a combination of internet exposure and weak authentication. Let's begin with internet exposure. We should always map the external surface of our organization. Obviously, some applications should be internet facing; that's by design and it's fine. But sometimes this exposure is actually a misconfiguration, and there is an interesting pattern that we see over and over: users manually change the service configuration from ClusterIP to LoadBalancer, often for debugging and testing purposes, and then just keep it as is. You can identify such cases by monitoring the Kubernetes audit log and seeing when users change the service configuration. And if someone changed the service type from ClusterIP to LoadBalancer, it's something that you at least want to check.
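As a starting point, an audit policy along the lines of the sketch below records the request bodies for Service changes, so a type switch to LoadBalancer shows up in the audit log and can be alerted on. This is a minimal sketch using the standard audit.k8s.io/v1 Policy API, not the exact setup we use.

```yaml
# Minimal audit policy sketch: record full request/response bodies for
# Service create/update/patch, so a change of spec.type from ClusterIP to
# LoadBalancer is visible in the audit log.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: RequestResponse
    verbs: ["create", "update", "patch"]
    resources:
      - group: ""               # core API group
        resources: ["services"]
```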
Now let's talk about authentication in cloud native applications. This is often configured in the application manifests, so look for problematic settings like allowing anonymous access, and also look for usage of default credentials or easy-to-guess ones.
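For example, a Grafana values override like the one sketched below is exactly the kind of setting to look for. This is a hypothetical snippet for illustration, not one taken from a specific chart in the talk.

```yaml
# Hypothetical example of a problematic setting: Grafana configured to
# allow anonymous access, so anyone who can reach the UI is signed in
# automatically with the given role.
grafana.ini:
  auth.anonymous:
    enabled: true
    org_role: Viewer
```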
And here are some community tools that can help us harden the cluster and prevent those risks. First, Kyverno and OPA (Open Policy Agent) both allow you to author policies that gate deployments to the cluster. You can use them to prevent deployments with misconfigurations like the ones we saw. Then we have KubeLinter, which you can integrate into your CI/CD pipeline to identify misconfigurations and risky configurations before they are deployed to the cluster. And TruffleHog, which is also a useful tool that can identify secrets; you can integrate it into the CI/CD pipeline as well to catch secrets before they reach the cluster.
Here's an example from TruffleHog, which identifies secrets, and another example is applying a policy to the creation of load balancers. This example is from Kyverno.
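A Kyverno policy in that spirit can look like the following sketch, which blocks Services of type LoadBalancer cluster-wide. It is based on the common community sample, not copied from the slide shown in the talk.

```yaml
# Sketch of a Kyverno ClusterPolicy that denies Services of type
# LoadBalancer; adapted from the well-known community sample policy.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-loadbalancer
spec:
  validationFailureAction: Enforce
  rules:
    - name: no-loadbalancer-services
      match:
        any:
          - resources:
              kinds: ["Service"]
      validate:
        message: "Services of type LoadBalancer are not allowed without an exception."
        pattern:
          spec:
            type: "!LoadBalancer"
```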
Okay. So as we said, it's always difficult to identify the exploitation of such misconfigurations, and it's difficult to have a generic rule that can identify such exploitation, but still, we have noticed something interesting.
Many containers have predictable activity, and that's not new or surprising: containers usually run a single application and therefore behave quite consistently over time. But it turns out that most of them are very consistent. Our large-scale analysis showed that on average, containers run 4.5 different processes, and less than 1% of containers run a new process after their initial bootstrap. So by tracking the behavior of containers, for example by using eBPF tools, we can try to identify anomalies in container activity and potentially identify exploitation of misconfigurations.
And here are some well-known open source projects that can help us do it: Tracee from Aqua, Falco from Sysdig, Tetragon, and Inspektor Gadget.
Here is an example of Inspektor Gadget. You can use it to track the running processes in a container to find such anomalies. You can also use Inspektor Gadget to monitor the DNS queries of the containers. In the bottom half you see a container that accesses a mining pool; for this demonstration it's just using nslookup. And in the upper window you can see the actual DNS request in the gadget that shows this activity.
As we just said, many containers are very consistent, so for many of them, any abnormal or unusual activity should raise a flag.
Okay, so let's start to wrap up what we have seen today.
First, while cloud native security has definitely improved over the years, we can still find quite a few issues, including in popular applications. Second, what we've seen in real-world data is that this issue affects organizations of all sizes, from small companies to large enterprises. Everyone gets exploited by misconfigurations.
As we have seen throughout the session, in some cases the misconfigurations come out of the box. It's not even about a mistake that somebody makes; sometimes it's just relying on the default settings. That's why it's crucial to really understand what we deploy and not just trust the default configurations that come out of the box. We have seen cases in which misconfigurations were exploited less than 10 minutes after deployment. That's what makes this so dangerous: the exploitation can be that fast.
And lastly, more than 50% of the attacks that we see against containerized applications originate in misconfigurations, not software vulnerabilities. So even if your organization has the best vulnerability scanner with 100% coverage, and even if you automatically block vulnerable applications, you've still only done half of the job, because misconfigurations account for more than 50% of the active exploitations.
Okay, so we saw that default misconfigurations still happen in 2025, and I hope the session gave you some insights about this problem, and hopefully also some tools to mitigate the risk. Thank you very much. I'll be down here to take questions and talk to you. Thank you.
You Deployed What?! Data-Driven Lessons on Unsafe Helm Chart Defaults - Yossi Weizman, Microsoft

Most breach post-mortems start with "Which CVE?" However, ours usually end with "There wasn't one." We analyzed 10B Kubernetes audit events and scanned over 3,000 clusters to map compromise paths that rely solely on insecure defaults shipped in widely trusted Helm charts. The pattern is painfully consistent: a world-reachable Service/Ingress, authentication set to "off by default," and a pod that has permissions to go wild. We'll chain those three defaults against Apache Pinot, Selenium Grid, and Meshery, all without a single vulnerability. To flip the script, we'll walk through hardening the same workloads using existing community tools like OPA Gatekeeper, Kyverno, Pod Security Admission, and GitHub Actions to enforce guardrails before someone in your organization deploys an "official" Helm chart.