Shift Left in Action: Automating CIS Compliance With Kyverno and CNCF Power Too... Yugandhar Suthari | DailyDevLists


Full Transcript

5,173 words • EN

All right, good morning everybody. Thanks for joining. Before we start, I would like to ask a quick question: how many of you are currently using Kubernetes in your organization today? Raise your hands. Okay, pretty much most of us, nice. If you've been using it for some time, the issues I'm going to talk about will probably sound familiar.

The first one: say you have a pipeline that tags your images. Some pipeline config makes a change and tags your image as latest, and once you go into production, that's not the code you were expecting. It's a different version altogether, because the tag was latest. Or sometimes you have deployed a pod and it keeps crashing continuously.

When you look at the describe output, it says there are no compute resources available. One reason could be that other pods in your namespace are using up all of the memory or CPU in that namespace, so there's no compute left at all. That happens because there's no resource limit set in the pod config. Or sometimes your security team assigns you tickets saying, hey, you're using a root user in your pod in production and you need to fix that. Or you have images being downloaded from public URLs, which you should not be doing, so there's one more ticket. So you have all of these different issues.
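
The four problems above (a latest tag, missing resource limits, a root user, an image from a public registry) are all visible in the pod manifest before anything is deployed. A minimal Python sketch of such pre-deployment checks follows. It is only an illustration and not how Kyverno evaluates policies; the function name `find_issues` and the registry prefix `registry.internal.example.com/` are assumptions made for the example.

```python
from typing import Any

# Hypothetical allow-list of trusted registry prefixes (an assumption for this sketch).
TRUSTED_REGISTRIES = ("registry.internal.example.com/",)


def find_issues(pod: dict[str, Any]) -> list[str]:
    """Return a list of findings for one pod manifest given as a dict."""
    issues: list[str] = []
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext", {})
    for c in spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        image = c.get("image", "")
        # Look for a tag only after the last "/", so a registry port is not mistaken for one.
        last_part = image.rsplit("/", 1)[-1]
        tag = last_part.split(":", 1)[1] if ":" in last_part else "latest"
        if tag == "latest":
            issues.append(f"{name}: image uses the 'latest' tag")
        if not image.startswith(TRUSTED_REGISTRIES):
            issues.append(f"{name}: image is not from a trusted registry")
        if not c.get("resources", {}).get("limits"):
            issues.append(f"{name}: no resource limits set")
        # The container-level setting wins; fall back to the pod-level securityContext.
        run_as_non_root = c.get("securityContext", {}).get(
            "runAsNonRoot", pod_sc.get("runAsNonRoot")
        )
        if run_as_non_root is not True:
            issues.append(f"{name}: container may run as root")
    return issues
```

A manifest with a bare `nginx` image and nothing else set would produce all four findings, while a pinned image from the trusted registry with limits and `runAsNonRoot: true` would produce none.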

We are identifying all of these issues once we're in production, which means we're trying to fix them after deployment. These are all common issues, and they could always be fixed ahead of time. How we can do that is the shift-left approach we're going to look at in this session, using an open source framework that has all of these policies.

In today's topic we're going to cover what the problem is: why CIS checks are required and how we can use them to secure Kubernetes clusters. We're also going to talk about an open source framework that has all of these CIS checks automated with Kyverno, and see how that's done. It's not just theory. We're going to go through a working demo where we spin up a cluster and try to automate it all the way from zero to, I would say, 99% in under 15 minutes. And finally we're going to look at the metrics and an end-to-end pipeline as well.

Let me introduce myself. I am Yugandhar Suthari. I have spent almost 18 years in IT and more than six years in cloud security. I currently work as a security engineer at Cisco. I'm also a contributor and reviewer on the Kyverno project and a Golden Kubestronaut. I also actively blog at allthingsincloud.com. Please check it out.

Yep, let's dive in. So, why CIS benchmarks? Kubernetes is powerful, but it is not secure by default. It provides building blocks that your platform engineers can use to secure your cluster through different options. The CIS benchmarks give us a standard baseline that helps you secure your cluster. When it comes to these controls, there are different areas of focus: your control plane, your worker nodes, pod security, RBAC. Imagine having to go through all of these 64 policies manually on each of your clusters. Look at the scale: when you have thousands of clusters, it'll be hard to verify all of these CIS checks manually or after deployment. So how can we manage all of this at scale? That's where the shift-left solution comes in, which we're going to talk about in detail.

Now, coming to the shift-left architecture. Here we have three stages of validation. The first is plan-time validation. Most of the time when you're working with Kubernetes clusters, you're using infrastructure-as-code tools like OpenTofu or Terraform. Take the issues we were talking about, say your EKS cluster endpoints. Often you get a security incident that says your EKS cluster endpoints are open to the internet and you have to fix that, or that audit logging is not enabled. How about we fix all of these issues ahead of time? We're going to use OpenTofu. You generally have your Terraform plan, so you can take that plan, convert it to a JSON file, and use Kyverno to scan this JSON and tell you: is this blueprint compliant with my CIS checks, is it safe to deploy? Kyverno checks that plan, and you can put that in as a validation step, so you're not deploying anything insecure. We're going to look at that in detail.

The next one is deployment time. Now that you have your cluster created, the developers are deploying their workloads. How can we block non-compliant pods at that point? We load all of these checks and Kyverno applies them during the admission process, which we're going to look at in detail as well.

Then we have runtime validations. You have your cluster running a lot of compute nodes, and there are 13 CIS checks that look at things like file permissions and audit configurations on the node, which Kyverno cannot access. For those I have a node scanner that is deployed as a DaemonSet in your cluster to do these checks. The first two stages can be done with Kyverno, and all of this is automated in the open source project we're going to look at.

All right. I know the previous sessions covered the basic steps, but let me quickly walk through the Kubernetes request flow and where Kyverno comes into it. Whenever you as a user run a kubectl apply command to create or update your resources, the first thing that happens is that the request is sent to your API server. Authentication and authorization happen first, and then your workload enters the admission process, the admission controllers, and that's where Kyverno comes in. There are two major phases in admission control: one is mutating admission and the second is validating admission. Mutating admission is where Kyverno can add security defaults, like setting a non-root user, or annotations or labels that are required. It can take care of that as part of mutation. Then you have validating admission. This is where Kyverno enforces your policies and blocks your pods if they do not meet them.

What Kyverno is doing here is acting as a policy guard for your cluster during the admission process. The way it does that is as a webhook-based admission controller. It includes an admission controller, which takes care of all the mutations and validations. Then you have the background controller: say you have some workloads that are already running, and to be able to check those you have this background controller, which we're going to look at. There's a cleanup controller as well, which cleans up any workloads that are not used anymore or have expired. And then you have the reports controller, which generates the reports, not CIS reports specifically but the audit reports that are required. Once admission is complete, that's when the object is written to etcd. So Kyverno ensures that only compliant, secure, policy-aligned workloads enter your cluster.
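
The order described above (mutate first, then validate, then persist) can be sketched in a few lines of Python. This is a toy model of the flow and not Kyverno or API server code: the names `admit`, `default_non_root`, and `deny_latest_tag` are hypothetical, and a plain list stands in for etcd.

```python
import copy
from typing import Any, Callable, Optional

Pod = dict[str, Any]


def default_non_root(pod: Pod) -> Pod:
    """Mutating step: set runAsNonRoot at pod level when nothing is specified."""
    sc = pod.setdefault("spec", {}).setdefault("securityContext", {})
    sc.setdefault("runAsNonRoot", True)
    return pod


def deny_latest_tag(pod: Pod) -> Optional[str]:
    """Validating step: return a denial message, or None when the pod passes."""
    for c in pod.get("spec", {}).get("containers", []):
        if c.get("image", "").endswith(":latest"):
            return f"container '{c.get('name')}' uses the latest tag"
    return None


def admit(
    pod: Pod,
    mutators: list[Callable[[Pod], Pod]],
    validators: list[Callable[[Pod], Optional[str]]],
    store: list[Pod],
) -> Optional[str]:
    """Mutate first, then validate; persist only when every validator passes."""
    candidate = copy.deepcopy(pod)  # leave the caller's object untouched
    for mutate in mutators:
        candidate = mutate(candidate)
    for validate in validators:
        message = validate(candidate)
        if message is not None:
            return message  # rejected: nothing is written to the store
    store.append(candidate)  # stands in for the write to etcd
    return None
```

Running validation on the mutated copy matters: a default added during mutation is visible to the validators, and a rejected request never reaches the store.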

All right. Now that we've seen where Kyverno comes into the picture in your Kubernetes flow, let's quickly take a look at cluster policies. I took one of the CIS check policies we have, the one that blocks privileged containers. Starting from the first line, some things that are important here: the API version of Kyverno that you're using, then the policy type, which is ClusterPolicy here. Another important one is the validation failure action. There are two modes: Enforce and Audit. Enforce is where it blocks. Audit still allows the resource, but when you look at the report you'll know there's an issue. We're going to look at that in practice.

When it comes to CEL policies, how does it evaluate? On the left side you can see we're checking object.spec.containers, and for each container whether it has a securityContext, whether that has a privileged field, and whether it's set to true or false. The right side is your pod config. Kyverno starts at object, which is the pod object, then spec.containers, then securityContext, then privileged. So we're checking has securityContext, has privileged, and is it set to true or false. Based on that, and on the mode, Enforce or Audit, it decides during the admission process whether to allow or deny this pod.
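
The evaluation just described (walk object.spec.containers, test whether securityContext and privileged are present, then compare the value, and finally apply the Enforce or Audit mode) can be mirrored in Python. This is a sketch of the logic only, not the policy from the slide: the helper `has` is a rough stand-in for the CEL `has()` macro, and the function names and messages are assumptions.

```python
from typing import Any


def has(obj: dict[str, Any], key: str) -> bool:
    """Rough stand-in for CEL's has() macro on a map field."""
    return key in obj


def no_privileged_containers(obj: dict[str, Any]) -> bool:
    """True when no container sets securityContext.privileged to true."""
    return all(
        not has(c, "securityContext")
        or not has(c["securityContext"], "privileged")
        or c["securityContext"]["privileged"] is False
        for c in obj["spec"]["containers"]
    )


def decide(obj: dict[str, Any], failure_action: str) -> tuple[bool, str]:
    """Return (allowed, note): 'Enforce' blocks, 'Audit' allows but records."""
    if no_privileged_containers(obj):
        return True, "pass"
    if failure_action == "Enforce":
        return False, "denied: privileged containers are not allowed"
    return True, "audit: violation recorded in the policy report"
```

The same failing pod gives two different outcomes depending on the mode: it is rejected under Enforce and admitted with a recorded violation under Audit.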

For the complete demo that we're doing, you can scan this QR code for the open source repository. It has all of the CIS checks automated, all 64 checks we've been talking about: runtime as well as plan time, and the node scanner. We're going to walk through it. All right, let's go on. I tried recording the session as well, but let me walk you through it.

All right, so this is the open source project I was talking about. Let me quickly walk you through the repository layout. Starting with the K8s directory: this is where I'm deploying the node scanner, which runs as a DaemonSet to scan for node file permissions and so on. Then we have OpenTofu here. When you spin up a Kubernetes cluster you're generally using Terraform or OpenTofu. I'm using OpenTofu, and I'm purposefully deploying some non-compliant resources here, for example no CloudWatch logs, no EKS add-ons, no network policies, so we can see how Kyverno identifies these violations. I put in both compliant and non-compliant configurations to try it out.

And then we have policies. This is the important part. These are all the policies we were talking about, of different types: control plane, pod security, RBAC controls. Each of them is a policy, and they're all tested and production ready. Like I said, they all use CEL expressions to validate the checks, which you're going to see running in a minute. Then you have the OpenTofu policies. For these you generate a plan, convert it to JSON, and verify it. We're going to look at exactly what the policy checks in a minute. Then you have the scripts. I'm using these scripts to test on the kind cluster, and all of them are integrated into the pipeline. I've also created unit tests that I run whenever I make changes. All the tests have compliant and non-compliant cases, so you can integrate them into your pipelines for unit testing or integration testing.

So let's start. I cloned this repository before the session to save time, and I'm starting with the practical demo. [clears throat] The first thing I'm doing is creating a kind cluster from scratch. I already created it about 5 minutes before this session. So now you have a Kubernetes cluster that is up and running. Let's check if there are any cluster policies. There are zero cluster policies right now. What we're going to do is create a namespace and then try to deploy a non-compliant pod, which has root permissions. Okay, it's still taking time. We're going to deploy this non-compliant pod, then install Kyverno, and then see whether it blocks or identifies it.

So if you see here, there were no policies set up, and your pod got deployed without any issues because there are no controls in place. Which means you have an issue sitting in production. Now for phase two we're going to install Kyverno. I'm currently using version 1.15.2. If you look at this, like we spoke about, it installs the Kyverno admission controller, the background controller, the cleanup controller, and the reports controller, as you can see. The admission controller is responsible for your mutations and validations, and apart from that you can also have image verification and scanning done, such as controlling where you can deploy images from. I'm just waiting for the Kyverno pods to come up. Once they're up, since we already have a non-compliant pod, we're going to check it. I already have background processing set to true, so it also checks existing workloads for issues. It doesn't block the pod, because it's already running. It puts it in the reports.

Here I'm applying all of the policies, all 49 CIS checks from the repository. Let's give it some time to load them. These policies, like we saw, are part of the admission controller, and Kyverno uses them to validate. Okay, it's going to take some time. All right, now you can see all of those policies are deployed. Once your policies are deployed, I'm generating the reports. Here you can see there are a lot of failures. Let's take a look at one of the namespaces. This is the non-compliant pod that we deployed, and you can see a lot of checks failed. Some of them passed, some of them failed. We could use a describe command to look at all of those policies.
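
The background behaviour shown here, where an already-running pod is not blocked but shows up in the reports with passed and failed checks, can be modelled as follows. This is a simplified sketch and not Kyverno's report format: the functions `background_scan` and `summarize` and the shape of the report entries are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Callable

Check = Callable[[dict[str, Any]], bool]


def background_scan(
    running: list[dict[str, Any]], checks: dict[str, Check]
) -> list[dict[str, str]]:
    """Evaluate every check against every running pod; nothing is blocked or deleted."""
    report = []
    for pod in running:
        for policy, check in checks.items():
            report.append({
                "pod": pod["metadata"]["name"],
                "policy": policy,
                "result": "pass" if check(pod) else "fail",
            })
    return report


def summarize(report: list[dict[str, str]]) -> Counter:
    """Count pass and fail results, similar in spirit to a report summary."""
    return Counter(entry["result"] for entry in report)
```

The scan only reads the running workloads and records results, which is why the non-compliant pod keeps running until someone deletes or fixes it.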

What we're going to do now is delete the non-compliant pod and try deploying it again. Previously it was allowed, and now when we try to run it, it should be blocked. Yeah, you can see there's an error. During the admission process it tells you what failed. You have a number like 5.2, which is the CIS check control number, and a message like "denied: privileged containers are not allowed", so you can see why it failed. These are all part of the validation messages the user gets if something is not compliant. It's blocking. I'm also deploying a compliant pod, which shows you what happens in that case. So in phase one we deployed non-secure pods, then we put Kyverno in place, applied the policies, and looked at whether it's blocking or not. That's how you verify your checks at deployment time.

Now, moving on to the next phase, we're going to look at plan-time tests. For this I have my OpenTofu directory. What we generally do is run tofu init, which downloads your modules, and then tofu plan, which generates your plan file. I'm then converting this to a JSON file and passing this JSON to Kyverno, saying: hey, check whether this cluster I'm creating is compliant or not. You can see that the plan is ready here. I'm deploying a non-compliant configuration, and we're going to check what happens. Okay. All right.

It looks like I'm deploying both compliant and non-compliant resources, so it's creating 18 resources. All right, let's validate that. The only command I'm running here is the Kyverno JSON scan, with the policies for OpenTofu, and I'm passing it the tofu plan. Terraform or OpenTofu only gives you a TF plan. Just convert that to JSON and then you can use the Kyverno CLI to scan your JSON file. Let's quickly take a look. You can see that the plan says CIS 5.1 failed for the IAM authenticator, and there are a lot of checks failing. Let's take a look at how we're comparing those. Let me go back to the repository.

If you look at the OpenTofu plan, the non-compliant one, this is your plan. Let's take one of the examples from the Terraform policies to see how we're validating. Let me just pretty-print it. Is it Command-Shift-P? So when you look at the plan, you have the values, and the policy checks them. Okay, here's the non-compliant one. The way we check is that within the plan, the policy looks at planned_values, goes to the root_module, and then the resources within it. For the EKS add-on, for example, it checks whether the VPC is set and with what values. Similarly, for the private cluster we were looking at, it checks whether the endpoint is enabled, whether it's false or true. That's how it validates your Kubernetes infrastructure and tells you whether it's compliant or not. So that's what we're doing with Kyverno.
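
The traversal described above (planned_values, then root_module, then its resources) can be sketched in Python against the JSON form of a plan. This is an illustration of the idea, not one of the repository's policies: the function names are hypothetical, and the two attributes checked (`endpoint_public_access` inside `vpc_config`, and `enabled_cluster_log_types`) are assumed to be the relevant fields of an `aws_eks_cluster` resource in the plan.

```python
from typing import Any, Iterator


def iter_resources(module: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield resources from a module and, recursively, from its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def check_eks_plan(plan: dict[str, Any]) -> list[str]:
    """Return findings for EKS clusters found under planned_values.root_module."""
    findings: list[str] = []
    root = plan.get("planned_values", {}).get("root_module", {})
    for res in iter_resources(root):
        if res.get("type") != "aws_eks_cluster":
            continue
        values = res.get("values", {})
        for vpc in values.get("vpc_config") or []:
            # Treated as public unless the plan explicitly says false (an assumption of this sketch).
            if vpc.get("endpoint_public_access", True):
                findings.append(f"{res['address']}: API endpoint is publicly accessible")
        if "audit" not in (values.get("enabled_cluster_log_types") or []):
            findings.append(f"{res['address']}: audit logging is not enabled")
    return findings
```

The input would be the plan converted to JSON, for example by loading the output of `tofu show -json` on a saved plan file with `json.load`. An empty findings list means the plan passed these two checks.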

All right. So that's the plan-time test. That's how you can validate at plan time using OpenTofu and Kyverno. Now, moving on to the last one, which is runtime. We have a node scanner that I'm deploying as a DaemonSet within your cluster, which checks all of your node-level controls. Okay, it's going to take some time. While we're waiting: like I said, we have the plan-time tests, then the tests with Kyverno at admission, and the last one is the node scanner. Kyverno cannot validate the file permissions and similar settings on the node, so we're using a node scanner, which is a script that deploys a DaemonSet. I'm just waiting for it to come up.
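
The kind of node-level check that an admission controller cannot perform is a file permission test on the host. A minimal sketch of one follows, assuming the check is "this file must be no more permissive than a given mode". It is not the repository's node scanner; `mode_within` and `file_within` are hypothetical names, and which files and limits to test would come from the benchmark itself.

```python
import os
import stat


def mode_within(actual: int, max_mode: int) -> bool:
    """True when 'actual' grants no read/write/execute bit that 'max_mode' lacks."""
    # Only the nine rwx bits are compared; setuid, setgid, and sticky bits are ignored here.
    return (actual & ~max_mode & 0o777) == 0


def file_within(path: str, max_mode: int) -> bool:
    """Check a file on disk against the most permissive mode allowed."""
    return mode_within(stat.S_IMODE(os.stat(path).st_mode), max_mode)
```

For example, a file with mode 600 passes a limit of 644, while a file with mode 666 fails it because it adds write permission for group and others.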

So with that, you are performing everything from scratch. We created a kind cluster, deployed a non-compliant pod, installed Kyverno, verified whether it's blocking or not, and deployed all of our policies, the complete CIS check policies. With that you have a compliant cluster, end to end, from scratch. Now, how can we integrate all of this into a pipeline? I created a GitHub Actions pipeline, and whatever we just saw manually, one by one, I've integrated into GitHub Actions. I ran it yesterday, I believe. All of the tests we looked at are integrated within the pipeline: installing the CLI, running the unit tests, and so on. Similarly for the plan test, I'm running OpenTofu, converting to JSON, and using Kyverno to validate. Then you have the kind cluster integration I've done to check all of my CIS policies. And in order to validate that my checks are working correctly, I have also integrated Trivy, using the Trivy scanner to check the compliance report. There are some failures that it shows, but that's how you can do it end to end, with all of the reports. So yeah, that's it for the cluster. Any questions?

>> Yeah. It looks like you're using OpenTofu with the policies, but then you're using Kyverno to scan the policies and apply them against the TF plan,

>> right?

>> But Kyverno itself can write the policies. Why not just have policies in term that

>> Right. When you talk about policies, one kind is the CIS check policies, which you can write in Kyverno. These ones are based on the Terraform plan that you're generating, but even with OpenTofu I'm writing a Kyverno validation policy. Let's quickly go through one of these checks. Just to make it clear, I'm not writing policies with OpenTofu. OpenTofu is only used to generate the plan, and then I'm writing those checks as Kyverno policies. If you look at this repository, come on, yeah, let's look at the OpenTofu folder. I'm writing a Kyverno validation policy.

>> Oh, you're not. Sorry.

So if you look at the policies, you have OpenTofu, and let's quickly go to networking. All of these are again Kyverno policies. You can see it's using the Kyverno API, a v1alpha version, with kind ValidatingPolicy. OpenTofu cannot generate these. I'm writing them myself. I'm only using OpenTofu to plan.

>> So those are just policies.

>> Right.

>> And then OpenTofu is what?

>> OpenTofu is more like Terraform, an infrastructure-as-code tool that is used to provision your cluster.

>> It just gives you a

>> Yes, it just gives you the TF plan.

>> So if I took the TF plan and ran the scan against it, I don't really need OpenTofu.

>> Yes, you don't need OpenTofu. If you already have the plan output, then you don't need it.

>> Yeah.

>> Please share your feedback. You can scan the QR code and give me feedback on that. Any questions? All right. Thank you.

>> [applause]


CNCF [Cloud Native Computing Foundation]

1 day ago

25:13

DevSecOps & Security

Rank #1

Description

Shift Left in Action: Automating CIS Compliance With Kyverno and CNCF Power Tools - Yugandhar Suthari, Cisco

True shift-left security means catching compliance issues before they reach production. This session shows how to build a CNCF-native CIS compliance automation framework using Kyverno as the policy orchestrator, with kube-bench and OpenTofu for full-spectrum validation, from infrastructure plans to running clusters. Learn how CEL-based ValidatingPolicies enable unified plan-time and runtime checks, how to integrate OpenTofu for automated IaC scanning, and how kube-bench closes the loop with node-level audits. Live demos will showcase how this framework enforces CIS benchmarks across EKS, AKS, and GKE, proving that early, automated security validation truly scales.

Video Details


Featured Date

November 25, 2025
