In this video, we’ll build a simple but secure CI/CD pipeline for your
applications deployed in AWS EKS from scratch.
Let’s say we have an application repository. When we make any source code change,
it will trigger a GitHub Actions pipeline.
This pipeline will first increment your Git tag—let’s say from 0.1.0 to 0.1.1—and
use that tag to build and publish a Docker image to an ECR repository.
On the EKS side, we will deploy ArgoCD, which will monitor for any new images.
Once it detects one, it will perform a rolling upgrade on your application and carefully roll out
a new version while ensuring that each instance is healthy before proceeding to the next pod.
To authenticate GitHub Actions with AWS, we’ll use an OpenID Connect provider and dynamically
obtain temporary credentials—so no more static secrets stored anywhere on the GitHub side.
For ArgoCD to authenticate with ECR, we’ll use EKS Pod Identity to link
the ArgoCD Kubernetes service account with an AWS IAM role.
Now, to authenticate GitHub Actions to publish our Docker images to ECR, we only need to specify the
AWS region and the IAM role in our AWS account—no need to use any environment variables anymore.
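In the workflow, that boils down to granting the job an OIDC token and calling the official aws-actions/configure-aws-credentials action. A minimal sketch, with the role ARN and region as placeholders:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # allow the job to request an OIDC token from GitHub
      contents: write   # allow the job to push a new Git tag
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/github-actions  # placeholder account/role
          aws-region: us-east-1                                          # placeholder region
```

The action exchanges GitHub’s OIDC token for short-lived STS credentials, so nothing sensitive has to be stored in repository secrets.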
On the AWS side, we’ll create an OpenID Connect provider with
the GitHub URL to establish trust and allow GitHub to authenticate with AWS.
At this point, potentially any GitHub user or repository can assume our role.
To fix this, first we need to create a regular IAM policy that defines permissions.
This policy would allow pushing images to only specific ECR repositories in our account.
And finally, we need to create an IAM role with a trust policy that would
only allow specific organizations or specific repositories on GitHub to assume this role.
In this example, we define the principal,
which is an entity that can use that role, and it’s going to be GitHub.
Next, we limit access to this role to only specific repositories.
We can use wildcards; we can whitelist entire organizations;
or we can just limit it to specific repositories.
In this case, only two of my repositories would be able to assume this role: frontend and backend.
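In Terraform, the trust side of this looks roughly like the following sketch; the organization and repository names, role name, and thumbprint are placeholders, and the actual code in the repo may differ:

```hcl
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["<github-oidc-thumbprint>"] # may be required depending on provider version
}

data "aws_iam_policy_document" "github_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only these repositories (any branch) are allowed to assume the role.
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values = [
        "repo:my-org/frontend:*",
        "repo:my-org/backend:*",
      ]
    }
  }
}

resource "aws_iam_role" "github_actions" {
  name               = "github-actions-ecr"
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}
```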
Now, how does ArgoCD actually work? Let’s say
we have a Kubernetes cluster with ArgoCD and Image Updater deployed.
Then, we have an application repository and a Kubernetes repository where we keep all our
Kubernetes deployment files, such as deployments, services, ingresses, etc. And finally, we have ECR. So,
when you initially deploy ArgoCD to your cluster and point it to the Kubernetes repository, it will
start synchronizing and applying all missing Kubernetes resources, such as our application.
The default interval is 3 minutes, but you can adjust it—just keep in mind that
GitHub might throttle your requests if you set it to a very low value.
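If you deploy ArgoCD with its Helm chart, the interval is controlled by the timeout.reconciliation setting in argocd-cm; a minimal values sketch, assuming the chart’s configs.cm convention:

```yaml
# ArgoCD Helm values (sketch): how often ArgoCD polls Git for changes
configs:
  cm:
    timeout.reconciliation: 180s   # default is 3 minutes; very low values risk GitHub throttling
```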
Next, if we want to release and redeploy a new version of our application,
we’ll make changes in the source code, commit them, and push to the remote application repo.
GitHub Actions will then release a new version,
build the Docker image, and push it to the ECR repository.
Meanwhile, in the background, we’ve configured the ArgoCD Image Updater to periodically
check for new image tags in ECR. The default poll interval is about 2 minutes.
So, when the Image Updater detects a new image, it will commit a new file with the updated tag to the Kubernetes repository.
Then, ArgoCD will detect a drift between the Kubernetes state and the Git repo,
apply the changes, and release the new version.
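The Image Updater is driven by annotations on the ArgoCD Application resource. A rough sketch, where the image alias, registry address, repository, and paths are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
  annotations:
    # Which image(s) to watch in ECR (alias=repository); account, region, and repo are placeholders
    argocd-image-updater.argoproj.io/image-list: myapp=111111111111.dkr.ecr.us-east-1.amazonaws.com/app-lesson-268
    # Pick the newest semver tag, e.g. v0.1.1 -> v0.1.2
    argocd-image-updater.argoproj.io/myapp.update-strategy: semver
    # Commit the new tag back to the Git repository instead of only patching the live Application
    argocd-image-updater.argoproj.io/write-back-method: git
spec:
  project: default
  source:
    repoURL: git@github.com:my-org/k8s-lesson-268.git   # placeholder Kubernetes repo
    path: envs/dev/my-app                               # placeholder path
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
```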
In the worst case, it will take up to 5 minutes to redeploy a new version of the application.
Let me run a quick demo: when we create a new commit or just merge a pull request to the main
branch in the application Git repository, GitHub Actions will pick up this change,
build and publish a new Docker image to the ECR repository. On the other hand, ArgoCD will
detect that image, create a new commit, and perform a rolling upgrade of your application
in about 3 minutes on average, up to 5. If you want to learn more about Ingress,
Let's Encrypt, integration with AWS secrets, and many other common use cases,
I have a full course here on YouTube. Also, if you need any help with your infrastructure,
you can always reach out to me, and you can find more information in the video description.
Alright, let’s get started. First, you need to clone my public repository—link in the video
description as well. Here, you’ll find mostly Terraform code to provision all the necessary
infrastructure for our pipeline. First of all, we need to create an S3 bucket that
we’ll use to store the Terraform state. In the global folder, you’ll need to change a few variables.
First, update the AWS region. Then, pick an S3 bucket name that is globally unique across all of AWS;
a common pattern is to use your domain as a prefix. Next, list all the GitHub repositories that need
access to ECR: your username or organization name, followed by the repository name exactly as it
appears on GitHub. For example, app-lesson-268 will contain the application source code.
Finally, update the same parameters in the
S3 backend partial configuration. First, we’ll create the S3 bucket itself; note that with recent
Terraform versions we no longer need a DynamoDB table to lock the Terraform state, since S3 supports
native lock files. For now, keep the backend configuration commented out, since the S3 bucket doesn’t exist yet.
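For reference, the partial backend block plus the state.config file end up looking something like this; the bucket name and key are placeholders, and use_lockfile assumes a Terraform version with S3-native locking (1.10+):

```hcl
# backend.tf (sketch): partial configuration; bucket and region come from state.config
terraform {
  backend "s3" {
    key = "global/s3/terraform.tfstate" # placeholder; each component uses its own key
  }
}
```

```
# state.config (sketch)
bucket       = "example-com-terraform-state" # placeholder; must be globally unique
region       = "us-east-1"                   # placeholder
use_lockfile = true                          # S3-native state locking, no DynamoDB table
```

You then pass the file at init time with terraform init -backend-config=state.config.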
In the IAM folder, we’ll create an OpenID Connect provider to link with GitHub Actions,
plus an IAM policy and role that grant access only to our ECR repositories.
In the policy, list only the ECR repositories you actually want to grant access to,
and in the IAM role, list only the GitHub repositories that need access to ECR.
And finally, we need to create ECR repositories, and optionally, you can create a policy to
remove old Docker images. Alright, let’s go ahead,
switch to the S3 bucket folder, and create the S3 bucket. First, initialize Terraform.
And then apply and supply global variables. By the way,
all the commands are available in the README file. It’s done. Now we have an S3 bucket. And for now,
this bucket is empty, as you would expect, since we used the default local Terraform backend.
The Terraform state is stored locally as a JSON file, which is fine if you’re testing
something, but it makes working in a team nearly impossible, because everyone who runs
Terraform needs access to the same state. What we want to do is upload that file to the S3 bucket.
Uncomment the backend section, and as you can see, we have empty region and bucket values,
which will be derived from the state.config file. Let’s go ahead and migrate the local
state to S3 by re-running init with the migrate flag, as shown below. Afterwards, in the S3 bucket,
you’ll find the same folder structure and Terraform state that we have locally in our project.
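The command in question is just an init with the migrate flag (file name per the repo’s state.config convention):

```bash
# Re-initialize with the S3 backend and move the existing local state into the bucket
terraform init -backend-config=state.config -migrate-state
```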
Next step: to create the necessary IAM components, let’s switch to that folder
and initialize the state as well. After it’s done, apply Terraform.
Now we have an OpenID Connect provider,
an IAM policy with only access to selected ECR repos,
and finally, an IAM role which only GitHub Actions
and our own repositories can assume. I’m allowing all branches to use this role,
but you can restrict it even more. And the principal is GitHub Actions.
The last step here is to create ECR repositories. Switch to ECR, initialize the state, and apply.
I keep these components separate just to simulate a real-world project, where you would typically
have a higher-level wrapper script or Terragrunt to apply all of your infrastructure in one step.
So now we have 2 ECR repositories. In this tutorial, I’ll be using only the first one.
And in the Terraform S3 bucket, we have 3 components so far: S3, IAM, and ECR.
Alright, now we’ll switch to the application repository, which you’ll find in the main repo
under this tutorial; it’s called app-lesson-268. And this is the source code for the application.
I have a version.txt file just for the demo to show the rolling upgrade—we’ll
modify this file from the CI/CD pipeline. And a simple Go application with two endpoints,
one of which reads the version file and returns it to the client. It’s a very simple application.
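The real source lives in app-lesson-268; a minimal sketch of an app shaped like that, with the route paths being my assumptions, could look like this:

```go
package main

import (
	"log"
	"net/http"
	"os"
)

func main() {
	// Simple health endpoint.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	// Version endpoint: returns the contents of version.txt, which the CI/CD pipeline updates.
	http.HandleFunc("/version", func(w http.ResponseWriter, r *http.Request) {
		v, err := os.ReadFile("version.txt")
		if err != nil {
			http.Error(w, "version not found", http.StatusInternalServerError)
			return
		}
		w.Write(v)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```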
Then we have a multistage Dockerfile. And finally, the GitHub Actions pipeline.
You only need to change the region and the ARN of the role with your account ID.
Now we need the id-token: write permission for the OpenID Connect provider and
contents: write to create a new tag on GitHub. We also have a "release a new version" step,
which takes the current Git tag and increments it based on your policy—either major, minor,
or patch version. Then we write this version to the txt file, and the final
step is to build and upload this image to the ECR repo. It’s a very simple pipeline,
which you can adjust based on your needs. And we have a script to fetch the Git tag
and increment the version. You can run it with 3 arguments,
possibly in different GitHub Actions workflows. And at the end of the script, we write that new
tag to the GitHub environment variable to make it available to other steps in the workflow.
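The actual script ships with the repo; a minimal sketch of the same idea (argument handling and variable names here are my assumptions) might be:

```bash
#!/usr/bin/env bash
# Sketch: bump the latest Git tag (major|minor|patch) and expose the result to later steps.
set -euo pipefail

bump="${1:-patch}"
latest="$(git describe --tags --abbrev=0 2>/dev/null || echo v0.1.0)"  # no tags yet -> start at v0.1.0
IFS='.' read -r major minor patch <<< "${latest#v}"

case "$bump" in
  major) major=$((major + 1)); minor=0; patch=0 ;;
  minor) minor=$((minor + 1)); patch=0 ;;
  patch) patch=$((patch + 1)) ;;
esac

new_tag="v${major}.${minor}.${patch}"
git tag "$new_tag"
git push origin "$new_tag"

# Make the new tag available to subsequent steps in the workflow.
echo "NEW_TAG=${new_tag}" >> "$GITHUB_ENV"
```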
Alright, so this is the application repository already on GitHub with no Git tags so far.
Now we just want to test and verify that our GitHub Actions pipeline
works and uploads to ECR. Let’s create an empty commit and push it to GitHub.
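An empty commit is enough to exercise the pipeline:

```bash
# Trigger the workflow without changing any source files
git commit --allow-empty -m "test ci pipeline"
git push origin main
```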
So you can see it’s already running; you can find it under the Actions tab as well.
The important step is configuring AWS credentials, where GitHub Actions tries to assume the role. If
you’ve misconfigured something, it will get stuck on this step with some kind of error. But in
our case, it was able to assume the role and is already building a Docker image. There is also
a new tag, so if the repo does not have a tag, this script starts with v0.1.0 and increments it.
Let’s wait until the actions finish the build and upload the Docker image. Okay,
looks good from the GitHub Actions. We have a new tag v0.1.1, and we should
have exactly the same tag in the ECR. So at this point, we verified that we can build,
release a new version, and upload the Docker image. Next, let’s create the VPC and EKS.
I’m going to switch to the original repo with Terraform, and under dev,
you can find the VPC folder. Don’t forget to update the region
and change the S3 bucket in the tfvars file, as well as to update the state.config file.
Now, under VPC, it’s a typical setup: a static Elastic IP address for the NAT gateway,
plus two private and two public subnets with some tags specific to the EKS cluster.
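Those tags are the standard subnet discovery hints for the AWS Load Balancer Controller; a sketch with placeholder CIDRs, zones, and cluster name:

```hcl
resource "aws_subnet" "public_a" {
  vpc_id            = aws_vpc.main.id   # placeholder VPC reference
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"

  tags = {
    "kubernetes.io/role/elb"         = "1"      # internet-facing load balancers may use this subnet
    "kubernetes.io/cluster/dev-demo" = "owned"  # placeholder cluster name
  }
}

resource "aws_subnet" "private_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.32.0/19"
  availability_zone = "us-east-1a"

  tags = {
    "kubernetes.io/role/internal-elb" = "1"      # internal load balancers may use this subnet
    "kubernetes.io/cluster/dev-demo"  = "owned"
  }
}
```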
Also, if you use a different region, don’t forget to replace the availability zones here
as well. And we have the partial backend config with a different key. By the way, Terragrunt
automatically generates the backend and follows the folder structure for the Terraform state. If
you like automation, take a look at Terragrunt. Let’s switch to the VPC folder
and initialize the state. And then apply it. This time,
we need to provide two variable files: one common to the entire dev environment, and a second
specific to the VPC component. Alright, it’s done. Let’s take a look at the VPC components in AWS:
we have the VPC itself, four subnets, three route tables, an internet gateway, an Elastic IP address,
and finally the NAT gateway. That’s all for the VPC; next is EKS.
Under EKS, you’ll find the bare minimum for the tutorial: a policy for the load balancer
controller, ArgoCD values to run ArgoCD without TLS (which we never need inside Kubernetes),
and the most important config is for the image updater. You need to
carefully review and replace all the values, such as the region and your account number. If your
CI/CD pipeline doesn’t work, it’s most likely because of this config. And we use Pod Identity to
allow the image updater to assume the role. Next, we have the ArgoCD Helm deployment. In
the data.tf file, we dynamically retrieve VPC output variables, which is a common
pattern to split a large Terraform state into smaller, more manageable pieces.
Then we have the EKS cluster itself. Don’t use anything smaller than a large instance type for
now, or you might not be able to schedule all the pods. Next is the Image Updater with its IAM
policy and role; as you can see, we use Pod Identity to bind its Kubernetes service account
to the IAM role in AWS, roughly as in the sketch below.
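A Pod Identity association in Terraform is a single resource; the namespace, service account, and resource references here are assumptions based on the chart defaults:

```hcl
# Bind the Image Updater's Kubernetes service account to the IAM role via EKS Pod Identity.
resource "aws_eks_pod_identity_association" "image_updater" {
  cluster_name    = aws_eks_cluster.this.name        # placeholder cluster reference
  namespace       = "argocd"                         # assumed ArgoCD namespace
  service_account = "argocd-image-updater"           # assumed service account name (chart default)
  role_arn        = aws_iam_role.image_updater.arn   # role with ECR read permissions
}
```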
The load balancer controller is kind of optional, but I need it for the demo. Then we have the nodes,
the Pod Identity add-on, and that’s pretty much it. Let’s switch to the EKS folder and initialize Terraform.
Next, let’s apply it, and this time we have dev and EKS variables that we supply
to Terraform. It takes maybe up to 20 minutes to create a cluster and deploy all components.
When the cluster is ready, we need to update our local Kubernetes config.
And we can check that we can access EKS. So far, so good. If you need to debug,
most of the time you’ll want to tail the controller and Image Updater logs and look for permission errors.
Let me open a new tab and get an admin secret that was automatically generated
when we deployed ArgoCD with Helm. And we can port forward to access the ArgoCD UI.
It’s going to be available on localhost:8080,
with the username "admin" and the password from the Kubernetes secret.
Alright, so far we don’t have any applications deployed using ArgoCD yet.
Now let me switch to another repository, which is called k8s-lesson-268; it contains only
Kubernetes deployment files. With ArgoCD, the Image Updater only works if your application is
deployed from a Helm chart or with Kustomize; with FluxCD, for example, you can use plain YAML files. I
personally don’t like to create Helm charts for each application unless it’s generic and can
work with many of your applications. I prefer Kustomize, which in my opinion is simpler and
still provides enough flexibility to adjust or modify your deployment in different environments.
So this base folder will be shared between all your environments. In the deployment,
don’t set a namespace, and make sure you update the ECR image. Then we have a kustomization file
listing all the Kubernetes resources that we want to deploy. Don’t worry about the namespace here;
we’ll override it in each environment. And finally, a Service of type LoadBalancer: we’re using IP mode,
and it’s internet-facing so I can run a demo. Now, under the envs folder, we have similar
environments: prod, which is empty for now, and dev. Under dev,
we have my-app with custom values that we can override per environment, for example,
the namespace, replica count, and maybe ingress hosts. We can also specify the initial version
that we want to deploy. So far, we have only the v0.1.1 image tag. And finally,
the reference to the base folder. For now, we’ll use the App of Apps pattern,
and in the next video, we’ll convert this to ApplicationSet and discuss the benefits of doing
so. So in this folder, you would place ArgoCD application resources for each app you want to
deploy. Don’t forget to update the image name and GitHub repository, and that’s all we need.
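Putting the dev overlay together, the kustomization typically ends up shaped like this; the namespace, registry address, and relative path are placeholders:

```yaml
# envs/dev/my-app/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: my-app            # overrides the namespace for everything in the base
resources:
  - ../../../base            # shared base: deployment, service, etc.

images:
  - name: 111111111111.dkr.ecr.us-east-1.amazonaws.com/app-lesson-268  # placeholder image
    newTag: v0.1.1           # initial version; the Image Updater later commits newer tags
```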
So it’s already on GitHub. But now we need to somehow grant ArgoCD running in our EKS
cluster access to clone this repo, and the most common approach is to use deploy keys.
Let’s go ahead and generate a key pair locally for ArgoCD, and keep the passphrase empty.
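Something along these lines works; the key file name is arbitrary:

```bash
# Generate a dedicated ed25519 key pair for ArgoCD with an empty passphrase
ssh-keygen -t ed25519 -C "argocd-deploy-key" -f ./argocd -N ""
# ./argocd      -> private key (goes into the Kubernetes secret)
# ./argocd.pub  -> public key  (added to GitHub as a deploy key)
```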
Alright, first get the private key. And we need to create a Kubernetes secret with
this key. It’s in the main repository with Terraform under the k8s folder.
Just replace this key. By the way, I have a full ArgoCD course on YouTube as well,
where I show how to seal these secrets. And replace the Git URL, which is very important,
but the secret name can be an arbitrary string. So this is on the EKS side;
now we need to get a public key and upload it to GitHub. It’s under settings, deploy keys.
Let’s call it ArgoCD and paste the key. We also need write access, since ArgoCD will make a new
commit for each new image tag. You can opt out of that and use a Kubernetes secret instead,
but in my opinion this approach is better, especially if you ever need to quickly restore your
EKS cluster from scratch: all your current versions are committed to Git, whereas with a secret
they would be lost. Now we can start deploying applications to EKS:
in the main repo, you’ll find the apps application that would apply all your applications in the dev
environment. You only need to replace the Git URL. The first step is to create a secret. Then apply
the apps application. And in a few seconds, you’ll see that ArgoCD will start to deploy applications.
So now we have the v0.1.1 version deployed. In EKS, you’ll find that your applications
should be running in the custom namespace that you set in the Kustomize file.
Now for the demo, I need the public load balancer to be ready; let’s make sure it’s
in active state first. It usually takes a few minutes for the load balancer to be ready. Also,
let’s take a look at the target group. Since we’re using IP mode, the target group will consist of
Kubernetes pods' IP addresses. Now let me check if the DNS for the load balancer is ready as well.
Yep, I can resolve it to the public IP address. Now I can use curl to hit my application’s version
endpoint. Alright, so far we get the same v0.1.1 version. Let me run the curl in a loop so
you can watch the application get upgraded.
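A simple loop like this does the trick; the hostname is the load balancer’s DNS name and the endpoint path follows my earlier sketch, so adjust both to your app:

```bash
# Poll the version endpoint once per second to watch the rollout happen
while true; do
  curl -s http://<load-balancer-dns-name>/version; echo
  sleep 1
done
```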
Now let’s switch to the application repo and create another empty commit to trigger the CI/CD pipeline. Like I mentioned before,
it takes up to 2 minutes to detect a new image and up to 3 minutes to sync the state with GitHub.
Now while we wait, let’s see if we have a new version in Git first. So GitHub Actions
is running, which is a good sign, and it’s almost finished already since it’s a tiny image. In Git,
we have a new tag v0.1.2. And let’s see if we have it in ECR as well.
If you don’t want to wait, you can just click refresh and it will sync immediately,
but I’ll wait instead. So in about 3 minutes, ArgoCD deployed a new version, and I can
confirm it from the application endpoint. When you use the Git write-back method,
ArgoCD creates a new commit in Git after it detects a new image, and this is the file it writes;
you can find it under dev, in your application’s folder. Now, if for some reason it didn’t work,
you may want to tail the controller logs to see if there are any problems with your pipeline.
Also, you can check the logs for the image updater, which is responsible for monitoring
ECR repositories, and if you misconfigured permissions, you’ll find errors in those logs.
Create a Secure Kubernetes CI/CD Pipeline. Original source code: https://github.com/antonputra/tutorials/tree/268/lessons/268