So I'm Divyanshu. I hope you are having a good time so far with the sessions. Today I'll be talking about what Terraform looks like when you scale it. When you join a company as a DevOps or SRE engineer, it's not that you get one module to manage, or that you write a few lines of code the way we do while learning. That's not how things work.
Usually it's a huge infrastructure. When you get in, you get your credentials and your access, and then you have to take care of the entire infra: what it looks like, where the modules are, how many teams are involved. So what we have learned can sound very easy, but in production things look trickier and different. Hence my topic is about how to manage, I would say not exactly 100-plus, but many modules as part of a team where you work as a DevOps engineer or an SRE.
I have prepared the key pointers I'll cover in this quick session. I'll start with the problem, the context for what I'll be discussing. Then we'll discuss scale and complexity, what our architecture approach will look like, and the best practices we can follow. Governance, of course, is one of the key topics here. Apart from that, we'll discuss the typical tools and automation we can use while doing this. Okay.
So I'll start with the slide. In layman's terms, Terraform is a tool that helps us manage and deploy resources on a cloud platform or any other platform, so that what we were doing manually we now do via code. People say it's a kind of automation, that it reduces human error, that it helps us scale because we need not write everything again and again or do everything through ClickOps, which is prone to human error. But as the amount of code grows, and if you are a developer you will have seen this too, things may not go the way you want. So how you manage it matters. Not only the technical part is important, but also how you keep track of how things should work and move.
Okay. In a very small-scale setup, say a team of a few people with a small infrastructure, or a client that does not have a huge Terraform infrastructure, it's quite easy to manage. There are a few modules, a few environments and a few resources, and you deploy them consistently, manage them, or shuffle them as and when required.
However, there are cases where you work for a company on team XYZ, and other teams are managing the same infrastructure together with you. They are named differently: these people are from platform, these are from SRE, these are from automation or do the scripting, and so on. When multiple people across different teams are involved, the module count of course grows, and many problems come along with it. I would not call them problems in the first place, but a lot of complexity comes along.

There are a few challenges everyone faces, and you might have faced them already. One is duplication of modules. Team A has built a module for a VPC or a VNet, and team B is creating one too. The purpose may be different, or the two VPCs may be used differently, but this can create issues as you manage it going forward, because when I make a change to mine, that does not mean the other team makes the same change to theirs.

Then there is the naming convention. Say we are managing customer ABC. There should be a proper naming convention that we follow, rather than anyone and everyone pushing code. Of course the code will go live and there won't be any issues at first. But as people join and people are replaced, things may not go the way they should. You will find inconsistent standards, updates will be slow, and you may see different versioning. Suppose my team has decided to go ahead with version 1.2.24 for the Terraform code we manage across all our repos, and we did it. That does not mean everyone on the other teams managing the same infrastructure has done so. So we need to take care of that as well.

Whatever we do for the benefit of the customer or of our team should go smoothly and with clear ownership of who does what. Even if you are not the one doing the work, you need to make sure you are informing the people who should be informed. That way the customer is aware of what is happening around the infrastructure, and the other people who may have to take proactive steps can do so.
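The version mismatch between teams described above is easy to check mechanically. The following is a minimal sketch in Python, not something shown in the talk: the repository names and the inventory dictionary are hypothetical, and in practice the pinned versions would probably be read from each repo's configuration rather than hard-coded.

```python
from collections import defaultdict

# Hypothetical inventory: repository name -> Terraform version it pins.
# The names and version strings are illustrative only.
PINNED_VERSIONS = {
    "team-a/network": "1.2.24",
    "team-a/compute": "1.2.24",
    "team-b/network": "1.1.9",
}


def find_version_mismatches(pins: dict[str, str]) -> dict[str, list[str]]:
    """Group repositories by pinned version; return {} when all agree."""
    by_version: dict[str, list[str]] = defaultdict(list)
    for repo, version in pins.items():
        by_version[version].append(repo)
    if len(by_version) <= 1:
        return {}
    return {version: sorted(repos) for version, repos in by_version.items()}


if __name__ == "__main__":
    for version, repos in find_version_mismatches(PINNED_VERSIONS).items():
        print(f"{version}: {', '.join(repos)}")
```

A check like this could run on a schedule and notify the owning teams, which ties the technical check back to the ownership and communication points above.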
Then, people say CI/CD makes things smooth and very quick to do, but at times the CI/CD pipelines can also choke and become hard to manage.

So this is a quick view of what a single-team setup, or a learning setup, looks like. When we first learned Terraform, there was a single team, or I was the one user. I created some docs and some Terraform code, I had a demo infrastructure where I was just creating resources, compute, storage, everything would go live, and terraform plan and apply all worked smoothly. However, in reality, when you work for an organization there are many teams and many modules. Like I said, there is the VPC, or some basic networking components, on top of which every infrastructure is built. So we need to take care of every minute detail: what our naming should look like, what the entire pipeline should look like, what the flow will look like. And where possible we should reuse our code, much like the inheritance we learned about in OOP. This is not just about getting the code. It's about reducing complexity in space and time, so that you are not writing the same code redundantly again and again but using it in the best way.
Okay. With more teams and more modules, things can become tricky, and scaling itself can become an issue. We need to understand that scaling is not just about adding more resources or more code. Many other things come along with it, and we need to ensure consistency is not broken. With more repos, some people may be using a monorepo, where they write everything in a single repo. They may have created different environments such as dev, prod or QA, but everything is written in one place. Others may be doing it another way: they have created a separate repo for everything, so they have created silos, which also have their own benefits. So the approach you take is not always right or wrong. It's about how effectively you implement it in your organization or team, so that everyone is aware of how things should move.

The key challenges when we scale are these. First, the complexity of the code. Then coordination. There are teams working 24/7, and people working from another part of the country or of the globe. You need to ensure everyone is aware, whether they work European hours or US hours. Whatever changes we make should be streamlined so that, if I have done something during Indian hours, the people I hand over to are clear on how to take it forward. So coordination plays a key role.

Then governance. Governance can at times sound like something that is always keeping an eye on us, but that's not the case. It's about how you implement changes and policies. Suppose you have created your own set of policies: for these kinds of people, these should be the IAM roles, like the concept of least privilege we learned about in GCP. Only the necessary people should have the necessary access. We also need to control how we manage our Git repo or version control tool. Who are the people who can approve a request? It should not be that I make some changes today, raise a request, and anyone can approve it. Governance helps keep our code secure, because people who are not familiar with the code may misread it and approve changes that are not good for the infrastructure. That's why governance is necessary.

Then come performance and automation. Automation is a term known across industries, well accepted and well used, but it has its own pros and cons depending on how you use it. Whenever you deploy resources, many hiccups may arise before and after. So when you scale, you need to ensure your workflow is smooth and accurate.
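To make the approval rule from the governance part concrete, here is a small Python sketch of the idea that not just anyone may approve a change. It is an illustration under assumptions, not the speaker's setup: the `PullRequest` shape, the path prefixes and the owner names are all hypothetical, and real version control platforms offer their own mechanisms for this.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PullRequest:
    author: str
    changed_paths: tuple[str, ...]
    approvers: frozenset[str] = field(default_factory=frozenset)


# Hypothetical ownership map: path prefix -> people allowed to approve there.
OWNERS = {
    "modules/network/": {"asha", "ben"},
    "modules/iam/": {"carol"},
}


def missing_approvals(pr: PullRequest, owners: dict[str, set[str]]) -> list[str]:
    """Return the owned path prefixes that still lack a valid approval.

    An approval counts only if it comes from a listed owner who is not
    the author of the change.
    """
    missing = []
    for prefix, allowed in owners.items():
        touched = any(path.startswith(prefix) for path in pr.changed_paths)
        valid = (pr.approvers & allowed) - {pr.author}
        if touched and not valid:
            missing.append(prefix)
    return missing
```

The design choice here is that ownership is attached to paths rather than to whole repositories, so the same rule works for a monorepo and for separate repos.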
So, like I discussed, this is a visual representation of what our problem with modules might look like. People say modules are reusable code: instead of writing code again, I reuse it. But you also need to define how these modules are written, how they are looked after, and how they flow from one place to another. At times there may be clashes between two modules written in a similar way. They could be used interchangeably, but they are not. There could be policy violations. There could be module drift, which comes from these repeated sets of modules. That covers the basic issues that may come up.

Now we'll discuss how to win in this situation. Whether you are already working on an existing infrastructure or starting from scratch, what should your design look like so that you can get through the challenges from the previous slides? Our architecture approach should be very clear in how we define core modules, feature modules and environment modules. Don't go by the names. It's more about understanding the idea. Whenever you start a Terraform deployment, say you are creating new infrastructure for a recently onboarded customer, the plan should be that today we start with some number of resources, but in future we may grow to any scale. So we need to create an infrastructure that is scalable in everything, and by everything I mean right down to the minute details: what your naming should look like, what your file structure and folder structure should look like, and what your core databases or core modules look like. If it's a data-driven company, they will rely heavily on data, so that part should be built from scratch in a way that scales easily.

There are a few modules on top of which other resources are created. Network and storage, as we know, are the basic components on top of which every other resource is created, these days and historically as well. Even in the era before virtualization, in the data center days, the first and foremost thing was to have physical boxes in the data center and then build the network on top, laying physical cables so that every box was well connected. That was the first and key task before deploying any data center.

So we need to understand the basic principles. As I've been saying, versioning should be handled correctly: if you are upgrading, do it in a way that keeps everything in your infra on the same version.
It is also about how your code is released. Make sure your previous code is immutable. Suppose there were some number of errors in your code written in October 2025, and you release a version on the 15th of every month. The November version should carry all the changes, but the October version should be kept intact. When you troubleshoot, you need to see what you can do now if anything is going wrong, and if you have learnings from a previous module or version, you can improve on them and proceed.

A shared registry also helps. There is some data that is always useful, so you want to keep track of it and manage all the copies in one place, so that later on, when someone else, or we ourselves, create a similar infrastructure, we can reuse it. A shared registry ensures every team has access to it and can use it as a single source of truth: when they want to make a new design, they can build on the design they created earlier. So this is what our basic infrastructure looks like.
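The idea of immutable releases in a shared registry can be sketched in a few lines of Python. This is an in-memory stand-in written for illustration only. The class name, method names and the date-style version strings are assumptions, not any real registry's API.

```python
class VersionExistsError(Exception):
    """Raised when someone tries to overwrite a published module version."""


class ModuleRegistry:
    """In-memory stand-in for a shared module registry (hypothetical)."""

    def __init__(self) -> None:
        # (module name, version) -> module source
        self._store: dict[tuple[str, str], str] = {}

    def publish(self, name: str, version: str, source: str) -> None:
        """Store a new version; refuse to change one that already exists."""
        key = (name, version)
        if key in self._store:
            raise VersionExistsError(f"{name} {version} is already published")
        self._store[key] = source

    def fetch(self, name: str, version: str) -> str:
        """Return the source exactly as it was published."""
        return self._store[(name, version)]

    def versions(self, name: str) -> list[str]:
        """List the published versions of a module in sorted order."""
        return sorted(v for (n, v) in self._store if n == name)
```

With this shape, a fix to the October release has to be published as a new version, and the October code stays available unchanged for troubleshooting.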
We'll start with the module registry at the bottom. That is where my storage sits: all the modules, all the governance. On top of it there are the core modules. Core modules are, like I said, the basic components required for any infrastructure: network, IAM and logging. So we'll create a few core modules and treat them as the primary modules. Then, based on customer-specific requirements, there are feature modules, which cover components that are not regularly used or are customer-driven: this customer might be using AKS in Azure, they might be using a database such as Postgres, and they might be using apps. So even if I replicate this infrastructure from one customer to another, the core remains the same, the environment layer remains the same, and only the feature layer changes as per the requirement. So there are a few things we can keep as a standard for our company as a service. Say we are a service provider for ten customers. Even on our sales page we can show that this is how we take care of the infrastructure, so we can easily make others understand how we will take care of their infrastructure and how we ensure security, scalability and everything else.
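The layering described above, with a fixed core and a feature layer that varies per customer, can be expressed as a small data model. The Python sketch below is illustrative only. The core module names come from the talk, while the feature catalogue, the function name and the returned dictionary shape are assumptions.

```python
# Core modules every customer stack is assumed to include (named in the talk).
CORE_MODULES = ("network", "iam", "logging")

# Hypothetical catalogue of optional, customer-driven feature modules.
FEATURE_CATALOGUE = {"aks", "postgres", "apps"}


def build_stack(customer: str, environment: str, features: list[str]) -> dict:
    """Describe a customer stack as core modules plus the requested features."""
    unknown = sorted(set(features) - FEATURE_CATALOGUE)
    if unknown:
        raise ValueError(f"unknown feature modules: {', '.join(unknown)}")
    return {
        "customer": customer,
        "environment": environment,
        # The core list never changes; only the feature part varies.
        "modules": list(CORE_MODULES) + sorted(set(features)),
    }
```

Two customers with different feature lists then share the same first three modules, which is the property that makes the setup repeatable across customers.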
Next, we'll discuss what your module design should look like if you want to get the best out of it. There is a basic flow: you develop, you validate that it works, you test, and you version. Versioning also matters because there are updates you make yourself, but before that there are versions released by the vendors. If you use products from various vendors in your work, say Postgres, Oracle SQL, GCP, Terraform, PowerShell, those come with updates too, and on top of that you need to decide how and when is the best time for your infrastructure to move to the next version. So versioning is key, and you should take it seriously, keeping the security context in mind as well, because as you go forward, support for previous versions ends and they stop being useful.

Standardization: like I said, the naming, the tagging and the variable conventions should be smooth and clear. A name should not be too generic, just "this is a VPC". If it is a VPC, make it clear which infrastructure it is for. The person managing the CI/CD pipelines, whether with Jenkins, Azure DevOps, Cloud Build or anything else, should be very clear about which infrastructure they are pushing code to, and should never push code to the wrong repo. So standardization is a good form of communication, and it is very useful here.
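A naming and tagging standard is only useful if it can be checked. As a rough sketch, assuming a convention of the form customer, environment, component and a two-digit index, a validator could look like the Python below. The pattern, the environment names and the required tag keys are all assumptions chosen for illustration. The talk does not prescribe a specific format.

```python
import re

# Hypothetical convention: <customer>-<environment>-<component>-<two-digit index>
NAME_PATTERN = re.compile(
    r"^(?P<customer>[a-z][a-z0-9]*)-(?P<env>dev|staging|prod)-"
    r"(?P<component>[a-z][a-z0-9]*)-(?P<index>\d{2})$"
)

# Assumed tag keys that every resource must carry.
REQUIRED_TAGS = {"owner", "environment"}


def validate_resource(name: str, tags: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the resource passes."""
    problems = []
    match = NAME_PATTERN.match(name)
    if match is None:
        problems.append(
            f"name '{name}' does not follow <customer>-<env>-<component>-<nn>"
        )
    missing = sorted(REQUIRED_TAGS - set(tags))
    if missing:
        problems.append(f"missing tags: {', '.join(missing)}")
    # If both the name and the tag state an environment, they must agree.
    if match and tags.get("environment") not in (None, match["env"]):
        problems.append("environment tag does not match the name")
    return problems
```

Under this convention a bare name such as "vpc" is rejected, while "abc-prod-vpc-01" tells the person running the pipeline which customer and environment they are touching.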
Next, we will discuss governance and version management and how they are useful. Governance, as we discussed earlier, is about keeping a bird's-eye view so that everything goes smoothly. We need to ensure testing is done well. We need to ensure CI/CD promotion happens in the proper sequence: you make changes in dev, then go to staging, then to prod. Skipping a stage won't help. It might make sense once or twice, but eventually it is not the right way of doing it. So always promote step by step when you move from non-production to production infrastructure. Then, of course, policy as code. I think there was another session earlier about IaC, and a topic on policy as code with OPA. With that, you ensure that certain policies are implemented and enforced while deploying resources, which also ensures your resources are well governed and well managed.
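The promotion rule above, dev first, then staging, then prod, with no skipping, can be written as a simple gate. This Python sketch assumes the three stage names from the talk. The function name and the way successful deployments are recorded are hypothetical.

```python
# Stage order taken from the talk: dev, then staging, then prod.
PROMOTION_ORDER = ("dev", "staging", "prod")


def can_promote(target: str, succeeded_in: set[str]) -> bool:
    """Allow a deploy to `target` only if every earlier stage has succeeded."""
    if target not in PROMOTION_ORDER:
        raise ValueError(f"unknown environment: {target}")
    earlier = PROMOTION_ORDER[: PROMOTION_ORDER.index(target)]
    return all(stage in succeeded_in for stage in earlier)
```

For example, `can_promote("prod", {"dev"})` is false because staging was skipped, while `can_promote("prod", {"dev", "staging"})` is true. A pipeline could call a check like this before running the apply step for each environment.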
This is the same thing we discussed in the previous slide. Okay, one of the final things before wrapping up is what collaboration looks like and why it is needed. Collaboration should be built on clear communication, and on how well you provide documentation. I think we can all agree that whenever we are stuck on a technical issue, we prefer going to the official documentation from Google, Microsoft or any other big vendor, because they maintain it in such a way that if you follow it you get a clear understanding of how things work, and even if you are stuck at some point X you find a number of well-defined solutions. So documentation is very helpful and needed. Code reviews, like I said, are a well-known practice, so I won't say much about them. Communication: not every communication can be formal, over email. Some communication should happen in a way where the person sharing the information and the person receiving it are in sync and discuss it among themselves. It could be on Slack, it could be through comments, or through any other medium they are comfortable with. Okay.

You can further enhance collaboration by using tools for metrics dashboards. Suppose you track how many people have raised wrong pull requests. If you keep track of mistakes using dashboards or feedback loops, you can improve going forward: last time we made this many mistakes, this time we can reduce it.

Then, tooling and automation, the final stage. This is about which tools you can use when scaling. There is a very good tool I was reading about some time ago, Renovate Bot. It is a dependency update tool that helps you understand what your dependencies look like. For drift detection, you can go for automated plans, which also work well. You can also use alert triggers to identify when your expectations are not met, so you get alerts that two modules are colliding somewhere, or that two resources have similar names or collide in some way that is not good for the infrastructure. That gives you a view of how your infrastructure is progressing and improving, so you can have a look at that as well.
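One common way to implement the automated plan for drift detection is to run `terraform plan -detailed-exitcode` on a schedule and act on the exit code: 0 means no changes, 2 means the plan succeeded with changes pending, and 1 means the plan failed. The Python sketch below wraps that idea. The function names and status strings are hypothetical, and it assumes the `terraform` binary is on the PATH and the working directory is already initialised. Note that pending changes can also come from code that has not been applied yet, not only from drift.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in-sync"  # plan succeeded and found nothing to change
    if code == 2:
        return "drift"    # plan succeeded and changes are pending
    return "error"        # 1 (or anything unexpected): the plan itself failed


def check_drift(workdir: str) -> str:
    """Run a plan in `workdir` and report its status.

    Assumes terraform is installed and `terraform init` has already run.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

A scheduler could call `check_drift` for each environment directory and send an alert whenever the result is not "in-sync".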
So, before closing, there are a few things we have learned. It's about how you validate your requests, how you govern things and how you keep things standard. Technically you can write code for everything. But look at the recent outages, the one going around in AWS, and I think there was a recent one on Azure too. These people have not done anything wrong on purpose, nor do they lack knowledge. It's more that they might have missed a trick in communicating well or in taking things seriously, and hence the outages happen and impact a huge number of people. So if you have good collaboration and good understanding between people, I think things can go smoothly.

This is the final diagram before I close, which shows how your infrastructure works with multiple teams. There are platform engineers who create everything and package it as Terraform modules: compute modules, network modules and so on. On top of that, the other teams, like the dev team, ops team and QA team, work through pull requests. Based on their requirements, they use the modules they get from the platform team and write only the additional modules that are required. This way every team knows which modules to fetch and from where, in order to deploy the resources for customer XYZ.
So I think time is up. If you have anything else, please feel free to ask questions in the comment section. I hope it was okay. With that, I'll conclude.
As infrastructure grows in complexity, so does the Terraform codebase behind it. In this talk, I’ll share our journey of scaling Terraform across multiple teams and business units, managing over 100 reusable modules in a production environment.

You'll learn how we approached:
- Designing reusable and opinionated Terraform modules
- Structuring our codebase to support scale and team autonomy
- Managing state files securely and efficiently with remote backends
- Enforcing standards using CI/CD, pre-commit hooks, and policy as code
- Handling module versioning, breaking changes, and team coordination
- Lessons learned from real incidents, and what we’d do differently

This session is for anyone looking to take their Terraform usage from isolated scripts to a scalable, production-ready platform shared across teams.

Speaker: Divyanshu Mishra