Terraform at Scale: Managing 100+ Modules Across Teams | DailyDevLists


Full Transcript


So I'm Divyanshu. I hope you have been having a good time so far with the sessions today. In this session I'll be talking about what Terraform looks like when you scale it. We all know that when you join a company as a DevOps or SRE engineer, it's not the case that you get one module to manage, or write a few lines of code the way we do while learning. That's not how things work. Usually it's a huge infrastructure. When you get in, you get your credentials and your access, and then you have to take care of the entire infra: what it looks like, where the modules are, how many teams are involved. So whatever we learned sounds very easy, but in production things look trickier and different. Hence my topic is about how we can manage, I would say not exactly 100-plus, but many modules as part of a team where you are working as a DevOps engineer or an SRE.

Here are the key points I'll cover in this quick session. I'll start with the problem and the context for what I'll be discussing. Then we'll discuss scale and complexity, then what our architecture approach should look like and which best practices we can follow. Governance, of course, is one of the key topics here. Apart from that, we'll discuss the typical tools, the automation side, that we can use while doing this. Okay.

So I'll start with the slide. You all know that, in layman's terms, Terraform is a tool that helps us manage or deploy resources on any cloud platform, or any other platform, so that whatever we were doing manually we now do via code. People say it's a kind of automation. People say it's how we reduce human errors and how we scale, because we don't need to write everything again and again or do everything through ClickOps, which is prone to human error. Now, whenever the amount of code grows, and you will have seen this as a developer too, things may not go the way you want. So it matters how you manage it. It's not only the technical part that is important, but also how you keep track of how things should work and move.

Okay. In a very small-scale setup, say we are a team of a few people with a small infrastructure, or we are managing a client that doesn't have a huge Terraform footprint, it's quite easy to manage. There are a few modules, a few environments and a few resources, and you are deploying them regularly, or managing and reshuffling them as and when required.

However, there are cases where you work for a company on team XYZ, and other teams are managing the same infrastructure together with you. They are named differently: these people are from platform, these are from SRE, these are from automation or do the scripting, and so on. When multiple people across different teams are involved, the module count goes up, and many problems come along with it. I wouldn't call them problems in the first place; I would say a lot of complexity comes along.

There are a few challenges that everyone faces, and you may have faced them already. One is duplication of modules. Team A has built a module for a VPC or a VNet, and team B is creating one too. The purpose may be different, or the two VPCs may be used differently, but it creates issues when you manage them going forward, because if I make a change to mine, that doesn't mean the other team is making the same change. Then there is the naming convention. Say we are managing customer ABC: there should be a proper naming convention that we follow. It shouldn't be that anyone and everyone is pushing code. Of course the code will go live and there won't be any immediate issues, but as people join and get replaced, things may not go the way they should. So you end up with inconsistent standards, updates become slow, and you see different versions in use. Suppose my team has decided to go with version 1.2.24 for the Terraform code we manage across all our repos, and we did that. That doesn't mean everyone on the other teams managing the same infrastructure has done so. We need to take care of that as well.

Whatever we do for the benefit of the customer or of our team should go smoothly, with clear ownership of who is doing what. And even if you are not doing the work yourself, you need to make sure you inform the people who should be informed. That way the customer is aware of what is happening around the infrastructure, and the other people who may have to take proactive steps can do so.
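The version mismatch between teams described above is the kind of thing a small script can surface. Here is a minimal sketch, not something shown in the talk: it assumes each repo declares `required_version` in a `terraform` block and that the HCL text of each repo has already been read into a dictionary. The repo names, the sample constraints and the "most common pin is the baseline" rule are all assumptions for illustration.

```python
import re

# Matches a line such as: required_version = "~> 1.5.0"
REQUIRED_VERSION = re.compile(r'required_version\s*=\s*"([^"]+)"')


def find_version_pins(repo_sources):
    """Map each repo name to its required_version constraint, or None if absent."""
    pins = {}
    for repo, hcl_text in repo_sources.items():
        match = REQUIRED_VERSION.search(hcl_text)
        pins[repo] = match.group(1) if match else None
    return pins


def inconsistent_pins(pins):
    """Return repos whose pin is missing or differs from the most common pin."""
    present = [pin for pin in pins.values() if pin is not None]
    if not present:
        return dict(pins)
    baseline = max(set(present), key=present.count)
    return {repo: pin for repo, pin in pins.items() if pin != baseline}


# Hypothetical repo contents, standing in for files read from disk or from Git.
SAMPLE_REPOS = {
    "team-a-network": 'terraform {\n  required_version = "~> 1.5.0"\n}\n',
    "team-b-network": 'terraform {\n  required_version = "~> 1.3.0"\n}\n',
    "team-a-compute": 'terraform {\n  required_version = "~> 1.5.0"\n}\n',
    "team-c-apps": 'resource "null_resource" "example" {}\n',
}
```

Calling `inconsistent_pins(find_version_pins(SAMPLE_REPOS))` reports the repo on the older constraint and the repo with no constraint at all, which is the list you would take to the owning teams.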

Then there is CI/CD. People say CI/CD makes things smooth and quick, but at times the pipelines can also choke and become hard to manage.

This is what a single-team setup, or a learning setup, looks like. When we first learned Terraform, there was a single team, or I was the only user. I wrote some docs and some Terraform code, I had a demo infrastructure where I was creating a few resources, compute, storage, and everything went live: terraform plan, terraform apply, everything worked smoothly. In reality, when you work for an organization, there are many teams and many modules. Like I said, there is the VPC and the other basic networking components that every infrastructure is built on top of. So we need to take care of every minute detail: what our naming should look like, what the entire pipeline should look like, what the flow will look like, and whether there are places where we can reuse code, the way we learned with inheritance in OOP. This is not just about getting the code written. It's about reducing complexity in space and time: you are not writing the same code redundantly again and again, but reusing it in the best possible way.

Okay. With more teams and more modules, things can become tricky, and scaling itself can become an issue. We need to understand that scaling is not just about adding more resources or more code. Many other things come along with it.

We need to ensure consistency is not broken. In one place people might be using a monorepo, writing everything in a single repo. They may have created different environments, dev, prod, QA, but they write everything in one place. Other people might be doing it another way: they have created different repos for everything instead of a monorepo. They have created some silos, which have their own benefits. So the approach you take is not always right or wrong. It's about how effectively you implement it in your organization or your team, so that everyone is aware of how things should move.

The key challenges that come when we scale are, first, the complexity of the code, and then coordination. There are teams working 24/7, and people working from another part of the country or of the globe. You need to ensure everyone is aware, whether they are working European hours or from the US. Whatever changes we make should be streamlined: if I have done some XYZ thing during my Indian hours, then when I hand over, the next person should be clear about how to take it forward. So coordination plays a key role.

Then governance, of course. Governance might at times sound like something that is always keeping an eye on us, but that's not the case. It's about how you implement changes and how you implement policies.

Suppose you have created your own set of policies saying that for this kind of people, these should be the IAM roles. Like we learned in GCP, this is the concept of least privilege: only the necessary people should have the necessary access. Then we also control how we manage our Git repo or version control tool. Who are the people who can approve a request? It should not be that I make some changes today, raise a request, and anyone can approve it. Governance also helps keep our code secure, because people who don't know the infrastructure well might misread something and approve changes that are not good for it. That's why governance is necessary.

Then come performance and automation. Automation is a term known across industries, well accepted and well used, but it has its own pros and cons depending on how you use it. Whenever you deploy resources, many hiccups may arise before and after. So when you are scaling, your workflow should be smooth and accurate.

So, as I discussed, this is a visual representation of what our problem with modules might look like. People say modules are a kind of reusable code: instead of writing code again, I reuse it. But you also need to define how all these modules are written and maintained, and how they flow from one place to another. There can be clashes where two modules are written in a similar way and could be used interchangeably, but are not. There could be policy violations. There could be module drift, which comes from these repeated sets of modules.

That covers how the issues come up. Now we'll discuss how to win here. Whether you are already working on an existing infrastructure or starting from scratch, what should your design look like so that you can get through all the challenges we discussed in the previous slides?

Our architecture approach should be very clear in how we define core modules, feature modules and environment modules. Don't go by the names; it's more about understanding the idea. Whenever you start a Terraform deployment, say you are creating new infrastructure for a customer that has recently onboarded, the plan should be: today we are starting with some number of resources, but in future we may grow to any scale. So we need to create an infrastructure that is scalable in every respect, down to minute details: what your naming should look like, what your file structure and folder structure should look like, and what your core databases or core modules look like. If it's a data-driven company, they will rely heavily on data, so that part should be built from scratch in a way that scales easily.

There are a few modules on top of which other modules are built, or you can say a few things on top of which other resources are created. Network and storage, as we know, are the basic components on top of which every other resource is created, these days and historically as well. Even in the era when there was no virtualization, only data centers, the first and foremost thing was having physical boxes in the data center, and on top of that the network: laying physical cables to ensure every box is well connected. That was the first key task before deploying any data center.

So we need to understand the basic principles. As I've been saying, versioning should be right: if you are upgrading, do it so that everything across your infra ends up on the same version. Immutability is about how your code is released: make sure your previous code is immutable. Suppose there were n errors in the code you wrote in October 2025, and you release a version on the 15th of every month. The November version should carry all the fixes, but the October version should be kept intact. At times, to troubleshoot, you need to look back at what was there, and if you have learnings from a previous module, history or version, you can improve on that and proceed.

A shared registry also helps. There is some data that is always useful, so you want to keep track of it and manage all the copies in one place, so that later on, when someone else, or we ourselves, create another similar infrastructure, we can use it. A shared registry ensures every team has access to it and can use it as a single source of truth: when they want to create a new design, they can reuse the design they created earlier. So this is what our basic infrastructure looks like.

We'll start with the module registry at the bottom. That is where the storage takes place: all the modules, all the governance. On top of it there will be core modules. Core modules are, like I said, the basic components required for any infrastructure: network, IAM and logging. So we'll create a few core modules, and these will be treated as the primary modules. Then, based on customer-specific requirements, there will be feature modules, which cover the components that are not regularly used or are customer-driven: this customer might be using AKS in Azure, they might be using some DB, let's say Postgres, and they might be using apps. So even if I replicate this infrastructure from one customer to another, the core will remain the same and the environment layer will remain the same; only the features will vary as per the requirement.

So there are a few things we can keep as a standard for our company as a service as well. Say we are a service provider for 10 customers. Even on our sales page we can show that this is how we'll take care of the infrastructure, so we can easily make others understand how we will look after their infrastructure and how we ensure security, scalability and everything else.
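The core / feature / environment layering described above can be checked mechanically once module dependencies are written down. The sketch below is an illustration under assumptions, not something from the talk: the rule "a module may only use modules in the same or a lower layer" is one possible reading of the layering, and the module names and the dictionary format are hypothetical.

```python
# Layer order, lowest first, following the core / feature / environment split.
LAYER_RANK = {"core": 0, "feature": 1, "environment": 2}


def layering_violations(modules):
    """Return (module, dependency) pairs where a module uses a higher layer.

    `modules` maps a module name to {"layer": <layer name>, "uses": [module names]}.
    """
    violations = []
    for name, spec in modules.items():
        for dep in spec["uses"]:
            if LAYER_RANK[modules[dep]["layer"]] > LAYER_RANK[spec["layer"]]:
                violations.append((name, dep))
    return violations


# Hypothetical inventory: core modules at the bottom, customer features above,
# and an environment composition on top.
SAMPLE_MODULES = {
    "network": {"layer": "core", "uses": []},
    "iam": {"layer": "core", "uses": []},
    "logging": {"layer": "core", "uses": ["iam"]},
    "aks": {"layer": "feature", "uses": ["network", "iam"]},
    "postgres": {"layer": "feature", "uses": ["network"]},
    "prod": {"layer": "environment", "uses": ["aks", "postgres", "logging"]},
}
```

With this inventory `layering_violations(SAMPLE_MODULES)` returns an empty list. If a core module such as `network` started depending on `aks`, the pair would be reported, which signals that a customer-specific feature has leaked into the shared core.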

Next, we'll discuss what your module design should look like if you want to get the best out of it. There is a basic flow: you develop, you validate that it works, you test, and you version. Versioning also helps you keep track of updates. There are the updates you make yourself, but before that there are versions released by the vendor. If you are using several vendor products in your work, say Postgres, Oracle SQL, GCP, Terraform, PowerShell, those come with their own updates, and you need to decide how and when is the best time for your infrastructure to move to the next version. So versioning is key, and you should take it seriously with security in mind as well, because as you go forward, support for previous versions goes away and they stop being useful.

Standardization: like I said, the naming, the tagging and the variable conventions should be clear. A name should not be generic, just "this is a VPC". If it is a VPC, the name should tell you which infrastructure it is for. The person managing the CI/CD pipelines, whether with Jenkins, Azure DevOps, Cloud Build or whatever it is, should be very clear about which infrastructure they are pushing code to, and should never push code to the wrong repo. So standardization is a form of communication, and a very useful one here.
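A naming convention is easiest to keep when a check rejects names that break it. The following is a minimal sketch under assumptions: the `<customer>-<env>-<kind>-<purpose>` pattern and the list of environments are hypothetical, chosen only to show that a name like "vpc" says nothing about which infrastructure it belongs to.

```python
import re

# Assumed convention: <customer>-<env>-<kind>-<purpose>, all lowercase.
ALLOWED_ENVS = ("dev", "qa", "staging", "prod")
NAME_PATTERN = re.compile(
    r"^(?P<customer>[a-z0-9]+)-(?P<env>[a-z]+)-(?P<kind>[a-z0-9]+)-(?P<purpose>[a-z0-9]+)$"
)


def check_name(name):
    """Return a list of problems with a resource name; an empty list means it passes."""
    match = NAME_PATTERN.match(name)
    if not match:
        return [f"{name!r} does not match <customer>-<env>-<kind>-<purpose>"]
    if match.group("env") not in ALLOWED_ENVS:
        return [f"{name!r} uses unknown environment {match.group('env')!r}"]
    return []
```

Under this convention `check_name("abc-prod-vpc-payments")` passes, while `check_name("vpc")` and `check_name("abc-sandbox-vpc-payments")` each return one problem. A check like this could run in a pre-commit hook or a CI step before `terraform plan`.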

Next, governance and version management, and how they are useful. Governance, as we discussed earlier, is about keeping a bird's-eye view so that everything goes smoothly. We need to ensure testing is done well. We need to ensure CI/CD promotion goes in a proper sequence: you make changes in dev, then go to staging, then to prod. Skipping a stage and going straight ahead won't help. It might make sense once or twice, but it is not the right way of doing it. So always promote step by step when you move from non-prod infrastructure to prod infrastructure, whatever you are doing.
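The dev, staging, prod sequence can be enforced as a gate rather than left to discipline. This is a minimal sketch, assuming the pipeline keeps a record of which module versions have been applied in each environment. The `deployed` mapping and the version strings are hypothetical.

```python
# Promotion order described in the talk: dev first, then staging, then prod.
PROMOTION_ORDER = ["dev", "staging", "prod"]


def can_promote(version, target_env, deployed):
    """Return True only if `version` has already been applied in every earlier stage.

    `deployed` maps an environment name to the set of versions applied there.
    """
    if target_env not in PROMOTION_ORDER:
        raise ValueError(f"unknown environment: {target_env}")
    earlier = PROMOTION_ORDER[: PROMOTION_ORDER.index(target_env)]
    return all(version in deployed.get(env, set()) for env in earlier)


# Hypothetical deployment record kept by the pipeline.
SAMPLE_DEPLOYED = {
    "dev": {"1.4.0", "1.5.0"},
    "staging": {"1.4.0"},
}
```

Here version `1.4.0` may go to prod because it has been through dev and staging, while `1.5.0` may go to staging but not yet to prod. A CI job could call `can_promote` before running `terraform apply` and fail the run when it returns `False`.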

Then, of course, policy as code. I think there was another session earlier about IaC, with a topic, I think PaC, that was about OPA. When you do that, you are ensuring that there are a few policies that must be implemented, and that those policies are enforced while deploying the resources. This also ensures your resources are well governed and well managed.
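To make the idea of policy as code concrete, here is a small stand-in written in plain Python rather than OPA or Rego, so it is not the tool mentioned in the talk. It reads the JSON form of a plan (the output of `terraform show -json <planfile>`, which lists planned changes under `resource_changes`) and flags resources that are created or updated without required tags. The required tag names and the sample plan are assumptions, and a real policy would be scoped to resource types that actually support tags (GCP resources, for example, use labels).

```python
import json

# Assumed policy: every created or updated resource carries these tags.
REQUIRED_TAGS = ("owner", "environment")


def missing_tag_violations(plan):
    """Return (address, missing_tags) for resources that violate the tag policy."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        if "create" not in actions and "update" not in actions:
            continue  # deletes and no-ops are out of scope for this policy
        after = change.get("after") or {}
        tags = after.get("tags") or {}
        missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
        if missing:
            violations.append((rc["address"], missing))
    return violations


# Hypothetical, heavily trimmed plan JSON for illustration.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "aws_vpc.main", "type": "aws_vpc",
     "change": {"actions": ["create"],
                "after": {"tags": {"owner": "platform", "environment": "dev"}}}},
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["create"], "after": {"tags": {"owner": "platform"}}}},
    {"address": "aws_instance.old", "type": "aws_instance",
     "change": {"actions": ["delete"], "after": null}}
  ]
}
""")
```

For the sample plan, `missing_tag_violations(SAMPLE_PLAN)` flags only `aws_s3_bucket.logs`, which lacks the `environment` tag. The same rule expressed in a dedicated policy engine would be evaluated at the same point, between plan and apply.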

This is the same thing we discussed in the previous slide.

Okay, one of the final things before wrapping up is what collaboration should look like and why it is needed. Collaboration should mean clear communication, and you need to look at how well you provide documentation. I think we can all agree that whenever we are stuck on a technical issue, we prefer going to the official documentation from Google, Microsoft or any big giant you name, because they maintain that documentation in such a way that if you follow it you get a clear understanding of how things work, and even if you are stuck at some point X, you find solutions laid out in a well-defined way. So documentation is very helpful and needed. Code reviews, like I said, are a well-known practice, so I won't talk much about them.

Communication: not every communication can be formal, over email. Some communication should happen in a way where the person sharing the information and the person receiving it are in sync and discuss among themselves. It could be on Slack, it could be through comments, or any other medium they are comfortable with. You can also enhance collaboration by using different tools for metrics dashboards. Suppose you have created a dashboard of how many people have raised wrong pull requests. If you keep track of the mistakes using dashboards, or you use feedback loops, you can improve going forward: last time we made this many mistakes, this time we can reduce it.

Then, tooling and automation, the final stage: which tools can you use when scaling? There's a very good tool I was reading about some time ago, Renovate bot. It's a dependency update tool that helps you keep track of your dependencies. For drift detection, you can go for automated plans, which also work very well. You can also use alert triggers to identify when your expectations are not being met: you get alerts that these two modules are colliding somewhere, or these two resources have similar names, or they are colliding in some way that is not good for the infrastructure. That gives you a view of how your infrastructure is moving ahead and improving, so you can have a look at that as well.
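The automated plan approach to drift detection relies on the exit code of `terraform plan -detailed-exitcode`: 0 means no changes, 1 means an error, and 2 means the plan contains changes. The sketch below wraps that in a function a scheduled job could call. It is a minimal illustration, assuming the working directory is already initialised and its configuration has been fully applied, so that any pending change can be read as drift. The `runner` parameter is an assumption added so the logic can be exercised without Terraform installed.

```python
import subprocess


def classify_plan_exit(code):
    """Translate the exit code of `terraform plan -detailed-exitcode`."""
    return {0: "in_sync", 1: "error", 2: "drift"}.get(code, "unknown")


def detect_drift(workdir, runner=subprocess.run):
    """Run a non-interactive plan in `workdir` and classify the result."""
    result = runner(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

A nightly job could call `detect_drift` for each environment directory and send an alert whenever the result is `"drift"` or `"error"`, which is one way to implement the alert triggers mentioned above.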

So, before closing, there are a few things we have learned. It's about how you validate your requests, how you govern things, and how you keep things standard. Technically you can write code for everything. But look at the recent outages, the one going around in AWS, and I think there was a recent one on Azure as well. These people have not done anything wrong on purpose, nor do they lack knowledge. It's more that they might have missed a trick in communicating well or in taking something seriously enough, and hence the outages happen and impact a huge number of people. So if you have good collaboration and good understanding between people, I think things can go smoothly.

This is the final diagram before I close, which shows how your infrastructure works with multiple teams. The platform engineers create everything and publish it as Terraform modules: compute modules, network modules and so on. On top of that, the other teams, the dev team, ops team and QA team, raise pull requests and create modules based on their requirements. They use the ones they get from the platform team and write only those modules that are additionally required. This way every team knows where to fetch which modules from in order to deploy resources for customer XYZ.

So I think time is up. If you have anything else, please feel free to ask your questions in the comment section. I hope it was okay, and with that I'll conclude.

Terraform at Scale: Managing 100+ Modules Across Teams

HashiCorp

55 days ago

22:29

Devops Whitelist


Rank #2

Description

As infrastructure grows in complexity, so does the Terraform codebase behind it. In this talk, I’ll share our journey of scaling Terraform across multiple teams and business units, managing over 100 reusable modules in a production environment. You'll learn how we approached:

- Designing reusable and opinionated Terraform modules
- Structuring our codebase to support scale and team autonomy
- Managing state files securely and efficiently with remote backends
- Enforcing standards using CI/CD, pre-commit hooks, and policy as code
- Handling module versioning, breaking changes, and team coordination
- Lessons learned from real incidents, and what we’d do differently

This session is for anyone looking to take their Terraform usage from isolated scripts to a scalable, production-ready platform shared across teams.

Speaker: Divyanshu Mishra

Subscribe to our YouTube Channel → https://www.youtube.com/c/HashiCorp?sub_confirmation=1

For hands-on interactive labs, visit HashiCorp Developer → https://developer.hashicorp.com/

HashiCorp, an IBM company, helps organizations automate hybrid cloud environments with Infrastructure and Security Lifecycle Management. HashiCorp offers The Infrastructure Cloud on the HashiCorp Cloud Platform (HCP) for managed cloud services, as well as self-hosted enterprise offerings and community source-available products. For more information, visit hashicorp.com.

For more information → https://hashicorp.com
LinkedIn → https://linkedin.com/company/hashicorp
X → https://x.com/HashiCorp
Facebook → https://facebook.com/HashiCorp


Featured Date

January 6, 2026
