Good morning everyone. Today I'm going to talk about one of the most exciting developments in software engineering: how we are using artificial intelligence to transform code at massive scale, far beyond simple autocomplete suggestions, to intelligent, repository-wide transformation. We are living in a pivotal moment.
Developers have traditionally relied on line-by-line tools and manual refactoring for large changes. What if we could use AI for entire package migrations; for build and bug fixing, which involves automated error resolution; for code refactoring and structural improvements while maintaining a specific behavior; for fixing code style issues and enforcing standards automatically; for static analysis-based repairs, fixing linter or analyzer warnings; and for static analysis annotations, such as adding type hints or documentation? Let me show you how we are actually doing AI-powered, large-scale code transformations.
Before we talk about the solution, let's understand the core problems we're trying to solve. There are three fundamental challenges that prevent standard LLMs from handling large-scale transformations effectively. First, context window limitations. Here's the reality: most codebases are massive. You can't fit an entire repository in a single LLM API call. We need to break the codebase into intelligent chunks and orchestrate multiple LLM calls, each with the right context. This requires sophisticated mechanisms that understand software structure, not just splitting code randomly at character boundaries.
Problem two: knowledge cutoff. LLMs are trained up to a certain date. If your codebase uses a library that was updated yesterday, the model wouldn't know about it. This is especially critical in programming, where API changes happen frequently and outdated knowledge can lead to incorrect solutions. Problem three: hallucinations. LLMs are fantastic at generating code that looks good and is syntactically correct, but they're also prone to making things up, using APIs that don't exist, ignoring edge cases, or implementing logic that seems reasonable but is fundamentally flawed. For critical transformations, this is unacceptable. These three challenges mean that LLMs alone, even powerful ones, simply cannot reliably handle repository-scale code transformations without human intervention at every step. Enter Code Plan. Code Plan is basically a
symbolic framework that combines the
creative power of large language models
with the rigor and precision of
traditional symbolic reasoning and
static analysis. What does Code Plan do? It automates large, interdependent code changes: think package migrations across your codebase, multi-file refactorings, or modernizing legacy code. The key innovation here is treating repository-level coding as a planning problem, not just a generation problem. Code Plan integrates LLMs with intelligent planning. It doesn't just generate code. It reasons about what needs to happen, breaks it into steps, validates those steps, and orchestrates the execution. Some real use cases are C# package migrations across your entire code repository, Python dependency updates that touch multiple services, static analysis-driven repairs when you're fixing violations or compliance issues, and adding type annotations to untyped code at scale. Code Plan is extensible. It's built on modules and transformations, actions and prompts, that you can customize for your specific software engineering tasks. At a high level, Code Plan is a symbolic framework for building AI-powered software development workflows for the inner and outer loops of software engineering. It integrates the power of LLMs with intelligent planning to tackle complex coding tasks that cannot be solved using an LLM alone. It basically combines the power of LLMs and symbolic, static analysis approaches. Okay, let me walk you through how Code Plan actually works. The architecture has three major components: the input, the Code Plan engine, and the output. The input
basically consists of your code repository or directory, knowledge-base repositories with precise transformation instructions and examples, and static analysis reports and logs that identify issues. The Code Plan engine consists of four major aspects. First, prompt contextualization. Before we ask anything of the LLM, we carefully prepare prompts with the most relevant context. This is crucial: bad context can lead to bad outputs. Second, planning and orchestration. Complex transformations aren't one-shot operations. Code Plan breaks them into structured steps and coordinates the execution, making decisions about which tools to use and in what order. Third, dependency analysis. We parse your code into abstract syntax trees, a representation that captures the actual structure of your code, not just the text. This lets us understand the relationships between the different components. Fourth, static analysis. We validate the transformations and use static analysis to guide them. Is this transformation correct? Did we break any builds or unit tests? Will it break any downstream dependencies? Static analysis gives us that confidence. And finally, the output is the transformed, generated code, ready to be integrated. Here's a
walkthrough of the architecture diagram.
On the left side, as you can see, the inputs consist of the code repository, the knowledge-base repo with the instructions, and the static analysis reports that allow for verifying and performing static analysis-based transformations.
The main engine consists of prompt contextualization, planning and orchestration, dependency analysis using abstract syntax trees, static analysis validation, and the LLM. And on the right side, the output is the generated code that transforms your repository.
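To make that flow concrete, here is a minimal sketch of how such an engine loop could be wired together. All of the names here (Task, build_context, validate, plan_followups, and so on) are illustrative assumptions based on the description above, not Code Plan's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    file_path: str
    instruction: str   # e.g. "update this file to the new package API"

@dataclass
class Report:
    ok: bool
    errors: List[str]

def run_transformation(
    tasks: List[Task],
    build_context: Callable[[Task], str],      # prompt contextualization
    generate: Callable[[str, str], str],       # LLM call: (instruction, context) -> patch
    validate: Callable[[Task, str], Report],   # static analysis / builds / tests
    apply_patch: Callable[[Task, str], None],
    plan_followups: Callable[[Report], List[Task]],
) -> None:
    """Planning-and-validation loop: generate a change per task, validate it,
    and turn any newly discovered breakages into new tasks."""
    while tasks:
        task = tasks.pop(0)
        context = build_context(task)                 # gather only the relevant context
        patch = generate(task.instruction, context)   # ask the LLM for the concrete change
        report = validate(task, patch)                # did we break builds, tests, callers?
        if report.ok:
            apply_patch(task, patch)
        else:
            tasks.extend(plan_followups(report))      # feed the errors back into the plan
```

The point of the loop is the feedback edge: validation failures are not the end of the run, they become new tasks for the planner.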
At a high level, the technical magic that happens in Code Plan consists of three techniques. First, LLM planning. Instead of asking an LLM to solve a problem in one shot, we ask it to plan: break the complex task into structured steps, choose which tools and techniques to use, and decide the order of execution. This enables goal-directed execution rather than hoping a general-purpose model gets it right.
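As a rough illustration of "plan first, then execute", a planning call could look like the sketch below; the prompt wording, the call_llm parameter, and the tool names are assumptions for illustration, not Code Plan's actual prompts.

```python
import json

PLAN_PROMPT = """You are planning a repository-wide code change.
Goal: {goal}
Respond with a JSON array of steps. Each step needs:
  "file": the file to change,
  "tool": one of "edit", "run_linter", "run_tests",
  "description": a one-line summary of the change."""

def make_plan(goal: str, call_llm) -> list:
    """Ask the model to decompose the task into ordered, tool-tagged steps
    instead of attempting the whole change in one shot."""
    return json.loads(call_llm(PLAN_PROMPT.format(goal=goal)))

def execute_plan(plan: list, tools: dict) -> None:
    """Goal-directed execution: run each step with the tool it names, in order."""
    for step in plan:
        tools[step["tool"]](step)
```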
Second, AST-based chunking. We don't split the code at arbitrary character positions. We use the AST, the abstract syntax tree, to identify meaningful boundaries: functions, classes, loops. Each chunk remains syntactically valid on its own. This produces better embeddings, better retrieval, and more accurate LLM responses. Third, RAG, retrieval-augmented generation. RAG basically consists of a retriever and a generator. The retriever finds relevant chunks from your knowledge base, instructions, and codebase, and the LLM generator integrates that context into its responses. This combination reduces hallucinations, overcomes context length limitations, and incorporates fresh knowledge that isn't in the model's training data.
Together, these techniques create a powerful system for handling repository-scale transformations. Okay, now let's go deeper into abstract syntax trees as well as RAG. Let's walk through a concrete example. Here's a simple piece of Python code, a function that greets someone. When we parse it with a parser, we get a hierarchical tree structure. At the top is a FunctionDef node. Under that we have the function name, greet, the argument, name, and the return statement. The return statement contains a binary operation, a string concatenation combining "Hello" with the variable name.
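As a concrete illustration of that tree, here is roughly what it looks like with Python's built-in ast module; the greet function below is a reconstruction of the example described above, not the slide's verbatim code.

```python
import ast

source = '''
def greet(name):
    return "Hello, " + name
'''

tree = ast.parse(source)
print(ast.dump(tree, indent=2))
# Prints the hierarchy described above, roughly:
#   Module -> FunctionDef(name='greet', args=[arg('name')])
#             -> Return -> BinOp(Constant('Hello, '), Add(), Name('name'))
```

Running this prints the nested FunctionDef, arguments, Return, and BinOp nodes, which is the structured representation being discussed here.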
Now this might seem like a minor detail
but it's transformative.
When an LLM sees the structured
representation instead of raw text, it
understands the code at a much deeper
level. It knows these are distinct
semantic components. It can reason about
how changing one part affects the
others. This structure enables meaningful code chunking for LLM processing and an accurate picture of the relationships between the different components. LLMs see code as plain text; it's just a sequence of characters. They don't inherently understand that a function is different from a loop, or that a variable declaration is different from a function call. They make educated guesses based on statistical patterns. This leads to generic, inaccurate responses and potential hallucinations. The AST solution breaks the code into an organized, hierarchical representation. Now the LLM
can understand methods, properties,
arguments, data types, and logic flow.
It sees the skeleton of your program.
This makes the AI responses far more
accurate and insightful. When you feed an LLM an AST instead of raw text, you're giving it a semantic understanding layer. That's the major difference between a generic tool and a specialized tool. Okay, next: chunking. Now we're
going to look at the two approaches for
breaking the code into chunks. The first is traditional character-based chunking and the second is AST-based chunking. Traditional character-based chunking treats code like any other text. You might split at 500 characters or 1,000 characters. Here's what happens: you end up cutting through the middle of functions. A function definition gets split across chunks. An include statement sits alone. The semicolon that closes a statement might land in the next chunk. The code structure is disregarded. You get malformed, syntactically invalid fragments. Statements lack proper syntactic closure. The semantic meaning is completely destroyed. An LLM asked to process these fragments would get completely confused. It's like asking someone to understand a sentence broken randomly across lines. Next, let's see how AST-based chunking significantly improves
our use case. Each chunk represents a
code boundary. The pre-processor include
is its own chunk. The using declaration
is its own chunk. Complete functions are preserved as single chunks, which makes more sense and provides better understanding to the LLM. The code is split at meaningful boundaries: functions, control structures, using declarations, and so on. Each chunk is syntactically valid and complete. Semantic integrity is completely preserved. When an LLM receives syntactically valid code that respects semantic boundaries, it can generate better transformations. There's no guessing about context that was cut off. The
model understands the complete picture
of that chunk. This is a foundational technique which we use for enabling repository-scale transformations. The planning, along with the AST analysis, helps us split the task into meaningful sections and iterate over the LLM calls, feeding meaningful chunks back in again and again.
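Here is a minimal sketch of that idea in Python, using the standard ast module to cut a file at top-level definition boundaries; a production chunker would also track imports, nesting, and token budgets, so treat this only as an illustration of splitting at syntactic boundaries.

```python
import ast

def ast_chunks(source: str) -> list:
    """Split a module at top-level statement boundaries so that every chunk
    is a syntactically complete unit (an import, a function, a class, ...)."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        # get_source_segment returns the exact source text covered by this node.
        segment = ast.get_source_segment(source, node)
        if segment is not None:
            chunks.append(segment)
    return chunks

example = '''
import os

def load(path):
    return open(path).read()

class Greeter:
    def greet(self, name):
        return "Hello, " + name
'''

for chunk in ast_chunks(example):
    print("--- chunk ---")
    print(chunk)
```

Each printed chunk is an import, a complete function, or a complete class, never a fragment cut mid-statement.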
Next, we're going to look at retrieval-augmented generation. We use RAG mainly to address two problems: the knowledge cutoff and hallucinations. RAG has two simple components. The retriever searches
through your knowledge base, documentation, examples, existing code, and best practices, mainly to understand the transformation you're trying to achieve. It takes the retrieved documentation, combines it with the AST-based chunks, and provides it to the generator. The generator takes whatever the retriever found and processes it through an LLM. The model integrates this fresh, up-to-date context into its response. The beauty of
this approach is that the LLM has
current information about your specific
APIs, frameworks, and patterns. It's not
operating on the general knowledge from
its training data. It's operating on the
actual codebase and the knowledge base
of instructions which we provided to it.
This combination dramatically reduces hallucinations, overcomes context length limitations, and ensures your transformations use the latest frameworks and the best practices from your organization.
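A schematic version of that retriever-plus-generator pairing might look like the sketch below; the embed and llm_generate parameters stand in for whatever embedding and chat models you use, and the cosine-similarity retriever is deliberately simplified.

```python
import numpy as np

def retrieve(query: str, chunks: list, embed, top_k: int = 3) -> list:
    """Retriever: rank knowledge-base and code chunks by cosine similarity to the query."""
    q = embed(query)
    scored = []
    for chunk in chunks:
        v = embed(chunk)
        score = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]

def answer(task: str, chunks: list, embed, llm_generate) -> str:
    """Generator: fold the retrieved, up-to-date context into the prompt for the LLM."""
    context = "\n\n".join(retrieve(task, chunks, embed))
    prompt = f"Use only the context below.\n\nContext:\n{context}\n\nTask: {task}"
    return llm_generate(prompt)
```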
Okay, let's talk about real-world impact. First, time savings: organizations doing mainframe migrations have seen a 70% reduction in modernization timelines. Tasks that used to take 3 to 5 years are completely automated.
Quality improvements: because Code Plan integrates a feedback loop with static analysis, the transformations are valid and preserve the original behavior. There are fewer manual errors in large-scale transformations and higher confidence in the results. Developer productivity: developers are freed from repetitive, error-prone tasks, which allows them to focus on high-value architecture and design work, and it enables non-experts to perform code transformations as well. Enterprise applications: package transformations across hundreds of files and hundreds of repositories, legacy code modernization, and compliance and security updates at scale. These are real use cases
delivering real business value today.
Future and challenges. We need to be
honest about the current limitations.
Context length still limits us on extremely large codebases; some codebases are simply enormous. Model hallucinations have not completely disappeared; they are reduced, but validation is still essential. Repository-scale analysis is computationally heavy; it's not free. The complexity of handling dynamic languages such as Python and JavaScript is higher, as traditional static analysis struggles with them.
Emerging trends. One of the biggest emerging trends is multi-agent systems, where multiple specialized agents handle different parts of the transformation. Others are continuous learning from developer feedback, that is, from how developers modify the output, and integration with actual development tools such as your IDE and CI/CD pipelines, not just standalone processing. I would like to conclude
with some key takeaways. First, we are moving from line-by-line autocomplete to repository-wide reasoning, from simple text generation to structured planning with validation. Pure LLM-based neural approaches are being integrated into neuro-symbolic hybrid systems that combine the best of both worlds. Planning frameworks that decompose complex tasks, AST-based understanding, and multi-context integration are some of the main aspects enabling this transformation. Bottom line: AI-powered large-scale code transformations are no longer theoretical or a distant future. They are achieving production-ready results on real-world repositories right now. This technology is transforming what's possible in software engineering. We're not replacing software developers; we are amplifying them, enabling them to achieve more. We are automating what's repetitive and error-prone so that they can focus on what's creative and strategic. The future of software engineering is at repository scale, with intelligent planning, powered by AI. Thank you.
Read the abstract ➤ https://www.conf42.com/Prompt_Engineering_2025_Janak_Bhalla_code_developers_scale
Other sessions at this event ➤ https://www.conf42.com/prompt2025
Join Discord ➤ https://discord.gg/yQneDJdJGV

Chapters
00:00 Introduction to AI in Software Engineering
00:59 Core Problems in Large-Scale Code Transformations
02:33 Introducing Code Plan: A Symbolic Framework
04:16 How Code Plan Works: Architecture and Techniques
07:56 Deep Dive: Abstract Syntax Trees and Chunking
13:08 Real-World Impact and Future Trends
15:09 Conclusion and Key Takeaways