Okay, thank you everyone. Hello, my name is Luke Sandberg. I'm a software engineer at Vercel working on Turbopack. I've been at Vercel for about six months, which has given me just enough time to come up here on stage and tell you about all the great work I did not do. Prior to my time at Vercel, I was at Google, where I got to work on our internal web toolchains and do weird things like build a TSX-to-Java-bytecode compiler and work on the Closure Compiler.
So when I arrived at Vercel, it was actually kind of like stepping onto another planet: everything was different, and I was pretty surprised by all the things we did on the team and the goals we had. So today I'm going to share a few of the design choices we made in Turbopack and how I think they will let us continue to build on the fantastic performance we already have.
So, to help motivate that, this is our overall design goal. From this you can immediately infer that we probably made some hard choices. Like, what about cold builds? Those are important, but one of our ideas is that you shouldn't be experiencing them at all, and that's what this talk is going to focus on. In the keynote, you heard a little bit about how we leverage incrementality to improve bundling performance. The key idea we have for incrementality is about caching. We want to make every single thing the bundler does cacheable, so that whenever you make a change, we only have to redo work related to that change. Or, to put it another way, the cost of your build should scale with the size or complexity of your change rather than the size or complexity of your application. This is how we can make sure that Turbopack will continue to give developers good performance no matter how many icon libraries you import.
So, to help understand and motivate that idea, let's imagine the world's simplest bundler, which maybe looks like this. Here's our baby bundler. This is maybe a little bit too much code to put on a slide, but it's going to get worse. Here we parse every entry point, follow their imports, and resolve their references recursively throughout the application to find everything you depend on. Then at the end, we simply collect everything each entry point depends on and plop it into an output file. So, hooray, we have a baby bundler.
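To make that concrete, here is a minimal TypeScript sketch of what a bundler like that might look like; the names, the toy parser, and the toy resolver are all illustrative, not Turbopack's actual code.

```ts
// A minimal sketch of the "baby bundler" idea (TypeScript, illustrative names;
// the toy parse and resolve below stand in for a real parser and resolver).
import { readFileSync, writeFileSync } from "node:fs";
import * as path from "node:path";

type Module = { path: string; source: string; imports: string[] };

function parse(filePath: string): Module {
  const source = readFileSync(filePath, "utf8");
  // Stand-in for a real parser: just scrape import specifiers out of the text.
  const imports = [...source.matchAll(/from\s+["'](.+?)["']/g)].map((m) => m[1]);
  return { path: filePath, source, imports };
}

function resolve(specifier: string, importer: string): string {
  // Stand-in for real resolution (node_modules, extensions, aliases, symlinks, ...).
  return path.resolve(path.dirname(importer), specifier);
}

function bundle(entryPoints: string[]): void {
  for (const entry of entryPoints) {
    const modules = new Map<string, Module>();
    const walk = (filePath: string): void => {
      if (modules.has(filePath)) return;
      const mod = parse(filePath); // shared files get re-parsed for every entry point
      modules.set(filePath, mod);
      for (const spec of mod.imports) walk(resolve(spec, filePath));
    };
    walk(entry);
    // Plop everything this entry point depends on into one output file.
    const output = [...modules.values()].map((m) => m.source).join("\n");
    writeFileSync(entry + ".bundle.js", output);
  }
}
```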
Obviously this is naive, but if we think about it from an incremental perspective, no part of it is incremental. We will definitely parse certain files multiple times, depending on how many times you import them. That's terrible. We'll definitely resolve the React import hundreds or thousands of times. So, you know, ouch. If we want this to be at least a little bit more incremental, we need to find a way to avoid redundant work.
So let's add a cache. You might imagine this is our parse function. It's pretty simple, and it's probably kind of the workhorse of our bundler: we read the file contents and hand them off to SWC to give us an AST. Now let's add a cache.
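Roughly, the naive cached version from the slide might look like this TypeScript sketch; the SWC call is stubbed out and the names are illustrative.

```ts
// A rough sketch of the naive cached parse function (TypeScript stand-in;
// in reality the contents are handed to SWC and an AST comes back).
import { readFileSync } from "node:fs";

type Ast = unknown; // whatever the parser returns

const parseCache = new Map<string, Ast>();

function parseFile(fileName: string): Ast {
  const cached = parseCache.get(fileName);
  if (cached !== undefined) return cached;

  const contents = readFileSync(fileName, "utf8");
  const ast = swcParse(contents);
  parseCache.set(fileName, ast); // the file name is the entire cache key -- suspicious!
  return ast;
}

declare function swcParse(source: string): Ast; // placeholder for the real parser
```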
Okay, so this is clearly a nice, simple win. But I'm sure some of you have written caching code before, and maybe there are some problems here. Like, what if the file changes? That is clearly something we care about. And what if the file isn't really a file, but is three symlinks in a trench coat? A lot of package managers will organize dependencies like that. We're also using the file name as a cache key. Is that enough? We're bundling for the client and the server, and the same files end up in both. Does that work? We're also storing the AST and returning it, so now we have to worry about mutations. And then finally, isn't this a really naive way to parse? I know that everyone has massive configurations for the compiler, and some of that has to get in here. So yeah, this is all great feedback, and this is a very naive approach, and to that of course I would say: yeah, this will not work. So what do we do about fixing these problems?
Please fix and make no mistakes.
So, okay, maybe this is a little bit better. You can see here that we have some transforms; we need to do customized things to each file, like maybe down-leveling or implementing use cache. We also have some configuration, and so of course we need to include that in our cache key.
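Sketched in the same TypeScript style as before (the Transform and Config shapes here are assumptions, not the real code), the "fixed" version might look something like this:

```ts
// The "fixed" cache: transforms and config folded into a hand-rolled key.
// Transform and Config are illustrative shapes, not Turbopack's actual types.
import { readFileSync } from "node:fs";

type Ast = unknown;
interface Transform { name: string; apply(ast: Ast): Ast }
interface Config { toJSON(): unknown }

const parseCache = new Map<string, Ast>();

function parseFile(fileName: string, transforms: Transform[], config: Config): Ast {
  // Is this key actually enough? Transform *names* plus a JSON dump of the
  // config -- exactly the kind of hand-written key that's hard to trust.
  const key = [
    fileName,
    transforms.map((t) => t.name).join(","),
    JSON.stringify(config.toJSON()),
  ].join("|");

  const cached = parseCache.get(key);
  if (cached !== undefined) return cached;

  let ast: Ast = swcParse(readFileSync(fileName, "utf8"));
  for (const transform of transforms) ast = transform.apply(ast);
  parseCache.set(key, ast);
  return ast;
}

declare function swcParse(source: string): Ast;
```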
But maybe right away you're suspicious: is this correct? Is it actually enough to identify a transform by its name? I don't know, maybe it has some complicated configuration all of its own. And is this toJSON value actually going to capture everything we care about? Will the developers maintain it? How big will these cache keys be? How many copies of the config will we have? I've actually personally seen code exactly like this, and I find it next to impossible to reason about.
Okay, we also tried to fix this other problem around invalidations. We added a callback API to readFile. This is great: if the file changes, we can just nuke it from the cache, so we won't keep serving stale contents. But this is actually pretty naive, because sure, we need to nuke our cache, but our caller also needs to know that they need to get a new copy. So, okay, let's start threading callbacks. Okay, we did it. We threaded callbacks up through the stack. You can see here that we allow our caller to subscribe to changes, we can just rerun the entire bundle if anything changes, and if a file changes, we call it. Great: we have a reactive bundler.
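Here is a TypeScript sketch of where that callback-threading approach ends up; all of the names are hypothetical and the file watching is heavily simplified.

```ts
// Callbacks threaded through every layer: readFile takes an onChange callback,
// the parse cache nukes its entry and notifies its caller, and the caller's
// only recourse is to rerun the whole bundle. Everything here is hypothetical
// and simplified (real watching is much more involved than fs.watch).
import { readFileSync, watch } from "node:fs";

type Ast = unknown;
const parseCache = new Map<string, Ast>();

function readFile(fileName: string, onChange: () => void): string {
  const watcher = watch(fileName, () => {
    watcher.close();
    onChange();
  });
  return readFileSync(fileName, "utf8");
}

function parseFile(fileName: string, onChange: () => void): Ast {
  const cached = parseCache.get(fileName);
  if (cached !== undefined) return cached;
  const contents = readFile(fileName, () => {
    parseCache.delete(fileName); // nuke the stale entry...
    onChange();                  // ...and tell our caller to ask again
  });
  const ast = swcParse(contents);
  parseCache.set(fileName, ast);
  return ast;
}

function bundle(entryPoints: string[], onAnythingChanged: () => void): void {
  for (const entry of entryPoints) {
    parseFile(entry, onAnythingChanged); // the same callback threaded everywhere
    // ...walk imports, resolve, produce outputs, all from the top every time
  }
}

// The caller "subscribes" by rebundling from scratch whenever anything changes.
const rebuild = (): void => bundle(["src/index.ts"], rebuild);
rebuild();

declare function swcParse(source: string): Ast;
```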
But this is still hardly incremental. If a file changes, we need to walk all the modules again and produce all the output files. We saved a bunch of work by having our parse cache, but this isn't really enough. And then there's all this other redundant work. We definitely want to cache the imports: we might find a file a bunch of times, and we keep needing its imports, so we want to put a cache there. And resolve results are actually pretty complicated, so we should definitely cache those so we can reuse the work we did resolving React.
But okay, now we have another problem: your resolve results change when you update dependencies or add new files. So we need another callback there.
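In the same sketchy TypeScript style, that's yet another hand-rolled cache with yet another callback; the resolution and watching helpers here are placeholders, not real APIs.

```ts
// A resolve cache keyed by (importer directory, specifier), plus a callback
// because results change when dependencies are updated or files are added.
// nodeResolve and watchResolutionInputs are hypothetical placeholders.
import * as path from "node:path";

const resolveCache = new Map<string, string>();

function resolve(specifier: string, importer: string, onChange: () => void): string {
  const key = `${path.dirname(importer)}|${specifier}`;
  const cached = resolveCache.get(key);
  if (cached !== undefined) return cached;

  const resolved = nodeResolve(specifier, importer); // walk node_modules, extensions, ...
  resolveCache.set(key, resolved);
  watchResolutionInputs(specifier, importer, () => {
    resolveCache.delete(key); // a lockfile changed, a file was added, ...
    onChange();
  });
  return resolved;
}

declare function nodeResolve(specifier: string, importer: string): string;
declare function watchResolutionInputs(specifier: string, importer: string, cb: () => void): void;
```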
And we definitely also want to cache the logic to produce outputs, because if you think about an HMR session, you're editing one part of the application, so why are we rewriting all the outputs every time? And oh, you might also delete an output file, so we should probably listen to changes there too. Okay, so maybe we solve all those things, but we still have this problem, which is that every time anything changes, we start from scratch. Kind of the whole control flow of this function doesn't work, because if a single file changes, we'd really want to jump into the middle of that for loop. And then finally, our API to our caller is also hopelessly naive. They probably actually want to know which files changed so they can push updates to the client. So yeah, this approach doesn't really work. And even if we somehow did thread all the callbacks in all these places, do you think you could actually maintain this code? Do you think you could add a new feature to it? I think this would just crash and burn. And to that I would say: yeah.
So once again, what should we do? Just like when you're chatting with an LLM, you actually first need to know what you want, and then you have to be extremely clear about it. So what do we even want? We considered a lot of different approaches, and many people on the team actually had a lot of experience working on bundlers, so we came up with these rough requirements. We definitely want to be able to cache every expensive operation in the bundler. And it should be really easy to do this; you shouldn't get 15 comments on your code review every time you add a new cache. And then, I don't actually really trust developers to write correct cache keys or track dependencies by hand, so we should definitely make this foolproof.
Next, we need to handle changing inputs. This is a big idea in HMR, but it matters even across sessions. Mostly this is going to be files, but it could also be things like config settings, and with the file system cache it actually ends up being things like environment variables too. So we want to be reactive: we want to be able to recompute things as soon as anything changes, and we don't want to thread callbacks everywhere. Finally, we just need to take advantage of modern architectures, be multi-threaded, and just generally be fast.
So maybe you're looking at this set of requirements and some of you are thinking: what does this have to do with a bundler? And to that I would say, of course, my management team is in the room, so we don't really need to talk about that. But really, I'm guessing a lot of you jumped to the much more obvious conclusion: this sounds a lot like signals. And yeah, I am describing a system that works like signals. It's a way to compose computations and track dependencies, with some amount of automatic memoization. I should note that we drew inspiration from all sorts of systems, especially the Rust compiler and a system called Salsa. There's even academic literature on this concept, called Adapton, if you're interested. Okay, so let's see what this looks like in practice, and then we're going to take a very jarring jump from code samples in JavaScript to Rust.
So here's an example of the infrastructure we built. A turbo function is a cached unit of work in our compiler. Once you annotate a function like this, we can track it: we can construct a cache key out of its parameters, and that allows us to both cache it and re-execute it when we need to. These Vc types here you can think of like signals; this is a reactive value. Vc stands for "value cell", but "signal" might be a slightly better name. When you declare a parameter like this, you're saying: this might change, and I want to re-execute when it changes. And how do we know that? We read these values via await. Once you await a reactive value like this, we automatically track the dependency. And then finally, of course, we do the actual computation we wanted to do, and we store it in a cell. Because we've automatically tracked dependencies, we know that this function depends on both the contents of the file and the value of the config.
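The code on the slide is Rust, but as a hypothetical TypeScript analogue of the same idea, it might look roughly like this; Vc, read, turboFunction, and swcParse here are illustrative stand-ins, not the real turbo tasks API.

```ts
// A TypeScript analogue of a turbo function (illustrative only; the real thing
// is a Rust function annotated so the framework can cache and re-execute it).
interface Vc<T> { readonly __value: T } // opaque reactive handle: a "value cell"

type ParseOptions = { jsx: boolean };
type Ast = unknown;

// Awaiting a reactive value reads it *and* records a dependency edge back to
// the currently executing task.
declare function read<T>(vc: Vc<T>): Promise<T>;

// Wrapping a function makes it a cached unit of work: its arguments form the
// cache key, and its result is stored in a cell and diffed against the old value.
declare function turboFunction<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>
): (...args: A) => Vc<R>;

const parse = turboFunction(async (file: Vc<string>, config: Vc<ParseOptions>) => {
  const contents = await read(file);  // dependency on the file contents
  const options = await read(config); // dependency on the config value
  return swcParse(contents, options); // the actual work; the result lands in a cell
});

declare function swcParse(source: string, options: ParseOptions): Ast;
```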
And every time we store a new result into the cell, we can compare it with the previous one, and if it's changed, we can propagate notifications to everyone who's read that value. This concept of change is key to our approach to incrementality. Again, the simplest case is right here: if the file changes, Turbopack will observe that, invalidate this function execution, and re-execute it immediately. And if we happen to produce the same AST, we'll just stop right there, because we computed the same cell. Now, for parsing a file, there's hardly any edit you can make that doesn't actually change the AST, but we can leverage the fundamental composability of turbo functions to take this further.
further. So here we see another turbopac
cache function
uh extracting imports from a module. Uh
you know you can imagine this is like a
very common task we have in the bundler.
We need to extract imports just to
actually find all the modules in your
application. Uh we leverage them to pick
the best way to group modules uh
together into chunks. And of course the
import graph uh is important to basic
tasks like tree shaking.
Um and so because there's so many
different consumers of the imports data,
a cache makes a lot of sense. So this
implementation isn't really special.
This is like what you would find in any
kind of bundler. We walk the a collect
imports into some special data structure
that we like um and then we return them.
But the key idea here is that we stored
them into another cell.
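Continuing the hypothetical TypeScript analogue from above, that second cached function might be sketched like this; the AST helpers are placeholders.

```ts
// A second cached unit of work: read the parsed module, pull out just the
// imports, and store them in their own cell. Downstream consumers (chunking,
// tree shaking) only invalidate when this list actually changes.
// Vc, read, turboFunction, and the AST helpers are illustrative stand-ins.
interface Vc<T> { readonly __value: T }
type Ast = unknown;
declare function read<T>(vc: Vc<T>): Promise<T>;
declare function turboFunction<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>
): (...args: A) => Vc<R>;

const extractImports = turboFunction(async (module: Vc<Ast>) => {
  const ast = await read(module); // dependency on the parsed module
  const imports: string[] = [];
  walkAst(ast, (node) => {
    if (isImportDeclaration(node)) imports.push(importedSpecifier(node));
  });
  // Edit a function body and the AST changes, so this re-runs -- but if the
  // import list comes out identical, nothing that read this cell is invalidated.
  return imports;
});

declare function walkAst(ast: Ast, visit: (node: unknown) => void): void;
declare function isImportDeclaration(node: unknown): boolean;
declare function importedSpecifier(node: unknown): string;
```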
So if the module changes, we do need to rerun this function, because we read it. But if you think about the kinds of changes you make to modules, very few of them actually affect the imports. You change the module, you update a function body, a string literal, any kind of implementation detail: it'll invalidate this function, we'll compute the same set of imports, and then we don't invalidate anything that has read this. If you think about this in an HMR session, it means we do need to reparse your file, but we really don't need to think about chunking decisions anymore, and we don't need to think about any tree shaking results, because we know those didn't change. So we can immediately jump from parsing the file and doing this simple analysis right to producing outputs. And this is one of the ways we get really fast refresh times.
So, this is pretty imperative. Another way to think about this basic idea is as a graph of nodes. Here on the left, you might imagine a cold build. Initially, we actually do have to read every file, parse them all, and analyze all the imports, and as a side effect of that, we've collected all the dependency information from your application. Then, when something changes, we can leverage that dependency graph we built up to propagate invalidations back up the stack and re-execute turbo functions. If they produce the same value, we stop there; otherwise, we keep propagating the invalidation.
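As a rough sketch of that propagation rule (hypothetical and heavily simplified; the real scheduler is multi-threaded and far more sophisticated):

```ts
// Invalidation propagation, sketched: re-execute a dirty task, diff the new
// cell value against the old one, and only wake the tasks that read it if the
// value actually changed. Task, deepEqual, etc. are illustrative stand-ins.
type TaskId = number;

interface Task {
  execute(): unknown;       // re-run the cached function body
  dependents: Set<TaskId>;  // tasks that read this task's cell
  value: unknown;           // current cell contents
}

function propagateInvalidation(tasks: Map<TaskId, Task>, dirty: TaskId[]): void {
  const queue = [...dirty];
  while (queue.length > 0) {
    const id = queue.shift()!;
    const task = tasks.get(id);
    if (task === undefined) continue;
    const next = task.execute();
    if (deepEqual(next, task.value)) continue; // same value: propagation stops here
    task.value = next;
    queue.push(...task.dependents); // changed: everyone who read this cell must re-run
  }
}

declare function deepEqual(a: unknown, b: unknown): boolean;
```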
So, great. This is actually kind of a massive oversimplification of what we're doing in practice, as you might imagine. In Turbopack today, there are around 2,500 different turbo functions, and in a typical build we might have literally millions of different tasks. So it really looks maybe a little bit more like this. Now, I don't really expect you to be able to read this; I couldn't really fit it on the slide. So maybe we should zoom out.
Okay, so that is not obviously helpful. In reality, we do have better ways to track and visualize what's happening inside of Turbopack, but fundamentally those work by throwing out the vast majority of the dependency information. Now, I'm guessing that some of you actually have experience working with signals, maybe bad experiences. I, for one, actually like stack traces and being able to step into and out of functions in a debugger. So maybe you're suspicious that this is a complete panacea; it obviously comes with trade-offs. And to that I would of course say... well, what I'd actually say is that all of software engineering is about managing trade-offs. We're not always solving problems exactly; we're really picking new sets of trade-offs to deliver value.
So, to achieve our design goals around incremental builds in Turbopack, we put kind of all our chips on this incremental, reactive programming model. And this, of course, had some very natural consequences. Maybe we actually really did solve the problem of hand-rolled caching systems and cumbersome invalidation logic; in exchange, we have to manage some complicated caching infrastructure. Of course, that sounds like a really good trade-off to me. I like complicated caching infrastructure. But we all have to live with the consequences.
The first, of course, is just the core overheads of this system. If you think about it, in a given build or HMR session, you're not really changing very much. We track all the dependency information between every import and every resolve result in your application, but you're only going to actually change a few of them. So most of the dependency information we collect is never actually needed. To manage this, we've had to focus a lot on improving the performance of this caching layer, to drive the overheads down and let our system scale to larger and larger applications.
The next and most obvious is simply memory. Caches are always fundamentally a time-versus-memory trade-off, and ours doesn't really do anything different there. Our simple goal is that the cache size should scale linearly with the size of your application, but again, we have to be careful about overheads.
This next one is a little subtle. We have lots of algorithms in the bundler, as you might expect, and some of them require understanding something global about your application. Well, that's a problem, because any time you depend on global information, it means any change might invalidate that operation. So we have to be careful about how we design these algorithms and compose things carefully so that we can preserve incrementality.
And finally, this one's maybe a bit of a personal gripe: everything is async in Turbopack. This is great for horizontal scalability, but once again it harms our fundamental debugging and performance-profiling goals. I'm sure a lot of you have experience debugging async code in the Chrome DevTools, and that is generally a pretty nice experience. Not always ideal, but I assure you, Rust with LLDB is light years behind. So to manage that, we've had to invest in custom visualization, instrumentation, and tracing tools. And look at that: another infrastructure project that isn't a bundler.
Okay, so let's take a look and see if we made the right bet. At Vercel, we have a very large production application. We think it's maybe one of the largest in the world, but we don't really know. It does have around 80,000 modules in it. So let's take a look at how Turbopack does on it for fast refresh. We really dominate what Webpack is able to deliver, but this is kind of old news; Turbopack for dev has been out for a while, and I really hope everyone is at least using it in development. The new thing here today, of course, is that builds are stable. So let's look at a build. Here you can see a substantial win over Webpack for this application. This particular build is actually running with our new experimental file system caching layer, so about 16 of those 94 seconds is just flushing the cache out at the end. This is something we're going to be working on improving as file system caching becomes stable. But of course, the thing about cold builds is that they're cold: nothing's incremental. So let's take a look at an actual warm build. Using the cache from the cold build, we can see this. This is just a peek at where we are today. Because we have this fine-grained caching system, we can actually just write out the cache to disk, and then on the next build, read it back in, figure out what changed, and finish the build. Okay, so this looks pretty good, but a lot of you are thinking, well, maybe I personally don't have the largest Next.js application in the world.
So let's take a look at a smaller example. The React.dev website is quite a bit smaller. It's also kind of interesting because, unsurprisingly, it's an early adopter of the React Compiler, and the React Compiler is implemented in Babel. This is kind of a problem for our approach, because it means that for every file in the application, we need to ask Babel to process it. And fundamentally, I would say we, or at least I, can't make the React Compiler faster. It's not my job; my job is Turbopack. But we can figure out exactly when to call it. So looking at fast refresh times, I was actually a little disappointed with this result. It turns out that about 130 of those 140 milliseconds is the React Compiler, and both Turbopack and Webpack are doing that work. But with Turbopack, after the React Compiler has processed the change, we can see, oh, the imports didn't change, chuck it into the output, and keep going.
Once again, on cold builds, we see this kind of consistent 3x win. And just to be clear, this is on my machine. But again, there's no incrementality on a cold build. In a warm build, we see this much better time. With the warm build, we already have the cache on disk; all we need to do, basically, once we start, is figure out what files in the application changed, re-execute those jobs, and then reuse everything else from the previous build. So the basic question is: are we turbo yet? Yes. This was discussed in the keynote, of course: Turbopack is stable as of Next.js 16, and we're even the default bundler for Next.js. So, you know, mission accomplished. You're welcome.
[Applause]
And if you noticed that revert, revert, revert thing in the keynote, that was me trying to make Turbopack the default. It only took three tries. But what I really want to leave you with, again, is this: we're not done. We still have a lot to do on performance and on finishing the swing on the file system caching layer. I suggest you all try it out in dev. And that is it. Thank you so much. Please find me and ask me questions.