A lot of people love Notebook LM, but if
you've watched this channel for a while,
then you'll notice that I've never
covered it. Although it's a cool tool,
as an AI power user, I always have this
need to tinker and customize everything
behind the curtain. And I always like to
be able to tap into the knowledge or
resources of whatever platform I'm
building on from APIs, MCP servers, so I
can get more leverage in other
applications. So instead of waiting for
the notebook LM team to build their
product exactly the way that I wanted
it, I took matters into my own hands.
[music] With the help of Claude Code, I
made my own NotebookLM, but on steroids.
Meaning I was able to replicate every existing piece of functionality that's
already out there and, on top of that, add
things like an MCP server, an API, and a
lot of other hidden features that people
have been asking for for months. But let
me tell you, this was not
straightforward. So, in this video, I'm
going to walk you through my version of
the tool, the exact design thinking
principles I applied, the architecture
decisions, and everything that you need
to see to be able to replicate this on
your own. And even if you're not
technical, trust me, you're going to
want to watch this whole video. It's on
the longer side, but I can guarantee
that you're going to learn a ton. So,
with that, let's get into it. So, this
is my version of Notebook LM Reimagined.
And if you can see the home screen here,
you have the ability to go back to
pre-existing notebooks. So, just like
you would with the normal notebook LM,
you can go to create a new notebook
right here. Add some form of emoji. I'll
call this demo. And we'll click on
create notebook. And then this will
spawn, as you will be familiar with, all
the features that you like on top of
some new ones. So, you have the ability
to create a multi-person podcast, one of
the more famous features of Notebook LM,
the ability to create videos, to do deep
research, to upload sources, and have
those sources auto-RAG'd, and then from
those sources create different
flashcards, quizzes, all the things that
you love and know about NotebookLM if
you've used the product. But instead of
me yapping about it, let me show you. So
let's go to one of our existing ones
called AI research notes. So you'll see
here if I drag this over, this has a
text about machine learning. So if I go
to something like the audio podcast and
I go to brief summary and I click on
something like generate here, this will
go and generate audio. I'll walk you
through exactly what it's generating
from where shortly, but this will take
around 10 to 15 seconds for the brief
version and come back with the famous
multi-person podcast. And when we get
back a response, we get something like
this where you can see a script back and
forth between Alex and Sam. And if we
click on this, you should be able to
hear the back and forth conversation.
>> Hey Sam, welcome back to the podcast.
Today, I wanted to dive into a term we
hear all the time, but maybe don't fully
get. machine learning.
>> Oh, perfect. I feel like it's
everywhere. My streaming service uses
it. My phone uses it.
>> You get the idea. So, the audio podcast
works all fine. On the video side, you
can generate an explainer video,
documentary, presentation. Here's the
history where you can pull up prior
videos. I'll just mute this right here.
Or yeah, let me give that a shot. I'll
just put this on no volume. You can see
generate small micro videos to longer
videos. And then just like the rest, you
have the ability to do deep research,
fast research using Gemini's actual
APIs. If we go to view right here,
you'll be able to see prior versions of
the research such as this very
sophisticated query I sent. Why is
machine learning so cool? It comes
back with sources. There's fast
mode. There's deep mode. On the study
materials, this section took quite a
while, but you can click on flashcards,
quiz, study guide. If I click on flash
cards, it will ask me: do you want basic,
intermediate, or advanced? Any specific
focus area? Number of items? So I can say,
like, give me 15. Let's do the intro to
machine learning course. We'll click on
generate. It will go and actually put
together the flash cards. And when it's
ready, it will pop it up at the left
hand side or the right hand side of the
screen. And you'll see I can just go
through all of my cards. I can expand
the screen. I can download them. I can
use keyboard shortcuts to go through all
the functionality that you'd want. And
then if you want a mind map, then we can
upload even more resources here if we
want. And then when we generate the mind
map again, we can customize it to our
needs. When it's ready to go, it will
use, behind the scenes, basically Mermaid
to allow us to pop this up, zoom in,
look at all the relationships, and very
similar to what you'd see elsewhere. And
then you'd go to, let's say, quiz to put this
together. If you say basic and click on
generate, you get an interactive quiz
that you can go through. Go to the next
question. I'm going to yolo this
response. Okay. Clearly I'm incompetent.
So I will go to FAQ now. Click on
generate. This will also seemingly
[snorts] work the same way. This will
pop up on the right hand side of the
screen. And you get the idea. So you'll
see all these FAQs generated. And the
beauty is everything's really fast. And
the reason why it's fast is the way we
built it. Just to round things off: data
tables, reports, slide decks. If I
click on slide deck, this will not just
create a normal slide by slide
breakdown, but it will also allow you to
download it as a PowerPoint file. And
you can see right here we have slide by
slide, agendas, and bullets. And if we
want, we can click on download as
PowerPoint. Right now, the way I've
configured it is not to be overly
beautiful, but it works, it's well
structured, and it gives you the
foundation for actually building all
this. It even renders the markdown.
And beyond that we do have things like
infographics which will use the nano
banana API. We can pick an infographic
or digital illustration style. Let's say
yeah, let's do digital illustration.
We'll do colorful and bold.
We'll click on generate and then this
will ping basically the API responsible
for generating images and it will come
back with the payload. And there we go.
We get not one but four different images
that we can download at the same time,
expand, go through one by one and it's
very colorful. I definitely didn't know
it was going to be this colorful, but it
works. But beyond that, there's more.
Obviously, I don't want you to think I'm
full of it. So, let's actually send over
a request to see how it works. If I do
something like summarize the key points,
this will take whatever's in the sources
and come back with a response grounded
in what those sources say. So you'll see
right here it says machine learning is a
subset of AI that allows systems to
learn and improve from experience. It
has the direct sources. You can click on
the sources, go to the original source,
and if there is a specific snippet, it
should show it here. You can always go
back to different chat histories to go
through your conversations or do
something that's harder to do on the
normal Notebook LM, which is downloading
it in whatever way you want. So, if you
want to download the entire set of
assets, chats, etc. from your personal
notebook LM, then this will export it as
a zip, and you'll be able to navigate
through all the resources. And you can
see right here, it's a 10 megabyte file.
It walks you through all the history, all
the images, everything that you've done.
And the best part is you control
everything. But wait again, there's
more. There's much more. You have the
ability to go to settings and then add a
system behavior where the notebook can take
the persona of a simple explainer, a
critical reviewer. You can have
different preferences on response
length, uh, tone, inline style, etc. And
then the part that I really built this
for, which is the ability to go to your
home screen, go to your settings, and
then go to the API request builder. Then
you can pick whatever operation you
want. Whether it's chatting directly
with the notebook via API, listing the
sources, generating the multi-person
podcast audio externally, generating the
video, deep research, flashcards,
everything you can imagine. And the cool
part is, if we go into any of these, this is an
example of me pinging this exact same
resource asking the following question.
So the question I asked here is: what are
the key insights from this document? I
can just change this to say: what is this
document about?
And then we will execute this
step. This will ping our API that's
actually hosted on a real server and
come back with a response. Is it machine
learning? Yes, it is. It says this
document provides an intro to machine
learning. So there it works and you can
see the citations right here. You can
see the suggested questions that come
after the input tokens as well as the
model used, which in this case is the
Gemini 2.5 flash model. You can use
whatever you want. I just happen to make
certain decisions based on familiarity
with the response patterns of certain
models versus others. And if you want to
use this anywhere else, I made it so
that it creates an HTTP request for
n8n, for Make, for Zapier, and all you
have to do is crank out your own API key
for your own service and then hit it
from wherever you want. And for whatever
reason, if you want to be able to share
the same infrastructure with your team,
a colleague, a partner, this is an
account-based system, meaning you can log
in, log out, create different accounts,
and each account will persist all the
notebooks and assets of those notebooks
within that account. And lastly, the
most important feature, the theme. Right
now, it's on the boring purple, but you
can go to my favorite midnight blue or
the cool crimson. Change your
experience, change your environment, and
go from there. So, now that I've proven
to you that I've recreated and added on
top of an existing platform, all from
scratch using Claude Code, let's get down
to brass tacks as to how I actually put it
together. So, to bring this to reality
with Claude Code, all we had to use was
Supabase, Vercel, the Gemini API, and I
added one more API on top of it for the
video gen because the Veo 3 models from
Gemini are eye-wateringly expensive. So
expensive that I accidentally spent a
hundred bucks just trying to iterate and
build the app. But theoretically, I
designed it this way because all of
these tools have a fairly generous free
tier if you just take the bare-bones
Claude Code plan, the bare-bones Supabase
plan, the bare-bones Vercel plan, and
Gemini's pay-as-you-go models.
Theoretically, you could make this GDPR-
and SOC 2-compliant by swapping out all the
services with, let's say, Amazon Bedrock
to use the Claude models, or Gemini
on Vertex AI in their cloud. Or you could
switch it up to run on local models if
you want. And like you saw, there are
three core ways to interact with my
version of the app, which is not just
the web app that you know and love, but
the API as well as anywhere else like
n8n or Zapier. And just in case you're
not imagining big enough: up until now,
doing RAG in n8n has been possible, and a
lot of people use Supabase or the Gemini
file search. This is probably one of the
most convenient and cool ways to have
RAG externally, since it's all done at
the browser level. Now, the TL;DR of the
app is that it's built on what's called
Next.js, and it sends requests to Vercel,
which is not only hosting the actual
platform itself, but on top of that,
it's hosting all of the different API
endpoints. And if you don't know what an
API endpoint is: API stands for
application programming interface, and an
endpoint just means a URL you can hit to
reach the back end of a service. So in
case that doesn't really register, whenever you use
something like Gemini or ChatGPT on the
front end and you say make me an image
or make me a video of a monkey running,
that behind the scenes is calling
different tools and services. Those
services are the video generators, the
image generators, the audio generators,
etc. And then when you make that
request, it goes and does the dirty work
behind the curtains and brings you back
the result. So what makes builds like
this, and all kinds of things you can do
in Claude Code, powerful is that we now
live in a world where everything is
modular. You can have the foundation
built the way you want. Then you can
port in whatever services that you want
that you feel are best for whatever
you're trying to do. And to store and
support the app, we have Supabase
storing everything in the database. We
have Gemini responsible for the majority
of the RAG, since it's quick, it's
efficient, and most importantly, it's
cheap. And when it comes to storing
everything so you can download it all at
once and export your entire notebook,
this is all stored at the Supabase level,
in Supabase file storage. So diving deeper
into the nitty-gritty: we have Claude Code,
where we're building, like we said, our
version of NotebookLM. We have the Gemini
text-to-speech API for the back-and-forth
podcast. We're using the Alibaba video
model; I use this one because it's 10 times
cheaper, if not more, than the latest
Veo 3.1 model. We have, like I said before,
Supabase, which is amazing as a vector
database and for authentication. And then
we're using Vercel, like I said, for
deployment. And then we're using FastAPI.
This is a framework that lets you create
the services I referred to before, the
ones you can hit from any other application.
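To make that concrete, here's a minimal sketch of what a couple of those endpoints could look like in FastAPI. This isn't the exact code from my build; the route names, models, and the in-memory "database" are all just illustrative.

```python
# Minimal FastAPI sketch of two NotebookLM-style endpoints.
# Illustrative only: route names, models, and storage are hypothetical.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="NotebookLM Reimagined API")

class ChatRequest(BaseModel):
    question: str

# In the real build this lives in Supabase; a dict keeps the sketch self-contained.
NOTEBOOKS = {"demo": {"name": "AI research notes", "sources": ["intro_to_ml.pdf"]}}

@app.get("/api/notebooks")
def list_notebooks():
    # GET: retrieve what's available (the notebooks and their names).
    return [{"id": nb_id, **nb} for nb_id, nb in NOTEBOOKS.items()]

@app.post("/api/notebooks/{notebook_id}/chat")
def chat_with_notebook(notebook_id: str, req: ChatRequest):
    # POST: send a question in, get a grounded answer back.
    # Here you'd call Gemini with the notebook's sources as context.
    return {"notebook_id": notebook_id, "answer": f"(stubbed answer to: {req.question})"}
```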
And if you're a bit more curious on
this, give me a second. Now, the app as
a whole has 50 API endpoints. And these
endpoints include the ability to talk
to, create, list all the notebooks, the
sources, go directly to chat with an
existing notebook, and create all the
assets that I showed you before. And
obviously, I'm not the one who built
this myself. I'm the one who told
Claude, "Okay, let's build the
functionality. Let's use this service,
and then let's create a way that I could
access said service or create said asset
from said service externally." And if we
go back to the app, you'll see that at
the right hand side of the settings tab,
I created this API documentation tab
where you can click on everything from
chat to global chat to audio overviews,
video overviews, the study materials. It
will show you each and every way that
you can interact with this service. And
in case you're not technical, here's the
biggest TL;DR of TL;DRs: when you see
the word POST here, this just means that
you're sending something to the service
and you want to get back some form of
response. A GET basically means that you
are trying to retrieve what's available.
So if you're asking, what notebooks do I
have, what are their names? This is
where a GET request would make
sense. PATCH is really for updating,
which you would rarely use, and DELETE
is deleting.
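And to make the verbs concrete, here's roughly what hitting the service looks like from the outside, whether that's a Python script, n8n, or Make. The base URL, paths, and header are placeholders for your own deployment and API key, and the routes mirror the hypothetical ones from the earlier sketch.

```python
# Calling the deployed API from anywhere (Python here; n8n/Make would do the same over HTTP).
# The URL, paths, and header name are placeholders for your own deployment.
import requests

BASE_URL = "https://your-app.vercel.app/api"        # hypothetical deployment
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # your own key

# GET: what notebooks do I have, and what are their names?
notebooks = requests.get(f"{BASE_URL}/notebooks", headers=HEADERS).json()

# POST: send a question to one notebook and get a grounded answer back.
payload = {"question": "What is this document about?"}
answer = requests.post(
    f"{BASE_URL}/notebooks/{notebooks[0]['id']}/chat",
    headers=HEADERS,
    json=payload,
).json()
print(answer["answer"], answer.get("citations"))
```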
What happens when you send a query or
request? Well, behind the scenes, you ask
a question. Your question goes to the back
end that I've now shown you. The back end
goes through the research papers, any
YouTube videos, anything you've added to
the sources. It sends the context and the
question to Gemini's File Search API, and
it comes back with a response with
citations. Now, to get to the point
where you can actually ask the question,
you obviously have to upload a file. So
when you upload a file, it goes to
Supabase storage, and then it gets
synced and sent to the Gemini Files API.
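Roughly, that upload-and-sync step could look like this, assuming the supabase-py and google-genai SDKs; the bucket name, paths, and environment variables are placeholders.

```python
# Rough sketch of the upload flow: file -> Supabase storage -> Gemini Files API.
# Assumes the supabase-py and google-genai SDKs; bucket, paths, and keys are placeholders.
import os
from supabase import create_client
from google import genai

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])
gemini = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def add_source(notebook_id: str, local_path: str) -> str:
    # 1) Keep the original file in Supabase storage so exports/downloads work later.
    with open(local_path, "rb") as f:
        supabase.storage.from_("sources").upload(
            f"{notebook_id}/{os.path.basename(local_path)}", f.read()
        )

    # 2) Sync the same file to Gemini so it can be used as grounding context.
    uploaded = gemini.files.upload(file=local_path)
    return uploaded.name  # store this handle in your sources table
```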
This basically does RAG on the fly. Now,
you can use whatever RAG you want, but
for my purposes, since I'm already using
Gemini in quite a few areas, I just use
the File Search API as well, and
primarily because I'm lazy: that
API already supports things like PDFs,
DOCX files, and text files, so I didn't
want to have to go the next natural step
and teach Supabase exactly how to handle
those files and how to RAG them. I wanted
to take the path of least resistance. So
what happens is, when the user sends a
query, that query goes and pings the
Gemini File Search API, which has its own
versions of these notebooks. It then
goes through the chunks associated with
those notebooks to find the most
semantically similar, i.e., the closest
match to the vector coming in. So the
query "how does X work" turns into a
vector. That vector is then sent to the
File Search API to look for the closest
matching vectors, and then you get the
response along with the citations
associated with that response.
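The actual build leans on Gemini's File Search tool to do that chunking and vector matching server-side. As a simpler stand-in, here's a sketch that just passes the notebook's uploaded file handles plus the question to Gemini; it approximates the same "query every source, get a grounded answer" behavior.

```python
# Simplified sketch of the chat step. The real build uses Gemini's File Search tool,
# which does the chunking and vector matching server-side; this approximation just
# passes the notebook's uploaded files plus the question to the model.
import os
from google import genai

gemini = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def chat_with_notebook(file_handles: list, question: str, persona: str = "") -> str:
    # file_handles are the objects returned by gemini.files.upload() for every source
    # in the notebook, so the answer is grounded in all of them at once.
    prompt = (
        f"{persona}\n"
        "Answer using only the attached sources and cite them.\n\n"
        f"Question: {question}"
    )
    response = gemini.models.generate_content(
        model="gemini-2.5-flash",
        contents=file_handles + [prompt],
    )
    return response.text
```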
key trick here is that because NotebookLM
allows you to ask a question across all
the sources in the notebook, we had to
make sure that when we send a question
from the user interface, it goes and
queries and checks all the knowledge
sources in that UI. And this is really
where the devil in the details comes in,
and where having something like Claude
Code as your companion can help push you
through all of these conceptual barriers.
Now, how did I make the audio overview
podcasting work? Am I a prodigious
genius? Absolutely not. What I did do is
take advantage of the fact that Gemini
has a text-to-speech API that allows for
multi-speaker output. So step one is,
when we send the request, it looks at
the sources that we have. It then injects
the context of those sources and creates
a script with Gemini 2.5 Pro. You can use
whatever you want. It then creates the
speech in multi-speaker mode using Gemini
text-to-speech, which I think also uses
2.5 as the base, and then it creates an
MP3 audio file that we render on the user
interface itself.
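For reference, here's roughly what that multi-speaker TTS call looks like with the google-genai SDK. The model name, voices, and config shape follow the public TTS docs, but treat it as a sketch and double-check the current documentation.

```python
# Sketch of the multi-speaker TTS step, assuming the google-genai SDK.
# Model name, voices, and config shape follow the public TTS docs; verify against current docs.
import os
from google import genai
from google.genai import types

gemini = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def podcast_audio(script: str) -> bytes:
    # `script` is the Alex/Sam dialogue that Gemini 2.5 Pro wrote from the sources.
    response = gemini.models.generate_content(
        model="gemini-2.5-flash-preview-tts",
        contents=script,
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"],
            speech_config=types.SpeechConfig(
                multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                    speaker_voice_configs=[
                        types.SpeakerVoiceConfig(
                            speaker="Alex",
                            voice_config=types.VoiceConfig(
                                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
                            ),
                        ),
                        types.SpeakerVoiceConfig(
                            speaker="Sam",
                            voice_config=types.VoiceConfig(
                                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck")
                            ),
                        ),
                    ]
                )
            ),
        ),
    )
    # The API returns raw PCM audio; the app wraps/converts it (e.g., to WAV or MP3) before rendering.
    return response.candidates[0].content.parts[0].inline_data.data
```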
And like we said before, we have a
deep-dive version, a brief-summary
version, and then a debate mode where the
two hosts just go at it. For the video,
even using the Alibaba model at scale,
especially if you want to do a 10-second
video or five of them, it'll still cost
you three, four, five bucks, but it's way
cheaper than Veo, which will cost you an
arm and a leg. So, the way this one
works is, again, we use 2.5 Pro to
create the description of the scenes.
Now, it's 5 seconds or 10 seconds, so we
don't have too many scenes here. And
then it goes and sends it to Atlas
Cloud. That's where the Wan 2.5 model
from Alibaba is hosted. Again, you can
swap this out for whatever API you want.
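As a sketch of that two-phase flow, this is the general shape: Gemini writes the scene description, then a hosted video model turns it into an MP4. The video endpoint and payload below are made-up placeholders, not Atlas Cloud's real API, so swap in your provider's actual documentation.

```python
# Sketch of the video flow: scene description with Gemini, then a hosted video model.
# The video endpoint and payload here are hypothetical placeholders; use your provider's real docs.
import os
import requests
from google import genai

gemini = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def generate_explainer_video(source_summary: str) -> bytes:
    # Phase 1: turn the notebook's sources into a short scene description.
    scenes = gemini.models.generate_content(
        model="gemini-2.5-pro",
        contents=f"Write a 10-second explainer video scene description for:\n{source_summary}",
    ).text

    # Phase 2: send that prompt to whatever hosted video model you've picked
    # (Wan 2.5 on Atlas Cloud in my case); this URL and payload are made up.
    resp = requests.post(
        "https://api.example-video-host.com/v1/generate",
        headers={"Authorization": f"Bearer {os.environ['VIDEO_API_KEY']}"},
        json={"prompt": scenes, "duration_seconds": 10},
        timeout=600,
    )
    return resp.content  # MP4 bytes, which the app then stores in Supabase storage
```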
All you'd have to do is give it the
documentation of that API, throw it into
Claude, say go and swap Wan for this
other one (pun intended), and then go
from there. And then, unlike before where
we had an MP3, the output this time is an
MP4. And to render it on screen, this is
why it's stored in Supabase storage,
which also allows you to share it as
well. On the deep research side, like we
said before, you could send whatever
query you want. So you could say, who is
Mark Kashef? And I could say research.
This will go and do its thing. And
behind the scenes, this is what it's
doing. It has the Gemini 2.0 flash fast
mode. You could update this to the
latest model, Gemini 3. Whatever you
want. This is just super cheap, and I
wanted to prove out the concept more so
than worry about the quality, per se, or
you could use the Gemini 2.5 Pro deep
research mode, which is markedly more
expensive. So, be careful with that one,
too. Behind the scenes, you as the user
send a query. That query is sent to the
backend. The backend then sends it to the
Gemini API to generate a report and the
citations associated with it, pulls that
from the API, pushes it over to Supabase,
and then displays it on your user
interface at the end.
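A minimal sketch of the fast-research call might look like this, assuming the google-genai SDK's Google Search grounding tool; the prompt and model choice are just examples.

```python
# Sketch of the fast research mode: one Gemini call with Google Search grounding.
# Assumes the google-genai SDK's search tool config; model choice is up to you.
import os
from google import genai
from google.genai import types

gemini = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def fast_research(query: str) -> dict:
    response = gemini.models.generate_content(
        model="gemini-2.0-flash",
        contents=f"Research this and write a short cited report: {query}",
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    # The report text goes to Supabase; the grounding metadata holds the source links.
    return {
        "report": response.text,
        "sources": response.candidates[0].grounding_metadata,
    }
```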
And this is where you see the response
back in chat when you come back with a
response. Now the study materials, how
did we take care of these? So the
flashcards, the quiz, all of these I had
to target one functionality at a time.
That's the key thing with vibe coding
that a lot of people miss, especially
when they talk about all these
loosey-goosey frameworks to one-shot a
whole app. You don't one-shot this. You build
this incrementally, one feature at a
time. As you build each feature,
technically each feature is a separate
chat because each one needed a little
back and forth to perfect the way it
behaves, the way it works, which API
it's using, how fast it was, how it
rendered on screen. These are all
details that matter. Which is why when
people go on X or YouTube and say, "I
can build this whole thing in like 10
minutes with one mega prompt and run
Claude autonomously on its own." Odds
are you won't get the level of detail or
output or quality that you're looking
for. So for the study materials
generation, we're using 2.0 Flash across
the board. And starting off with flash
cards, by default, we're generating 10
flash cards. And once we have all the
content and the raw content, assuming
that there's a custom AI personality, it
will take that into account, which is
why this whole interoperability of the
app is important. If we set a notebook
setting for it to be more authoritative
or simple, that should be injected in
most of the features outputs. So then if
there is a custom personality, that's
injected into the prompt. And then we
get the flash cards and then usage
stats: basically, if you pass or fail
guessing it, it will document that and
then store it in Supabase. So all
these functionalities here are basically
doing the same thing, where they inject
all the context in memory, within the
context of creating the flashcards or
the quiz, and send it to different
system prompts. The one for the quiz will
say: go and make X number of questions
and answers based on all this material.
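As an example of that pattern, here's a hedged sketch of the flashcard generator: one prompt with the difficulty, count, and persona injected, and structured JSON coming back. The field names and prompt wording are illustrative.

```python
# Sketch of the study-material generation: one prompt, structured JSON back.
# Assumes the google-genai SDK; the prompt wording and fields are illustrative.
import json
import os
from google import genai
from google.genai import types

gemini = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def generate_flashcards(source_text: str, count: int = 10, level: str = "basic", persona: str = "") -> list:
    prompt = (
        f"{persona}\n"
        f"Create {count} {level} flashcards from the material below. "
        'Return JSON: [{"front": "...", "back": "..."}].\n\n'
        f"{source_text}"
    )
    response = gemini.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt,
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return json.loads(response.text)  # stored in Supabase, then rendered as cards in the UI
```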
It will then inject all that material
inside. Meaning there is an upper bound,
the way I built it, on how many sources
you can add, because Gemini as of today
has a 1-million-token context window. If
you have, I don't know, 500 or 700 pages,
you will start to hit the limit of that
context. Now, are there ways around that,
especially engineering ways? Absolutely.
I wanted to build a foundation and give
you the foundation so you can do
whatever you want with it. Now the
second-to-last set of features is the
creative outputs, which are, again, the
data table export, reports, slide decks,
and infographics. And the way this works
behind the scenes is: you have the core
sources, and then you have the Gemini
API, and we always output the result as
JSON; from that JSON, we transform it
into whatever the end state needs to be.
This applies
to the PowerPoint and the report
primarily because those need to be
transformed from JSON into a PowerPoint
file and a DOCX file. The infographic
from Gemini comes back as a JPEG which
we render on screen as a PNG. So you
don't really have to worry about that.
And the data table renders as an Excel
CSV file. So phase 2 is taking that data,
and I chose to use free JavaScript
libraries to create the DOCX and
PowerPoint files.
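My build does this step with free JavaScript libraries, but to show the idea in one place, here's the same JSON-to-PowerPoint transformation sketched as a Python analogue with python-pptx; the deck structure is the hypothetical one described below.

```python
# The build uses free JavaScript libraries for this step; here's the same idea as a
# Python analogue with python-pptx, just to show the JSON -> PPTX transformation.
from pptx import Presentation

def deck_json_to_pptx(deck: dict, out_path: str) -> None:
    # deck looks roughly like: {"title": "...", "slides": [{"title": "...", "bullets": ["..."]}]}
    prs = Presentation()

    title_slide = prs.slides.add_slide(prs.slide_layouts[0])
    title_slide.shapes.title.text = deck["title"]

    for slide_data in deck["slides"]:
        slide = prs.slides.add_slide(prs.slide_layouts[1])  # title + content layout
        slide.shapes.title.text = slide_data["title"]
        body = slide.placeholders[1].text_frame
        for i, bullet in enumerate(slide_data["bullets"]):
            para = body.paragraphs[0] if i == 0 else body.add_paragraph()
            para.text = bullet

    prs.save(out_path)  # this file is what gets stored in Supabase and downloaded
```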
But here's where you could add your own
flavor: use a Claude skill, like the
PowerPoint skill or the Excel skill, and
the API associated with it to really make
these files beautiful and powerful. And
last but not least, once it's stored in
Supabase, we want to make sure that we
can actually auto-download it from the
browser itself. And if you're non-technical,
this is what's happening behind the
scenes. The user clicks generate. This
creates the slide deck, which comes back
as title, sections, and slides in the
JSON itself, just data. And then we
store that in Supabase, and then we go
and create the PowerPoint, DOCX, and CSV
files from it, in a format that you can
download onto your computer. As you can
see here,
there are multiple passes to go from
click to output. And last but not least,
when you want to configure the settings
of the notebook to be simple explainer,
critical reviewer, you want to set the
preferences, a lot of this is prompt
manipulation. So if you go to this part
of the screen, no persona is just
business as usual. Critical reviewer
questions assumptions, finds weaknesses,
basically pushes back on you. Simple
explainer, self-explanatory. Technical
expert is going to be more on the
technical side of things. Creative
thinker probably what I would choose.
And then custom is where you can write
your own custom instructions that apply
to the response that you get back from
the Gemini File Search API. So the output
preferences, once again, are pass-through
parameters in the system prompt. So when
we go here, anytime you trigger an API
call, this will build the persona
instructions and then inject them into
any API call we make to Gemini, so that
when we ask a question, we're always
injecting this into the payload of the
call to that service. So as long as these
custom instructions persist, this keeps
getting injected.
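In practice, that persona injection is just string assembly before every model call. Here's a small sketch of what that helper might look like; the setting names and persona texts are made up for illustration.

```python
# Sketch of the persona injection: the saved settings become a prompt prefix that is
# prepended to every Gemini call. Field names here are illustrative.
def build_persona_instructions(settings: dict) -> str:
    personas = {
        "critical_reviewer": "Question assumptions, find weaknesses, and push back.",
        "simple_explainer": "Explain everything as simply as possible.",
        "technical_expert": "Be precise and technical.",
    }
    parts = [personas.get(settings.get("persona", ""), "")]
    if settings.get("custom_instructions"):
        parts.append(settings["custom_instructions"])
    if settings.get("response_length"):
        parts.append(f"Keep responses {settings['response_length']}.")
    return "\n".join(p for p in parts if p)

# Every feature (chat, flashcards, research, ...) then does something like:
# prompt = build_persona_instructions(notebook_settings) + "\n\n" + feature_prompt
```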
So, that's an overview of the app from an architecture standpoint.
Now, I'll go through an example of what
a PRD or a product requirements document
might look like to be able to accomplish
and build this for yourself. And on top
of that, I'll walk through some starter
prompts for each type of functionality
that you might want to be able to think
about. And I'll make available to you
along with all of this, a whole care
package in the second link in the
description below. Now, to make this as
straightforward as possible for you to
take some of these files and recreate
the whole process and really make it
your own, what I did is initially to
create the V1, V2, V5000 of this
project, I went to Perplexity Pro,
specifically the labs feature. I like
labs because you can ask it to go look
for the latest documentation on a series
of different software or APIs and then
tell it to optimize and create a markdown
file that is also optimized for the
latest version of Claude Code. So while
it goes and researches the requirements
of the APIs and services that you need,
it's all also grounded in the latest
version of Claude Code. So it takes that
into account when putting together that
prompt. But naturally, I had to make
many iterations to get to the point of
the demo that I've been showing you. So
what I asked it to do is a bit of a
retroactive exercise, which I honestly
recommend for all of you, whatever it is
you end up building. So, it's one thing
to write an initial plan, but it's
another thing to finish exactly what you
set out to finish, and then you go back
and tell Claude Code something along
these lines. Go through all the code
that you've put together and all the
features that we've implemented ever
since this initial plan. And then you
can tag your initial plan. And the best
part of this is that over time as you
build different projects, every single
time you do this mega reverse prompting
on the project, you not only learn what
you could have done better in the
planning, but even if you're not
technical, you'll start to learn these
concepts by osmosis. So, if we take a
look at this vision document, obviously
there's a lot here, but you'll notice
the way it's designed is it's using
something called ASCII art. The ASCII art
is right here, where it creates these
tables and these diagrams. So instead of
actually having a Mermaid diagram, this
is written in markdown. So Claude can
actually read this by the way. So if you
have a way to communicate how a system
should flow, this in my opinion is one
of the mega hacks that you can do to
convey that to the AI. So in here we
start off saying for Claude, this is how
to use this document. This document is a
three-part specification for building
notebook LM reimagined. Before starting,
ask the user to set up these
prerequisites. So I made this for you so
that if you are less technical this will
ideally hold your hand and tell you okay
go and make a Supabase project, go and
get the authentication token so you can
use the Supabase MCP to make life
easier. Go and set up Vercel, because
Vercel is where you're going to host
the platform as well as all the API
endpoints that we'll hit externally if
you want to be able to use it from
n8n or Zapier. I also tell it to read
the documents in this order. So, first
the vision document, two the project
spec, and three the implementation
guide. Now, one thing that's heavily
underrated is talking to Claude like an
actual human. So, it's one thing to give
it specs. I see people just give it a
grocery list of things to build over
time, but it's a completely other thing
to tell it this is why I want you to
build this, like this is the inspiration
behind it. If you can explain the
context very deeply into a project, you'll
be surprised: it might do something
thoughtful that you didn't expect,
because it understands the core
foundation of the direction and where
you want to go. So it's one thing to say,
I want to build a clone of this app
where I can auto-RAG a series of sources
and then do all these following features,
versus saying, I wish I could give this
product the ability to interact with it
externally, beyond just the browser-based
platform. So here we go through the
design principles. So, API-first always:
meaning for every single piece of
functionality you make, make sure you're
thinking in the back of your head about
how you can make it accessible
externally. And then, in this case, you
can use whatever database you want; for a
lot of vibe coders I usually recommend
Supabase, because its MCP is good enough
when you're building the MVP that it can
take control of building all the tables,
adjusting the tables, and building the
edge functions that you need. And then,
as we go through, this is the
architecture overview, so I'm giving an
example of how we want to be able to
interact with the back end. And this is
the series of gaps that it filled. So
these are all the API endpoints that
it's going through, and it tells you:
API, notebooks, sources, chat, audio, video.
So this makes it abundantly clear
exactly what the end state of this API
should look like. And then you have the
services we'll use, what we're building,
and all the features and sub features.
And then I tell it what we don't want.
So in this case, I didn't want to build
this to run locally. I wanted to build
it so I could run it in the cloud. But
you could totally make a version of this
that's like an open notebook LM where
you could make it run on everything
that's on your desktop. And then it goes
through what the user experience should
look like from a developer standpoint,
for no-code users, and for Gemini, with
the different model references. So here's where we're
mapping all the models to the features.
So you'll see right here, if we want a
fast chat, then we'll use Gemini 2.0
Flash. You could use three, you could
use 2.5, whatever you want. For text to
speech, we're telling it to use this
specific model. For typical operation
costs, this was helped quite a bit
initially by Perplexity Labs, because
it went and searched all the costs of
these services. The philosophy: this
is what NotebookLM is, and then NotebookLM
Reimagined is, here's our platform, build
whatever you want. So basically build a
solid ground that's modular enough that
any one of you can hook up whatever
services or swap whatever services that
you're looking for. And then we go
through and we tell it next steps and
this is also underrated at the end of
the document to tell it what is the next
step since there's some recency bias
there. So this will exit the first
document, and then, let's remove this and
go to the project spec. This will go
through a table of contents: what
the MCP setup should look like, how we
should use AI Studio in terms of the
APIs and all the models that we need to
use, and the Vercel setup. Again, I use
the Vercel MCP to make my life a little
bit easier. You go, you create the
account, you hook up the MCP with one
line. If you don't know how to do the
one-line setup, all you have to do is go
to something like Perplexity and say: can
you give me the one-line installation
command that I can use to install the
Vercel MCP, and then give me any
extra links I would need to go
grab whatever tokens or whatever
parameters I need to fill into that one
line. So, we can make it a one-shot
operation. So this tells whatever
research platform you want to go and
look through the documentation to come
back with that command and ideally it
should write in caps what you need to
swap out. So these are, nine times out of
ten, actual API keys or tokens. So you can
see here it tells you to grab your
Vercel token, and it tells you this is
the command that you paste. It would be
this whole thing right here, and then you
would sub in the token that we grab from
here. And it tells you what you need to
fill in, where you can fill it in from,
and the format you need to fill it in,
and then you should be good to go from
there. So, you copy that once you have
those credentials. You put it in your
Claude Code. It won't work right away.
You'll have to restart Claude Code or
start a brand-new session, then do
/mcp, and then make sure that it's
there. And beyond that, we get a glimpse
of the system architecture, the
directory structure. This is really
advanced. This basically is telling it
how to organize everything. Again, much
easier to be retroactive than proactive.
But once you see this, again, especially
if you're not a technical person, you
can start to see the logic of how the AI
likes to categorize different, in this
case, Python scripts, which are a proxy,
a direct proxy for functionality. So,
when it comes to routers, you'll see
that all of these endpoints like
creating the notebook, the sources, the
chat audio, all of them are under this
routers folder. So, you can start to see
how even the front end is organized. So
even if you have never been able to
appreciate this as a non-technical
person, you can start to really learn
from how this is applied. This next
section tells it how to interact with the
Supabase MCP. You can override these
features. This is the way that I built
my table. You can make it your own. And
then the rest of this are a series of
specs that you can read because again
I'll be giving this to you. And last but
not least, we have the implementation
guide which basically tells it what the
prerequisites are if you want to get
started. I made it so that it encourages
you to use the Supabase MCP and the
Vercel MCP; if you want to use different
services, that's completely up to your
heart's desire. And then it asks you for a Gemini
API key. And lastly, it asks you if
you're interested in video, if you want
to go to Atlas Cloud since it has a much
cheaper video model. It's not as good as
Gemini Veo, but infinitely cheaper. You
can swap in whatever you want. And then
this goes through, again, a series of
specs for all the Supabase stuff that
needs to be built, then the implementation
order. Again, if you are a developer and
you understand what you're looking for,
this is where you'd want to either
ignore this, remove this, swap it in for
what you think would be useful, then the
rest of this gives the rest of the
foundation it needs to do what it needs
to do. And most importantly, it gives it
this checklist. Because the context
window of Claude Code maxes out so often,
especially when you're using an MCP
where it's taking tons of tokens to
write tables, get feedback, and search,
it's good to have this checklist where
it's not just a checklist for
checklist's sake. You make it literally
check off everything it finishes. And
this is helpful because even as you
compact conversations, if it's checked
off phase one and a part of phase three,
like up until here, you'll be able to
tell it go and refer back to the
implementation guide and pick up where
you left off. And then it should see
that these are the remaining ones for
phase 2 and then go from there. So now
that we've seen the demo, we have a
decent understanding high level of how
everything works. We've seen the project
requirements documents that you'd need
to go on this journey. What are some
good best practices for prompting? Once
again, I got you covered. So all you'd
have to do is theoretically take those
three files, the implementation guide,
the vision document, and the other one.
Take those and put them into a blank-slate,
brand-new Claude Code folder. And you
could use this in whatever IDE you choose:
you could use Cursor, you could use
Antigravity. And then you would run
/init. /init will just push Claude to
read those specs and those guides and
create a summary for itself called
CLAUDE.md. Once we have CLAUDE.md, then
you can run a prompt similar to this,
where you say: execute the implementation
plan in CLAUDE.md from start to finish;
you have full autonomy to set up Supabase
with the Supabase MCP. Now, I tell it
this because Claude will usually warn
you. If you're already paying for
Supabase, then it will say this will
cost you 10 bucks. If you're a brand-new
user and it's one of your first couple of
databases, I believe it is free. They
have a decently generous free tier. And
then it tells it to create the front-end
scaffolding. Basically, build the
framing of the house before you put in
the furniture and build everything else.
Build the foundation, implement the core
features, test by interacting with
localhost. One thing you can also say is:
now that we have an updated version of
Claude Code where it can use Claude in
Chrome, meaning it uses this extension
you'll see right here to interact with
localhost, you can make it go and check
its own work. So essentially, you let it run
for a more autonomous period of time to
go through look at your specs, compare
what it sees on screen, and this will
save you a lot of back and forth. And
one very important thing that most
people don't care about, especially if
you're not a dev, is tell it to commit
progress incrementally. Ideally, it
shouldn't take you too much time: install
Git, make a GitHub account, install the
GitHub app on Claude so that you can
create a project, and incrementally keep
committing things as you progress
because things will break. And sometimes
you'll get to a point where 80% of the
stuff you want is there, but it's been
built wrong. So, the last 20% isn't even
possible. So the more that you have
checkpoints, the higher the likelihood
that you'll be able to rewind, go back
in time, and then build the right way.
For the database and backend, you can
say design and implement the database
schema for insert X feature. And these
are the requirements. Create tables with
proper foreign keys and indexes. If you
don't know what these words mean in
plain English, a foreign key allows you
to create relationships between
different tables. Because if you have
one table for the notebook and one table
for the audio created for that notebook,
ideally there should be some relationship
between them, so that if there's an API
call you're trying to make that needs to
marry them in some way, it's actually
possible.
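To make the foreign key idea concrete, here's a small example schema, shown as a SQL migration string you might run in the Supabase SQL editor or hand to the Supabase MCP. The table and column names are hypothetical.

```python
# A concrete example of the foreign key idea: the audio table points back at the
# notebook it belongs to. Table/column names are hypothetical; you'd run this SQL
# via the Supabase SQL editor or let the Supabase MCP create it for you.
SCHEMA_SQL = """
create table notebooks (
    id uuid primary key default gen_random_uuid(),
    name text not null,
    created_at timestamptz default now()
);

create table audio_overviews (
    id uuid primary key default gen_random_uuid(),
    notebook_id uuid not null references notebooks (id) on delete cascade,  -- the foreign key
    style text,           -- e.g. 'brief', 'deep dive', 'debate'
    storage_path text,    -- where the MP3 lives in Supabase storage
    created_at timestamptz default now()
);

-- The index makes "give me all audio for this notebook" fast.
create index on audio_overviews (notebook_id);
"""
```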
The rest of these are slightly more
technical, but again, this will all be
available to you, so you can read
through it after. And then next are
feature requests. So let's say you want
to add newer features and the ones that
I've added myself. You could say
implement feature name with the
following requirements and you could
write a user story. So this is where you
can take on the hat as a product person
and say as a user I want to do X action
so that you get Y benefit. Again the
more human context you can give the
higher the likelihood they will do a
good job. Another thing: if you can, open
brand-new sessions for each feature. Like
I said before, ideally each session
should be one core feature of the app.
You shouldn't set up authentication and
then go and make the UI pretty and then
go and add audio mode all in the same
chat. The context will bleed, things
will mix, things won't work, or things
will take infinitely longer than they
have to. Now, UI-wise, you can bring in
Gemini as a second set of eyes because
it's better at UI. For me, I just brute
forced Claude Code to become better. I
would still say it's not beautiful. Uh,
you could continue and make it better,
but you can say this component feels
cluttered. Simplify the layout. The
loading states are jarring. And if you
don't know what a loading state is,
pretty much when you go through the
different parts of the app, notice here
how it's loading fairly quickly. If I go
to settings and I go to API
documentation, it is almost
instantaneous. When I load a brand new
source, it is pretty quick. It wasn't
quick for most of this build. It was
actually very slow. And sometimes it
would show different parts like it's
called a skeleton. It shows the skeleton
of the page while it loads because it's
taking so much time. So if you choose to
embark on building something like this,
then once you have the 80/20, you want to
start really focusing on the 20, which
is why is this taking 10 seconds to
load? Why is this showing weird
components on the page while it loads?
It's probably indicative of something
bigger or a bigger problem in the app
itself. Now, when it comes to quality
control, you can spin up a separate
session that goes and looks through the
codebase and make sure that the
important things are working that all
the API endpoints return real data.
Sometimes Claude will do this really
naughty thing and create something with
fake data. So, it will say, "Oh, I'm
done the feature." But it's actually
lying to your face. It just put fake
data in that place. And that happened
with video where it said, "Okay, I know
how to use the video API. I just
generated a sample. Go and take a look."
And then I would go write a query,
generate the video, wait 3 minutes for
absolutely no reason. It basically ran a
simulation of how long it could take to
create said video and then created a
template that I physically couldn't
play. Which is why, again, the less you
take on at once, the better. Now, when
it comes to bug fixes, some people are
amazingly lazy about giving the right
instructions. Because if you just say the
app is slow, Claude has free rein to delete
things or remove things that could speed
it up at the cost of functionality,
security, etc. Ideally, you want to say
the feature is not working as expected.
Current behavior: this is what happens.
Expected behavior: this is what should
happen. Steps to reproduce: step one,
step two. Investigate the root cause
before implementing a fix. Then you can
say ideally report back to me what you
plan on doing and which files might be
affected by you doing that thing. One
last thing I'll give you is the power of
exploration. So let's say you're
building the app and for whatever reason
maybe it's NotebookLM, maybe there's
another product where you love a
particular feature. One cool thing I like
to do is log into that service, give
Claude permission to use Claude in
Chrome with that extension to go and take
over my browser, go and
interact with that app and learn about
the user journey or the service that I
really like or the component I really
like and try to recreate a version that
we can marry and merge into our existing
app. So the structure would be: analyze
competitor reference (ideally give it the
domain and say you're logged in), and
then focus areas: focus on the user flow
from start to end, how they handle a
specific feature, UI patterns worth
adopting, and features that are missing.
Sometimes one of the best ways to get
Claude Code to be better at UI is just to
tell it: go and look at this website, go
and navigate it, and look at how seamless
it is, how buttery it is. Look at all the
different colors that are easy on the
eyes, that aren't as boring and purpley
as what you do by default. So
with that, you now have an understanding
of the prompts, of the process, and more
importantly of the ultimate goal. So we
live in a world right now, which is why
I'm harping so much on Claude Code this
year, where you can build 80 to 90% of
whatever you want. It's just a matter of
you putting in the work and understanding
all the pieces involved in creating this
beautiful monstrosity of an app that you
can run locally, and you want to build it
incrementally. So you want to focus on
building locally first and then once you
build locally, you graduate to thinking
about what if someone else wanted to use
this, if you want someone else to use
this. A good example of this is the API
endpoints. So when you go to the
settings here and you want to create the
API request builder, by default Claude
will make it localhost, so you can only
run it locally on your computer. If you
have microservices that you want to
interact with, you have to realize what
you want as the end state and tell
Claude: you know what, I want to deploy
all these endpoints on a server so other
people can use it, and I can use things
like n8n or Make to access it remotely.
So, like I said,
I'll make all the resources I just
showed you in this video available in
the second link in the description
below. Now, if you're feeling
particularly lazy and you just want the
outcome without all the hard work, then
naturally, I do make all of these repos
available to my exclusive community
members in my early AI adopters
community. So, if that interests you,
then check out the first link in the
description below. And lastly, and most
importantly to me, if you found this
video helpful, if you like this level of
depth, this probably took me 40 to 50
hours to build, plan, design it, and
create the story so I could try to
educate as best as I can all the
different processes that you'd need to
build something like this. So, if this
did help you, I would be infinitely
grateful if you left a comment on the
video or shared it with someone because
it helps the video, helps the channel,
and gives me the courage to take on
bigger builds like this and show them to
you. And with that, I'll see you in the next one.
Join My Community to Level Up: https://www.skool.com/earlyaidopters/about
Grab the Complete Build Kit: https://bit.ly/45UylIq
Book a Meeting with Our Team: https://bit.ly/3Ml5AKW
Visit Our Website: https://bit.ly/4cD9jhG

Core Video Description
What if you could build your own NotebookLM - but actually own it? In this build walkthrough, I rebuild Google's NotebookLM from scratch using Claude Code and make it genuinely useful for power users. You'll see exactly how to create a RAG-powered research platform with document chat, multi-person podcast generation, video explainers, deep research mode, and study materials like flashcards and quizzes. The key difference from Google's version: everything is API-first. The web UI is just another client - you can hit your own NotebookLM from n8n, Zapier, or any custom app through 50 REST endpoints. I walk through the complete tech stack (Supabase for database/auth/storage, Vercel for hosting, Gemini API for RAG and TTS, FastAPI backend, Next.js frontend) and show you the exact PRD documents and architecture diagrams used to build it. If you want to learn how to ship production apps with Claude Code, this is the blueprint.

TIMESTAMPS:
00:00 - Intro: Why I rebuilt NotebookLM from scratch
01:15 - The problem with Google's NotebookLM (no API, no ownership)
03:00 - Architecture overview: API-first design philosophy
05:00 - Tech stack breakdown: Supabase, Vercel, Gemini, FastAPI, Next.js
08:00 - Setting up the Supabase database schema
11:00 - RAG implementation: Document uploads and chunking
14:30 - Chat interface: Querying your documents with context
18:00 - Podcast generation: Multi-person audio with debate mode
22:00 - Video explainers with Alibaba's Wan 2.5 (10x cheaper than Veo)
26:00 - Deep research mode using Gemini's research API
29:30 - Study materials: Flashcards, quizzes, and study guides
33:00 - The 50 API endpoints explained
36:00 - Connecting to n8n and Zapier
38:30 - What's in the build kit: PRDs, diagrams, prompt templates
41:00 - Wrap up: Build it, own it, forever

#NotebookLM #ClaudeCode #AI #Supabase #Vercel #RAG #AIAutomation #NoCode #BuildInPublic #GeminiAPI #FastAPI #NextJS #AITools #Anthropic #SelfHosted