I built a mobile app that lets you turn
any object into a 3D model using Expo
and Meta's Segment Anything, aka SAM
models. In this video, I'm going to
break down what Meta's latest SAM models
are, how they work, and then I'll show
you how to build this exact mobile app
step by step so you can start creating
your own 3D models from input photos.
It's going to be a lot of fun. So, let's
dive into it.
So, first let's talk about Meta's SAM models. Meta's SAM models, short for Segment Anything Models, are a new family of AI vision models designed to understand and reconstruct the visual world in both two and three dimensions. At their core, these models perform image segmentation, which means they can find and isolate objects inside images by generating precise masks that outline every object you want to work with. Earlier versions focused on interactive segmentation based on clicks or boxes, while the latest SAM models can even handle text prompts describing what you are looking for in an image, like a dog, an animal, or people. And building on
this, Meta has released SAM 3D, a set of models that go beyond flat segmentation and can actually reconstruct full 3D shapes. So, there are two main variants of this model: one focused on everyday objects and environments, called SAM 3D Objects, and another tailored to full human body shape and pose reconstruction, called SAM 3D Body. We're going to be using the SAM 3D Objects model in our app later in this video. But first, let's examine how SAM 3D actually works.
So, with just a single 2D image as input, SAM 3D can infer the depth, geometry, and texture of objects and even entire scenes by producing 3D meshes. It can even extract full 3D objects if they are partially obscured in the input image. Well, not always, actually. I'll show you what I mean later in this video. Okay, but how do these models actually do that? Well, first the model feeds the image and the object mask through a vision encoder that learns what the object looks like and where it exists in latent space. Then it runs that representation through a flow-based transformer that maps it into a set of 3D Gaussian primitives. And those primitives are then rendered using a technique called Gaussian splatting, producing a continuous 3D structure that behaves like a solid object but is actually a giant point cloud. That point cloud can then be transformed into a solid mesh.
Meta actually has some pretty cool demo playgrounds where you can try the SAM 3D model for yourself and see what kind of results you can get. So, what we're going to be doing in
this video is trying to recreate that
same functionality inside a mobile app.
So, next, let's talk about how we're
going to build the app. So, to create
our mobile app, we're going to be using
Expo. Expo gives us all the necessary
components we need for photo capture,
segmentation, and 3D model
visualization. We will host our SAM models on a remote RunPod container running on an A40 GPU. You can use something heavier like an RTX 5090 if you want faster inference; I just didn't want to pay the high cost of hosting it. And we will create an API interface so we can communicate back and forth between our container and our app. So basically, all our vision model operations will be offloaded to the remote RunPod container. Now, I'm not going to
go over the code in full detail because
that would make for a very, very long
video, but I've published the entire
codebase for both the API and the front
end on GitHub. And I've also linked the
repositories in the description below,
so you can go ahead and clone those
repos and follow along. We will be using
two of the SAM models for this project: SAM 2 for object segmentation, and SAM 3D Objects for object extraction. Now, you might be asking why we are using SAM 2 and not SAM 3 for segmentation. Well, first of all, it's a good opportunity to test both generations of the model. And secondly, because as advertised, SAM 2 focuses on one-click segmentation, where you can just click on an area and it will segment the object based on that point, whereas SAM 3 uses a draggable box around the object. That's also cool, but I kind of want to test out the one-click segmentation flow because that just seems cleaner to me.
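To give you an idea of what that flow looks like in the app, here is a rough sketch of the one-click segmentation request: the photo and the tapped point get posted to the RunPod API, which returns a mask. The endpoint path, field names, and the SAM_API_BASE export are my placeholders, so check api.ts in the front-end repo for the real ones.

```typescript
// Sketch of the one-click segmentation request from the Expo app.
// Endpoint path, field names, and the SAM_API_BASE export are assumptions; see api.ts in the repo.
import { SAM_API_BASE } from "./api"; // the RunPod container URL we configure later

type Point = { x: number; y: number };

export async function segmentAtPoint(imageUri: string, points: Point[]): Promise<string> {
  const form = new FormData();
  // In React Native, a local file is appended to FormData as { uri, name, type }.
  form.append("image", { uri: imageUri, name: "photo.jpg", type: "image/jpeg" } as any);
  form.append("points", JSON.stringify(points)); // one tap, or several if the first mask is too small

  const res = await fetch(`${SAM_API_BASE}/segment`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`Segmentation failed: ${res.status}`);

  const data = await res.json();
  return data.maskPng; // e.g. a base64 mask to overlay on the photo
}
```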
So, now let's set up our SAM models in
RunPod. The first thing I'm going to do
is spin up a fresh A40 GPU instance. In
the template settings, you will want to
increase the container disk size because
I noticed that the entire project takes
up about 50 GB of space. But just to be
sure, I will set it to 100 GB. And don't
worry about it because disk space is
cheap on RunPod. It's the GPU cost that
you're actually paying for. And then I will also expose port 8000, because we will use it as our API interface. So with that configured,
let's go ahead and deploy our pod. Next,
let's copy this command and SSH into our
pod. And the first thing we want to do
is clone my API project. Once you've
done that, cd into the project directory. Now, before we do anything else, make sure that you have a Hugging Face account with a valid access token set up, because we will be downloading the SAM models directly from Hugging Face. They might also ask you to provide your contact information to get access to the SAM models, but the access request is usually approved pretty quickly. Once you've got all that, you can proceed to run the first two lines outlined in the setup. This will install the Hugging Face CLI and authenticate you. Hugging Face will then ask you to input your access token, so just copy and paste it into the command and you should be good to go. And folks,
I've made this super super easy for you.
You just need to run this single command
to set up the entire project. And this
script will install everything you need.
And you better get cozy and go make
yourself a coffee because the full setup
takes about 15 minutes to run. And this is mainly because the SAM 3D Objects model has a ton of dependencies that need to be installed and set up. And don't blame me,
I just copied all the instructions
outlined in their setup file. And boy oh
boy, it's a massive setup. But once
you've gone through all of that, you
should now be ready to run your app. And
launching the API is super easy. You just need to run uvicorn api:app with --host 0.0.0.0 and --port 8000. And if this is the first time launching it, Hugging Face will go ahead and download the SAM 2 models and then proceed to launch the server. And if
you're seeing this message, that means
you have successfully launched the API
server. And we are now ready to move to
our next step, setting up our Expo app.
So for the front-end side, it's going to be a lot easier. You just need to clone my front-end repo, run npm install, and then launch the app with Expo. The only change you need to make here is at the top of the API file: change the SAM API base variable to the URL of your RunPod container. You can find this by simply clicking on this button on RunPod, and it will open up a container URL in a new browser window. So be sure to copy that and replace it in the api.ts file.
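For reference, that change looks roughly like this; the variable name here is how the video describes it, so check the repo for the exact identifier.

```typescript
// api.ts: point the app at your own RunPod container.
// The exact variable name may differ in the repo; the URL shape is RunPod's HTTP proxy format.
export const SAM_API_BASE = "https://<your-pod-id>-8000.proxy.runpod.net";
```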
So, now that we've finally set up everything, let's go ahead and test our app on the simulator.
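Under the hood, the gallery button does little more than call expo-image-picker, along these lines (a minimal sketch; exact options and result fields vary slightly between Expo SDK versions):

```typescript
// Minimal sketch of picking a photo from the gallery with expo-image-picker.
import * as ImagePicker from "expo-image-picker";

export async function pickPhoto(): Promise<string | null> {
  const permission = await ImagePicker.requestMediaLibraryPermissionsAsync();
  if (!permission.granted) return null;

  const result = await ImagePicker.launchImageLibraryAsync({ quality: 1 });
  if (result.canceled || !result.assets?.length) return null;

  return result.assets[0].uri; // local URI we can preview and later send to the API
}
```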
So, first let's choose an image from our
gallery. So, I'm going to try to segment
this dog. And if we click on its head,
you can see that it has been segmented,
but just partially. Whereas, if we do
the same thing on Meta's playground, you
can see that it successfully segments
the whole dog. And this is one thing that I kind of don't like about these playground demos: this kind of segmentation, with the beautiful outline and everything, is not the functionality you get out of the box with SAM 2, as you might expect. Even their own models don't work the same way as in the Meta playground. But that's okay. We can still use it effectively to mask out our desired object. I just need to add a few more points to segment a larger area of the dog, and we should be good to go. But here's what is
actually cool about the SAM 3D model.
Although the dog is obscured in this
image, this should be enough for the
model to understand the whole context
and still produce a full 3D dog model.
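In the app, kicking off that generation is just another request to the API, passing along the image and the mask we built. As before, the endpoint and response fields below are placeholders for illustration; the real names live in the API repo.

```typescript
// Sketch of requesting 3D reconstruction from an existing mask.
// Endpoint path and response fields are assumptions; see the API repo for the real definitions.
import { SAM_API_BASE } from "./api";

export async function generate3DModel(
  imageUri: string,
  maskPng: string // base64 mask produced by the segmentation step
): Promise<{ previewGifUrl: string; glbUrl: string }> {
  const form = new FormData();
  form.append("image", { uri: imageUri, name: "photo.jpg", type: "image/jpeg" } as any);
  form.append("mask", maskPng);

  // On an A40 this takes on the order of a minute, so don't be surprised by the wait.
  const res = await fetch(`${SAM_API_BASE}/generate3d`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`3D generation failed: ${res.status}`);

  // The API responds with links to the rotating GIF preview and the GLB mesh.
  return res.json();
}
```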
And the generation will take roughly a minute and a half to finish. This is because I'm running it on an A40; with an RTX 5090 or any other beefier GPU, you would probably get faster results. And I have absolutely no idea what kind of state-of-the-art GPUs they are running on the Meta playground, but their object generation is lightning fast compared to my peasant server. But anyway, the
generation is done and we can now see
that SAM 3D has successfully created
this lovely 3D dog model and it
successfully filled in the obscured
parts of the object as well. What you see here in the app is a generated GIF of the model rotating around. The process itself does generate a full GLB file of the object as well, but I did not include a 3D viewer in the app. So instead, I made an assets list endpoint on the API. If you visit that page, you can see and download all the generated files of the models you just segmented.
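And if you'd rather grab those files programmatically instead of through the browser, a request against that listing endpoint could look something like this; the path and response shape are my assumptions, so check the API repo for the actual route.

```typescript
// Sketch of querying the assets listing endpoint for generated files.
// The path and response shape are assumptions; see the API repo for the actual route.
import { SAM_API_BASE } from "./api";

export async function listGeneratedAssets(): Promise<string[]> {
  const res = await fetch(`${SAM_API_BASE}/assets/list`);
  if (!res.ok) throw new Error(`Failed to list assets: ${res.status}`);
  const data = await res.json();
  return data.files; // e.g. URLs of the GIF previews and GLB meshes
}
```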
And if we open the model in 3D software like Blender, you can see that it has done a pretty good job.
It's a bit low on resolution and detail when you zoom in too closely, but it's impressive nonetheless. You can
easily use this technique to generate
some low poly game assets for a video
game or a 3D scene. So I tested this
model with a bunch of different images, like this old car for example, and the results were really, really good in terms of understanding the whole object within the context. This is a really big step up from SAM 2, so I will commend them on this. The SAM 3D Objects model works really well, and the occlusion compensation is really powerful. It successfully filled in the details of the car and the dog, although the objects were partially obscured in the photos.
But can we push this to the limit? Well,
as I mentioned earlier, it does not work
perfectly every time. So, I tried
running it on this little squirrel
image, and I only highlighted the head,
and the results of the test were
interesting to say the least. I mean,
you can see that the model tried to add
some hands and feet, but the proportions
were totally wrong, and it looks more like a Pokémon than a real animal, which is cute in its own way, but it just shows that good segmentation is still necessary if we want the model to output actually good results. I did a couple more tests
with items I could find around the
house, and overall, the results were
very, very good. I noticed that the
model was struggling to reproduce things
like labels on a camera or any other
text label for that matter, but
the object shapes and proportions were
correct for the most part. I also took
the app outside to create some 3D
architecture models, and I even managed
to create a nice little 3D model of the
CN Tower in Toronto. So, there you have
it. That's how Meta's segment anything
models work in action. If you want to
dive deeper into how the
segmentation works and how the UI sends
the data to our API, check out the full
codebase on both of the GitHub
repositories. Links are provided in the
description. So, what do you think about Meta's new AI vision models? Have
you tried them? Will you use them in
your own projects? Let us know in the
comments down below. And folks, if you
like these types of long technical
tutorials and you want to see more of
them, let us know by smashing that like
button underneath the video. And don't
forget to subscribe to our channel as
well. This has been Andris from Better Stack, and I will see you in the next video.
We built a mobile app that turns a single photo into a full 3D model using Meta's SAM 3D models. In this video, we break down how SAM 3D reconstructs objects from images, how Gaussian splatting fits into the pipeline, and what actually works or fails when objects are partially obscured. We also walk through the real app setup with Expo and a remote GPU backend, and share the full code so you can try it yourself.

🔗 Relevant Links
Project API: https://github.com/andrisgauracs/sam3d-api
Project Frontend: https://github.com/andrisgauracs/sam3d-mobile
Meta's SAM 3D: https://ai.meta.com/sam3d/

❤️ More about us
Radically better observability stack: https://betterstack.com/
Written tutorials: https://betterstack.com/community/
Example projects: https://github.com/BetterStackHQ

📱 Socials
Twitter: https://twitter.com/betterstackhq
Instagram: https://www.instagram.com/betterstackhq/
TikTok: https://www.tiktok.com/@betterstack
LinkedIn: https://www.linkedin.com/company/betterstack

📌 Chapters:
00:00 Intro
00:31 What Are Segment Anything Models
01:05 The Latest SAM 3D Models
02:09 How SAM 3D Reconstructs Objects
02:59 Project App Architecture and Tech Stack
04:38 Setting Up SAM Models on a Remote GPU
07:00 Setting Up The Expo App
07:33 Test Run
09:56 Testing The Model Handling Obscurity
10:59 Tests With Real Photos
11:27 Final Takeaways