I built a mobile app that lets you turn
any object into a 3D model using Expo
and Meta's Segment Anything, aka SAM
models. In this video, I'm going to
break down what Meta's latest SAM models
are, how they work, and then I'll show
you how to build this exact mobile app
step by step so you can start creating
your own 3D models from input photos.
It's going to be a lot of fun. So, let's
dive into it.
So, first let's talk about Meta's SAM models. Meta's SAM models, short for Segment Anything Models, are a new family of AI vision models designed to understand and reconstruct the visual world in both two and three dimensions. At their core, these models perform image segmentation, which means they can find and isolate objects inside images by generating precise masks that outline every object you want to work with. Earlier versions focused on interactive segmentation based on clicks or boxes, while the latest SAM models can even handle text prompts describing what you are looking for in an image, like a dog, an animal, or people. And building on
this, Meta has released SAM 3D, a set of models that go beyond flat segmentation and can actually reconstruct full 3D shapes. So, there are two main variants of this model: one focused on everyday objects and environments, called SAM 3D Objects, and another tailored to full human body shape and pose reconstruction, called SAM 3D Body. We're going to be using the SAM 3D Objects model in our app later in this video. But first, let's examine how SAM 3D actually works.
So, with just a single 2D image as input, SAM 3D can infer the depth, geometry, and texture of objects and even entire scenes by producing 3D meshes. It can even extract full 3D objects if they are partially obscured in the input image. Well, not always, actually. I'll show you what I mean later in this video. Okay, but how do these models actually do that? Well, first the model feeds the image and the object mask through a vision encoder that learns what the object looks like and where it exists in latent space. Then it runs that representation through a flow-based transformer that maps it into a set of 3D Gaussian primitives. And those primitives are then rendered using a technique called Gaussian splatting, producing a continuous 3D structure that behaves like a solid object but is actually a giant point cloud. That point cloud can then be transformed into a solid mesh.
Meta actually has some pretty cool demo playgrounds where you can try the SAM 3D model for yourself and see what kind of results you can get. So, what we're going to be doing in
this video is trying to recreate that
same functionality inside a mobile app.
So, next, let's talk about how we're
going to build the app. So, to create
our mobile app, we're going to be using
Expo. Expo gives us all the necessary
components we need for photo capture,
segmentation, and 3D model
visualization. We will host our SAM models on a remote RunPod container running on an A40 GPU. You can use something heavier like an RTX 5090 if you want faster inference; I just didn't want to pay the high cost of hosting it. And we will create an API interface so we can communicate back and forth between our container and our app. So basically, all our vision model operations will be offloaded to the remote RunPod container. Now, I'm not going to
go over the code in full detail because
that would make for a very, very long
video, but I've published the entire
codebase for both the API and the front
end on GitHub. And I've also linked the
repositories in the description below,
so you can go ahead and clone those
repos and follow along. We will be using
two of the SAM models for this project: SAM 2 for object segmentation, and SAM 3D Objects for object extraction. Now, you might be asking why we are using SAM 2 and not SAM 3 for segmentation. Well, first of all, it's a good opportunity to test both generations of the model. And secondly, because as advertised, SAM 2 focuses on one-click segmentation, where you can just click on an area and it will segment the object based on that point, whereas SAM 3 uses a draggable box around the object. That's also cool, but I kind of want to test out the one-click segmentation flow because that just seems cleaner to me.
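To give you an idea of what that flow looks like in the app, here is a rough sketch of the one-click segmentation request: the photo and the tapped point get posted to the RunPod API, which returns a mask. The endpoint path, field names, and the SAM_API_BASE export are my placeholders, so check api.ts in the front-end repo for the real ones.

```typescript
// Sketch of the one-click segmentation request from the Expo app.
// Endpoint path, field names, and the SAM_API_BASE export are assumptions; see api.ts in the repo.
import { SAM_API_BASE } from "./api"; // the RunPod container URL we configure later

type Point = { x: number; y: number };

export async function segmentAtPoint(imageUri: string, points: Point[]): Promise<string> {
  const form = new FormData();
  // In React Native, a local file is appended to FormData as { uri, name, type }.
  form.append("image", { uri: imageUri, name: "photo.jpg", type: "image/jpeg" } as any);
  form.append("points", JSON.stringify(points)); // one tap, or several if the first mask is too small

  const res = await fetch(`${SAM_API_BASE}/segment`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`Segmentation failed: ${res.status}`);

  const data = await res.json();
  return data.maskPng; // e.g. a base64 mask to overlay on the photo
}
```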
So, now let's set up our SAM models in
RunPod. The first thing I'm going to do
is spin up a fresh A40 GPU instance. In
the template settings, you will want to
increase the container disk size because
I noticed that the entire project takes
up about 50 GB of space. But just to be
sure, I will set it to 100 GB. And don't
worry about it because disk space is
cheap on RunPod. It's the GPU cost that
you're actually paying for. And then I will also expose port 8000, because we will use it as our API interface. So with that configured,
let's go ahead and deploy our pod. Next,
let's copy this command and SSH into our
pod. And the first thing we want to do
is clone my API project. Once you've
done that, cd into the project directory. Now, before we do anything else, make sure that you have a Hugging Face account with a valid access token set up, because we will be downloading the SAM models directly from Hugging Face. They might also ask you to provide your contact information to get access to the SAM models, but the access request is usually approved pretty quickly. Once you've got all that, you can proceed to run the first two lines outlined in the setup. This will install the Hugging Face CLI and authenticate you. Hugging Face will then ask you to input your access token, so just copy and paste it into the command and you should be good to go. And folks,
I've made this super super easy for you.
You just need to run this single command
to set up the entire project. And this
script will install everything you need.
And you better get cozy and go make
yourself a coffee because the full setup
takes about 15 minutes to run. And this is mainly because the SAM 3D Objects model has a ton of dependencies that need to be installed and set up. And don't blame me,
I just copied all the instructions
outlined in their setup file. And boy oh
boy, it's a massive setup. But once
you've gone through all of that, you
should now be ready to run your app. And
launching the API is super easy. You just need to run uvicorn api:app with --host 0.0.0.0 and --port 8000. And if this is the first time launching it, Hugging Face will go ahead and download the SAM 2 models and then proceed to launch the server. And if
you're seeing this message, that means
you have successfully launched the API
server. And we are now ready to move to
our next step, setting up our Expo app.
So for the front-end side, it's going to be a lot easier. You just need to clone my front-end repo, run npm install, and then launch the app with Expo. The only change you need to make here is at the top of the API file: change the SAM API base variable to the URL of your RunPod container. You can find this by simply clicking on this button on RunPod, and it will open up a container URL in a new browser window. So be sure to copy that and replace it in the api.ts file.
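For reference, that change looks roughly like this; the variable name here is how the video describes it, so check the repo for the exact identifier.

```typescript
// api.ts: point the app at your own RunPod container.
// The exact variable name may differ in the repo; the URL shape is RunPod's HTTP proxy format.
export const SAM_API_BASE = "https://<your-pod-id>-8000.proxy.runpod.net";
```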
So, now that we've finally set up everything, let's go ahead and test our app on the simulator.
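Under the hood, the gallery button does little more than call expo-image-picker, along these lines (a minimal sketch; exact options and result fields vary slightly between Expo SDK versions):

```typescript
// Minimal sketch of picking a photo from the gallery with expo-image-picker.
import * as ImagePicker from "expo-image-picker";

export async function pickPhoto(): Promise<string | null> {
  const permission = await ImagePicker.requestMediaLibraryPermissionsAsync();
  if (!permission.granted) return null;

  const result = await ImagePicker.launchImageLibraryAsync({ quality: 1 });
  if (result.canceled || !result.assets?.length) return null;

  return result.assets[0].uri; // local URI we can preview and later send to the API
}
```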
So, first let's choose an image from our
gallery. So, I'm going to try to segment
this dog. And if we click on its head,
you can see that it has been segmented,
but just partially. Whereas, if we do
the same thing on Meta's playground, you
can see that it successfully segments
the whole dog. And this is one thing that I kind of don't like about these playground demos: this kind of segmentation, with the beautiful outline and everything, is not the functionality you get out of the box with SAM 2, as you might expect. Even their own models don't work the same way as in the Meta playground. But that's okay. We can still use it effectively to mask out our desired object. I just need to add a few more points to segment a larger area of the dog, and we should be good to go. But here's what is
actually cool about the SAM 3D model.
Although the dog is obscured in this
image, this should be enough for the
model to understand the whole context
and still produce a full 3D dog model.
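In the app, kicking off that generation is just another request to the API, passing along the image and the mask we built. As before, the endpoint and response fields below are placeholders for illustration; the real names live in the API repo.

```typescript
// Sketch of requesting 3D reconstruction from an existing mask.
// Endpoint path and response fields are assumptions; see the API repo for the real definitions.
import { SAM_API_BASE } from "./api";

export async function generate3DModel(
  imageUri: string,
  maskPng: string // base64 mask produced by the segmentation step
): Promise<{ previewGifUrl: string; glbUrl: string }> {
  const form = new FormData();
  form.append("image", { uri: imageUri, name: "photo.jpg", type: "image/jpeg" } as any);
  form.append("mask", maskPng);

  // On an A40 this takes on the order of a minute, so don't be surprised by the wait.
  const res = await fetch(`${SAM_API_BASE}/generate3d`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`3D generation failed: ${res.status}`);

  // The API responds with links to the rotating GIF preview and the GLB mesh.
  return res.json();
}
```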
And the generation will take roughly a minute and a half to finish. This is because I'm running it on an A40; with an RTX 5090 or any other beefier GPU, you would probably get faster results. And I have absolutely no idea what kind of state-of-the-art GPUs they are running on the Meta playground, but their object generation is lightning fast compared to my peasant server. But anyway, the
generation is done and we can now see
that SAM 3D has successfully created
this lovely 3D dog model and it
successfully filled in the obscured
parts of the object as well. What you see here in the app is a generated GIF of the model rotating around. The process itself does generate a full GLB file of the object as well, but I did not include a 3D viewer in the app. So instead, I made an assets list endpoint on the API. If you visit that page, you can see and download all the generated files of the models you just segmented.
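And if you'd rather grab those files programmatically instead of through the browser, a request against that listing endpoint could look something like this; the path and response shape are my assumptions, so check the API repo for the actual route.

```typescript
// Sketch of querying the assets listing endpoint for generated files.
// The path and response shape are assumptions; see the API repo for the actual route.
import { SAM_API_BASE } from "./api";

export async function listGeneratedAssets(): Promise<string[]> {
  const res = await fetch(`${SAM_API_BASE}/assets/list`);
  if (!res.ok) throw new Error(`Failed to list assets: ${res.status}`);
  const data = await res.json();
  return data.files; // e.g. URLs of the GIF previews and GLB meshes
}
```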
And if we open the model in 3D software like Blender, you can see that it has done a pretty good job.
It's a bit low on resolution and detail when you zoom in too closely, but it's impressive nonetheless. You can
easily use this technique to generate
some low poly game assets for a video
game or a 3D scene. So I tested this
model with a bunch of different images, like this old car for example, and the results were really, really good in terms of understanding the whole object within the context. This is a really big step up from SAM 2, so I will commend them on this. The SAM 3D Objects model works really well, and the occlusion compensation is really powerful. It successfully filled in the details of the car and the dog, although the objects were partially obscured in the photos.
But can we push this to the limit? Well,
as I mentioned earlier, it does not work
perfectly every time. So, I tried
running it on this little squirrel
image, and I only highlighted the head,
and the results of the test were
interesting to say the least. I mean,
you can see that the model tried to add
some hands and feet, but the proportions
were totally wrong, and it looks more like a Pokémon than a real animal, which is cute in its own way, but it just shows that good segmentation is still necessary if we want the model to output actually good results. I did a couple more tests
with items I could find around the
house, and overall, the results were
very, very good. I noticed that the
model was struggling to reproduce things
like labels on a camera or any other
text label for that matter, but
the object shapes and proportions were
correct for the most part. I also took
the app outside to create some 3D
architecture models, and I even managed
to create a nice little 3D model of the
CN Tower in Toronto. So, there you have
it. That's how Meta's segment anything
models work in action. If you want to
dive deeper into how the
segmentation works and how the UI sends
the data to our API, check out the full
codebase on both of the GitHub
repositories. Links are provided in the
description. So, what do you think about Meta's new AI vision models? Have
you tried them? Will you use them in
your own projects? Let us know in the
comments down below. And folks, if you
like these types of long technical
tutorials and you want to see more of
them, let us know by smashing that like
button underneath the video. And don't
forget to subscribe to our channel as
well. This has been Andris from Better Stack, and I will see you in the next video.
We built a mobile app that turns a single photo into a full 3D model using Meta's SAM 3D models. In this video, we break down how SAM 3D reconstructs objects from images, how Gaussian splatting fits into the pipeline, and what actually works or fails when objects are partially obscured. We also walk through the real app setup with Expo and a remote GPU backend, and share the full code so you can try it yourself.

🔗 Relevant Links
Project API: https://github.com/andrisgauracs/sam3d-api
Project Frontend: https://github.com/andrisgauracs/sam3d-mobile
Meta's SAM 3D: https://ai.meta.com/sam3d/

❤️ More about us
Radically better observability stack: https://betterstack.com/
Written tutorials: https://betterstack.com/community/
Example projects: https://github.com/BetterStackHQ

📱 Socials
Twitter: https://twitter.com/betterstackhq
Instagram: https://www.instagram.com/betterstackhq/
TikTok: https://www.tiktok.com/@betterstack
LinkedIn: https://www.linkedin.com/company/betterstack

📌 Chapters:
00:00 Intro
00:31 What Are Segment Anything Models
01:05 The Latest SAM 3D Models
02:09 How SAM 3D Reconstructs Objects
02:59 Project App Architecture and Tech Stack
04:38 Setting Up SAM Models on a Remote GPU
07:00 Setting Up The Expo App
07:33 Test Run
09:56 Testing The Model Handling Obscurity
10:59 Tests With Real Photos
11:27 Final Takeaways