Props: Using Oracles for Machine-Learning Security | Ari Juels at SmartCon 2025 | DailyDevLists

Full Transcript

959 words • EN

Machine learning systems, in consequence, have consumed a mountain of data, a vast amount of

data, to the point where essentially we are running out, in the sense that we

are running up against the limits of useful information scrapeable from the public Internet.

So the problem fundamentally here is we're running out of data because there is only

one Internet. But is this actually true? In some sense,

there are two Internets. There is the surface web, the publicly scrapeable

portion of the Internet, and then there's the private web. The private

web is the portion of the Internet that is only accessible through access controls.

The walled-off portion of the Internet, if you will. And this is where I

would argue the most interesting data lives. Things like health records, email, financial

documents, sensitive data live on the private web. It's

estimated that the amount of data in the private web is two orders of magnitude

greater than that on the surface web.

But the data, all this data on the private web, is largely

unusable for machine learning purposes. And the reason is essentially a

security-related one. Let me explain by way of example. All right,

let's suppose that somebody is training a health diagnostics model and

training it on or fine-tuning it on electronic health records. Alice

has an electronic health record that she would like to provide for the purposes of

training this model. The problem she is naturally going to run into in most

cases is that most web servers don't support general-purpose secure

third-party data sharing. Now, there's no easy way for Alice to relay her electronic

health record to the entity that's training this model unless there's some

kind of pre-existing relationship between her health provider and this entity in general.

And so this doesn't quite work. What Alice can do, of course, is just download

her electronic health record and then upload it to the training environment.

But if she does that, two problems ensue. First, there's the problem of privacy.

Alice is sending it into this environment, but she has no idea whether her electronic

record will be protected there. Second problem is one of integrity.

Whoever is training this model wants to know that the electronic health records it's ingesting

are authentic. They actually come from real healthcare providers. But if users are just uploading

documents, there's no such assurance. And so we have these two security problems. How can

we address them? This is where blockchain technology can be helpful, and not just in

a blockchain context, but in a general sense. If we plug in an oracle, and in

particular a confidential oracle system, like Town Crier or Confidential

HTTPS in the CRE, as introduced yesterday,

then we can ensure that the electronic health record Alice is

providing is authentic, hasn't been fabricated, hasn't been tampered with. And Alice

can do things like privacy-preserving filtering of her electronic health record, so she can release only

the data she wants to release. All of this can be done with no modification

to existing web servers. This is the beauty of confidential oracle systems.
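The two properties just described, authenticity and selective release, can be illustrated with a toy sketch. This is not how Town Crier, DECO, or Confidential HTTPS work internally: an HMAC under a shared demo key stands in for the oracle's attestation, which in a real system would come from a trusted execution environment or a proof about the TLS session. The names `ORACLE_KEY`, `oracle_release`, and `trainer_accepts` are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. A shared symmetric key is a simplification made
# only so the sketch runs; it is not part of any real oracle design.
ORACLE_KEY = b"demo-oracle-key"


def _canonical(data: dict) -> bytes:
    """Serialize deterministically so the same data always yields the same tag."""
    return json.dumps(data, sort_keys=True, separators=(",", ":")).encode()


def oracle_release(full_record: dict, fields_to_share: set[str]) -> dict:
    """Keep only the fields Alice chose to release, then attest to the result."""
    shared = {k: v for k, v in full_record.items() if k in fields_to_share}
    tag = hmac.new(ORACLE_KEY, _canonical(shared), hashlib.sha256).hexdigest()
    return {"data": shared, "attestation": tag}


def trainer_accepts(package: dict) -> bool:
    """Recompute the tag over the received data and compare in constant time."""
    expected = hmac.new(
        ORACLE_KEY, _canonical(package["data"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, package["attestation"])


if __name__ == "__main__":
    record = {"name": "Alice", "diagnosis": "asthma", "insurer": "ExampleCare"}
    package = oracle_release(record, {"diagnosis"})
    print(package["data"])           # only the released field
    print(trainer_accepts(package))  # untampered package passes the check
```

In this sketch the training side sees only the fields Alice released, and any change to them after release makes the check fail.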

Alice gets other privacy protections as well, and there are other integrity properties here that

I don't have time to get into. This idea,

generally speaking, I refer to or we refer to as props or protected

pipelines. The idea is that using the confidential oracle system,

combined with other privacy-preserving systems, like trusted execution environments to do

model training, we end up with a full end-to-end security perimeter,

so that the integrity and privacy of the data being ingested by the system are

protected from the time that they're sourced through the time they're used, and beyond, potentially.

Well, this setup I've shown you looks a lot like the Chainlink Runtime Environment, the

CRE, with two features involved. Confidential

HTTPS to source Alice's data, again, from an unmodified web

server. And this is based on Town Crier or DECO, as Sergey mentioned

yesterday. And confidential compute, a protected environment to do the model

training or fine-tuning. So to summarize the

benefits you get here, using props for model training, there's an explicit

step involving consent of the user, consent by the user. Alice is the one who

logs in and grabs her electronic health record in order to relay it to the

entity training the model. We get this property of data authenticity. The provider knows that

the EHR came from an authentic healthcare provider, and we have the form of confidentiality

or forms of confidentiality that I described. Basically, Alice's records go directly into the training

environment, and once the model's trained, her electronic health record can be

deleted. And again, no modification is required to existing infrastructure. So that's

the benefit of props for model training. Props can also be used for inference.

For example, suppose that somebody is selling a token and can only sell it to

accredited investors, investors who have the financial resources to incur

the risk that this offering may involve. Well, what

props can do then is ingest financial records

from trustworthy sources, financial institutions, the

IRS. Alice can, for instance, provide a transcript of her tax

filings. And an LLM can process these documents and determine whether Alice

is indeed an accredited investor. All of this, again, can happen within

a security perimeter, the security perimeter defined by the prop or

props. Exactly this setup we have in fact

implemented, fully implemented in a demo, which my colleague

Philip will come up and describe to you. Go through it step by step so

you understand exactly how the system works and what security assurances it provides.
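As a rough illustration of the inference example above, here is a minimal sketch in which only a yes/no answer leaves the perimeter. It makes several assumptions that are not from the talk: a simple income rule stands in for the LLM, the single threshold `INCOME_THRESHOLD_USD` is illustrative and is not the full accreditation criteria, and `TaxTranscript` with its `source_authenticated` flag is a hypothetical stand-in for whatever the oracle layer attests.

```python
from dataclasses import dataclass

# Assumed threshold for illustration only; real accreditation rules include
# more criteria than a single income figure.
INCOME_THRESHOLD_USD = 200_000


@dataclass(frozen=True)
class TaxTranscript:
    year: int
    adjusted_gross_income_usd: int
    source_authenticated: bool  # hypothetical flag set by the oracle layer


def is_accredited(transcripts: list[TaxTranscript]) -> bool:
    """Decide inside the perimeter; only this boolean is released."""
    if len(transcripts) < 2:
        return False
    # Look at the two most recent years on file.
    recent = sorted(transcripts, key=lambda t: t.year)[-2:]
    return all(
        t.source_authenticated
        and t.adjusted_gross_income_usd > INCOME_THRESHOLD_USD
        for t in recent
    )


if __name__ == "__main__":
    filings = [
        TaxTranscript(2023, 250_000, True),
        TaxTranscript(2024, 310_000, True),
    ]
    print(is_accredited(filings))
```

The point of the design is that the token seller learns the decision, not Alice's income figures, and that unauthenticated documents never count toward it.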

Chainlink

48 days ago

7:10

Description

At SmartCon 2025, Ari Juels presents his thesis on using oracles for machine-learning security. View the SmartCon 2025 playlist: https://youtube.com/playlist?list=PLVP9aGDn-X0R1kuQo8qLPnqlT7ThKQR2s&si=pjTcFXjqEOKuldry Chainlink is the industry-standard oracle platform bringing the capital markets onchain and powering the majority of decentralized finance (DeFi). The Chainlink stack provides the essential data, interoperability, compliance, and privacy standards needed to power advanced blockchain use cases for institutional tokenized assets, lending, payments, stablecoins, and more. Since inventing decentralized oracle networks, Chainlink has enabled tens of trillions in transaction value and now secures the vast majority of DeFi. Many of the world’s largest financial services institutions have also adopted Chainlink’s standards and infrastructure, including Swift, Euroclear, Mastercard, Fidelity International, UBS, S&P Dow Jones Indices, FTSE Russell, WisdomTree, ANZ, and top protocols such as Aave, Lido, GMX and many others. Chainlink leverages a novel fee model where offchain and onchain revenue from enterprise adoption is converted to LINK tokens and stored in a strategic Chainlink Reserve. Learn more at chain.link. ✅ Subscribe and turn notifications on: https://www.youtube.com/channel/UCnjkrlqaWEBSnKZQ71gdyFA?sub_confirmation=1 Learn more about Chainlink: Website: https://chain.link Docs: https://docs.chain.link Twitter: https://twitter.com/chainlink #Chainlink #crypto #blockchain

Video Details

Featured Date

January 13, 2026
