Machine learning systems, in consequence, have consumed a mountain of data, to the
point where we are essentially running out, in the sense that we
are running up against the limits of useful information scrapable from the public Internet.
So the fundamental problem here is that we're running out of data because there is only
one Internet. But is this actually true? In some sense,
there are two Internets. There is the surface web, the publicly scrapable
portion of the Internet, and then there's the private web. The private
web is the portion of the Internet that is accessible only through access controls,
the walled-off portion of the Internet, if you will. And this is where I
would argue the most interesting data lives. Sensitive data, things like health records, email, and financial
documents, lives on the private web. It's
estimated that the amount of data in the private web is two orders of magnitude
greater than that on the surface web.
But all this data on the private web is largely
unusable for machine learning purposes. And the reason is essentially a
security-related one. Let me explain by way of example. All right,
let's suppose that somebody is training a health diagnostics model and
training it on, or fine-tuning it on, electronic health records. Alice
has an electronic health record that she would like to provide for the purposes of
training this model. The problem she is naturally going to run into in most
cases is that most web servers don't support general-purpose secure
third-party data sharing. So there's no easy way for Alice to relay her electronic
health record to the entity that's training this model unless there's some
kind of pre-existing relationship between her health provider and this entity.
And so this doesn't quite work. What Alice can do, of course, is just download
her electronic health record and then upload it to the training environment.
But if she does that, two problems ensue. First, there's the problem of privacy.
Alice is sending it into this environment, but she has no idea whether her electronic
health record will be protected there. The second problem is one of integrity.
Whoever is training this model wants to know that the electronic health records it's ingesting
are authentic, that they actually come from real healthcare providers. But if users are just uploading
documents, there's no such assurance. And so we have these two security problems. How can
we address them? This is where blockchain technology can be helpful, and not just in
a blockchain context but in a general sense. If we plug in an oracle, and in
particular a confidential oracle system, like Town Crier or the confidential
HTTPS in the CRE that was introduced yesterday,
then we can ensure that the electronic health record Alice is
providing is authentic: it hasn't been fabricated and hasn't been tampered with. And Alice
can do things like privacy-preserving filtering of her electronic health record, so that she releases only
the data she wants to release. All of this can be done with no modification
to existing web servers. This is the beauty of confidential oracle systems.
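The two properties just described, authenticity and privacy-preserving filtering, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Town Crier or the CRE actually works: the names `oracle_attest`, `verify`, `ORACLE_KEY`, and the host `ehr.example.org` are hypothetical, and an HMAC with a shared key stands in for what would in practice be a public-key signature or a trusted-hardware attestation.

```python
import hashlib
import hmac
import json

# Hypothetical key. A real oracle would use a public-key signature or a
# hardware attestation rather than a secret shared with the verifier.
ORACLE_KEY = b"demo-oracle-key"


def canonical(obj) -> bytes:
    """Serialize deterministically so both sides authenticate the same bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def oracle_attest(source: str, record: dict, allowed_fields: set) -> dict:
    """Keep only the fields Alice chose to release, then authenticate the result."""
    released = {k: v for k, v in record.items() if k in allowed_fields}
    payload = {"source": source, "data": released}
    tag = hmac.new(ORACLE_KEY, canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify(attestation: dict) -> bool:
    """Recompute the tag over the payload and compare in constant time."""
    expected = hmac.new(
        ORACLE_KEY, canonical(attestation["payload"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, attestation["tag"])


# Made-up record for illustration only.
record = {"name": "Alice", "ssn": "000-00-0000", "diagnosis": "asthma", "age": 41}
att = oracle_attest("ehr.example.org", record, {"diagnosis", "age"})

# Anyone who edits the released data afterwards invalidates the tag.
tampered = {
    "payload": {**att["payload"], "data": {"diagnosis": "none", "age": 41}},
    "tag": att["tag"],
}
print(verify(att), verify(tampered), sorted(att["payload"]["data"]))
```

The filtering happens before the tag is computed, so the recipient can check that what it received came from the named source without ever seeing the fields Alice withheld.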
Alice gets other privacy protections as well, and there are other integrity properties here that
I don't have time to get into. This idea,
generally speaking, we refer to as props, or protected
pipelines. The idea is that using the confidential oracle system,
combined with other privacy-preserving systems, like trusted execution environments to do
model training, we end up with a full end-to-end security perimeter,
so that the integrity and privacy of the data being ingested by the system are
protected from the time that they're sourced through the time they're used, and beyond, potentially.
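A toy model of that perimeter, assuming a hypothetical `TrainingEnclave` class as a stand-in for a trusted execution environment and an HMAC tag as a stand-in for the oracle's attestation, might look like the following. It only illustrates the flow (accept authenticated records, train, delete the raw data); it provides none of the isolation that real trusted hardware does.

```python
import hashlib
import hmac
from collections import Counter

ORACLE_KEY = b"demo-oracle-key"  # hypothetical stand-in for the oracle's attestation key


def tag(data: bytes) -> str:
    """Tag the oracle would attach to a record it sourced."""
    return hmac.new(ORACLE_KEY, data, hashlib.sha256).hexdigest()


class TrainingEnclave:
    """Toy stand-in for a trusted execution environment."""

    def __init__(self):
        self._records = []  # raw records are held only inside this object
        self.model = None

    def ingest(self, data: bytes, oracle_tag: str) -> bool:
        """Accept a record only if the oracle's tag checks out."""
        if not hmac.compare_digest(tag(data), oracle_tag):
            return False
        self._records.append(data)
        return True

    def train(self) -> None:
        """'Train' a trivial model (label counts), then delete the raw records."""
        self.model = Counter(r.decode() for r in self._records)
        self._records.clear()

    def records_held(self) -> int:
        return len(self._records)


enclave = TrainingEnclave()
ok = enclave.ingest(b"asthma", tag(b"asthma"))  # record sourced through the oracle
bad = enclave.ingest(b"asthma", "forged-tag")   # plain upload with no valid tag
enclave.train()
print(ok, bad, enclave.model, enclave.records_held())
```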
Well, this setup I've shown you looks a lot like the Chainlink runtime environment, the
CRE, with two features involved: confidential
HTTPS to source Alice's data, again, from an unmodified web
server (this is based on Town Crier or DECO, as Sergey mentioned
yesterday), and confidential compute, a protected environment to do the model
training or fine-tuning. So to summarize the
benefits you get from using props for model training: there's an explicit
step involving consent by the user. Alice is the one who
logs in and grabs her electronic health record in order to relay it to the
entity training the model. We get this property of data authenticity: the entity training the model knows that
the EHR came from an authentic healthcare provider. And we have the forms of confidentiality
that I described. Basically, Alice's records go directly into the training
environment, and once the model's trained, her electronic health record can be
deleted. And again, no modification is required to existing infrastructure. So that's
the benefit of props for model training. Props can also be used for inference.
For example, suppose that somebody is selling a token and can only sell it to
accredited investors, investors who have the financial resources to bear
the risk that this offering may involve. Well, what
props can do then is ingest financial records
from trustworthy sources: financial institutions, the
IRS. Alice can, for instance, provide a transcript of her tax
filings. And an LLM can process these documents and determine whether Alice
is indeed an accredited investor. All of this, again, can happen within
a security perimeter, the security perimeter defined by the prop or
props. We have in fact implemented exactly this setup,
fully implemented it in a demo, which my colleague
Philip will come up and describe to you. He'll go through it step by step so
you understand exactly how the system works and what security assurances it provides.
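As a rough illustration of the inference case described above, and not of the demo itself, the sketch below replaces the LLM with a simple rule. Everything named here is an assumption made for the example: the `Document` type, the `TRUSTED_SOURCES` allow-list, the `authenticated` flag (standing in for a checked oracle attestation), and the income threshold, which is a simplified, illustrative criterion.

```python
from dataclasses import dataclass

TRUSTED_SOURCES = {"irs.gov"}  # hypothetical allow-list of document sources
INCOME_THRESHOLD = 200_000     # illustrative, simplified income criterion


@dataclass(frozen=True)
class Document:
    source: str          # where the oracle fetched the document from
    authenticated: bool  # outcome of checking the oracle's attestation
    year: int
    income: int


def is_accredited(docs, years_required: int = 2) -> bool:
    """Return only a yes/no answer; the documents themselves are not released."""
    usable = [d for d in docs if d.authenticated and d.source in TRUSTED_SOURCES]
    qualifying_years = {d.year for d in usable if d.income > INCOME_THRESHOLD}
    return len(qualifying_years) >= years_required


# Made-up figures for illustration only.
genuine = [
    Document("irs.gov", True, 2023, 250_000),
    Document("irs.gov", True, 2024, 260_000),
]
unauthenticated = [
    Document("irs.gov", False, 2023, 900_000),
    Document("irs.gov", False, 2024, 900_000),
]
print(is_accredited(genuine), is_accredited(unauthenticated))
```

The point of the design is that the token seller learns a single bit, whether Alice qualifies, while documents that fail authentication are ignored no matter what they claim.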
At SmartCon 2025, Ari Juels presents his thesis on using oracles for machine-learning security. View the SmartCon 2025 playlist: https://youtube.com/playlist?list=PLVP9aGDn-X0R1kuQo8qLPnqlT7ThKQR2s&si=pjTcFXjqEOKuldry