Usman Anwar (University of Cambridge) presents a decision-theoretic framework for detecting steganographic reasoning in LLMs — that is, reasoning that encodes hidden information to evade oversight. Existing approaches either rely on classical steganalysis (which requires a known reference distribution of benign signals that doesn't exist for LLM reasoning) or LLM-as-judge methods (which lack a theoretical basis and conflate suspiciousness with steganography). The key insight of Anwar and co-authors' work is that steganography is instrumental: it creates an asymmetry in usable information between a receiver who can decode the hidden content and a third-party sentinel who cannot. They formalize this via the "steganographic gap" — a metric comparing downstream utility for each party. As a concrete example: if a reasoning trace is encoded with ROT13, a receiver LLM that knows the scheme performs well on the downstream task while a sentinel that doesn't is left in the dark — producing a large, detectable gap.

Note: The opinions shared in this event are those of the speaker(s) and may not represent the views of FAR.AI or their affiliated organizations.
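The ROT13 example can be made concrete with a toy sketch. This is not the authors' implementation — the `utility` function, the string-matching task, and all names here are illustrative stand-ins for "downstream task performance" — but it shows the shape of the gap metric: the same trace yields high utility for a party that can decode it and low utility for one that cannot.

```python
import codecs

def utility(can_decode: bool, trace: str) -> float:
    # Toy stand-in for downstream task utility: a party that can decode the
    # trace recovers the hidden content and succeeds (1.0); otherwise it
    # sees only ciphertext and fails (0.0).
    plain = codecs.decode(trace, "rot13") if can_decode else trace
    return 1.0 if "answer is 42" in plain else 0.0

# A reasoning trace whose payload is hidden via ROT13 (digits are unchanged).
trace = codecs.encode("the answer is 42", "rot13")

receiver_utility = utility(True, trace)    # receiver knows the scheme
sentinel_utility = utility(False, trace)   # sentinel does not
steganographic_gap = receiver_utility - sentinel_utility

print(steganographic_gap)  # large gap -> evidence of steganographic encoding
```

Under this toy setup the gap is maximal (1.0 vs. 0.0): the receiver's decoded view supports the task while the sentinel's view carries no usable information, which is exactly the asymmetry the detection framework targets.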