In this AI Research Roundup episode, Alex discusses the paper 'A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring'. The paper addresses a growing concern: LLMs using steganography to hide content and evade oversight mechanisms. The authors propose a decision-theoretic approach that focuses on the asymmetry of usable information between agents. They introduce generalized V-information and the steganographic gap, which quantify hidden content based on observable actions. Because it is built on these quantities, the method can detect and mitigate steganographic reasoning without needing a known reference distribution. Empirical results show that the framework effectively monitors and limits these hidden behaviors in large language models.

Paper URL: https://arxiv.org/pdf/2602.23163

#AI #MachineLearning #DeepLearning #Steganography #LLMSafety #VInformation #ModelMonitoring