This document introduces DreamGym, a unified framework for scalable experience synthesis that enables effective online reinforcement learning (RL) training of large language model (LLM) agents. It addresses challenges such as costly rollouts, limited task diversity, and unreliable reward signals by using a reasoning-based experience model in place of a real-world environment. DreamGym distills environment dynamics into a textual space, generating consistent state transitions and feedback. An experience replay buffer, initialized with offline data and continuously updated, improves the stability and quality of these transitions. The framework also adaptively generates new tasks, enabling effective online curriculum learning. Experiments show that DreamGym improves RL training in both fully synthetic and sim-to-real settings, outperforming baselines on benchmarks such as WebArena. It further serves as a scalable warm-start strategy, delivering performance gains with far fewer real-world interactions.

#reinforcementlearning #LLM #agents #DreamGym #experiencesynthesis #onlinetraining #AI

paper - http://arxiv.org/pdf/2511.03773v1
subscribe - https://t.me/arxivpaper

donations:
USDT: 0xAA7B976c6A9A7ccC97A3B55B7fb353b6Cc8D1ef7
BTC: bc1q8972egrt38f5ye5klv3yye0996k2jjsz2zthpr
ETH: 0xAA7B976c6A9A7ccC97A3B55B7fb353b6Cc8D1ef7
SOL: DXnz1nd6oVm7evDJk25Z2wFSstEH8mcA1dzWDCVjUj9e

created with NotebookLM
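To make the described loop concrete, here is a minimal, purely illustrative Python sketch of the three components the summary mentions: a reasoning-based experience model that replaces the real environment, a replay buffer seeded with offline data, and a toy success-rate-based curriculum. All class names, the reward rule, and the difficulty threshold are assumptions for illustration, not the paper's actual implementation.

```python
import random
from collections import deque

class ExperienceModel:
    """Stand-in for the LLM-based experience model: maps (state, action)
    to a synthetic textual next state and a reward signal.
    The string-matching reward below is an illustrative assumption."""
    def step(self, state, action):
        next_state = f"{state} -> {action}"          # textual transition
        reward = 1.0 if "solve" in action else 0.0   # toy reward rule
        return next_state, reward

class ReplayBuffer:
    """Replay buffer seeded with offline transitions and updated online."""
    def __init__(self, offline_data, capacity=1000):
        self.buffer = deque(offline_data, maxlen=capacity)
    def add(self, transition):
        self.buffer.append(transition)
    def sample(self, k):
        return random.sample(list(self.buffer), min(k, len(self.buffer)))

def adapt_task(success_rate):
    """Toy curriculum: propose harder tasks as the agent improves.
    The 0.5 threshold is an arbitrary illustrative choice."""
    return "hard task" if success_rate > 0.5 else "easy task"

# One synthetic training run: no real environment is ever queried.
model = ExperienceModel()
buffer = ReplayBuffer([("offline state", "solve", "offline state -> solve", 1.0)])
successes = 0
for episode in range(10):
    task = adapt_task(successes / max(episode, 1))
    state, action = task, "solve subgoal"            # agent policy stubbed out
    next_state, reward = model.step(state, action)
    buffer.add((state, action, next_state, reward))
    successes += reward > 0
batch = buffer.sample(4)                              # batch for an RL update
```

The key design point the sketch mirrors is that every transition the agent trains on is generated in text space by the experience model, while the offline-seeded buffer keeps early synthetic transitions anchored to plausible data.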