
Tongyi DeepResearch Technical Report
Cyril Imhof
In this AI Research Roundup episode, Alex discusses the paper "Tongyi DeepResearch Technical Report".

Tongyi DeepResearch introduces an open, scalable agentic LLM designed for long-horizon, deep information-seeking research across planning, multi-step tool use, multi-source reasoning, and synthesis. The work proposes an end-to-end agentic training pipeline that combines Agentic CPT mid-training, which instills agentic inductive biases in Qwen3-30B-A3B-Base, with SFT and on-policy RL to refine planning and tool use. A fully automated synthetic-data flywheel generates questions and agentic trajectories at scale, with stage-specific control over difficulty and uncertainty. Training spans three environments: Prior World, Simulated (offline Wiki plus local RAG), and Real-world (sandboxed APIs). Context length scales from 32K to 128K tokens, rollouts mix ReAct with a Context Management mode, and optimization uses a GRPO-like token-level policy gradient (minimal sketches of the ReAct loop and the objective follow below).

Paper: https://arxiv.org/abs/2510.24701

Resources:
- GitHub: https://github.com/Alibaba-NLP/DeepResearch

#AI #MachineLearning #DeepLearning #LLM #ToolUse #ReinforcementLearning #RAG #Planning
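To make the ReAct rollout concrete, here is a minimal Python sketch of the Thought/Action/Observation loop the episode refers to. Every name in it (call_llm, search_tool, MAX_STEPS, the search[...] action syntax) is an illustrative assumption, not the Tongyi DeepResearch implementation.

```python
# Minimal ReAct-style rollout loop: the agent alternates Thought / Action /
# Observation steps until it emits "Answer:". All names are stand-ins.

MAX_STEPS = 8  # step budget; long-horizon agents would use a larger one

def call_llm(context: str) -> str:
    """Stand-in for a chat-completion call to the agent policy."""
    raise NotImplementedError("connect your own model endpoint here")

def search_tool(query: str) -> str:
    """Stand-in for one sandboxed tool, e.g. offline-Wiki / local-RAG search."""
    raise NotImplementedError("connect a retrieval backend here")

def react_rollout(question: str) -> str:
    context = f"Question: {question}\n"
    for _ in range(MAX_STEPS):
        step = call_llm(context)          # e.g. "Thought: ... Action: search[...]"
        context += step + "\n"
        if "Answer:" in step:             # the agent decided it is done
            return step.split("Answer:", 1)[1].strip()
        if "Action: search[" in step:     # parse and execute the tool call
            query = step.split("Action: search[", 1)[1].split("]", 1)[0]
            context += f"Observation: {search_tool(query)}\n"
    return "No answer within the step budget."
```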
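The report's optimizer is described only as a GRPO-like token-level policy gradient. As a rough, dependency-free sketch of how such an objective is commonly computed (group-relative advantage, clipped importance ratio, averaging over all tokens in the group), here is one possible form; the function name and exact normalization are assumptions, not the paper's objective.

```python
import math

def grpo_token_loss(group_logprobs, group_old_logprobs, group_rewards, clip_eps=0.2):
    """Sketch of a GRPO-style token-level loss for one group of rollouts.

    group_logprobs[i]     : per-token log-probs of trajectory i, current policy
    group_old_logprobs[i] : per-token log-probs under the sampling policy
    group_rewards[i]      : scalar outcome reward for trajectory i
    """
    # Group-relative advantage: normalize each reward against the group.
    mean = sum(group_rewards) / len(group_rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in group_rewards) / len(group_rewards)) or 1.0
    advantages = [(r - mean) / std for r in group_rewards]

    # Token-level clipped surrogate: each token shares its trajectory's
    # advantage, and the loss averages over all tokens in the group.
    total, n_tokens = 0.0, 0
    for logps, old_logps, adv in zip(group_logprobs, group_old_logprobs, advantages):
        for lp, old_lp in zip(logps, old_logps):
            ratio = math.exp(lp - old_lp)
            clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
            total += -min(ratio * adv, clipped * adv)
            n_tokens += 1
    return total / max(n_tokens, 1)

# Toy usage with made-up numbers, just to show the shapes:
loss = grpo_token_loss(
    group_logprobs=[[-0.2, -1.1], [-0.5, -0.7, -0.9]],
    group_old_logprobs=[[-0.3, -1.0], [-0.5, -0.8, -1.0]],
    group_rewards=[1.0, 0.0],
)
```

Note the design choice this sketch assumes: averaging over tokens (rather than per-sequence means) is what makes the gradient "token-level"; the paper's actual objective may differ in its clipping, normalization, or KL terms.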