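I've been working on reinforcement learning lately and have read quite a few articles in bits and pieces, but it still doesn't feel like a coherent system to me, and the different camps don't seem to agree with one another. So this survey is quite timely: [2509.08827] A Survey of Reinforcement Learning for Large Reasoning Models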