Songyang Gao
~Songyang_Gao1
Total papers: 6
Average submissions per year: 3.0
Average rating:
Acceptance: 4/6
Venue distribution: COLM 2, NeurIPS 2, ICLR 2
Papers (6)
2025 (4 papers)
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
COLM 2025, Poster, rating 4
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-Thinking Reasoning
NeurIPS 2025, Poster, rating 4
Pre-Trained Policy Discriminators are General Reward Models
NeurIPS 2025, Poster, rating 4
AgentGym: Evaluating and Evolving Large Language Model-based Agents across Diverse Environments
ICLR 2025, Rejected, rating 4