Impact Index

50.63/100

Top 8.3%

Site-wide rank: #5,332

Papers published: 6

Average rating: 6.2

Average output: 3.0 papers/year

Bilal Piot

Researcher @ Google · United States · OpenReview

Research Areas

Reinforcement Learning

Learning from negative feedback, or positive feedback or both

ICLR 2025 Spotlight

Building Math Agents with Multi-Turn Iterative Preference Learning

ICLR 2025 Poster

RRM: Robust Reward Model Training Mitigates Reward Hacking

ICLR 2025 Poster

Unlocking the Power of Representations in Long-term Novelty-based Exploration

ICLR 2024 Spotlight

Multi-turn Reinforcement Learning with Preference Human Feedback

NeurIPS 2024 Poster

Direct Language Model Alignment from Online AI Feedback

NeurIPS 2024 Rejected

Collaborators (20)

Daniele Calandriello