Impact Index

37.67/100

Top 16.2%

Site-wide rank: #10,417

Papers published: 3

Average rating: 6.4

Average output: 1.5 papers/year

Yang Gao

Researcher @ Google · OpenReview

RRM: Robust Reward Model Training Mitigates Reward Hacking

ICLR 2025 Poster

Impact of Preference Noise on the Alignment Performance of Generative Language Models

COLM 2024 Poster

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data

COLM 2024 Poster

Collaborators (20)

Abe Ittycheriah

Anastasia Makarova

Jeremiah Zhe Liu