Paper
Hub
Rauno Arike
~Rauno_Arike1
Total papers: 2
Average submissions per year: 2.0
Average rating: 4.1
Accepted: 1 / 2
Venue distribution
NeurIPS: 1
ICLR: 1
Published papers (2)
2024 (2 papers)
5.3
4
Interpreting Learned Feedback Patterns in Large Language Models
NeurIPS 2024
Poster
3.0
3
Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders
ICLR 2024
withdrawn
Co-authors (7)
Amir Abdullah: 2 papers
Fazl Barez: 2 papers
Luke Marks: 2 papers
Philip Torr: 2 papers
Luna Mendez: 1 paper
Clement Neo: 1 paper
David Krueger: 1 paper