Paper Hub
Luke Marks
~Luke_Marks2
Total papers: 3
Submissions per year: 1.5
Average rating: 4.1
Acceptance: 1 / 3
Venue distribution
ICLR: 2
NeurIPS: 1
Papers (3)
2025
1 paper
4.0
4
Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders
ICLR 2025
Rejected
2024
2 papers
5.3
4
Interpreting Learned Feedback Patterns in Large Language Models
NeurIPS 2024
Poster
3.0
3
Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders
ICLR 2024
Withdrawn
Collaborators (8)
Fazl Barez: 3 papers
Amir Abdullah: 2 papers
Philip Torr: 2 papers
Rauno Arike: 2 papers
David Krueger: 2 papers
Luna Mendez: 1 paper
Alasdair Paren: 1 paper
Clement Neo: 1 paper