Influence index

66.25/100

Top 3.3%

Site-wide rank: #2,152

Papers published: 18

Average rating: 5.2

Average output: 6.0 papers/year

Christian Schroeder de Witt

Lecturer @ University of Oxford · United Kingdom · OpenReview

Research interests

AI security · multi-agent learning · reinforcement learning

Towards Understanding Multimodal Fine-Tuning: A Case Study into Spatial Features

ICLR 2026 · Rejected

AI Models Can Provably Hide Arbitrary Capabilities

ICLR 2026 · Withdrawn

h1: Bootstrapping Models to Reason over Longer Horizons via Reinforcement Learning

ICLR 2026 · Rejected

Predicting Weak-to-Strong Generalization from Latent Representations

ICLR 2026 · Rejected

Efficient Dictionary Learning with Switch Sparse Autoencoders

ICLR 2025 · Poster

MALT: Improving Reasoning with Multi-Agent LLM Training

COLM 2025 · Poster

Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs

NeurIPS 2025 · Poster

Mixture of Experts Made Intrinsically Interpretable

ICML 2025 · Poster

MAD-Sherlock: Multi-Agent Debates for Out-of-Context Misinformation Detection

ICLR 2025 · Rejected

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs

ICLR 2025 · Rejected

Mitigating Goal Misgeneralization via Minimax Regret

ICLR 2025 · Rejected

SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders

ICLR 2025 · Withdrawn

Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

ICLR 2025 · Rejected

Collaborators (20)

PhD advisor · 10 papers

Sumeet Ramesh Motwani

Constantin Venhoff