Stuart Russell
~Stuart_Russell1
Total papers: 23
Average submissions per year: 11.5
Average rating:
Accepted: 16/23
Venue distribution: ICLR 13, NeurIPS 6, ICML 4
Publications (23 papers)
2025 (13 papers)
4
Monitoring Latent World States in Language Models with Propositional Probes
ICLR 2025 Spotlight
4
Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts
ICML 2025 Poster
5
Diffusion On Syntax Trees For Program Synthesis
ICLR 2025 Spotlight
4
BAMDP Shaping: a Unified Framework for Intrinsic Motivation and Reward Shaping
ICLR 2025 Poster
4
Avoiding Catastrophe in Online Learning by Asking for Help
ICML 2025 Poster
4
ALMANACS: A Simulatability Benchmark for Language Model Explainability
ICLR 2025 Rejected
4
Avoiding Catastrophe in Online Learning by Asking for Help
ICLR 2025 Rejected
4
RL, but don't do anything I wouldn't do
ICLR 2025 Rejected
4
Observation Interference in Partially Observable Assistance Games
ICML 2025 Poster
4
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
NeurIPS 2025 Poster
5
Robust and Diverse Multi-Agent Learning via Rational Policy Gradient
NeurIPS 2025 Poster
3
AssistanceZero: Scalably Solving Assistance Games
ICML 2025 Poster
4
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
NeurIPS 2025 Poster
2024 (10 papers)
4
When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
NeurIPS 2024 Poster
4
RL Algorithms are Information-State Policies in the Bayes-Adaptive MDP
ICLR 2024 Rejected
3
On Representation Complexity of Model-based and Model-free Reinforcement Learning
ICLR 2024 Poster
4
ALMANACS: A Simulatability Benchmark for Language Model Explainability
ICLR 2024 Rejected
4
The Effective Horizon Explains Deep RL Performance in Stochastic Environments
ICLR 2024 Spotlight
4
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
ICLR 2024 Rejected
4
Active Teacher Selection for Reinforcement Learning from Human Feedback
ICLR 2024 Rejected
5
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
NeurIPS 2024 Poster
4
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
NeurIPS 2024 Poster
3
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
ICLR 2024 Spotlight