Anca Dragan
~Anca_Dragan1
Total papers: 22
Submissions per year: 11.0
Average rating
Accepted: 13/22
Venue distribution
ICLR: 16
NeurIPS: 4
ICML: 2
Papers (22)
2025: 14 papers
4
Adversaries Can Misuse Combinations of Safe Models
ICML 2025 Poster
4
Adversaries Can Misuse Combinations of Safe Models
ICLR 2025 Rejected
4
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
ICLR 2025 Poster
4
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
NeurIPS 2025 Poster
3
Successor Representations Enable Emergent Compositional Instruction Following
ICLR 2025 Rejected
4
Reliability-Aware Preference Learning for LLM Reward Models
ICLR 2025 Withdrawn
4
Zero-Shot Goal Dialogue via Reinforcement Learning on Imagined Conversations
ICLR 2025 Rejected
5
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
ICLR 2025 Spotlight
4
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
NeurIPS 2025 Poster
3
Context Steering: Controllable Personalization at Inference Time
ICLR 2025 Poster
4
Interactive Dialogue Agents via Reinforcement Learning with Hindsight Regenerations
ICLR 2025 Rejected
4
Defining Deception in Decision Making
ICLR 2025 Rejected
3
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
ICLR 2025 Poster
3
AssistanceZero: Scalably Solving Assistance Games
ICML 2025 Poster
2024: 8 papers
3
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
ICLR 2024 Poster
3
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
ICLR 2024 Rejected
5
Preventing Reward Hacking with Occupancy Measure Regularization
ICLR 2024 Rejected
4
The Effective Horizon Explains Deep RL Performance in Stochastic Environments
ICLR 2024 Spotlight
4
When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
NeurIPS 2024 Poster
5
Learning to Assist Humans without Inferring Rewards
NeurIPS 2024 Poster
4
Confronting Reward Model Overoptimization with Constrained RLHF
ICLR 2024 Spotlight
4
Learning to Model the World with Language
ICLR 2024 Rejected