Impact Index

88.44/100

Top 0.7%

Site-wide rank #443

Papers published: 26

Average rating: 5.7

Average output: 8.7 papers/year

Maarten Sap

Assistant Professor @ Carnegie Mellon University · United States · OpenReview

Research Areas

AI agents · Toxicity in dialogues · Ethics in AI · Hate speech detection · Social commonsense · Narrative analyses

PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm

ICLR 2026 · Poster

OpenAgentSafety: A Comprehensive Framework For Evaluating Real-World AI Agent Safety

ICLR 2026 · Poster

Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering

ICLR 2026 · Poster

Fortifying Hallucination Detection to Out-of-Domain Data

ICLR 2026 · Withdrawn

TOM-SWE: User Mental Modeling For Software Engineering Agents

ICLR 2026 · Rejected

Social World Models: Universal Structured Representations for Social Reasoning

ICLR 2026 · Rejected

HypoVeil: A Hypothesis-Driven Pragmatic Inference-Time Control Framework for Privacy–Utility-Aware LLM-Agent Dialogue

ICLR 2026 · Withdrawn

The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies

ICLR 2026 · Rejected

EVALUESTEER: Measuring Reward Model Steerability Towards Values and Preferences

ICLR 2026 · Withdrawn

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

COLM 2025 · Poster

Fluid Language Model Benchmarking

COLM 2025 · Poster

ALFA: Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning

COLM 2025 · Poster

Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning

NeurIPS 2025 · Poster

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

COLM 2025 · Poster

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents

COLM 2025 · Poster

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

ICLR 2025 · Rejected

On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents

ICML 2025 · Poster

SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior

ICML 2025 · Poster

On the Resilience of Multi-Agent Systems with Malicious Agents

ICLR 2025 · Rejected

BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data

ICLR 2025 · Withdrawn

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

ICLR 2025 · Rejected

Collaborators (20)

PhD advisor · 6 papers

Niloofar Mireshghallah