Maarten Sap
~Maarten_Sap1
Total papers: 17
Average submissions per year: 8.5
Average rating
Accepted: 13/17
Venue distribution:
ICLR: 7
COLM: 6
NeurIPS: 2
ICML: 2
Papers (17)
2025: 12 papers
5
Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
NeurIPS 2025 Poster
4
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains
COLM 2025 Poster
4
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
ICLR 2025 Withdrawn
3
Fluid Language Model Benchmarking
COLM 2025 Poster
4
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
COLM 2025 Poster
4
SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
ICML 2025 Poster
5
On the Resilience of Multi-Agent Systems with Malicious Agents
ICLR 2025 Rejected
3
SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation
ICLR 2025 Rejected
4
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents
ICML 2025 Poster
4
ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning
COLM 2025 Poster
4
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
ICLR 2025 Rejected
4
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents
COLM 2025 Poster
2024: 5 papers
4
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
ICLR 2024 Spotlight
4
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
ICLR 2024 Poster
5
PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models
COLM 2024 Poster
4
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
NeurIPS 2024 Poster
3
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
ICLR 2024 Spotlight