Impact Index

51.9/100

Top 7.6%

Site-wide rank: #4,906

Papers published: 7

Average rating: 5.9

Average output: 2.3 papers/year

Udari Madhushani Sehwag

Researcher @ Scale AI · United States · OpenReview

Research areas

Scalable oversight · Frontier safety · Foundation model based agents · Alignment · Collective alignment · Evaluations and benchmarking foundation models · Jailbreak · Safety and responsible generative AI · Multi-agent learning · Multi-agent RL

MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

ICLR 2026 Poster

PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach

ICLR 2026 Poster

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

ICLR 2025 Poster

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment

ICLR 2025 Poster

Collab: Controlled Decoding using Mixture of Agents for LLM Alignment

ICLR 2025 Poster

AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment

ICLR 2025 Rejected

Collaborators (20)

Soumya Suvra Ghosal