Influence Index

9.59/100

Exceeds 29.9%

Site-wide rank: #45,155

Papers published: 2

Average rating: 5.3

Average annual output: 1.0 papers/year

Narutatsu Ri

PhD student @ Princeton University · United States · OpenReview

Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

ICML 2025 Poster

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

ICLR 2024 Rejected

Collaborators (10)

Marzyeh Ghassemi

Jacob Steinhardt

Kathleen McKeown