Impact Index

84.31/100

Top 1%

Site-wide rank #639

Papers published: 20

Average rating: 6.0

Average annual output: 6.7 papers/year

Liwei Jiang

PhD student @ University of Washington · United States · OpenReview

Research areas

Natural Language Processing · AI

5.3

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

ICLR 2026 · Rejected

Second author

5.0

ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment

ICLR 2026 · Poster

5.0

Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability

ICLR 2026 · Poster

4.7

HieraSuite: A Holistic Toolkit for Building Versatile System-User Instruction Hierarchy

ICLR 2026 · Rejected

First author

7.3

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

ICLR 2025 · Spotlight

Second author

7.0

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

COLM 2025 · Poster

7.0

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

ICLR 2025 · Oral

6.5

SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior

ICML 2025 · Poster

5.0

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring (the Lack of) Cultural Knowledge of LLMs

ICLR 2025 · Rejected

Second author

4.5

Can Language Models Reason about Individualistic Human Values and Preferences?

ICLR 2025 · Withdrawn

First author

3.3

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

Collaborators (20)

Niloofar Mireshghallah

Liwei Jiang

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment

Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability

HieraSuite: A Holistic Toolkit for Building Versatile System-User Instruction Hierarchy

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents

AI Debate Aids Assessment of Controversial Claims

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring (the Lack of) Cultural Knowledge of LLMs

Can Language Models Reason about Individualistic Human Values and Preferences?

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation