PaperHub

Yejin Choi

OpenReview ID: ~Yejin_Choi1

Total papers: 59
Avg. submissions per year: 29.5
Avg. review score: 6.3
Accepted: 44/59
Conference distribution: ICLR 32 · COLM 14 · NeurIPS 9 · ICML 4

Papers (59)

2025 (37 papers)

From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
  ICLR 2025 · Rejected · avg. score 4.8 (4 reviews)

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
  ICLR 2025 · Oral · avg. score 8.0 (4 reviews)

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
  ICLR 2025 · Spotlight · avg. score 7.3 (4 reviews)

Can Language Models Reason about Individualistic Human Values and Preferences?
  ICLR 2025 · Withdrawn · avg. score 4.5 (4 reviews)

The HALoGen Benchmark: Fantastic LLM Hallucinations and Where To Find Them
  ICLR 2025 · Withdrawn · score: -

Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
  NeurIPS 2025 · Spotlight · avg. score 7.8 (4 reviews)

LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception
  COLM 2025 · Poster · avg. score 5.3 (4 reviews)

Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
  NeurIPS 2025 · Poster · avg. score 7.8 (4 reviews)

Explore Theory of Mind: program-guided adversarial data generation for theory of mind reasoning
  ICLR 2025 · Poster · avg. score 6.0 (4 reviews)

Diverging Preferences: When do Annotators Disagree and do Models Know?
  ICLR 2025 · Rejected · avg. score 5.5 (4 reviews)

SuperBPE: Space Travel for Language Models
  COLM 2025 · Poster · avg. score 8.0 (4 reviews)

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  NeurIPS 2025 · Poster · avg. score 7.3 (4 reviews)

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
  ICLR 2025 · Rejected · avg. score 5.8 (4 reviews)

Diverging Preferences: When do Annotators Disagree and do Models Know?
  ICML 2025 · Poster · avg. score 5.5 (4 reviews)

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
  ICLR 2025 · Poster · avg. score 5.7 (3 reviews)

Pixelated Instructions: Can Multimodal Large Language Models Follow Printed Instructions in Images?
  ICLR 2025 · Rejected · avg. score 4.0 (4 reviews)

The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
  COLM 2025 · Poster · avg. score 6.3 (3 reviews)

SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
  ICLR 2025 · Rejected · avg. score 5.3 (3 reviews)

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
  ICML 2025 · Poster · avg. score 5.5 (4 reviews)

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers
  COLM 2025 · Poster · avg. score 6.8 (4 reviews)

CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
  ICLR 2025 · Poster · avg. score 6.0 (3 reviews)

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
  ICLR 2025 · Spotlight · avg. score 7.3 (3 reviews)

Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
  COLM 2025 · Poster · avg. score 6.3 (3 reviews)

Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
  ICML 2025 · Poster · avg. score 7.8 (4 reviews)

SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
  ICML 2025 · Poster · avg. score 5.5 (4 reviews)

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Interactive AI Agents
  COLM 2025 · Poster · avg. score 6.5 (4 reviews)

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation
  ICLR 2025 · Rejected · avg. score 3.3 (3 reviews)

Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
  ICLR 2025 · Rejected · avg. score 6.8 (4 reviews)

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
  COLM 2025 · Poster · avg. score 6.0 (4 reviews)

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
  ICLR 2025 · Rejected · avg. score 6.3 (4 reviews)

Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning
  NeurIPS 2025 · Spotlight · avg. score 8.2 (3 reviews)

Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
  ICLR 2025 · Poster · avg. score 5.4 (5 reviews)

AI Debate Aids Assessment of Controversial Claims
  NeurIPS 2025 · Poster · avg. score 6.4 (4 reviews)

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring (the Lack of) Cultural Knowledge of LLMs
  ICLR 2025 · Rejected · avg. score 5.0 (5 reviews)

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
  ICLR 2025 · Oral · avg. score 7.0 (4 reviews)

Language Model Alignment in Multilingual Trolley Problems
  ICLR 2025 · Spotlight · avg. score 7.3 (4 reviews)

VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
  NeurIPS 2025 · Poster · avg. score 7.3 (4 reviews)

2024 (22 papers)

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
  ICLR 2024 · Poster · avg. score 6.7 (3 reviews)

Data Mixture Inference Attack: BPE Tokenizers Reveal Training Data Compositions
  NeurIPS 2024 · Poster · avg. score 5.8 (4 reviews)

Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
  COLM 2024 · Poster · avg. score 6.5 (4 reviews)

Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
  COLM 2024 · Poster · avg. score 6.5 (4 reviews)

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
  COLM 2024 · Poster · avg. score 7.5 (4 reviews)

Making PPO even better: Value-Guided Monte-Carlo Tree Search decoding
  ICLR 2024 · Rejected · avg. score 4.0 (4 reviews)

Tuning Language Models by Proxy
  COLM 2024 · Poster · avg. score 7.5 (4 reviews)

CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
  COLM 2024 · Poster · avg. score 7.0 (4 reviews)

WildChat: 1M ChatGPT Interaction Logs in the Wild
  ICLR 2024 · Spotlight · avg. score 6.3 (4 reviews)

LUMOS: Towards Language Agents that are Unified, Modular, and Open Source
  ICLR 2024 · Rejected · avg. score 6.0 (4 reviews)

FiLM: Fill-in Language Models for Any-Order Generation
  ICLR 2024 · Rejected · avg. score 4.3 (4 reviews)

Information-Theoretic Distillation for Reference-less Summarization
  COLM 2024 · Poster · avg. score 8.3 (4 reviews)

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
  ICLR 2024 · Spotlight · avg. score 6.3 (4 reviews)

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
  NeurIPS 2024 · Poster · avg. score 5.0 (4 reviews)

The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
  ICLR 2024 · Poster · avg. score 5.0 (4 reviews)

Tailoring Self-Rationalizers with Multi-Reward Distillation
  ICLR 2024 · Poster · avg. score 6.4 (5 reviews)

Do Membership Inference Attacks Work on Large Language Models?
  COLM 2024 · Poster · avg. score 6.8 (4 reviews)

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
  ICLR 2024 · Oral · avg. score 8.0 (4 reviews)

In Search of the Long-Tail: Systematic Generation of Long-Tail Knowledge via Logical Rule Induced Search
  ICLR 2024 · Withdrawn · avg. score 4.8 (4 reviews)

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
  NeurIPS 2024 · Poster · avg. score 5.8 (4 reviews)

PlaSma: Procedural Knowledge Models for Language-based Planning and Re-Planning
  ICLR 2024 · Poster · avg. score 6.5 (4 reviews)

The Generative AI Paradox: “What It Can Create, It May Not Understand”
  ICLR 2024 · Poster · avg. score 7.0 (4 reviews)