Impact Index

95.49/100

Top 0.2%

Site-wide rank #158

Papers published: 34

Average rating: 6.3

Average annual output: 11.3 papers/year

Pang Wei Koh

Visiting Research Scientist @ Allen Institute for Artificial Intelligence · United States · OpenReview

Research areas

distribution shifts

5.5

Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks

ICLR 2026 Poster

4.5

PrefDisco: Benchmarking Proactive Personalized Reasoning

ICLR 2026 Poster

4.5

HybridCoT: Interleaving Latent and Text Chain-of-Thought for Efficient Reasoning

ICLR 2026 Rejected

4.0

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

COLM 2025 Poster

Corresponding author

7.7

Fluid Language Model Benchmarking

COLM 2025 Poster

7.2

NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric

ICML 2025 Poster

7.0

ReasonIR: Training Retrievers for Reasoning Tasks

COLM 2025 Poster

7.0

2 OLMo 2 Furious (COLM’s Version)

COLM 2025 Poster

6.8

Precise Information Control in Long-Form Text Generation

NeurIPS 2025 Poster

6.6

S4S: Solving for a Fast Diffusion Model Solver

ICML 2025 Poster

6.5

Language models scale reliably with over-training and on downstream tasks

ICLR 2025 Poster

6.3

JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

ICLR 2025 Rejected

Third author

6.3

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

COLM 2025 Poster

6.3

DataDecide: How to Predict Best Pretraining Data with Small Experiments

ICML 2025 Poster

6.0

Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions

ICLR 2025 Poster

5.8

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

ICLR 2025 Rejected

Corresponding author

5.7

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

COLM 2025 Poster

4.8

Conformal Reasoning: Uncertainty Estimation in Interactive Environments

ICLR 2025 Rejected

Corresponding author

4.5

On Erroneous Agreements of CLIP Image Embeddings

ICLR 2025 Rejected

Second author

4.0

MoSH: Modeling Multi-Objective Tradeoffs with Soft and Hard Bounds

Collaborators (20)

Pang Wei Koh

Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks

PrefDisco: Benchmarking Proactive Personalized Reasoning

HybridCoT: Interleaving Latent and Text Chain-of-Thought for Efficient Reasoning

Spurious Rewards: Rethinking Training Signals in RLVR

Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch

FlexOLMo: Open Language Models for Flexible Data Use

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

OLMoE: Open Mixture-of-Experts Language Models

The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains

Fluid Language Model Benchmarking

NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric

ReasonIR: Training Retrievers for Reasoning Tasks

2 OLMo 2 Furious (COLM’s Version)

Precise Information Control in Long-Form Text Generation

S4S: Solving for a Fast Diffusion Model Solver

Language models scale reliably with over-training and on downstream tasks

JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

Establishing Task Scaling Laws via Compute-Efficient Model Ladders

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Conformal Reasoning: Uncertainty Estimation in Interactive Environments

On Erroneous Agreements of CLIP Image Embeddings

MoSH: Modeling Multi-Objective Tradeoffs with Soft and Hard Bounds