Kwang-Sung Jun

Associate Professor @ Pohang University of Science and Technology · South Korea · OpenReview

Research interests

alignment · reinforcement learning from human feedback (RLHF) · test-time scaling · offline bandits · off-policy evaluation · selection · learning · online learning · multi-armed bandit

7.0

GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression

ICML 2025 Spotlight

Third author

5.5

Fixing the Loose Brake: Exponential-Tailed Stopping Time in Best Arm Identification

Collaborators (20)

Kwang-Sung Jun

Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning

Second-Order Bounds for [0,1]-Valued Regression via Betting Loss

Beyond RLHF: A Theoretical Framework of Alignment as Distribution Learning

GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression

Fixing the Loose Brake: Exponential-Tailed Stopping Time in Best Arm Identification