Impact Index

97.55/100

Top 0.1%

Site-wide rank #71

Papers published: 45

Average rating: 5.9

Average annual output: 15.0 papers/year

Min-hwan Oh

Associate Professor @ Seoul National University · South Korea · OpenReview

Research Areas

Bandit Algorithms · Reinforcement Learning

6.5

Convergence of Muon with Newton-Schulz

ICLR 2026 Poster

Second author

6.0

Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning

ICLR 2026 Poster

Second author

6.0

Follow-the-Perturbed-Leader for Decoupled Bandits: Best-of-Both-Worlds and Practicality

ICLR 2026 Withdrawn

Third author

3.5

Sample-Efficient Pruning Model Selection via Lasso

ICLR 2026 Withdrawn

Third author

2.7

Inverse GFlowNets for Generative Imitation Learning

ICLR 2026 Withdrawn

7.8

Exploration via Feature Perturbation in Contextual Bandits

NeurIPS 2025 Spotlight

Second author

7.8

True Impact of Cascade Length in Contextual Cascading Bandits

NeurIPS 2025 Poster

Third author

7.3

Infrequent Exploration in Linear Bandits

NeurIPS 2025 Poster

Second author

7.3

Tractable Multinomial Logit Contextual Bandits with Non-Linear Utilities

NeurIPS 2025 Poster

Third author

7.3

Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems

NeurIPS 2025 Poster

Corresponding author

7.2

Improved Online Confidence Bounds for Multinomial Logistic Bandits

ICML 2025 Poster

Second author

7.1

Thompson Sampling for Multi-Objective Linear Contextual Bandit

NeurIPS 2025 Poster

Third author

7.0

Minimax Optimal Reinforcement Learning with Quasi-Optimism

ICLR 2025 Poster

Second author

7.0

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning

ICLR 2025 Poster

Second author

7.0

Oracle-Efficient Combinatorial Semi-Bandits

NeurIPS 2025 Poster

Third author

6.8

Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options

NeurIPS 2025 Poster

Third author

6.7

ADAM Optimization with Adaptive Batch Selection

ICLR 2025 Poster

Second author

6.5

Dynamic Assortment Selection and Pricing with Censored Preference Feedback

ICLR 2025 Poster

Second author

6.4

EUGens: Efficient, Unified and General Dense Layers

NeurIPS 2025 Poster

6.3

Lasso Bandit with Compatibility Condition on Optimal Arm

ICLR 2025 Poster

Third author

6.1

Combinatorial Reinforcement Learning with Preference Feedback

ICML 2025 Poster

Second author

6.1

Symmetry-Aware GFlowNets

ICML 2025 Poster

Third author

5.6

GFlowNets Need Automorphism Correction for Unbiased Graph Generation

ICLR 2025 Rejected

Third author

5.5

Combinatorial Reinforcement Learning with Preference Feedback

ICLR 2025 Rejected

Second author

5.5

Optimal and Practical Batched Linear Bandit Algorithm

ICML 2025 Poster

Second author

5.3

Magnituder Layers for Implicit Neural Representations in 3D

ICLR 2025 Rejected

Corresponding author

5.0

Linear Bandits with Partially Observable Features

ICLR 2025 Rejected

Corresponding author

4.8

Stochastic Matching Bandits under Preference Feedback

ICLR 2025 Withdrawn

Second author

4.8

Linear Bandits with Partially Observable Features

ICML 2025 Poster

Corresponding author

4.3

Neural Dynamic Pricing: Provable and Practical Efficiency

ICLR 2025 Withdrawn

Corresponding author

4.0

Mostly Exploration-free Algorithms for Multi-Objective Linear Bandits

ICLR 2025 Withdrawn

Second author

-1

Coordinated Exploration in Distributed Reinforcement Learning

Collaborators (20)

Min-hwan Oh

Convergence of Muon with Newton-Schulz

Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning

KL-Regularization Is Sufficient in Contextual Bandits and RLHF

Optimal Batched (Generalized) Linear Contextual Bandit Algorithm

Offline Preference-Based Value Optimization

Diversified Multinomial Logit Contextual Bandits

Batched Stochastic Matching Bandits

Blessings of Many Good Arms in Multi-Objective Linear Bandits

Follow-the-Perturbed-Leader for Decoupled Bandits: Best-of-Both-Worlds and Practicality

Sample-Efficient Pruning Model Selection via Lasso

Inverse GFlowNets for Generative Imitation Learning

Exploration via Feature Perturbation in Contextual Bandits

True Impact of Cascade Length in Contextual Cascading Bandits

Infrequent Exploration in Linear Bandits

Tractable Multinomial Logit Contextual Bandits with Non-Linear Utilities

Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems

Improved Online Confidence Bounds for Multinomial Logistic Bandits

Thompson Sampling for Multi-Objective Linear Contextual Bandit

Minimax Optimal Reinforcement Learning with Quasi-Optimism

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning

Oracle-Efficient Combinatorial Semi-Bandits

Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options

ADAM Optimization with Adaptive Batch Selection

Dynamic Assortment Selection and Pricing with Censored Preference Feedback

EUGens: Efficient, Unified and General Dense Layers

Lasso Bandit with Compatibility Condition on Optimal Arm

Combinatorial Reinforcement Learning with Preference Feedback

Symmetry-Aware GFlowNets

GFlowNets Need Automorphism Correction for Unbiased Graph Generation

Combinatorial Reinforcement Learning with Preference Feedback

Optimal and Practical Batched Linear Bandit Algorithm

Magnituder Layers for Implicit Neural Representations in 3D

Linear Bandits with Partially Observable Features

Stochastic Matching Bandits under Preference Feedback

Linear Bandits with Partially Observable Features

Neural Dynamic Pricing: Provable and Practical Efficiency

Mostly Exploration-free Algorithms for Multi-Objective Linear Bandits

Coordinated Exploration in Distributed Reinforcement Learning