Impact Index

70.43/100

Top 2.6%

Site-wide rank #1,653

Papers published: 15

Average rating: 5.3

Average annual output: 5.0 papers/year

Junkang Wu

PhD student @ University of Science and Technology of China · China · OpenReview

Research areas

LLM alignment · recommendation systems · graph embedding and knowledge graphs

Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning

ICLR 2026 · Poster

Beyond Magnitude: Leveraging Direction of RLVR Updates for LLM Reasoning

ICLR 2026 · Poster

PEA-DPO: Perception-Enhanced Alignment Direct Preference Optimization for MLLMs Alignment

ICLR 2026 · Rejected

Mitigating Reward Hacking in LLM-based Recommendation: A Preference Optimization Approach

ICLR 2026 · Rejected

bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs

ICLR 2026 · Rejected

Enhancing Multimodal LLMs Reasoning via Perception Reward Modeling

ICLR 2026 · Rejected

Bridging Perception and Reasoning: Token Reweighting for RLVR in Multimodal LLMs

ICLR 2026 · Rejected

DAMA: Data- and Model-aware Alignment of Multi-modal LLMs

ICML 2025 · Poster

RePO: Understanding Preference Learning Through ReLU-Based Optimization

NeurIPS 2025 · Poster

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

ICLR 2025 · Poster

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

ICML 2025 · Poster

Larger or Smaller Reward Margins to Select Preferences for LLM Alignment?

ICML 2025 · Poster

$\alpha$-DPO: Adaptive Reward Margin is What Direct Preference Optimization Needs

ICLR 2025 · Rejected

AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization

ICML 2025 · Poster

Collaborators (20)

PhD advisor · 14 papers

PhD advisor · 13 papers