Influence Index

98.16/100

Top 0.1%

Site-wide rank: #46

Papers published: 79

Average rating: 5.6

Average output: 26.3 papers/year

Hongsheng Li

Associate Professor @ The Chinese University of Hong Kong · China · OpenReview

Research Areas

3D Object Detection · Feature Distillation · Generative Models · Multimodal Large Language Models

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

ICLR 2026 Poster

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

ICLR 2026 Poster

GLEAM: Learning to Match and Explain in Cross-View Geo-Localization

ICLR 2026 Withdrawn

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

ICLR 2026 Poster

Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

ICLR 2026 Poster

One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning

ICLR 2026 Poster

ProteinAE: Protein Diffusion Autoencoders for Structure Encoding

ICLR 2026 Poster

RelightMaster: Precise Video Relighting with Multi-plane Light Images

ICLR 2026 Rejected

PICABench: How Far are We from Physical Realistic Image Editing?

ICLR 2026 Poster

LMGenDrive: LLM Reasoning Meets World Models for End-to-End Driving

ICLR 2026 Withdrawn

GoT-R1: Unleashing Reasoning Capability of Autoregressive Visual Generation with Reinforcement Learning

ICLR 2026 Poster

WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning

ICLR 2026 Poster

VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing

ICLR 2026 Withdrawn

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space

ICLR 2026 Withdrawn

MobileWizard: A Data-Efficient GUI Agent with Structured Reasoning and Progressive Reinforcement Learning

ICLR 2026 Rejected

A3: Android Agent Arena For Mobile GUI Agents

ICLR 2026 Withdrawn

InterAvatar: Real-time Interactive Portrait Animation via Behavioral Interaction Prompts

ICLR 2026 Withdrawn

CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images

ICLR 2026 Withdrawn

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

NeurIPS 2025 Poster

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

ICLR 2025 Spotlight

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

NeurIPS 2025 Poster

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

NeurIPS 2025 Poster

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

ICLR 2025 Spotlight

Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation

NeurIPS 2025 Poster

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

NeurIPS 2025 Poster

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

ICLR 2025 Spotlight

Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models

ICLR 2025 Poster

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

NeurIPS 2025 Poster

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

ICLR 2025 Poster

Mixture Compressor for Mixture-of-Experts LLMs Gains More

ICLR 2025 Poster

Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception

ICLR 2025 Poster

CameraCtrl: Enabling Camera Control for Video Diffusion Models

ICLR 2025 Poster

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

ICLR 2025 Poster

MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines

ICLR 2025 Poster

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

ICLR 2025 Poster

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

NeurIPS 2025 Poster

NopeRoomGS: Indoor 3D Gaussian Splatting Optimization without Camera Pose Input

NeurIPS 2025 Poster

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

NeurIPS 2025 Poster

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

NeurIPS 2025 Poster

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

ICML 2025 Poster

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

ICLR 2025 Rejected

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

ICLR 2025 Poster

Step-Controlled DPO: Leveraging Stepwise Errors for Enhancing Mathematical Reasoning of Language Models

ICLR 2025 Rejected

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

ICLR 2025 Poster

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

ICLR 2025 Poster

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

ICLR 2025 Rejected

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

ICLR 2025 Rejected

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

ICLR 2025 Poster

One Leaf Reveals the Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation

ICML 2025 Poster

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

ICML 2025 Poster

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow

ICLR 2025 Rejected

DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation

ICLR 2025 Withdrawn

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

ICLR 2025 Withdrawn

FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering

ICLR 2025 Withdrawn

TimeWalker: Personalized Neural Space for Life-long Head Avatar

ICLR 2025 Withdrawn

TerDiT: Ternary Diffusion Models with Transformers

ICLR 2025 Withdrawn

Stable Consistency Tuning: Understanding and Improving Consistency Models

ICLR 2025 Withdrawn

Collaborators (20)