PaperHub

Hongsheng Li

~Hongsheng_Li3

61
Total papers
30.5
Submissions per year
5.9
Average rating
Accepted: 42/61
Venue distribution
ICLR
39
NeurIPS
19
ICML
3

Published papers (61)

2025 (39)

5.0
4

DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation

ICLR 2025 Withdrawn
3.0
4

Stable Consistency Tuning: Understanding and Improving Consistency Models

ICLR 2025 Withdrawn
5.5
3

One Leaf Reveals the Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation

ICML 2025 Poster
4.5
4

TimeWalker: Personalized Neural Space for Life-long Head Avatar

ICLR 2025 Withdrawn
4.7
3

FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering

ICLR 2025 Withdrawn
7.3
4

Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation

NeurIPS 2025 Poster
5.8
4

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

ICLR 2025 Poster
5.3
4

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow

ICLR 2025 Rejected
6.8
4

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

ICLR 2025 Poster
7.0
4

Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models

ICLR 2025 Poster
5.8
4

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

ICLR 2025 Rejected
6.0
5

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

ICLR 2025 Rejected
6.5
4

Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception

ICLR 2025 Poster
6.5
4

CameraCtrl: Enabling Camera Control for Video Diffusion Models

ICLR 2025 Poster
6.0
4

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

ICLR 2025 Poster
7.8
4

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

NeurIPS 2025 Poster
6.4
4

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

NeurIPS 2025 Poster
6.8
4

Mixture Compressor for Mixture-of-Experts LLMs Gains More

ICLR 2025 Poster
7.3
3

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

ICLR 2025 Spotlight
6.1
4

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

ICML 2025 Poster
8.0
3

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

ICLR 2025 Spotlight
6.0
5

Step-Controlled DPO: Leveraging Stepwise Errors for Enhancing Mathematical Reasoning of Language Models

ICLR 2025 Rejected
7.3
4

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

NeurIPS 2025 Poster
6.0
4

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

ICLR 2025 Poster
6.4
4

NopeRoomGS: Indoor 3D Gaussian Splatting Optimization without Camera Pose Input

NeurIPS 2025 Poster
5.7
3

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

ICLR 2025 Poster
4.8
4

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

ICLR 2025 Withdrawn
7.8
4

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

NeurIPS 2025 Poster
6.4
3

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

NeurIPS 2025 Poster
4.0
4

TerDiT: Ternary Diffusion Models with Transformers

ICLR 2025 Withdrawn
6.5
4

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

ICLR 2025 Poster
6.4
3

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

NeurIPS 2025 Poster
8.2
3

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

NeurIPS 2025 Poster
6.5
4

MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines

ICLR 2025 Poster
5.8
4

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

ICLR 2025 Rejected
5.5
4

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

ICML 2025 Poster
6.5
6

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

ICLR 2025 Poster
6.8
4

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

NeurIPS 2025 Poster
7.2
5

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

ICLR 2025 Spotlight

2024 (22)

5.5
4

Covariance-corrected Whitening Alleviates Network Degeneration on Imbalanced Classification

ICLR 2024 Rejected
4.6
5

Meta-Transformer: A Unified Framework for Multimodal Learning

ICLR 2024 Withdrawn
4.8
4

Searching for Parameter-Efficient Tuning Architecture for Text-to-image Diffusion Models

ICLR 2024 Withdrawn
5.3
4

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

NeurIPS 2024 Poster
3.8
4

Enhancing Vision-Language Model with Unmasked Token Alignment at Scale

ICLR 2024 Withdrawn
5.8
5

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

NeurIPS 2024 Poster
4.3
4

Debias the Training of Diffusion Models

ICLR 2024 Withdrawn
5.0
4

Instruct2Act: Mapping Multi-modality Instructions to Robotic Arm Actions with Large Language Model

ICLR 2024 Rejected
6.3
3

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention

ICLR 2024 Poster
4.8
4

Unifying Diverse Decision-Making Scenarios with Learned Discrete Actions

ICLR 2024 Rejected
5.5
4

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

NeurIPS 2024 Poster
5.2
5

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding

NeurIPS 2024 Poster
5.0
4

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

NeurIPS 2024 Poster
5.3
4

Learning 1D Causal Visual Representation with De-focus Attention Networks

NeurIPS 2024 Poster
5.8
4

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

ICLR 2024 Poster
6.7
3

Personalize Segment Anything Model with One Shot

ICLR 2024 Poster
5.3
4

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving

NeurIPS 2024 Poster
5.8
4

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

ICLR 2024 Poster
6.3
4

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

ICLR 2024 Poster
6.3
3

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

ICLR 2024 Rejected
5.7
3

Phased Consistency Models

NeurIPS 2024 Poster
5.2
5

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

NeurIPS 2024 Poster