Hongsheng Li
~Hongsheng_Li3
61
Total papers
30.5
Average submissions per year
Average rating
Accepted: 42/61
Venue distribution
ICLR
39
NeurIPS
19
ICML
3
Published papers (61)
2025: 39 papers
4
DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation
ICLR 2025 Withdrawn
4
Stable Consistency Tuning: Understanding and Improving Consistency Models
ICLR 2025 Withdrawn
3
One Leaf Reveals the Season: Occlusion-Based Contrastive Learning with Semantic-Aware Views for Efficient Visual Representation
ICML 2025 Poster
4
TimeWalker: Personalized Neural Space for Life-long Head Avatar
ICLR 2025 Withdrawn
3
FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering
ICLR 2025 Withdrawn
4
Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation
NeurIPS 2025 Poster
4
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
ICLR 2025 Poster
4
I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow
ICLR 2025 Rejected
4
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
ICLR 2025 Poster
4
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
ICLR 2025 Poster
4
Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective
ICLR 2025 Rejected
5
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
ICLR 2025 Rejected
4
Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception
ICLR 2025 Poster
4
CameraCtrl: Enabling Camera Control for Video Diffusion Models
ICLR 2025 Poster
4
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
ICLR 2025 Poster
4
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
NeurIPS 2025 Poster
4
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
NeurIPS 2025 Poster
4
Mixture Compressor for Mixture-of-Experts LLMs Gains More
ICLR 2025 Poster
3
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
ICLR 2025 Spotlight
4
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
ICML 2025 Poster
3
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
ICLR 2025 Spotlight
5
Step-Controlled DPO: Leveraging Stepwise Errors for Enhancing Mathematical Reasoning of Language Models
ICLR 2025 Rejected
4
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
NeurIPS 2025 Poster
4
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
ICLR 2025 Poster
4
NopeRoomGS: Indoor 3D Gaussian Splatting Optimization without Camera Pose Input
NeurIPS 2025 Poster
3
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
ICLR 2025 Poster
4
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICLR 2025 Withdrawn
4
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
NeurIPS 2025 Poster
3
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
NeurIPS 2025 Poster
4
TerDiT: Ternary Diffusion Models with Transformers
ICLR 2025 Withdrawn
4
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
ICLR 2025 Poster
3
GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing
NeurIPS 2025 Poster
3
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
NeurIPS 2025 Poster
4
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
ICLR 2025 Poster
4
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving
ICLR 2025 Rejected
4
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
ICML 2025 Poster
6
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
ICLR 2025 Poster
4
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
NeurIPS 2025 Poster
5
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025 Spotlight
2024: 22 papers
4
Covariance-corrected Whitening Alleviates Network Degeneration on Imbalanced Classification
ICLR 2024 Rejected
5
Meta-Transformer: A Unified Framework for Multimodal Learning
ICLR 2024 Withdrawn
4
Searching for Parameter-Efficient Tuning Architecture for Text-to-image Diffusion Models
ICLR 2024 Withdrawn
4
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
NeurIPS 2024 Poster
4
Enhancing Vision-Language Model with Unmasked Token Alignment at Scale
ICLR 2024 Withdrawn
5
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
NeurIPS 2024 Poster
4
Debias the Training of Diffusion Models
ICLR 2024 Withdrawn
4
Instruct2Act: Mapping Multi-modality Instructions to Robotic Arm Actions with Large Language Model
ICLR 2024 Rejected
3
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention
ICLR 2024 Poster
4
Unifying Diverse Decision-Making Scenarios with Learned Discrete Actions
ICLR 2024 Rejected
4
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
NeurIPS 2024 Poster
5
A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding
NeurIPS 2024 Poster
4
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
NeurIPS 2024 Poster
4
Learning 1D Causal Visual Representation with De-focus Attention Networks
NeurIPS 2024 Poster
4
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process
ICLR 2024 Poster
3
Personalize Segment Anything Model with One Shot
ICLR 2024 Poster
4
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving
NeurIPS 2024 Poster
4
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
ICLR 2024 Poster
4
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
ICLR 2024 Poster
3
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
ICLR 2024 Rejected
3
Phased Consistency Models
NeurIPS 2024 Poster
5
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
NeurIPS 2024 Poster