Renrui Zhang
~Renrui_Zhang1
Total papers: 27
Average submissions per year: 13.5
Average rating
Acceptance: 21/27
Venue distribution
ICLR: 14
NeurIPS: 12
ICML: 1
Published papers (27)
2025 (18 papers)
4
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
ICLR 2025 (Poster)
4
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
NeurIPS 2025 (Poster)
4
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
ICLR 2025 (Poster)
4
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
ICML 2025 (Poster)
3
LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
ICLR 2025 (Spotlight)
4
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
NeurIPS 2025 (Poster)
4
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
NeurIPS 2025 (Poster)
3
UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens
NeurIPS 2025 (Poster)
4
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
ICLR 2025 (Poster)
4
What We Miss Matters: Learning from the Overlooked in Point Cloud Transformers
NeurIPS 2025 (Poster)
4
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking
NeurIPS 2025 (Poster)
4
PointACL: Point Cloud Understanding via Attention-Driven Contrastive Learning
ICLR 2025 (Withdrawn)
4
Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning
NeurIPS 2025 (Poster)
4
HybridVLA: Collaborative Autoregression and Diffusion in a Unified Vision-Language-Action Model
NeurIPS 2025 (Rejected)
3
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
NeurIPS 2025 (Poster)
4
TerDiT: Ternary Diffusion Models with Transformers
ICLR 2025 (Withdrawn)
5
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025 (Spotlight)
4
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
NeurIPS 2025 (Poster)
2024 (9 papers)
3
Personalize Segment Anything Model with One Shot
ICLR 2024 (Poster)
3
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention
ICLR 2024 (Poster)
3
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
ICLR 2024 (Rejected)
4
Improving Compositional Text-to-image Generation with Large Vision-Language Models
ICLR 2024 (Rejected)
5
TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning
ICLR 2024 (Rejected)
4
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
NeurIPS 2024 (Poster)
4
ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation
ICLR 2024 (Poster)
4
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
ICLR 2024 (Poster)
5
RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation
NeurIPS 2024 (Poster)