Impact Index

84.05/100

Top 1%

Site-wide rank: #659

Papers published: 34

Average rating: 5.1

Average annual output: 11.3 papers/year

Yuhang Zang

Researcher @ Shanghai Artificial Intelligence Laboratory · China · OpenReview

Research Areas

Vision-Language Models · Large Language Models

Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models

ICLR 2026 · Poster

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

ICLR 2026 · Poster

SIM-CoT: Supervised Implicit Chain-of-Thought

ICLR 2026 · Poster

Visual Self-Refine: A Pixel-Guided Paradigm for Accurate Chart Parsing

ICLR 2026 · Poster

DiCache: Let Diffusion Model Determine Its Own Cache

ICLR 2026 · Poster

Advancing Complex Video Object Segmentation via Progressive Concept Construction

ICLR 2026 · Poster

ScaleCap: Scalable Image Captioning via Dual-Modality Debiasing

ICLR 2026 · Poster

CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

ICLR 2026 · Rejected

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

ICLR 2026 · Poster

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

ICLR 2026 · Rejected

Unified Reward Model for Multimodal Understanding and Generation

ICLR 2026 · Withdrawn

SPARK: Synergistic Policy And Reward Co-Evolving Framework

ICLR 2026 · Withdrawn

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

ICLR 2026 · Withdrawn

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

ICLR 2026 · Withdrawn

$\text{G}^2$RPO: Granular GRPO for Precise Reward in Flow Models

ICLR 2026 · Withdrawn

Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning

ICLR 2026 · Withdrawn

BoostStep: Boosting Mathematical Capability of Large Language Models via Step-aligned In Context Learning

ICLR 2026 · Rejected

Generative Photographic Control for Scene-Consistent Video Cinematic Editing

ICLR 2026 · Withdrawn

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

NeurIPS 2025 · Poster

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

NeurIPS 2025 · Poster

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

ICLR 2025 · Poster

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

ICLR 2025 · Poster

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

ICLR 2025 · Rejected

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

ICML 2025 · Poster

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

ICLR 2025 · Withdrawn

SAM2Long: Enhancing SAM2 for Long Video Segmentation with a Training-Free Memory Tree

ICLR 2025 · Withdrawn

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

ICLR 2025 · Withdrawn

Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data

ICLR 2025 · Withdrawn

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

ICLR 2025 · Withdrawn

Collaborators (20)