Influence Index

97.41/100

Top 0.1%

Site-wide rank: #76

Papers published: 63

Average rating: 5.3

Average annual output: 21.0 papers/year

Mike Zheng Shou

Assistant Professor @ National University of Singapore · Singapore · OpenReview

Research Areas

Multimodal · Video Generation · Video Understanding

VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning

ICLR 2026 · Poster

Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing

ICLR 2026 · Poster

TPDiff: Temporal Pyramid Video Diffusion Model

ICLR 2026 · Poster

Paper2Video: Automatic Video Generation from Scientific Papers

ICLR 2026 · Rejected

D-AR: Diffusion via Autoregressive Models

ICLR 2026 · Poster

DD-Ranking: Rethinking the Evaluation of Dataset Distillation

ICLR 2026 · Rejected

Personalized Vision via Visual In-Context Learning

ICLR 2026 · Rejected

UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths

ICLR 2026 · Rejected

A Gain for Reconstruction, A Pain for Generation: Exploiting Representation in Visual Tokenization

ICLR 2026 · Rejected

Ego-centric Predictive Model Conditioned on Hand Trajectories

ICLR 2026 · Withdrawn

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

ICLR 2026 · Rejected

Automated Movie Generation via Multi-Agent CoT Planning

ICLR 2026 · Rejected

Rethinking Defense for Computer-Use Agents: Context Deception Attacks are Simple to Defend

ICLR 2026 · Rejected

MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation

ICLR 2026 · Withdrawn

Code2Video: A Code-centric Paradigm for Educational Video Generation

ICLR 2026 · Rejected

Mitty: Diffusion-based Human-To-Robot Video Generation

ICLR 2026 · Withdrawn

DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection

ICLR 2026 · Withdrawn

WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point

ICLR 2026 · Withdrawn

Computer-Use Agents as Judges for Automatic GUI Design

ICLR 2026 · Withdrawn

Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation

ICLR 2026 · Withdrawn

Multi-Human Interactive Talking Dataset

ICLR 2026 · Rejected

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

NeurIPS 2025 · Poster

Show-o2: Improved Native Unified Multimodal Models

NeurIPS 2025 · Poster

Impossible Videos

ICML 2025 · Poster

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

ICLR 2025 · Poster

PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer

NeurIPS 2025 · Poster

Sparse Image Synthesis via Joint Latent and RoI Flow

NeurIPS 2025 · Poster

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

NeurIPS 2025 · Poster

Image Watermarks are Removable using Controllable Regeneration from Clean Noise

ICLR 2025 · Poster

DOTA: Distributional Test-time Adaptation of Vision-Language Models

NeurIPS 2025 · Poster

MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation

ICLR 2025 · Poster

CoFFT: Chain of Foresight-Focus Thought for Visual Language Models

NeurIPS 2025 · Poster

Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach

ICLR 2025 · Poster

WMAdapter: Adding WaterMark Control to Latent Diffusion Models

ICML 2025 · Poster

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

NeurIPS 2025 · Poster

Grounding Multimodal Large Language Model in GUI World

ICLR 2025 · Poster

DOTA: Distributional Test-Time Adaptation of Vision-Language Models

ICLR 2025 · Rejected

Personalized Vision via Visual In-Context Learning

NeurIPS 2025 · Rejected

OmniContrast: Vision-Language-Interleaved Contrast from Pixels All at once

ICLR 2025 · Rejected

WMAdapter: Adding WaterMark Control to Latent Diffusion Models

ICLR 2025 · Rejected

VEditBench: Holistic Benchmark for Text-Guided Video Editing

ICLR 2025 · Rejected

Improving Autoregressive Image Generation by Mitigating Gradient Bias in Softmax

ICLR 2025 · Withdrawn

Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos

ICLR 2025 · Withdrawn

Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation

ICLR 2025 · Withdrawn

X-PlugVid: Versatile Adaptation of Image Plugins for Controllable Video Generation

ICLR 2025 · Withdrawn

Collaborators (20)

David Junhao Zhang

Kevin Qinghong Lin