Mike Zheng Shou
~Mike_Zheng_Shou1
Total papers: 42
Average submissions per year: 21.0
Average rating
Acceptance: 26/42
Venue distribution
ICLR: 21
NeurIPS: 19
ICML: 2
Published papers (42)
2025 (24 papers)
4
Improving Autoregressive Image Generation by Mitigating Gradient Bias in Softmax
ICLR 2025 Withdrawn
4
Impossible Videos
ICML 2025 Poster
4
Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos
ICLR 2025 Withdrawn
5
Sparse Image Synthesis via Joint Latent and RoI Flow
NeurIPS 2025 Poster
4
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents
NeurIPS 2025 Poster
4
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
NeurIPS 2025 Poster
4
PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer
NeurIPS 2025 Poster
4
Grounding Multimodal Large Language Model in GUI World
ICLR 2025 Poster
4
Show-o2: Improved Native Unified Multimodal Models
NeurIPS 2025 Poster
5
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
NeurIPS 2025 Poster
4
Personalized Vision via Visual In-Context Learning
NeurIPS 2025 Rejected
5
WMAdapter: Adding WaterMark Control to Latent Diffusion Models
ICLR 2025 Rejected
4
OmniContrast: Vision-Language-Interleaved Contrast from Pixels All at once
ICLR 2025 Rejected
3
WMAdapter: Adding WaterMark Control to Latent Diffusion Models
ICML 2025 Poster
3
DOTA: Distributional Test-time Adaptation of Vision-Language Models
NeurIPS 2025 Poster
4
DOTA: Distributional Test-Time Adaptation of Vision-Language Models
ICLR 2025 Rejected
5
Image Watermarks are Removable using Controllable Regeneration from Clean Noise
ICLR 2025 Poster
3
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach
ICLR 2025 Poster
4
Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation
ICLR 2025 Withdrawn
5
MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation
ICLR 2025 Poster
3
X-PlugVid: Versatile Adaptation of Image Plugins for Controllable Video Generation
ICLR 2025 Withdrawn
4
CoFFT: Chain of Foresight-Focus Thought for Visual Language Models
NeurIPS 2025 Poster
5
VEditBench: Holistic Benchmark for Text-Guided Video Editing
ICLR 2025 Rejected
4
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
ICLR 2025 Poster
2024 (18 papers)
4
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
ICLR 2024 Poster
4
Linguistic Image Understanding
ICLR 2024 Withdrawn
4
Can Simple Averaging Defeat Modern Watermarks?
NeurIPS 2024 Poster
3
LOVA3: Learning to Visual Question Answering, Asking and Assessment
NeurIPS 2024 Poster
4
Skinned Motion Retargeting with Dense Geometric Interaction Perception
NeurIPS 2024 Spotlight
4
Exocentric-to-Egocentric Video Generation
NeurIPS 2024 Poster
4
TaCA: Hot-Plugging Upgrades for Foundation Model with Task-agnostic Compatible Adapter
ICLR 2024 Rejected
5
DoFIT: Domain-aware Federated Instruction Tuning with Alleviated Catastrophic Forgetting
NeurIPS 2024 Poster
4
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
NeurIPS 2024 Poster
4
Implicit Semi-auto-regressive Image-to-Video Diffusion
ICLR 2024 Rejected
4
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
ICLR 2024 Withdrawn
4
Integrating View Conditions for Image Synthesis
ICLR 2024 Rejected
4
Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
ICLR 2024 Withdrawn
3
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
ICLR 2024 Withdrawn
3
Visual Perception by Large Language Model’s Weights
NeurIPS 2024 Poster
3
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
NeurIPS 2024 Poster
4
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
NeurIPS 2024 Poster
3
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
NeurIPS 2024 Poster