PaperHub

Mike Zheng Shou

~Mike_Zheng_Shou1

42
Total papers
21.0
Average submissions per year
5.7
Average rating
Accepted: 26/42
Venue distribution
ICLR
21
NeurIPS
19
ICML
2

Papers (42)

2025 (24 papers)

4.8
4

Improving Autoregressive Image Generation by Mitigating Gradient Bias in Softmax

ICLR 2025 Withdrawn
6.6
4

Impossible Videos

ICML 2025 Poster
4.5
4

Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos

ICLR 2025 Withdrawn
6.4
5

Sparse Image Synthesis via Joint Latent and RoI Flow

NeurIPS 2025 Poster
7.3
4

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

NeurIPS 2025 Poster
6.4
4

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

NeurIPS 2025 Poster
6.4
4

PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer

NeurIPS 2025 Poster
6.0
4

Grounding Multimodal Large Language Model in GUI World

ICLR 2025 Poster
6.8
4

Show-o2: Improved Native Unified Multimodal Models

NeurIPS 2025 Poster
6.0
5

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

NeurIPS 2025 Poster
5.5
4

Personalized Vision via Visual In-Context Learning

NeurIPS 2025 Rejected
5.2
5

WMAdapter: Adding WaterMark Control to Latent Diffusion Models

ICLR 2025 Rejected
5.5
4

OmniContrast: Vision-Language-Interleaved Contrast from Pixels All at once

ICLR 2025 Rejected
6.3
3

WMAdapter: Adding WaterMark Control to Latent Diffusion Models

ICML 2025 Poster
6.4
3

DOTA: Distributional Test-time Adaptation of Vision-Language Models

NeurIPS 2025 Poster
6.0
4

DOTA: Distributional Test-Time Adaptation of Vision-Language Models

ICLR 2025 Rejected
6.4
5

Image Watermarks are Removable using Controllable Regeneration from Clean Noise

ICLR 2025 Poster
6.3
3

Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach

ICLR 2025 Poster
4.5
4

Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation

ICLR 2025 Withdrawn
6.4
5

MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation

ICLR 2025 Poster
4.3
3

X-PlugVid: Versatile Adaptation of Image Plugins for Controllable Video Generation

ICLR 2025 Withdrawn
6.4
4

CoFFT: Chain of Foresight-Focus Thought for Visual Language Models

NeurIPS 2025 Poster
5.2
5

VEditBench: Holistic Benchmark for Text-Guided Video Editing

ICLR 2025 Rejected
6.5
4

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

ICLR 2025 Poster

2024 (18 papers)

6.3
4

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

ICLR 2024 Poster
4.0
4

Linguistic Image Understanding

ICLR 2024 Withdrawn
5.5
4

Can Simple Averaging Defeat Modern Watermarks?

NeurIPS 2024 Poster
4.7
3

LOVA3: Learning to Visual Question Answering, Asking and Assessment

NeurIPS 2024 Poster
6.0
4

Skinned Motion Retargeting with Dense Geometric Interaction Perception

NeurIPS 2024 Spotlight
6.0
4

Exocentric-to-Egocentric Video Generation

NeurIPS 2024 Poster
5.3
4

TaCA: Hot-Plugging Upgrades for Foundation Model with Task-agnostic Compatible Adapter

ICLR 2024 Rejected
6.4
5

DoFIT: Domain-aware Federated Instruction Tuning with Alleviated Catastrophic Forgetting

NeurIPS 2024 Poster
5.3
4

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

NeurIPS 2024 Poster
4.5
4

Implicit Semi-auto-regressive Image-to-Video Diffusion

ICLR 2024 Rejected
5.5
4

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

ICLR 2024 Withdrawn
5.0
4

Integrating View Conditions for Image Synthesis

ICLR 2024 Rejected
5.0
4

Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

ICLR 2024 Withdrawn
5.3
3

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

ICLR 2024 Withdrawn
5.3
3

Visual Perception by Large Language Model’s Weights

NeurIPS 2024 Poster
5.0
3

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

NeurIPS 2024 Poster
5.0
4

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

NeurIPS 2024 Poster
6.3
3

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

NeurIPS 2024 Poster