Impact Index

96.95/100

Top 0.2%

Site-wide rank: #97

Papers published: 62

Average rating: 5.1

Average output: 20.7 papers/year

Zhou Zhao

Full Professor @ Zhejiang University · China · OpenReview

Research areas

Machine Learning · Computer Vision · Deep Learning

SpatialHand: Generative Object Manipulation from 3D Prespective

ICLR 2026 Poster

WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark

ICLR 2026 Poster

MARS-Sep: Multimodal-Aligned Reinforced Sound Separation

ICLR 2026 Poster

Depth Anything with Any Prior

ICLR 2026 Poster

Figma2Code: Automating Multimodal Design to Code in the Wild

ICLR 2026 Poster

Vox-Infinity: Benchmarking the Limits of Long-Context Spoken Language Models

ICLR 2026 Rejected

AlignSep: Temporally-Aligned Video-Queried Sound Separation with Flow Matching

ICLR 2026 Poster

CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration

ICLR 2026 Poster

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

ICLR 2026 Rejected

ETC: training-free diffusion models acceleration with Error-aware Trend Consistency

ICLR 2026 Rejected

PSR: Subject-Consistency Rewards for Multi-Subject Personalized Generation

ICLR 2026 Withdrawn

DSI-Bench: A Benchmark for Dynamic Spatial Intelligence

ICLR 2026 Withdrawn

Keep Refining Your Discrete Diffusion Model: A Mixture of Absorbing and Uniform Processes

ICLR 2026 Rejected

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

ICLR 2026 Rejected

ACDC: Adaptive Cloud-Device Collaboration for Efficient and Accurate Semantic Segmentation

ICLR 2026 Rejected

Orient Anything V2: Unifying Orientation and Rotation Understanding

NeurIPS 2025 Spotlight

SPMDM: Enhancing Masked Diffusion Models through Simplifing Sampling Path

NeurIPS 2025 Poster

ThinkSound: Chain-of-Thought Reasoning in Multimodal LLMs for Audio Generation and Editing

NeurIPS 2025 Poster

OmniAudio: Generating Spatial Audio from 360-Degree Video

ICML 2025 Poster

VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?

ICLR 2025 Poster

EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation

ICLR 2025 Poster

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

ICLR 2025 Poster

Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning

NeurIPS 2025 Poster

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

ICLR 2025 Poster

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

ICLR 2025 Poster

Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis

ICLR 2025 Withdrawn

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization

ICLR 2025 Rejected

Dataflow-Guided Neuro-Symbolic Language Models for Type Inference

ICML 2025 Poster

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

ICML 2025 Poster

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control

ICLR 2025 Withdrawn

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback

ICLR 2025 Withdrawn

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

ICLR 2025 Withdrawn

IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models

ICML 2025 Poster

CodeSync: Synchronizing Large Language Models with Dynamic Code Evolution at Scale

ICML 2025 Poster

AVSET-10M: An Open Large-Scale Audio-Visual Dataset with High Correspondence

ICLR 2025 Withdrawn

Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

ICLR 2025 Rejected

Noise-Robust Audio-Visual Speech-Driven Body Language Synthesis

ICLR 2025 Withdrawn

Advancing Multimodal Unified Discrete Representations

ICLR 2025 Withdrawn

MultiBand: Multi-Task Song Generation with Personalized Prompt-Based Control

ICLR 2025 Withdrawn

Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style Representation

ICLR 2025 Withdrawn

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

ICLR 2025 Withdrawn

Fox-TTS: Scalable Flow Transformers for Expressive Zero-Shot Text to Speech

ICLR 2025 Withdrawn

MindLoc: A Secure Brain-Based System for Object Localization

ICLR 2025 Withdrawn

Collaborators (20)