PaperHub

Zhou Zhao

~Zhou_Zhao3

Total papers: 47
Submissions per year: 23.5
Average rating: 5.3
Accepted: 23/47
Venue distribution
ICLR: 31
NeurIPS: 11
ICML: 5

Published papers (47)

2025 (28 papers)

5.5
4

Dataflow-Guided Neuro-Symbolic Language Models for Type Inference

ICML 2025 Poster
7.3
4

SPMDM: Enhancing Masked Diffusion Models through Simplifing Sampling Path

NeurIPS 2025 Poster
4.2
5

Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style Representation

ICLR 2025 Withdrawn
6.5
4

EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation

ICLR 2025 Poster
6.8
4

ThinkSound: Chain-of-Thought Reasoning in Multimodal LLMs for Audio Generation and Editing

NeurIPS 2025 Poster
4.6
5

Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

ICLR 2025 Rejected
6.4
5

Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning

NeurIPS 2025 Poster
5.5
4

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

ICML 2025 Poster
5.7
3

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization

ICLR 2025 Rejected
4.9
4

IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models

ICML 2025 Poster
4.8
3

CodeSync: Synchronizing Large Language Models with Dynamic Code Evolution at Scale

ICML 2025 Poster
4.0
4

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

ICLR 2025 Withdrawn
8.7
4

Orient Anything V2: Unifying Orientation and Rotation Understanding

NeurIPS 2025 Spotlight
4.4
5

Noise-Robust Audio-Visual Speech-Driven Body Language Synthesis

ICLR 2025 Withdrawn
4.8
4

AVSET-10M: An Open Large-Scale Audio-Visual Dataset with High Correspondence

ICLR 2025 Withdrawn
4.3
4

MultiBand: Multi-Task Song Generation with Personalized Prompt-Based Control

ICLR 2025 Withdrawn
6.0
4

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

ICLR 2025 Poster
6.3
4

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

ICLR 2025 Poster
5.2
5

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control

ICLR 2025 Withdrawn
6.6
5

VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?

ICLR 2025 Poster
4.3
6

Advancing Multimodal Unified Discrete Representations

ICLR 2025 Withdrawn
5.0
4

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback

ICLR 2025 Withdrawn
3.0
4

Fox-TTS: Scalable Flow Transformers for Expressive Zero-Shot Text to Speech

ICLR 2025 Withdrawn
2.3
3

MindLoc: A Secure Brain-Based System for Object Localization

ICLR 2025 Withdrawn
6.6
4

OmniAudio: Generating Spatial Audio from 360-Degree Video

ICML 2025 Poster
5.0
4

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

ICLR 2025 Withdrawn
5.8
4

Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis

ICLR 2025 Withdrawn
6.5
4

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

ICLR 2025 Poster

2024 (19 papers)

4.3
4

Overcoming both Domain Shift and Label Shift for Referring Video Segmentation

ICLR 2024 Rejected
5.3
4

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

NeurIPS 2024 Poster
4.0
4

Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes

ICLR 2024 Withdrawn
4.8
4

NaturalSigner: Diffusion Models are Natural Sign Language Generator

ICLR 2024 Withdrawn
7.0
4

MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence

NeurIPS 2024 Poster
6.0
4

Extending Multi-modal Contrastive Representations

ICLR 2024 Rejected
6.0
4

Action Imitation in Common Action Space for Customized Action Image Synthesis

NeurIPS 2024 Poster
6.0
4

Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching

NeurIPS 2024 Poster
1.7
3

HarmonyLM: Advancing Unified Large-Scale Language Modeling for Sound and Music Generation

ICLR 2024 Withdrawn
3.5
4

TETA: Temporal-Enhanced Text-to-Audio Generation

ICLR 2024 Withdrawn
6.3
4

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

ICLR 2024 Rejected
3.7
3

Listen to Motion: Robustly Learning Correlated Audio-Visual Representations

ICLR 2024 Withdrawn
5.3
4

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

ICLR 2024 Rejected
5.5
4

Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers

NeurIPS 2024 Poster
3.5
4

MVoice: Multilingual Unified Voice Generation With Discrete Representation at Scale

ICLR 2024 Withdrawn
5.3
4

Extending Multi-modal Contrastive Representations

NeurIPS 2024 Poster
6.5
4

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

ICLR 2024 Poster
5.5
4

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes

NeurIPS 2024 Poster
8.5
4

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis

ICLR 2024 Spotlight