Zhou Zhao
~Zhou_Zhao3
47
Total papers
23.5
Submissions per year
Average rating
Accepted: 23/47
Venue distribution
ICLR
31
NeurIPS
11
ICML
5
Papers (47)
2025 (28 papers)
4
Dataflow-Guided Neuro-Symbolic Language Models for Type Inference
ICML 2025 Poster
4
SPMDM: Enhancing Masked Diffusion Models through Simplifing Sampling Path
NeurIPS 2025 Poster
5
Controllable Text-to-Speech Synthesis with Masked-Autoencoded Style Representation
ICLR 2025 Withdrawn
4
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation
ICLR 2025 Poster
4
ThinkSound: Chain-of-Thought Reasoning in Multimodal LLMs for Audio Generation and Editing
NeurIPS 2025 Poster
5
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
ICLR 2025 Rejected
5
Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning
NeurIPS 2025 Poster
4
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
ICML 2025 Poster
3
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
ICLR 2025 Rejected
4
IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models
ICML 2025 Poster
3
CodeSync: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
ICML 2025 Poster
4
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
ICLR 2025 Withdrawn
4
Orient Anything V2: Unifying Orientation and Rotation Understanding
NeurIPS 2025 Spotlight
5
Noise-Robust Audio-Visual Speech-Driven Body Language Synthesis
ICLR 2025 Withdrawn
4
AVSET-10M: An Open Large-Scale Audio-Visual Dataset with High Correspondence
ICLR 2025 Withdrawn
4
MultiBand: Multi-Task Song Generation with Personalized Prompt-Based Control
ICLR 2025 Withdrawn
4
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
ICLR 2025 Poster
4
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
ICLR 2025 Poster
5
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control
ICLR 2025 Withdrawn
5
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
ICLR 2025 Poster
6
Advancing Multimodal Unified Discrete Representations
ICLR 2025 Withdrawn
4
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
ICLR 2025 Withdrawn
4
Fox-TTS: Scalable Flow Transformers for Expressive Zero-Shot Text to Speech
ICLR 2025 Withdrawn
3
MindLoc: A Secure Brain-Based System for Object Localization
ICLR 2025 Withdrawn
4
OmniAudio: Generating Spatial Audio from 360-Degree Video
ICML 2025 Poster
4
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
ICLR 2025 Withdrawn
4
Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
ICLR 2025 Withdrawn
4
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
ICLR 2025 Poster
2024 (19 papers)
4
Overcoming both Domain Shift and Label Shift for Referring Video Segmentation
ICLR 2024 Rejected
4
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning
NeurIPS 2024 Poster
4
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
ICLR 2024 Withdrawn
4
NaturalSigner: Diffusion Models are Natural Sign Language Generator
ICLR 2024 Withdrawn
4
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
NeurIPS 2024 Poster
4
Extending Multi-modal Contrastive Representations
ICLR 2024 Rejected
4
Action Imitation in Common Action Space for Customized Action Image Synthesis
NeurIPS 2024 Poster
4
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
NeurIPS 2024 Poster
3
HarmonyLM: Advancing Unified Large-Scale Language Modeling for Sound and Music Generation
ICLR 2024 Withdrawn
4
TETA: Temporal-Enhanced Text-to-Audio Generation
ICLR 2024 Withdrawn
4
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
ICLR 2024 Rejected
3
Listen to Motion: Robustly Learning Correlated Audio-Visual Representations
ICLR 2024 Withdrawn
4
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
ICLR 2024 Rejected
4
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
NeurIPS 2024 Poster
4
MVoice: Multilingual Unified Voice Generation With Discrete Representation at Scale
ICLR 2024 Withdrawn
4
Extending Multi-modal Contrastive Representations
NeurIPS 2024 Poster
4
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
ICLR 2024 Poster
4
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
NeurIPS 2024 Poster
4
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
ICLR 2024 Spotlight