Zehan Wang
~Zehan_Wang2
26
Total papers
13.0
Submissions per year
Average rating
Accepted: 14/26
Venue distribution
ICLR
18
NeurIPS
7
ICML
1
Published papers (26)
2025: 16 papers
4
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
ICLR 2025 Withdrawn
4
Orient Anything V2: Unifying Orientation and Rotation Understanding
NeurIPS 2025 Spotlight
4
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
ICML 2025 Poster
4
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
ICLR 2025 Poster
4
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
ICLR 2025 Poster
5
Noise-Robust Audio-Visual Speech-Driven Body Language Synthesis
ICLR 2025 Withdrawn
4
Improving Long-Text Alignment for Text-to-Image Diffusion Models
ICLR 2025 Poster
4
AVSET-10M: An Open Large-Scale Audio-Visual Dataset with High Correspondence
ICLR 2025 Withdrawn
5
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
ICLR 2025 Poster
4
Dynamic Switching Teacher: How to Generalize Temporal Action Detection Models
ICLR 2025 Withdrawn
5
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
ICLR 2025 Poster
4
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
ICLR 2025 Withdrawn
5
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control
ICLR 2025 Withdrawn
4
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
ICLR 2025 Poster
6
Advancing Multimodal Unified Discrete Representations
ICLR 2025 Withdrawn
3
MindLoc: A Secure Brain-Based System for Object Localization
ICLR 2025 Withdrawn
2024: 10 papers
3
Listen to Motion: Robustly Learning Correlated Audio-Visual Representations
ICLR 2024 Withdrawn
4
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes
ICLR 2024 Withdrawn
4
Extending Multi-modal Contrastive Representations
ICLR 2024 Rejected
4
Extending Multi-modal Contrastive Representations
NeurIPS 2024 Poster
4
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
NeurIPS 2024 Poster
4
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
ICLR 2024 Rejected
4
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
NeurIPS 2024 Poster
4
Action Imitation in Common Action Space for Customized Action Image Synthesis
NeurIPS 2024 Poster
4
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
NeurIPS 2024 Poster
5
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
NeurIPS 2024 Poster