Impact Index

88.23/100

Top 0.7%

Site-wide rank: #450

Papers published: 33

Average rating: 5.3

Average output: 11.0 papers/year

Zehan Wang

PhD student @ Zhejiang University · China · OpenReview

Research areas

multi-modal learning

SpatialHand: Generative Object Manipulation from 3D Perspective

ICLR 2026 · Poster

Depth Anything with Any Prior

ICLR 2026 · Poster

Vox-Infinity: Benchmarking the Limits of Long-Context Spoken Language Models

ICLR 2026 · Rejected

FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL

ICLR 2026 · Rejected

AlignSep: Temporally-Aligned Video-Queried Sound Separation with Flow Matching

ICLR 2026 · Poster

DSI-Bench: A Benchmark for Dynamic Spatial Intelligence

ICLR 2026 · Withdrawn

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

ICLR 2026 · Rejected

Orient Anything V2: Unifying Orientation and Rotation Understanding

NeurIPS 2025 · Spotlight

VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?

ICLR 2025 · Poster

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

ICLR 2025 · Poster

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

ICLR 2025 · Poster

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

ICLR 2025 · Poster

Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision

ICLR 2025 · Poster

Improving Long-Text Alignment for Text-to-Image Diffusion Models

ICLR 2025 · Poster

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

ICML 2025 · Poster

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control

ICLR 2025 · Withdrawn

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback

ICLR 2025 · Withdrawn

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

ICLR 2025 · Withdrawn

AVSET-10M: An Open Large-Scale Audio-Visual Dataset with High Correspondence

ICLR 2025 · Withdrawn

Noise-Robust Audio-Visual Speech-Driven Body Language Synthesis

ICLR 2025 · Withdrawn

Advancing Multimodal Unified Discrete Representations

ICLR 2025 · Withdrawn

Dynamic Switching Teacher: How to Generalize Temporal Action Detection Models

ICLR 2025 · Withdrawn

MindLoc: A Secure Brain-Based System for Object Localization

ICLR 2025 · Withdrawn

Collaborators (20)

PhD advisor: 28 papers