Influence Index

89.41/100

Top 0.6%

Site-wide rank: #401

Papers published: 41

Average rating: 5.3

Average output: 13.7 papers/year

Wei Xue

Assistant Professor, Hong Kong University of Science and Technology · Hong Kong, China · OpenReview

Research Areas

Speech and Audio Processing

AudioX: A Unified Framework for Anything-to-Audio Generation

ICLR 2026 Poster

PrismAudio: Decomposed Chain-of-Thought and Multi-dimensional Rewards for Video-to-Audio Generation

ICLR 2026 Poster

YuE: Scaling Open Foundation Models for Long-Form Music Generation

ICLR 2026 Poster

TreePO: Enhancing Policy Efficacy and Inference Efficiency with Tree Modeling

ICLR 2026 Rejected

Pixel-Perfect Puppetry: Precision-Guided Enhancement for Face Image and Video Editing

ICLR 2026 Poster

Style Waltz: Dancing Between Content and Style in Face Stylization

ICLR 2026 Rejected

PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation

ICLR 2026 Rejected

UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice

ICLR 2026 Poster

Reinforcement Learning for Generalized Label Aggregation

ICLR 2026 Rejected

Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks

ICLR 2026 Poster

WoW: Scaling Embodied Omni-World Model For Generalizable Manipulation Simulation

ICLR 2026 Rejected

Audio-FLAN: An Instruction-Following Dataset for Unified Understanding and Generation of Speech, Music, and Sound

ICLR 2026 Rejected

Forging a Masterpiece from Any Face: A Universal Framework for Face Stylization

ICLR 2026 Withdrawn

SongEval: A Benchmark Dataset for Song Aesthetics Evaluation

ICLR 2026 Withdrawn

Light-Search: Reducing Retrieval Cost in RAG via Curriculum-Based Policy Optimization

ICLR 2026 Withdrawn

Alignment Does Matter: Enables Pure-Speech-Token Dialogue with Frozen Text LLMs

ICLR 2026 Rejected

Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion

ICLR 2025 Spotlight

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

ICLR 2025 Spotlight

Delta Decompression for MoE-based LLMs Compression

ICML 2025 Poster

ThinkSound: Chain-of-Thought Reasoning in Multimodal LLMs for Audio Generation and Editing

NeurIPS 2025 Poster

Foundation Cures Personalization: Improving Personalized Models’ Prompt Consistency via Hidden Foundation Knowledge

NeurIPS 2025 Poster

MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition

ICML 2025 Poster

OmniAudio: Generating Spatial Audio from 360-Degree Video

ICML 2025 Poster

MuPT: A Generative Symbolic Music Pretrained Transformer

ICLR 2025 Poster

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

ICLR 2025 Poster

EVA: An Embodied World Model for Future Video Anticipation

ICLR 2025 Rejected

You Know What I'm Saying: Jailbreak Attack via Implicit Reference

ICLR 2025 Rejected

Empowering World Models with Reflection for Embodied Video Prediction

ICML 2025 Poster

Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition

ICLR 2025 Rejected

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

ICLR 2025 Withdrawn

AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems

ICLR 2025 Withdrawn

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

ICLR 2025 Withdrawn

ViML: A Video, Music, Language Unified Dataset for Understanding and Generation

ICLR 2025 Withdrawn

GuideEdit: Enhancing Face Video Editing with Fine-grained Control

ICLR 2025 Withdrawn

NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models

ICLR 2025 Withdrawn

CoCoGesture: Towards Coherent Co-speech 3D Gesture Generation in the Wild

ICLR 2025 Withdrawn

Collaborators (20)

Co-authored papers: 32

Shanghang Zhang