Impact Index

94.63/100

Top 0.3%

Site-wide rank: #185

Papers published: 42

Average rating: 5.3

Average output: 14.0 papers/year

Yuki Mitsufuji

Visiting Research Professor @ New York University · United States · OpenReview

Research areas

Diffusion · Music · Audio · Sound

CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow-Map Models

ICLR 2026 · Poster

Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment

ICLR 2026 · Poster

LLM2Fx-Tools: Tool Calling for Music Post-Production

ICLR 2026 · Poster

SONA: Learning Conditional, Unconditional, and Matching-Aware Discriminator

ICLR 2026 · Poster

Concept-TRAK: Understanding how diffusion models learn concepts through concept attribution

ICLR 2026 · Poster

Theoretical refinement of CLIP by utilizing linear structure of optimal similarity

ICLR 2026 · Rejected

VIRTUE: Visual-Interactive Text-Image Universal Embedder

ICLR 2026 · Poster

Improving Classifier-Free Guidance in Masked Diffusion: Low-Dim Theoretical Insights with High-Dim Impact

ICLR 2026 · Poster

3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation

ICLR 2026 · Poster

Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance

ICLR 2026 · Rejected

TraSCE: Trajectory Steering for Concept Erasure

ICLR 2026 · Rejected

Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion

ICLR 2026 · Rejected

Exposing Vulnerabilities in Latent-Noise Diffusion Watermarks

ICLR 2026 · Rejected

SoundReactor: Frame-level Online Video-to-Audio Generation

ICLR 2026 · Withdrawn

MLLMCLIP: Feature-Level Distillation of MLLM for Robust Vision-Language Representations

ICLR 2026 · Withdrawn

Unified Pose Embeddings: Utilizing Euclidean Space for Simplified Topology Alignment

ICLR 2026 · Withdrawn

MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval

ICLR 2026 · Withdrawn

Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric

ICLR 2025 · Spotlight

Supervised Contrastive Learning from Weakly-Labeled Audio Segments for Musical Version Matching

ICML 2025 · Poster

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation

ICLR 2025 · Poster

Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models

ICLR 2025 · Poster

Enhancing 3D Reconstruction for Dynamic Scenes

NeurIPS 2025 · Poster

Distillation of Discrete Diffusion through Dimensional Correlations

ICML 2025 · Poster

MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

ICLR 2025 · Poster

HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning

ICLR 2025 · Poster

Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models

ICLR 2025 · Poster

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

ICLR 2025 · Rejected

G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving

ICLR 2025 · Rejected

Orator: LLM-Guided Multi-Shot Speech Video Generation

ICLR 2025 · Rejected

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

ICLR 2025 · Rejected

A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation

ICLR 2025 · Withdrawn

Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space

ICLR 2025 · Rejected

VCT: Training Consistency Models with Variational Noise Coupling

ICML 2025 · Poster

FLEXOUNDIT: Variable-Length Diffusion Transformer for Text-to-Audio Generation

ICLR 2025 · Withdrawn

Mitigating Embedding Collapse in Diffusion Models for Categorical Data

ICLR 2025 · Withdrawn

OpenMU: Your Swiss Army Knife for Music Understanding

ICLR 2025 · Withdrawn

Collaborators (20)

Takashi Shibuya

Toshimitsu Uesaka

Wei-Hsiang Liao