Impact Index

81.63/100

Top 1.2%

Site-wide rank #790

Papers published: 28

Average rating: 5.2

Average output: 9.3 papers/year

Xiaoyi Dong

Member of Technical Staff @ Microsoft · United States · OpenReview

Research Interests

Multi-modality LLM · Vision Transformer · Adversarial samples

Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models

ICLR 2026 Poster

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

ICLR 2026 Poster

Visual Self-Refine: A Pixel-Guided Paradigm for Accurate Chart Parsing

ICLR 2026 Poster

Advancing Complex Video Object Segmentation via Progressive Concept Construction

ICLR 2026 Poster

SIM-CoT: Supervised Implicit Chain-of-Thought

ICLR 2026 Poster

ScaleCap: Scalable Image Captioning via Dual-Modality Debiasing

ICLR 2026 Poster

CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning

ICLR 2026 Rejected

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

ICLR 2026 Poster

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

ICLR 2026 Withdrawn

SPARK: Synergistic Policy And Reward Co-Evolving Framework

ICLR 2026 Withdrawn

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

ICLR 2026 Withdrawn

BoostStep: Boosting Mathematical Capability of Large Language Models via Step-aligned In Context Learning

ICLR 2026 Rejected

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

NeurIPS 2025 Poster

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

ICLR 2025 Poster

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

ICLR 2025 Poster

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

ICML 2025 Poster

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

ICLR 2025 Rejected

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

ICLR 2025 Withdrawn

SAM2Long: Enhancing SAM2 for Long Video Segmentation with a Training-Free Memory Tree

ICLR 2025 Withdrawn

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

ICLR 2025 Withdrawn

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

ICLR 2025 Rejected

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

ICLR 2025 Withdrawn

Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data

ICLR 2025 Withdrawn

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

ICLR 2025 Withdrawn

Collaborators (20)

Postdoctoral advisor · 26 papers