Impact Index

90.01/100

Top 0.6%

Site-wide rank: #373

Papers published: 41

Average rating: 5.2

Annual output: 13.7 papers/year

Xiaodan Liang

Full Professor @ SUN YAT-SEN UNIVERSITY · China · OpenReview

Research Areas

Embodied Vision · Cross-modal Understanding and Generation · Image/Video Generation and Editing

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation

ICLR 2026 · Poster

iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance

ICLR 2026 · Rejected

FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models

ICLR 2026 · Rejected

Does Your 3D Encoder Really Work? A simple yet effective pathway to real 3D scene understanding

ICLR 2026 · Rejected

C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

ICLR 2026 · Rejected

VistaGUI: Towards More Robust and Intelligent GUI Automation

ICLR 2026 · Rejected

SimuPhy: Towards Physical Understanding, Reasoning, and Evaluation via Code Generation

ICLR 2026 · Rejected

Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration

ICLR 2026 · Withdrawn

Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers

ICLR 2026 · Withdrawn

MakeupAnyone: Self-Supervised Identity-Preserving MakeUp Transfer with Region-Aware Multi-Scale Alignment

ICLR 2026 · Rejected

TreeRPO: Tree Relative Policy Optimization

ICLR 2026 · Withdrawn

CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics

ICLR 2026 · Desk Rejected

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

NeurIPS 2025 · Poster

MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation

ICLR 2025 · Rejected

WISA: World simulator assistant for physics-aware text-to-video generation

NeurIPS 2025 · Spotlight

OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling

ICLR 2025 · Poster

GDrag: Towards General-Purpose Interactive Editing with Anti-ambiguity Point Diffusion

ICLR 2025 · Poster

PT-T2I/V: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Image/Video-Task

ICLR 2025 · Poster

CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

ICLR 2025 · Poster

UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting

ICLR 2025 · Poster

Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes

ICLR 2025 · Poster

S2-Track: A Simple yet Strong Approach for End-to-End 3D Multi-Object Tracking

ICML 2025 · Poster

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

ICLR 2025 · Withdrawn

UncertaintyRAG: Span Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation

ICLR 2025 · Rejected

Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models

ICLR 2025 · Withdrawn

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

ICLR 2025 · Withdrawn

StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration

ICLR 2025 · Rejected

Memory-Driven Multimodal Chain of Thought for Embodied Long-Horizon Task Planning

ICLR 2025 · Withdrawn

ActionFiller: Fill-In-The-Blank Prompting for OS Agent

ICLR 2025 · Rejected

Collaborators (20)