Impact Index

89.49/100

Top 0.6%

Site-wide rank: #398

Papers published: 44

Average rating: 5.2

Average output: 14.7 papers/year

Ying Shan

Director @ Tencent AI Lab Center of Visual Computing · China · OpenReview

Research Interests

Multi-modal understanding and generation

From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation

ICLR 2026 · Poster

IC-Custom: Diverse Image Customization via In-Context Learning

ICLR 2026 · Poster

AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation

ICLR 2026 · Rejected

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

ICLR 2026 · Poster

GenCompositor: Generative Video Compositing with Diffusion Transformer

ICLR 2026 · Poster

ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing

ICLR 2026 · Poster

MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation

ICLR 2026 · Withdrawn

From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

ICLR 2026 · Rejected

DVT-LLaVA: Vision-Language Model Personalization with Disentangled Visual Tuning

ICLR 2026 · Rejected

Aligning Latent Spaces with Flow Priors

ICLR 2026 · Rejected

GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning

ICLR 2026 · Withdrawn

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

ICLR 2026 · Withdrawn

Unified Single Transformer for Multimodal Video Understanding and Generation

ICLR 2026 · Withdrawn

Fourier Minds, Forget Less: Discrete Fourier Transform for Fast and Robust Continual Learning in LLMs

ICLR 2026 · Withdrawn

LoRA-Gen: Specializing Large Language Model via Online LoRA Generation

ICML 2025 · Poster

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning

NeurIPS 2025 · Poster

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

NeurIPS 2025 · Poster

HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding

ICML 2025 · Poster

Taming Rectified Flow for Inversion and Editing

ICML 2025 · Poster

LoRA-Gen: Specializing Language Model via Online LoRA Generation

ICLR 2025 · Withdrawn

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

ICLR 2025 · Rejected

Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

ICLR 2025 · Withdrawn

PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

ICLR 2025 · Rejected

SEED-X: Multimodal Models in Real World

ICLR 2025 · Rejected

SEED-Story: Multimodal Long Story Generation with Large Language Model

ICLR 2025 · Withdrawn

Self-Conditioned Diffusion Model for Consistent Human Image and Video Synthesis

ICLR 2025 · Withdrawn

GPT4LoRA: Optimizing LoRA Combination via MLLM Self-Reflection

ICLR 2025 · Rejected

Collaborators (20)