Xiangtai Li
~Xiangtai_Li1
Total papers: 23
Submissions per year: 11.5
Average rating
Accepted: 18/23
Venue distribution
ICLR: 11
NeurIPS: 9
ICML: 3
Published papers (23)
2025 (16 papers)
5
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
ICLR 2025 Withdrawn
4
Conditional Panoramic Image Generation via Masked Autoregressive Modeling
NeurIPS 2025 Poster
4
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
ICLR 2025 Spotlight
5
Towards Semantic Equivalence of Tokenization in Multimodal LLM
ICLR 2025 Poster
4
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
ICLR 2025 Withdrawn
5
On Path to Multimodal Generalist: General-Level and General-Bench
ICML 2025 Oral
4
PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners
ICLR 2025 Withdrawn
4
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
ICLR 2025 Poster
4
Three-Dimensional Trajectory Prediction with 3DMoTraj Dataset
ICML 2025 Poster
4
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
NeurIPS 2025 Poster
4
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
NeurIPS 2025 Poster
3
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
ICLR 2025 Poster
4
RelationBooth: Towards Relation-Aware Customized Object Generation
ICLR 2025 Withdrawn
4
OmniAudio: Generating Spatial Audio from 360-Degree Video
ICML 2025 Poster
4
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
ICLR 2025 Oral
4
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
NeurIPS 2025 Poster
2024 (7 papers)
5
DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection
ICLR 2024 Withdrawn
3
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
NeurIPS 2024 Poster
5
MotionBooth: Motion-Aware Customized Text-to-Video Generation
NeurIPS 2024 Spotlight
4
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
NeurIPS 2024 Poster
4
Synergistic Dual Spatial-aware Generation of Image-to-text and Text-to-image
NeurIPS 2024 Poster
4
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
ICLR 2024 Spotlight
4
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection
NeurIPS 2024 Poster