Yu Qiao
~Yu_Qiao1
Total papers: 80
Average submissions per year: 40.0
Average rating:
Acceptance: 49/80
Venue distribution: ICLR 56, NeurIPS 24
Papers (80)
2025 (34 papers)
4
An Intelligent Agentic System for Complex Image Restoration Problems
ICLR 2025 Poster
5
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
ICLR 2025 Rejected
4
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
ICLR 2025 Spotlight
4
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
ICLR 2025 Poster
4
Linear Attention Sequence Parallelism
ICLR 2025 Withdrawn
4
Aligning Anything: Hierarchical Motion Estimation for Video Frame Interpolation
ICLR 2025 Withdrawn
4
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
ICLR 2025 Withdrawn
4
REEF: Representation Encoding Fingerprints for Large Language Models
ICLR 2025 Oral
4
I-Lora: Iterative Merging of Routing-Tuned Low-Rank Adapters for Multi-task Learning
ICLR 2025 Withdrawn
3
Derail Yourself: Multi-turn LLM Jailbreak Attack through self-discovered clues
ICLR 2025 Withdrawn
3
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
ICLR 2025 Spotlight
4
Diffusion Transformer Policy
ICLR 2025 Withdrawn
4
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
NeurIPS 2025 Poster
4
VEnhancer: Generative Space-Time Enhancement for Video Generation
ICLR 2025 Rejected
4
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
ICLR 2025 Poster
4
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation
ICLR 2025 Rejected
4
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
NeurIPS 2025 Poster
4
ToMiE: Towards Modular Growth in Enhanced SMPL Skeleton for 3D Human Gaussians with Animatable Garments
ICLR 2025 Withdrawn
4
Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings
NeurIPS 2025 Poster
4
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
ICLR 2025 Poster
4
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
ICLR 2025 Poster
3
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
ICLR 2025 Withdrawn
5
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
ICLR 2025 Rejected
4
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
NeurIPS 2025 Poster
5
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
ICLR 2025 Withdrawn
5
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
ICLR 2025 Poster
4
OS-ATLAS: Foundation Action Model for Generalist GUI Agents
ICLR 2025 Spotlight
5
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
ICLR 2025 Poster
4
ArchCAD-400K: A Large-Scale CAD drawings Dataset and New Baseline for Panoptic Symbol Spotting
NeurIPS 2025 Poster
4
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
ICLR 2025 Withdrawn
5
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
ICLR 2025 Spotlight
4
OASIS: Open Agents Social Interaction Simulations on a Large Scale
ICLR 2025 Rejected
4
DocGenome: A Large Benchmark for Multi-Modal Language Models in Real-World Academic Document Understanding
ICLR 2025 Rejected
4
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025 Spotlight
2024 (46 papers)
4
Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving
NeurIPS 2024 Poster
4
CT++: Complementary Co-Training for Semi-Supervised Semantic Segmentation
ICLR 2024 Withdrawn
5
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
NeurIPS 2024 Poster
4
Instruct2Act: Mapping Multi-modality Instructions to Robotic Arm Actions with Large Language Model
ICLR 2024 Rejected
4
Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
ICLR 2024 Rejected
4
SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution
ICLR 2024 Spotlight
5
Meta-Transformer: A Unified Framework for Multimodal Learning
ICLR 2024 Withdrawn
4
SC3D: Self-conditioned Generative Gaussian Model with 3D-aware Feedback
NeurIPS 2024 Rejected
3
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention
ICLR 2024 Poster
4
SyncVIS: Synchronized Video Instance Segmentation
NeurIPS 2024 Poster
4
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
NeurIPS 2024 Poster
4
4Diffusion: Multi-view Video Diffusion Model for 4D Generation
NeurIPS 2024 Poster
4
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
ICLR 2024 Spotlight
4
REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets
ICLR 2024 Withdrawn
3
Personalize Segment Anything Model with One Shot
ICLR 2024 Poster
6
CO2: Efficient Distributed Training with Full Communication-Computation Overlap
ICLR 2024 Spotlight
4
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
NeurIPS 2024 Poster
4
LAVITA: Latent Video Diffusion Models with Spatio-temporal Transformers
ICLR 2024 Withdrawn
3
LEO: Generative Latent Image Animator for Human Video Synthesis
ICLR 2024 Rejected
3
StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding
ICLR 2024 Rejected
3
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
NeurIPS 2024 Poster
4
Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization
ICLR 2024 Rejected
4
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving
NeurIPS 2024 Poster
5
Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face
ICLR 2024 Withdrawn
3
Parameter-Inverted Image Pyramid Networks
NeurIPS 2024 Spotlight
4
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
NeurIPS 2024 Oral
4
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
ICLR 2024 Poster
4
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
ICLR 2024 Poster
5
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024 Spotlight
4
Are We on the Right Way for Evaluating Large Vision-Language Models?
NeurIPS 2024 Poster
4
Learning 1D Causal Visual Representation with De-focus Attention Networks
NeurIPS 2024 Poster
4
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
ICLR 2024 Poster
4
TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
ICLR 2024 Rejected
4
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
NeurIPS 2024 Poster
4
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
ICLR 2024 Poster
4
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
NeurIPS 2024 Poster
3
SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving
ICLR 2024 Rejected
3
Ghost in the Minecraft: Hierarchical Agents for Minecraft via Large Language Models with Text-based Knowledge and Memory
ICLR 2024 Rejected
3
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
NeurIPS 2024 Poster
4
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation
ICLR 2024 Poster
4
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
NeurIPS 2024 Poster
3
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024 Poster
4
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024 Spotlight
5
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
NeurIPS 2024 Poster
4
LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
ICLR 2024 Rejected
3
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NeurIPS 2024 Poster