PaperHub

Yu Qiao

~Yu_Qiao1

Total papers: 80
Submissions per year: 40.0
Average rating: 5.8
Accepted: 49/80
Venue distribution: ICLR 56, NeurIPS 24

Papers (80)

2025 (34)

6.3
4

An Intelligent Agentic System for Complex Image Restoration Problems

ICLR 2025 Poster
6.0
5

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

ICLR 2025 Rejected
7.5
4

DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes

ICLR 2025 Spotlight
5.5
4

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

ICLR 2025 Poster
4.8
4

Linear Attention Sequence Parallelism

ICLR 2025 Withdrawn
4.0
4

Aligning Anything: Hierarchical Motion Estimation for Video Frame Interpolation

ICLR 2025 Withdrawn
5.3
4

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

ICLR 2025 Withdrawn
8.0
4

REEF: Representation Encoding Fingerprints for Large Language Models

ICLR 2025 Oral
4.0
4

I-Lora: Iterative Merging of Routing-Tuned Low-Rank Adapters for Multi-task Learning

ICLR 2025 Withdrawn
5.3
3

Derail Yourself: Multi-turn LLM Jailbreak Attack through self-discovered clues

ICLR 2025 Withdrawn
8.0
3

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

ICLR 2025 Spotlight
4.3
4

Diffusion Transformer Policy

ICLR 2025 Withdrawn
6.8
4

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

NeurIPS 2025 Poster
5.0
4

VEnhancer: Generative Space-Time Enhancement for Video Generation

ICLR 2025 Rejected
5.8
4

Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning

ICLR 2025 Poster
5.3
4

Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation

ICLR 2025 Rejected
6.4
4

EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT

NeurIPS 2025 Poster
4.5
4

ToMiE: Towards Modular Growth in Enhanced SMPL Skeleton for 3D Human Gaussians with Animatable Garments

ICLR 2025 Withdrawn
6.0
4

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings

NeurIPS 2025 Poster
6.5
4

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

ICLR 2025 Poster
6.0
4

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

ICLR 2025 Poster
5.3
3

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

ICLR 2025 Withdrawn
5.2
5

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

ICLR 2025 Rejected
7.3
4

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

NeurIPS 2025 Poster
3.4
5

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

ICLR 2025 Withdrawn
6.0
5

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

ICLR 2025 Poster
7.5
4

OS-ATLAS: Foundation Action Model for Generalist GUI Agents

ICLR 2025 Spotlight
5.8
5

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

ICLR 2025 Poster
7.8
4

ArchCAD-400K: A Large-Scale CAD drawings Dataset and New Baseline for Panoptic Symbol Spotting

NeurIPS 2025 Poster
4.0
4

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

ICLR 2025 Withdrawn
7.2
5

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

ICLR 2025 Spotlight
4.3
4

OASIS: Open Agents Social Interaction Simulations on a Large Scale

ICLR 2025 Rejected
6.3
4

DocGenome: A Large Benchmark for Multi-Modal Language Models in Real-World Academic Document Understanding

ICLR 2025 Rejected
7.5
4

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

ICLR 2025 Spotlight

2024 (46)

6.3
4

Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving

NeurIPS 2024 Poster
4.0
4

CT++: Complementary Co-Training for Semi-Supervised Semantic Segmentation

ICLR 2024 Withdrawn
4.8
5

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration

NeurIPS 2024 Poster
5.0
4

Instruct2Act: Mapping Multi-modality Instructions to Robotic Arm Actions with Large Language Model

ICLR 2024 Rejected
7.0
4

Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation

ICLR 2024 Rejected
7.0
4

SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution

ICLR 2024 Spotlight
4.6
5

Meta-Transformer: A Unified Framework for Multimodal Learning

ICLR 2024 Withdrawn
5.5
4

SC3D: Self-conditioned Generative Gaussian Model with 3D-aware Feedback

NeurIPS 2024 Rejected
6.3
3

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention

ICLR 2024 Poster
5.5
4

SyncVIS: Synchronized Video Instance Segmentation

NeurIPS 2024 Poster
6.3
4

Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models

NeurIPS 2024 Poster
4.8
4

4Diffusion: Multi-view Video Diffusion Model for 4D Generation

NeurIPS 2024 Poster
7.0
4

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

ICLR 2024 Spotlight
3.5
4

REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets

ICLR 2024 Withdrawn
6.7
3

Personalize Segment Anything Model with One Shot

ICLR 2024 Poster
7.0
6

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

ICLR 2024 Spotlight
6.0
4

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?

NeurIPS 2024 Poster
4.0
4

LAVITA: Latent Video Diffusion Models with Spatio-temporal Transformers

ICLR 2024 Withdrawn
5.0
3

LEO: Generative Latent Image Animator for Human Video Synthesis

ICLR 2024 Rejected
5.7
3

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding

ICLR 2024 Rejected
5.7
3

SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge

NeurIPS 2024 Poster
6.5
4

Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization

ICLR 2024 Rejected
5.3
4

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving

NeurIPS 2024 Poster
5.0
5

Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face

ICLR 2024 Withdrawn
7.7
3

Parameter-Inverted Image Pyramid Networks

NeurIPS 2024 Spotlight
7.0
4

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

NeurIPS 2024 Oral
7.0
4

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

ICLR 2024 Poster
5.3
4

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

ICLR 2024 Poster
6.4
5

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

ICLR 2024 Spotlight
6.5
4

Are We on the Right Way for Evaluating Large Vision-Language Models?

NeurIPS 2024 Poster
5.3
4

Learning 1D Causal Visual Representation with De-focus Attention Networks

NeurIPS 2024 Poster
5.5
4

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

ICLR 2024 Poster
6.0
4

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

ICLR 2024 Rejected
5.3
4

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

NeurIPS 2024 Poster
6.3
4

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

ICLR 2024 Poster
4.8
4

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

NeurIPS 2024 Poster
4.3
3

SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving

ICLR 2024 Rejected
4.0
3

Ghost in the Minecraft: Hierarchical Agents for Minecraft via Large Language Models with Text-based Knowledge and Memory

ICLR 2024 Rejected
5.7
3

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

NeurIPS 2024 Poster
5.8
4

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

ICLR 2024 Poster
6.3
4

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

NeurIPS 2024 Poster
6.0
3

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

ICLR 2024 Poster
7.0
4

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

ICLR 2024 Spotlight
5.2
5

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

NeurIPS 2024 Poster
5.5
4

LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models

ICLR 2024 Rejected
5.3
3

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

NeurIPS 2024 Poster