Impact Index

85.16/100

Top 0.9%

Site-wide rank #594

Papers published: 41

Average rating: 5.0

Average annual output: 13.7 papers/year

Wenqi Shao

Researcher @ Shanghai AI Laboratory · China · OpenReview

Research Areas

Multimodal Learning · Evaluation · Deep Learning · Optimization

InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models

ICLR 2026 Poster

CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning

ICLR 2026 Poster

Fine-grained Contrastive Learning for ECG-Report Alignment with Waveform Enhancement

ICLR 2026 Rejected

Enhance-A-Video: Better Generated Video for Free

ICLR 2026 Withdrawn

CPGD: Toward Stable Reinforcement Learning for Language Models

ICLR 2026 Withdrawn

VTPerception-R1: Enhancing Multimodal Reasoning via Explicit Visual and Textual Perceptual Grounding

ICLR 2026 Rejected

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

ICLR 2026 Withdrawn

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation

ICLR 2026 Rejected

Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation

ICLR 2026 Rejected

MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning

ICLR 2026 Withdrawn

More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration

ICLR 2026 Rejected

UniPruning: Unifying Local Metric and Global Feedback for Scalable Sparse LLMs

ICLR 2026 Withdrawn

CoSMo-RL: Towards Trustworthy LMRMs via Joint Safety and Stability

ICLR 2026 Rejected

TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception

NeurIPS 2025 Spotlight

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

ICLR 2025 Spotlight

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

NeurIPS 2025 Poster

EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents

ICLR 2025 Poster

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

ICML 2025 Poster

LLaMA Decoder As Vision Transformer

ICLR 2025 Rejected

SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement

ICLR 2025 Poster

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

ICLR 2025 Poster

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

ICLR 2025 Rejected

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

ICLR 2025 Rejected

TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts

ICLR 2025 Rejected

HRVMamba: High-Resolution Visual State Space Model for Dense Prediction

ICLR 2025 Withdrawn

Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing

ICLR 2025 Withdrawn

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression

ICLR 2025 Withdrawn

MatchMask: Mask-Centric Generative Data Augmentation for Label-Scarce Semantic Segmentation

ICLR 2025 Withdrawn

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

ICLR 2025 Rejected

PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

ICLR 2025 Rejected

Collaborators (20)