Impact Index

70.89/100

Top 2.5%

Site-wide rank #1,592

Papers published: 23

Average rating: 5.3

Annual output: 11.5 papers/year

Chaoyou Fu

Assistant Professor @ Nanjing University · China · OpenReview

Research Areas

Multimodality · Face Recognition · Image Synthesis

6.5

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

ICLR 2026 Poster

6.0

Thyme: Think Beyond Images

ICLR 2026 Poster

5.5

VITA-E: A Dual-Model Framework for Real-Time, Interruptible, and Concurrent Human-Robot Interaction

ICLR 2026 Rejected

Second author

5.5

BaseReward: A Strong Baseline for Multimodal Reward Model

ICLR 2026 Poster

5.5

Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models

ICLR 2026 Poster

5.0

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

ICLR 2026 Poster

Third author

4.7

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

ICLR 2026 Withdrawn

4.5

OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

ICLR 2026 Rejected

4.5

CUARewardBench: Benchmark for Evaluating Reward Models on Computer-using Agent Trajectories

ICLR 2026 Rejected

4.0

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

ICLR 2026 Withdrawn

Second author

4.0

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

ICLR 2026 Withdrawn

3.5

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

ICLR 2025 Poster

6.5

Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

ICLR 2025 Poster

6.4

VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model

NeurIPS 2025 Poster

Third author

6.4

Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs

NeurIPS 2025 Poster

6.1

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

ICML 2025 Poster

5.5

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

ICML 2025 Poster

5.0

MME-FINANCE: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning

ICLR 2025 Withdrawn

4.0

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Collaborators (20)

Chaoyou Fu

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Thyme: Think Beyond Images

VITA-E: A Dual-Model Framework for Real-Time, Interruptible, and Concurrent Human-Robot Interaction

BaseReward: A Strong Baseline for Multimodal Reward Model

Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

CUARewardBench: Benchmark for Evaluating Reward Models on Computer-using Agent Trajectories

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity

HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model

Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

MME-FINANCE: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM