
ICLR 2025

VeLAR: Vision-oriEnted Language-Attentive token Reduction for multimodal large language models

Submitted: 2024-09-26 · Updated: 2024-10-11
TL;DR

We propose a token reduction framework for MLLMs that reduces vision-token redundancy in vision-language learning, cutting computational costs by up to 42% while matching or even surpassing the original model's performance.
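The page does not include the paper's method details, so the following is only a minimal sketch of the general idea the TL;DR describes: scoring vision tokens by the attention they receive from language tokens and keeping only the top fraction. It is not VeLAR's actual algorithm, and every name in it (prune_vision_tokens, text_to_vision_attn, keep_ratio) is hypothetical.

```python
import torch

def prune_vision_tokens(vision_tokens, text_to_vision_attn, keep_ratio=0.58):
    """Illustrative language-attentive vision-token pruning (hypothetical).

    vision_tokens:       (batch, n_vis, dim) vision token embeddings.
    text_to_vision_attn: (batch, n_text, n_vis) attention weights from
                         language tokens to vision tokens.
    keep_ratio:          fraction of vision tokens to keep; a ~42% compute
                         cut would roughly correspond to keeping ~58%.
    """
    batch, n_vis, dim = vision_tokens.shape
    n_keep = max(1, int(n_vis * keep_ratio))

    # Score each vision token by the total attention it receives from
    # all language tokens (summed over text positions).
    scores = text_to_vision_attn.sum(dim=1)  # (batch, n_vis)

    # Keep the highest-scoring vision tokens and drop the rest.
    keep_idx = scores.topk(n_keep, dim=-1).indices         # (batch, n_keep)
    keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, dim)  # (batch, n_keep, dim)
    return vision_tokens.gather(1, keep_idx)
```

Shortening the vision-token sequence in this way reduces the attention cost in every subsequent LLM layer, which is plausibly where compute savings of the reported magnitude would come from.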

Abstract

Keywords
Multi-modal Large Language Models, Token Reduction, Model Acceleration, Foundation Models, Vision-Language Learning, Instruction Tuning

Reviews and Discussion

Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.