ICLR 2025
VeLAR: Vision-oriEnted Language-Attentive token Reduction for multimodal large language models
TL;DR
We propose a token reduction framework for MLLMs that reduces vision token redundancy in vision-language learning, cutting computational costs by up to 42% while maintaining, and in some cases surpassing, the original model's performance.
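Since the page provides no method details (the abstract is unavailable), the sketch below is only a generic illustration of language-attentive vision token reduction in the spirit of the title: vision tokens are scored by the cross-attention they receive from language tokens, and only the top-scoring fraction is kept. The function name `prune_vision_tokens`, the pooling choice, and the `keep_ratio` value are all assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of language-attentive vision token reduction.
# Not the paper's method; an illustrative assumption throughout.
import torch


def prune_vision_tokens(vision_tokens: torch.Tensor,
                        text_tokens: torch.Tensor,
                        keep_ratio: float = 0.58) -> torch.Tensor:
    """Keep the vision tokens that receive the most attention from text.

    vision_tokens: (num_vision, dim) vision token embeddings.
    text_tokens:   (num_text, dim) language token embeddings.
    keep_ratio:    fraction of vision tokens retained (0.58 would roughly
                   match the claimed "up to 42%" cost cut, assuming cost
                   scales linearly with token count -- an assumption).
    """
    dim = vision_tokens.size(-1)
    # Cross-attention scores: text tokens as queries, vision tokens as keys.
    scores = (text_tokens @ vision_tokens.T) / dim ** 0.5  # (num_text, num_vision)
    attn = scores.softmax(dim=-1)
    # Average over text tokens to get one relevance score per vision token.
    relevance = attn.mean(dim=0)  # (num_vision,)
    k = max(1, int(keep_ratio * vision_tokens.size(0)))
    # Keep the top-k vision tokens, restoring their original order.
    keep = relevance.topk(k).indices.sort().values
    return vision_tokens[keep]


# Example: 576 CLIP-style vision tokens, 32 text tokens, hidden dim 1024.
vision = torch.randn(576, 1024)
text = torch.randn(32, 1024)
print(prune_vision_tokens(vision, text).shape)  # torch.Size([334, 1024])
```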
Abstract
Keywords
Multi-modal Large Language Models, Token Reduction, Model Acceleration, Foundation Models, Vision-Language Learning, Instruction Tuning
Reviews and Discussion
Author Withdrawal Notice
I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.