Zhe Gan
~Zhe_Gan1
Total papers: 17
Average submissions per year: 8.5
Average rating:
Accepted: 11/17
Venue distribution
ICLR: 14
COLM: 2
ICML: 1
Published papers (17)
2025: 12 papers
4
Pixelated Instructions: Can Multimodal Large Language Models Follow Printed Instructions in Images?
ICLR 2025 (Rejected)
4
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
ICLR 2025 (Poster)
4
SlowFast-LLaVA: A strong training-free baseline for video large language models
ICLR 2025 (Rejected)
4
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
COLM 2025 (Poster)
4
Improve Vision Language Model Chain-of-thought Reasoning
ICLR 2025 (Withdrawn)
4
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
ICLR 2025 (Poster)
4
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
ICLR 2025 (Poster)
3
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
ICLR 2025 (Poster)
4
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
ICLR 2025 (Poster)
3
Understanding Alignment in Multimodal LLMs: A Comprehensive Study
ICLR 2025 (Rejected)
4
Contrastive Localized Language-Image Pre-Training
ICML 2025 (Poster)
4
Contrastive Localized Language-Image Pre-Training
ICLR 2025 (Rejected)
2024: 5 papers
4
Compressing LLMs: The Truth is Rarely Pure and Never Simple
ICLR 2024 (Poster)
3
Ferret: Refer and Ground Anything Anywhere at Any Granularity
ICLR 2024 (Spotlight)
4
Guiding Instruction-based Image Editing via Multimodal Large Language Models
ICLR 2024 (Spotlight)
3
From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions
ICLR 2024 (Withdrawn)
3
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
COLM 2024 (Poster)