Impact Index

48.81/100

Top 9.2%

Site-wide rank #5,909

Papers published: 12

Average rating: 5.0

Average annual output: 4.0 papers/year

Can Huang

Researcher @ Bytedance · China · OpenReview

Research Interests

Deep learning · OCR · LLM · VLM · Document understanding

ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering

ICLR 2026 · Rejected

Vision as LoRA

ICLR 2026 · Withdrawn

GLOMA: Global Video Text Spotting with Morphological Association

ICLR 2025 · Poster

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

ICLR 2025 · Rejected

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

ICLR 2025 · Withdrawn

Video Q-Former: Multimodal Large Language Model with Spatio-Temporal Querying Transformer Towards Video Understanding

ICLR 2025 · Rejected

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

ICLR 2025 · Withdrawn

MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark

ICLR 2025 · Rejected

Collaborators (20)