Impact Index

73.17/100

Top 2.1%

Site-wide rank: #1,382

Papers published: 31

Average rating: 4.8

Average output: 10.3 papers/year

Mohamed Elhoseiny

Associate Professor @ King Abdullah University of Science and Technology · Saudi Arabia · OpenReview

Research areas

Computer Vision · Machine Learning · AI

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

ICLR 2026 · Rejected

Neural Catalog: Scaling Species Recognition with Catalog of Life–Augmented Generation

ICLR 2026 · Rejected

Look&Learn: Where to Look? Bridging Perception and Grounding Gap in Vision-Language Models

ICLR 2026 · Rejected

$1+1<1$? Breaking the Standalone Barrier in Federated Fine-Tuning of Multimodal Large Language Models under Non-IID Data

ICLR 2026 · Rejected

AutoDavis: Automatic and Dynamic Evaluation Protocol of Large Vision-Language Models on Visual Question-Answering

ICLR 2026 · Rejected

Time Blindness: Why Video-Language Models Can’t See What Humans Can?

ICLR 2026 · Withdrawn

EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making

ICLR 2026 · Withdrawn

Cross-Reflect: Empowering Multi-Modal Agents with Joint Reasoning Across Trajectories

ICLR 2026 · Rejected

FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology.

ICLR 2026 · Rejected

AVSU-Bench and VSpeech-R1: A Dataset and MLLM for Audio-Visual Speech Understanding

ICLR 2026 · Withdrawn

ReefNet: A Large-Scale, Taxonomically Enriched Dataset and Benchmark for Hard Coral Classification

ICLR 2026 · Rejected

Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding

NeurIPS 2025 · Spotlight

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

ICLR 2025 · Spotlight

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

NeurIPS 2025 · Poster

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

ICML 2025 · Poster

ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge

ICLR 2025 · Poster

InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding

ICLR 2025 · Rejected

Query-based Knowledge Transfer for Heterogeneous Learning Environments

ICLR 2025 · Poster

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

ICLR 2025 · Rejected

ReferPix2Pix: Guiding Multi-Modal LLMs for Image Editing with Referential Pixel Grounding

ICLR 2025 · Withdrawn

StoryGPT-V: Large Language Models as Consistent Story Visualizers

ICLR 2025 · Withdrawn

HuMouS: Human Motion Synthesis with Fine-Grained Control using Latent Space Manipulation of Cycle-Consistent Diffusion Models

ICLR 2025 · Rejected

AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?

ICLR 2025 · Withdrawn

iMotion-LLM: Motion Prediction Instruction Tuning

ICLR 2025 · Withdrawn

Collaborators (20)

Eslam Mohamed Bakr

Faizan Farooq Khan

Abdulwahab Felemban