Jifeng Dai
~Jifeng_Dai1
Total papers: 25
Submissions per year: 12.5
Average rating
Accepted: 20/25
Venue distribution
ICLR: 12
NeurIPS: 11
ICML: 2
Papers (25)
2025: 14 papers
4
big.LITTLE Vision Transformer for Efficient Visual Recognition
ICLR 2025 (Withdrawn)
4
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
ICLR 2025 (Withdrawn)
4
CoMemo: LVLMs Need Image Context with Image Memory
ICML 2025 (Poster)
4
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
ICLR 2025 (Poster)
3
MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost
ICML 2025 (Poster)
4
Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings
NeurIPS 2025 (Poster)
4
Diffusion Transformer Policy
ICLR 2025 (Withdrawn)
4
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis
NeurIPS 2025 (Poster)
4
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICLR 2025 (Withdrawn)
3
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
ICLR 2025 (Spotlight)
5
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
ICLR 2025 (Poster)
3
GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing
NeurIPS 2025 (Poster)
3
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints
NeurIPS 2025 (Poster)
4
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025 (Spotlight)
2024: 11 papers
4
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process
ICLR 2024 (Poster)
4
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments
ICLR 2024 (Spotlight)
4
DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model
NeurIPS 2024 (Poster)
4
CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics
NeurIPS 2024 (Spotlight)
3
Parameter-Inverted Image Pyramid Networks
NeurIPS 2024 (Spotlight)
4
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
NeurIPS 2024 (Poster)
4
Learning 1D Causal Visual Representation with De-focus Attention Networks
NeurIPS 2024 (Poster)
3
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024 (Poster)
3
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
NeurIPS 2024 (Poster)
3
Ghost in the Minecraft: Hierarchical Agents for Minecraft via Large Language Models with Text-based Knowledge and Memory
ICLR 2024 (Rejected)
3
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NeurIPS 2024 (Poster)