PaperHub

Di ZHANG

~Di_ZHANG3

Total papers: 26
Submissions per year: 13.0
Average rating: 5.9
Accepted: 16/26
Venue distribution: ICLR 17, NeurIPS 7, ICML 2

Published papers (26)

2025 (23)

2.0
4

Generate explorative goals with large language model guidance

ICLR 2025 Withdrawn
6.5
4

Stable Segment Anything Model

ICLR 2025 Poster
7.3
4

VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models

NeurIPS 2025 Poster
6.4
5

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

ICLR 2025 Poster
6.8
4

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

NeurIPS 2025 Poster
5.0
4

Geometric Spatiotemporal Transformer to Simulate Long-Term Physical Dynamics

ICLR 2025 Rejected
9.1
4

OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers

NeurIPS 2025 Spotlight
6.0
4

Motion Inversion for Video Customization

ICLR 2025 Rejected
6.8
4

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

NeurIPS 2025 Poster
3.5
4

Recipes for Unbiased Reward Modeling Learning: An Empirically Study

ICLR 2025 Withdrawn
7.5
5

Flow-GRPO: Training Flow Matching Models via Online RL

NeurIPS 2025 Poster
6.0
3

Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control

ICLR 2025 Poster
5.9
7

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

ICLR 2025 Poster
3.5
4

DMQR-RAG: Diverse Multi-Query Rewriting in Retrieval-Augmented Generation

ICLR 2025 Withdrawn
5.5
4

SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance

ICLR 2025 Rejected
6.8
4

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

ICLR 2025 Poster
7.8
4

MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding

ICML 2025 Spotlight
4.3
4

Explicit-Constrained Single Agent for Enhanced Task-Solving in LLMs

ICLR 2025 Withdrawn
6.0
4

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types

ICLR 2025 Poster
4.5
4

Kinda-45M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content

ICLR 2025 Withdrawn
6.4
4

Improving Video Generation with Human Feedback

NeurIPS 2025 Poster
-

EVLM: An Efficient Vision-Language Model for Visual Understanding

ICLR 2025 Desk Rejected
6.1
4

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

ICML 2025 Poster