Di ZHANG
~Di_ZHANG3
26
Total papers
13.0
Submissions per year
Average rating
Accepted: 16/26
Venue distribution
ICLR
17
NeurIPS
7
ICML
2
Published papers (26)
2025: 23 papers
4
Generate explorative goals with large language model guidance
ICLR 2025 Withdrawn
4
Stable Segment Anything Model
ICLR 2025 Poster
4
VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models
NeurIPS 2025 Poster
5
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
ICLR 2025 Poster
4
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
NeurIPS 2025 Poster
4
Geometric Spatiotemporal Transformer to Simulate Long-Term Physical Dynamics
ICLR 2025 Rejected
4
OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers
NeurIPS 2025 Spotlight
4
Motion Inversion for Video Customization
ICLR 2025 Rejected
4
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
NeurIPS 2025 Poster
4
Recipes for Unbiased Reward Modeling Learning: An Empirically Study
ICLR 2025 Withdrawn
5
Flow-GRPO: Training Flow Matching Models via Online RL
NeurIPS 2025 Poster
3
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
ICLR 2025 Poster
7
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
ICLR 2025 Poster
4
DMQR-RAG: Diverse Multi-Query Rewriting in Retrieval-Augmented Generation
ICLR 2025 Withdrawn
4
SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance
ICLR 2025 Rejected
4
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
ICLR 2025 Poster
4
MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding
ICML 2025 Spotlight
4
Explicit-Constrained Single Agent for Enhanced Task-Solving in LLMs
ICLR 2025 Withdrawn
4
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
ICLR 2025 Poster
4
Kinda-45M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
ICLR 2025 Withdrawn
4
Improving Video Generation with Human Feedback
NeurIPS 2025 Poster
-
EVLM: An Efficient Vision-Language Model for Visual Understanding
ICLR 2025 Desk rejected
4
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
ICML 2025 Poster