Kaipeng Zhang
~Kaipeng_Zhang1
Total papers: 31
Submissions per year: 15.5
Average rating
Accepted: 14/31
Venue distribution
ICLR: 22
NeurIPS: 7
ICML: 2
Papers (31)
2025 (19 papers)
4
Improving Autoregressive Image Generation by Mitigating Gradient Bias in Softmax
ICLR 2025 Withdrawn
4
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
ICLR 2025 Oral
3
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts
ICLR 2025 Rejected
4
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression
ICLR 2025 Withdrawn
4
HRVMamba: High-Resolution Visual State Space Model for Dense Prediction
ICLR 2025 Withdrawn
5
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
ICLR 2025 Rejected
4
ZipAR: Parallel Autoregressive Image Generation through Spatial Locality
ICML 2025 Poster
4
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
ICLR 2025 Rejected
3
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
ICLR 2025 Rejected
4
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
ICML 2025 Poster
4
LLaMA Decoder As Vision Transformer
ICLR 2025 Rejected
4
To Think or Not To Think: A Study of Thinking in Rule-Based Visual Reinforcement Fine-Tuning
NeurIPS 2025 Spotlight
4
Simple and Fast CNN for Vision
ICLR 2025 Rejected
4
MatchMask: Mask-Centric Generative Data Augmentation for Label-Scarce Semantic Segmentation
ICLR 2025 Withdrawn
4
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
ICLR 2025 Poster
4
Prioritize Alignment in Dataset Distillation
ICLR 2025 Rejected
4
REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
NeurIPS 2025 Poster
5
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
ICLR 2025 Poster
4
Neural-Driven Image Editing
NeurIPS 2025 Poster
2024 (12 papers)
5
Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face
ICLR 2024 Withdrawn
5
Meta-Transformer: A Unified Framework for Multimodal Learning
ICLR 2024 Withdrawn
3
Towards Unified and Effective Domain Generalization
ICLR 2024 Withdrawn
4
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
ICLR 2024 Poster
4
Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching
ICLR 2024 Poster
4
CTRL: Graph condensation via crafting rational trajectory matching
ICLR 2024 Rejected
3
Prioritize Alignment in Dataset Distillation
NeurIPS 2024 Rejected
4
Simple CNN for Vision
ICLR 2024 Rejected
5
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024 Spotlight
3
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
NeurIPS 2024 Poster
4
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
NeurIPS 2024 Poster
5
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
NeurIPS 2024 Poster