Jiaqi Wang
~Jiaqi_Wang1
23
Total papers
11.5
Average submissions per year
Average rating
Accepted: 12/23
Venue distribution
ICLR
14
NeurIPS
7
ICML
2
Papers (23)
2025: 16 papers
3
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models
ICLR 2025 Withdrawn
5
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
ICLR 2025 Poster
4
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
ICLR 2025 Withdrawn
4
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
NeurIPS 2025 Poster
4
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
ICLR 2025 Poster
4
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
ICLR 2025 Withdrawn
4
Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images
ICLR 2025 Withdrawn
4
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
ICLR 2025 Withdrawn
5
BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way
ICLR 2025 Withdrawn
4
SAM2Long: Enhancing SAM2 for Long Video Segmentation with a Training-Free Memory Tree
ICLR 2025 Withdrawn
5
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation
ICLR 2025 Rejected
3
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
ICML 2025 Poster
4
RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition
ICLR 2025 Rejected
5
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
NeurIPS 2025 Poster
4
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
ICLR 2025 Poster
4
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
ICML 2025 Oral
2024: 7 papers
5
Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials
NeurIPS 2024 Poster
4
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
ICLR 2024 Withdrawn
3
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
NeurIPS 2024 Poster
4
Streaming Long Video Understanding with Large Language Models
NeurIPS 2024 Poster
4
Are We on the Right Way for Evaluating Large Vision-Language Models?
NeurIPS 2024 Poster
4
MMBench: Is Your Multi-modal Model an All-around Player?
ICLR 2024 Rejected
3
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NeurIPS 2024 Poster