Hang Xu
~Hang_Xu1
20
Total papers
10.0
Submissions per year
Average rating
Accepted: 13/20
Venue distribution
ICLR
13
NeurIPS
7
Papers (20)
2025: 9 papers
4
Towards Unified Multimodal Interleaved Generation via Group Relative Policy Optimization
NeurIPS 2025 (Poster)
4
ACT-IN-LLM: Adaptively Compression Vision Tokens in LLM for High-Resolution Multimodal Large Language Models
ICLR 2025 (Rejected)
4
FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise
ICLR 2025 (Poster)
4
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
ICLR 2025 (Withdrawn)
4
INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning
NeurIPS 2025 (Poster)
5
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
ICLR 2025 (Poster)
4
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
ICLR 2025 (Poster)
4
4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
NeurIPS 2025 (Poster)
4
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
ICLR 2025 (Withdrawn)
2024: 11 papers
3
TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields
ICLR 2024 (Poster)
-
RealignDiff: Boosting text-to-image diffusion model with coarse-to-fine semantic re-alignment
ICLR 2024 (Rejected)
5
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation
NeurIPS 2024 (Poster)
4
SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM
NeurIPS 2024 (Poster)
4
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation
NeurIPS 2024 (Poster)
4
Geom-Erasing: Geometry-Driven Removal of Implicit Concept in Diffusion Models
ICLR 2024 (Withdrawn)
4
Ins-DetCLIP: Aligning Detection Model to Follow Human-Language Instruction
ICLR 2024 (Poster)
3
UNIT: Unifying Image and Text Recognition in One Vision Encoder
NeurIPS 2024 (Poster)
3
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
ICLR 2024 (Withdrawn)
4
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
ICLR 2024 (Poster)
4
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
ICLR 2024 (Withdrawn)