Influence Index

94.74/100

Top 0.3%

Site-wide rank: #178

Papers published: 58

Average rating: 5.3

Average annual output: 19.3 papers/year

Shanghang Zhang

Assistant Professor @ Peking University · China · OpenReview

Research Areas

Deep Learning on Graphs · Machine Learning · Robotics · Deep Learning · Computer Vision · Zero-Shot Learning · Domain Adaptation

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

ICLR 2026 · Poster

SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System

ICLR 2026 · Poster

BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models

ICLR 2026 · Poster

XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations

ICLR 2026 · Rejected

HumanoidVerse: A Versatile Humanoid for Vision-Language Guided Multi-Object Rearrangement

ICLR 2026 · Rejected

WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation

ICLR 2026 · Rejected

SpikeGen: Decoupled “Rods and Cones” Visual Representation Processing with Latent Generative Framework

ICLR 2026 · Poster

From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance

ICLR 2026 · Poster

ZoomV: Temporal Zoom-in for Efficient Long Video Understanding

ICLR 2026 · Rejected

MC-LLaVA: Multi-Concept Personalized Vision-Language Model

ICLR 2026 · Rejected

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model

ICLR 2026 · Poster

OmniSAT: Compact Action Token, Faster Auto Regression

ICLR 2026 · Withdrawn

dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought

ICLR 2026 · Rejected

WoW: Scaling Embodied Omni-World Model For Generalizable Manipulation Simulation

ICLR 2026 · Rejected

Robobench: A Comprehensive Evaluation Benchmark For Multimodal Large Language Models as Embodied Brain

ICLR 2026 · Rejected

FractalFold: Towards Fractal Structure Modeling for Hierarchical Inverse Protein Folding

ICLR 2026 · Withdrawn

Can World Models Benefit VLMs for World Dynamics?

ICLR 2026 · Withdrawn

PoseDiff: A Unified Diffusion Model Bridging Robot Pose Estimation and Video-to-Action Control

ICLR 2026 · Withdrawn

T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning

ICLR 2026 · Desk Rejected

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

NeurIPS 2025 · Poster

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

NeurIPS 2025 · Poster

Co$^{\mathbf{3}}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion

ICLR 2025 · Spotlight

Orochi: Versatile Biomedical Image Processor

NeurIPS 2025 · Spotlight

Fast-in-Slow: A Dual-System VLA Model Unifying Fast Manipulation within Slow Reasoning

NeurIPS 2025 · Poster

URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model

NeurIPS 2025 · Spotlight

SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents

NeurIPS 2025 · Poster

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

ICLR 2025 · Poster

Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models

NeurIPS 2025 · Poster

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

NeurIPS 2025 · Poster

Uni-Map: Unified Camera-LiDAR Perception for Robust HD Map Construction

ICLR 2025 · Rejected

EmpathyRobot: A Dataset and Benchmark for Empathetic Task Planning of Robotic Agent

ICLR 2025 · Rejected

HybridVLA: Collaborative Autoregression and Diffusion in a Unified Vision-Language-Action Model

NeurIPS 2025 · Rejected

EVA: An Embodied World Model for Future Video Anticipation

ICLR 2025 · Rejected

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

ICLR 2025 · Poster

OmniArch: Building Foundation Model for Scientific Computing

ICML 2025 · Poster

SAN: Hypothesizing Long-Term Synaptic Development and Neural Engram Mechanism in Scalable Model's Parameter-Efficient Fine-Tuning

ICML 2025 · Poster

Empowering World Models with Reflection for Embodied Video Prediction

ICML 2025 · Poster

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

ICML 2025 · Poster

CrayonRobo: Toward Generic Robot Manipulation via Crayon Visual Prompting

ICLR 2025 · Withdrawn

SparseVLM: Visual Token Sparsification for Efficient Vision Language Models Inference

ICLR 2025 · Rejected

Self-Corrected Multimodal Large Language Model for Robot Manipulation and Reflection

ICLR 2025 · Withdrawn

HyperAdapter: Generating Adapters for Pre-Trained Model-Based Continual Learning

ICLR 2025 · Withdrawn

Discovering Long-Term Effects on Parameter Efficient Fine-tuning

ICLR 2025 · Rejected

Frequency-Decoupled Cross-Modal Knowledge Distillation

ICLR 2025 · Withdrawn

ViML: A Video, Music, Language Unified Dataset for Understanding and Generation

ICLR 2025 · Withdrawn

CoCoGesture: Towards Coherent Co-speech 3D Gesture Generation in the Wild

ICLR 2025 · Withdrawn

PINNsAgent: Automated PDE Surrogation with Large Language Models

ICML 2025 · Poster

Collaborators (20)