Impact Index

93.24/100

Top 0.4%

Site-wide rank: #240

Papers published: 46

Average rating: 5.6

Average output: 15.3 papers/year

Jan Kautz

VP Research @ NVIDIA · USA · OpenReview

Research interests

Vision Language Models · Efficient AI · Digital Humans · Perception · Generative AI

6.5

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

ICLR 2026 Poster

5.6

RLP: Reinforcement as a Pretraining Objective

ICLR 2026 Poster

5.5

3D Aware Region Prompted Vision Language Model

ICLR 2026 Poster

5.3

Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning

ICLR 2026 Poster

5.0

DLER: Doing Length pEnalty Right — Incentivizing More Intelligence per Token via Reinforcement Learning

ICLR 2026 Rejected

5.0

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

ICLR 2026 Rejected

5.0

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

ICLR 2026 Rejected

5.0

TinyEye: Sharpening Visual Reasoning of Tiny Models with Offline Policy Optimization

ICLR 2026 Rejected

3.0

iGRPO: Self‑Feedback–Driven LLM Reasoning

ICLR 2026 Rejected

Corresponding author

8.2

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

NeurIPS 2025 Poster

7.5

Hymba: A Hybrid-head Architecture for Small Language Models

ICLR 2025 Spotlight

7.3

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

NeurIPS 2025 Poster

7.3

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

NeurIPS 2025 Poster

7.2

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

ICLR 2025 Spotlight

7.0

Gated Delta Networks: Improving Mamba2 with Delta Rule

ICLR 2025 Poster

Second author

6.8

Scaling RL to Long Videos

NeurIPS 2025 Poster

6.8

GSPN-2: Efficient Parallel Sequence Modeling

NeurIPS 2025 Poster

6.8

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

NeurIPS 2025 Poster

6.8

LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement

ICLR 2025 Poster

6.7

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

ICLR 2025 Poster

6.5

LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing

ICLR 2025 Poster

6.0

Minifinetuning: Low-Data Generation Domain Adaptation through Corrective Self-Distillation

ICLR 2025 Rejected

Corresponding author

5.5

NaVILA: Legged Robot Vision-Language-Action Model for Navigation

ICLR 2025 Rejected

5.3

PHI-S: Distribution Balancing for Agglomerative Models

ICLR 2025 Rejected

5.0

LLM Pruning and Distillation in Practice

ICLR 2025 Rejected

5.0

ZoomVLM: A Tuning-Free Framework for Efficient Video Understanding via Adaptive Zooming in Vision-Language Models

ICLR 2025 Rejected

4.9

LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models

ICML 2025 Poster

4.8

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

ICLR 2025 Rejected

4.8

X-VILA: Cross-Modality Alignment for Large Language Models

ICLR 2025 Withdrawn

4.8

Wolf: Accurate Video Captioning with a World Summarization Framework

ICLR 2025 Withdrawn

4.5

VILA^2: VLM Augmented VLM with Self-Improvement

ICLR 2025 Withdrawn

4.0

Factorized Implicit Global Convolution for Automotive Computational Fluid Dynamics Prediction

ICLR 2025 Rejected

4.0

UNAST: Unified framework for Neural Architecture Search for Transformers

Collaborators (20)

ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

RLP: Reinforcement as a Pretraining Objective

3D Aware Region Prompted Vision Language Model

Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning

DLER: Doing Length pEnalty Right — Incentivizing More Intelligence per Token via Reinforcement Learning

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Compact GSPN: Scaling Spatial Propagation to Vision Foundation Models

BroRL: Scaling Reinforcement Learning via Broadened Exploration

TinyEye: Sharpening Visual Reasoning of Tiny Models with Offline Policy Optimization

iGRPO: Self‑Feedback–Driven LLM Reasoning

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

Hymba: A Hybrid-head Architecture for Small Language Models

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Gated Delta Networks: Improving Mamba2 with Delta Rule

Scaling RL to Long Videos

GSPN-2: Efficient Parallel Sequence Modeling

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing

Minifinetuning: Low-Data Generation Domain Adaptation through Corrective Self-Distillation

NaVILA: Legged Robot Vision-Language-Action Model for Navigation

PHI-S: Distribution Balancing for Agglomerative Models

LLM Pruning and Distillation in Practice

ZoomVLM: A Tuning-Free Framework for Efficient Video Understanding via Adaptive Zooming in Vision-Language Models

LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

X-VILA: Cross-Modality Alignment for Large Language Models

Wolf: Accurate Video Captioning with a World Summarization Framework

VILA^2: VLM Augmented VLM with Self-Improvement

Factorized Implicit Global Convolution for Automotive Computational Fluid Dynamics Prediction

UNAST: Unified framework for Neural Architecture Search for Transformers