Impact Index

97.8/100

Top 0.1%

Site-wide rank: #63

Papers published: 39

Average rating: 6.4

Average annual output: 13.0 papers/year

Bryan Catanzaro

Vice President @ NVIDIA · United States · OpenReview

Research areas

speech recognition · deep learning · machine learning systems

Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset

ICLR 2026 Poster

AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy

ICLR 2026 Poster

Music Flamingo: Scaling Music Understanding in Audio Language Models

ICLR 2026 Poster

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning

RLP: Reinforcement as a Pretraining Objective

ICLR 2026 Poster

Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning

ICLR 2026 Poster

Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data

ICLR 2026 Poster

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

ICLR 2026 Poster

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

ICLR 2026 Poster

FusionFactory: Fusing LLM Capabilities with Multi-LLM Log Data

ICLR 2026 Withdrawn

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

NeurIPS 2025 Spotlight

Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning

NeurIPS 2025 Spotlight

Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning

NeurIPS 2025 Poster

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

ICML 2025 Poster

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

ICLR 2025 Spotlight

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

NeurIPS 2025 Poster

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

ICLR 2025 Spotlight

ETTA: Elucidating the Design Space of Text-to-Audio Models

ICML 2025 Poster

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

NeurIPS 2025 Poster

Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

ICLR 2025 Poster

Fugatto 1: Foundational Generative Audio Transformer Opus 1

ICLR 2025 Poster

FeatSharp: Your Vision Model Features, Sharper

ICML 2025 Poster

MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs

ICLR 2025 Poster

Elucidating the Design Space of Text-to-Audio Models

ICLR 2025 Rejected

OMCAT: Omni Context Aware Transformer

ICLR 2025 Rejected

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation

ICLR 2025 Poster

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs

ICLR 2025 Poster

ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

ICLR 2025 Poster

A$^2$-Flow: Alignment-Aware Pre-training for Speech Synthesis with Flow Matching

ICLR 2025 Rejected

PHI-S: Distribution Balancing for Agglomerative Models

ICLR 2025 Rejected

LLM Pruning and Distillation in Practice

ICLR 2025 Rejected

Nemotron-CORTEXA: Enhancing LLM Agents for Software Engineering Tasks via Improved Localization and Solution Diversity

ICML 2025 Poster

Collaborators (20)

Mohammad Shoeybi

Mostofa Patwary