Impact Index

96.87/100

Top 0.2%

Site-wide rank #101

Papers published: 45

Average rating: 5.7

Average annual output: 15.0 papers/year

Sham M. Kakade

Full Professor @ Harvard University · United States · OpenReview

Research Areas

Optimization · Machine Learning

Cognitive models can reveal interpretable value trade-offs in language models

ICLR 2026 Poster

The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton

ICLR 2026 Poster

Any-Order Flexible Length Masked Diffusion

ICLR 2026 Poster

Fine-Tuning Masked Diffusion for Provable Self-Correction

ICLR 2026 Rejected

Seesaw: Accelerating Training by Balancing Batch Size and Learning Rate Scheduling

ICLR 2026 Poster

Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise

ICLR 2026 Rejected

In Good GRACES: Principled Teacher Selection for Knowledge Distillation

ICLR 2026 Poster

Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning

ICLR 2026 Rejected

Parameter-Efficient Reinforcement Learning using Prefix Optimization

ICLR 2026 Poster

The Emergence of Complex Behavior in Large-Scale Ecological Environments

ICLR 2026 Rejected

LOTION: Smoothing the Optimization Landscape for Quantized Training

ICLR 2026 Rejected

Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants

ICLR 2026 Rejected

Understanding the Design Space and Cross-Modality Transfer for Vision-Language Models

ICLR 2026 Rejected

A Mechanistic Analysis of Low-Precision Instabilities in Microscaling Formats

ICLR 2026 Withdrawn

Random Scaling of Emergent Capabilities

ICLR 2026 Rejected

Selective Underfitting in Diffusion Models

ICLR 2026 Rejected

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions

EvoLM: In Search of Lost Language Model Training Dynamics

NeurIPS 2025 Oral

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

COLM 2025 Poster

The Role of Sparsity for Length Generalization in LLMs

ICML 2025 Poster

Interpreting the linear structure of vision-language model embedding spaces

COLM 2025 Poster

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Mixture of Parrots: Experts improve memorization more than reasoning

ICLR 2025 Poster

Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond

ICLR 2025 Poster

How Does Critical Batch Size Scale in Pre-training?

ICLR 2025 Poster

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

ICLR 2025 Poster

Eliminating Position Bias of Language Models: A Mechanistic Approach

ICLR 2025 Poster

A New Perspective on Shampoo's Preconditioner

ICLR 2025 Poster

SOAP: Improving and Stabilizing Shampoo using Adam for Language Modeling

ICLR 2025 Poster

Deconstructing What Makes a Good Optimizer for Autoregressive Language Models

ICLR 2025 Poster

Universal Length Generalization with Turing Programs

ICML 2025 Poster

Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

ICLR 2025 Rejected

Universal length generalization with Turing Programs

ICLR 2025 Rejected

Soup to go: mitigating forgetting during continual learning with model averaging

ICLR 2025 Rejected

Collaborators (20)

David Brandfonbrener