Sham M. Kakade
~Sham_M._Kakade1
Total papers: 29
Submissions per year: 14.5
Average rating
Accepted: 22/29
Venue distribution
ICLR: 17
NeurIPS: 7
ICML: 3
COLM: 2
Published papers (29)
2025: 18 papers
3
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
COLM 2025 Poster
4
Universal Length Generalization with Turing Programs
ICML 2025 Poster
4
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
ICLR 2025 Poster
3
The Role of Sparsity for Length Generalization in LLMs
ICML 2025 Poster
4
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
ICML 2025 Oral
4
Interpreting the linear structure of vision-language model embedding spaces
COLM 2025 Poster
4
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
ICLR 2025 Oral
4
Soup to go: mitigating forgetting during continual learning with model averaging
ICLR 2025 Rejected
6
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
ICLR 2025 Poster
4
Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques
ICLR 2025 Rejected
4
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models
ICLR 2025 Poster
4
A New Perspective on Shampoo's Preconditioner
ICLR 2025 Poster
3
Universal length generalization with Turing Programs
ICLR 2025 Rejected
5
Eliminating Position Bias of Language Models: A Mechanistic Approach
ICLR 2025 Poster
4
SOAP: Improving and Stabilizing Shampoo using Adam for Language Modeling
ICLR 2025 Poster
5
How Does Critical Batch Size Scale in Pre-training?
ICLR 2025 Poster
4
EvoLM: In Search of Lost Language Model Training Dynamics
NeurIPS 2025 Oral
4
Mixture of Parrots: Experts improve memorization more than reasoning
ICLR 2025 Poster
2024: 11 papers
5
Scaling Laws in Linear Regression: Compute, Parameters, and Data
NeurIPS 2024 Poster
4
Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent
NeurIPS 2024 Poster
4
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training
NeurIPS 2024 Poster
4
Feature emergence via margin maximization: case studies in algebraic tasks
ICLR 2024 Spotlight
3
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
ICLR 2024 Withdrawn
5
Scaling Laws for Imitation Learning in Single-Agent Games
ICLR 2024 Rejected
4
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
NeurIPS 2024 Poster
3
Transcendence: Generative Models Can Outperform The Experts That Train Them
NeurIPS 2024 Poster
3
Learning an Inventory Control Policy with General Inventory Arrival Dynamics
ICLR 2024 Rejected
4
MatFormer: Nested Transformer for Elastic Inference
NeurIPS 2024 Poster
4
MatFormer: Nested Transformer for Elastic Inference
ICLR 2024 Rejected