Joel Hestness
~Joel_Hestness2
Total papers: 8
Average submissions per year: 4.0
Average rating
Accepted: 6/8
Venue distribution
NeurIPS: 4
ICLR: 3
COLM: 1
Published papers (8)
2025 (4 papers)
-
BLIMEY: Towards Better Routing Methods in Sparse Mixture of Experts
ICLR 2025 (withdrawn)
5
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
NeurIPS 2025 (Poster)
3
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
ICLR 2025 (Poster)
4
Don't be lazy: CompleteP enables compute-efficient deep transformers
NeurIPS 2025 (Poster)
2024 (4 papers)
4
Sparse maximal update parameterization: A holistic approach to sparse training dynamics
NeurIPS 2024 (Poster)
3
On the Relation between Gradient Directions and Systematic Generalization
ICLR 2024 (withdrawn)
4
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
NeurIPS 2024 (Poster)
3
Crystal: Illuminating LLM Abilities on Language and Code
COLM 2024 (Poster)