Paper Hub
Shane Bergsma (~Shane_Bergsma1)
Total papers: 5
Submissions per year: 2.5
Average rating: 6.3
Accepted: 5 / 5
Venues: NeurIPS 4, ICLR 1
Published papers (5)

2025 (3 papers)
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training (NeurIPS 2025, Poster). Avg rating 6.8, 5 reviews.
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs (ICLR 2025, Poster). Avg rating 6.3, 3 reviews.
Don't be lazy: CompleteP enables compute-efficient deep transformers (NeurIPS 2025, Poster). Avg rating 7.8, 4 reviews.
2024 (2 papers)
Sparse maximal update parameterization: A holistic approach to sparse training dynamics (NeurIPS 2024, Poster). Avg rating 5.3, 4 reviews.
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers (NeurIPS 2024, Poster). Avg rating 5.5, 4 reviews.
Collaborators (12)

Joel Hestness (5 papers)
Nolan Simran Dey (4 papers)
Gavia Gray (3 papers)
Daria Soboleva (2 papers)
Gurpreet Gosal (2 papers)
Bin Claire Zhang (1 paper)
Blake Bordelon (1 paper)
Boris Hanin (1 paper)