ICLR 2024
Delayed Generalization: Bridging Double Descent and Grokking
TL;DR
We argue that grokking and double descent are better understood as similar instances of a broader phenomenon that we call Staggered Learning.
Abstract
Keywords
double descent, grokking, science of deep learning, empirical theory of deep learning, generalization, overfitting, delayed generalization, feature learning, pattern learning, representation learning
Reviews and Discussion
No reviews recorded