Jun Suzuki
~Jun_Suzuki1
6
Total papers
6.0
Average submissions per year
Average rating
Accepted: 5/6
Venue distribution
COLM
3
ICLR
2
NeurIPS
1
Published papers (6)
2025: 6 papers
4
Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders
NeurIPS 2025 Poster
3
Spike No More: Stabilizing the Pre-training of Large Language Models
ICLR 2025 Rejected
3
Spike No More: Stabilizing the Pre-training of Large Language Models
COLM 2025 Poster
4
Efficient Construction of Model Family through Progressive Training Using Model Expansion
COLM 2025 Poster
4
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
ICLR 2025 Poster
4
Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models
COLM 2025 Poster