Kaiyue Wen
~Kaiyue_Wen1
8
Total papers
8.0
Submissions per year
Average rating
Acceptance: 8/8
Venue distribution
ICLR: 3
ICML: 2
NeurIPS: 2
COLM: 1
Published papers (8)
2025: 8 papers
4
RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval
ICLR 2025 Poster
4
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
ICLR 2025 Poster
5
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View
ICLR 2025 Poster
4
Weight ensembling improves reasoning in language models
COLM 2025 Poster
4
Task Generalization with Autoregressive Compositional Structure: Can Learning from $D$ Tasks Generalize to $D^T$ Tasks?
ICML 2025 Poster
4
Overtrained Language Models Are Harder to Fine-Tune
ICML 2025 Poster
4
PaTH Attention: Position Encoding via Accumulating Householder Transformations
NeurIPS 2025 Poster
4
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
NeurIPS 2025 Oral