ICLR 2024
Attention-Only Transformers and Implementing MLPs with Attention Heads
TL;DR
We show that MLP neurons can be implemented by masked, rank-1 attention heads, allowing one to convert an MLP-and-attention transformer into an attention-only transformer.
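The TL;DR describes replacing MLP neurons with masked, rank-1 attention heads. Below is a minimal numerical sketch of the kind of identity that makes such a replacement possible: a softmax over two logits is a sigmoid, so an attention head restricted to attending over the current token and one constant "bias" token can act as a sigmoid gate on a linear readout of the residual stream. The sketch assumes a sigmoid-activated neuron and glosses over how the logits and values would arise from real query/key/value projections; it illustrates the general idea only and is not taken from the paper's construction.

```python
# Toy sketch (not the paper's construction): an "MLP neuron"
# y = sigmoid(w @ x) * v reproduced by an attention head whose softmax
# runs over exactly two positions -- the current token and a constant
# bias token -- using softmax([a, 0])[0] == sigmoid(a).
# All names here (w, v, bias token) are illustrative assumptions.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

d_model = 8
rng = np.random.default_rng(0)

w = rng.normal(size=d_model)   # neuron input weights
v = rng.normal(size=d_model)   # neuron output direction
x = rng.normal(size=d_model)   # residual-stream vector at the current token

# Reference: sigmoid-activated MLP neuron
mlp_out = sigmoid(w @ x) * v

# Masked attention head: the query at the current position may only
# attend to {current token, bias token}. The current token's logit is
# w @ x; the bias token's logit is fixed at 0.
logits = np.array([w @ x, 0.0])
attn = np.exp(logits) / np.exp(logits).sum()   # softmax over the two positions

# Value table: the current token carries v, the bias token carries 0,
# so the head output is sigmoid(w @ x) * v.
values = np.stack([v, np.zeros(d_model)])
head_out = attn @ values

assert np.allclose(mlp_out, head_out)
```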
Abstract
Keywords
transformer, neural network, architecture, attention
Reviews and Discussion
No review records yet