ICLR 2024
Attention-Only Transformers and Implementing MLPs with Attention Heads
TL;DR
We show that MLP neurons can be implemented by masked, rank-1 attention heads, allowing one to convert an MLP-and-attention transformer into an attention-only transformer.
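The TL;DR describes replacing MLP neurons with masked, rank-1 attention heads. Below is a minimal numerical sketch of the kind of identity that makes such a replacement possible: a softmax over two logits is a sigmoid, so an attention head restricted to attending over the current token and one constant "bias" token can act as a sigmoid gate on a linear readout of the residual stream. The sketch assumes a sigmoid-activated neuron and glosses over how the logits and values would arise from real query/key/value projections; it illustrates the general idea only and is not taken from the paper's construction.

```python
# Toy sketch (not the paper's construction): an "MLP neuron"
# y = sigmoid(w @ x) * v reproduced by an attention head whose softmax
# runs over exactly two positions -- the current token and a constant
# bias token -- using softmax([a, 0])[0] == sigmoid(a).
# All names here (w, v, bias token) are illustrative assumptions.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

d_model = 8
rng = np.random.default_rng(0)

w = rng.normal(size=d_model)   # neuron input weights
v = rng.normal(size=d_model)   # neuron output direction
x = rng.normal(size=d_model)   # residual-stream vector at the current token

# Reference: sigmoid-activated MLP neuron
mlp_out = sigmoid(w @ x) * v

# Masked attention head: the query at the current position may only
# attend to {current token, bias token}. The current token's logit is
# w @ x; the bias token's logit is fixed at 0.
logits = np.array([w @ x, 0.0])
attn = np.exp(logits) / np.exp(logits).sum()   # softmax over the two positions

# Value table: the current token carries v, the bias token carries 0,
# so the head output is sigmoid(w @ x) * v.
values = np.stack([v, np.zeros(d_model)])
head_out = attn @ values

assert np.allclose(mlp_out, head_out)
```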
Abstract
Keywords
transformer, neural network, architecture, attention
Reviews and Discussion
No review records yet