ICLR 2024
Quasi-Recurrent Gist Attention: Efficiently Modeling Long Context in Large Language Model
Abstract
Transformer-based Large Language Models (LLMs) have achieved state-of-the-art results on numerous Natural Language Processing tasks. However, LLMs typically come with a predetermined context window size. This limitation, combined with the quadratic complexity of self-attention, makes pretrained LLMs struggle with long sequences. In this work, we introduce a quasi-recurrent gist attention mechanism designed to effectively capture long contextual information within LLMs. The proposed approach employs quasi-recurrent context compression to iteratively integrate historical context into the gist representation. Quasi-recurrent gist attention reduces the computational complexity of full attention from $O(n^2)$ to $O(n)$ without changing the original Transformer architecture, which enables seamless fine-tuning from pretrained language models such as Llama \cite{touvron2023llama} and naturally extends the context window. Experimental results indicate that the proposed attention mechanism outperforms the full-attention approach on multiple public benchmarks while significantly reducing the latency of long-context modeling.
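To make the linear-cost idea concrete, below is a minimal PyTorch sketch of how a quasi-recurrent gist attention pass could be structured, based only on the abstract's description. It is not the authors' implementation: the fixed segment length, the number of gist tokens, the use of a single shared attention module for both the segment output and the gist update, and the omission of causal masking are all illustrative assumptions. Each segment attends only to the running gist tokens plus itself, and the gist tokens are then updated to fold in the new segment, so cost grows linearly with sequence length.

```python
import torch
import torch.nn as nn


class QuasiRecurrentGistAttention(nn.Module):
    """Illustrative sketch: segment-wise attention with a recurrently updated gist state."""

    def __init__(self, d_model: int, n_heads: int, n_gist: int, segment_len: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.segment_len = segment_len
        # Learned initial gist tokens; they carry compressed history across segments.
        self.init_gist = nn.Parameter(torch.randn(1, n_gist, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, _ = x.shape
        gist = self.init_gist.expand(batch, -1, -1)  # running gist state
        outputs = []
        for start in range(0, seq_len, self.segment_len):
            seg = x[:, start:start + self.segment_len]
            # Keys/values are the compressed history plus the current segment,
            # so per-segment cost is O(segment_len * (n_gist + segment_len)).
            kv = torch.cat([gist, seg], dim=1)
            seg_out, _ = self.attn(seg, kv, kv)
            outputs.append(seg_out)
            # Quasi-recurrent update: gist tokens re-attend over the segment
            # to integrate its information into the compressed history.
            gist, _ = self.attn(gist, kv, kv)
        return torch.cat(outputs, dim=1)


if __name__ == "__main__":
    layer = QuasiRecurrentGistAttention(d_model=64, n_heads=4, n_gist=8, segment_len=32)
    x = torch.randn(2, 128, 64)
    print(layer(x).shape)  # torch.Size([2, 128, 64])
```

Because the per-segment attention window has constant size (segment length plus a fixed number of gist tokens), total cost scales as $O(n)$ in the sequence length, in contrast to the $O(n^2)$ cost of full attention.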
Keywords
Large Language Models, gist attention, long contextual information