ICLR 2025
Language Models' Internal Conflicts: Layer-wise Usable Information For Detecting Model (Un)answerability
Abstract
We propose a new framework for detecting unanswerable questions. Large language models often sound overly convincing even when providing inaccurate answers. We explore how language models behave when required to answer questions without relevant information in the provided context, a situation that is likely to result in hallucination. We posit that unanswerable questions correspond to a deficiency of $\mathcal{V}$-usable information across the layers of a pre-trained language model $\mathcal{V}$. To capture this, we propose layer-wise usable information ($\mathcal{L}$I), which tracks how much information is usable by the language model as it updates representations from layer to layer. We empirically show that information is not always monotonically gained or lost across layers; hence, tracking all layers of the language model is far more informative than treating the final layer as the complete form of the computation.
Our method requires no label annotations for fine-tuning classifiers and no modifications to model architectures, making it computationally feasible to apply to off-the-shelf large language models.
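The abstract does not spell out how the layer-wise signal is computed, but one plausible reading is a pointwise $\mathcal{V}$-information-style score evaluated at every layer via a logit-lens readout: apply the model's LM head to each intermediate hidden state and compare the log-probability of the answer tokens with and without the supporting context. The sketch below is an illustrative assumption, not the authors' released code; the model name, prompts, and scoring choices are placeholders.

```python
# Hedged sketch: per-layer log-probability of an answer via the "logit lens",
# and the gain from conditioning on context as a layer-wise usable-information proxy.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def layerwise_logprob(prompt: str, answer: str) -> torch.Tensor:
    """Mean log-prob of the answer tokens at every layer (logit-lens readout)."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    answer_ids = tok(answer, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    with torch.no_grad():
        out = model(input_ids, output_hidden_states=True)
    scores = []
    for h in out.hidden_states:            # embeddings + one entry per layer
        h = model.transformer.ln_f(h)      # final layer norm before the LM head
        logprobs = model.lm_head(h).log_softmax(-1)
        start = prompt_ids.shape[1] - 1    # positions that predict the answer tokens
        pred = logprobs[0, start:start + answer_ids.shape[1]]
        gold = answer_ids[0]
        scores.append(pred[torch.arange(len(gold)), gold].mean())
    return torch.stack(scores)

# Per-layer gain from the context; small or flat gains may flag unanswerability.
question = "Who wrote Hamlet?"
context = "Hamlet is a play by Shakespeare."
answer = " Shakespeare"
with_ctx = layerwise_logprob(f"{context}\nQ: {question}\nA:", answer)
without_ctx = layerwise_logprob(f"Q: {question}\nA:", answer)
print(with_ctx - without_ctx)
```

Under this reading, a question is flagged as unanswerable when the context-conditioned gain stays low across all layers rather than at the final layer alone, which is the behavior the abstract argues the final-layer view would miss.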
Keywords
Uncertainty, Question Answering
Reviews and Discussion
Author Withdrawal Notice
I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.