
Context-Aligned and Evidence-Based Detection of Hallucinations in Large Language Model Outputs

Abstract

This paper addresses semantic drift and hallucinated information in the outputs of large language models by proposing a fine-grained detection framework with two main components: a Context-Aligned Representation module and a Layered Verification of Evidence module. The first module builds semantic alignment between the input and the generated text, allowing it to identify contextual shifts and logical inconsistencies. The second module segments the generated content into semantic units and verifies them layer by layer against external knowledge, enabling potential hallucinated content to be localized and modeled. The model adopts a shared representation learning structure, preserving strong semantic-consistency modeling while improving the detection of implicit hallucinations in complex reasoning tasks. Systematic experiments on the TruthfulQA dataset show that the proposed method significantly outperforms existing mainstream detection models in precision, recall, F1-score, and factual consistency, and that it demonstrates strong fine-grained awareness and cross-task stability. Transfer evaluations in both open-domain and closed-domain scenarios, together with extended experiments under different domains and knowledge-source conditions, further confirm the adaptability and practicality of the method in real-world generation settings. The approach provides solid technical support for improving the trustworthiness and safety of language model outputs and offers a structured solution for intelligent review of generated content.
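
To make the two-stage idea concrete, the sketch below shows one way a context-alignment check and a per-unit evidence verification step could be composed. It is a minimal illustration only: the encoder choice, sentence-level segmentation, cosine-similarity scoring, thresholds, and the toy evidence passages are all assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch of the two-stage detection idea described in the abstract.
# Module names, thresholds, and the toy evidence source are illustrative
# assumptions, not the paper's actual method.
import re
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed encoder choice

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def context_alignment_score(prompt: str, output: str) -> float:
    """Stage 1 (context-aligned representation): score how closely the
    generated text stays aligned with the input context."""
    p, o = encoder.encode([prompt, output])
    return cosine(p, o)

def segment_units(output: str):
    """Split generated text into semantic units; here, a naive sentence split."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", output) if s.strip()]

def unit_is_supported(unit: str, evidence_passages: list[str],
                      support_threshold: float = 0.6) -> bool:
    """Stage 2 (layered verification): treat a unit as supported if some
    external evidence passage is sufficiently similar to it."""
    unit_vec = encoder.encode(unit)
    for ev_vec in encoder.encode(evidence_passages):
        if cosine(unit_vec, ev_vec) >= support_threshold:
            return True
    return False

def detect_hallucinations(prompt: str, output: str,
                          evidence_passages: list[str],
                          drift_threshold: float = 0.4) -> dict:
    """Combine both stages: flag global context drift and localize
    individual units that lack evidential support."""
    drift_flag = context_alignment_score(prompt, output) < drift_threshold
    unsupported = [u for u in segment_units(output)
                   if not unit_is_supported(u, evidence_passages)]
    return {"context_drift": drift_flag, "unsupported_units": unsupported}

if __name__ == "__main__":
    report = detect_hallucinations(
        prompt="Who wrote the novel 1984?",
        output="1984 was written by George Orwell. It was first published in 1949.",
        evidence_passages=[
            "Nineteen Eighty-Four is a novel by George Orwell, published in 1949."
        ],
    )
    print(report)
```

In this sketch the two stages share a single encoder, loosely mirroring the shared representation learning structure mentioned in the abstract; the paper's actual alignment and verification models are not specified here.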
