Sparse Retrieval and Deep Language Modeling for Robust Fact Verification in Financial Texts
Abstract
This study addresses factual inconsistency, semantic ambiguity, and knowledge gaps in financial texts by proposing a financial fact verification method that combines a sparse retrieval mechanism with large language models. The approach consists of three main modules: input encoding, evidence retrieval, and fusion-based reasoning, and aims to align statement understanding closely with external evidence matching. The sparse retrieval mechanism extracts the most relevant supporting evidence from a constructed financial knowledge base, reducing overreliance on the model's embedded knowledge during generation. The fusion reasoning module then jointly models the input statement and multiple evidence passages to classify each claim as supported, refuted, or not enough information. To validate the effectiveness of the method, the study conducts a range of perturbation experiments, including changes in input length, learning-rate settings, data distribution shifts, and noisy-evidence injection, and analyzes the results in terms of accuracy, macro-averaged F1 score, and model robustness. Experimental results show that the proposed method generalizes well and remains stable across different risk scenarios, with notable advantages in handling dynamic financial events and multi-evidence cross-sentence reasoning. This research advances the practical application of fact verification in financial text processing and provides methodological support for building structured, high-confidence financial language understanding systems.
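The sparse retrieval stage described above can be sketched with classic BM25 scoring over a small financial knowledge base. This is a minimal illustrative implementation, not the paper's actual retriever: the documents, claim, and parameter values (`k1=1.5`, `b=0.75`) are assumptions chosen for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Toy whitespace tokenizer; a real system would normalize punctuation, etc.
    return text.lower().split()

class BM25Retriever:
    """Sparse retriever that ranks knowledge-base passages by BM25 score."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.raw = docs
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.N = len(docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        # Document frequency of each term, for IDF weighting.
        self.df = Counter()
        for d in self.docs:
            for term in set(d):
                self.df[term] += 1

    def idf(self, term):
        return math.log((self.N - self.df[term] + 0.5) / (self.df[term] + 0.5) + 1)

    def score(self, query, doc_tokens):
        tf = Counter(doc_tokens)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            num = tf[term] * (self.k1 + 1)
            den = tf[term] + self.k1 * (1 - self.b + self.b * len(doc_tokens) / self.avgdl)
            s += self.idf(term) * num / den
        return s

    def top_k(self, query, k=2):
        # Return the k most relevant passages as evidence for fusion reasoning.
        order = sorted(range(self.N),
                       key=lambda i: self.score(query, self.docs[i]),
                       reverse=True)
        return [self.raw[i] for i in order[:k]]

# Illustrative financial knowledge base (invented examples).
kb = [
    "Acme Corp reported Q3 revenue of 2.1 billion dollars, up 8 percent.",
    "Acme Corp announced a stock buyback program in 2023.",
    "Globex net income fell 12 percent in the third quarter.",
]
retriever = BM25Retriever(kb)
claim = "Acme Corp revenue rose in Q3"
evidence = retriever.top_k(claim, k=2)
```

The retrieved `evidence` passages would then be concatenated with the claim and passed to the fusion reasoning module for three-way classification.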