Robust Text Semantic Classification via Retrieval-Augmented Generation
Abstract
This study proposes a retrieval-augmented optimization algorithm based on semantic encoding and robust calibration to address semantic inconsistency and the amplification of feature perturbations exhibited by retrieval-augmented generation models in high-noise contexts. The method introduces a dual-layer semantic alignment and multi-scale retrieval filtering mechanism within a unified generative framework, enabling joint optimization of text representation, contextual retrieval, and classification. First, the semantic encoding module extracts contextual dependencies through a hierarchical embedding structure, ensuring global consistency in the feature space. Second, the retrieval-augmentation module filters and reweights irrelevant passages under dynamic attention guidance, reducing the impact of external noise on semantic representations. Finally, distribution calibration and parameter regularization decouple the generative and classification spaces, improving model stability and generalization. Robustness tests conducted under multiple noise-injection and environmental-perturbation settings show that the proposed model outperforms baseline methods on five key metrics: Accuracy, Macro-F1, Parameter Efficiency, Inference Latency, and Task Conflict. The model maintains stable semantic discrimination under complex conditions such as retrieval noise, feature redundancy, and semantic drift. These results validate the effective collaboration between semantic structure modeling and retrieval augmentation, providing a new methodological foundation for robust applications of retrieval-augmented generation in complex text understanding and semantic reasoning tasks.
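The attention-guided filtering and reweighting of retrieved passages can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name, the cosine-similarity scoring, the temperature-scaled softmax, and the weight threshold are all illustrative assumptions about how such a mechanism could be realized.

```python
import numpy as np

def reweight_passages(query_emb, passage_embs, tau=0.1, min_weight=0.05):
    """Score retrieved passages against a query embedding, drop
    low-relevance (noisy) passages, and renormalize the rest.

    Illustrative sketch only; tau and min_weight are assumed
    hyperparameters, not values from the paper.
    """
    # Cosine similarity between the query and each retrieved passage.
    q = query_emb / np.linalg.norm(query_emb)
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    sims = p @ q
    # A temperature-scaled softmax turns similarities into attention weights.
    w = np.exp(sims / tau)
    w /= w.sum()
    # Filter: zero out passages whose weight falls below the threshold,
    # then renormalize so the kept weights again sum to 1.
    w[w < min_weight] = 0.0
    w /= w.sum()
    return w

# Usage: two on-topic passages and one near-orthogonal (noisy) one.
query = np.array([1.0, 0.0])
passages = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
weights = reweight_passages(query, passages)
```

In this toy example the third passage is nearly orthogonal to the query, so its attention weight falls below the threshold and is zeroed out, while the remaining weight mass is redistributed over the relevant passages.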