Deep Learning and NLP Methods for Unified Summarization and Structuring of Electronic Medical Records

Abstract

This study addresses redundancy in unstructured text, inconsistent formatting, and the difficulty of extracting key information from electronic medical records by proposing a unified multi-task learning framework for automatic summarization and structured processing. The framework is built around a shared encoder that applies a multi-layer self-attention mechanism for global semantic modeling of the input text; after feature extraction, it branches into a summarization module and a structured information extraction module. The summarization branch adopts a sequence-to-sequence architecture with a coverage mechanism to improve key-information coverage and generation fluency, while the structured extraction branch employs a conditional random field to model label dependencies, enabling precise identification of medical entities such as diseases, symptoms, tests, and treatments, as well as the relationships between them. The two tasks are jointly optimized through a weighted loss function, exploiting their semantic complementarity to improve overall performance. Experiments on the i2b2 2010 Clinical Concept Extraction Dataset show that the proposed method outperforms multiple mainstream models on ROUGE-1, Entity F1, and Relation F1. Additional sensitivity analyses on hyperparameters, environmental factors, and data scale examine the effects of learning rate, encoder depth, training-data size, and inference latency on model performance. The results demonstrate that the framework not only achieves strong precision and recall but also remains stable and robust across operational environments and data conditions, providing an effective technical solution for the efficient utilization and standardized processing of electronic medical records.
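The joint optimization described above combines the summarization loss and the extraction loss through a weighted sum. The abstract does not give the exact weighting scheme, so the sketch below assumes a single trade-off hyperparameter `alpha` (a common choice for two-task setups) and uses token-level negative log-likelihood as a stand-in for both the seq2seq and CRF losses:

```python
import math

def nll(probs, gold_idx):
    # Negative log-likelihood of the gold token/label under the
    # model's predicted distribution (stand-in for each branch's loss).
    return -math.log(probs[gold_idx])

def joint_loss(summary_loss, extraction_loss, alpha=0.5):
    # Weighted combination of the two task losses; alpha is a
    # hypothetical trade-off hyperparameter, not specified in the paper.
    return alpha * summary_loss + (1.0 - alpha) * extraction_loss

# Toy example: one decoder step of the summarizer and one entity-label
# prediction of the tagger, over a vocabulary/label set of size 3.
p_summary = [0.1, 0.7, 0.2]   # decoder distribution, gold token = index 1
p_label   = [0.8, 0.1, 0.1]   # tagger distribution, gold label = index 0

l_sum = nll(p_summary, 1)
l_ext = nll(p_label, 0)
print(round(joint_loss(l_sum, l_ext, alpha=0.6), 4))
```

In a full implementation the two branch losses would come from the sequence-to-sequence decoder (including the coverage penalty) and the CRF's negative log-likelihood, but the weighting step itself reduces to this one-line combination.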
