Precision Recognition of Irregular Scene Text Leveraging Advanced Attention Mechanisms

Abstract

Scene text recognition has become an important research area because of its applications in domains such as traffic sign recognition, driverless cars, and product packaging. Irregular scene text, meaning text that is curved, distorted, or low-resolution, remains difficult for existing methods to recognize. This paper introduces a Multi-Scale Feature Fusion Attention Recognition Network (MSFARN) to address these challenges. MSFARN comprises two core components: a Multi-Scale Feature Fusion Network (MSFN) for text rectification and an Attention-Based Recognition Network (ARN) for text recognition. MSFN extracts and fuses features at multiple scales to rectify irregular text images and make them easier to read. ARN combines channel and spatial attention mechanisms so that recognition focuses on the most informative feature regions. Extensive experiments on multiple datasets, including IIIT5K, ICDAR2003, ICDAR2013, ICDAR2015, SVT-Perspective, and CUTE80, show that MSFARN achieves state-of-the-art recognition accuracy. Future research will explore text detection and recognition in more complex scenes and generalize the approach to arbitrary font types and orientations.
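To make the idea of combining channel and spatial attention concrete, the following is a minimal sketch in plain Python. It is not the ARN described in the paper: the abstract gives no architectural details, so the pooling choices, the use of a sigmoid gate, the channel-then-spatial ordering, and all function names (`channel_weights`, `spatial_weights`, `apply_attention`) are assumptions made for illustration. Learned layers (such as the small MLP or convolution that attention modules typically use) are omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_weights(feat):
    """One weight per channel, from global average and max pooling.

    Assumption: no learned layers; the pooled values are gated directly.
    """
    weights = []
    for ch in feat:
        values = [v for row in ch for v in row]
        avg = sum(values) / len(values)
        weights.append(sigmoid(avg + max(values)))
    return weights

def spatial_weights(feat):
    """One weight per (row, col), from the mean and max across channels."""
    h, w = len(feat[0]), len(feat[0][0])
    weights = []
    for i in range(h):
        row = []
        for j in range(w):
            column = [ch[i][j] for ch in feat]
            row.append(sigmoid(sum(column) / len(column) + max(column)))
        weights.append(row)
    return weights

def apply_attention(feat):
    """Rescale a C x H x W feature map by channel, then by spatial position."""
    cw = channel_weights(feat)
    scaled = [[[v * cw[c] for v in row] for row in ch]
              for c, ch in enumerate(feat)]
    sw = spatial_weights(scaled)
    return [[[ch[i][j] * sw[i][j] for j in range(len(ch[0]))]
             for i in range(len(ch))]
            for ch in scaled]

# Toy feature map: 2 channels of 2 x 2 activations (hypothetical values).
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 2.0], [3.0, 4.0]],
]
attended = apply_attention(features)
```

In this toy example the channel with stronger activations receives a larger channel weight, and positions with stronger responses receive larger spatial weights, so the output keeps the input shape while de-emphasizing weak regions.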
