
Combining Bidirectional Self-Attention and Residual Learning for Robust Regression

Abstract

This paper proposes a deep regression algorithm that combines a bidirectional Transformer structure with residual connections to address limitations in structural modeling capacity and representation stability in nonlinear regression tasks. The method adopts a bidirectional self-attention mechanism to capture both forward and backward dependencies among input features, enhancing the model's awareness of global contextual information. Residual connections and normalization modules are introduced in the encoder to improve the efficiency of feature transmission in deep networks and to stabilize training, effectively alleviating issues such as information dilution and gradient vanishing. The overall architecture consists of an input encoding stage, a bidirectional Transformer encoder, a residual fusion module, and a regression prediction layer, supporting end-to-end feature extraction and numerical regression mapping. In the output stage, the model applies pooling operations and fully connected transformations to compress and map the fused deep features, producing high-precision predictions of the target variable. To validate the effectiveness of the proposed method, comprehensive experiments are conducted on the public California housing dataset, including comparative tests, hyperparameter sensitivity analysis, and data perturbation evaluations. The results demonstrate that the proposed method outperforms mainstream regression models across multiple metrics, showing strong modeling capability and robustness.
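For readers who want a concrete picture of the described pipeline (input encoding, bidirectional Transformer encoder with residual connections and normalization, residual fusion, and a pooling-plus-linear regression head), the following is a minimal PyTorch sketch. It is not the authors' implementation: the class and parameter names (`BiAttentionResidualRegressor`, `d_model`, `nhead`, `num_layers`) and all hyperparameter values are illustrative assumptions, and the per-feature tokenization is one plausible way to apply self-attention to tabular inputs such as California housing.

```python
import torch
import torch.nn as nn


class BiAttentionResidualRegressor(nn.Module):
    """Illustrative sketch (not the paper's code): input encoding ->
    bidirectional (non-causal) Transformer encoder with residual
    connections and normalization -> residual fusion -> pooling ->
    fully connected regression head."""

    def __init__(self, num_features: int, d_model: int = 64,
                 nhead: int = 4, num_layers: int = 2):
        super().__init__()
        # Input encoding: embed each scalar feature as a d_model-dim token.
        self.input_proj = nn.Linear(1, d_model)
        # Standard encoder layers already contain residual connections and
        # layer normalization; omitting an attention mask makes the
        # self-attention bidirectional over the feature tokens.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=4 * d_model,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer,
                                             num_layers=num_layers)
        # Residual fusion: add the encoded input back to the encoder output.
        self.fusion_norm = nn.LayerNorm(d_model)
        # Regression head: pool over feature tokens, then map to a scalar.
        self.head = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features) -> (batch, num_features, d_model) tokens
        tokens = self.input_proj(x.unsqueeze(-1))
        encoded = self.encoder(tokens)
        fused = self.fusion_norm(encoded + tokens)   # residual fusion
        pooled = fused.mean(dim=1)                   # pooling over features
        return self.head(pooled).squeeze(-1)         # scalar prediction


if __name__ == "__main__":
    # California housing has 8 input features; batch of 16 random samples.
    model = BiAttentionResidualRegressor(num_features=8)
    batch = torch.randn(16, 8)
    print(model(batch).shape)  # torch.Size([16])
```

Under these assumptions, the sketch mirrors the abstract's end-to-end flow: the absence of a causal mask gives the bidirectional attention, the encoder layers supply the internal residual and normalization paths, the explicit skip connection before `fusion_norm` plays the role of the residual fusion module, and mean pooling followed by a fully connected layer produces the numerical prediction.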
