Emotion-Aware Human-Computer Interaction: A Multimodal Affective Computing Framework with Deep Learning Integration
Abstract
Affective computing has become a central enabler of advanced human–computer interaction (HCI), as it allows computational systems to recognize and respond to users’ emotions in real time. While traditional unimodal approaches that rely on facial expressions, speech, or physiological signals have achieved partial success, their robustness and generalizability remain limited in real-world applications. To address these issues, this paper introduces a multimodal affective computing framework that integrates electroencephalogram (EEG) signals, facial features, and speech cues through a deep-learning-based feature fusion strategy. Experimental evaluations on public benchmark datasets demonstrate that the proposed method significantly outperforms conventional unimodal approaches in recognition accuracy, adaptability, and noise resilience. The contributions of this work include the design of a scalable multimodal pipeline, an optimized mathematical formulation for affective state fusion, and validation of the framework’s effectiveness in enhancing interaction quality across education, healthcare, and immersive environments.
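To make the fusion idea concrete, the following is a minimal sketch of concatenation-based late fusion over per-modality encoders for EEG, facial, and speech features. All class names, layer sizes, and input dimensions here are illustrative assumptions for exposition, not the architecture or formulation proposed in the paper.

```python
# Minimal illustration of late feature fusion across EEG, facial, and speech
# modalities. All dimensions and module names are hypothetical placeholders,
# not the framework described in this paper.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Maps a raw modality feature vector into a shared embedding space."""

    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MultimodalFusionClassifier(nn.Module):
    """Concatenates per-modality embeddings and predicts an emotion class."""

    def __init__(self, eeg_dim: int, face_dim: int, speech_dim: int,
                 num_classes: int = 4, embed_dim: int = 128):
        super().__init__()
        self.eeg_enc = ModalityEncoder(eeg_dim, embed_dim)
        self.face_enc = ModalityEncoder(face_dim, embed_dim)
        self.speech_enc = ModalityEncoder(speech_dim, embed_dim)
        self.classifier = nn.Linear(embed_dim * 3, num_classes)

    def forward(self, eeg, face, speech):
        fused = torch.cat(
            [self.eeg_enc(eeg), self.face_enc(face), self.speech_enc(speech)],
            dim=-1,
        )
        return self.classifier(fused)


if __name__ == "__main__":
    # Dummy batch of 8 samples with illustrative per-modality feature sizes.
    model = MultimodalFusionClassifier(eeg_dim=310, face_dim=136, speech_dim=88)
    logits = model(torch.randn(8, 310), torch.randn(8, 136), torch.randn(8, 88))
    print(logits.shape)  # torch.Size([8, 4])
```

This simple concatenation scheme is only one of several possible fusion strategies (e.g., attention-based or tensor-based fusion); the paper's own formulation is developed in the sections that follow.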