A Convolutional Neural Network-Based Speech Recognition System for Autonomous Driving

Abstract

With the rapid development of autonomous driving technology, speech recognition has become a crucial component of human-machine interaction. Traditional speech recognition methods, such as Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs), struggle to maintain high accuracy in complex and noisy driving environments. Recently, deep learning, particularly Convolutional Neural Networks (CNNs), has shown significant advantages in speech recognition by efficiently extracting relevant features from speech signals. This paper proposes a CNN-based speech recognition system designed specifically for autonomous driving environments. The system extracts Mel spectrogram features from speech input and uses a multi-layer CNN to classify spoken commands. We conduct extensive experiments on LibriSpeech, Mozilla Common Voice, and a self-collected in-car speech dataset that simulates real-world driving conditions. The proposed CNN model outperforms traditional methods in accuracy, robustness, and computational efficiency. We further evaluate the impact of different CNN architectures (ResNet, DenseNet, and VGG) on recognition performance and analyze the effectiveness of several training optimizations, including data augmentation, batch normalization, and dropout regularization.
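The front end described above turns a raw waveform into a log-Mel spectrogram, a 2-D time-frequency representation the CNN treats like an image. The sketch below is a minimal NumPy illustration of that feature-extraction step, not the authors' implementation; the sample rate, FFT size, hop length, and number of Mel bands are assumed values typical for speech front ends.

```python
import numpy as np

def hz_to_mel(f):
    # standard HTK-style Mel scale conversion
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # triangular filters spaced evenly on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=160, n_mels=40):
    # frame the signal, window, FFT -> power spectrum, Mel projection, log compression
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)

# one second of synthetic audio standing in for an in-car voice command
sr = 16000
t = np.arange(sr) / sr
wave = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.randn(sr)
feats = log_mel_spectrogram(wave, sr=sr)
print(feats.shape)  # (time frames, Mel bands), the 2-D input to the CNN
```

In a full system, stacks of such spectrograms (optionally augmented with noise or time/frequency masking) would be batched and fed to the convolutional classifier.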
