Advanced Speaker Identification with CNNs and Maximum Likelihood Criterion
Abstract
Speaker identification is a crucial task in fields such as linguistics, speech technology, and artificial intelligence. Despite considerable progress, it remains challenging, particularly in acoustically noisy environments or when speakers are phonetically similar. Concerns about privacy and data protection also arise frequently, especially regarding the use of personal audio data. Advances in signal processing and machine learning have substantially improved the accuracy and robustness of voice recognition systems, and methods such as Convolutional Neural Networks (CNNs) continue to improve the extraction of speaker information from speech. This study develops a speaker identification system based on deep learning techniques, which have gained widespread recognition in automatic acoustic signal processing. Most CNN-based systems train the recognition stage with the cross-entropy criterion; this article instead proposes an advanced technique that combines convolutional neural networks with the maximum likelihood criterion. The proposed technique yields promising results compared with traditional systems such as Vector Quantization (VQ) and Gaussian Mixture Models (GMMs), achieving an accuracy of 87.97% on the full LibriSpeech corpus.
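To make the maximum likelihood criterion concrete, the sketch below shows one plausible (hypothetical, not the paper's exact method) decision rule: each speaker's CNN embeddings are modeled with a per-speaker diagonal Gaussian, and a test utterance is assigned to the speaker whose model maximizes the log-likelihood of its embedding. The toy random vectors stand in for real CNN embeddings, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_speaker_gaussians(embeddings_by_speaker):
    """Estimate a (mean, diagonal variance) Gaussian per speaker
    from that speaker's training embeddings."""
    models = {}
    for spk, X in embeddings_by_speaker.items():
        mu = X.mean(axis=0)
        var = X.var(axis=0) + 1e-6  # variance floor avoids division by zero
        models[spk] = (mu, var)
    return models

def log_likelihood(x, mu, var):
    """Diagonal-Gaussian log-likelihood of embedding x under one speaker model."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def identify(x, models):
    """Maximum likelihood decision: pick the speaker whose model
    assigns the highest log-likelihood to embedding x."""
    return max(models, key=lambda spk: log_likelihood(x, *models[spk]))

# Toy 16-dimensional "embeddings" for 3 speakers with well-separated means.
train = {spk: rng.normal(loc=3.0 * spk, scale=1.0, size=(50, 16))
         for spk in range(3)}
models = fit_speaker_gaussians(train)

# A test embedding drawn near speaker 1's mean is identified as speaker 1.
test_x = rng.normal(loc=3.0, scale=1.0, size=16)
print(identify(test_x, models))
```

In a full system the Gaussian scoring would operate on embeddings produced by the trained CNN rather than on synthetic vectors; the decision rule itself is unchanged.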
