Advanced Speaker Identification with CNNs and Maximum Likelihood Criterion
Abstract
Speaker identification is a crucial task in fields such as linguistics, speech technology, and artificial intelligence. Despite considerable progress, it remains challenging, particularly in acoustically noisy environments or when speakers are phonetically similar. Concerns about privacy and data protection also arise frequently, especially regarding the use of personal audio data. Advances in signal processing and machine learning have substantially improved the accuracy and robustness of voice recognition systems, and methods such as Convolutional Neural Networks (CNNs) continue to improve the extraction of speaker information from speech. This study develops a speaker identification system based on deep learning techniques, which have gained widespread recognition in automatic acoustic signal processing. Most CNN-based systems train the recognition stage with the cross-entropy criterion; this article instead proposes an advanced technique that combines convolutional neural networks with the maximum likelihood criterion. The proposed technique yields promising results compared with traditional systems such as Vector Quantization (VQ) and Gaussian Mixture Models (GMMs), achieving an accuracy of 87.97% on the full LibriSpeech corpus.
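To make the maximum likelihood criterion concrete, the sketch below shows one plausible (hypothetical, not the paper's exact method) decision rule: each speaker's CNN embeddings are modeled with a per-speaker diagonal Gaussian, and a test utterance is assigned to the speaker whose model maximizes the log-likelihood of its embedding. The toy random vectors stand in for real CNN embeddings, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_speaker_gaussians(embeddings_by_speaker):
    """Estimate a (mean, diagonal variance) Gaussian per speaker
    from that speaker's training embeddings."""
    models = {}
    for spk, X in embeddings_by_speaker.items():
        mu = X.mean(axis=0)
        var = X.var(axis=0) + 1e-6  # variance floor avoids division by zero
        models[spk] = (mu, var)
    return models

def log_likelihood(x, mu, var):
    """Diagonal-Gaussian log-likelihood of embedding x under one speaker model."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def identify(x, models):
    """Maximum likelihood decision: pick the speaker whose model
    assigns the highest log-likelihood to embedding x."""
    return max(models, key=lambda spk: log_likelihood(x, *models[spk]))

# Toy 16-dimensional "embeddings" for 3 speakers with well-separated means.
train = {spk: rng.normal(loc=3.0 * spk, scale=1.0, size=(50, 16))
         for spk in range(3)}
models = fit_speaker_gaussians(train)

# A test embedding drawn near speaker 1's mean is identified as speaker 1.
test_x = rng.normal(loc=3.0, scale=1.0, size=16)
print(identify(test_x, models))
```

In a full system the Gaussian scoring would operate on embeddings produced by the trained CNN rather than on synthetic vectors; the decision rule itself is unchanged.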
