Advanced Speaker Identification with CNNs and Maximum Likelihood Criterion

Authors

  • Achéne Abed
    Affiliation
    Laboratory of Electrical Engineering (LGE), University of M'sila, PO Box 166 Ichebilia, 28000 M'sila, Algeria
  • Aissa Amrouche
    Affiliation
    Department of Electronics, Faculty of Technology, University of Blida 1, Blida, Algeria
  • Redha Bendoumia
    Affiliation
    Laboratory of Detection, Information and Communication, Department of Electronics, University of Blida 1, Algeria
  • Abdelghafour Herizi
    Affiliation
    Laboratory of Electrical Engineering (LGE), University of M'sila, PO Box 166 Ichebilia, 28000 M'sila, Algeria
  • Ahmed Bouchekhlal
    Affiliation
    Higher School of Signals (HSS), PO Box 11 Kolea, 42070, Tipaza, Algeria
https://doi.org/10.3311/PPee.42209

Abstract

Speaker identification is an important topic in several fields, including linguistics, speech acoustic technology, and artificial intelligence. Despite recent progress, it remains challenging, particularly in acoustically noisy conditions or when speakers are phonetically similar. Concerns about privacy and data protection also arise frequently, particularly regarding the use of personal audio data. Advances in signal processing and machine learning have improved the accuracy and robustness of voice recognition systems, and newer methods, including Convolutional Neural Networks (CNNs), are improving the extraction of voice information. This study aims to develop a speaker identification system based on deep learning techniques, which have gained widespread recognition in automatic acoustic signal processing. Many researchers have used CNNs with a recognition phase based on the cross-entropy criterion. This article instead proposes combining CNNs with the maximum likelihood criterion. The proposed technique yields promising results compared with traditional systems such as Vector Quantization (VQ) and Gaussian Mixture Models (GMM). The suggested approach achieves an accuracy of 87.97% using all the data from the LibriSpeech corpus.
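
The abstract does not spell out how the maximum likelihood criterion is applied at recognition time. As a rough illustration only, the sketch below shows one common form of a maximum likelihood decision rule, familiar from GMM-based systems: per-frame log scores are summed over the utterance and the speaker with the highest total is selected. It assumes a hypothetical network that emits one score vector per frame and treats its log-softmax outputs as stand-ins for per-speaker log-likelihoods (reasonable only under equal speaker priors). The function names and toy values are illustrative and are not taken from the article.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one frame's score vector."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_sum for v in logits]


def identify_speaker(frame_logits):
    """Pick the speaker whose summed frame log scores are highest.

    frame_logits: list of frames, each a list with one score per speaker
    (assumed here to come from a CNN; that is an assumption, not the
    article's stated pipeline).
    Returns (best_speaker_index, per_speaker_totals).
    """
    if not frame_logits:
        raise ValueError("at least one frame is required")
    num_speakers = len(frame_logits[0])
    totals = [0.0] * num_speakers
    for logits in frame_logits:
        if len(logits) != num_speakers:
            raise ValueError("all frames must score the same speakers")
        for speaker, log_prob in enumerate(log_softmax(logits)):
            totals[speaker] += log_prob  # frames treated as independent
    best = max(range(num_speakers), key=totals.__getitem__)
    return best, totals


# Toy utterance: 3 frames scored against 3 enrolled speakers.
toy_frames = [
    [2.0, 1.0, 0.1],
    [0.5, 2.5, 0.3],
    [0.2, 3.0, 0.1],
]
predicted, scores = identify_speaker(toy_frames)
print(predicted)
```

Summing log scores rather than multiplying probabilities avoids numerical underflow on long utterances, which is why the rule is usually written in the log domain.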

Keywords:

speaker identification, MFCC, VQ, GMM, maximum likelihood

Published Online

2025-12-22

How to Cite

Abed, A., Amrouche, A., Bendoumia, R., Herizi, A., Bouchekhlal, A. “Advanced Speaker Identification with CNNs and Maximum Likelihood Criterion”, Periodica Polytechnica Electrical Engineering and Computer Science, 2025. https://doi.org/10.3311/PPee.42209

Section

Control, signal processing and signal analysis, medical applications