ON SPEECH AND SPEAKER RECOGNITION USING NEURAL NET MODELS

Authors

  • J. S. Mason
  • E. C. Andrews

Abstract

The field of digital speech processing may be divided into three distinct and somewhat independent applications, namely speech recognition, speaker recognition and speech communications. Linear-predictive (LP) analysis techniques are used in all three areas to provide a compact signal representation that has a high information content. This paper examines the somewhat conflicting tasks of speech and speaker recognition using perceptually based LP features. Using the recently developed multi-layer perceptron, it is possible to construct a single architecture that may be trained to perform either one of these tasks using identical speech training data. An E-set / 8-speaker system and an alphabet / 26-speaker system are examined with both 5th- and 14th-order features. Both the speech and the speaker recognition tasks perform well, confirming that the same structure fed with identical inputs can achieve both goals. It is found that the speaker-specific information is contained predominantly in the higher-order feature coefficients, with speech-specific information concentrated in the lower-order coefficients, confirming the results of Hermansky and Gu.
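The idea of one architecture trained for either task on identical inputs can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic stand-ins for 14th-order feature vectors, the network size and training settings are arbitrary, and the names `make_synthetic_features` and `train_mlp` are hypothetical. The synthetic data are built on the assumption that word identity shapes the lower-order coefficients and speaker identity the higher-order ones, so the sketch shows the mechanism rather than reproducing any result.

```python
import numpy as np

def make_synthetic_features(n_words=3, n_speakers=4, reps=20, order=14, seed=0):
    """Synthetic stand-in for LP feature vectors (assumption, not real speech).

    Word identity sets the lower-order half of each vector and speaker
    identity sets the higher-order half, plus small Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    half = order // 2
    word_means = rng.normal(0.0, 1.0, size=(n_words, half))
    speaker_means = rng.normal(0.0, 1.0, size=(n_speakers, order - half))
    X, words, speakers = [], [], []
    for w in range(n_words):
        for s in range(n_speakers):
            mean = np.concatenate([word_means[w], speaker_means[s]])
            for _ in range(reps):
                X.append(mean + rng.normal(0.0, 0.2, size=order))
                words.append(w)
                speakers.append(s)
    return np.array(X), np.array(words), np.array(speakers)

def train_mlp(X, y, n_classes, hidden=16, epochs=300, lr=0.1, seed=0):
    """One-hidden-layer perceptron trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, size=(hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]              # one-hot targets
    rows = np.arange(len(y))
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # hidden layer
        logits = H @ W2 + b2
        logits = logits - logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P = P / P.sum(axis=1, keepdims=True)   # softmax outputs
        losses.append(float(-np.mean(np.log(P[rows, y] + 1e-12))))
        # Backpropagate the cross-entropy loss.
        d_logits = (P - Y) / len(y)
        dW2 = H.T @ d_logits
        db2 = d_logits.sum(axis=0)
        dH = (d_logits @ W2.T) * (1.0 - H ** 2)
        dW1 = X.T @ dH
        db1 = dH.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def predict(X_new):
        return np.argmax(np.tanh(X_new @ W1 + b1) @ W2 + b2, axis=1)

    return predict, losses

# Identical inputs and identical architecture; only the target labels differ.
X, words, speakers = make_synthetic_features()
predict_word, word_losses = train_mlp(X, words, n_classes=3)
predict_speaker, speaker_losses = train_mlp(X, speakers, n_classes=4)

word_accuracy = float(np.mean(predict_word(X) == words))
speaker_accuracy = float(np.mean(predict_speaker(X) == speakers))
print("speech task training accuracy:", word_accuracy)
print("speaker task training accuracy:", speaker_accuracy)
```

The two calls to `train_mlp` differ only in the labels and the number of output units, which is the sense in which a single structure serves both goals. Training on a column subset such as `X[:, :5]` would be one way to mimic a lower-order feature set.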

Keywords:

linear-prediction, speech recognition, speaker recognition, neural networks, perceptual weightings

How to Cite

Mason, J. S., Andrews, E. C. “ON SPEECH AND SPEAKER RECOGNITION USING NEURAL NET MODELS”, Periodica Polytechnica Electrical Engineering, 34(3), pp. 157–165, 1990.

Section

Articles