On the Effects of Automatic Transcription and Segmentation Errors in Hungarian Spoken Language Processing

Authors

  • Máté Ákos Tündik
    Affiliation

    Department of Telecommunication and Media Informatics, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, H-1117 Budapest, Magyar tudósok körútja 2., Hungary; Nokia Solutions and Networks Ltd., 1083 Budapest, Bókay János u. 36-42, Hungary

  • Valér Kaszás
    Affiliation

    Department of Telecommunication and Media Informatics, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, H-1117 Budapest, Magyar tudósok körútja 2., Hungary

  • György Szaszák
    Affiliation

    Department of Telecommunication and Media Informatics, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, H-1117 Budapest, Magyar tudósok körútja 2., Hungary

https://doi.org/10.3311/PPee.14052

Abstract

Emerging Artificial Intelligence (AI) technology has enabled machines to match or even surpass human capabilities in several fields; nevertheless, making a computer understand human language remains a challenge. In speech understanding, Automatic Speech Recognition (ASR) is used to generate transcripts, which are then processed with text-based tools targeting Spoken Language Understanding (SLU). Depending on ASR quality (which in turn depends on speech quality, the complexity of the topic, the environment, etc.), transcripts contain errors, which propagate further into the processing pipeline. Subjective tests show, on the other hand, that humans understand ASR-generated closed captions quite well despite word and punctuation errors. Using word embedding based semantic parsing, the present paper aims to quantify the semantic bias introduced by ASR error propagation. As a special use case, speech summarization is also evaluated with regard to ASR error propagation. We show that, despite the higher word error rates seen with the highly inflectional Hungarian language, the semantic space suffers less impact than the difference in Word Error Rate would suggest.
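
The contrast the abstract draws between Word Error Rate and semantic distance can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the two-dimensional `TOY_EMBEDDINGS` vectors, and the example sentence pair are all hypothetical, chosen only to show how a single inflection error can produce a sizeable WER while the averaged word vectors of the reference and the hypothesis stay close together.

```python
from math import sqrt

# Hypothetical 2-D vectors, invented for illustration only; real word
# embeddings are trained on large corpora and have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "a": [1.0, 0.0],
    "házban": [0.2, 0.9],    # "in the house"
    "házba": [0.25, 0.85],   # "into the house" (a different inflection)
    "vagyok": [0.6, 0.4],    # "I am"
}


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def mean_vector(text, embeddings):
    """Average the vectors of all in-vocabulary words of a sentence."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[w] for w in text.split() if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


reference = "a házban vagyok"
hypothesis = "a házba vagyok"   # one inflection error in the transcript

wer = word_error_rate(reference, hypothesis)
similarity = cosine(mean_vector(reference, TOY_EMBEDDINGS),
                    mean_vector(hypothesis, TOY_EMBEDDINGS))
print(f"WER = {wer:.2f}, cosine similarity = {similarity:.4f}")
```

With these toy vectors, one wrong word out of three is counted as a full error by WER, while the sentence-level vectors barely move. Averaging word vectors is only one simple way to obtain a sentence representation and is used here as a stand-in.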

Keywords:

automatic punctuation, word embedding, semantic similarity, automatic summarization, speech recognition

Published Online

2019-06-13

How to Cite

Tündik, M. Ákos, Kaszás, V., Szaszák, G. “On the Effects of Automatic Transcription and Segmentation Errors in Hungarian Spoken Language Processing”, Periodica Polytechnica Electrical Engineering and Computer Science, 63(4), pp. 254–262, 2019. https://doi.org/10.3311/PPee.14052

Section

Articles