English Abstract
Mispronunciation detection and diagnosis is one of the primary tasks of a computer-assisted pronunciation training (CAPT) system. Previous research on CAPT has mostly relied on a forced-alignment procedure to calculate goodness of pronunciation (GOP) scores for the phonemes of spoken words with respect to a text prompt. This procedure is usually conducted with acoustic models taken from a conventional speech recognition system, in conjunction with a phoneme-level transcription. However, training a conventional speech recognition system is complicated. In recent years, end-to-end speech recognition has not only greatly simplified this training process but has also been closing the accuracy gap with conventional speech recognition. In view of this, this thesis sets out to perform mispronunciation detection and diagnosis on the basis of end-to-end speech recognition. To this end, we design and develop two mispronunciation detection methods: 1) a method that leverages a recognition confidence measure; and 2) a method based solely on speech recognition results. A series of experiments shows that using an end-to-end speech recognition architecture for mispronunciation detection and diagnosis not only reduces the training steps that conventional speech recognition requires but also significantly improves detection and diagnosis performance.
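The two detection methods can be illustrated in simplified form. The recognition-result method can be sketched as aligning the recognized phoneme sequence against the canonical phoneme sequence of the text prompt and flagging mismatches. The confidence-measure method can be approximated by thresholding per-phoneme confidence scores. This is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes the end-to-end recognizer already outputs a phoneme sequence and a confidence score for each phoneme. The function names `align`, `diagnose`, and `flag_low_confidence`, the 0.5 threshold, and the ARPAbet-style phoneme labels are all hypothetical.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align(canonical: List[str], recognized: List[str]) -> List[Pair]:
    """Levenshtein-align two phoneme sequences.

    Returns (canonical_phone, recognized_phone) pairs; None marks a gap.
    """
    n, m = len(canonical), len(recognized)
    # dp[i][j] = edit distance between canonical[:i] and recognized[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if canonical[i - 1] == recognized[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if canonical[i - 1] == recognized[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                pairs.append((canonical[i - 1], recognized[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def diagnose(canonical: List[str],
             recognized: List[str]) -> List[Tuple[int, str, Optional[str]]]:
    """Recognition-result method (hypothetical sketch).

    Returns (index in canonical, expected phone, recognized phone or None)
    for every substituted or deleted canonical phoneme.
    """
    errors = []
    pos = 0
    for expected, heard in align(canonical, recognized):
        if expected is None:
            continue  # inserted phone: no canonical position to attach it to
        if heard != expected:
            errors.append((pos, expected, heard))
        pos += 1
    return errors


def flag_low_confidence(confidences: List[float],
                        threshold: float = 0.5) -> List[int]:
    """Confidence-measure method (hypothetical sketch).

    Returns the indices of phonemes whose confidence falls below threshold.
    """
    return [i for i, score in enumerate(confidences) if score < threshold]
```

For example, with the canonical sequence `k ae t` and the recognized sequence `k eh t`, `diagnose` reports one substitution at position 1 (`ae` recognized as `eh`), and a deleted `ae` would be reported with `None` as the recognized phone. Inserted phonemes are skipped in this sketch because they do not correspond to any phoneme of the prompt; a fuller diagnosis would report them separately.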