English Abstract |
In this research, we developed a Taiwanese speech recognition system implemented with the Kaldi toolkit. The Taiwanese corpus consists of recordings from Taiwan's national Taiwanese reading competition and recordings made by classmates, about 11 hours of audio in total. Because the training set is small, two audio augmentation methods are used to enlarge it, so that the acoustic model trains more effectively and is more robust. The first method is speed perturbation, which speeds up the original data by a factor of 1.1 and slows it down by a factor of 0.9. The second method builds multi-condition training data by simulating reverberation on the original speech and adding background noise, which includes music, speech, and noise. The acoustic models are trained with several hybrid deep neural network architectures, which combine different networks to exploit the advantages of each: TDNN, CNN-TDNN, and CNN-LSTM-TDNN. In the experiments, the character error rate (CER) on test data within the domain of the language model reaches 3.95%, and the CER of the online decoding test is 3.06%. These results are better than those reported by other studies on Taiwanese speech recognition with datasets of similar size. |
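
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows speed perturbation by resampling and a character error rate computed from edit distance. It is not the authors' pipeline: the function names `speed_perturb`, `edit_distance`, and `cer` are hypothetical, the linear interpolation is a crude stand-in for the resampling that a Kaldi recipe would normally delegate to an external tool, and treating each string element as one "character" is an assumption about how the CER was scored.

```python
def speed_perturb(samples, factor):
    """Resample a waveform so it plays `factor` times faster.

    Assumption: plain linear interpolation, which changes duration and
    pitch together. A factor of 1.1 shortens the signal, 0.9 lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                      # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)    # clamp at the last sample
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    wave = [0.0] * 100                       # placeholder signal, not real audio
    print(len(speed_perturb(wave, 1.1)))     # shorter than the input
    print(len(speed_perturb(wave, 0.9)))     # longer than the input
    print(cer("abcd", "abxd"))               # one substitution in four characters
```

Applying both factors to every utterance yields two additional perturbed copies of each recording, which is how this kind of augmentation enlarges a small corpus.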