English Abstract |
This study applies a deep neural network (DNN) to voice activity detection (VAD) and examines four variables that affect VAD performance: (1) the analysis window size used for MFCC feature extraction, (2) the number of DNN layers, (3) the signal-to-noise ratio (SNR), and (4) the type of background condition. The experiments use the NTPU Noise Corpus, which was built by mixing several kinds of background noise recorded on smartphones with the TCC300 corpus. The background noise covers four environments: (1) bus stop, (2) MRT, (3) train station, and (4) restaurant. The SNR conditions are 10 dB, 5 dB, 0 dB, and clean speech. The system is evaluated by frame accuracy and equal error rate (EER). The results show that a larger feature analysis window clearly improves performance on the training and validation sets, but the gain on the outside test set is smaller. With a two-layer DNN, multi-condition training performs better, and the improvement is more pronounced at higher SNR, particularly when the background condition is the restaurant. In conclusion, for every condition covered by multi-condition training, outside-test performance is better than with matched-condition training, which indicates that the conditions in multi-condition training can learn from one another in the hidden layers. |
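
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes per-frame binary labels (1 = speech, 0 = non-speech) and per-frame speech scores from a classifier. The function names `frame_accuracy` and `equal_error_rate`, the 0.5 decision threshold, and the toy data are illustrative assumptions, not taken from the study.

```python
def frame_accuracy(labels, scores, threshold=0.5):
    """Fraction of frames whose thresholded score matches the reference label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def equal_error_rate(labels, scores):
    """Approximate EER: sweep thresholds and return the error rate at the
    point where the false-alarm rate and the miss rate are closest."""
    n_speech = sum(labels)
    n_nonspeech = len(labels) - n_speech
    if n_speech == 0 or n_nonspeech == 0:
        raise ValueError("need both speech and non-speech frames")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is included.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]

    best_gap, eer = None, None
    for t in candidates:
        false_alarms = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        misses = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        far = false_alarms / n_nonspeech   # non-speech frames accepted as speech
        frr = misses / n_speech            # speech frames rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy example (made-up scores, for illustration only)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
print(frame_accuracy(labels, scores))
print(equal_error_rate(labels, scores))
```

Frame accuracy depends on the chosen threshold, whereas EER summarizes the trade-off between false alarms and misses at a single operating point, which makes it easier to compare systems across SNR and background conditions.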