A Study on Voice Activity Detection Using Neural Network Methods

鄧有志; 江振宇; 潘振銘

Title	A Study on Voice Activity Detection Using Neural Network Methods
Parallel title	A Study on Voice Activation Detection by Using Neural Networks
Authors	鄧有志, 江振宇, 潘振銘
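Abstract (translated from Chinese)	This study performs voice activity detection (VAD) with a deep neural network (DNN) and examines several variables that affect VAD performance: (1) the size of the analysis window considered during feature extraction, (2) the number of DNN layers, (3) the signal-to-noise ratio (SNR), and (4) the type of background environment. The experiments use the NTPU Noise Corpus (National Taipei University), which is built by mixing various background noises recorded with smartphones into the TCC300 corpus. The background environments are (1) bus stop, (2) MRT station, (3) train station, and (4) restaurant, and the mixing conditions are 10 dB, 5 dB, and 0 dB SNR plus clean speech. The system is evaluated by frame accuracy and equal error rate (EER). The results show that a larger feature analysis window clearly improves performance on the training and development sets, while the gain on the test set is smaller. Multi-condition training performs better with a 2-layer DNN, and the improvement is more pronounced at higher SNR, especially when the background environment is a restaurant. Finally, every condition in multi-condition training outperforms matched-condition training on the test set, confirming that the conditions in multi-condition training can learn from one another in the hidden layers.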
Abstract (English)	This study uses a deep neural network (DNN) for voice activity detection (VAD) and discusses the following variables that affect VAD performance: (1) the analysis window size used in MFCC feature extraction, (2) the number of DNN layers, (3) the signal-to-noise ratio (SNR), and (4) the type of background condition. The experiments use the NTPU Noise Corpus, which mixes many kinds of background noise recorded by smartphone with the TCC300 corpus. The background noise types are (1) bus stop, (2) MRT, (3) train station, and (4) restaurant, and the SNR conditions are 10 dB, 5 dB, 0 dB, and clean speech. The evaluation metrics are frame accuracy and equal error rate (EER). The results indicate that a larger feature analysis window clearly improves performance on the training and validation sets, but the improvement on the outside test set is smaller. With a 2-layer DNN, multi-condition training performs better, and the improvement is more pronounced at higher SNR, particularly when the background condition is restaurant. In conclusion, every condition in multi-condition training performs better on the outside test set than under matched-condition training, which shows that the conditions in multi-condition training can learn from one another in the hidden layers.
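Pages	5-20
Keywords	voice activity detection (VAD), MLP, DNN, NTPU Additive Noise Corpus, layer #, feature frames, multi-condition, matched-condition, frame accuracy, EER
Journal	ROCLING Proceedings
Issue	2017 (2017 issue)
Publisher	Association for Computational Linguistics and Chinese Language Processing (ACLCLP)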
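Previous article in this issue	Constructing a Taiwanese Tone-Group Parser Using Knowledge Representation Methods
Next article in this issue	An Audio Event Detection System for Broadcast Programs Based on Convolutional Neural Networks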
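
The abstract refers to mixing noise into speech at fixed SNRs and to scoring a detector by frame accuracy and EER. The following is a minimal sketch of those three calculations, not the authors' implementation. The function names, the 0.5 decision threshold, and the threshold sweep over observed scores are all assumptions made for illustration, and the noise signal is assumed to be at least as long as the speech signal.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db.

    Assumes len(noise) >= len(speech); the noise is truncated to match.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose gain so that p_speech / (gain**2 * p_noise) == 10 ** (snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

def frame_accuracy(labels, scores, threshold=0.5):
    """Fraction of frames whose thresholded score matches the speech label."""
    labels = np.asarray(labels).astype(bool)
    decisions = np.asarray(scores, dtype=float) >= threshold
    return float(np.mean(decisions == labels))

def equal_error_rate(labels, scores):
    """Approximate EER: sweep thresholds over the observed scores and return
    the error rate where false-accept and false-reject rates are closest."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best_gap, eer = np.inf, None
    for t in np.unique(scores):
        decisions = scores >= t
        far = np.mean(decisions[~labels])   # non-speech frames accepted as speech
        frr = np.mean(~decisions[labels])   # speech frames rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)

if __name__ == "__main__":
    # Toy per-frame labels (1 = speech) and hypothetical detector scores.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    print(frame_accuracy(labels, scores), equal_error_rate(labels, scores))
```

Sweeping only the observed scores keeps the sketch short; an evaluation on real data would more likely interpolate the error curves, and both speech and non-speech frames must be present for the two rates to be defined.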