月旦知識庫
 
Title
運用類神經網路方法之語音端點偵測研究 (A Study on Speech Endpoint Detection Using Neural Network Methods)
Parallel title
A Study on Voice Activation Detection by Using Neural Networks
Authors 鄧有志, 江振宇, 潘振銘
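Chinese abstract (translated)
This study performs voice activity detection (VAD) with a deep neural network (DNN) and examines several variables that affect detection performance: (1) the size of the analysis window considered during feature extraction, (2) the number of DNN layers, (3) the signal-to-noise ratio (SNR), and (4) the type of background environment. The experiments use the NTPU Noise Corpus, which was created by mixing various background noises recorded with smartphones with the TCC300 corpus. The background environments are (1) bus stop, (2) MRT station, (3) train station, and (4) restaurant, and the mixing SNRs are 10 dB, 5 dB, and 0 dB, plus clean speech. The system is evaluated by frame accuracy and equal error rate (EER). The results show that a larger feature analysis window clearly improves performance on the training and development sets, while the gain on the test set is smaller. With 2 layers, the multi-condition DNN performs better; the higher the SNR, the more pronounced the improvement, especially when the background environment is a restaurant. Finally, every condition under multi-condition training outperforms matched-condition training on the test set, which the authors take as confirmation that the conditions in multi-condition training can learn from one another in the hidden layers.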
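The corpus construction mentioned in the abstract, background noise mixed with speech at a fixed SNR, can be sketched as follows. This is only an illustration: the abstract does not say how the authors measured signal power or scaled the noise, so the whole-signal average power used here, and the names `power` and `mix_at_snr`, are assumptions.

```python
import math


def power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db`, then add.

    Assumption: SNR is defined over whole-signal average power. The source
    does not state which definition the corpus uses.
    Returns (mixed, scaled_noise).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Solve p_speech / (gain^2 * p_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


# Synthetic stand-ins for a speech utterance and a recorded background noise.
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [((i * 37) % 11 - 5) / 5 for i in range(1200)]
mixed, scaled = mix_at_snr(speech, noise, 5.0)
measured_snr = 10 * math.log10(power(speech) / power(scaled))
```

Calling the function once per SNR level (10, 5, and 0 dB) and per noise type would give the kind of condition grid the abstract describes; clean speech needs no mixing.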
English abstract
This study uses a deep neural network (DNN) for voice activity detection (VAD) and examines the following variables that affect VAD performance: (1) the size of the analysis window used in MFCC feature extraction, (2) the number of DNN layers, (3) the signal-to-noise ratio (SNR), and (4) the type of background condition. The experiments use the NTPU Noise Corpus, which was created by mixing various background noises recorded with a smartphone with the TCC300 corpus. The background noise includes (1) bus stop, (2) MRT, (3) train station, and (4) restaurant, and the SNR conditions are 10 dB, 5 dB, 0 dB, and clean speech. The system is evaluated by frame accuracy and equal error rate (EER). The results indicate that a larger feature analysis window clearly improves performance on the training and validation sets, but the improvement on the outside test set is smaller. With a 2-layer DNN, multi-condition training performs better; the improvement is more pronounced at higher SNR, particularly when the background condition is restaurant. Finally, for every condition, multi-condition training outperforms matched-condition training on the outside test set, which the authors take as evidence that the conditions in multi-condition training can learn from one another in the hidden layers.
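The two evaluation measures named in the abstract, frame accuracy and equal error rate, can be illustrated with a short sketch. The function names, the 0.5 decision threshold, and the toy scores below are assumptions for illustration only; the abstract does not describe the authors' scoring procedure.

```python
def frame_accuracy(labels, scores, threshold=0.5):
    """Fraction of frames whose thresholded score matches the label.

    labels: 1 for speech frames, 0 for non-speech frames.
    scores: per-frame speech scores, e.g. DNN posteriors (assumed here).
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def equal_error_rate(labels, scores):
    """Sweep thresholds and return (eer, threshold) at the point where the
    false-acceptance rate (non-speech accepted as speech) and the
    false-rejection rate (speech rejected) are closest.
    """
    n_speech = sum(labels)
    n_nonspeech = len(labels) - n_speech
    if n_speech == 0 or n_nonspeech == 0:
        raise ValueError("need both speech and non-speech frames")
    best = None
    for t in sorted(set(scores)) + [float("inf")]:
        false_accepts = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        false_rejects = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        far = false_accepts / n_nonspeech
        frr = false_rejects / n_speech
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy example: 4 non-speech frames followed by 4 speech frames.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
accuracy = frame_accuracy(labels, scores)
eer, eer_threshold = equal_error_rate(labels, scores)
```

Frame accuracy depends on the chosen threshold, while EER summarizes the trade-off between the two error types in one number, which may be why the abstract reports both.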
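Pages 5-20
Keywords (Chinese, translated) voice activity detection, MLP, DNN, NTPU Noise Corpus
Keywords (English) VAD, MLP, DNN, NTPU Additive Noise Corpus, layer #feature frames, multicondition, matched-condition, frame accuracy, EER
Journal ROCLING論文集 (ROCLING Proceedings)
Issue 2017
Publisher 中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)
Previous article in this journal 以知識表徵方法建構台語聲調群剖析器 (Building a Taiwanese Tone-Group Parser with a Knowledge Representation Method)
Next article in this journal 基於卷積類神經網路之廣播節目音訊事件偵測系統 (A Broadcast Program Audio Event Detection System Based on Convolutional Neural Networks)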
 
