Advanced Modulation Spectrum Compensation Techniques for Robust Speech Recognition

葉威志; 杜文祥; 洪志偉

Title (Chinese)	進階式調變頻譜補償法於強健性語音辨識之研究
Title (English)	Advanced Modulation Spectrum Compensation Techniques for Robust Speech Recognition
Authors	葉威志, 杜文祥, 洪志偉
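Abstract (translated from Chinese)	Automatic speech recognition is a topic well worth research and development. Most current speech recognition systems achieve quite satisfactory accuracy in quiet, interference-free conditions, but when deployed in real environments they are affected by environmental noise and recognition performance degrades markedly. Environmental robustness techniques, developed over many years, aim to remedy this weakness. One class of these techniques statistically normalizes the modulation spectrum of the speech features. Earlier work in this class found that normalizing the spectrum sub-band by sub-band gives better robustness than normalizing the full band. However, the bands are split non-uniformly, with the low-frequency part of the modulation spectrum divided more finely, so the low-frequency sub-bands contain too few spectral points. This reduces the accuracy of the spectral statistics computed for them and leaves these methods room for improvement. Based on this observation, this paper proposes a series of overlapping sub-band modulation spectrum statistics normalization methods. These methods increase the number of spectral points available for computing the statistics in each sub-band, which improves the accuracy of the statistics, the performance of sub-band statistics normalization, and in turn the environmental robustness of the resulting features. A series of speech recognition experiments is conducted on the widely used AURORA-2 connected-digit corpus. The results show that the proposed overlapping sub-band methods raise recognition accuracy under various noise conditions more effectively than conventional non-overlapping sub-band methods. The new methods are also combined with conventional temporal-domain feature normalization methods; in every case the combination improves the recognition rate more than either method alone, which indicates that the methods are complementary.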
Abstract (English)	In this paper, we propose a novel scheme for performing feature statistics normalization for robust speech recognition. In the proposed approach, the processed temporal-domain feature sequence is first converted into the modulation spectral domain. The magnitude part of the modulation spectrum is decomposed into overlapping non-uniform sub-band segments, and each sub-band segment is then processed individually by a well-known normalization method, such as mean normalization (MN) or mean and variance normalization (MVN). Finally, we reconstruct the feature stream from all the modified sub-band magnitude spectral segments and the original phase spectrum using the inverse DFT. With this process, the components that correspond to the more important modulation spectral bands in the feature sequence can be processed separately, and, because adjacent segments overlap, each band contains more spectral samples and therefore yields more accurate statistical estimates. For the Aurora-2 clean-condition training task, the newly proposed overlapping sub-band spectral MN and MVN provide further error rate reductions over the conventional non-overlapping ones.
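Pages	236-250
Keywords	speech recognition, modulation spectrum, statistics normalization, robust speech features
Journal	ROCLING Proceedings (ROCLING論文集)
Issue	2010
Publisher	The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)
Previous article in this issue	可變速中文文字轉語音系統 (A Variable-Speed Chinese Text-to-Speech System)
Next article in this issue	Identifying Correction Rules for Auto Editing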
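
The processing chain summarized in the abstract (DFT of a feature track, per-sub-band normalization of the magnitude spectrum, reconstruction with the original phase) can be sketched roughly as follows. This is a minimal illustration under several assumptions that the record does not confirm: the octave-like band split, the `overlap` parameter, the choice to estimate statistics over the widened window while updating only the core bins, the clamping of negative magnitudes, and all function names (`band_layout`, `track_stats`, `normalize_track`) are hypothetical. The reference statistics would normally come from training data; here they are taken from a single clean track purely for demonstration.

```python
import cmath
import math
from statistics import fmean, pstdev


def dft(x):
    """Naive O(N^2) DFT; adequate for utterance-length feature tracks."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT, keeping the real part (spectrum is conjugate-symmetric)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def band_layout(n_bins, n_bands=4, overlap=0.5):
    """Return [(core_lo, core_hi, ext_lo, ext_hi)] as half-open bin ranges.

    Assumed split: core bands double in width towards higher modulation
    frequencies. Each statistics window is widened on both sides by
    `overlap` times the core width, clipped to [0, n_bins).
    """
    edges = sorted({0, n_bins} |
                   {math.ceil(n_bins / 2 ** i) for i in range(1, n_bands)})
    layout = []
    for lo, hi in zip(edges, edges[1:]):
        pad = math.ceil(overlap * (hi - lo))
        layout.append((lo, hi, max(0, lo - pad), min(n_bins, hi + pad)))
    return layout


def band_stats(mags, layout):
    """Mean and standard deviation of the magnitudes in each widened window."""
    return [(fmean(mags[a:b]), pstdev(mags[a:b])) for _, _, a, b in layout]


def track_stats(x, n_bands=4, overlap=0.5):
    """Per-band magnitude statistics of one feature track (half spectrum)."""
    half = len(x) // 2 + 1
    mags = [abs(c) for c in dft(x)[:half]]
    return band_stats(mags, band_layout(half, n_bands, overlap))


def normalize_track(x, targets, n_bands=4, overlap=0.5):
    """MVN-style normalization of each sub-band towards (mean, std) targets."""
    n = len(x)
    spec = dft(x)
    half = n // 2 + 1
    mags = [abs(c) for c in spec[:half]]
    layout = band_layout(half, n_bands, overlap)
    new_mags = list(mags)
    for (lo, hi, _, _), (mean, std), (ref_mean, ref_std) in zip(
            layout, band_stats(mags, layout), targets):
        scale = ref_std / std if std > 0 else 1.0
        for k in range(lo, hi):  # only the core bins are rewritten
            # Assumption: negative results are clamped to zero magnitude.
            new_mags[k] = max(0.0, (mags[k] - mean) * scale + ref_mean)
    out = list(spec)
    for k in range(half):
        out[k] = cmath.rect(new_mags[k], cmath.phase(spec[k]))  # original phase
        if 0 < k < n - k:
            out[n - k] = out[k].conjugate()  # mirror to keep the output real
    return idft_real(out)


# Demonstration with synthetic data (not from the paper).
clean = [math.sin(2 * math.pi * 3 * t / 64) + 0.5 * math.sin(2 * math.pi * 9 * t / 64)
         for t in range(64)]
noisy = [0.6 * v + 0.3 for v in clean]
compensated = normalize_track(noisy, track_stats(clean))
```

Setting `overlap=0.0` reduces the layout to non-overlapping sub-bands, which makes it easy to compare how many spectral points feed each band's statistics in the two cases. Dropping the variance scaling (fixing `scale` at 1.0) would give a mean-only variant in the spirit of MN, though the paper's exact MN formulation is not stated in this record.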
