

Title
WaveNet聲碼器及其於語音轉換之應用
Parallel title
WaveNet Vocoder and its Applications in Voice Conversion
Authors 黃文勁, 羅振州, 黃信德, 曹昱, 王新民
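Chinese abstract (English translation)
Most voice conversion models rely on vocoders based on the traditional source-filter model to extract speech parameters from the speech signal and to synthesize speech. However, constrained by the many theoretical assumptions of traditional vocoders, speech converted with a traditional vocoder cannot be improved further in either naturalness or similarity to the target speaker. In deep learning, WaveNet is one of the most successful speech generation techniques to date and produces more natural speech than earlier methods. The WaveNet vocoder is an extension of WaveNet that can generate speech of higher quality than traditional vocoders, and it has gradually been adopted by voice conversion research teams abroad. Voice conversion models developed by domestic research teams have so far mostly relied on traditional vocoders. This paper introduces the WaveNet vocoder into several recently proposed domestic voice conversion models to assess its potential in these models. In the experiments, we compare the results obtained by three voice conversion models, each using a traditional vocoder and the WaveNet vocoder. The models compared are 1) the variational auto-encoder (VAE), 2) the VAE combined with a generative adversarial network, and 3) the cross domain VAE (CDVAE). The results show that, with the WaveNet vocoder, similarity to the target speaker improves significantly for all three models. In naturalness, only the VAE-based voice conversion model improves significantly with the WaveNet vocoder.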
English abstract
Most voice conversion models rely on vocoders based on the source-filter model to extract speech parameters and synthesize speech. However, the naturalness and similarity of the converted speech are limited by the many theoretical assumptions and constraints of traditional vocoders. In the field of deep learning, a network structure called WaveNet is one of the state-of-the-art techniques in speech synthesis, capable of generating speech samples of much higher quality than past methods. One extension of WaveNet is the WaveNet vocoder. Its ability to synthesize speech of higher quality than traditional vocoders has led to its gradual adoption by several foreign voice conversion research teams. In this work, we study the combination of the WaveNet vocoder with voice conversion models recently developed by domestic research teams, in order to evaluate the potential of applying the WaveNet vocoder to these models and to introduce the WaveNet vocoder to the domestic speech processing research community. In the experiments, we compared the converted speech generated by three voice conversion models using a traditional WORLD vocoder and the WaveNet vocoder, respectively. The compared voice conversion models are 1) the variational auto-encoder (VAE), 2) the variational autoencoding Wasserstein generative adversarial network (VAW-GAN), and 3) the cross domain variational auto-encoder (CDVAE). Experimental results show that, with the WaveNet vocoder, the similarity between the converted speech generated by all three models and the target speech is significantly improved. As for naturalness, only VAE benefits from the WaveNet vocoder.
Pages 96-110
Keywords WaveNet Vocoder, Voice Conversion, Variational Auto-Encoder
Publication ROCLING論文集 (ROCLING Proceedings)
Issue 2018
Publisher 中華民國計算語言學學會 (Association for Computational Linguistics and Chinese Language Processing)
 
