WaveNet Vocoder and its Applications in Voice Conversion

黃文勁; 羅振州; 黃信德; 曹昱; 王新民

Title (Chinese)	WaveNet聲碼器及其於語音轉換之應用
Title (English)	WaveNet Vocoder and its Applications in Voice Conversion
Authors	黃文勁、羅振州、黃信德、曹昱、王新民
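Chinese abstract (translated)	Most voice conversion models rely on vocoders based on the traditional source-filter model to extract speech parameters from the speech signal and to synthesize speech. However, because of the many theoretical assumptions built into traditional vocoders, neither the naturalness of the speech generated by voice conversion systems built on them nor its similarity to the target speaker can be improved further. In deep learning, WaveNet is currently one of the most successful speech generation techniques and produces speech with higher naturalness than earlier methods. The WaveNet vocoder is an extension of WaveNet that can generate high-quality speech surpassing that of traditional vocoders, and it has gradually been adopted by research teams abroad working on voice conversion. In the past, the voice conversion models developed by domestic research teams have mostly been based on traditional vocoders. This paper attempts to introduce the WaveNet vocoder into several voice conversion models recently proposed domestically, in order to evaluate its potential for these models. In the experiments, we compared the results obtained by three voice conversion models when using a traditional vocoder and the WaveNet vocoder, respectively. The compared models are 1) the variational auto-encoder (VAE), 2) the variational auto-encoder combined with a generative adversarial network, and 3) the cross domain variational auto-encoder (cross domain VAE, CDVAE). The experimental results show that, for all three voice conversion models, similarity to the target speaker improves significantly with the WaveNet vocoder. In terms of naturalness, only the VAE-based voice conversion model shows a significant improvement with the WaveNet vocoder.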
English abstract	Most voice conversion models rely on vocoders based on the source-filter model to extract speech parameters and synthesize speech. However, the naturalness and similarity of the converted speech are limited by the many theoretical assumptions and constraints imposed by traditional vocoders. In the field of deep learning, a network structure called WaveNet is one of the state-of-the-art techniques in speech synthesis and is capable of generating speech samples of extremely high quality compared with past methods. One of the extensions of WaveNet is the WaveNet vocoder. Its ability to synthesize speech of higher quality than traditional vocoders has led to its gradual adoption by several foreign voice conversion research teams. In this work, we study the combination of the WaveNet vocoder with the voice conversion models recently developed by domestic research teams, in order to evaluate the potential of applying the WaveNet vocoder to these voice conversion models and to introduce the WaveNet vocoder to the domestic speech processing research community. In the experiments, we compared the converted speech generated by three voice conversion models using a traditional WORLD vocoder and the WaveNet vocoder, respectively. The compared voice conversion models include 1) variational auto-encoder (VAE), 2) variational autoencoding Wasserstein generative adversarial network (VAW-GAN), and 3) cross domain variational auto-encoder (CDVAE). Experimental results show that, using the WaveNet vocoder, the similarity between the converted speech generated by all three models and the target speech is significantly improved. As for naturalness, only VAE benefits from the WaveNet vocoder.
Pages	96-110
Keywords	WaveNet, Vocoder, Voice Conversion, Variational Auto-Encoder
Journal	Proceedings of ROCLING (ROCLING論文集)
Issue	2018
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)
