Effective Speaker Separation through Convolutional Multi-View Attention and SudoNet

Che-Wei Liao; Aye Nyein Aung; Jeih-Weih Hung

Title	Effective Speaker Separation through Convolutional Multi-View Attention and SudoNet (通過卷積多視角注意力和SudoNet進行高效的人聲分離)
Parallel title	ESC MA-SD Net: Effective Speaker Separation through Convolutional Multi-View Attention and SudoNet
Authors	Che-Wei Liao, Aye Nyein Aung, Jeih-Weih Hung
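Abstract (translated from Chinese)	This study addresses speech separation: how to successfully separate a mixture of multiple speech signals. We build on the efficient end-to-end speech separation model SuDoRM-RF and combine it with the Residual Conformer Block and the Multi-view Attention block from the MANNER model to obtain an efficient speech separation model, ESC MA-SD Net. In this model, the Residual Conformer Block removes irrelevant information while preserving important speech information, and the Multi-view Attention block attends to and captures speech features from multiple aspects. As a result, we obtain a speech separation model, ESC MA-SD Net, that is more efficient than the original SuDoRM-RF model. In our experiments, we demonstrate the good separation performance of the proposed method using a validation dataset and spectrograms.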
Abstract (English)	This study focuses on speaker separation, investigating how to successfully separate a mixture of multiple speech signals. We build upon the efficient end-to-end speech separation model SuDoRM-RF and integrate the Residual Conformer Block and the Multi-view Attention block from the MANNER model to create the efficient speech separation model ESC MA-SD Net. The Residual Conformer Block in this model eliminates irrelevant information while preserving crucial speech details. The Multi-view Attention module is employed to capture diverse aspects of speech features. By doing so, we achieve a more efficient speech separation model, ESC MA-SD Net, compared to the original SuDoRM-RF model. In our experiments, results on validation data and spectrograms demonstrate the improved speech separation performance of the proposed method.
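Pages	157-161
Keywords	Speech separation, residual connection method, end-to-end model
Journal	ROCLING Proceedings (ROCLING論文集)
Issue	October 2023 (2023 issue)
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)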