Chinese Abstract
Sound event detection aims to label the sound events in an audio signal together with their time boundaries. Based on the mean-teacher framework of semi-supervised learning, we propose an RCRNN architecture with residual connections and an attention mechanism, which can be trained with a large amount of weakly labeled and unlabeled data. Among sound events, speech carries richer information than the others, so we use a specific time-frequency resolution to extract acoustic features for that class, and we further improve performance with data augmentation and post-processing. On the DCASE 2021 Task 4 validation set, the proposed system achieves PSDS (Polyphonic Sound Detection Score)-scenario 1 and 2 and event-based F1-scores of 38.2%, 58.2%, and 44.3%, outperforming the baseline scores of 33.8%, 52.9%, and 40.7%.
English Abstract
A sound event detection (SED) system outputs the sound events in an audio signal together with their time boundaries. We propose an RCRNN-based SED system with residual connections and a convolutional block attention mechanism, built on the mean-teacher framework of semi-supervised learning. The neural network can be trained with a large amount of weakly labeled and unlabeled data. In addition, since the speech event carries more information than other sound events, we use a specific time-frequency resolution to extract its acoustic features. Furthermore, we apply data augmentation and post-processing to improve performance. On the DCASE 2021 Task 4 validation set, the proposed system achieves a PSDS (Polyphonic Sound Detection Score)-scenario 2 of 57.6% and an event-based F1-score of 41.6%, outperforming the baseline scores of 52.7% and 40.7%.
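The mean-teacher framework mentioned above maintains a teacher model whose weights are an exponential moving average (EMA) of the student's weights, and trains the student with a consistency loss toward the teacher's predictions on unlabeled clips. A minimal NumPy sketch of these two operations is shown below; the function names, the EMA decay value, and the toy arrays are illustrative assumptions, not the thesis's actual implementation (which uses an RCRNN in a deep-learning framework).

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.999):
    # Teacher weights become an exponential moving average of the
    # student weights; alpha is the EMA decay (an assumed value here).
    return alpha * teacher_w + (1.0 - alpha) * student_w

def consistency_loss(student_pred, teacher_pred):
    # Mean-squared error between student and teacher frame-level
    # predictions, computed on unlabeled data.
    return float(np.mean((student_pred - teacher_pred) ** 2))

# Toy example: three frame-level event probabilities.
student = np.array([0.2, 0.8, 0.5])
teacher = np.array([0.1, 0.9, 0.4])
teacher = ema_update(teacher, student)          # teacher drifts toward student
loss = consistency_loss(np.array([0.3, 0.7, 0.6]), teacher)
```

Because the teacher changes slowly, it provides a more stable training target than the student's own noisy predictions, which is what lets the large pool of weakly labeled and unlabeled data contribute to training.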