Chinese Abstract
Sound event detection aims to label the sound events in an audio signal together with their time boundaries. Based on the mean-teacher framework of semi-supervised learning, we propose an RCRNN architecture with residual connections and an attention mechanism, which can be trained with a large amount of weakly labeled and unlabeled data. Among sound events, speech carries richer information than the others, so we use a specific time-frequency resolution to extract acoustic features for that class, and we further improve performance with data augmentation and post-processing. On the DCASE 2021 Task 4 validation set, the proposed system achieves PSDS (Polyphonic Sound Detection Score)-scenario 1 and 2 and event-based F1-scores of 38.2%, 58.2%, and 44.3%, outperforming the baseline scores of 33.8%, 52.9%, and 40.7%.
English Abstract
A sound event detection (SED) system outputs the sound events in an audio signal together with their time boundaries. We propose an RCRNN-based SED system with residual connections and a convolutional block attention mechanism, built on the mean-teacher framework of semi-supervised learning. The neural network can be trained with a large amount of weakly labeled and unlabeled data. In addition, since the speech event carries more information than other sound events, we use a specific time-frequency resolution to extract its acoustic features. Furthermore, we apply data augmentation and post-processing to improve performance. On the DCASE 2021 Task 4 validation set, the proposed system achieves a PSDS (Polyphonic Sound Detection Score)-scenario 2 of 57.6% and an event-based F1-score of 41.6%, outperforming the baseline scores of 52.7% and 40.7%.
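The mean-teacher framework mentioned above maintains a teacher model whose weights are an exponential moving average (EMA) of the student's weights, and trains the student with a consistency loss toward the teacher's predictions on unlabeled clips. A minimal NumPy sketch of these two operations is shown below; the function names, the EMA decay value, and the toy arrays are illustrative assumptions, not the thesis's actual implementation (which uses an RCRNN in a deep-learning framework).

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.999):
    # Teacher weights become an exponential moving average of the
    # student weights; alpha is the EMA decay (an assumed value here).
    return alpha * teacher_w + (1.0 - alpha) * student_w

def consistency_loss(student_pred, teacher_pred):
    # Mean-squared error between student and teacher frame-level
    # predictions, computed on unlabeled data.
    return float(np.mean((student_pred - teacher_pred) ** 2))

# Toy example: three frame-level event probabilities.
student = np.array([0.2, 0.8, 0.5])
teacher = np.array([0.1, 0.9, 0.4])
teacher = ema_update(teacher, student)          # teacher drifts toward student
loss = consistency_loss(np.array([0.3, 0.7, 0.6]), teacher)
```

Because the teacher changes slowly, it provides a more stable training target than the student's own noisy predictions, which is what lets the large pool of weakly labeled and unlabeled data contribute to training.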