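Chinese abstract (translated) |
This paper proposes RepVGGRNN, a sound event detection model designed to be lightweight. Its convolutional layers use RepVGG convolution blocks, whose residual-connected structure lets the model reach good performance; after training, structural re-parameterization reduces the number of convolution parameters. In addition, knowledge distillation is combined with the mean teacher training method during training to further improve the prediction accuracy of the lightweight model. On the DCASE 2022 Task 4 validation set, RepVGGRNN achieves a PSDS (polyphonic sound event detection score) of 40.8% in scenario 1 and 67.7% in scenario 2, outperforming the official baseline system's 34.4% and 57.2%. RepVGGRNN uses about 496,000 parameters, only 44.6% of the baseline system's parameter count. |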
English abstract |
In this paper, we propose RepVGGRNN, a lightweight sound event detection model. The convolutional part uses RepVGG blocks, whose residual connections improve performance during training; after training, the blocks are structurally re-parameterized to reduce the number of parameters in the convolution layers. To further improve the accuracy of the lightweight model, we train it with a combination of the mean teacher method and knowledge distillation. On the DCASE 2022 Task 4 validation set, the proposed system achieves a PSDS (polyphonic sound event detection score) of 40.8% in scenario 1 and 67.7% in scenario 2, outperforming the official baseline system, which reaches 34.4% and 57.2%. The proposed system has about 496K parameters, only 44.6% of the baseline system's parameter count. |
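The structural re-parameterization mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a RepVGG-style block with a 3x3 branch, a 1x1 branch, and an identity branch, and it omits the batch normalization layers that would normally be folded into each branch before merging. The function names `conv2d_same` and `merge_branches` are hypothetical and used only for illustration.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1, zero-padded 2-D convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]            # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out

def merge_branches(w3, w1):
    """Fold the 3x3, 1x1 and identity branches into a single 3x3 kernel.
    Assumes C_in == C_out (needed for the identity branch) and no batch norm."""
    c_out, c_in = w3.shape[:2]
    assert c_in == c_out, "identity branch needs matching channel counts"
    merged = w3.copy()
    merged[:, :, 1, 1] += w1[:, :, 0, 0]               # 1x1 kernel sits at the 3x3 centre
    idx = np.arange(c_out)
    merged[idx, idx, 1, 1] += 1.0                      # identity = centred per-channel delta
    return merged

rng = np.random.default_rng(0)
channels = 4
x = rng.standard_normal((channels, 6, 5))
w3 = rng.standard_normal((channels, channels, 3, 3))
w1 = rng.standard_normal((channels, channels, 1, 1))

# Training-time block: three parallel branches summed.
y_train = conv2d_same(x, w3) + conv2d_same(x, w1) + x

# Inference-time block: one merged 3x3 convolution.
merged = merge_branches(w3, w1)
y_deploy = conv2d_same(x, merged)

params_train = w3.size + w1.size                       # identity has no weights
params_deploy = merged.size
print(np.allclose(y_train, y_deploy), params_train, params_deploy)
```

Because convolution is linear, the merged kernel reproduces the multi-branch output while storing only the 3x3 weights, which is the source of the parameter reduction in the convolution layers described above.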