A Study on Using Different Audio Lengths in Transfer Learning for Improving Chainsaw Sound Recognition

Jia-Wei Chang; Zhong-Yun Hu

Title (Chinese)	運用不同音訊長度於遷移式學習以提升電鋸聲音識別能力之研究
Parallel title (English)	A Study on Using Different Audio Lengths in Transfer Learning for Improving Chainsaw Sound Recognition
Authors	Jia-Wei Chang, Zhong-Yun Hu
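Abstract (translated from Chinese)	In mountain forests, recognizing chainsaw sounds is a challenging task because sounds are diverse and complex and the environment is noisy. This study hypothesizes that training on audio of different lengths may lead to different results, and therefore combines a simple LeNet model with an average pooling layer to build a recognition model that accepts audio of any length. It mainly analyzes how audio length affects model training, and compares transfer learning from short to long audio with transfer learning from long to short audio. All models are trained on the ESC-10 dataset, and their accuracy is validated on a self-collected chainsaw-sound dataset. The results show that (1) the three models trained separately on the 1-second, 3-second, and 5-second datasets reach 74%~78%, 74%~77%, and 79%~83% accuracy on the 1-second, 3-second, and 5-second chainsaw validation sets, respectively. (2) A model trained by transfer learning on ESC-10 data in the order 1 second → 3 seconds → 5 seconds reaches 85.28%, 88.67%, and 91.8% accuracy on the 1-second, 3-second, and 5-second chainsaw validation sets, a clear improvement over the original training method. (3) In transfer learning, training from short to long audio gives better results than training from long to short audio; on the 5-second chainsaw validation set in particular, the two differ by 14% in accuracy.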
Abstract (English)	Chainsaw sound recognition is a challenging task because of the complexity of sound and the heavy noise in mountain environments. This study examines how the length of the training audio affects model accuracy. To that end, it uses LeNet, a simple model with few parameters, and adds average pooling so that the proposed models can accept audio of any length. The comparison focuses on the influence of different audio lengths and on transfer learning from short to long audio versus from long to short audio. In the experiments, the models are trained on the ESC-10 dataset and validated on a self-collected chainsaw-audio dataset. The experimental results show that (a) the three models trained separately on 1 s, 3 s, and 5 s audio reach 74%~78%, 74%~77%, and 79%~83% accuracy on the 1 s, 3 s, and 5 s chainsaw validation sets, respectively. (b) Transfer learning in the order 1 s → 3 s → 5 s clearly improves generalization over these models: the resulting model achieves 85.28%, 88.67%, and 91.8% accuracy on the 1 s, 3 s, and 5 s chainsaw validation sets. (c) In transfer learning, a model trained from short to long audio achieves better results than one trained from long to short audio, with a difference of 14% in accuracy on the 5 s chainsaw audio.
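Pages	67-74
Keywords	Voice Recognition, Environmental Sound Classification, Chainsaw Sound Recognition, Transfer Learning
Journal	Proceedings of ROCLING (ROCLING論文集)
Issue	December 2022 (2022 issue)
Publisher	The Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)
Previous article in this journal	A RoBERTa-Based Named Entity Recognition Model for Traditional Chinese Medicine
Next article in this journal	Using Grammatical and Semantic Correction Model to Improve Chinese-to-Taiwanese Machine Translation Fluency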
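
The abstract states that an average pooling layer lets the LeNet-based model accept audio of any length. The following is a minimal sketch of that idea only, not the authors' architecture: it assumes a toy 1-D convolution over raw samples with random weights, and every name and size in it (`conv1d_relu`, `global_average_pool`, `classify`, `NUM_KERNELS`, `SAMPLE_RATE`, and so on) is hypothetical. It shows that averaging each feature map over time yields a feature vector whose size depends on the number of kernels and not on the clip length.

```python
import random


def conv1d_relu(signal, kernels):
    """Slide each kernel over the signal (cross-correlation, 'valid' mode), then apply ReLU."""
    feature_maps = []
    for kernel in kernels:
        k = len(kernel)
        fmap = []
        for start in range(len(signal) - k + 1):
            value = sum(w * signal[start + i] for i, w in enumerate(kernel))
            fmap.append(max(0.0, value))
        feature_maps.append(fmap)
    return feature_maps


def global_average_pool(feature_maps):
    """Average each feature map over time: one number per kernel, whatever the clip length."""
    return [sum(fmap) / len(fmap) for fmap in feature_maps]


def classify(signal, kernels, weights, biases):
    """Convolution -> global average pooling -> linear layer producing one score per class."""
    pooled = global_average_pool(conv1d_relu(signal, kernels))
    scores = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, biases)
    ]
    return pooled, scores


# Hypothetical sizes and untrained random weights, chosen only for illustration.
rng = random.Random(0)
NUM_KERNELS, KERNEL_SIZE, NUM_CLASSES = 4, 5, 2
SAMPLE_RATE = 100  # toy sampling rate (assumption), far below real audio rates

kernels = [[rng.uniform(-1, 1) for _ in range(KERNEL_SIZE)] for _ in range(NUM_KERNELS)]
weights = [[rng.uniform(-1, 1) for _ in range(NUM_KERNELS)] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

# Clips of 1, 3, and 5 seconds pass through the same weights without any resizing.
for seconds in (1, 3, 5):
    clip = [rng.uniform(-1, 1) for _ in range(seconds * SAMPLE_RATE)]
    pooled, scores = classify(clip, kernels, weights, biases)
    print(seconds, len(pooled), len(scores))
```

Because the pooled vector has the same size for every clip length, the same weights can in principle be trained on one length and then fine-tuned on another, which is what makes the short-to-long and long-to-short transfer settings described in the abstract possible with a single model.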