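| Chinese Abstract |
Automatic Pronunciation Assessment (APA) quantifies how proficiently a non-native (L2) learner pronounces a given language. As the technology has matured, APA can now assess pronunciation at multiple granularities (phoneme level, word level, and sentence level) and across multiple aspects such as accuracy, fluency, and stress. Current APA methods, however, are trained with the Mean Squared Error (MSE) loss function, while the labels at every granularity are highly imbalanced. This imbalance harms model generalization and fairness, because MSE underestimates rare labels, yet existing studies have rarely addressed it. In this study we therefore draw on the class-balanced loss functions used in visual classification modeling, and combine resampling with an added trainable variable to reduce the mismatch between the training and test sets in imbalanced regression tasks. We evaluate our method on the speechocean762 dataset, whose word-level labels are clearly imbalanced, and our experimental results show a clear improvement on this imbalanced dataset. |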
| English Abstract |
Automatic Pronunciation Assessment (APA) aims to quantify non-native (L2) learners' pronunciation proficiency in a specific language. As the technology has advanced, APA has come to evaluate pronunciation at multiple granularities, from the phoneme level to the sentence level, and across multiple aspects, including accuracy, fluency, and stress. However, current APA methods rely on the Mean Squared Error (MSE) loss function, even though the labels at every level of granularity are highly imbalanced. This imbalance harms model generalizability and fairness, because MSE tends to underestimate rare labels, yet existing research has rarely addressed it. To fill this gap, we draw inspiration from class-balanced loss functions used in visual classification. Our approach combines resampling with a trainable variable to narrow the mismatch between the training and test sets in imbalanced regression tasks. We evaluate our method on the speechocean762 dataset, whose word-level labels are clearly imbalanced, and observe a clear improvement in performance. These results suggest that the approach can mitigate the effects of imbalanced data in automatic pronunciation assessment. |
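To make the idea of a class-balanced loss concrete, the following is a minimal sketch of one well-known reweighting scheme from visual classification, the "effective number of samples" weighting, applied to a squared-error loss. It is an illustration only and not the method proposed in this work, which additionally uses resampling and a trainable variable. The function names, the value `beta=0.99`, and the toy score distribution are all assumptions made for the example.

```python
from collections import Counter


def class_balanced_weights(labels, beta=0.99):
    """Return one weight per distinct label, proportional to the inverse
    of its effective number of samples, (1 - beta**n) / (1 - beta)."""
    counts = Counter(labels)
    raw = {y: (1.0 - beta) / (1.0 - beta ** n) for y, n in counts.items()}
    # Normalize so the weights average to 1 over the distinct labels.
    scale = len(raw) / sum(raw.values())
    return {y: w * scale for y, w in raw.items()}


def weighted_mse(preds, labels, weights):
    """Mean of per-sample squared errors, each scaled by its label's weight."""
    total = sum(weights[y] * (p - y) ** 2 for p, y in zip(preds, labels))
    return total / len(labels)


# Hypothetical word-level scores: the top score dominates, low scores are rare.
labels = [10] * 90 + [5] * 8 + [2] * 2
# A model that always predicts close to the majority score.
preds = [9.0] * len(labels)

weights = class_balanced_weights(labels)
loss = weighted_mse(preds, labels, weights)
```

Under this weighting, the rarer a score is in the training labels, the larger its weight, so errors on rare scores contribute more to the loss than they would under plain MSE.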