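Taiwanese/Mandarin Speech Recognition using OpenAI's Whisper Multilingual Speech Recognition Engine Based on Generative Pretrained Transformer Architecture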

Yueh-Che Hsieh; Ke-ming Lyu; Ren-Yuan Lyu

Title (original Chinese)	運用基於生成預訓練轉換器架構的OpenAI Whisper多語言語音辨識引擎之台語及華語語音辨識之實作
Title (English)	Taiwanese/Mandarin Speech Recognition using OpenAI's Whisper Multilingual Speech Recognition Engine Based on Generative Pretrained Transformer Architecture
Authors	Yueh-Che Hsieh; Ke-ming Lyu; Ren-Yuan Lyu
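Abstract (translated from the Chinese)	In this paper, we fine-tune OpenAI Whisper on Taiwanese so that it can output Traditional Chinese characters for both Mandarin and Taiwanese. We use the Medium and Large-v2 Whisper models and the fine-tuning procedure officially provided by Hugging Face, together with the CommonVoice Taiwanese dataset and Taiwanese TV drama videos with subtitle files collected from the internet, 800 hours in total. The best CER is 50.7%. We will release our fine-tuning code at a later date.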
Abstract (English)	In this paper, we fine-tune OpenAI's Whisper on Taiwanese, enabling it to generate text output for both Mandarin and Taiwanese. We use Hugging Face's official Whisper models, namely Medium and Large-v2, and its fine-tuning procedure. For training data, we use the Taiwanese dataset from CommonVoice and around 800 hours of Taiwanese drama videos, with their subtitle files, collected from the internet. The best Character Error Rate (CER) achieved is approximately 50.7%. We will release our fine-tuning code in a later update.
Pages	210-214
Keywords	Speech recognition, Taiwanese (Minnan), Mandarin, OpenAI Whisper
Journal	Proceedings of ROCLING
Issue	October 2023 (2023 issue)
Publisher	The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Previous article in this issue	Sentence-level Revision with Neural Reinforcement Learning
Next article in this issue	KNOT-MCTS: An Effective Approach to Addressing Hallucinations in Generative Language Modeling for Question Answering
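
The abstract reports its result as a Character Error Rate (CER). As a rough illustration of how that metric is usually defined, the sketch below computes CER as the character-level edit distance (substitutions, deletions, and insertions) divided by the number of reference characters. This is not the authors' evaluation code, which the record does not include. The function names `edit_distance` and `cer`, the whitespace stripping, and the sample sentences are all assumptions made for this example.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: whitespace is ignored, since Chinese text is unsegmented.
        ref = "".join(ref.split())
        hyp = "".join(hyp.split())
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical example: one substituted character out of six.
    refs = ["今天天氣很好"]
    hyps = ["今天天氣真好"]
    print(f"CER = {cer(refs, hyps):.3f}")
```

Under this definition, a CER of 50.7% means that the number of character edits needed to turn the system output into the reference is about half the number of reference characters. Because insertions count as errors, CER can exceed 100%.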