Chinese abstract |
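Purpose: This study aims to build a nursing record corpus and lexicon, and to evaluate the feasibility of applying speech recognition to nursing records. Methods: Instead of traditional paper records, we requested electronic nursing record files, for use as training data, from a medical center in central Taiwan whose nursing records are fully electronic. The records were collected from July 2007 to May 2008 and cover 7 intensive care units and 5 general wards. Using the Chinese word segmentation system and the unknown-word extraction system of Academia Sinica as tools, the study proceeded in three stages: the first stage built the nursing record corpus, the second stage built the nursing record lexicon, and the third stage built two language models and computed their perplexity as an evaluation of the first two stages. Results: The study yielded a nursing record corpus of 1,000,000 entries and a nursing record lexicon of 974 words; the relative difference in perplexity between the two language models is about 15.541%. Conclusions: The effect of the nursing record lexicon built in this study on speech recognition is a noteworthy result. |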
English abstract |
Purpose: This study aims to build a corpus of routine nursing records and to evaluate the feasibility of using speech recognition for nursing records. Methods: We used electronic nursing records from a medical center in Taiwan as text training data. The records were written between July 2007 and March 2008 and cover 7 ICU wards and 5 general wards. We used the word segmentation and unknown-word extraction systems from Academia Sinica to extract a nursing record lexicon, trained language models with and without the new lexicon, and then calculated the perplexity of each language model as the evaluation. Results: The study produced a nursing record corpus of 1,000,000 entries and a nursing record lexicon of 974 words. The relative perplexity reduction from the model without the nursing record lexicon to the model with the 974-word lexicon is 15.541%. Conclusions: According to a rule of thumb, the influence of the nursing record lexicon on speech recognition is noteworthy. |
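
The evaluation metric can be illustrated with a minimal sketch. It assumes the usual definition of perplexity (the exponential of the average negative log-probability per token) and assumes that the relative reduction is taken with respect to the model without the lexicon; the abstract does not spell out either formula. The function names and the per-token probabilities are hypothetical and are not taken from the study.

```python
import math


def perplexity(token_probs):
    """Perplexity: exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def relative_reduction(ppl_baseline, ppl_new):
    """Relative perplexity reduction, as a percentage of the baseline.

    Assumed formula: (baseline - new) / baseline * 100.
    """
    return (ppl_baseline - ppl_new) / ppl_baseline * 100.0


# Hypothetical probabilities that two language models assign to the same
# four test tokens (illustrative values only, not data from the study).
probs_without_lexicon = [0.10, 0.05, 0.20, 0.10]
probs_with_lexicon = [0.12, 0.07, 0.22, 0.12]

ppl_without = perplexity(probs_without_lexicon)
ppl_with = perplexity(probs_with_lexicon)
print(f"perplexity without lexicon: {ppl_without:.3f}")
print(f"perplexity with lexicon:    {ppl_with:.3f}")
print(f"relative reduction:         {relative_reduction(ppl_without, ppl_with):.3f}%")
```

Under this assumed formula, a reduction of 15.541% means the model with the lexicon has a perplexity of roughly 84.5% of the baseline model's perplexity.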