
A Way to Extract Unknown Words Without Dictionary from Chinese Corpus and Its Applications
Authors: 林義証, 余明興, 黃世陽, 吳明哲
We propose a method for detecting unknown words in a corpus. We call such unknown words Chinese frequent strings (CFS). A CFS may be a combination of common Chinese words that are defined in a conventional dictionary, and it appears more than once in the texts considered. The proposed method detects such strings automatically, without any lexicon and without word segmentation. From a corpus of 536,171 characters, we retrieve 55,518 Chinese frequent strings of up to 13 characters (13-grams). To demonstrate that the retrieved strings are useful, we apply them to Chinese phoneme-to-character and character-to-phoneme conversion. The test corpus is manually tagged with the phonetic symbol of each character. Accuracy is 96.5% for phoneme-to-character conversion and 99.7% for character-to-phoneme conversion. We also conducted a mean opinion score (MOS) test on the determination of prosodic segments; the MOS is 4.66 relative to the prosodic segments in spontaneous speech. This indicates that the retrieved strings are also helpful for determining prosodic segments.
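The abstract does not spell out the extraction procedure. As a rough illustration only, the Python sketch below assumes that CFS extraction amounts to counting every character n-gram inside runs of CJK characters and keeping those that occur more than once, capped at 13 characters to match the reported 13-gram ceiling. The function name `frequent_strings`, the 2-character minimum, the CJK-only split, and the rule that drops a string when a longer retained string containing it has the same count are all hypothetical choices, not the authors' published method.

```python
import re
from collections import Counter


def frequent_strings(text: str, max_n: int = 13, min_count: int = 2) -> dict[str, int]:
    """Hypothetical CFS-style extractor: no lexicon, no word segmentation.

    Counts every character n-gram (2 <= n <= max_n) within runs of CJK
    ideographs and keeps those seen at least min_count times. A string is
    dropped when a longer retained string contains it with the same count,
    so only the longest repeated form survives (an assumed heuristic).
    """
    # Assumption: n-grams must not cross punctuation or non-CJK text,
    # so split on anything outside the CJK Unified Ideographs block.
    segments = re.split(r"[^\u4e00-\u9fff]+", text)
    counts: Counter[str] = Counter()
    for seg in segments:
        for n in range(2, max_n + 1):
            for i in range(len(seg) - n + 1):
                counts[seg[i:i + n]] += 1

    frequent = {s: c for s, c in counts.items() if c >= min_count}

    result: dict[str, int] = {}
    for s, c in frequent.items():
        # Quadratic subsumption check; adequate for a small illustration.
        subsumed = any(
            len(t) > len(s) and s in t and ct == c
            for t, ct in frequent.items()
        )
        if not subsumed:
            result[s] = c
    return result
```

For example, `frequent_strings("自然語言處理。語言學。自然語言處理。")` should yield a mapping with `自然語言處理` → 2 and `語言` → 3: the shorter pieces of `自然語言處理` are absorbed because they occur exactly as often as the longer string, while `語言` survives because it also occurs on its own. At the scale of a 536,171-character corpus, a suffix array or similar index would be the more usual way to enumerate repeated substrings.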
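Pages: 217-226
Keywords: unknown words; phoneme-to-character; character-to-phoneme; prosodic segment
Publication: ROCLING Proceedings (ROCLING論文集)
Issue: 1998
Publisher: 國立高雄師範大學輔導與諮商研究所 (Graduate Institute of Guidance and Counseling, National Kaohsiung Normal University)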
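Previous article in this publication: Corpus-based Evaluation of Language Processing Systems Using Information Restoration Model
Next article in this publication: 應用隱藏式馬可夫模型於口述對話系統之研究 (A Study on Applying Hidden Markov Models to Spoken Dialogue Systems)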