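Chinese abstract |
The Corpus of Contemporary Taiwanese Mandarin (COCT) of the National Academy for Educational Research comprises written-language, spoken-language, Chinese-English bilingual, and Chinese interlanguage data. This paper's main aims are to apply the corpus to grade Chinese characters, words, and grammar points by level, and to develop integrated corpus application systems. Drawing on statistical analyses of the corpus data (word frequency, coverage, distribution uniformity, quasi-affixes, semantic-field-related words, word-formation rate, and character-formation productivity), supplemented by consultation with scholars, experts, and senior Chinese-language teachers, the study completed grading standards for Chinese characters, words, and grammar points. In addition, by integrating these grading standards with corpus technology, the study developed and built a corpus concordance system (語料庫索引典系統), a semantic-field-related word query system (語義場關聯詞查詢系統), an automatic typo-correction system for compositions (作文錯別字自動批改系統), and an example-sentence editing-assistance system (例句編輯輔助系統). Finally, the paper offers suggestions for future corpus-based research on building a general word-frequency list, constructing a basic vocabulary list, and analyzing the structure of Chinese collocations.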
The National Academy for Educational Research (NAER) built the Corpus of Contemporary Taiwanese Mandarin (COCT) chiefly to support comprehensive applications in Teaching Chinese as a Second Language (TCSL). The COCT comprises written-language, spoken-language, Chinese-English bilingual, and Chinese-learner interlanguage corpora. This paper explores the use of the COCT to establish difficulty levels for Chinese characters, words, and grammar points in TCSL, and to develop integrated corpus application systems. Based on statistical analyses of COCT data (lexical frequency, coverage, distribution uniformity, quasi-affixes, semantic-field-related words, and word- and character-formation rates), together with consultation with experts and senior TCSL teachers, the researchers established grading standards for Chinese characters, words, and grammar points. Furthermore, by integrating these standards with corpus techniques, they built a NAER concordance system, a semantic-field-related word query system, an automatic typo-correction system for compositions, and an example-sentence editing-assistance system. Finally, the paper offers suggestions for future COCT-based research on constructing a general word-frequency list, building a basic vocabulary list, and analyzing the structure of Chinese collocations. |