Lexical Coverage in Taiwan Mandarin Conversation

Shu-Chuan Tseng

Title	Lexical Coverage in Taiwan Mandarin Conversation
Author	Shu-Chuan Tseng
Abstract	Information about the lexical capacity of the speakers of a specific language is indispensable for empirical and experimental studies on the human behavior of using speech as a communicative means. Unlike the increasing number of gigantic text- or web-based corpora that have been developed in recent decades, publicly distributed spoken resources, especially conversations, are few in number. This article studies the lexical coverage of a corpus of Taiwan Mandarin conversations recorded in three speaking scenarios. A wordlist based on this corpus has been prepared and provides information about frequency counts of words and parts of speech processed by an automatic system. Manual post-editing of the results was performed to ensure the usability and reliability of the wordlist. Syllable information was derived by automatically converting the Chinese characters to a conventional romanization scheme, followed by manual correction of conversion errors and disambiguation of homographs. As a result, the wordlist contains 405,435 ordinary words and 57,696 instances of discourse particles, markers, fillers, and feedback words. Lexical coverage in Taiwan Mandarin conversation is revealed and is compared with a balanced corpus of texts in terms of words, syllables, and word categories.
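Pages	1-18
Keywords	Taiwan Mandarin, Conversation, Frequency Counts, Lexical Coverage, Discourse Items
Journal	中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing)
Issue	March 2013 (Vol. 18, No. 1)
Publisher	中華民國計算語言學學會 (Association for Computational Linguistics and Chinese Language Processing)
Next article in this issue	Learning to Find Translations and Transliterations on the Web based on Conditional Random Fields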