English abstract |
Knowledge data are indispensable for the comprehension and context analysis of natural language. The author describes ways of acquiring and expanding such knowledge data. Kanji (Chinese character) strings are used frequently in Japanese. The author focused on five-character Kanji strings and extracted those that can be divided into two-character ⊕ three-character or three-character ⊕ two-character combinations. A large quantity of such data was collected, and the knowledge data were expanded further by combining the strings with postpositive particles and auxiliary verbs. Five-character strings were extracted from the Asahi Shimbun, and sorting them out yielded about 76,000 items of knowledge data. The knowledge data thus obtained could be further expanded. |
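
The extraction step summarized above can be illustrated with a minimal Python sketch. It is not the author's procedure, which the abstract does not specify. The function names, the `lexicon` argument, and the rule that a split counts only when both parts are known words are assumptions made for illustration, as is the restriction to the basic CJK Unified Ideographs block.

```python
import re

# Maximal runs of Kanji. Assumption: only the basic CJK Unified
# Ideographs block (U+4E00..U+9FFF) is treated as Kanji here.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff]+")


def five_kanji_strings(text):
    """Return the maximal Kanji runs that are exactly five characters long."""
    return [run for run in KANJI_RUN.findall(text) if len(run) == 5]


def split_candidates(string, lexicon):
    """Return the 2+3 and 3+2 splits whose two parts are both in `lexicon`.

    `lexicon` is a hypothetical set of known words; the abstract does not
    say how the division into two parts is actually decided.
    """
    splits = []
    for cut in (2, 3):
        head, tail = string[:cut], string[cut:]
        if head in lexicon and tail in lexicon:
            splits.append((head, tail))
    return splits


# Toy input; the sentence and the lexicon are made up for this example.
text = "国際会議場で大学院入試の説明があった。"
lexicon = {"国際", "会議場", "大学院", "入試"}
found = five_kanji_strings(text)
pairs = {s: split_candidates(s, lexicon) for s in found}
```

In this toy input, `国際会議場` splits as two-character ⊕ three-character and `大学院入試` as three-character ⊕ two-character. A run is kept only when the whole Kanji sequence is five characters long, so five-character windows inside longer compounds are ignored; that is a design choice of this sketch, not something stated in the abstract.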