English abstract |
This paper addresses the automatic extraction of multi-character words from a corpus of ancient Chinese poetry, along with some word-level applications. The word-extraction algorithm is described in detail and compared with the mutual-information method. The study is based on a 4.8 million-character corpus of Tang Dynasty poetry and a 1.6 million-character corpus of Song Dynasty poetry. Statistical analyses to date include collocation and word-to-author analysis, among others. Further research will include sentence-similarity retrieval and a semantic index. |