English Abstract |
In Chinese natural language processing, word segmentation and part-of-speech tagging are generally carried out as two separate steps. Earlier, the authors introduced a tag-based Markov-model approach to word segmentation. Because the tags are syntactic in nature, this approach effectively performs word segmentation and part-of-speech tagging simultaneously. We previously used a best-first algorithm, and empirical results showed that the search for the best solution is efficient for inputs of reasonable length. In this paper, we show that the task can be done using an O(n²) algorithm, where n is the number of characters in the input. In our experiments, we further reduced the algorithm to O(n) by bounding the maximum number of characters in a Chinese word by a constant. We also show that performing word segmentation and part-of-speech tagging in one step improves accuracy. |
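
The complexity claim can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes a first-order tag Markov model decoded with a Viterbi-style dynamic program over character positions. The names `segment_and_tag`, `lexicon`, `trans`, and `max_word_len`, the tag labels, and all probabilities are hypothetical toy values chosen for illustration. With an unbounded word length the two position loops examine O(n²) substrings; capping the word length at a constant makes the inner loop constant, so the work grows as O(n) for a fixed tag set.

```python
import math


def segment_and_tag(text, lexicon, trans, max_word_len=4, start="<s>"):
    """Jointly segment and tag `text` (illustrative sketch, assumed model).

    lexicon: dict word -> {tag: P(word | tag)}
    trans:   dict (prev_tag, tag) -> P(tag | prev_tag)
    Returns a list of (word, tag) pairs, or None if no analysis exists.
    """
    n = len(text)
    # best[i][tag] = (log-prob, start index of last word, previous tag)
    best = [dict() for _ in range(n + 1)]
    best[0][start] = (0.0, None, None)
    for i in range(1, n + 1):
        # Only words of at most max_word_len characters may end at i.
        for j in range(max(0, i - max_word_len), i):
            word = text[j:i]
            for tag, p_emit in lexicon.get(word, {}).items():
                for prev_tag, (score, _, _) in best[j].items():
                    p_trans = trans.get((prev_tag, tag), 0.0)
                    if p_trans <= 0.0 or p_emit <= 0.0:
                        continue
                    cand = score + math.log(p_trans) + math.log(p_emit)
                    if tag not in best[i] or cand > best[i][tag][0]:
                        best[i][tag] = (cand, j, prev_tag)
    if not best[n]:
        return None
    # Backtrace from the best-scoring tag at the end of the input.
    tag = max(best[n], key=lambda t: best[n][t][0])
    i, out = n, []
    while i > 0:
        _, j, prev_tag = best[i][tag]
        out.append((text[j:i], tag))
        i, tag = j, prev_tag
    return out[::-1]


# Toy model: every number below is made up for illustration only.
lexicon = {
    "我": {"PN": 1.0},
    "爱": {"VV": 1.0},
    "北京": {"NR": 1.0},
    "北": {"NN": 0.1},
    "京": {"NN": 0.1},
}
trans = {
    ("<s>", "PN"): 1.0,
    ("PN", "VV"): 1.0,
    ("VV", "NR"): 0.5,
    ("VV", "NN"): 0.5,
    ("NN", "NN"): 0.5,
}
result = segment_and_tag("我爱北京", lexicon, trans)
```

In this toy model the two-character word "北京" tagged `NR` outscores the split into two single-character nouns, so segmentation and tagging are decided together in a single pass. Setting `max_word_len=1` forces the character-by-character analysis instead.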