English Abstract |
A Chinese text summarization system is developed, based on surface information from the context together with corpus-based word segmentation and keyword identification. Unknown word identification is the most difficult problem in Chinese word segmentation. Context information is used here to resolve unknown words and ambiguous segmentations by integrating word frequency and word length to weight words dynamically. Theory and experiments show that this approach is superior to both the traditional dictionary-based matching approach and the purely word-frequency-based statistical approach. The segmentation precision is 98% on real text. Keyword identification is based not only on word frequency but also on word length. Salient sentence determination uses word weights, sentence length, number of clauses, numeric words, unknown words, and similar features, relying less on sentence position and surface cues. Evaluation measures for summaries are studied, and experimental results are provided. |
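The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of combining word frequency with word length to pick keywords and rank sentences. The product `frequency * length`, the averaging of word weights per sentence, and the function names `word_weights`, `top_keywords`, and `rank_sentences` are all assumptions made for illustration, not the system's actual method. The input is assumed to be already segmented into words, and the other features the abstract mentions (number of clauses, numeric words, unknown words) are left out.

```python
from collections import Counter


def word_weights(tokens):
    """Weight each word by frequency times character length.

    The product is an assumed combination; the abstract only says that
    frequency and length are integrated.
    """
    counts = Counter(tokens)
    return {word: count * len(word) for word, count in counts.items()}


def top_keywords(tokens, k=3):
    """Return the k highest-weighted words, ties broken alphabetically."""
    weights = word_weights(tokens)
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]


def rank_sentences(sentences):
    """Rank sentence indices by mean word weight, highest first.

    `sentences` is a list of token lists. Dividing by the sentence length
    is an assumed way to keep long sentences from dominating.
    """
    weights = word_weights([t for s in sentences for t in s])
    scores = [
        sum(weights[t] for t in s) / len(s) if s else 0.0
        for s in sentences
    ]
    return sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
```

Under this assumed weighting, a two-character word that occurs twice outweighs a one-character function word that occurs three times, which is the kind of effect that adding word length to pure frequency is meant to produce.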