English Abstract |
The PAT tree is an efficient n-gram indexing structure. Besides text retrieval, it is also believed to be useful in many natural language processing applications for constructing n-gram language models. However, the original PAT tree requires a large amount of main memory to maintain fast n-gram access, which limits its use for constructing large language models in practical environments. This paper presents an improved PAT tree structure, called the CPAT tree (Compact PAT tree), for natural language modeling applications. The CPAT tree significantly reduces the main memory requirement of the original PAT tree and is found to be very efficient for constructing large n-gram language models. This advantage has been demonstrated in OCRed-text verification, which is also described in this paper. |