Abstract
In this paper, we address the semantic classification of non-text symbols in Mandarin text using multiple decision classifiers. Some non-text symbols (e.g., '/' and ':') appear frequently in Mandarin texts such as newspapers, magazines, and documents on the Internet, and such a symbol in a sentence often has more than one possible oral expression. Rather than relying on n-gram language models (e.g., bigram or trigram models), we propose multi-layer decision classifiers that efficiently resolve the category ambiguity of the oral expression for patterns containing one or more non-text symbols in Mandarin text. The proposed approach has two main phases: a training phase and a classification phase. The classification phase currently consists of two decision classifiers. Once the correct category of a non-text symbol has been predicted, the symbol can be translated into the correct oral expression. The empirical precision rates for the inside and outside tests are 97.8% and 93.0%, respectively.
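The abstract does not describe the classifiers' features or training, so the following is only a minimal sketch under stated assumptions of how a two-layer classification phase could work. A first layer proposes candidate categories from the shape of a pattern such as `3/15` or `10:30`, a second layer chooses among them using context cue words, and the chosen category is then mapped to a Mandarin oral expression. The category labels, cue words, numeric ranges, tie-break order, and function names (`classify_layer1`, `classify_layer2`, `realize`, `verbalize`) are all hypothetical, and the hand-written rules stand in for the trained decision classifiers described in the paper.

```python
import re

# Hypothetical category labels; the abstract does not list the paper's actual label set.
DATE, FRACTION, TIME, RATIO = "DATE", "FRACTION", "TIME", "RATIO"
ORDER = [DATE, TIME, FRACTION, RATIO]  # tie-break preference (an assumption)

# Hypothetical context cue words used by the second classifier.
CUES = {
    DATE: ("舉行", "截止", "開幕"),
    FRACTION: ("約", "佔", "人口"),
    TIME: ("上午", "下午", "晚上", "開始"),
    RATIO: ("比數", "比分", "擊敗"),
}

DIGITS = "零一二三四五六七八九"
# A one- or two-digit number, a '/' or ':', and another one- or two-digit number.
PATTERN = re.compile(r"(?<!\d)(\d{1,2})([/:])(\d{1,2})(?!\d)")


def read_number(n: int) -> str:
    """Read an integer 0-99 in Mandarin (simplified: no large-number handling)."""
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    head = "十" if tens == 1 else DIGITS[tens] + "十"
    return head + (DIGITS[ones] if ones else "")


def classify_layer1(left: str, sym: str, right: str) -> set:
    """First (hypothetical) classifier: candidate categories from the pattern's shape alone."""
    a, b = int(left), int(right)
    if sym == "/":
        cands = {FRACTION}
        if 1 <= a <= 12 and 1 <= b <= 31:
            cands.add(DATE)
    else:  # sym == ":"
        cands = {RATIO}
        if len(right) == 2 and a <= 23 and b <= 59:
            cands.add(TIME)
    return cands


def classify_layer2(cands: set, context: str) -> str:
    """Second (hypothetical) classifier: pick the candidate whose cue words best match the context."""
    return max(cands, key=lambda c: (sum(w in context for w in CUES[c]), -ORDER.index(c)))


def realize(cat: str, a: int, b: int) -> str:
    """Map a category plus its two numbers to a Mandarin oral expression."""
    if cat == DATE:
        return f"{read_number(a)}月{read_number(b)}日"
    if cat == FRACTION:
        return f"{read_number(b)}分之{read_number(a)}"
    if cat == TIME:
        hour = "兩" if a == 2 else read_number(a)
        if b == 0:
            return f"{hour}點"
        minute = "零" + DIGITS[b] if b < 10 else read_number(b)
        return f"{hour}點{minute}分"
    return f"{read_number(a)}比{read_number(b)}"  # RATIO


def verbalize(sentence: str) -> str:
    """Replace every matched symbol pattern with its predicted oral expression."""
    def repl(m: re.Match) -> str:
        left, sym, right = m.groups()
        cat = classify_layer2(classify_layer1(left, sym, right), sentence)
        return realize(cat, int(left), int(right))
    return PATTERN.sub(repl, sentence)


if __name__ == "__main__":
    for s in ("會議將於3/15舉行", "約1/2的人口", "下午3:30開始", "以3:2擊敗對手"):
        print(s, "->", verbalize(s))
```

For example, the same symbol '/' is read as a date in `會議將於3/15舉行` (giving `會議將於三月十五日舉行`) but as a fraction in `約1/2的人口` (giving `約二分之一的人口`). Splitting the decision this way keeps the first layer cheap and context-free, so only patterns with more than one candidate category actually depend on context in the second layer.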