English Abstract |
Mandarin texts often contain non-alphabet symbols (“/”, “:”, “-”, etc.). Such a symbol may be read aloud in more than one way, depending on its sense category. In our previous work, we proposed a multi-layer decision classifier to disambiguate the sense category of non-alphabet symbols; its elementary feature is the statistical probability of a token, obtained by applying Bayes' rule. This paper uses more features of the tokens in a sentence and proposes three further techniques to improve performance. Experiments show that the proposed techniques disambiguate the sense category of the target symbols quite well, even with a small amount of data. With the additional token features and techniques, the precision rates for the inside and outside tests rise to 99.6% and 96.5%, respectively. |