English Abstract |
A Machine Translation (MT) system must be able to select the most likely structure among ambiguous candidates. This can be done by using probability as the basis for judging the well-formedness of each structure. However, this method requires a very large set of training data for the probabilistic database in order to reach an acceptable level of selection accuracy. The ArchTran English-Chinese Machine Translation System adopts a probability-based approach to automating the structure selection process. Although this method performs satisfactorily for structures already in the database, it performs rather poorly for structures not in the database. This is the sparse database problem. In this paper, we therefore propose to improve the predictive power of the database with a technique called Database Smoothing. Briefly, two smoothing methods can be adopted. The first employs a flattening constant to smooth the empty probability cells of the database. The second incorporates additional information from another database into the one to be smoothed. We have conducted a simulation on the smoothed database and observed an improvement of 13.1 percent for the open test samples. This is very encouraging because it shows that improvements can be achieved for all database applications that employ a smoothed probabilistic model. |
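
The two smoothing methods named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas used in ArchTran, so this example assumes standard add-constant smoothing for the flattening constant and simple linear interpolation for combining two databases. The function names, the `weight` parameter, and the sample structure labels are all hypothetical.

```python
def flatten_smooth(counts, outcomes, flattening_constant=0.5):
    """Add a constant to every cell (seen or unseen), then normalize.

    Assumption: plain add-constant smoothing, used here as a stand-in
    for the paper's flattening-constant method.
    """
    total = sum(counts.get(o, 0) for o in outcomes)
    total += flattening_constant * len(outcomes)
    return {o: (counts.get(o, 0) + flattening_constant) / total
            for o in outcomes}


def interpolate_smooth(primary, secondary, weight=0.8):
    """Mix two probability tables: weight * primary + (1 - weight) * secondary.

    Assumption: linear interpolation, used here as a stand-in for
    incorporating information from another database.
    """
    outcomes = set(primary) | set(secondary)
    return {o: weight * primary.get(o, 0.0)
               + (1.0 - weight) * secondary.get(o, 0.0)
            for o in outcomes}


# Hypothetical structure counts; "VP NP" never occurred in training.
counts = {"NP VP": 3, "NP VP PP": 1}
outcomes = ["NP VP", "NP VP PP", "VP NP"]
smoothed = flatten_smooth(counts, outcomes, flattening_constant=1.0)

# Hypothetical second database that has seen the missing structure.
sparse_probs = {"NP VP": 0.75, "NP VP PP": 0.25}
other_probs = {"NP VP": 0.5, "NP VP PP": 0.25, "VP NP": 0.25}
mixed = interpolate_smooth(sparse_probs, other_probs, weight=0.8)
```

In both cases the unseen structure `"VP NP"` ends up with a small nonzero probability, which is the effect that lets a selection procedure rank structures missing from the original database.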