Abstract
This paper proposes an improved probabilistic CFG, called a mixture probabilistic CFG, based on the idea of cluster-based language modeling. The basic idea of this model is to cluster a training corpus into a number of subcorpora and then train a probabilistic CFG on each subcorpus. In clustering, similar linguistic objects (e.g., those belonging to the same context, topic, or domain) are grouped into one cluster. The resulting probabilistic CFGs are therefore context- or topic-dependent, which enables more accurate language modeling. The effectiveness of the proposed model is confirmed by both perplexity reduction and speech recognition experiments.
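The combination step described above can be illustrated with a minimal sketch. This is an assumed formulation (not taken from the paper): each cluster k contributes a cluster-specific model probability P_k(s) for a sentence s, and the mixture model interpolates them with weights λ_k that sum to one. The function name, probabilities, and weights below are all hypothetical.

```python
def mixture_probability(sentence_probs, weights):
    """Interpolate per-cluster sentence probabilities P_k(s) with
    mixture weights lambda_k: P(s) = sum_k lambda_k * P_k(s)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "mixture weights must sum to 1"
    return sum(w * p for w, p in zip(weights, sentence_probs))

# Hypothetical example: three cluster-specific PCFGs assign different
# probabilities to the same sentence; the topic-matched cluster dominates.
probs = [1e-6, 4e-5, 2e-6]    # P_k(s) from each cluster's PCFG (illustrative)
weights = [0.2, 0.7, 0.1]     # mixture weights (illustrative)
p = mixture_probability(probs, weights)
```

Because a topic-matched cluster's model assigns a much higher probability to in-topic sentences, the interpolated probability tracks the dominant cluster, which is the intuition behind the perplexity reduction reported in the abstract.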