Abstract
This paper presents a method for Chinese named entity (NE) identification using a
class-based language model (LM). We focus on three types of NEs: personal names
(PERs), location names (LOCs), and organization names (ORGs). Each NE type is
defined as a class. The language model consists of two sub-models: (1) a set of
entity models, each of which estimates the generative probability of a Chinese
character string given an NE class; and (2) a contextual model, which estimates
the generative probability of a class sequence. The class-based LM thus provides
a statistical framework that integrates Chinese word segmentation and NE
identification in a unified way.
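One natural reading of the two sub-models is a noisy-channel decomposition. For an input character string S and a candidate class sequence C, the decoder picks

    C* = argmax_C P(C) * P(S | C)

Here the contextual model supplies P(C) and the entity models supply P(S | C). Because the search ranges over every way of cutting S into spans and labelling each span, segmentation and NE identification are decided jointly. The Python sketch below illustrates this with a Viterbi search. It rests on assumptions that this abstract does not state: a bigram contextual model (chosen for brevity), each ordinary word treated as its own class, a fixed probability floor in place of real smoothing, and hand-set toy probabilities. The names `LEXICON`, `ENTITY_MODEL`, `CONTEXT_MODEL`, and `decode` are hypothetical. This is not the paper's implementation.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy parameters for illustration only; a real system would
# estimate them from a segmented, NE-tagged corpus.
LEXICON = {"在", "张", "三", "北", "京"}  # ordinary words; each word is its own class (assumption)
ENTITY_MODEL: Dict[str, Dict[str, float]] = {  # entity model P(string | NE class)
    "PER": {"张三": 0.01},
    "LOC": {"北京": 0.02},
}
CONTEXT_MODEL: Dict[Tuple[str, str], float] = {  # bigram contextual model P(c_i | c_{i-1})
    ("<s>", "PER"): 0.2,
    ("PER", "在"): 0.3,
    ("在", "LOC"): 0.4,
    ("LOC", "</s>"): 0.5,
}
FLOOR = 1e-6  # crude stand-in for proper smoothing of unseen class bigrams
MAX_LEN = 4   # longest candidate span, in characters

Backpointer = Optional[Tuple[int, str, str]]  # (start index, previous class, span)


def context_logp(prev: str, cur: str) -> float:
    """Log contextual-model probability of class cur following class prev."""
    return math.log(CONTEXT_MODEL.get((prev, cur), FLOOR))


def candidates(s: str, i: int):
    """Yield (end, class, log P(span | class)) for every span starting at i."""
    for j in range(i + 1, min(len(s), i + MAX_LEN) + 1):
        span = s[i:j]
        if span in LEXICON:
            yield j, span, 0.0  # a word class generates its own word with probability 1
        for cls, table in ENTITY_MODEL.items():
            if span in table:
                yield j, cls, math.log(table[span])


def decode(s: str) -> List[Tuple[str, str]]:
    """Jointly segment s and label NEs by Viterbi search over class sequences."""
    # best[j][c] = (log score, backpointer) of the best analysis of s[:j] ending in class c
    best: List[Dict[str, Tuple[float, Backpointer]]] = [{} for _ in range(len(s) + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(len(s)):
        for prev, (score, _) in best[i].items():
            for j, cls, emit_lp in candidates(s, i):
                total = score + context_logp(prev, cls) + emit_lp
                if cls not in best[j] or total > best[j][cls][0]:
                    best[j][cls] = (total, (i, prev, s[i:j]))
    if not best[len(s)]:
        raise ValueError("no analysis covers the input")
    # Add the end-of-sentence transition, pick the best final class, then backtrack.
    cls = max(best[len(s)], key=lambda c: best[len(s)][c][0] + context_logp(c, "</s>"))
    j, out = len(s), []
    while best[j][cls][1] is not None:
        i, prev, span = best[j][cls][1]
        out.append((span, cls))
        j, cls = i, prev
    return out[::-1]


print(decode("张三在北京"))  # [('张三', 'PER'), ('在', '在'), ('北京', 'LOC')]
```

With these toy numbers, the analysis PER + 在 + LOC is the only path that uses no floor transitions, so it beats the character-by-character segmentation. Keying the Viterbi state on the last class alone is enough here because a bigram contextual model conditions only on the previous class. A trigram model would need the last two classes in the state.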
This paper also describes methods for identifying nested NEs and NE abbreviations.
Evaluation on a broad-coverage test set shows that the proposed model achieves
performance comparable to that of state-of-the-art Chinese NE identification systems.