English Abstract |
In this paper, we assessed the Latent Semantic Indexing (LSI) approach for Chinese information filtering. The assessment covered Chinese news filtering agents that used a character-based, hierarchical filtering scheme. The traditional vector space model was employed as the information filtering model, and each document was converted into a vector of term weights. Instead of using words as terms, as is conventional in information retrieval (IR), we used individual Chinese characters as terms. LSI captured the semantic relationships between the documents and the Chinese characters. We used the singular value decomposition (SVD) technique to compress the term space into a lower-dimensional space that captures latent associations between documents and terms. Our experiments showed that the recall and precision of character-based Chinese news filtering, with the LSI technique incorporated into the information filtering system, were satisfactory. |
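
The pipeline summarized in the abstract (a character-by-document matrix, a truncated SVD, and matching in the reduced space) can be illustrated with a minimal sketch. This is not the authors' implementation: the raw character counts used as weights, the cosine similarity measure, the rank `k = 2`, the toy documents, and all function names (`char_term_matrix`, `lsi_fit`, `fold_in`, `cosine`) are assumptions made for illustration, and the hierarchical filtering scheme is not modeled.

```python
import numpy as np

def char_term_matrix(docs):
    """Build a character-by-document matrix of raw counts.
    Raw counts are an assumption; the abstract does not state the weighting."""
    vocab = sorted({ch for d in docs for ch in d if not ch.isspace()})
    index = {ch: i for i, ch in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for ch in d:
            if ch in index:
                A[index[ch], j] += 1.0
    return A, index

def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of V_k are the documents' coordinates in the latent space.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(text, index, U_k, s_k):
    """Project new text into the latent space: q_hat = q^T U_k diag(s_k)^-1.
    Characters not seen during fitting are ignored."""
    q = np.zeros(len(index))
    for ch in text:
        if ch in index:
            q[index[ch]] += 1.0
    return (q @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity, returning 0.0 for a zero vector."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy corpus (hypothetical): two finance headlines, two sports headlines.
docs = ["股市今日上涨", "股市今日下跌", "球队赢得冠军", "球队输掉比赛"]
A, index = char_term_matrix(docs)
U_k, s_k, V_k = lsi_fit(A, k=2)

# A hypothetical user profile, expressed as text and folded into the latent space.
profile = fold_in("股市下跌", index, U_k, s_k)
sims = [cosine(profile, V_k[j]) for j in range(len(docs))]
```

In this sketch, a filtering agent would pass along the documents whose similarity to the profile exceeds a chosen threshold; the threshold, like the rank `k`, would need to be tuned on real data.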