English Abstract
The Latent Dirichlet Allocation (LDA) model, as used in text analysis, suffers from weak generalization ability and poor interpretability of its topic words. In this paper, we address these issues with an LDA-based topic analysis framework built on keyword selection. Our approach extracts keywords from scientific research articles and builds a keyword list according to filter rules. Words are then selected from the article abstracts on the basis of this keyword list, and the LDA model is used to analyze the topics of the selected words. To evaluate the approach, we use journal articles in the field of educational technology as the data source and perform two types of comparative analysis. First, the keyword selection strategy is compared with the “verb” and “verb + noun” word selection strategies in terms of domain expert analysis, model perplexity, topic coherence, and inter-topic distance. Second, the approach is compared with the Hierarchical Dirichlet Process (HDP), Correlated Topic Model (CTM), and LDA-Word2Vec models in terms of model perplexity and predictive log-likelihood. The experimental results confirm that topic analysis based on keyword selection outperforms the other methods in both types of comparison.
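The pipeline summarized in the abstract (filter a keyword list, keep only the matching words in each abstract, then fit LDA and measure perplexity) can be illustrated with a small plain-Python sketch. This is not the authors' implementation: the filter rule (a keyword is kept if at least `min_freq` articles list it), the function names `build_keyword_list`, `select_words`, `lda_gibbs` and `perplexity`, the toy data, and the use of collapsed Gibbs sampling to fit LDA are all assumptions made for illustration, and the part-of-speech based “verb” / “verb + noun” strategies are not modeled. Perplexity follows the standard definition, the exponential of the negative mean per-token log-likelihood.

```python
import math
import random
from collections import Counter


def build_keyword_list(article_keywords, min_freq=2, stopwords=frozenset()):
    """Hypothetical filter rule: keep a keyword if at least `min_freq` articles
    list it and it is not a stopword (case-insensitive)."""
    counts = Counter()
    for kws in article_keywords:
        counts.update({k.lower() for k in kws})  # count each keyword once per article
    return {k for k, c in counts.items() if c >= min_freq and k not in stopwords}


def select_words(abstract_tokens, keyword_list):
    """Keep only the (lowercased, punctuation-stripped) tokens found in the keyword list."""
    cleaned = (t.strip(".,;:()").lower() for t in abstract_tokens)
    return [t for t in cleaned if t in keyword_list]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (vocab, phi, theta)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total tokens per topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)  # random initial topic
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w2id[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w2id[w], z[d][i]
                # Remove the current token, then resample its topic from the full conditional.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def perplexity(docs, vocab, phi, theta):
    """exp(-(1/N) * sum log p(w|d)), where p(w|d) = sum_k theta[d][k] * phi[k][w]."""
    w2id = {w: i for i, w in enumerate(vocab)}
    log_lik, n = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            log_lik += math.log(sum(theta[d][k] * phi[k][w2id[w]] for k in range(len(phi))))
            n += 1
    return math.exp(-log_lik / n)


# Toy data (invented for illustration): author keywords and abstracts of four articles.
article_keywords = [["MOOC", "engagement", "dropout"], ["MOOC", "dropout", "prediction"],
                    ["robotics", "programming", "engagement"],
                    ["programming", "robotics", "curriculum"]]
abstracts = ["We model MOOC dropout from engagement data in a large MOOC",
             "Dropout prediction for MOOC learners uses weekly engagement",
             "Students learn programming through robotics projects",
             "A robotics curriculum improves programming engagement"]

keyword_list = build_keyword_list(article_keywords)
docs = [select_words(a.split(), keyword_list) for a in abstracts]
vocab, phi, theta = lda_gibbs(docs, num_topics=2, iters=100, seed=1)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(f"topic {k}:", [vocab[v] for v in top])
print("perplexity:", round(perplexity(docs, vocab, phi, theta), 3))
```

With `min_freq=2`, keywords listed by only one article (here “prediction” and “curriculum”) are filtered out, so LDA only sees the shared, domain-specific vocabulary; that restriction is the intuition behind the improved topic interpretability claimed above.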