Chinese Abstract |
The chi-square test is a statistical test well suited to determining whether a categorical variable A is a significant factor for a categorical variable B. A decision tree, in turn, is a useful model for data classification. To achieve efficient knowledge discovery with a compact decision tree, in this paper we propose a method that uses the result of the chi-square test to reduce the number of attributes under consideration. As a preprocessing step, we use the P-value from the chi-square test to identify the significant factors and prune the insignificant ones before constructing the decision tree. In this way, we avoid constructing an inaccurate decision tree. We use a public baseball database as an example to illustrate our method. Our performance study shows that checking the most significant factor (i.e., the factor with the minimum P-value) first reduces the number of conditions (i.e., levels) that must be decided. The compact decision tree constructed by our method therefore requires less storage, predicts faster, and classifies more accurately than a decision tree built on all of the original factors. |
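
The preprocessing step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`chi2_sf`, `chi_square_p_value`, `rank_significant`), the 0.05 significance threshold, and the toy records with attributes `power` and `hand` and target `all_star` are all hypothetical and are not taken from the baseball database used in the paper. The sketch computes a Pearson chi-square P-value for each candidate attribute against the target, drops attributes whose P-value is not below the threshold, and orders the remaining ones by ascending P-value so that the most significant factor would be checked first.

```python
import math
from collections import Counter


def chi2_sf(stat, dof):
    """Upper-tail probability P(X >= stat) for a chi-square variable (dof >= 1)."""
    if stat <= 0.0:
        return 1.0
    h = stat / 2.0
    # Closed-form starting values: dof = 2 gives exp(-h), dof = 1 gives erfc(sqrt(h)).
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < dof / 2.0 - 1e-9:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_square_p_value(xs, ys):
    """Pearson chi-square test of independence, without continuity correction."""
    n = len(xs)
    joint, rows, cols = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    dof = (len(rows) - 1) * (len(cols) - 1)
    if dof == 0:
        return 0.0, 1.0  # a constant variable carries no information
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            stat += (joint[(r, c)] - expected) ** 2 / expected
    return stat, chi2_sf(stat, dof)


def rank_significant(records, attributes, target, alpha=0.05):
    """Keep attributes with P-value < alpha, most significant first."""
    labels = [rec[target] for rec in records]
    scored = []
    for attr in attributes:
        _, p = chi_square_p_value([rec[attr] for rec in records], labels)
        if p < alpha:
            scored.append((p, attr))
    return [attr for _, attr in sorted(scored)]


def toy_rows(power, label, per_hand):
    """Hypothetical records: `per_hand` copies for each batting hand."""
    return [
        {"power": power, "hand": hand, "all_star": label}
        for hand in ("L", "R")
        for _ in range(per_hand)
    ]


# Invented toy data: `power` is associated with the target, `hand` is not.
records = (
    toy_rows("high", "yes", 4) + toy_rows("high", "no", 1)
    + toy_rows("low", "no", 4) + toy_rows("low", "yes", 1)
)

selected = rank_significant(records, ["power", "hand"], "all_star")
print(selected)  # ['power']
```

Only the attributes in `selected` would then be passed to the decision tree construction, in the listed order. With real data, small expected cell counts can make the chi-square approximation unreliable, so the threshold and the test itself may need adjusting.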