Abstract |
In this paper, we present a new method for classifying the Iris data based on the distribution of the training instances. First, we identify the two attributes of the Iris data whose values overlap least across the three species (Setosa, Versicolor, and Virginica) in the training instances, since these attributes are the most useful for classification. Then, for each species, we calculate the average and the standard deviation of the values of these two attributes, as well as the overlapping areas formed by their values between species. To classify a testing instance, we calculate the difference between its values of the two attributes and those of each species of the training instances, and we assign the instance to the species with the smallest difference. The proposed method achieves a higher average classification accuracy rate than the existing methods. |
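The classification step described above can be illustrated with a minimal sketch. The abstract does not give the exact difference measure, the two selected attributes, or how the overlapping areas enter the computation, so everything here is an assumption: the difference is taken to be the sum of absolute deviations from the species mean, each scaled by the species standard deviation; the overlapping areas are left out; the function names `fit`, `difference`, and `classify` are hypothetical; and the training values are made-up toy numbers, not measurements from the Iris data.

```python
from statistics import mean, stdev


def fit(train):
    """Compute (mean, standard deviation) per attribute for each species.

    train maps a species name to a list of (attr1, attr2) tuples.
    """
    stats = {}
    for species, rows in train.items():
        columns = list(zip(*rows))  # one tuple of values per attribute
        stats[species] = [(mean(col), stdev(col)) for col in columns]
    return stats


def difference(instance, species_stats):
    """Assumed difference measure: sum of |x - mean| / std over the attributes."""
    return sum(abs(x - m) / s for x, (m, s) in zip(instance, species_stats))


def classify(instance, stats):
    """Return the species with the smallest difference to the instance."""
    return min(stats, key=lambda species: difference(instance, stats[species]))


# Toy training values for two attributes (illustrative only, not real Iris data).
train = {
    "Setosa": [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2), (1.6, 0.4)],
    "Versicolor": [(4.5, 1.4), (4.1, 1.3), (4.7, 1.5), (4.0, 1.2)],
    "Virginica": [(5.8, 2.2), (5.5, 2.0), (6.0, 2.4), (5.6, 1.9)],
}

stats = fit(train)
print(classify((1.5, 0.25), stats))
print(classify((4.4, 1.4), stats))
```

Scaling each deviation by the standard deviation is one simple way to let a tightly clustered species count a given deviation more heavily than a widely spread one; the measure used in the paper may differ.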