English Abstract
In social science research, it is important to collect categorical survey data and to interpret the relationships among qualitative variables. Examples include brand image studies, which reveal the relationships between and within brands and attributes, and studies of the relationships among students’ characters, their majors, and their future careers. Correspondence analysis (CA) is an exploratory data analysis technique for the graphical display of multivariate categorical data (Hoffman & Franke, 1986). When complex multivariate relationships are examined, however, CA results are often too complex to be read easily, and clustering techniques can help in this situation. This study has two purposes. For data collection, we propose a multiple-response categorical survey instrument, which is widely used, simple, and easy for interviewees to answer. For data analysis, we propose a method that analyzes qualitative data fully through the complementary use of CA and cluster analysis, so that an optimal graphical representation reveals the structure in the data clearly. Two examples illustrate how the complementary use of CA and cluster analysis can provide a graphical display of multivariate categorical survey data. The first example shows how to use the pick-any method for data collection: in a survey of student character (313 students), respondents could select any number of the 31 categories. Using K-means cluster analysis, respondents were clustered into 6 distinct character segments based on their coordinates in the full-information space (i.e., the 30-dimensional CA space), which accounts for 100% of the total variance. The results show that superimposing the cluster dendrogram on the CA map successfully remedies the distortion of distances caused by the planar approximation of the map, as also found by Lebart (1994).
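The first example's pipeline (CA of a pick-any matrix, then K-means on respondents' coordinates in the full CA space) can be sketched roughly as follows. This is a minimal illustration, not the study's own code: it assumes NumPy, computes CA from the SVD of the standardized residual matrix, keeps every non-trivial dimension (at most 30 for 31 categories), and runs plain unweighted K-means (Lloyd's algorithm). The function names `correspondence_analysis` and `kmeans` are hypothetical, and the 313 × 31 indicator matrix is randomly generated, not the survey data.

```python
import numpy as np


def correspondence_analysis(N, tol=1e-10):
    """CA via SVD of standardized residuals; keeps every non-trivial dimension."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                        # correspondence matrix
    r = P.sum(axis=1)                      # row masses (respondents)
    c = P.sum(axis=0)                      # column masses (categories)
    E = np.outer(r, c)                     # expected proportions under independence
    S = (P - E) / np.sqrt(E)               # D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    keep = sv > tol                        # drop only null dimensions
    U, sv, V = U[:, keep], sv[keep], Vt[keep].T
    F = U * sv / np.sqrt(r)[:, None]       # row principal coordinates
    G = V * sv / np.sqrt(c)[:, None]       # column principal coordinates
    return F, G, sv ** 2                   # principal inertias


def kmeans(X, k, n_iter=100, seed=0):
    """Plain (unweighted) Lloyd's K-means with random initial centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)         # nearest centre for each respondent
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):      # centres stable: converged
            break
        centers = new
    return labels, centers


# Hypothetical pick-any matrix: 313 respondents x 31 categories (1 = picked).
rng = np.random.default_rng(42)
N = (rng.random((313, 31)) < 0.2).astype(float)
N[np.arange(313), rng.integers(0, 31, size=313)] = 1   # every respondent picks >= 1
N[np.arange(31), np.arange(31)] = 1                     # every category picked >= 1

F, G, inertia = correspondence_analysis(N)
labels, _ = kmeans(F, k=6)                 # segment respondents in the full space
print(F.shape[1], "dimensions;", np.bincount(labels, minlength=6))
```

Because all non-trivial dimensions are kept, Euclidean distances between rows of `F` equal the chi-square distances between respondents' profiles, which is one concrete reading of "without losing information". A K-means weighted by row masses would be a natural variant.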
The second example shows how to combine the advantages of CA and cluster analysis to reduce two CA maps to a single display, named the correspondence-cluster dendrogram (Hi, Zhu & Wu, 1995). One CA map is the joint display of ‘major’ and ‘future career’; the other is the joint display of ‘character’ and ‘future career’. The correspondence-cluster dendrogram, which accounts for a high proportion of the total variance, shows all the relationships within and between the three variables (character, major, and future career). The results show that students’ future careers are influenced by their character and their major, especially the latter. For example, one group is characterized by students who report ‘optimism’ and ‘self-esteem’ in their character profile. This group divides into two subgroups: students majoring in hospitality and tourism management favor hospitality-related careers, and students majoring in business administration favor marketing-related careers. Both subgroups also favor the travel industry. Overall, qualitative data are a common product of social science research. As this study shows, the multiple-response categorical scale is easier and less demanding for respondents, and respondents are sometimes more willing to answer when a large number of attribute categories is measured, which agrees with the findings of Arimond & Elfessi (2001). The complementary use of CA and cluster analysis is highly recommended for obtaining a useful, reduced spatial representation of multivariate categorical data. In the first example, CA and K-means cluster analysis are used in tandem to provide a category map and a segmentation of students by character without losing information. The combined use of hierarchical clustering techniques and CA can improve the understanding of the data.
In the second example, the method produces a correspondence-cluster dendrogram, which shows clearly and simultaneously the relationships among all categories of the multiple variables. Furthermore, the examples provide a clearer and more reliable reference for improving education.
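The abstract does not spell out how the correspondence-cluster dendrogram (Hi, Zhu & Wu, 1995) is constructed, so the following is only one plausible sketch under stated assumptions. It stacks the ‘major’ × ‘future career’ and ‘character’ × ‘future career’ tables into one table sharing the career columns, runs CA on that stacked table, and then applies mass-weighted Ward agglomerative clustering to the major and character categories in the full CA space, so that merges reflect full chi-square distances rather than the planar approximation. The helper names (`ca_rows`, `ward_cost`, `ward_merges`), the category labels, and all counts are hypothetical, and this sketch clusters only the row categories; how the career categories enter the original dendrogram is not stated here.

```python
import itertools
import numpy as np


def ca_rows(N, tol=1e-10):
    """Row principal coordinates, row masses and principal inertias of a CA."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    E = np.outer(r, c)
    U, sv, _ = np.linalg.svd((P - E) / np.sqrt(E), full_matrices=False)
    keep = sv > tol                              # full space: all non-trivial dims
    F = U[:, keep] * sv[keep] / np.sqrt(r)[:, None]
    return F, r, sv[keep] ** 2


def ward_cost(ca, cb):
    """Increase in mass-weighted within-cluster inertia if clusters ca, cb merge."""
    (_, wa, ma), (_, wb, mb) = ca, cb
    return wa * wb / (wa + wb) * np.sum((ma - mb) ** 2)


def ward_merges(X, w, names):
    """Mass-weighted Ward agglomeration; returns (members, merge cost) per step."""
    clusters = {i: (names[i:i + 1], w[i], X[i]) for i in range(len(X))}
    merges, next_id = [], len(X)
    while len(clusters) > 1:
        # merge the pair whose fusion adds the least within-cluster inertia
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        delta = ward_cost(clusters[a], clusters[b])
        (na, wa, ma), (nb, wb, mb) = clusters.pop(a), clusters.pop(b)
        clusters[next_id] = (na + nb, wa + wb, (wa * ma + wb * mb) / (wa + wb))
        merges.append((na + nb, delta))
        next_id += 1
    return merges


# Hypothetical counts (not the study's data); columns = 4 career categories.
major_by_career = np.array([[30, 5, 12, 3],
                            [6, 28, 10, 6],
                            [4, 6, 5, 25]])
char_by_career = np.array([[20, 14, 15, 6],
                           [18, 16, 12, 9],
                           [5, 8, 4, 20],
                           [7, 9, 6, 18]])
names = ["major_M1", "major_M2", "major_M3",
         "trait_T1", "trait_T2", "trait_T3", "trait_T4"]
stacked = np.vstack([major_by_career, char_by_career])   # two tables, one map
F, r, inertia = ca_rows(stacked)
merges = ward_merges(F, r, names)
for members, delta in merges:
    print(f"{delta:.4f}", members)
```

With masses as weights, the Ward merge costs add up to the total inertia of the stacked table, so the dendrogram's heights partition the same variance that the CA map displays. If both tables come from the same respondents, stacking counts each respondent twice; that is an accepted convention for concatenated tables but worth keeping in mind when reading masses.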