English Abstract |
Bioinformatics methods have become indispensable for investigating the causes and progression of carcinogenesis, and correctly identifying cancer-related genes is a vital part of that work. As a result, research on computational gene screening has proliferated. Because the number of human genes is so large, screening for genes related to a particular cancer with wet-lab experiments is currently too costly, time-consuming, and labor-intensive. We therefore propose applying bioinformatics methods to identify cancer-related genes at lower cost. Computationally, screening cancer genes from the gene expression levels of cDNA microarrays can be framed as a feature selection problem. In practice, however, it is difficult to narrow a massive set of genes down to a sufficiently small number of candidate cancer genes within a realistic time span. Combining a genetic algorithm with a k-nearest neighbor classifier (GA/KNN) appears to be an effective way to identify genes related to a cancer of interest. A major disadvantage of GA/KNN is its long execution time; this paper therefore proposes a two-phase method that aims to significantly reduce GA/KNN's computation time for screening key genes without reducing its effectiveness. |
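For readers unfamiliar with the idea behind GA/KNN, the following is a minimal sketch of how a genetic algorithm can search for small gene subsets while a k-nearest neighbor classifier scores each subset. It is not the two-phase method proposed in this work, nor a reproduction of any published GA/KNN implementation. The function names (`knn_loo_accuracy`, `ga_knn`, `make_toy_data`), all parameter values, the truncation-selection and crossover scheme, and the synthetic expression data are assumptions made purely for illustration.

```python
import random


def knn_loo_accuracy(X, y, genes, k=3):
    """Leave-one-out accuracy of a k-NN majority vote using only the given gene columns."""
    correct = 0
    for i in range(len(X)):
        dists = []
        for j in range(len(X)):
            if j == i:
                continue  # leave sample i out of its own neighbor set
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            dists.append((d, y[j]))
        dists.sort(key=lambda t: t[0])
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        if predicted == y[i]:
            correct += 1
    return correct / len(X)


def ga_knn(X, y, subset_size=3, pop_size=20, generations=15,
           mutation_rate=0.2, k=3, seed=0):
    """Search for a gene subset of fixed size that maximizes k-NN accuracy.

    Each chromosome is a sorted tuple of distinct gene indices.
    """
    rng = random.Random(seed)
    n_genes = len(X[0])

    def fitness(chrom):
        return knn_loo_accuracy(X, y, chrom, k)

    population = [tuple(sorted(rng.sample(range(n_genes), subset_size)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection (an assumed scheme): keep the fitter half as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child's genes from the union of both parents.
            pool = sorted(set(a) | set(b))
            child = rng.sample(pool, subset_size)
            if rng.random() < mutation_rate:
                # Mutation: swap one gene for a gene not already in the child.
                candidates = [g for g in range(n_genes) if g not in child]
                child[rng.randrange(subset_size)] = rng.choice(candidates)
            children.append(tuple(sorted(child)))
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


def make_toy_data(n_samples=30, n_genes=12, informative=(0, 1), seed=1):
    """Synthetic expression matrix: only the `informative` genes differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 4.0 * label  # shift informative genes for class 1
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
best_genes, accuracy = ga_knn(X, y)
print("selected genes:", best_genes, "leave-one-out accuracy:", accuracy)
```

Even on this toy problem, every fitness evaluation runs a full leave-one-out k-NN pass, and the genetic algorithm repeats that for each chromosome in each generation. This repeated evaluation is why execution time grows quickly as the number of genes and samples increases.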