中文摘要 |
Escherichia coli (E. coli) K12 was sequenced in 1997. The 4,639,221-base pair DNA sequence consists of 4288 annotated protein-coding genes, 38 percent of which have no attributed function. One of the major problems in predicting prokaryotic promoters is locating the spacers between the -35 box and -10 box and between the -10 box and transcription start site. In this paper, we use the adopted expectation maximization (EM) algorithm to accurately find the localizations of the promoter regions. A brand new purine-pyrimidine encoding method is proposedto reduce the dimensions of the training data. The heavy demand on systems for both computation and memory space can then be avoided through the choice of coding factor. The most representative features are used for training learning vector quantization networks. The simulation results of the proposed coding approach reveal that the precision of promoter prediction using the proposed approach is approximately the same as the precision using the traditional encoding method. |