English Abstract |
With advances in information technology and the growing need for content analysis in digital humanities research, optical character recognition (OCR) can convert document content into verbatim text, which facilitates full-text search and content exploration. To assess the feasibility of using OCR software to convert the full text of ancient books, this study ran empirical tests on ancient texts to evaluate the effectiveness of OCR recognition and to identify the factors that affect it. The study selected 40 Ming Dynasty ancient books with differing layouts and glyph styles for analysis. The results show that both page layout and image quality affect the OCR recognition rate: an overly crowded layout or a blurred image hinders recognition. The study also summarizes six common types of misrecognized glyphs, which can serve as a reference for collecting institutions planning full-text conversion of similar ancient books. |