Abstract |
The Constrained Run-Length Algorithm (CRLA) is a well-known technique for page segmentation. The algorithm is very efficient for partitioning documents with Manhattan layouts, but it is not suited to pages with complex layouts, e.g. irregular graphics embedded in a text paragraph. Its main drawback is that it uses only local information during the smearing stage, which may lead to erroneous linkage of text and graphics. This paper presents a solution to this problem by adding global information to the CRLA process. The enhanced CRLA can be applied successfully to non-Manhattan page layouts. It can also extract text surrounded by a box. Neither case can be processed by the original CRLA. |
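
To make the "local information" drawback concrete, the following is a minimal sketch of the horizontal smearing step as it is commonly described for run-length smearing: every run of white pixels no longer than a threshold is filled with black. It is not the enhanced method proposed in the paper. The names `smear_row`, `smear_horizontal`, and the threshold `c` are hypothetical, and filling short white runs that touch the row border is an assumption of this sketch.

```python
def smear_row(row, c):
    """Fill every run of white pixels (0) of length <= c with black (1).

    Assumption: runs touching the row border are treated like any other run.
    """
    out = list(row)
    n = len(row)
    i = 0
    while i < n:
        if row[i] == 0:
            # Find the end of the current white run [i, j).
            j = i
            while j < n and row[j] == 0:
                j += 1
            # The decision depends only on this run's length (local information).
            if j - i <= c:
                out[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return out


def smear_horizontal(image, c):
    """Apply smear_row to each row of a binary image given as a list of lists."""
    return [smear_row(row, c) for row in image]


# A text fragment (left) and a graphic (right) separated by a 2-pixel gap.
row = [1, 0, 1, 0, 0, 1, 1, 1]
linked = smear_row(row, 2)
```

With `c = 2`, the gap between the text pixels and the graphic pixels is filled, so the two are merged into one block. Because the test looks only at the length of each white run, it cannot tell an inter-word gap from a gap between text and an adjacent graphic, which is the kind of erroneous linkage the abstract refers to.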