TY - GEN

T1 - Table detection via probability optimization

AU - Wang, Yalin

AU - Phillips, Ihsin T.

AU - Haralick, Robert M.

N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2002.

PY - 2002

Y1 - 2002

N2 - In this paper, we define the table detection problem as a probability optimization problem. We begin, as in our previous algorithm, by finding and validating each detected table candidate. We then compute a set of probability measurements for each of the table entities. The computation of the probability measurements takes into consideration tables, table text separators, and table neighboring text blocks. Then, an iterative updating method is used to optimize the page segmentation probability to obtain the final result. This new algorithm shows a great improvement over our previous algorithm. The training and testing data set for the algorithm includes 1,125 document pages having 518 table entities and a total of 10,934 cell entities. Compared with our previous work, it raised the accuracy rate to 95.67% from 90.32% and to 97.05% from 92.04%.

AB - In this paper, we define the table detection problem as a probability optimization problem. We begin, as in our previous algorithm, by finding and validating each detected table candidate. We then compute a set of probability measurements for each of the table entities. The computation of the probability measurements takes into consideration tables, table text separators, and table neighboring text blocks. Then, an iterative updating method is used to optimize the page segmentation probability to obtain the final result. This new algorithm shows a great improvement over our previous algorithm. The training and testing data set for the algorithm includes 1,125 document pages having 518 table entities and a total of 10,934 cell entities. Compared with our previous work, it raised the accuracy rate to 95.67% from 90.32% and to 97.05% from 92.04%.

UR - http://www.scopus.com/inward/record.url?scp=25144520914&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=25144520914&partnerID=8YFLogxK

U2 - 10.1007/3-540-45869-7_31

DO - 10.1007/3-540-45869-7_31

M3 - Conference contribution

AN - SCOPUS:25144520914

SN - 3540440682

SN - 9783540440680

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 272

EP - 282

BT - Document Analysis Systems V - 5th International Workshop, DAS 2002, Proceedings

A2 - Lopresti, Daniel

A2 - Hu, Jianying

A2 - Kashi, Ramanujan

PB - Springer Verlag

T2 - 5th International Workshop on Document Analysis Systems, DAS 2002

Y2 - 19 August 2002 through 21 August 2002

ER -