Table structure understanding and its performance evaluation

Yalin Wang, Ihsin T. Phillips, Robert M. Haralick

Research output: Contribution to journal › Article

31 Citations (Scopus)

Abstract

This paper presents a table structure understanding algorithm designed using optimization methods. The algorithm is probability-based: the probabilities are estimated from geometric measurements made on the various entities in a large training set. The methodology includes a global parameter optimization scheme, a novel automatic table ground-truth generation system, and a table structure understanding performance evaluation protocol. On a document data set containing 518 table entities and 10,934 cell entities, the algorithm achieved an accuracy rate of 96.76% at the cell level and 98.32% at the table level.
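The abstract says only that the probabilities are estimated from geometric measurements on a training set, and it lists non-parametric statistical modeling as a keyword; it does not say which measurements or which estimator are used. As a rough illustration of that general idea, and not the authors' algorithm, the sketch below assumes a single hypothetical feature (the vertical gap between two adjacent text lines), estimates smoothed histograms of it for line pairs inside the same cell and across different cells, and combines them with Bayes' rule. The function names, bin edges, smoothing constant and training values are all made up for the example.

```python
from bisect import bisect_right

def bin_index(x, edges):
    """Map measurement x to a histogram bin; values outside the edges go to the end bins."""
    i = bisect_right(edges, x) - 1
    return min(max(i, 0), len(edges) - 2)

def fit_histogram(samples, edges, alpha=1.0):
    """Non-parametric estimate of P(bin) from training measurements.

    Additive (Laplace) smoothing with pseudo-count `alpha` keeps every bin
    probability above zero, so an unseen gap size never yields a 0/0 posterior.
    """
    counts = [alpha] * (len(edges) - 1)
    for x in samples:
        counts[bin_index(x, edges)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def posterior_same_cell(gap, p_same, p_diff, prior_same, edges):
    """Bayes' rule: P(same cell | gap bin) from the two class-conditional histograms."""
    i = bin_index(gap, edges)
    num = p_same[i] * prior_same
    return num / (num + p_diff[i] * (1.0 - prior_same))

# Hypothetical training data: vertical gaps (in points) between adjacent text
# lines that do / do not belong to the same table cell. Made up for illustration.
EDGES = [0, 2, 4, 6, 8, 10]
same_cell_gaps = [1.0, 1.5, 2.5, 3.0, 1.2, 2.8]
diff_cell_gaps = [6.5, 7.0, 8.2, 9.0, 5.5, 7.7]

p_same = fit_histogram(same_cell_gaps, EDGES)
p_diff = fit_histogram(diff_cell_gaps, EDGES)

for gap in (1.0, 5.0, 7.0):
    print(gap, round(posterior_same_cell(gap, p_same, p_diff, 0.5, EDGES), 3))
```

With these made-up data and a prior of 0.5, the script prints posteriors of 0.8, 0.333 and 0.2 for gaps of 1.0, 5.0 and 7.0. A histogram is used here only because it is the simplest non-parametric estimator; a kernel density estimate could play the same role.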

Original language: English (US)
Pages (from-to): 1479-1497
Number of pages: 19
Journal: Pattern Recognition
Volume: 37
Issue number: 7
DOI: 10.1016/j.patcog.2004.01.012
State: Published - Jul 2004
Externally published: Yes

Keywords

  • Document image analysis
  • Document layout analysis
  • Non-parametric statistical modeling
  • Optimization
  • Pattern recognition
  • Performance evaluation
  • Table structure understanding

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Table structure understanding and its performance evaluation. / Wang, Yalin; Phillips, Ihsin T.; Haralick, Robert M. In: Pattern Recognition, Vol. 37, No. 7, 07.2004, p. 1479-1497.

Wang, Yalin; Phillips, Ihsin T.; Haralick, Robert M. / Table structure understanding and its performance evaluation. In: Pattern Recognition. 2004; Vol. 37, No. 7. pp. 1479-1497.

@article{5740f11f7fed49b9a0daec0203762bbc,
title = "Table structure understanding and its performance evaluation",
abstract = "This paper presents a table structure understanding algorithm designed using optimization methods. The algorithm is probability based, where the probabilities are estimated from geometric measurements made on the various entities in a large training set. The methodology includes a global parameter optimization scheme, a novel automatic table ground truth generation system and a table structure understanding performance evaluation protocol. With a document data set having 518 table and 10,934 cell entities, it performed at the 96.76{\%} accuracy rate on the cell level and 98.32{\%} accuracy rate on the table level.",
keywords = "Document image analysis, Document layout analysis, Non-parametric statistical modeling, Optimization, Pattern recognition, Performance evaluation, Table structure understanding",
author = "Wang, Yalin and Phillips, {Ihsin T.} and Haralick, {Robert M.}",
year = "2004",
month = jul,
doi = "10.1016/j.patcog.2004.01.012",
language = "English (US)",
volume = "37",
pages = "1479--1497",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier Limited",
number = "7",

}

TY  - JOUR
T1  - Table structure understanding and its performance evaluation
AU  - Wang, Yalin
AU  - Phillips, Ihsin T.
AU  - Haralick, Robert M.
PY  - 2004/7
Y1  - 2004/7
AB  - This paper presents a table structure understanding algorithm designed using optimization methods. The algorithm is probability based, where the probabilities are estimated from geometric measurements made on the various entities in a large training set. The methodology includes a global parameter optimization scheme, a novel automatic table ground truth generation system and a table structure understanding performance evaluation protocol. With a document data set having 518 table and 10,934 cell entities, it performed at the 96.76% accuracy rate on the cell level and 98.32% accuracy rate on the table level.
KW  - Document image analysis
KW  - Document layout analysis
KW  - Non-parametric statistical modeling
KW  - Optimization
KW  - Pattern recognition
KW  - Performance evaluation
KW  - Table structure understanding
UR  - http://www.scopus.com/inward/record.url?scp=2442434418&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=2442434418&partnerID=8YFLogxK
U2  - 10.1016/j.patcog.2004.01.012
DO  - 10.1016/j.patcog.2004.01.012
M3  - Article
AN  - SCOPUS:2442434418
VL  - 37
SP  - 1479
EP  - 1497
JO  - Pattern Recognition
JF  - Pattern Recognition
SN  - 0031-3203
IS  - 7
ER  -