Document zone content classification and its performance evaluation

Yalin Wang, Ihsin T. Phillips, Robert M. Haralick

Research output: Contribution to journal, Article, peer-review

51 Scopus citations

Abstract

This paper describes an algorithm for determining the content type of a given zone within a document image. We take a statistics-based approach and represent each zone with a 25-dimensional feature vector. An optimized decision tree classifier is used to classify each zone into one of nine zone content classes. A performance evaluation protocol is proposed. The training and testing data sets include a total of 24,177 zones from the University of Washington English Document Image Database III. The algorithm achieves an accuracy of 98.45% with a mean false alarm rate of 0.50%.
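As a rough illustration of the two figures quoted in the abstract, the sketch below computes overall accuracy and a mean false alarm rate from a multi-class confusion matrix. The function name `evaluate`, the one-vs-rest definition of false alarm rate (false positives divided by all zones not belonging to the class), and the toy counts are assumptions made for illustration; the paper's own evaluation protocol may define these quantities differently.

```python
def evaluate(confusion):
    """Return (accuracy, mean false alarm rate) for a square confusion matrix.

    confusion[i][j] is the number of zones whose true class is i and
    whose assigned class is j. The one-vs-rest false alarm definition
    used here is an assumption, not taken from the paper.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    accuracy = correct / total

    false_alarm_rates = []
    for k in range(n):
        # Zones assigned to class k whose true class is something else.
        false_pos = sum(confusion[i][k] for i in range(n) if i != k)
        # All zones whose true class is not k.
        negatives = total - sum(confusion[k])
        false_alarm_rates.append(false_pos / negatives if negatives else 0.0)

    mean_far = sum(false_alarm_rates) / n
    return accuracy, mean_far


# Toy 3-class example with made-up counts (not data from the paper).
toy = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
accuracy, mean_far = evaluate(toy)
print(f"accuracy = {accuracy:.4f}, mean false alarm rate = {mean_far:.4f}")
```

For the nine zone content classes mentioned in the abstract, the same function would take a 9 x 9 matrix of counts.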

Original language: English (US)
Pages (from-to): 57-73
Number of pages: 17
Journal: Pattern Recognition
Volume: 39
Issue number: 1
State: Published - Jan 1 2006
Externally published: Yes

Keywords

  • Background analysis
  • Decision tree classifier
  • Document image analysis
  • Document layout analysis
  • Pattern recognition
  • Viterbi algorithm
  • Zone content classification

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
