Abstract
This paper describes an algorithm for determining the content type of a given zone within a document image. We take a statistics-based approach and represent each zone with a 25-dimensional feature vector. An optimized decision tree classifier assigns each zone to one of nine zone content classes. A performance evaluation protocol is proposed. The training and testing data sets include a total of 24,177 zones from the University of Washington English Document Image Database III. The algorithm's accuracy is 98.45%, with a mean false alarm rate of 0.50%.
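To make the two reported figures concrete, the following minimal sketch computes an accuracy and a mean false alarm rate from a confusion table. It is not the paper's evaluation protocol: the false alarm definition (zones wrongly assigned to a class, divided by all zones that truly belong to other classes) is an assumption, the function names are hypothetical, and the three-class table contains made-up counts rather than the nine classes and 24,177 zones of the actual experiment.

```python
def accuracy(confusion):
    """Fraction of zones whose assigned class equals the true class."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def mean_false_alarm_rate(confusion):
    """Average of the per-class false alarm rates.

    Assumed definition (not taken from the paper): for class c, the number
    of zones from other classes assigned to c, divided by the number of
    zones whose true class is not c.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    rates = []
    for c in range(n):
        true_c = sum(confusion[c])                        # zones truly in class c
        assigned_c = sum(confusion[r][c] for r in range(n))  # zones assigned to c
        false_alarms = assigned_c - confusion[c][c]
        negatives = total - true_c
        rates.append(false_alarms / negatives if negatives else 0.0)
    return sum(rates) / n


# Toy table: rows are true classes, columns are assigned classes.
# The counts are illustrative only.
toy = [
    [90, 5, 5],
    [2, 46, 2],
    [1, 1, 48],
]

print(f"accuracy: {accuracy(toy):.4f}")
print(f"mean false alarm rate: {mean_false_alarm_rate(toy):.4f}")
```

Averaging per-class rates gives every class equal weight regardless of how many zones it contains, which is one plausible reading of a "mean" false alarm rate; a zone-weighted average would be an equally valid alternative under a different protocol.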
Original language | English (US) |
---|---|
Pages (from-to) | 57-73 |
Number of pages | 17 |
Journal | Pattern Recognition |
Volume | 39 |
Issue number | 1 |
State | Published - Jan 2006 |
Externally published | Yes |
Keywords
- Background analysis
- Decision tree classifier
- Document image analysis
- Document layout analysis
- Pattern recognition
- Viterbi algorithm
- Zone content classification
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence