Abstract
A system for extracting data from irregular text tables is designed and implemented. This system, CuteX, is based on an association between every item in a column. It is implemented in Java and comprises approximately 3000 lines of code. The system automatically partitions the set of input text tables into directories containing correct and incorrect extractions. This paper focuses on demonstrating the robustness of the clustering algorithm and the iterative process of improving its extraction yield.
Original language | English (US) |
---|---|
Number of pages | 1 |
Journal | SIGIR Forum (ACM Special Interest Group on Information Retrieval) |
State | Published - Dec 1 2002 |
Externally published | Yes |
Event | Proceedings of the Twenty-Fifth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - Tampere, Finland; Duration: Aug 11 2002 → Aug 15 2002 |
ASJC Scopus subject areas
- Management Information Systems
- Hardware and Architecture