CuTeX: A system for extracting data from text tables

Hasan Davulcu, Saikat Mukherjee, Arvind Seth, I. V. Ramakrishnan

Research output: Contribution to journal › Article

Abstract

A system for extracting data from irregular text tables is designed and implemented. The system, CuTeX, is based on an association between all the items in a column. It is implemented in Java in approximately 3000 lines of code. The system automatically partitions the set of input text tables into directories containing correct and incorrect extractions. The demonstration focuses on illustrating the robustness of the clustering algorithm and the iterative process of improving its extraction yield.

Original language: English (US)
Pages (from-to): 457
Number of pages: 1
Journal: SIGIR Forum (ACM Special Interest Group on Information Retrieval)
State: Published - 2002
Externally published: Yes

Fingerprint

Clustering algorithms
Demonstrations
Java
Robustness

ASJC Scopus subject areas

  • Management Information Systems
  • Hardware and Architecture

Cite this

CuTeX: A system for extracting data from text tables. / Davulcu, Hasan; Mukherjee, Saikat; Seth, Arvind; Ramakrishnan, I. V.

In: SIGIR Forum (ACM Special Interest Group on Information Retrieval), 2002, p. 457.


@article{b33062980d394145909a78e36a30c0fe,
title = "CuTeX: A system for extracting data from text tables",
abstract = "A system for extracting data from irregular text tables is designed and implemented. The system, CuTeX, is based on an association between all the items in a column. It is implemented in Java in approximately 3000 lines of code. The system automatically partitions the set of input text tables into directories containing correct and incorrect extractions. The demonstration focuses on illustrating the robustness of the clustering algorithm and the iterative process of improving its extraction yield.",
author = "Hasan Davulcu and Saikat Mukherjee and Arvind Seth and Ramakrishnan, {I. V.}",
year = "2002",
language = "English (US)",
pages = "457",
journal = "SIGIR Forum (ACM Special Interest Group on Information Retrieval)",
issn = "0163-5840",
publisher = "Association for Computing Machinery (ACM)",

}

TY - JOUR

T1 - CuTeX

T2 - A system for extracting data from text tables

AU - Davulcu, Hasan

AU - Mukherjee, Saikat

AU - Seth, Arvind

AU - Ramakrishnan, I. V.

PY - 2002

Y1 - 2002

N2 - A system for extracting data from irregular text tables is designed and implemented. The system, CuTeX, is based on an association between all the items in a column. It is implemented in Java in approximately 3000 lines of code. The system automatically partitions the set of input text tables into directories containing correct and incorrect extractions. The demonstration focuses on illustrating the robustness of the clustering algorithm and the iterative process of improving its extraction yield.


UR - http://www.scopus.com/inward/record.url?scp=0036989535&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036989535&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0036989535

SP - 457

JO - SIGIR Forum (ACM Special Interest Group on Information Retrieval)

JF - SIGIR Forum (ACM Special Interest Group on Information Retrieval)

SN - 0163-5840

ER -