Toward multidatabase mining: Identifying relevant databases

Huan Liu, Hongjun Lu, Jun Yao

Research output: Contribution to journal › Article

38 Scopus citations

Abstract

Various tools and systems for knowledge discovery and data mining are developed and available for applications. However, when we are immersed in heaps of databases, an immediate question is where we should start mining. It is not true that the more databases, the better for data mining. It is only true when the databases involved are relevant to a task at hand. In this paper, breaking away from the conventional data mining assumption that many databases be joined into one, we argue that the first step for multidatabase mining is to identify databases that are most likely relevant to an application; without doing so, the mining process can be lengthy, aimless, and ineffective. A measure of relevance is thus proposed for mining tasks with an objective of finding patterns or regularities about certain attributes. An efficient algorithm for identifying relevant databases is described. Experiments are conducted to verify the measure's performance and to exemplify its application.

Original language: English (US)
Pages (from-to): 541-553
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 13
Issue number: 4
State: Published - Jul 1 2001

Keywords

  • Data mining
  • Multiple databases
  • Query
  • Relevance measure

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
