Identifying relevant databases for multidatabase mining

Huan Liu, Hongjun Lu, Jun Yao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

35 Citations (Scopus)

Abstract

Various tools and systems for knowledge discovery and data mining are developed and available for applications. However, when we are immersed in heaps of databases, an immediate question facing practitioners is where we should start mining. In this paper, breaking away from the conventional data mining assumption that many databases be joined into one, we argue that the first step for multidatabase mining is to identify databases that are most likely relevant to an application; without doing so, the mining process can be lengthy, aimless and ineffective. A relevance measure is thus proposed to identify relevant databases for mining tasks with an objective to find patterns or regularities about certain attributes. An efficient implementation for identifying relevant databases is described. Experiments are conducted to validate the measure’s performance and to show its promising applications.

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 210-221
Number of pages: 12
Volume: 1394
ISBN (Print): 3540643834, 9783540643838
DOI: https://doi.org/10.1007/3-540-64383-4_18
State: Published - 1998
Externally published: Yes
Event: 2nd Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 1998 - Melbourne, Australia
Duration: Apr 15 1998 to Apr 17 1998

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1394
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2nd Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 1998
Country: Australia
City: Melbourne
Period: 4/15/98 to 4/17/98

Fingerprint

Mining
Data mining
Process Mining
Heap
Knowledge Discovery
Efficient Implementation
Performance Measures
Regularity
Likely
Attribute
Experiment

Keywords

  • Data mining
  • Multiple databases
  • Query
  • Relevance measure

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Liu, H., Lu, H., & Yao, J. (1998). Identifying relevant databases for multidatabase mining. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1394, pp. 210-221). Springer Verlag. https://doi.org/10.1007/3-540-64383-4_18

@inproceedings{8bfbb1af325043798e6f80475a57fc85,
title = "Identifying relevant databases for multidatabase mining",
abstract = "Various tools and systems for knowledge discovery and data mining are developed and available for applications. However, when we are immersed in heaps of databases, an immediate question facing practitioners is where we should start mining. In this paper, breaking away from the conventional data mining assumption that many databases be joined into one, we argue that the first step for multidatabase mining is to identify databases that are most likely relevant to an application; without doing so, the mining process can be lengthy, aimless and ineffective. A relevance measure is thus proposed to identify relevant databases for mining tasks with an objective to find patterns or regularities about certain attributes. An efficient implementation for identifying relevant databases is described. Experiments are conducted to validate the measure’s performance and to show its promising applications.",
keywords = "Data mining, Multiple databases, Query, Relevance measure",
author = "Huan Liu and Hongjun Lu and Jun Yao",
year = "1998",
doi = "10.1007/3-540-64383-4_18",
language = "English (US)",
isbn = "3540643834",
volume = "1394",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "210--221",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN
T1 - Identifying relevant databases for multidatabase mining
AU - Liu, Huan
AU - Lu, Hongjun
AU - Yao, Jun
PY - 1998
Y1 - 1998
AB - Various tools and systems for knowledge discovery and data mining are developed and available for applications. However, when we are immersed in heaps of databases, an immediate question facing practitioners is where we should start mining. In this paper, breaking away from the conventional data mining assumption that many databases be joined into one, we argue that the first step for multidatabase mining is to identify databases that are most likely relevant to an application; without doing so, the mining process can be lengthy, aimless and ineffective. A relevance measure is thus proposed to identify relevant databases for mining tasks with an objective to find patterns or regularities about certain attributes. An efficient implementation for identifying relevant databases is described. Experiments are conducted to validate the measure’s performance and to show its promising applications.
KW - Data mining
KW - Multiple databases
KW - Query
KW - Relevance measure
UR - http://www.scopus.com/inward/record.url?scp=84958983792&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958983792&partnerID=8YFLogxK
U2 - 10.1007/3-540-64383-4_18
DO - 10.1007/3-540-64383-4_18
M3 - Conference contribution
AN - SCOPUS:84958983792
SN - 3540643834
SN - 9783540643838
VL - 1394
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 210
EP - 221
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB - Springer Verlag
ER -