Similarity detection among data files - a machine learning approach

M. Dash, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In any database, description files are essential for understanding the data files it contains. However, it is not uncommon to be left with data files that have no description file. One example is the aftermath of a system crash; other examples arise from security problems. Manually determining the subject of a data file can be difficult and tedious, particularly when files look alike. In a large survey database, for instance, data files that look alike may actually relate to different subjects. Two data files on the same subject will probably have similar semantic structures of attributes. We first detect the similarity between two attributes. We then create clusters of attributes to compare the similarity of the subjects of two data files. Finally, a machine learning technique is used to predict the subject of unseen data files.
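The three stages named in the abstract (attribute similarity, attribute clustering, subject prediction) can be illustrated with a minimal sketch. The abstract does not state which features, similarity measure, clustering scheme, or learner the authors used, so everything below is an assumption made for illustration: the three-number column signature, the distance-based similarity, the greedy threshold clustering, the Jaccard file similarity, and the nearest-neighbour predictor are all hypothetical stand-ins, as are the function names and the toy data.

```python
from math import sqrt

def is_number(text):
    """True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def signature(values):
    """Summarise a non-empty column of strings as three numbers in [0, 1].
    The choice of features is an assumption, not taken from the paper."""
    n = len(values)
    return (
        sum(is_number(v) for v in values) / n,              # numeric fraction
        len(set(values)) / n,                               # distinct ratio
        min(sum(len(v) for v in values) / n / 20.0, 1.0),   # scaled mean length
    )

def attribute_similarity(sig_a, sig_b):
    """1 minus the normalised Euclidean distance between two signatures."""
    dist = sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))
    return 1.0 - dist / sqrt(len(sig_a))

def assign_cluster(sig, centres, threshold=0.85):
    """Greedy clustering: join the first centre that is similar enough,
    otherwise start a new cluster. Mutates `centres`."""
    for index, centre in enumerate(centres):
        if attribute_similarity(sig, centre) >= threshold:
            return index
    centres.append(sig)
    return len(centres) - 1

def file_profile(columns, centres):
    """Represent a data file by the set of attribute clusters it touches."""
    return {assign_cluster(signature(col), centres) for col in columns}

def file_similarity(profile_a, profile_b):
    """Jaccard overlap between two cluster profiles."""
    union = profile_a | profile_b
    return len(profile_a & profile_b) / len(union) if union else 0.0

def predict_subject(columns, labelled, centres):
    """Nearest-neighbour stand-in for the 'machine learning technique':
    return the subject of the most similar labelled file."""
    profile = file_profile(columns, centres)
    return max(labelled, key=lambda item: file_similarity(profile, item[1]))[0]

# Toy data (hypothetical): two labelled files and one unseen file.
centres = []
labelled = [
    ("payroll", file_profile([["alice", "bob", "carol", "dave"],
                              ["52000", "61000", "58000", "70000"]], centres)),
    ("survey", file_profile([["yes", "no", "yes", "yes"],
                             ["34", "34", "51", "34"]], centres)),
]
unseen = [["erin", "frank", "gina", "hal"],
          ["48000", "75000", "66000", "59000"]]

print(predict_subject(unseen, labelled, centres))
```

Representing each file by the clusters its attributes fall into means two files can be compared even when their column names and column order differ, which is the situation the abstract describes when description files are missing.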

Original language: English (US)
Title of host publication: Proceedings of the IEEE Knowledge & Data Engineering Exchange Workshop, KDEX
Editors: Anon
Place of Publication: Piscataway, NJ, United States
Publisher: IEEE
Pages: 172-179
Number of pages: 8
State: Published - 1997
Externally published: Yes
Event: Proceedings of the 1997 IEEE Knowledge & Data Engineering Exchange Workshop, KDEX - Newport Beach, CA, USA
Duration: Nov 4 1997 → Nov 4 1997

Other

Other: Proceedings of the 1997 IEEE Knowledge & Data Engineering Exchange Workshop, KDEX
City: Newport Beach, CA, USA
Period: 11/4/97 → 11/4/97

Fingerprint

Learning systems
Semantics

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Dash, M., & Liu, H. (1997). Similarity detection among data files - a machine learning approach. In Anon (Ed.), Proceedings of the IEEE Knowledge & Data Engineering Exchange Workshop, KDEX (pp. 172-179). Piscataway, NJ, United States: IEEE.

@inproceedings{8a649982f80f4849b38fd71386b79017,
title = "Similarity detection among data files - a machine learning approach",
abstract = "In any database, description files are essential to understand the data files in it. However, it is not uncommon that one is left with data files without any description file. An example is the aftermath of a system crash; other examples are related to security problems. Manual determination of the subject of a data file can be a difficult and tedious task particularly if files are look-alike. An example is a big survey database where data files that look alike are actually related to different subjects. Two data files on the same subject will probably have similar semantic structures of attributes. We detect the similarity between two attributes. Then we create clusters of attributes to compare the similarity of the subjects of two data files. And finally a machine learning technique is used to predict the subject of unseen data files.",
author = "M. Dash and Huan Liu",
year = "1997",
language = "English (US)",
pages = "172--179",
editor = "Anon",
booktitle = "Proceedings of the IEEE Knowledge \& Data Engineering Exchange Workshop, KDEX",
publisher = "IEEE",
}

TY  - GEN
T1  - Similarity detection among data files - a machine learning approach
AU  - Dash, M.
AU  - Liu, Huan
PY  - 1997
Y1  - 1997
N2  - In any database, description files are essential to understand the data files in it. However, it is not uncommon that one is left with data files without any description file. An example is the aftermath of a system crash; other examples are related to security problems. Manual determination of the subject of a data file can be a difficult and tedious task particularly if files are look-alike. An example is a big survey database where data files that look alike are actually related to different subjects. Two data files on the same subject will probably have similar semantic structures of attributes. We detect the similarity between two attributes. Then we create clusters of attributes to compare the similarity of the subjects of two data files. And finally a machine learning technique is used to predict the subject of unseen data files.
UR  - http://www.scopus.com/inward/record.url?scp=0031342641&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0031342641&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:0031342641
SP  - 172
EP  - 179
BT  - Proceedings of the IEEE Knowledge & Data Engineering Exchange Workshop, KDEX
A2  - Anon
PB  - IEEE
CY  - Piscataway, NJ, United States
ER  -