CP/CV: Concept similarity mining without frequency information from domain describing taxonomies

Jong Wook Kim, Kasim Candan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

41 Scopus citations

Abstract

Domain-specific ontologies are heavily used in many applications. For instance, they form the basis on which similarity/dissimilarity between keywords is extracted for various knowledge discovery and retrieval tasks. Existing similarity computation schemes can be categorized as (a) structure-based or (b) information-based approaches. Structure-based approaches compute dissimilarity between keywords using a (weighted) count of edges between two keywords. Information-based approaches, on the other hand, leverage available corpora to extract additional information, such as keyword frequency, to achieve better performance in similarity computation than structure-based approaches. Unfortunately, in many application domains (such as applications that rely on unique keys in a relational database), the frequency information required by information-based approaches does not exist. In this paper, we note that there is a third way of computing similarity: if each node in a given hierarchy can be represented as a vector of related concepts, these vectors can be compared to compute similarities. This requires mapping the concept-nodes in a given hierarchy onto a concept space. We propose a concept propagation (CP) scheme, which relies on the semantic relationships between concepts implied by the structure of the hierarchy to annotate each concept-node with a concept-vector (CV). We refer to this approach as CP/CV. A comparison of keyword similarity results shows that CP/CV provides significantly better results (up to 33%) than existing structure-based schemes. Also, even though CP/CV does not assume the availability of an appropriate corpus from which to extract keyword frequency information, our approach matches (and slightly improves on) the performance of information-based approaches.
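The abstract contrasts edge counting with comparing concept-vectors but does not spell out the propagation rule, so the following Python sketch is only a hypothetical illustration, not the authors' CP algorithm. It assumes a toy taxonomy (the PARENT table), a made-up propagation rule in which every node starts with weight 1 on its own concept and, for a fixed number of rounds, adds a decayed copy of its neighbours' vectors (decay=0.5, rounds=2), and cosine similarity as the way the vectors are compared. All names (neighbors, edge_distance, propagate, cosine) are illustrative.

```python
from collections import deque
import math

# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

def neighbors(parent):
    """Build an undirected adjacency map from a child->parent table."""
    adj = {n: set() for n in parent}
    for child, par in parent.items():
        if par is not None:
            adj[child].add(par)
            adj[par].add(child)
    return adj

def edge_distance(adj, a, b):
    """Structure-based baseline: number of edges on the shortest path (BFS)."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return math.inf

def propagate(adj, rounds=2, decay=0.5):
    """Assumed propagation rule (not the paper's): each node starts with weight 1
    on its own concept; each round it adds decay * (each neighbour's old vector)."""
    vecs = {n: {n: 1.0} for n in adj}
    for _ in range(rounds):
        new = {}
        for n in adj:
            v = dict(vecs[n])  # copy so this round only reads old vectors
            for m in adj[n]:
                for concept, w in vecs[m].items():
                    v[concept] = v.get(concept, 0.0) + decay * w
            new[n] = v
        vecs = new
    return vecs

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

if __name__ == "__main__":
    adj = neighbors(PARENT)
    vecs = propagate(adj)
    print(edge_distance(adj, "dog", "cat"))  # 2
    print(edge_distance(adj, "dog", "car"))  # 4
    # Siblings share many propagated concepts; dog and car share only "entity".
    print(cosine(vecs["dog"], vecs["cat"]), cosine(vecs["dog"], vecs["car"]))
```

Unlike the integer edge count, the propagated vectors yield a graded score: two siblings such as dog and cat share their own, their parent's and the root's concepts, while dog and car overlap only on the root, so the first pair scores much higher.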

Original language: English (US)
Title of host publication: Proceedings of the 15th ACM Conference on Information and Knowledge Management, CIKM 2006
Pages: 483-492
Number of pages: 10
State: Published - Dec 1 2006
Event: 15th ACM Conference on Information and Knowledge Management, CIKM 2006 - Arlington, VA, United States
Duration: Nov 6 2006 - Nov 11 2006

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 15th ACM Conference on Information and Knowledge Management, CIKM 2006
Country/Territory: United States
City: Arlington, VA
Period: 11/6/06 - 11/11/06

Keywords

  • Concept hierarchies
  • Concept propagation
  • Mining keyword similarities

ASJC Scopus subject areas

  • General Decision Sciences
  • General Business, Management and Accounting
