Propagation-vectors for trees (PVT): Concise yet effective summaries for hierarchical data and trees

Venkata S. Cherukuri, Kasim Candan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Summarization of hierarchical data and metadata is a fundamental operation in applications across many domains. In particular, similarity search of hierarchical data, such as XML, would benefit greatly from concise and indexable summaries. This is especially true in P2P scenarios, where the search needs to be done in a distributed fashion on multiple peers. This situation requires summaries that are small, yet effective in identifying potential peers that need to be further explored. In this paper, we propose a method, called propagation-vectors for trees (PVT), which constructs very concise and accurate summaries of hierarchical data, such as XML trees. We then show how to use this summary to perform similarity search on summarized data. The proposed summarization scheme relies on a label-propagation mechanism, which constructs an n-dimensional vector from a given tree with n unique data labels. Experimental results show that the constructed PVT summaries capture the structure of the input trees very accurately, that the representations are highly concise, and that search based on these summaries is faster than existing approaches.
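
The abstract does not spell out the propagation rule, so the following is only a minimal illustrative sketch of the general idea: a labeled tree is mapped to a vector with one dimension per unique label, and two trees are compared through their vectors. The `(label, children)` tree encoding, the depth-decayed weighting in `summarize`, the `decay` parameter, and the use of cosine similarity are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict

# Assumed encoding: a tree node is a tuple (label, [child nodes]).


def summarize(tree, decay=0.5):
    """Map a labeled tree to a sparse vector keyed by unique label.

    Hypothetical stand-in rule (not the paper's propagation mechanism):
    a node at depth d adds decay**d to the dimension of its label.
    """
    vec = defaultdict(float)
    stack = [(tree, 0)]
    while stack:
        (label, children), depth = stack.pop()
        vec[label] += decay ** depth
        for child in children:
            stack.append((child, depth + 1))
    return dict(vec)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two small XML-like trees sharing most labels, and one unrelated tree.
doc_a = ("book", [("title", []), ("author", [("name", [])])])
doc_b = ("book", [("title", []), ("author", [])])
doc_c = ("order", [("item", []), ("price", [])])

vec_a, vec_b, vec_c = summarize(doc_a), summarize(doc_b), summarize(doc_c)
print(len(vec_a))             # one dimension per unique label in doc_a
print(cosine(vec_a, vec_b))   # similar trees score higher
print(cosine(vec_a, vec_c))   # trees with disjoint labels score 0.0
```

Because each summary has at most n entries for a tree with n unique labels, a peer could in principle advertise such a vector instead of its full documents; how PVT actually weights and propagates labels is described in the paper itself.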

Original language: English (US)
Title of host publication: Proceedings of the 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval, LSDS-IR'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08
Pages: 3-10
Number of pages: 8
State: Published - 2008
Event: 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval, LSDS-IR'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08 - Napa Valley, CA, United States
Duration: Oct 26 2008 to Oct 30 2008

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval, LSDS-IR'08, Co-located with the 17th ACM Conference on Information and Knowledge Management, CIKM'08
Country/Territory: United States
City: Napa Valley, CA
Period: 10/26/08 to 10/30/08

ASJC Scopus subject areas

  • General Decision Sciences
  • General Business, Management and Accounting
