Multi-hierarchy documents clustering based on LSA space dimensionality character

Yunfeng Liu, Huan Qi, Xiang'en Hu, Zhiqiang Cai, Jianmin Dai

Research output: Contribution to journal, Article, peer-review

3 Scopus citations

Abstract

The statistical characteristics of the dimensions of a latent semantic analysis (LSA) space were studied in order to cluster documents automatically at different concept levels. The study concludes that dimensions corresponding to larger singular values describe what semantic elements have in common, while dimensions corresponding to smaller singular values describe how they differ. This suggests a latent relation between the dimensions of the LSA space and concept granularity in natural language. Different dimensions of the LSA space are therefore selected for clustering documents at a given concept granularity. Experimental results agree with this idea. In addition, in LSA-based document clustering, better clustering precision is obtained by clustering the row vectors of the document self-indexing matrix rather than the low-dimensional document vectors.
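The relation between retained dimensions and concept granularity can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the document self-indexing matrix, so the sketch assumes it is the document-by-document cosine similarity matrix computed in the truncated LSA space, and it assumes that "different dimensions" means keeping the `k` leading singular dimensions. The function name `lsa_document_views` and the toy term-document matrix are hypothetical.

```python
import numpy as np

def lsa_document_views(term_doc, k):
    """Return (doc_vectors, self_index) for a rank-k LSA truncation.

    term_doc:    terms x documents count matrix.
    doc_vectors: documents x k coordinates in the LSA space.
    self_index:  documents x documents cosine similarities
                 (ASSUMED definition of the self-indexing matrix).
    """
    a = np.asarray(term_doc, dtype=float)
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    # Scale each retained dimension by its singular value.
    doc_vectors = vt[:k].T * s[:k]
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    unit = doc_vectors / np.where(norms == 0, 1.0, norms)
    self_index = unit @ unit.T
    return doc_vectors, self_index

# Hypothetical toy corpus: documents 0-1 share two terms, documents 2-3
# share two other terms, and the last term occurs in every document.
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [1, 1, 1, 1],
])

# Largest singular dimension only: every document looks alike.
coarse_vecs, coarse_sim = lsa_document_views(term_doc, k=1)
# Adding the next dimension separates the two groups.
fine_vecs, fine_sim = lsa_document_views(term_doc, k=2)

print(np.round(coarse_sim, 3))
print(np.round(fine_sim, 3))
```

Under these assumptions, each row of `self_index` describes one document by its similarity to every other document. Passing those rows, instead of the rows of `doc_vectors`, to a clustering routine corresponds to the alternative the abstract reports as more precise.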

Original language: English (US)
Pages (from-to): 1783-1786
Number of pages: 4
Journal: Qinghua Daxue Xuebao/Journal of Tsinghua University
Volume: 45
Issue number: SUPPL.
State: Published - Sep 1 2005
Externally published: Yes

Keywords

  • Concept granularity
  • Document clustering
  • Document self-indexing matrix
  • Information processing
  • Latent semantic analysis

ASJC Scopus subject areas

  • General Engineering
  • Computer Science Applications
  • Applied Mathematics
