Multi-hierarchy documents clustering based on LSA space dimensionality character

Yunfeng Liu, Huan Qi, Xiang'en Hu, Zhiqiang Cai, Jianmin Dai

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

The statistical characteristics of the dimensions of latent semantic analysis (LSA) space were studied in order to cluster documents automatically at different concept levels. The study concludes that dimensions corresponding to larger singular values describe what semantic elements have in common, while dimensions corresponding to smaller singular values describe how they differ. This suggests a latent relation between the dimensions of LSA space and concept granularity in natural language, so different dimensions of LSA space are selected to cluster documents at a given concept granularity. Experimental results agree well with this idea. In addition, in LSA-based document clustering, better clustering precision is obtained by clustering the row vectors of the document self-indexing matrix rather than the low-dimensional document vectors.
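The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the "document self-indexing matrix", so the code assumes it is the document-by-document cosine similarity matrix computed in the selected LSA dimensions. The function names, the toy term-document matrix, and the small deterministic k-means routine are all hypothetical choices made for illustration.

```python
import numpy as np


def lsa_document_vectors(term_doc, dims):
    """Project documents onto the chosen singular dimensions of LSA space."""
    # term_doc is terms x documents, factored as U @ diag(s) @ Vt.
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    dims = np.asarray(dims)
    # Scale each document's coordinates by the matching singular value.
    return (vt[dims] * s[dims][:, None]).T  # documents x len(dims)


def self_indexing_matrix(doc_vectors):
    """Assumed definition: cosine similarity of each document to every document."""
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    unit = doc_vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T  # documents x documents


def kmeans(points, k, iters=20):
    """Plain k-means with farthest-first initialisation (deterministic)."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(iters):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Toy corpus: documents 0-2 share terms 0-2, documents 3-5 share terms 3-5.
term_doc = np.array(
    [
        [2, 1, 1, 0, 0, 0],
        [1, 2, 1, 0, 0, 0],
        [1, 1, 2, 0, 0, 0],
        [0, 0, 0, 3, 1, 1],
        [0, 0, 0, 1, 3, 1],
        [0, 0, 0, 1, 1, 3],
    ],
    dtype=float,
)

# Coarse concept level: keep only the two largest singular dimensions.
doc_vecs = lsa_document_vectors(term_doc, dims=[0, 1])
# Cluster the rows of the similarity matrix, not the low-dimensional vectors.
labels = kmeans(self_indexing_matrix(doc_vecs), k=2)
print(labels)
```

Passing a different `dims` list (for example, dimensions with smaller singular values) is how this sketch would move to a finer concept granularity; which dimensions suit which granularity is a question for the paper itself, not something the sketch settles.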

Original language: English (US)
Pages (from-to): 1783-1786
Number of pages: 4
Journal: Qinghua Daxue Xuebao/Journal of Tsinghua University
Volume: 45
Issue number: SUPPL.
State: Published - Sep 1 2005
Externally published: Yes

Keywords

  • Concept granularity
  • Document clustering
  • Document self-indexing matrix
  • Information processing
  • Latent semantic analysis

ASJC Scopus subject areas

  • Engineering(all)
  • Computer Science Applications
  • Applied Mathematics

Cite this

Multi-hierarchy documents clustering based on LSA space dimensionality character. / Liu, Yunfeng; Qi, Huan; Hu, Xiang'en; Cai, Zhiqiang; Dai, Jianmin.

In: Qinghua Daxue Xuebao/Journal of Tsinghua University, Vol. 45, No. SUPPL., 01.09.2005, p. 1783-1786.

@article{b029ec050d4a4c59b764065fa21f3ce4,
title = "Multi-hierarchy documents clustering based on LSA space dimensionality character",
abstract = "The statistical characteristics of the dimensions of latent semantic analysis (LSA) space were studied in order to cluster documents automatically at different concept levels. The study concludes that dimensions corresponding to larger singular values describe what semantic elements have in common, while dimensions corresponding to smaller singular values describe how they differ. This suggests a latent relation between the dimensions of LSA space and concept granularity in natural language, so different dimensions of LSA space are selected to cluster documents at a given concept granularity. Experimental results agree well with this idea. In addition, in LSA-based document clustering, better clustering precision is obtained by clustering the row vectors of the document self-indexing matrix rather than the low-dimensional document vectors.",
keywords = "Concept granularity, Document clustering, Document self-indexing matrix, Information processing, Latent semantic analysis",
author = "Yunfeng Liu and Huan Qi and Xiang'en Hu and Zhiqiang Cai and Jianmin Dai",
year = "2005",
month = "9",
day = "1",
language = "English (US)",
volume = "45",
pages = "1783--1786",
journal = "Qinghua Daxue Xuebao/Journal of Tsinghua University",
issn = "1000-0054",
publisher = "Press of Tsinghua University",
number = "SUPPL.",

}

TY - JOUR

T1 - Multi-hierarchy documents clustering based on LSA space dimensionality character

AU - Liu, Yunfeng

AU - Qi, Huan

AU - Hu, Xiang'en

AU - Cai, Zhiqiang

AU - Dai, Jianmin

PY - 2005/9/1

Y1 - 2005/9/1

AB - The statistical characteristics of the dimensions of latent semantic analysis (LSA) space were studied in order to cluster documents automatically at different concept levels. The study concludes that dimensions corresponding to larger singular values describe what semantic elements have in common, while dimensions corresponding to smaller singular values describe how they differ. This suggests a latent relation between the dimensions of LSA space and concept granularity in natural language, so different dimensions of LSA space are selected to cluster documents at a given concept granularity. Experimental results agree well with this idea. In addition, in LSA-based document clustering, better clustering precision is obtained by clustering the row vectors of the document self-indexing matrix rather than the low-dimensional document vectors.

KW - Concept granularity

KW - Document clustering

KW - Document self-indexing matrix

KW - Information processing

KW - Latent semantic analysis

UR - http://www.scopus.com/inward/record.url?scp=33644836793&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33644836793&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:33644836793

VL - 45

SP - 1783

EP - 1786

JO - Qinghua Daxue Xuebao/Journal of Tsinghua University

JF - Qinghua Daxue Xuebao/Journal of Tsinghua University

SN - 1000-0054

IS - SUPPL.

ER -