PhC: Multiresolution visualization and exploration of text corpora with parallel hierarchical coordinates

Kasim Candan, Luigi Di Caro, Maria Luisa Sapino

Research output: Contribution to journalArticlepeer-review

11 Scopus citations

Abstract

The high-dimensional nature of the textual data complicates the design of visualization tools to support exploration of large document corpora. In this article, we first argue that the Parallel Coordinates (PC) technique, which can map multidimensional vectors onto a 2D space in such a way that elements with similar values are represented as similar poly-lines or curves in the visualization space, can be used to help users discern patterns in document collections. The inherent reduction in dimensionality during the mapping from multidimensional points to 2D lines, however, may result in visual complications. For instance, the lines that correspond to clusters of objects that are separate in the multidimensional space may overlap each other in the 2D space; the resulting increase in the number of crossings would make it hard to distinguish the individual document clusters. Such crossings of lines and overly dense regions are significant sources of visual clutter, thus avoiding them may help interpret the visualization. In this article, we note that visual clutter can be significantly reduced by adjusting the resolution of the individual term coordinates by clustering the corresponding values. Such reductions in the resolution of the individual term-coordinates, however, will lead to a certain degree of information loss and thus the appropriate resolution for the term-coordinates has to be selected carefully. Thus, in this article we propose a controlled clutter reduction approach, called Parallel hierarchical Coordinates (or PhC), for reducing the visual clutter in PC-based visualizations of text corpora. We define visual clutter and information loss measures and provide extensive evaluations that show that the proposed PhC provides significant visual gains (i.e., multiple orders of reductions in visual clutter) with small information loss during visualization and exploration of document collections.

Original languageEnglish (US)
Article number22
JournalACM Transactions on Intelligent Systems and Technology
Volume3
Issue number2
DOIs
StatePublished - 2012

Keywords

  • Clutter reduction
  • Document set visualization
  • Parallel coordinates

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'PhC: Multiresolution visualization and exploration of text corpora with parallel hierarchical coordinates'. Together they form a unique fingerprint.

Cite this