Abstract

The high-dimensional nature of textual data complicates the design of visualization tools that support exploration of large document corpora. In this article, we first argue that the Parallel Coordinates (PC) technique, which maps multidimensional vectors onto a 2D space such that elements with similar values are represented as similar polylines or curves, can help users discern patterns in document collections. The inherent reduction in dimensionality during the mapping from multidimensional points to 2D lines, however, may result in visual complications. For instance, the lines that correspond to clusters of objects that are separate in the multidimensional space may overlap in the 2D space; the resulting increase in the number of crossings makes it hard to distinguish the individual document clusters. Such line crossings and overly dense regions are significant sources of visual clutter, so avoiding them may make the visualization easier to interpret. We note that visual clutter can be significantly reduced by adjusting the resolution of the individual term coordinates, that is, by clustering the corresponding values. Such reductions in resolution, however, lead to a certain degree of information loss, so the appropriate resolution for each term coordinate has to be selected carefully. We therefore propose a controlled clutter reduction approach, called Parallel hierarchical Coordinates (PhC), for reducing the visual clutter in PC-based visualizations of text corpora. We define visual clutter and information loss measures and provide extensive evaluations showing that PhC provides significant visual gains (i.e., reductions in visual clutter of multiple orders of magnitude) with small information loss during visualization and exploration of document collections.
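
The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the PhC algorithm or the measures defined in the article: equal-width binning stands in for the value clustering, pairwise line crossings between adjacent axes stand in for the clutter measure, and mean absolute displacement stands in for the information loss measure. The function names and the toy term weights are hypothetical.

```python
from itertools import combinations

def count_crossings(rows):
    """Count pairwise polyline crossings between adjacent axes (assumed clutter proxy)."""
    crossings = 0
    for a, b in combinations(rows, 2):
        for i in range(len(a) - 1):
            # Two segments cross when their vertical order flips between axes.
            if (a[i] - b[i]) * (a[i + 1] - b[i + 1]) < 0:
                crossings += 1
    return crossings

def quantize(rows, bins):
    """Snap each value in [0, 1] to the centre of one of `bins` equal-width bins.

    Equal-width binning is an assumption used here in place of clustering.
    """
    def snap(v):
        idx = min(int(v * bins), bins - 1)
        return (idx + 0.5) / bins
    return [[snap(v) for v in row] for row in rows]

def information_loss(rows, coarse):
    """Mean absolute difference between original and coarsened values (assumed proxy)."""
    total = sum(abs(x - y) for r, c in zip(rows, coarse) for x, y in zip(r, c))
    return total / (len(rows) * len(rows[0]))

# Hypothetical term weights: four documents over three term coordinates,
# forming two well-separated clusters.
docs = [
    [0.10, 0.20, 0.90],
    [0.15, 0.12, 0.85],
    [0.80, 0.90, 0.10],
    [0.90, 0.82, 0.20],
]
coarse = quantize(docs, bins=2)

print("crossings at full resolution:", count_crossings(docs))
print("crossings at 2 bins per axis:", count_crossings(coarse))
print("information loss:", round(information_loss(docs, coarse), 4))
```

On this toy input the crossings inside each cluster disappear once the coordinates are coarsened, while the crossings between the two clusters remain, at the cost of a nonzero displacement of the original values.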

Original language: English (US)
Article number: 22
Journal: ACM Transactions on Intelligent Systems and Technology
Volume: 3
Issue number: 2
DOI: 10.1145/2089094.2089098
State: Published - 2012

Keywords

  • Clutter reduction
  • Document set visualization
  • Parallel coordinates

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Artificial Intelligence

Cite this

PhC: Multiresolution visualization and exploration of text corpora with parallel hierarchical coordinates. / Candan, Kasim; Di Caro, Luigi; Sapino, Maria Luisa.

In: ACM Transactions on Intelligent Systems and Technology, Vol. 3, No. 2, 22, 2012.

Research output: Contribution to journal (Article)

@article{39f108a7648a4f66bc665eba5da6d9b9,
title = "PhC: Multiresolution visualization and exploration of text corpora with parallel hierarchical coordinates",
keywords = "Clutter reduction, Document set visualization, Parallel coordinates",
author = "Candan, Kasim and {Di Caro}, Luigi and Sapino, {Maria Luisa}",
year = "2012",
doi = "10.1145/2089094.2089098",
language = "English (US)",
volume = "3",
journal = "ACM Transactions on Intelligent Systems and Technology",
issn = "2157-6904",
publisher = "Association for Computing Machinery (ACM)",
number = "2",
}

TY - JOUR

T1 - PhC

T2 - Multiresolution visualization and exploration of text corpora with parallel hierarchical coordinates

AU - Candan, Kasim

AU - Di Caro, Luigi

AU - Sapino, Maria Luisa

PY - 2012

Y1 - 2012

KW - Clutter reduction

KW - Document set visualization

KW - Parallel coordinates

UR - http://www.scopus.com/inward/record.url?scp=84985034765&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84985034765&partnerID=8YFLogxK

U2 - 10.1145/2089094.2089098

DO - 10.1145/2089094.2089098

M3 - Article

VL - 3

JO - ACM Transactions on Intelligent Systems and Technology

JF - ACM Transactions on Intelligent Systems and Technology

SN - 2157-6904

IS - 2

M1 - 22

ER -