CUTS: CUrvature-based development pattern analysis and segmentation for blogs and other text streams

Qi Yan, Kasim Candan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

Weblogs (blogs) are becoming prominent forms of information exchange on the Internet. A large number and variety of blogs, such as personal journals or commentaries, are available for general consumption. However, effective indexes and navigation structures (like the table of contents in a book) are not available for blogs. Therefore, it is generally not possible to navigate among the entries of a given blog collection in an informed manner. This paper focuses on the segmentation of entries in filter-type [9] blogs, with the aim of using this information to develop hypertext and navigational aids. In particular, we are interested in analyzing topic development patterns, which can provide information not only about the entries themselves but also about how these entries develop and relate to each other. The proposed algorithm, CUTS, maps entries onto a curve in a way that makes a variety of topic development patterns apparent. We then use curve analysis to segment topics automatically. The resulting base topic segments are classified into different topic development patterns that can be visualized and indexed. Experimental results show that, compared with existing schemes, the proposed technique performs very well at identifying boundaries in text streams, especially filter-style blogs. Furthermore, unlike other topic segmentation methods, the proposed mechanism highlights not only topic boundaries but also topic development patterns.
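The abstract does not spell out how CUTS constructs or analyzes its curve. As a rough illustration only, the Python sketch below maps a stream of entries onto a simple hypothetical curve (x = entry index, y = cumulative bag-of-words cosine dissimilarity between consecutive entries) and cuts the stream wherever the curve bends upward sharply, using the turning angle of the polyline as a discrete curvature estimate. The term weighting, the curve construction, the `threshold` parameter and every function name here are assumptions for illustration, not the paper's method.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def drift_curve(entries):
    """Hypothetical curve mapping (not the CUTS mapping): point i is
    (i, cumulative dissimilarity between consecutive entries up to entry i)."""
    vecs = [term_vector(e) for e in entries]
    points = [(0, 0.0)]
    for i in range(1, len(vecs)):
        step = 1.0 - cosine(vecs[i - 1], vecs[i])
        points.append((i, points[-1][1] + step))
    return points


def segment(entries, threshold=0.5):
    """Split entries where the curve turns upward by more than `threshold`
    radians (an assumed, untuned value). Returns lists of entry indices."""
    if not entries:
        return []
    points = drift_curve(entries)
    # headings[i] is the direction of the edge entering vertex i; a flat
    # virtual lead-in edge lets vertex 0 have a turning angle too.
    headings = [0.0] + [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    boundaries = []
    for i in range(len(entries) - 1):
        turn = headings[i + 1] - headings[i]  # discrete curvature at vertex i
        if turn > threshold:
            boundaries.append(i + 1)  # a new segment starts at entry i + 1
    starts = [0] + boundaries
    ends = boundaries + [len(entries)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]
```

On a toy stream of three programming-related entries followed by three soccer-related entries, `segment` returns `[[0, 1, 2], [3, 4, 5]]`. Only positive turns are treated as boundaries because the curve bends upward when consecutive entries suddenly diverge; the matching negative turn just afterwards marks where the drift levels off inside the new topic.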

Original language: English (US)
Title of host publication: Proceedings of the Seventeenth ACM Conference on Hypertext and Hypermedia, HT'06
Pages: 1-10
Number of pages: 10
State: Published - 2006
Event: Seventeenth ACM Conference on Hypertext and Hypermedia, HT'06 - Odense, Denmark
Duration: Aug 22 2006 → Aug 25 2006

Publication series

Name: Proceedings of the Seventeenth ACM Conference on Hypertext and Hypermedia, HT'06
Volume: 2006

Other

Other: Seventeenth ACM Conference on Hypertext and Hypermedia, HT'06
Country/Territory: Denmark
City: Odense
Period: 8/22/06 → 8/25/06

Keywords

  • Curve segmentation
  • Topic development patterns
  • Topic segmentation
  • Weblogs

ASJC Scopus subject areas

  • Computer Science Applications
  • Media Technology
  • Software
