TY - GEN
T1 - CUTS
T2 - Seventeenth ACM Conference on Hypertext and Hypermedia, HT'06
AU - Yan, Qi
AU - Candan, Kasim
PY - 2006
Y1 - 2006
N2 - Weblogs (blogs) are becoming prominent forms of information exchange on the Internet. A large number and variety of blogs, such as personal journals or commentaries, are available for general consumption. However, effective indexes and navigation structures (like the table of contents in a book) are not available for blogs. Therefore, it is generally not possible to navigate among the entries in a given collection of blog entries in an informed manner. This paper focuses on the segmentation of entries in filter-type [9] blogs, with the aim of using this information to develop hypertext and navigational aids. In particular, we are interested in the analysis of topic development patterns that can provide information not only about the entries themselves, but also about how these entries develop and relate to each other. The proposed algorithm, CUTS, maps entries into a curve in a way that makes a variety of topic development patterns apparent. We then use curve analysis for automatic segmentation of topics. The resulting base topic segments are classified into different topic development patterns that can be visualized and indexed. Experimental results show that the proposed technique performs very well relative to existing schemes in identifying boundaries in text streams, especially filter-style blogs. Furthermore, compared with other topic segmentation methods, the proposed mechanism highlights not only topic boundaries but also topic development patterns.
AB - Weblogs (blogs) are becoming prominent forms of information exchange on the Internet. A large number and variety of blogs, such as personal journals or commentaries, are available for general consumption. However, effective indexes and navigation structures (like the table of contents in a book) are not available for blogs. Therefore, it is generally not possible to navigate among the entries in a given collection of blog entries in an informed manner. This paper focuses on the segmentation of entries in filter-type [9] blogs, with the aim of using this information to develop hypertext and navigational aids. In particular, we are interested in the analysis of topic development patterns that can provide information not only about the entries themselves, but also about how these entries develop and relate to each other. The proposed algorithm, CUTS, maps entries into a curve in a way that makes a variety of topic development patterns apparent. We then use curve analysis for automatic segmentation of topics. The resulting base topic segments are classified into different topic development patterns that can be visualized and indexed. Experimental results show that the proposed technique performs very well relative to existing schemes in identifying boundaries in text streams, especially filter-style blogs. Furthermore, compared with other topic segmentation methods, the proposed mechanism highlights not only topic boundaries but also topic development patterns.
KW - Curve segmentation
KW - Topic development patterns
KW - Topic segmentation
KW - Weblogs
UR - http://www.scopus.com/inward/record.url?scp=34247387632&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34247387632&partnerID=8YFLogxK
U2 - 10.1145/1149941.1149944
DO - 10.1145/1149941.1149944
M3 - Conference contribution
AN - SCOPUS:34247387632
SN - 1595934170
SN - 9781595934178
T3 - Proceedings of the Seventeenth ACM Conference on Hypertext and Hypermedia, HT'06
SP - 1
EP - 10
BT - Proceedings of the Seventeenth ACM Conference on Hypertext and Hypermedia, HT'06
Y2 - 22 August 2006 through 25 August 2006
ER -