Time-dependent event hierarchy construction

Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Huan Liu, Philip S. Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

54 Citations (Scopus)

Abstract

In this paper, an algorithm called Time Driven Documents-partition (TDD) is proposed to construct an event hierarchy in a text corpus based on a given query. Specifically, assume that a query contains only one feature: Election. Election is directly related to events such as the 2006 US Midterm Elections Campaign, the 2004 US Presidential Election Campaign and the 2004 Taiwan Presidential Election Campaign, and these events may be further divided into several smaller events (e.g., the 2006 US Midterm Elections Campaign can be broken down into events such as campaign for vote, election results and the resignation of Donald H. Rumsfeld). The result is an event hierarchy. Our proposed algorithm, TDD, tackles the problem in three major steps: (1) Identify the features that are related to the query according to both the timestamps and the contents of the documents. The features identified are regarded as bursty features; (2) Extract the documents that are highly related to the bursty features based on time; (3) Partition the extracted documents to form events and organize them in a hierarchical structure. To the best of our knowledge, little work has targeted constructing a feature-based event hierarchy for a text corpus. Practically, event hierarchies can help us locate target information in a text corpus efficiently. Again, assume that Election is used as a query. Without an event hierarchy, it is very difficult to identify the major events related to it, when these events happened, and the features and news articles related to each of these events. We archived two years of news articles to evaluate the feasibility of TDD. The encouraging results indicate that TDD is practically sound and highly effective.
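
The abstract only outlines the three steps, so the following Python sketch is a toy illustration of that outline and not the authors' TDD algorithm. The burst test (peak daily count compared with the mean daily count), the gap-based partitioning, the day-index timestamps, and every name used (Doc, Event, bursty_features, extract_documents, partition) are assumptions introduced here for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Doc:
    day: int          # timestamp as a day index (hypothetical encoding)
    words: frozenset  # features (terms) appearing in the document


@dataclass
class Event:
    label: str
    docs: list
    children: list = field(default_factory=list)


def bursty_features(docs, query, n_days, factor=2.0):
    """Step 1 (toy): among features co-occurring with the query, keep those
    whose peak daily count is at least `factor` times their mean daily count.
    Returns {feature: set of burst days}."""
    related = set()
    for d in docs:
        if query in d.words:
            related |= d.words
    related.discard(query)
    bursts = {}
    for f in related:
        daily = Counter(d.day for d in docs if f in d.words)
        mean = sum(daily.values()) / n_days
        if max(daily.values()) >= factor * mean:
            bursts[f] = {day for day, c in daily.items() if c > mean}
    return bursts


def extract_documents(docs, bursts):
    """Step 2 (toy): keep documents that contain a bursty feature
    on one of that feature's burst days."""
    return [d for d in docs
            if any(f in d.words and d.day in days for f, days in bursts.items())]


def partition(extracted, query, max_gap=1):
    """Step 3 (toy): split documents into child events wherever consecutive
    timestamps are more than `max_gap` days apart; the query is the root."""
    root = Event(query, list(extracted))
    group = []
    for d in sorted(extracted, key=lambda doc: doc.day):
        if group and d.day - group[-1].day > max_gap:
            root.children.append(Event(f"days {group[0].day}-{group[-1].day}", group))
            group = []
        group.append(d)
    if group:
        root.children.append(Event(f"days {group[0].day}-{group[-1].day}", group))
    return root


# Made-up corpus covering days 1..10, for illustration only.
corpus = [
    Doc(1, frozenset({"election", "campaign"})),
    Doc(2, frozenset({"election", "campaign", "vote"})),
    Doc(2, frozenset({"campaign"})),
    Doc(5, frozenset({"weather"})),
    Doc(9, frozenset({"election", "results"})),
    Doc(9, frozenset({"results", "resignation"})),
    Doc(10, frozenset({"election", "resignation"})),
]
bursts = bursty_features(corpus, "election", n_days=10)
root = partition(extract_documents(corpus, bursts), "election")
for child in root.children:
    print(root.label, "->", child.label, f"({len(child.docs)} docs)")
```

On this made-up corpus the sketch yields a two-level hierarchy: the query at the root and one child event per time-contiguous group of extracted documents. A real system would need a principled burst model and a content-aware partitioning step, neither of which the abstract specifies.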

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 300-309
Number of pages: 10
DOI: https://doi.org/10.1145/1281192.1281227
State: Published - 2007
Event: KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - San Jose, CA, United States
Duration: Aug 12 2007 → Aug 15 2007

Other

Other: KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: United States
City: San Jose, CA
Period: 8/12/07 → 8/15/07

Keywords

  • Clustering
  • Events
  • Hierarchies
  • Presentation
  • Retrieval
  • Text
  • Time

ASJC Scopus subject areas

  • Information Systems

Cite this

Fung, G. P. C., Yu, J. X., Liu, H., & Yu, P. S. (2007). Time-dependent event hierarchy construction. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 300-309) https://doi.org/10.1145/1281192.1281227

@inproceedings{8fa7910ce0194049ae41e64233fc285f,
title = "Time-dependent event hierarchy construction",
keywords = "Clustering, Events, Hierarchies, Presentation, Retrieval, Text, Time",
author = "Fung, {Gabriel Pui Cheong} and Yu, {Jeffrey Xu} and Huan Liu and Yu, {Philip S.}",
year = "2007",
doi = "10.1145/1281192.1281227",
language = "English (US)",
isbn = "1595936092",
pages = "300--309",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Time-dependent event hierarchy construction

AU - Fung, Gabriel Pui Cheong

AU - Yu, Jeffrey Xu

AU - Liu, Huan

AU - Yu, Philip S.

PY - 2007

Y1 - 2007

KW - Clustering

KW - Events

KW - Hierarchies

KW - Presentation

KW - Retrieval

KW - Text

KW - Time

UR - http://www.scopus.com/inward/record.url?scp=36849000332&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36849000332&partnerID=8YFLogxK

U2 - 10.1145/1281192.1281227

DO - 10.1145/1281192.1281227

M3 - Conference contribution

AN - SCOPUS:36849000332

SN - 1595936092

SN - 9781595936097

SP - 300

EP - 309

BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ER -