Automatic navbox generation by interpretable clustering over linked entities

Chenhao Xie, Lihan Chen, Jiaqing Liang, Kezun Zhang, Yanghua Xiao, Hanghang Tong, Haixun Wang, Wei Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Little effort has been devoted to generating the structured Navigation Box (Navbox) for Wikipedia articles. A Navbox is a table in a Wikipedia article page that provides a consistent navigation system for related entities. Navboxes are critical for the readership and editing efficiency of Wikipedia. In this paper, we target the automatic generation of Navboxes for Wikipedia articles. Instead of performing information extraction over unstructured natural language text directly, we explore an alternative avenue by focusing on a rich set of semi-structured data in Wikipedia articles: linked entities. The core idea of this paper is as follows: if we cluster the linked entities and interpret them appropriately, we can construct a high-quality Navbox for the article entity. We propose a clustering-then-labeling algorithm to realize the idea. Experiments show that the proposed solutions are effective. Ultimately, our approach enriches Wikipedia with 1.95 million new high-quality Navboxes.
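The clustering-then-labeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the features, similarity measure, or labeling rule, so everything below is an assumption made for illustration. Here each linked entity is assumed to carry a set of category strings, entities are grouped greedily by Jaccard similarity, and each group is labeled with its most frequent category. The function names, the threshold value, and the toy data are all hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_entities(features, threshold=0.3):
    """Greedy clustering (an assumed stand-in for the paper's method).

    Each entity joins the first existing cluster that has a member whose
    Jaccard similarity to it is at least `threshold`; otherwise it starts
    a new cluster.
    """
    clusters = []
    for name in features:
        for cluster in clusters:
            if any(jaccard(features[name], features[m]) >= threshold
                   for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def label_cluster(cluster, features):
    """Label a cluster with its most frequent category.

    Ties are broken alphabetically; clusters without any category get
    the placeholder label "Other".
    """
    counts = Counter(c for m in cluster for c in features[m])
    if not counts:
        return "Other"
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)


def build_navbox(features, threshold=0.3):
    """Cluster first, then label: returns {label: [entity, ...]}."""
    navbox = {}
    for cluster in cluster_entities(features, threshold):
        label = label_cluster(cluster, features)
        navbox.setdefault(label, []).extend(cluster)
    return navbox


# Hypothetical linked entities of one article, with made-up category sets.
linked = {
    "Abbey Road": {"album", "1969 works"},
    "Let It Be": {"album", "1970 works"},
    "John Lennon": {"person", "musician"},
    "Paul McCartney": {"person", "musician"},
}

if __name__ == "__main__":
    for label, members in build_navbox(linked).items():
        print(f"{label}: {', '.join(members)}")
```

In this toy run the two albums end up in one group and the two people in another, which mirrors the row structure of a Navbox: a label followed by a list of related entities. A real system would need far richer signals than category overlap.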

Original language: English (US)
Title of host publication: CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 1857-1865
Number of pages: 9
Volume: Part F131841
ISBN (Electronic): 9781450349185
DOIs: https://doi.org/10.1145/3132847.3132899
State: Published - Nov 6 2017
Event: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore
Duration: Nov 6 2017 to Nov 10 2017

Other

Other: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Country: Singapore
City: Singapore
Period: 11/6/17 to 11/10/17

Fingerprint

Clustering
Wikipedia
Navigation
Editing
Labeling
Semistructured data
Experiment
Information extraction
Language

Keywords

  • Clustering-then-labeling
  • Interpretable clustering
  • Knowledge extraction
  • Navbox generation

ASJC Scopus subject areas

  • Business, Management and Accounting (all)
  • Decision Sciences (all)

Cite this

Xie, C., Chen, L., Liang, J., Zhang, K., Xiao, Y., Tong, H., ... Wang, W. (2017). Automatic navbox generation by interpretable clustering over linked entities. In CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management (Vol. Part F131841, pp. 1857-1865). Association for Computing Machinery. https://doi.org/10.1145/3132847.3132899

@inproceedings{6c6df9e8aa12475493afe49065e80dcd,
title = "Automatic navbox generation by interpretable clustering over linked entities",
abstract = "Rare efforts have been devoted to generating the structured Navigation Box (Navbox) for Wikipedia articles. A Navbox is a table in Wikipedia article page that provides a consistent navigation system for related entities. Navbox is critical for the readership and editing efficiency of Wikipedia. In this paper, we target on the automatic generation of Navbox for Wikipedia articles. Instead of performing information extraction over unstructured natural language text directly, an alternative avenue is explored by focusing on a rich set of semi-structured data in Wikipedia articles: linked entities. The core idea of this paper is as follows: If we cluster the linked entities and interpret them appropriately, we can construct a high-quality Navbox for the article entity. We propose a clustering-then-labeling algorithm to realize the idea. Experiments show that the proposed solutions are effective. Ultimately, our approach enriches Wikipedia with 1.95 million new Navboxes of high quality.",
keywords = "Clustering-then-labeling, Interpretable clustering, Knowledge extraction, Navbox generation",
author = "Chenhao Xie and Lihan Chen and Jiaqing Liang and Kezun Zhang and Yanghua Xiao and Hanghang Tong and Haixun Wang and Wei Wang",
year = "2017",
month = "11",
day = "6",
doi = "10.1145/3132847.3132899",
language = "English (US)",
volume = "Part F131841",
pages = "1857--1865",
booktitle = "CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - Automatic navbox generation by interpretable clustering over linked entities

AU - Xie, Chenhao

AU - Chen, Lihan

AU - Liang, Jiaqing

AU - Zhang, Kezun

AU - Xiao, Yanghua

AU - Tong, Hanghang

AU - Wang, Haixun

AU - Wang, Wei

PY - 2017/11/6

Y1 - 2017/11/6

N2 - Rare efforts have been devoted to generating the structured Navigation Box (Navbox) for Wikipedia articles. A Navbox is a table in Wikipedia article page that provides a consistent navigation system for related entities. Navbox is critical for the readership and editing efficiency of Wikipedia. In this paper, we target on the automatic generation of Navbox for Wikipedia articles. Instead of performing information extraction over unstructured natural language text directly, an alternative avenue is explored by focusing on a rich set of semi-structured data in Wikipedia articles: linked entities. The core idea of this paper is as follows: If we cluster the linked entities and interpret them appropriately, we can construct a high-quality Navbox for the article entity. We propose a clustering-then-labeling algorithm to realize the idea. Experiments show that the proposed solutions are effective. Ultimately, our approach enriches Wikipedia with 1.95 million new Navboxes of high quality.

KW - Clustering-then-labeling

KW - Interpretable clustering

KW - Knowledge extraction

KW - Navbox generation

UR - http://www.scopus.com/inward/record.url?scp=85037371535&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85037371535&partnerID=8YFLogxK

U2 - 10.1145/3132847.3132899

DO - 10.1145/3132847.3132899

M3 - Conference contribution

VL - Part F131841

SP - 1857

EP - 1865

BT - CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management

PB - Association for Computing Machinery

ER -