WiiCluster: A platform for Wikipedia infobox generation

Kezun Zhang; Yanghua Xiao; Hanghang Tong; Haixun Wang; Wei Wang

doi:10.1145/2661829.2661840

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Wikipedia has become one of the best sources for creating and sharing a massive volume of human knowledge. Much effort has been devoted to generating and enriching its structured data by automatic information extraction from the unstructured text in Wikipedia. Most, if not all, existing work shares the same paradigm: information extraction over the unstructured text, followed by supervised machine learning. Although remarkable progress has been made, this paradigm has limitations in effectiveness and scalability, as well as a high labeling cost. We present WiiCluster, a scalable platform for automatically generating infoboxes for Wikipedia articles. The heart of our system is an effective cluster-then-label algorithm over a rich set of semi-structured data in Wikipedia articles: linked entities. It is totally unsupervised and thus requires no human labels. It is effective in generating semantically meaningful summaries of Wikipedia articles. We further propose a cluster-reuse algorithm to scale up our system. Overall, WiiCluster generates nearly 10 million new facts. We also develop a web-based platform to demonstrate WiiCluster, which lets users access and browse the generated knowledge.
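The abstract does not spell out how the cluster-then-label step works, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes each linked entity of an article comes with a set of category strings (a hypothetical input format), groups entities greedily when they share a category, and labels each group with its most frequent category. The function name `cluster_then_label` and both rules are illustrative choices.

```python
from collections import Counter


def cluster_then_label(linked_entities):
    """Group an article's linked entities, then name each group.

    linked_entities: dict mapping entity name -> set of category strings
    (hypothetical input; the abstract does not describe the real features).
    Returns a list of (label, members) pairs.
    """
    clusters = []  # each item: (list of member names, Counter of categories)
    for entity in sorted(linked_entities):  # sorted for deterministic output
        categories = linked_entities[entity]
        for members, counts in clusters:
            # Assumed clustering rule: join the first cluster that already
            # contains at least one of this entity's categories.
            if any(c in counts for c in categories):
                members.append(entity)
                counts.update(categories)
                break
        else:
            # No cluster shares a category, so start a new one.
            clusters.append(([entity], Counter(categories)))

    labeled = []
    for members, counts in clusters:
        # Assumed labeling rule: most frequent category, ties broken
        # alphabetically; None when the cluster has no categories at all.
        label = min(counts, key=lambda c: (-counts[c], c)) if counts else None
        labeled.append((label, members))
    return labeled


if __name__ == "__main__":
    example_links = {
        "Yangtze River": {"Rivers of China"},
        "Huangpu River": {"Rivers of China"},
        "Fudan University": {"Universities in Shanghai"},
        "Tongji University": {"Universities in Shanghai"},
    }
    for label, members in cluster_then_label(example_links):
        print(label, "->", members)
```

Each (label, members) pair can be read as an infobox-style row for the article, with the label acting as the attribute name and the members as its values. No training labels are involved, which is the property the abstract emphasizes.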

Original language	English (US)
Title of host publication	CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
Publisher	Association for Computing Machinery
Pages	2033-2035
Number of pages	3
ISBN (Electronic)	9781450325981
DOIs	https://doi.org/10.1145/2661829.2661840
State	Published - Nov 3 2014
Event	23rd ACM International Conference on Information and Knowledge Management, CIKM 2014 - Shanghai, China
Duration	Nov 3 2014 → Nov 7 2014

Publication series

Name	CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management

Keywords

Cluster visualization
Knowledge extraction
Summarization

ASJC Scopus subject areas

Information Systems and Management
Computer Science Applications
Information Systems

Access to Document

10.1145/2661829.2661840

Cite this

Zhang, K., Xiao, Y., Tong, H., Wang, H., & Wang, W. (2014). WiiCluster: A platform for Wikipedia infobox generation. In CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management (pp. 2033-2035). (CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management). Association for Computing Machinery. https://doi.org/10.1145/2661829.2661840

@inproceedings{8439d2bd96d84570a62cc9f2cab731a0,
title = "WiiCluster: A platform for Wikipedia infobox generation",
abstract = "Wikipedia has become one of the best sources for creating and sharing a massive volume of human knowledge. Much effort has been devoted to generating and enriching its structured data by automatic information extraction from the unstructured text in Wikipedia. Most, if not all, existing work shares the same paradigm: information extraction over the unstructured text, followed by supervised machine learning. Although remarkable progress has been made, this paradigm has limitations in effectiveness and scalability, as well as a high labeling cost. We present WiiCluster, a scalable platform for automatically generating infoboxes for Wikipedia articles. The heart of our system is an effective cluster-then-label algorithm over a rich set of semi-structured data in Wikipedia articles: linked entities. It is totally unsupervised and thus requires no human labels. It is effective in generating semantically meaningful summaries of Wikipedia articles. We further propose a cluster-reuse algorithm to scale up our system. Overall, WiiCluster generates nearly 10 million new facts. We also develop a web-based platform to demonstrate WiiCluster, which lets users access and browse the generated knowledge.",
keywords = "Cluster visualization, Knowledge extraction, Summarization",
author = "Kezun Zhang and Yanghua Xiao and Hanghang Tong and Haixun Wang and Wei Wang",
year = "2014",
month = nov,
day = "3",
doi = "10.1145/2661829.2661840",
language = "English (US)",
series = "CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management",
publisher = "Association for Computing Machinery",
pages = "2033--2035",
booktitle = "CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management",
note = "23rd ACM International Conference on Information and Knowledge Management, CIKM 2014 ; Conference date: 03-11-2014 Through 07-11-2014",
}

TY - GEN
T1 - WiiCluster
T2 - 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014
AU - Zhang, Kezun
AU - Xiao, Yanghua
AU - Tong, Hanghang
AU - Wang, Haixun
AU - Wang, Wei
PY - 2014/11/3
Y1 - 2014/11/3
AB - Wikipedia has become one of the best sources for creating and sharing a massive volume of human knowledge. Much effort has been devoted to generating and enriching its structured data by automatic information extraction from the unstructured text in Wikipedia. Most, if not all, existing work shares the same paradigm: information extraction over the unstructured text, followed by supervised machine learning. Although remarkable progress has been made, this paradigm has limitations in effectiveness and scalability, as well as a high labeling cost. We present WiiCluster, a scalable platform for automatically generating infoboxes for Wikipedia articles. The heart of our system is an effective cluster-then-label algorithm over a rich set of semi-structured data in Wikipedia articles: linked entities. It is totally unsupervised and thus requires no human labels. It is effective in generating semantically meaningful summaries of Wikipedia articles. We further propose a cluster-reuse algorithm to scale up our system. Overall, WiiCluster generates nearly 10 million new facts. We also develop a web-based platform to demonstrate WiiCluster, which lets users access and browse the generated knowledge.
KW - Cluster visualization
KW - Knowledge extraction
KW - Summarization
UR - http://www.scopus.com/inward/record.url?scp=84937555263&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937555263&partnerID=8YFLogxK
U2 - 10.1145/2661829.2661840
DO - 10.1145/2661829.2661840
M3 - Conference contribution
AN - SCOPUS:84937555263
T3 - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
SP - 2033
EP - 2035
BT - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 3 November 2014 through 7 November 2014
ER -
