TY - JOUR
T1 - Detecting high-quality posts in community question answering sites
AU - Yao, Yuan
AU - Tong, Hanghang
AU - Xie, Tao
AU - Akoglu, Leman
AU - Xu, Feng
AU - Lu, Jian
N1 - Funding Information:
This work is supported by the National 973 Program of China (No. 2015CB352202) and the National Natural Science Foundation of China (Nos. 91318301 and 61321491). This material is partially supported by the National Science Foundation under Grant No. IIS1017415, by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053, by the Defense Advanced Research Projects Agency (DARPA) under Contract Numbers W911NF-11-C-0200 and W911NF-12-C-0028, by the Region II University Transportation Center under project number 49997-33 25, by the ARO Young Investigator Program grant with Contract No. W911NF-14-1-0029, by an R&D gift from Northrop Grumman Aerospace Systems, and by the Stony Brook University Office of the Vice President for Research. Tao Xie’s work is supported in part by a Microsoft Research Award, NSF grants CCF-1349666, CNS-1434582, CCF-1434596, CCF-1434590, and CNS-1439481, and NSF of China No. 61228203.
Publisher Copyright:
© 2015 Elsevier Inc. All rights reserved.
PY - 2015/5/1
Y1 - 2015/5/1
N2 - Community question answering (CQA) has become a new paradigm for seeking and sharing information. In CQA sites, users can ask and answer questions, and provide feedback (e.g., by voting or commenting) on these questions/answers. In this article, we propose the early detection of high-quality CQA questions/answers. Such detection can help discover a high-impact question that would be widely recognized by the users in these CQA sites, as well as identify a useful answer that would receive substantial positive feedback from site users. In particular, we view post quality from the perspective of the voting outcome. First, our key intuition is that the voting score of an answer is strongly positively correlated with that of its question, and we verify this correlation on two real CQA data sets. Second, armed with the verified correlation, we propose a family of algorithms to jointly detect high-quality questions and answers soon after they are posted in the CQA sites. We conduct extensive experimental evaluations to demonstrate the effectiveness and efficiency of our approaches. Overall, our algorithms can outperform the best competitor in prediction performance, while enjoying linear scalability with respect to the total number of posts.
AB - Community question answering (CQA) has become a new paradigm for seeking and sharing information. In CQA sites, users can ask and answer questions, and provide feedback (e.g., by voting or commenting) on these questions/answers. In this article, we propose the early detection of high-quality CQA questions/answers. Such detection can help discover a high-impact question that would be widely recognized by the users in these CQA sites, as well as identify a useful answer that would receive substantial positive feedback from site users. In particular, we view post quality from the perspective of the voting outcome. First, our key intuition is that the voting score of an answer is strongly positively correlated with that of its question, and we verify this correlation on two real CQA data sets. Second, armed with the verified correlation, we propose a family of algorithms to jointly detect high-quality questions and answers soon after they are posted in the CQA sites. We conduct extensive experimental evaluations to demonstrate the effectiveness and efficiency of our approaches. Overall, our algorithms can outperform the best competitor in prediction performance, while enjoying linear scalability with respect to the total number of posts.
KW - CQA
KW - Question and answer
KW - Voting correlation
KW - Voting prediction
UR - http://www.scopus.com/inward/record.url?scp=84922677453&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84922677453&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2014.12.038
DO - 10.1016/j.ins.2014.12.038
M3 - Article
AN - SCOPUS:84922677453
SN - 0020-0255
VL - 302
SP - 70
EP - 82
JO - Information Sciences
JF - Information Sciences
ER -