Multi-task crowdsourcing via an optimization framework

Y. Zhou; J. He; L. Ying

doi:10.1145/3310227

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

The unprecedented amounts of data have catalyzed the trend of combining human insights with machine learning techniques, which has facilitated the use of crowdsourcing to collect label information both effectively and efficiently. One crucial challenge in crowdsourcing is the varying quality of workers, which determines the accuracy of the labels they provide. Motivated by the observation that the same set of tasks is typically labeled by the same set of workers, we study worker behaviors across multiple related tasks and propose an optimization framework for learning from the dual heterogeneity of tasks and workers. The proposed method uses a weight tensor to represent the workers’ behaviors across multiple tasks and seeks the optimal tensor by exploiting its structural information. We then propose an iterative algorithm to solve the optimization problem and analyze its computational complexity. To infer the true label of an example, we construct a worker ensemble from the estimated tensor and weight the workers' decisions with a set of entropy weights. We also prove that the gradient of the most time-consuming update block is separable with respect to the workers, which leads to a faster randomized algorithm. Moreover, we extend the learning framework to accommodate the multi-class setting. Finally, we evaluate our framework on several datasets and demonstrate its superiority over state-of-the-art techniques.
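
The entropy-weighted ensemble mentioned in the abstract can be illustrated with a small sketch. The weighting rule below is an assumption for illustration and is not taken from the paper: each worker supplies a hypothetical probability vector over the classes for one example, the worker's weight is one minus the normalized Shannon entropy of that vector (so confident workers count more), and the class with the largest weighted probability mass wins. The function names `normalized_entropy` and `entropy_weighted_vote` are hypothetical.

```python
import math
from typing import Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a distribution, scaled to [0, 1] by log(K)."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def entropy_weighted_vote(worker_probs: Sequence[Sequence[float]]) -> int:
    """Combine per-worker class distributions into one predicted label.

    Hypothetical rule (not the paper's formulation):
    weight_j = 1 - normalized_entropy(p_j), and the ensemble score of
    class c is sum_j weight_j * p_j[c].
    """
    if not worker_probs:
        raise ValueError("need at least one worker")
    num_classes = len(worker_probs[0])
    scores = [0.0] * num_classes
    for probs in worker_probs:
        if len(probs) != num_classes:
            raise ValueError("all workers must score the same classes")
        weight = 1.0 - normalized_entropy(probs)
        for c, p in enumerate(probs):
            scores[c] += weight * p
    # max() returns the first maximal index, so ties go to the lowest class.
    return max(range(num_classes), key=lambda c: scores[c])


if __name__ == "__main__":
    # One confident worker and two nearly undecided workers.
    print(entropy_weighted_vote([[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]))
```

In this usage example, a plain majority vote over each worker's most likely class would return class 1, whereas the two nearly undecided workers receive weights close to zero, so the confident worker determines the label and the sketch returns class 0.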

Original language	English (US)
Article number	27
Journal	ACM Transactions on Knowledge Discovery from Data
Volume	13
Issue number	3
DOIs	https://doi.org/10.1145/3310227
State	Published - May 29 2019

Keywords

Crowdsourcing
Entropy ensemble
Multi-task learning
Optimization
Tensor representation

ASJC Scopus subject areas

General Computer Science

Cite this

@article{fdd5604857f84aea82ce0b1e47e6b166,

title = "Multi-task crowdsourcing via an optimization framework",

abstract = "The unprecedented amounts of data have catalyzed the trend of combining human insights with machine learning techniques, which has facilitated the use of crowdsourcing to collect label information both effectively and efficiently. One crucial challenge in crowdsourcing is the varying quality of workers, which determines the accuracy of the labels they provide. Motivated by the observation that the same set of tasks is typically labeled by the same set of workers, we study worker behaviors across multiple related tasks and propose an optimization framework for learning from the dual heterogeneity of tasks and workers. The proposed method uses a weight tensor to represent the workers{\textquoteright} behaviors across multiple tasks and seeks the optimal tensor by exploiting its structural information. We then propose an iterative algorithm to solve the optimization problem and analyze its computational complexity. To infer the true label of an example, we construct a worker ensemble from the estimated tensor and weight the workers' decisions with a set of entropy weights. We also prove that the gradient of the most time-consuming update block is separable with respect to the workers, which leads to a faster randomized algorithm. Moreover, we extend the learning framework to accommodate the multi-class setting. Finally, we evaluate our framework on several datasets and demonstrate its superiority over state-of-the-art techniques.",

keywords = "Crowdsourcing, Entropy ensemble, Multi-task learning, Optimization, Tensor representation",

author = "Y. Zhou and J. He and L. Ying",

note = "Funding Information: This work is supported by National Science Foundation under Grant No. IIS-1552654, Grant No. IIS-1813464, Grant No. CNS-1629888, Grant No. CNS-1618768, and Grant No. ECCS-1547294, the U.S. Department of Homeland Security under Grant Award Number 2017-ST-061-QA0001, and an IBM Faculty Award. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the government. Authors{\textquoteright} addresses: Y. Zhou, Brickyard Engineering Building (BYENG) Room 490CA, 699 S Mill Ave, Tempe, AZ, 85281; email: yzhou174@asu.edu; J. He, Brickyard Engineering Building (BYENG) Room 410, 699 S Mill Ave, Tempe, AZ, 85281; email: jingrui.he@asu.edu; L. Ying, Goldenwater Center (GWC) Room 436, 650 E. Tyler Mall Tempe, AZ 85281; email: lei.ying.2@asu.edu. Publisher Copyright: {\textcopyright} 2019 Association for Computing Machinery.",

year = "2019",

month = may,

day = "29",

doi = "10.1145/3310227",

language = "English (US)",

volume = "13",

journal = "ACM Transactions on Knowledge Discovery from Data",

issn = "1556-4681",

publisher = "Association for Computing Machinery (ACM)",

number = "3",

articleno = "27",

}

TY - JOUR

T1 - Multi-task crowdsourcing via an optimization framework

AU - Zhou, Y.

AU - He, J.

AU - Ying, L.

N1 - Funding Information: This work is supported by National Science Foundation under Grant No. IIS-1552654, Grant No. IIS-1813464, Grant No. CNS-1629888, Grant No. CNS-1618768, and Grant No. ECCS-1547294, the U.S. Department of Homeland Security under Grant Award Number 2017-ST-061-QA0001, and an IBM Faculty Award. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the government. Authors’ addresses: Y. Zhou, Brickyard Engineering Building (BYENG) Room 490CA, 699 S Mill Ave, Tempe, AZ, 85281; email: yzhou174@asu.edu; J. He, Brickyard Engineering Building (BYENG) Room 410, 699 S Mill Ave, Tempe, AZ, 85281; email: jingrui.he@asu.edu; L. Ying, Goldenwater Center (GWC) Room 436, 650 E. Tyler Mall Tempe, AZ 85281; email: lei.ying.2@asu.edu. Publisher Copyright: © 2019 Association for Computing Machinery.

PY - 2019/5/29

Y1 - 2019/5/29

AB - The unprecedented amounts of data have catalyzed the trend of combining human insights with machine learning techniques, which has facilitated the use of crowdsourcing to collect label information both effectively and efficiently. One crucial challenge in crowdsourcing is the varying quality of workers, which determines the accuracy of the labels they provide. Motivated by the observation that the same set of tasks is typically labeled by the same set of workers, we study worker behaviors across multiple related tasks and propose an optimization framework for learning from the dual heterogeneity of tasks and workers. The proposed method uses a weight tensor to represent the workers’ behaviors across multiple tasks and seeks the optimal tensor by exploiting its structural information. We then propose an iterative algorithm to solve the optimization problem and analyze its computational complexity. To infer the true label of an example, we construct a worker ensemble from the estimated tensor and weight the workers' decisions with a set of entropy weights. We also prove that the gradient of the most time-consuming update block is separable with respect to the workers, which leads to a faster randomized algorithm. Moreover, we extend the learning framework to accommodate the multi-class setting. Finally, we evaluate our framework on several datasets and demonstrate its superiority over state-of-the-art techniques.

KW - Crowdsourcing

KW - Entropy ensemble

KW - Multi-task learning

KW - Optimization

KW - Tensor representation

UR - http://www.scopus.com/inward/record.url?scp=85069464520&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85069464520&partnerID=8YFLogxK

U2 - 10.1145/3310227

DO - 10.1145/3310227

M3 - Article

AN - SCOPUS:85069464520

SN - 1556-4681

VL - 13

JO - ACM Transactions on Knowledge Discovery from Data

JF - ACM Transactions on Knowledge Discovery from Data

IS - 3

M1 - 27

ER -