Abstract

Crowdsourcing is now widely used to collect label information effectively and efficiently. One major challenge in crowdsourcing is the varying quality of workers, which determines the accuracy of the labels they provide. Motivated by the observation that, on many crowdsourcing platforms, the same set of workers typically works on the same set of tasks, we propose to model worker quality by studying worker behavior across multiple related tasks. To this end, we propose an optimization framework named MultiC2 for learning from task and worker dual heterogeneity. It uses a weight tensor to represent the workers' behaviors across multiple tasks and seeks the optimal solution of the tensor by exploiting its structured information. We then propose an iterative algorithm to solve the optimization problem and analyze its computational complexity. To infer the true label of an example, we construct a worker ensemble based on the estimated tensor, whose decisions are weighted using a set of entropy weights. Finally, we evaluate MultiC2 on various data sets and demonstrate its superiority over state-of-the-art crowdsourcing techniques.
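
The abstract does not spell out how the entropy weights are computed, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each worker is represented by a linear classifier for the task (one slice of the weight tensor), that a worker's prediction is a logistic probability, and that a worker's vote is weighted by one minus its normalized binary entropy. The names `sigmoid`, `binary_entropy`, and `entropy_weighted_label` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def entropy_weighted_label(worker_weights, x):
    """Combine per-worker linear classifiers into one label in {-1, +1}.

    worker_weights: list of weight vectors, one per worker (assumed to be
                    the per-task slice of an estimated weight tensor).
    x:              feature vector of the example to label.
    Assumption: a worker's vote counts more when its predicted
    probability has low entropy, i.e. when the worker is confident.
    """
    score = 0.0
    for w in worker_weights:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        confidence = 1.0 - binary_entropy(p) / math.log(2.0)  # in [0, 1]
        vote = 1.0 if p >= 0.5 else -1.0
        score += confidence * vote
    return 1 if score >= 0 else -1


# One confident worker outweighs two nearly undecided ones.
workers = [[2.0, 0.0], [-0.1, 0.0], [-0.1, 0.0]]
print(entropy_weighted_label(workers, [1.0, 0.0]))
```

Under these assumptions, a plain majority vote would pick the negative label here, while the entropy-weighted ensemble follows the single confident worker.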

Original language: English (US)
Title of host publication: Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 579-587
Number of pages: 9
ISBN (Electronic): 9781611974874
State: Published - 2017
Event: 17th SIAM International Conference on Data Mining, SDM 2017 - Houston, United States
Duration: Apr 27 2017 to Apr 29 2017

Other

Other: 17th SIAM International Conference on Data Mining, SDM 2017
Country: United States
City: Houston
Period: 4/27/17 to 4/29/17

Fingerprint

Tensors
Labels
Computational complexity
Entropy

Keywords

  • Crowdsourcing
  • Multi-task learning
  • Tensor representation

ASJC Scopus subject areas

  • Software
  • Computer Science Applications

Cite this

Zhou, Y., Ying, L., & He, J. (2017). MultiC2: An Optimization framework for learning from task and worker dual heterogeneity. In Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017 (pp. 579-587). Society for Industrial and Applied Mathematics Publications.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

@inproceedings{8b9c56d6e2e246cb8475b754c7781448,
title = "{MultiC2}: An Optimization framework for learning from task and worker dual heterogeneity",
abstract = "Crowdsourcing is now widely used to collect label information effectively and efficiently. One major challenge in crowdsourcing is the varying quality of workers, which determines the accuracy of the labels they provide. Motivated by the observation that, on many crowdsourcing platforms, the same set of workers typically works on the same set of tasks, we propose to model worker quality by studying worker behavior across multiple related tasks. To this end, we propose an optimization framework named MultiC2 for learning from task and worker dual heterogeneity. It uses a weight tensor to represent the workers' behaviors across multiple tasks and seeks the optimal solution of the tensor by exploiting its structured information. We then propose an iterative algorithm to solve the optimization problem and analyze its computational complexity. To infer the true label of an example, we construct a worker ensemble based on the estimated tensor, whose decisions are weighted using a set of entropy weights. Finally, we evaluate MultiC2 on various data sets and demonstrate its superiority over state-of-the-art crowdsourcing techniques.",
keywords = "Crowdsourcing, Multi-task learning, Tensor representation",
author = "Yao Zhou and Lei Ying and Jingrui He",
year = "2017",
language = "English (US)",
pages = "579--587",
booktitle = "Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017",
publisher = "Society for Industrial and Applied Mathematics Publications",
address = "United States",
}

TY - GEN
T1 - MultiC2
T2 - An Optimization framework for learning from task and worker dual heterogeneity
AU - Zhou, Yao
AU - Ying, Lei
AU - He, Jingrui
PY - 2017
Y1 - 2017
N2 - Crowdsourcing is now widely used to collect label information effectively and efficiently. One major challenge in crowdsourcing is the varying quality of workers, which determines the accuracy of the labels they provide. Motivated by the observation that, on many crowdsourcing platforms, the same set of workers typically works on the same set of tasks, we propose to model worker quality by studying worker behavior across multiple related tasks. To this end, we propose an optimization framework named MultiC2 for learning from task and worker dual heterogeneity. It uses a weight tensor to represent the workers' behaviors across multiple tasks and seeks the optimal solution of the tensor by exploiting its structured information. We then propose an iterative algorithm to solve the optimization problem and analyze its computational complexity. To infer the true label of an example, we construct a worker ensemble based on the estimated tensor, whose decisions are weighted using a set of entropy weights. Finally, we evaluate MultiC2 on various data sets and demonstrate its superiority over state-of-the-art crowdsourcing techniques.
KW - Crowdsourcing
KW - Multi-task learning
KW - Tensor representation
UR - http://www.scopus.com/inward/record.url?scp=85027831142&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027831142&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85027831142
SP - 579
EP - 587
BT - Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017
PB - Society for Industrial and Applied Mathematics Publications
ER -