Abstract

The unprecedented amount of available data has catalyzed the trend of combining human insight with machine learning techniques, which facilitates the use of crowdsourcing to obtain label information both effectively and efficiently. One crucial challenge in crowdsourcing is the varying quality of workers, which determines the accuracy of the labels they provide. Motivated by the observation that the same set of tasks is typically labeled by the same set of workers, we study worker behavior across multiple related tasks and propose an optimization framework for learning from task and worker dual heterogeneity. The proposed method uses a weight tensor to represent the workers' behavior across multiple tasks and seeks the optimal tensor by exploiting its structural information. We then propose an iterative algorithm to solve the optimization problem and analyze its computational complexity. To infer the true label of an example, we construct a worker ensemble based on the estimated tensor, whose decisions are weighted by a set of entropy weights. We also prove that the gradient of the most time-consuming update block is separable with respect to the workers, which leads to a faster randomized algorithm. Moreover, we extend the learning framework to the multi-class setting. Finally, we evaluate the framework on several datasets and demonstrate its superiority over state-of-the-art techniques.
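To make the entropy-weighted ensemble step more concrete, here is a minimal sketch under several assumptions that the abstract does not state. The weight tensor is assumed to be indexed task × worker × feature, with each slice `W[task][worker]` acting as a linear classifier whose sigmoid output is that worker's probability of a positive label. Each worker's "entropy weight" is taken here to be 1 minus the binary entropy of its prediction, so confident workers count more. The function names (`sigmoid`, `binary_entropy`, `entropy_weighted_label`) are hypothetical and this is not the paper's exact formulation.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction: 0 when certain, 1 at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def entropy_weighted_label(W: List[List[List[float]]], task: int, x: List[float]) -> int:
    """Infer a binary label in {-1, +1} for example x of the given task.

    Assumption: W[task][worker] is one worker's linear weight vector (a slice of
    the task x worker x feature tensor). Each worker casts a signed vote 2p - 1,
    scaled by an entropy weight 1 - H(p), so uncertain workers contribute little.
    """
    score = 0.0
    for w in W[task]:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        weight = 1.0 - binary_entropy(p)
        score += weight * (2.0 * p - 1.0)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # One task, three workers, two features: two confident workers, one near-random.
    W = [[[2.0, 0.0], [1.5, 0.5], [-0.1, 0.0]]]
    print(entropy_weighted_label(W, 0, [1.0, 1.0]))    # prints 1
    print(entropy_weighted_label(W, 0, [-1.0, -1.0]))  # prints -1
```

In this sketch the near-random third worker has an entropy close to 1 bit, so its vote is almost entirely discounted; the abstract's actual weighting scheme and its multi-class extension may differ.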

Original language: English (US)
Article number: 27
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 13
Issue number: 3
State: Published, May 29, 2019

Keywords

  • Crowdsourcing
  • Entropy ensemble
  • Multi-task learning
  • Optimization
  • Tensor representation

ASJC Scopus subject areas

  • General Computer Science

Title: Multi-task crowdsourcing via an optimization framework