TY - GEN
T1 - Challenges of data collection on MTurk
T2 - IISE Annual Conference and Expo 2021
AU - Mancenido, Michelle
AU - Salehi, Pouria
AU - Chiou, Erin
AU - Mosallanezhad, Ahmadreza
AU - Shah, Aksheshkumar
AU - Cohen, Myke
N1 - Funding Information:
This material is based upon work supported by the U.S. Department of Homeland Security under Grant Award Number 17STQAC00001-04-00. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security.
Publisher Copyright:
© 2021 IISE Annual Conference and Expo 2021. All rights reserved.
PY - 2021
Y1 - 2021
AB - During the COVID-19 pandemic, many human-subject studies have stopped in-person data collection and shifted to virtual platforms like Amazon Mechanical Turk (MTurk). This shift involves important considerations for study design and data analysis, particularly for studies involving behavioral assessment and performance with technology. We report on lessons learned from a recent study that used MTurk for a face-matching task with an open-source AI. Participants received $5 compensation for completing a 45-minute session that included questionnaires. To help address data validity issues, Qualtrics fraud-detection features (i.e., reCAPTCHA, ID-Fraud), trap items (e.g., "Respond with 'Often'"), and a modified batch randomization process were employed. Participants' cumulative accuracy and response rates were also assessed. Out of 272 participants, 121 passed the data inclusion criteria. Questionnaire reliability was within the acceptable range (average 0.78) for the retained (valid) dataset. Cumulative accuracy in the face-matching task decreased approximately halfway through the task. Subsequent data inspection revealed that almost half of the participants spent longer than 20 seconds, and up to 12 minutes, on at least one image pair. It is possible that participants were interrupted during the study or elected to take unscheduled breaks. Environmental factors that were easier to control during in-person laboratory studies now require built-in controls in virtual study environments. We learned that: (1) it is imperative to monitor performance measures over time for each participant; (2) study duration may need to be kept shorter on virtual platforms than in in-person studies; and (3) an optional, planned break during the task might help prevent other unplanned breaks.
KW - Crowdsourcing
KW - Face verification
KW - Human-AI joint decision systems
UR - http://www.scopus.com/inward/record.url?scp=85120952485&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120952485&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85120952485
T3 - IISE Annual Conference and Expo 2021
SP - 175
EP - 180
BT - IISE Annual Conference and Expo 2021
A2 - Ghate, A.
A2 - Krishnaiyer, K.
A2 - Paynabar, K.
PB - Institute of Industrial and Systems Engineers, IISE
Y2 - 22 May 2021 through 25 May 2021
ER -