Establishing Human Observer Criterion in Evaluating Artificial Social Intelligence Agents in a Search and Rescue Task

Lixiao Huang; Jared Freeman; Nancy J. Cooke; Myke C. Cohen; Xiaoyun Yin; Jeska Clark; Matt Wood; Verica Buchanan; Christopher Corral; Federico Scholcover; Anagha Mudigonda; Lovein Thomas; Aaron Teo; John Colonna-Romano

doi:10.1111/tops.12648

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Contribution to journal › Article › peer-review

Abstract

Artificial social intelligence (ASI) agents have great potential to aid the success of individuals, human–human teams, and human–artificial intelligence teams. To develop helpful ASI agents, we created an urban search and rescue task environment in Minecraft to evaluate ASI agents’ ability to infer participants’ knowledge training conditions and predict participants’ next victim type to be rescued. We evaluated ASI agents’ capabilities in three ways: (a) comparison to ground truth—the actual knowledge training condition and participant actions; (b) comparison among different ASI agents; and (c) comparison to a human observer criterion, whose accuracy served as a reference point. The human observers and the ASI agents used video data and timestamped event messages from the testbed, respectively, to make inferences about the same participants and topic (knowledge training condition) and the same instances of participant actions (rescue of victims). Overall, ASI agents performed better than human observers in inferring knowledge training conditions and predicting actions. Refining the human criterion can guide the design and evaluation of ASI agents for complex task environments and team composition.

Original language	English (US)
Journal	Topics in Cognitive Science
DOIs	https://doi.org/10.1111/tops.12648
State	Accepted/In press - 2023

Keywords

Artificial social intelligence
Baseline
Evaluation
Human observer criterion
Minecraft
Search and rescue
Theory of mind

ASJC Scopus subject areas

Experimental and Cognitive Psychology
Artificial Intelligence
Cognitive Neuroscience
Human-Computer Interaction
Linguistics and Language

Cite this

Huang, L., Freeman, J., Cooke, N. J., Cohen, M. C., Yin, X., Clark, J., Wood, M., Buchanan, V., Corral, C., Scholcover, F., Mudigonda, A., Thomas, L., Teo, A., & Colonna-Romano, J. (Accepted/In press). Establishing Human Observer Criterion in Evaluating Artificial Social Intelligence Agents in a Search and Rescue Task. Topics in Cognitive Science. https://doi.org/10.1111/tops.12648

@article{818c4768ccea4714bc5ec14e5695641e,

title = "Establishing Human Observer Criterion in Evaluating Artificial Social Intelligence Agents in a Search and Rescue Task",

abstract = "Artificial social intelligence (ASI) agents have great potential to aid the success of individuals, human--human teams, and human--artificial intelligence teams. To develop helpful ASI agents, we created an urban search and rescue task environment in Minecraft to evaluate ASI agents{\textquoteright} ability to infer participants{\textquoteright} knowledge training conditions and predict participants{\textquoteright} next victim type to be rescued. We evaluated ASI agents{\textquoteright} capabilities in three ways: (a) comparison to ground truth---the actual knowledge training condition and participant actions; (b) comparison among different ASI agents; and (c) comparison to a human observer criterion, whose accuracy served as a reference point. The human observers and the ASI agents used video data and timestamped event messages from the testbed, respectively, to make inferences about the same participants and topic (knowledge training condition) and the same instances of participant actions (rescue of victims). Overall, ASI agents performed better than human observers in inferring knowledge training conditions and predicting actions. Refining the human criterion can guide the design and evaluation of ASI agents for complex task environments and team composition.",

keywords = "Artificial social intelligence, Baseline, Evaluation, Human observer criterion, Minecraft, Search and rescue, Theory of mind",

author = "Huang, Lixiao and Freeman, Jared and Cooke, {Nancy J.} and Cohen, {Myke C.} and Yin, Xiaoyun and Clark, Jeska and Wood, Matt and Buchanan, Verica and Corral, Christopher and Scholcover, Federico and Mudigonda, Anagha and Thomas, Lovein and Teo, Aaron and Colonna-Romano, John",

note = "Funding Information: This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001119C0130. All performer teams contributed to the study design. Robert Hoffman provided the gaming proficiency items and scoring sheet. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency. Publisher Copyright: {\textcopyright} 2023 Cognitive Science Society LLC.",

year = "2023",

doi = "10.1111/tops.12648",

language = "English (US)",

journal = "Topics in Cognitive Science",

issn = "1756-8757",

publisher = "Wiley-Blackwell",

}

TY - JOUR

T1 - Establishing Human Observer Criterion in Evaluating Artificial Social Intelligence Agents in a Search and Rescue Task

AU - Huang, Lixiao

AU - Freeman, Jared

AU - Cooke, Nancy J.

AU - Cohen, Myke C.

AU - Yin, Xiaoyun

AU - Clark, Jeska

AU - Wood, Matt

AU - Buchanan, Verica

AU - Corral, Christopher

AU - Scholcover, Federico

AU - Mudigonda, Anagha

AU - Thomas, Lovein

AU - Teo, Aaron

AU - Colonna-Romano, John

N1 - Funding Information: This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001119C0130. All performer teams contributed to the study design. Robert Hoffman provided the gaming proficiency items and scoring sheet. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency. Publisher Copyright: © 2023 Cognitive Science Society LLC.

PY - 2023

Y1 - 2023

N2 - Artificial social intelligence (ASI) agents have great potential to aid the success of individuals, human–human teams, and human–artificial intelligence teams. To develop helpful ASI agents, we created an urban search and rescue task environment in Minecraft to evaluate ASI agents’ ability to infer participants’ knowledge training conditions and predict participants’ next victim type to be rescued. We evaluated ASI agents’ capabilities in three ways: (a) comparison to ground truth—the actual knowledge training condition and participant actions; (b) comparison among different ASI agents; and (c) comparison to a human observer criterion, whose accuracy served as a reference point. The human observers and the ASI agents used video data and timestamped event messages from the testbed, respectively, to make inferences about the same participants and topic (knowledge training condition) and the same instances of participant actions (rescue of victims). Overall, ASI agents performed better than human observers in inferring knowledge training conditions and predicting actions. Refining the human criterion can guide the design and evaluation of ASI agents for complex task environments and team composition.

KW - Artificial social intelligence

KW - Baseline

KW - Evaluation

KW - Human observer criterion

KW - Minecraft

KW - Search and rescue

KW - Theory of mind

UR - http://www.scopus.com/inward/record.url?scp=85152798769&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85152798769&partnerID=8YFLogxK

U2 - 10.1111/tops.12648

DO - 10.1111/tops.12648

M3 - Article

AN - SCOPUS:85152798769

SN - 1756-8757

JO - Topics in Cognitive Science

JF - Topics in Cognitive Science

ER -
