Training parents and other primary caregivers in pivotal response treatment (PRT) has been shown to help children with autism increase their communication skills. The training is most effective when the parent maintains a high degree of fidelity to the PRT methodology. Evaluation of a parent's implementation is currently limited to manual review of PRT sessions by a trained clinician. This process is time-consuming and limits the amount of feedback that can be provided; it also makes long-term support for parents who have undergone training difficult. Automated data extraction and analysis would reduce the cost of providing feedback to parents. Since vocal communication is one of the most common target skills in PRT implementation, audio analysis is critical to a successful feedback system. Speech patterns in PRT sessions differ from typical speech in ways that challenge audio analysis systems: adults involved in the treatment often use child-directed language and exaggerated exclamations to engage the child. Child speech recognition is itself a difficult problem, and it is compounded when children have limited vocal expression. Additionally, PRT sessions involve joint play activities, which often produce loud, sustained noise. To address these challenges, audio classification techniques were explored to determine a methodology for labeling audio segments in videos of PRT sessions. By training separate support vector machine (SVM) classifiers for speech activity detection and speaker separation, an average accuracy of 79% was achieved.
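The two-stage classification described above could be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors are random placeholders standing in for per-segment acoustic features (e.g. MFCC statistics), and the labels, kernel choice, and `label_segment` helper are all assumptions made for the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder "features" for 200 training segments; a real system would
# extract acoustic features (e.g. 13-dim MFCC means) from each segment.
X = rng.normal(size=(200, 13))
is_speech = rng.integers(0, 2, size=200)  # 1 = speech, 0 = noise/silence
speaker = rng.integers(0, 2, size=200)    # 1 = adult, 0 = child (hypothetical)

# Stage 1: speech activity detection (speech vs. non-speech).
sad_clf = SVC(kernel="rbf").fit(X, is_speech)

# Stage 2: speaker separation, trained only on the speech segments.
speech_mask = is_speech == 1
spk_clf = SVC(kernel="rbf").fit(X[speech_mask], speaker[speech_mask])

def label_segment(features):
    """Return 'non-speech', 'child', or 'adult' for one feature vector."""
    if sad_clf.predict(features.reshape(1, -1))[0] == 0:
        return "non-speech"
    return "adult" if spk_clf.predict(features.reshape(1, -1))[0] == 1 else "child"

print(label_segment(X[0]))
```

Cascading the two classifiers keeps each decision boundary simple: the speaker model never has to account for non-speech segments, since those are filtered out by the first stage.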