Training parents and other primary caregivers in pivotal response treatment (PRT) has been shown to help children with autism increase their communication skills. The training is most effective when the parent maintains a high degree of fidelity to the PRT methodology. Evaluation of a parent's implementation is currently limited to manual review of PRT sessions by a trained clinician. This process is time-consuming and limits the amount of feedback that can be provided; it also makes long-term support for parents who have undergone training difficult. Automated data extraction and analysis would reduce the cost of providing feedback to parents. Since vocal communication is one of the most common target skills in PRT implementation, audio analysis is critical to a successful feedback system. Speech patterns in PRT sessions differ from typical speech in ways that challenge audio analysis systems: adults involved in the treatment often use child-directed language and exaggerated exclamations to engage the child. Child speech recognition is itself a difficult problem, and it is compounded when children have limited vocal expression. Additionally, PRT sessions involve joint play activities, which often produce loud, sustained noise. To address these challenges, audio classification techniques were explored to determine a methodology for labeling audio segments in videos of PRT sessions. By training separate support vector machine (SVM) classifiers for speech activity detection and speaker separation, an average accuracy of 79% was achieved.
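The two-stage classification described above could be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors are random placeholders standing in for per-segment acoustic features (e.g. MFCC statistics), and the labels, kernel choice, and `label_segment` helper are all assumptions made for the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder "features" for 200 training segments; a real system would
# extract acoustic features (e.g. 13-dim MFCC means) from each segment.
X = rng.normal(size=(200, 13))
is_speech = rng.integers(0, 2, size=200)  # 1 = speech, 0 = noise/silence
speaker = rng.integers(0, 2, size=200)    # 1 = adult, 0 = child (hypothetical)

# Stage 1: speech activity detection (speech vs. non-speech).
sad_clf = SVC(kernel="rbf").fit(X, is_speech)

# Stage 2: speaker separation, trained only on the speech segments.
speech_mask = is_speech == 1
spk_clf = SVC(kernel="rbf").fit(X[speech_mask], speaker[speech_mask])

def label_segment(features):
    """Return 'non-speech', 'child', or 'adult' for one feature vector."""
    if sad_clf.predict(features.reshape(1, -1))[0] == 0:
        return "non-speech"
    return "adult" if spk_clf.predict(features.reshape(1, -1))[0] == 1 else "child"

print(label_segment(X[0]))
```

Cascading the two classifiers keeps each decision boundary simple: the speaker model never has to account for non-speech segments, since those are filtered out by the first stage.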