Parent and Child Voice Activity Detection in Pivotal Response Treatment Video Probes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Training parents, and other primary caregivers, in pivotal response treatment (PRT) has been shown to help children with autism increase their communication skills. This is most effective when the parent maintains a high degree of fidelity to the PRT methodology. Evaluation of a parent’s implementation is currently limited to manual review of PRT sessions by a trained clinician. This process is time-consuming and limited in the amount of feedback that can be provided. It also makes long-term support for parents who have undergone training difficult. Providing automated data extraction and analysis would alleviate the costs of providing feedback to parents. Since vocal communication is one of the most common target skills for PRT implementation, audio analysis is critical to a successful feedback system. Speech patterns in PRT sessions are atypical of common speech, which poses a challenge for audio analysis systems. Adults involved in the treatment often use child-directed language and overexaggerated exclamations as a means of engaging the child. Child speech recognition is a difficult problem that is compounded when children have limited vocal expression. Additionally, PRT sessions depict joint play activities, often producing loud, sustained noise. To address these challenges, audio classification techniques were explored to determine a methodology for labeling audio segments in videos of PRT sessions. By implementing separate support vector machine (SVM) classifiers for speech activity detection and speaker separation, an average accuracy of 79% was achieved.
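
The two-stage labeling idea in the abstract (one SVM for speech activity, a second for speaker separation) can be illustrated with a minimal sketch. This is not the authors' implementation: the two standardized features (an energy-like value and a pitch-like value), the synthetic frames, and all function names are assumptions made for illustration, and a small pure-Python linear SVM trained by stochastic sub-gradient descent stands in for whatever SVM library and audio features the paper actually used.

```python
import random

def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.001, seed=0):
    """Fit a linear SVM (L2-regularised hinge loss) by stochastic
    sub-gradient descent. X: feature vectors, y: labels in {-1, +1}.
    Returns (weights, bias)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Weight decay coming from the L2 penalty.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss step, taken only when the margin is violated.
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def decision(model, x):
    """Signed score of frame x under a (weights, bias) model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def make_frames(center, n, rng, spread=0.3):
    """Synthetic frames scattered uniformly around a cluster center."""
    return [[c + rng.uniform(-spread, spread) for c in center] for _ in range(n)]

# Hypothetical standardized features per frame: [energy, pitch].
rng = random.Random(42)
noise = make_frames([-1.0, 0.0], 20, rng)   # non-speech: low energy
adult = make_frames([1.0, -1.0], 20, rng)   # speech, lower pitch
child = make_frames([1.0, 1.0], 20, rng)    # speech, higher pitch

# Stage 1: speech activity (speech = +1, non-speech = -1).
vad_model = train_linear_svm(noise + adult + child, [-1] * 20 + [1] * 40)
# Stage 2: speaker separation, trained on speech frames only
# (child = +1, adult = -1).
speaker_model = train_linear_svm(adult + child, [-1] * 20 + [1] * 20)

def label_frame(x):
    """Label one frame as 'non-speech', 'adult', or 'child'."""
    if decision(vad_model, x) <= 0:
        return "non-speech"
    return "child" if decision(speaker_model, x) > 0 else "adult"

for frame in ([-1.0, 0.0], [1.0, -1.0], [1.0, 1.0]):
    print(frame, label_frame(frame))
```

Keeping the two classifiers separate means the speaker model never has to account for noise frames, which is one plausible reason to split the task this way; real PRT audio would need actual acoustic features and labeled segments.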

Original language: English (US)
Title of host publication: Learning and Collaboration Technologies. Ubiquitous and Virtual Environments for Learning and Collaboration - 6th International Conference, LCT 2019, Held as Part of the 21st HCI International Conference, HCII 2019, Proceedings
Editors: Panayiotis Zaphiris, Andri Ioannou
Publisher: Springer Verlag
Pages: 270-286
Number of pages: 17
ISBN (Print): 9783030218164
DOIs: https://doi.org/10.1007/978-3-030-21817-1_21
State: Published - Jan 1 2019
Event: 6th International Conference on Learning and Collaboration Technologies, LCT 2019, held as part of the 21st International Conference on Human-Computer Interaction, HCI International 2019 - Orlando, United States
Duration: Jul 26 2019 to Jul 31 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11591 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th International Conference on Learning and Collaboration Technologies, LCT 2019, held as part of the 21st International Conference on Human-Computer Interaction, HCI International 2019
Country: United States
City: Orlando
Period: 7/26/19 to 7/31/19

Fingerprint

Voice Activity Detection
Probe
Feedback
Communication
Speech recognition
Labeling
Support vector machines
Children
Methodology
Costs
Feedback Systems
Speech Recognition
Systems Analysis
Fidelity
Support Vector Machine
Target

Keywords

  • Autism spectrum disorder
  • Child speech detection
  • Dyadic audio analysis
  • Pivotal response treatment
  • Speaker separation
  • Vocal activity detection

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Heath, C. D. C., McDaniel, T., Venkateswara, H., & Panchanathan, S. (2019). Parent and Child Voice Activity Detection in Pivotal Response Treatment Video Probes. In P. Zaphiris, & A. Ioannou (Eds.), Learning and Collaboration Technologies. Ubiquitous and Virtual Environments for Learning and Collaboration - 6th International Conference, LCT 2019, Held as Part of the 21st HCI International Conference, HCII 2019, Proceedings (pp. 270-286). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11591 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-21817-1_21

@inproceedings{e3b33beb521a4d6a958830c3ae4bb203,
title = "Parent and Child Voice Activity Detection in Pivotal Response Treatment Video Probes",
abstract = "Training parents, and other primary caregivers, in pivotal response treatment (PRT) has been shown to help children with autism increase their communication skills. This is most effective when the parent maintains a high degree of fidelity to the PRT methodology. Evaluation of a parent’s implementation is currently limited to manual review of PRT sessions by a trained clinician. This process is time-consuming and limited in the amount of feedback that can be provided. It also makes long-term support for parents who have undergone training difficult. Providing automated data extraction and analysis would alleviate the costs of providing feedback to parents. Since vocal communication is one of the most common target skills for PRT implementation, audio analysis is critical to a successful feedback system. Speech patterns in PRT sessions are atypical of common speech, which poses a challenge for audio analysis systems. Adults involved in the treatment often use child-directed language and overexaggerated exclamations as a means of engaging the child. Child speech recognition is a difficult problem that is compounded when children have limited vocal expression. Additionally, PRT sessions depict joint play activities, often producing loud, sustained noise. To address these challenges, audio classification techniques were explored to determine a methodology for labeling audio segments in videos of PRT sessions. By implementing separate support vector machine (SVM) classifiers for speech activity detection and speaker separation, an average accuracy of 79{\%} was achieved.",
keywords = "Autism spectrum disorder, Child speech detection, Dyadic audio analysis, Pivotal response treatment, Speaker separation, Vocal activity detection",
author = "Heath, {Corey D.C.} and Troy McDaniel and Hemanth Venkateswara and Sethuraman Panchanathan",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-21817-1_21",
language = "English (US)",
isbn = "9783030218164",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "270--286",
editor = "Panayiotis Zaphiris and Andri Ioannou",
booktitle = "Learning and Collaboration Technologies. Ubiquitous and Virtual Environments for Learning and Collaboration - 6th International Conference, LCT 2019, Held as Part of the 21st HCI International Conference, HCII 2019, Proceedings",

}

TY - GEN

T1 - Parent and Child Voice Activity Detection in Pivotal Response Treatment Video Probes

AU - Heath, Corey D.C.

AU - McDaniel, Troy

AU - Venkateswara, Hemanth

AU - Panchanathan, Sethuraman

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Training parents, and other primary caregivers, in pivotal response treatment (PRT) has been shown to help children with autism increase their communication skills. This is most effective when the parent maintains a high degree of fidelity to the PRT methodology. Evaluation of a parent’s implementation is currently limited to manual review of PRT sessions by a trained clinician. This process is time-consuming and limited in the amount of feedback that can be provided. It also makes long-term support for parents who have undergone training difficult. Providing automated data extraction and analysis would alleviate the costs of providing feedback to parents. Since vocal communication is one of the most common target skills for PRT implementation, audio analysis is critical to a successful feedback system. Speech patterns in PRT sessions are atypical of common speech, which poses a challenge for audio analysis systems. Adults involved in the treatment often use child-directed language and overexaggerated exclamations as a means of engaging the child. Child speech recognition is a difficult problem that is compounded when children have limited vocal expression. Additionally, PRT sessions depict joint play activities, often producing loud, sustained noise. To address these challenges, audio classification techniques were explored to determine a methodology for labeling audio segments in videos of PRT sessions. By implementing separate support vector machine (SVM) classifiers for speech activity detection and speaker separation, an average accuracy of 79% was achieved.

KW - Autism spectrum disorder

KW - Child speech detection

KW - Dyadic audio analysis

KW - Pivotal response treatment

KW - Speaker separation

KW - Vocal activity detection

UR - http://www.scopus.com/inward/record.url?scp=85069828381&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85069828381&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-21817-1_21

DO - 10.1007/978-3-030-21817-1_21

M3 - Conference contribution

AN - SCOPUS:85069828381

SN - 9783030218164

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 270

EP - 286

BT - Learning and Collaboration Technologies. Ubiquitous and Virtual Environments for Learning and Collaboration - 6th International Conference, LCT 2019, Held as Part of the 21st HCI International Conference, HCII 2019, Proceedings

A2 - Zaphiris, Panayiotis

A2 - Ioannou, Andri

PB - Springer Verlag

ER -