Abstract
Building clinical speech analytics models that translate reliably to the clinic requires a realistic characterization of their performance. How well, then, does the published literature estimate model accuracy? We evaluate the relationship between sample size and reported accuracy across 77 journal publications that use speech to classify between healthy controls and patients with dementia. The studies are drawn from three meta-analyses conducted under the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol. The results show that reported accuracy declines as sample size increases, with small-sample studies yielding overoptimistic accuracy estimates. For correctly trained models this is unexpected, as a machine learning model's ability to predict group membership ought to remain the same or improve with additional training data. We posit that the overoptimism results from a combination of publication bias and overfitting, and we suggest mitigation strategies.
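The overfitting-plus-publication-bias mechanism can be made concrete with a small simulation. The sketch below (ours, not the authors'; all parameter choices such as the sample sizes, the number of simulated studies, and the 20-feature noise design are illustrative assumptions) generates data with *no* real class signal, applies a flawed but common protocol in which feature selection sees the full dataset before cross-validation, and then "publishes" only the best-performing quartile of studies. The combination inflates accuracy well above chance, and the inflation shrinks as the sample size grows, mirroring the trend the paper reports.

```python
"""
Minimal simulation (not from the paper) of how small samples plus
selective reporting can inflate reported accuracy on data with no
real signal. Parameter values are illustrative assumptions.
"""
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
N_FEATURES = 20   # stand-ins for acoustic/linguistic features; pure noise
N_STUDIES = 200   # simulated studies per sample size

for n in (20, 40, 80, 160, 320):
    accs = []
    for _ in range(N_STUDIES):
        X = rng.normal(size=(n, N_FEATURES))
        # Balanced random labels: true accuracy is chance (0.5).
        y = rng.permutation(np.repeat([0, 1], n // 2))
        # Flawed protocol: select the features most correlated with the
        # labels on the FULL dataset, then cross-validate -- the test
        # folds have already leaked into feature selection.
        corr = np.abs(np.corrcoef(X.T, y)[-1, :-1])
        top = np.argsort(corr)[-5:]
        acc = cross_val_score(
            LogisticRegression(max_iter=1000), X[:, top], y, cv=5
        ).mean()
        accs.append(acc)
    accs = np.sort(accs)
    # Publication bias: only the best quartile of studies is reported.
    published = accs[-N_STUDIES // 4:].mean()
    print(f"n={n:4d}  mean CV acc={np.mean(accs):.3f}  "
          f"'published' acc={published:.3f}")
```

Under these assumptions, the smallest samples show the largest gap between chance and the "published" accuracy, and the gap narrows monotonically with n, which is one plausible account of the declining-accuracy trend in the meta-analytic data.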
Original language | English (US) |
---|---|
Pages (from-to) | 2453-2457 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2022-September |
DOIs | |
State | Published - 2022 |
Event | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Republic of Korea. Duration: Sep 18 2022 → Sep 22 2022 |
Keywords
- clinical speech analytics
- dementia
- MCI
- natural language processing
- robust machine learning
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation