Abstract

In regression analysis, outliers in the data can induce a bias in the learned function, resulting in larger errors. In this paper we derive an empirically estimable bound on the regression error based on a Euclidean minimum spanning tree generated from the data. Using this bound as motivation, we propose an iterative approach to remove data with noisy responses from the training set. We evaluate the performance of the algorithm on experiments with real-world pathological speech (speech from individuals with neurogenic disorders). Comparative results show that removing noisy examples during training using the proposed approach yields better predictive performance on out-of-sample data.
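
The abstract does not spell out the bound or the removal procedure, so the following is only a minimal sketch of the Friedman-Rafsky statistic named in the keywords, not the authors' algorithm. It pools two samples, builds a Euclidean minimum spanning tree with Prim's algorithm, and counts the tree edges that join points from different samples. The function names `euclidean_mst_edges` and `friedman_rafsky_count` are hypothetical, and the sketch assumes small inputs given as lists of coordinate tuples.

```python
import math


def euclidean_mst_edges(points):
    """Return the edges (i, j) of a Euclidean minimum spanning tree.

    Uses an O(n^2) version of Prim's algorithm, which is adequate for
    small illustrative inputs.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known distance into the tree
    best_from = [-1] * n         # tree vertex that achieves best_dist
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Pick the vertex outside the tree that is closest to the tree.
        j = min((k for k in range(n) if not in_tree[k]),
                key=lambda k: best_dist[k])
        in_tree[j] = True
        edges.append((best_from[j], j))
        # Update the cheapest connection for the remaining vertices.
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[j], points[k])
                if d < best_dist[k]:
                    best_dist[k] = d
                    best_from[k] = j
    return edges


def friedman_rafsky_count(sample_a, sample_b):
    """Count MST edges that connect a point of sample_a to one of sample_b."""
    pooled = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    return sum(1 for i, j in euclidean_mst_edges(pooled)
               if labels[i] != labels[j])


# Illustrative data (made up): two well-separated clusters.
separated_a = [(0, 0), (0, 1), (1, 0)]
separated_b = [(10, 10), (10, 11), (11, 10)]
count_separated = friedman_rafsky_count(separated_a, separated_b)

# Illustrative data (made up): two samples interleaved on a line.
mixed_a = [(0,), (2,), (4,)]
mixed_b = [(1,), (3,), (5,)]
count_mixed = friedman_rafsky_count(mixed_a, mixed_b)
```

A small count means the two samples are easy to tell apart, while a large count means they are well mixed. How the paper turns such a quantity into a regression error bound and a removal rule is not stated in this record.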

Original language: English (US)
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2066-2070
Number of pages: 5
Volume: 2015-August
ISBN (Print): 9781467369978
DOI: https://doi.org/10.1109/ICASSP.2015.7178334
State: Published - Aug 4 2015
Event: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia
Duration: Apr 19 2015 to Apr 24 2015

Other

Other: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Country: Australia
City: Brisbane
Period: 4/19/15 to 4/24/15

Keywords

  • Friedman-Rafsky statistic
  • minimum spanning tree
  • noisy data
  • outlier removal
  • robust regression

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Wisler, A., Berisha, V., Ramamurthy, K., Spanias, A., & Liss, J. (2015). Removing data with noisy responses in regression analysis. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2015-August, pp. 2066-2070). [7178334] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2015.7178334

Removing data with noisy responses in regression analysis. / Wisler, Alan; Berisha, Visar; Ramamurthy, Karthikeyan; Spanias, Andreas; Liss, Julie.

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 2015-August. Institute of Electrical and Electronics Engineers Inc., 2015. p. 2066-2070 7178334.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Wisler, A, Berisha, V, Ramamurthy, K, Spanias, A & Liss, J 2015, Removing data with noisy responses in regression analysis. in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. vol. 2015-August, 7178334, Institute of Electrical and Electronics Engineers Inc., pp. 2066-2070, 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, Brisbane, Australia, 4/19/15. https://doi.org/10.1109/ICASSP.2015.7178334
Wisler A, Berisha V, Ramamurthy K, Spanias A, Liss J. Removing data with noisy responses in regression analysis. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 2015-August. Institute of Electrical and Electronics Engineers Inc. 2015. p. 2066-2070. 7178334 https://doi.org/10.1109/ICASSP.2015.7178334
Wisler, Alan ; Berisha, Visar ; Ramamurthy, Karthikeyan ; Spanias, Andreas ; Liss, Julie. / Removing data with noisy responses in regression analysis. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 2015-August. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 2066-2070
@inproceedings{d746c08129ee40489f9631095e7fcf60,
title = "Removing data with noisy responses in regression analysis",
abstract = "In regression analysis, outliers in the data can induce a bias in the learned function, resulting in larger errors. In this paper we derive an empirically estimable bound on the regression error based on a Euclidean minimum spanning tree generated from the data. Using this bound as motivation, we propose an iterative approach to remove data with noisy responses from the training set. We evaluate the performance of the algorithm on experiments with real-world pathological speech (speech from individuals with neurogenic disorders). Comparative results show that removing noisy examples during training using the proposed approach yields better predictive performance on out-of-sample data.",
keywords = "Friedman-Rafsky statistic, minimum spanning tree, noisy data, outlier removal, robust regression",
author = "Alan Wisler and Visar Berisha and Karthikeyan Ramamurthy and Andreas Spanias and Julie Liss",
year = "2015",
month = "8",
day = "4",
doi = "10.1109/ICASSP.2015.7178334",
language = "English (US)",
isbn = "9781467369978",
volume = "2015-August",
pages = "2066--2070",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN
T1 - Removing data with noisy responses in regression analysis
AU - Wisler, Alan
AU - Berisha, Visar
AU - Ramamurthy, Karthikeyan
AU - Spanias, Andreas
AU - Liss, Julie
PY - 2015/8/4
Y1 - 2015/8/4
AB - In regression analysis, outliers in the data can induce a bias in the learned function, resulting in larger errors. In this paper we derive an empirically estimable bound on the regression error based on a Euclidean minimum spanning tree generated from the data. Using this bound as motivation, we propose an iterative approach to remove data with noisy responses from the training set. We evaluate the performance of the algorithm on experiments with real-world pathological speech (speech from individuals with neurogenic disorders). Comparative results show that removing noisy examples during training using the proposed approach yields better predictive performance on out-of-sample data.

KW - Friedman-Rafsky statistic
KW - minimum spanning tree
KW - noisy data
KW - outlier removal
KW - robust regression
UR - http://www.scopus.com/inward/record.url?scp=84946046733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946046733&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2015.7178334
DO - 10.1109/ICASSP.2015.7178334
M3 - Conference contribution
SN - 9781467369978
VL - 2015-August
SP - 2066
EP - 2070
BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
ER -