Abstract

Feature selection is a necessary preprocessing step in data analytics. Most distribution-based feature selection algorithms are parametric approaches that assume a normal distribution for the data. Often, however, real-world data do not follow a normal distribution and instead follow a lognormal distribution. This is especially true in biology, where latent factors often dictate distribution patterns. Parametric approaches are not well suited to this type of distribution. We propose the Maximum Distance Minimum Error (MDME) method, a non-parametric approach capable of handling both normal and lognormal data sets. The MDME method is based on the Kolmogorov-Smirnov test, which is well known for its ability to accurately test the dependency between two distributions without assuming normality. We test the MDME method on multiple data sets and demonstrate that it performs comparably to, and often better than, traditional parametric approaches.
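
The abstract does not spell out the MDME algorithm itself, so the following is only a minimal sketch of the general idea it builds on: scoring each feature by the two-sample Kolmogorov-Smirnov distance between two classes, with no normality assumption. The function names (`ks_statistic`, `rank_features`), the synthetic lognormal data, and the ranking rule are all illustrative assumptions, not the authors' method.

```python
import random
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d_max = 0.0
    # Both empirical CDFs only change at observed values, so evaluating
    # the gap at every data point from both samples finds the maximum.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


def rank_features(class_0, class_1):
    """Return (feature_index, ks_distance) pairs, largest distance first.

    class_0 and class_1 are lists of rows; each row is a list of feature
    values. This ranking rule is an assumption made for illustration.
    """
    n_features = len(class_0[0])
    scores = []
    for j in range(n_features):
        col_0 = [row[j] for row in class_0]
        col_1 = [row[j] for row in class_1]
        scores.append((ks_statistic(col_0, col_1), j))
    return [(j, d) for d, j in sorted(scores, reverse=True)]


# Hypothetical lognormal data: feature 0 is shifted between the classes,
# feature 1 has the same distribution in both.
rng = random.Random(0)
class_0 = [[rng.lognormvariate(0.0, 0.5), rng.lognormvariate(0.0, 0.5)]
           for _ in range(300)]
class_1 = [[rng.lognormvariate(1.0, 0.5), rng.lognormvariate(0.0, 0.5)]
           for _ in range(300)]

ranking = rank_features(class_0, class_1)
print(ranking)
```

Because the statistic depends only on the ordering of the values, applying a monotonic transform such as a logarithm to a feature leaves its score unchanged, which is one reason a KS-based score suits lognormal data.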

Original language: English (US)
Title of host publication: 2017 Intelligent Systems Conference, IntelliSys 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 670-677
Number of pages: 8
Volume: 2018-January
ISBN (Electronic): 9781509064359
DOI: https://doi.org/10.1109/IntelliSys.2017.8324366
State: Published - Mar 23 2018
Event: 2017 Intelligent Systems Conference, IntelliSys 2017 - London, United Kingdom
Duration: Sep 7 2017 to Sep 8 2017

Other

Other: 2017 Intelligent Systems Conference, IntelliSys 2017
Country: United Kingdom
City: London
Period: 9/7/17 to 9/8/17

Keywords

  • Feature Selection
  • High Content Screening
  • Kolmogorov-Smirnov Test

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Control and Optimization

Cite this

Trevino, R. P., Lamkin, T. J., Smith, R., Kawamoto, S. A., & Liu, H. (2018). Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data. In 2017 Intelligent Systems Conference, IntelliSys 2017 (Vol. 2018-January, pp. 670-677). [8324366] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IntelliSys.2017.8324366

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

@inproceedings{3d120af082b041f39bd8e1f99da5d4b3,
title = "Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data",
abstract = "Feature selection is a necessary preprocessing step in data analytics. Most distribution-based feature selection algorithms are parametric approaches that assume a normal distribution for the data. Often, however, real-world data do not follow a normal distribution and instead follow a lognormal distribution. This is especially true in biology, where latent factors often dictate distribution patterns. Parametric approaches are not well suited to this type of distribution. We propose the Maximum Distance Minimum Error (MDME) method, a non-parametric approach capable of handling both normal and lognormal data sets. The MDME method is based on the Kolmogorov-Smirnov test, which is well known for its ability to accurately test the dependency between two distributions without assuming normality. We test the MDME method on multiple data sets and demonstrate that it performs comparably to, and often better than, traditional parametric approaches.",
keywords = "Feature Selection, High Content Screening, Kolmogorov-Smirnov Test",
author = "Trevino, {Robert P.} and Lamkin, {Thomas J.} and Smith, Ross and Kawamoto, {Steve A.} and Liu, Huan",
year = "2018",
month = "3",
day = "23",
doi = "10.1109/IntelliSys.2017.8324366",
language = "English (US)",
volume = "2018-January",
pages = "670--677",
booktitle = "2017 Intelligent Systems Conference, IntelliSys 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - GEN
T1  - Maximum Distance Minimum Error (MDME)
T2  - A non-parametric approach to feature selection for image-based high content screening data
AU  - Trevino, Robert P.
AU  - Lamkin, Thomas J.
AU  - Smith, Ross
AU  - Kawamoto, Steve A.
AU  - Liu, Huan
PY  - 2018/3/23
Y1  - 2018/3/23
AB  - Feature selection is a necessary preprocessing step in data analytics. Most distribution-based feature selection algorithms are parametric approaches that assume a normal distribution for the data. Often, however, real-world data do not follow a normal distribution and instead follow a lognormal distribution. This is especially true in biology, where latent factors often dictate distribution patterns. Parametric approaches are not well suited to this type of distribution. We propose the Maximum Distance Minimum Error (MDME) method, a non-parametric approach capable of handling both normal and lognormal data sets. The MDME method is based on the Kolmogorov-Smirnov test, which is well known for its ability to accurately test the dependency between two distributions without assuming normality. We test the MDME method on multiple data sets and demonstrate that it performs comparably to, and often better than, traditional parametric approaches.
KW  - Feature Selection
KW  - High Content Screening
KW  - Kolmogorov-Smirnov Test
UR  - http://www.scopus.com/inward/record.url?scp=85051074530&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85051074530&partnerID=8YFLogxK
U2  - 10.1109/IntelliSys.2017.8324366
DO  - 10.1109/IntelliSys.2017.8324366
M3  - Conference contribution
VL  - 2018-January
SP  - 670
EP  - 677
BT  - 2017 Intelligent Systems Conference, IntelliSys 2017
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -