Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data

Robert P. Trevino; Thomas J. Lamkin; Ross Smith; Steve A. Kawamoto; Huan Liu

doi:10.1109/IntelliSys.2017.8324366

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Feature selection is a necessary preprocessing step in data analytics. Most distribution-based feature selection algorithms are parametric approaches that assume the data are normally distributed. Real-world data, however, often do not follow a normal distribution and are instead log-normally distributed. This is especially true in biology, where latent factors often dictate distribution patterns. Parametric approaches are not well suited to this type of distribution. We propose the Maximum Distance Minimum Error (MDME) method, a non-parametric approach capable of handling both normal and log-normal data sets. MDME is based on the Kolmogorov-Smirnov test, which is well known for its ability to test whether two samples come from the same distribution without assuming normality. We test MDME on multiple datasets and demonstrate that it performs comparably to, and often better than, traditional parametric approaches.
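
The abstract does not give MDME's scoring rule, so the sketch below illustrates only the Kolmogorov-Smirnov ingredient it names, under stated assumptions: each feature is compared between two groups of cells (for example control and treated wells), scored by the two-sample KS statistic D (the largest vertical gap between the two empirical CDFs, a "maximum distance"), and features are ranked by D. The helper names `ks_statistic` and `rank_features`, the two-group setup, the ranking rule, and the synthetic log-normal data are all hypothetical and are not the authors' implementation.

```python
import bisect
import math
import random
from typing import Dict, List, Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max over x of |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed values,
    # so evaluating the gap at every pooled observation finds the maximum.
    for x in xs + ys:
        fa = bisect.bisect_right(xs, x) / n  # fraction of sample a that is <= x
        fb = bisect.bisect_right(ys, x) / m  # fraction of sample b that is <= x
        d = max(d, abs(fa - fb))
    return d


def rank_features(control: Dict[str, List[float]],
                  treated: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Hypothetical helper: score each feature by its KS distance between the
    two groups (same feature names assumed in both) and sort, most separated first."""
    scores = [(name, ks_statistic(control[name], treated[name])) for name in control]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up, log-normally distributed "cell intensity" features.
    control = {
        "shifted": [rng.lognormvariate(0.0, 1.0) for _ in range(200)],
        "unchanged": [rng.lognormvariate(0.0, 1.0) for _ in range(200)],
    }
    treated = {
        "shifted": [rng.lognormvariate(1.0, 1.0) for _ in range(200)],
        "unchanged": [rng.lognormvariate(0.0, 1.0) for _ in range(200)],
    }
    for name, d in rank_features(control, treated):
        print(f"{name}: D = {d:.3f}")
    # D depends only on the ordering of the pooled values, so taking logs
    # (log-normal -> normal) leaves it unchanged.
    raw = ks_statistic(control["shifted"], treated["shifted"])
    logged = ks_statistic([math.log(v) for v in control["shifted"]],
                          [math.log(v) for v in treated["shifted"]])
    print("unchanged under log transform:", math.isclose(raw, logged))
```

Because D depends only on how the pooled values are ordered, any strictly increasing transform such as a logarithm leaves it unchanged, which is one way to see why a KS-based score needs no normality assumption for normal or log-normal features. For real use, `scipy.stats.ks_2samp` returns the same two-sided statistic together with a p-value.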

Original language	English (US)
Title of host publication	2017 Intelligent Systems Conference, IntelliSys 2017
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	670-677
Number of pages	8
ISBN (Electronic)	9781509064359
DOIs	https://doi.org/10.1109/IntelliSys.2017.8324366
State	Published - Mar 23 2018
Event	2017 Intelligent Systems Conference, IntelliSys 2017 - London, United Kingdom Duration: Sep 7 2017 → Sep 8 2017

Publication series

Name	2017 Intelligent Systems Conference, IntelliSys 2017
Volume	2018-January

Other

Other	2017 Intelligent Systems Conference, IntelliSys 2017
Country/Territory	United Kingdom
City	London
Period	9/7/17 → 9/8/17

Keywords

Feature Selection
High Content Screening
Kolmogorov-Smirnov Test

ASJC Scopus subject areas

Computer Science Applications
Computer Networks and Communications
Artificial Intelligence
Computer Vision and Pattern Recognition
Control and Optimization

Cite this

Trevino, R. P., Lamkin, T. J., Smith, R., Kawamoto, S. A., & Liu, H. (2018). Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data. In 2017 Intelligent Systems Conference, IntelliSys 2017 (pp. 670-677). Article 8324366 (2017 Intelligent Systems Conference, IntelliSys 2017; Vol. 2018-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IntelliSys.2017.8324366

Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data. / Trevino, Robert P.; Lamkin, Thomas J.; Smith, Ross et al.
2017 Intelligent Systems Conference, IntelliSys 2017. Institute of Electrical and Electronics Engineers Inc., 2018. p. 670-677 8324366 (2017 Intelligent Systems Conference, IntelliSys 2017; Vol. 2018-January).


Trevino, RP, Lamkin, TJ, Smith, R, Kawamoto, SA & Liu, H 2018, Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data. in 2017 Intelligent Systems Conference, IntelliSys 2017, 8324366, 2017 Intelligent Systems Conference, IntelliSys 2017, vol. 2018-January, Institute of Electrical and Electronics Engineers Inc., pp. 670-677, 2017 Intelligent Systems Conference, IntelliSys 2017, London, United Kingdom, 9/7/17. https://doi.org/10.1109/IntelliSys.2017.8324366

Trevino RP, Lamkin TJ, Smith R, Kawamoto SA, Liu H. Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data. In 2017 Intelligent Systems Conference, IntelliSys 2017. Institute of Electrical and Electronics Engineers Inc. 2018. p. 670-677. 8324366. (2017 Intelligent Systems Conference, IntelliSys 2017). doi: 10.1109/IntelliSys.2017.8324366

Trevino, Robert P.; Lamkin, Thomas J.; Smith, Ross et al. / Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data. 2017 Intelligent Systems Conference, IntelliSys 2017. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 670-677 (2017 Intelligent Systems Conference, IntelliSys 2017).

@inproceedings{3d120af082b041f39bd8e1f99da5d4b3,
  title = "Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data",
  abstract = "Feature selection is a necessary preprocessing step in data analytics. Most distribution-based feature selection algorithms are parametric approaches that assume a normal distribution for the data. Often times, however, real world data do not follow a normal distribution, instead following a lognormal distribution. This is especially true in biology where latent factors often dictate distribution patterns. Parametric-based approaches are not well suited for this type of distribution. We propose the Maximum Distance Minimum Error (MDME) method, a non-parametric approach capable of handling both normal and log-normal data sets. The MDME method is based on the Kolmogorov-Smirnov test, which is well known for its ability to accurately test the dependency between two distributions without normal distribution assumptions. We test our MDME method on multiple datasets and demonstrate that our approach performs comparable to and often times better than the traditional parametric-based approaches.",
  keywords = "Feature Selection, High Content Screening, Kolmogorov-Smirnov Test",
  author = "Trevino, {Robert P.} and Lamkin, {Thomas J.} and Ross Smith and Kawamoto, {Steve A.} and Huan Liu",
  note = "Publisher Copyright: {\textcopyright} 2017 IEEE.; 2017 Intelligent Systems Conference, IntelliSys 2017 ; Conference date: 07-09-2017 Through 08-09-2017",
  year = "2018",
  month = mar,
  day = "23",
  doi = "10.1109/IntelliSys.2017.8324366",
  language = "English (US)",
  series = "2017 Intelligent Systems Conference, IntelliSys 2017",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "670--677",
  booktitle = "2017 Intelligent Systems Conference, IntelliSys 2017",
}

TY  - GEN
T1  - Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data
T2  - 2017 Intelligent Systems Conference, IntelliSys 2017
AU  - Trevino, Robert P.
AU  - Lamkin, Thomas J.
AU  - Smith, Ross
AU  - Kawamoto, Steve A.
AU  - Liu, Huan
PY  - 2018/3/23
Y1  - 2018/3/23
N2  - Feature selection is a necessary preprocessing step in data analytics. Most distribution-based feature selection algorithms are parametric approaches that assume a normal distribution for the data. Often times, however, real world data do not follow a normal distribution, instead following a lognormal distribution. This is especially true in biology where latent factors often dictate distribution patterns. Parametric-based approaches are not well suited for this type of distribution. We propose the Maximum Distance Minimum Error (MDME) method, a non-parametric approach capable of handling both normal and log-normal data sets. The MDME method is based on the Kolmogorov-Smirnov test, which is well known for its ability to accurately test the dependency between two distributions without normal distribution assumptions. We test our MDME method on multiple datasets and demonstrate that our approach performs comparable to and often times better than the traditional parametric-based approaches.
AB  - Feature selection is a necessary preprocessing step in data analytics. Most distribution-based feature selection algorithms are parametric approaches that assume a normal distribution for the data. Often times, however, real world data do not follow a normal distribution, instead following a lognormal distribution. This is especially true in biology where latent factors often dictate distribution patterns. Parametric-based approaches are not well suited for this type of distribution. We propose the Maximum Distance Minimum Error (MDME) method, a non-parametric approach capable of handling both normal and log-normal data sets. The MDME method is based on the Kolmogorov-Smirnov test, which is well known for its ability to accurately test the dependency between two distributions without normal distribution assumptions. We test our MDME method on multiple datasets and demonstrate that our approach performs comparable to and often times better than the traditional parametric-based approaches.
KW  - Feature Selection
KW  - High Content Screening
KW  - Kolmogorov-Smirnov Test
UR  - http://www.scopus.com/inward/record.url?scp=85051074530&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85051074530&partnerID=8YFLogxK
U2  - 10.1109/IntelliSys.2017.8324366
DO  - 10.1109/IntelliSys.2017.8324366
M3  - Conference contribution
AN  - SCOPUS:85051074530
T3  - 2017 Intelligent Systems Conference, IntelliSys 2017
SP  - 670
EP  - 677
BT  - 2017 Intelligent Systems Conference, IntelliSys 2017
PB  - Institute of Electrical and Electronics Engineers Inc.
Y2  - 7 September 2017 through 8 September 2017
ER  - 
