Abstract

Feature selection is a necessary preprocessing step in data analytics. Most distribution-based feature selection algorithms are parametric approaches that assume the data are normally distributed. Often, however, real-world data do not follow a normal distribution and instead follow a log-normal distribution. This is especially true in biology, where latent factors often dictate distribution patterns. Parametric approaches are not well suited to this type of distribution. We propose the Maximum Distance Minimum Error (MDME) method, a non-parametric approach capable of handling both normal and log-normal data sets. The MDME method is based on the Kolmogorov-Smirnov test, which is well known for its ability to accurately compare two distributions without assuming normality. We test our MDME method on multiple data sets and demonstrate that our approach performs comparably to, and often better than, traditional parametric approaches.
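
As a rough illustration of the idea behind a Kolmogorov-Smirnov-based selector, the sketch below ranks features by the two-sample KS statistic between two classes. It is not the MDME procedure itself, which the abstract does not spell out. The function names (`ks_statistic`, `rank_features`), the feature names, and the synthetic log-normal data are all assumptions made for illustration.

```python
import random
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions that change only at observed
    # values, so checking every observed value finds the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def rank_features(class_a, class_b):
    """Rank features (hypothetical helper) by KS distance between two classes.

    class_a and class_b map feature name -> list of observed values.
    Returns (name, statistic) pairs, most separating feature first.
    """
    scores = {name: ks_statistic(class_a[name], class_b[name]) for name in class_a}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic log-normal data (assumed for the demo): one feature whose
# distribution shifts between classes, and one that does not.
rng = random.Random(0)
control = {
    "shifted": [rng.lognormvariate(0.0, 1.0) for _ in range(500)],
    "unchanged": [rng.lognormvariate(0.0, 1.0) for _ in range(500)],
}
treated = {
    "shifted": [rng.lognormvariate(1.0, 1.0) for _ in range(500)],
    "unchanged": [rng.lognormvariate(0.0, 1.0) for _ in range(500)],
}

ranking = rank_features(control, treated)
for name, statistic in ranking:
    print(f"{name}: D = {statistic:.3f}")
```

Because the statistic depends only on the ordering of the observations, it makes no normality assumption, so the same ranking would be obtained on the raw log-normal values or on their logarithms.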

Original language: English (US)
Title of host publication: 2017 Intelligent Systems Conference, IntelliSys 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 670-677
Number of pages: 8
Volume: 2018-January
ISBN (Electronic): 9781509064359
DOIs: https://doi.org/10.1109/IntelliSys.2017.8324366
State: Published - Mar 23 2018
Event: 2017 Intelligent Systems Conference, IntelliSys 2017 - London, United Kingdom
Duration: Sep 7 2017 → Sep 8 2017

Other

Other: 2017 Intelligent Systems Conference, IntelliSys 2017
Country: United Kingdom
City: London
Period: 9/7/17 → 9/8/17

Keywords

  • Feature Selection
  • High Content Screening
  • Kolmogorov-Smirnov Test

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Control and Optimization

Cite this

Trevino, R. P., Lamkin, T. J., Smith, R., Kawamoto, S. A., & Liu, H. (2018). Maximum Distance Minimum Error (MDME): A non-parametric approach to feature selection for image-based high content screening data. In 2017 Intelligent Systems Conference, IntelliSys 2017 (Vol. 2018-January, pp. 670-677). [8324366] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IntelliSys.2017.8324366