Development of a web-enabled SVR-based machine learning platform and its application on modeling transgene expression activity of aminoglycoside-derived polycations

Zhuo Zhen, Thrimoorthy Potta, Nicholas A. Lanzillo, Kaushal Rege, Curt M. Breneman

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Objective: Support Vector Regression (SVR) has become increasingly popular in cheminformatics modeling. As a result, SVR-based machine learning algorithms, including Fuzzy-SVR and Least Squares-SVR (LS-SVR), have been developed and applied in various research areas. However, few downloadable packages or public-domain software tools are currently available for these algorithms. To address this need, we developed the Support vector regression-based Online Learning Equipment (SOLE) web tool (available at http://reccr.chem.rpi.edu/SOLE/index.html) as an online learning system to support predictive cheminformatics and materials informatics studies.

Results: In this work, we employed the SOLE system to model the transgene expression efficacy of polymers derived from aminoglycoside antibiotics, which allowed the results of several modeling approaches to be compared easily. All models had test set r² of 0.96-0.98 and test set R² of 0.79-0.84. Y-scrambling tests showed that the models were stable and not over-fitted.

Conclusion: SOLE has a user-friendly interface and includes the routine elements of QSAR/QSPR studies that can be applied in various research areas. It uses rational and sophisticated feature selection, model selection, and model evaluation processes.

Original language: English (US)
Pages (from-to): 41-55
Number of pages: 15
Journal: Combinatorial Chemistry and High Throughput Screening
Volume: 20
Issue number: 1
DOI: 10.2174/1386207319666161228124214
State: Published - Jan 1 2017

Fingerprint

Aminoglycosides
Transgenes
Learning systems
Learning
Online Systems
Informatics
Quantitative Structure-Activity Relationship
Public Sector
Least-Squares Analysis
Research
Polymers
Software
Anti-Bacterial Agents
Equipment and Supplies
Antibiotics
Learning algorithms
User interfaces
Feature extraction
polycations
Machine Learning

Keywords

  • Machine learning
  • QSAR
  • QSPR
  • Regression
  • Software
  • Support vector machine

ASJC Scopus subject areas

  • Drug Discovery
  • Computer Science Applications
  • Organic Chemistry

Cite this

Development of a web-enabled SVR-based machine learning platform and its application on modeling transgene expression activity of aminoglycoside-derived polycations. / Zhen, Zhuo; Potta, Thrimoorthy; Lanzillo, Nicholas A.; Rege, Kaushal; Breneman, Curt M.

In: Combinatorial Chemistry and High Throughput Screening, Vol. 20, No. 1, 01.01.2017, p. 41-55.

@article{b34c630d9d714cadb9e0be6171b444ec,
title = "Development of a web-enabled SVR-based machine learning platform and its application on modeling transgene expression activity of aminoglycoside-derived polycations",
abstract = "Objective: Support Vector Regression (SVR) has become increasingly popular in cheminformatics modeling. As a result, SVR-based machine learning algorithms, including Fuzzy-SVR and Least Squares-SVR (LS-SVR), have been developed and applied in various research areas. However, few downloadable packages or public-domain software tools are currently available for these algorithms. To address this need, we developed the Support vector regression-based Online Learning Equipment (SOLE) web tool (available at http://reccr.chem.rpi.edu/SOLE/index.html) as an online learning system to support predictive cheminformatics and materials informatics studies. Results: In this work, we employed the SOLE system to model the transgene expression efficacy of polymers derived from aminoglycoside antibiotics, which allowed the results of several modeling approaches to be compared easily. All models had test set $r^2$ of 0.96-0.98 and test set $R^2$ of 0.79-0.84. Y-scrambling tests showed that the models were stable and not over-fitted. Conclusion: SOLE has a user-friendly interface and includes the routine elements of QSAR/QSPR studies that can be applied in various research areas. It uses rational and sophisticated feature selection, model selection, and model evaluation processes.",
keywords = "Machine learning, QSAR, QSPR, Regression, Software, Support vector machine",
author = "Zhuo Zhen and Thrimoorthy Potta and Lanzillo, {Nicholas A.} and Kaushal Rege and Breneman, {Curt M.}",
year = "2017",
month = "1",
day = "1",
doi = "10.2174/1386207319666161228124214",
language = "English (US)",
volume = "20",
pages = "41--55",
journal = "Combinatorial Chemistry and High Throughput Screening",
issn = "1386-2073",
publisher = "Bentham Science Publishers B.V.",
number = "1",

}

TY - JOUR

T1 - Development of a web-enabled SVR-based machine learning platform and its application on modeling transgene expression activity of aminoglycoside-derived polycations

AU - Zhen, Zhuo

AU - Potta, Thrimoorthy

AU - Lanzillo, Nicholas A.

AU - Rege, Kaushal

AU - Breneman, Curt M.

PY - 2017/1/1

Y1 - 2017/1/1

AB - Objective: Support Vector Regression (SVR) has become increasingly popular in cheminformatics modeling. As a result, SVR-based machine learning algorithms, including Fuzzy-SVR and Least Squares-SVR (LS-SVR), have been developed and applied in various research areas. However, few downloadable packages or public-domain software tools are currently available for these algorithms. To address this need, we developed the Support vector regression-based Online Learning Equipment (SOLE) web tool (available at http://reccr.chem.rpi.edu/SOLE/index.html) as an online learning system to support predictive cheminformatics and materials informatics studies. Results: In this work, we employed the SOLE system to model the transgene expression efficacy of polymers derived from aminoglycoside antibiotics, which allowed the results of several modeling approaches to be compared easily. All models had test set r² of 0.96-0.98 and test set R² of 0.79-0.84. Y-scrambling tests showed that the models were stable and not over-fitted. Conclusion: SOLE has a user-friendly interface and includes the routine elements of QSAR/QSPR studies that can be applied in various research areas. It uses rational and sophisticated feature selection, model selection, and model evaluation processes.

KW - Machine learning

KW - QSAR

KW - QSPR

KW - Regression

KW - Software

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=85018517866&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85018517866&partnerID=8YFLogxK

U2 - 10.2174/1386207319666161228124214

DO - 10.2174/1386207319666161228124214

M3 - Article

VL - 20

SP - 41

EP - 55

JO - Combinatorial Chemistry and High Throughput Screening

JF - Combinatorial Chemistry and High Throughput Screening

SN - 1386-2073

IS - 1

ER -