Technology of informative feature selection for immunosignature analysis

Alexander A. Koshechkin; O. V. Romanovich; D. Stamate; S. A. Johnston; A. V. Zamyatin

doi:10.17691/stm2020.12.5.02

Life Sciences, School of (SOLS)

Research output: Contribution to journal › Article › peer-review

Abstract

The main difficulty in working with immunosignature data is its high dimensionality and the large number of uninformative or false-informative features, both of which stem from the specific character of the technology. Data analysis and classification reach practically relevant quality only when these properties are taken into account.

The aim of the study is to develop and test a technology for reducing the dimensionality of immunosignature data that preserves high, practically relevant classification quality while accounting for the properties of the data.

Materials and Methods. The study used two normalized data sets containing immunosignature analysis results, obtained from a public biomedical repository. The proposed feature selection technology consists of three successive steps: 1) splitting the multiclass task into a series of binary tasks using the “one vs all” strategy; 2) screening out false-informative features in each binary comparison by comparing the medians of the “one” and “all” sets; 3) ranking the remaining features by informative value and selecting the most informative ones for each binary comparison. The technology was assessed by the quality of classification on the filtered data. A support vector machine, a method with a proven record on high-dimensional classification problems, served as the classification model.

Results. The proposed technology proved effective: it maintains high classification quality while significantly reducing the feature space. The second step eliminates approximately 50% of the features in each data set, which greatly simplifies subsequent analysis. After the third step, with the feature space reduced to 15 features, the macro-average F1-score is 98.9% for the GSE52581 dataset. For the GSE52581 dataset, with the feature space reduced to 266 features, the macro-average F1-score is 91.3%.

Conclusion. The results show that the proposed feature selection technology is promising for immunosignature data.
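The three selection steps and the macro-average F1-score can be illustrated with a minimal Python sketch. The abstract does not state the exact median comparison rule or the ranking criterion, so both are assumptions here: a feature is kept only when its median in the “one” set exceeds its median in the “all” set, and the survivors are ranked by the gap between the two medians. The function names `select_features` and `macro_f1` are hypothetical, and the support vector classifier itself is omitted.

```python
from statistics import median


def select_features(X, y, target, top_k):
    """Select features for the binary task `target` vs. all other classes.

    X: list of samples, each a list of feature intensities.
    y: list of class labels, one per sample.
    Returns up to `top_k` feature indices, most informative first.
    """
    # Step 1: one-vs-all split of the samples.
    one = [row for row, label in zip(X, y) if label == target]
    rest = [row for row, label in zip(X, y) if label != target]

    scored = []
    for j in range(len(X[0])):
        med_one = median(row[j] for row in one)
        med_rest = median(row[j] for row in rest)
        # Step 2 (assumed rule): discard features whose median in "one"
        # does not exceed the median in "all".
        if med_one <= med_rest:
            continue
        # Step 3 (assumed score): rank by the gap between the medians.
        scored.append((med_one - med_rest, j))

    # Largest gap first; ties broken by feature index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in scored[:top_k]]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: 4 samples, 3 features, 2 classes (values are made up).
    X = [[5.0, 1.0, 3.0], [6.0, 1.2, 3.1], [1.0, 4.0, 3.0], [1.5, 4.2, 3.3]]
    y = ["a", "a", "b", "b"]
    for cls in ("a", "b"):
        print(cls, select_features(X, y, cls, top_k=2))
```

In a full pipeline, the feature indices selected for each binary comparison would be used to filter the data before training the classifier, and `macro_f1` would be computed on held-out predictions.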

Original language	English (US)
Pages (from-to)	19-27
Number of pages	9
Journal	Sovremennye Tehnologii v Medicine
Volume	12
Issue number	5
DOIs	https://doi.org/10.17691/stm2020.12.5.02
State	Published - 2020

Keywords

Early diagnosis of diseases
Feature selection in the sample
Immunosignature
Machine learning

ASJC Scopus subject areas

General Biochemistry, Genetics and Molecular Biology

Cite this

@article{c3c0f9a40e8a4951a1a49d3205aa236b,

title = "Technology of informative feature selection for immunosignature analysis",

keywords = "Early diagnosis of diseases, Feature selection in the sample, Immunosignature, Machine learning",

author = "Koshechkin, {Alexander A.} and Romanovich, {O. V.} and Stamate, {D.} and Johnston, {S. A.} and Zamyatin, {A. V.}",

year = "2020",

doi = "10.17691/stm2020.12.5.02",

language = "English (US)",

volume = "12",

pages = "19--27",

journal = "Sovremennye Tehnologii v Medicine",

issn = "2076-4243",

publisher = "Nizhny Novgorod State Medical Academy of the Ministry of Health of the Russian Federation",

number = "5",

}

TY - JOUR

T1 - Technology of informative feature selection for immunosignature analysis

AU - Koshechkin, Alexander A.

AU - Romanovich, O. V.

AU - Stamate, D.

AU - Johnston, S. A.

AU - Zamyatin, A. V.

PY - 2020

Y1 - 2020

KW - Early diagnosis of diseases

KW - Feature selection in the sample

KW - Immunosignature

KW - Machine learning

UR - http://www.scopus.com/inward/record.url?scp=85094919351&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85094919351&partnerID=8YFLogxK

U2 - 10.17691/stm2020.12.5.02

DO - 10.17691/stm2020.12.5.02

M3 - Article

C2 - 34796001

AN - SCOPUS:85094919351

SN - 2076-4243

VL - 12

SP - 19

EP - 27

JO - Sovremennye Tehnologii v Medicine

JF - Sovremennye Tehnologii v Medicine

IS - 5

ER -
