Drosophila gene expression pattern annotation through multi-instance multi-label learning

Ying Xin Li, Shuiwang Ji, Sudhir Kumar, Jieping Ye, Zhi Hua Zhou

Research output: Contribution to journal › Article

50 Citations (Scopus)

Abstract

In studies of Drosophila embryogenesis, a large number of two-dimensional digital images of gene expression patterns have been produced to build an atlas of spatio-temporal gene expression dynamics across developmental time. The gene expression captured in these images has been manually annotated with anatomical and developmental ontology terms from a controlled vocabulary (CV); these annotations are useful in research aimed at understanding gene functions, interactions, and networks. With the rapid accumulation of images, manual annotation has become increasingly cumbersome, and computational methods to automate this task are urgently needed. However, the automated annotation of embryo images is challenging. The annotation terms correspond spatially to local expression patterns within images, yet they are assigned collectively to groups of images, so it is unknown which term corresponds to which region of which image in the group. In this paper, we address this problem using a new machine learning framework, Multi-Instance Multi-Label (MIML) learning. We first show that the underlying nature of the annotation task is a typical MIML learning problem. Then, we propose two support vector machine algorithms under the MIML framework for the task. Experimental results on the FlyExpress database (a digital library of standardized Drosophila gene expression pattern images) show that exploiting the MIML framework leads to significant performance improvements over state-of-the-art approaches.
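To make the MIML setting described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each group of images is treated as one "bag", that each local image region contributes one feature vector (an "instance"), and that CV terms are attached to the bag as a whole. The `Bag` class, the `label_matrix` helper, the feature values, and the term names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Bag:
    """One MIML example: a group of images sharing a set of annotation terms."""
    # One feature vector per local image region (hypothetical 2-D features).
    instances: List[List[float]]
    # CV terms assigned to the whole group; which region supports which term is unknown.
    labels: Set[str]


def label_matrix(bags: List[Bag], vocabulary: List[str]) -> List[List[int]]:
    """Build binary indicator rows: entry (i, j) is 1 if bag i carries term j."""
    return [[1 if term in bag.labels else 0 for term in vocabulary] for bag in bags]


# Two toy bags with placeholder term names (not real CV terms).
bags = [
    Bag(instances=[[0.1, 0.9], [0.8, 0.2], [0.4, 0.4]], labels={"term_a", "term_b"}),
    Bag(instances=[[0.7, 0.1]], labels={"term_b"}),
]
vocabulary = ["term_a", "term_b", "term_c"]
Y = label_matrix(bags, vocabulary)
```

The point of this representation is that supervision exists only at the bag level: a learner sees several instances and several labels per bag, with no mapping between the two.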

Original language: English (US)
Article number: 5753882
Pages (from-to): 98-112
Number of pages: 15
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 9
Issue number: 1
DOIs: 10.1109/TCBB.2011.73
State: Published - 2012

Fingerprint

Drosophilidae
Gene expression
Drosophila
Annotation
Labels
Learning
Controlled Vocabulary
Digital Libraries
Thesauri
Atlases
Computational methods
Term
Embryonic Development
Support vector machines
Ontology
Learning systems
Embryonic Structures
Genes

Keywords

  • Drosophila
  • Gene expression pattern
  • image annotation
  • machine learning
  • multi-instance multi-label (MIML) learning
  • support vector machine

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics
  • Medicine(all)

Cite this

Drosophila gene expression pattern annotation through multi-instance multi-label learning. / Li, Ying Xin; Ji, Shuiwang; Kumar, Sudhir; Ye, Jieping; Zhou, Zhi Hua.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 9, No. 1, 5753882, 2012, p. 98-112.

@article{9f4f2edbb1774b5bb396dbb528b8f415,
title = "Drosophila gene expression pattern annotation through multi-instance multi-label learning",
keywords = "Drosophila, Gene expression pattern, image annotation, machine learning, multi-instance multi-label (MIML) learning, support vector machine",
author = "Li, {Ying Xin} and Shuiwang Ji and Sudhir Kumar and Jieping Ye and Zhou, {Zhi Hua}",
year = "2012",
doi = "10.1109/TCBB.2011.73",
language = "English (US)",
volume = "9",
pages = "98--112",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

TY - JOUR

T1 - Drosophila gene expression pattern annotation through multi-instance multi-label learning

AU - Li, Ying Xin

AU - Ji, Shuiwang

AU - Kumar, Sudhir

AU - Ye, Jieping

AU - Zhou, Zhi Hua

PY - 2012

Y1 - 2012

AB - In the studies of Drosophila embryogenesis, a large number of two-dimensional digital images of gene expression patterns have been produced to build an atlas of spatio-temporal gene expression dynamics across developmental time. Gene expressions captured in these images have been manually annotated with anatomical and developmental ontology terms using a controlled vocabulary (CV), which are useful in research aimed at understanding gene functions, interactions, and networks. With the rapid accumulation of images, the process of manual annotation has become increasingly cumbersome, and computational methods to automate this task are urgently needed. However, the automated annotation of embryo images is challenging. This is because the annotation terms spatially correspond to local expression patterns of images, yet they are assigned collectively to groups of images and it is unknown which term corresponds to which region of which image in the group. In this paper, we address this problem using a new machine learning framework, Multi-Instance Multi-Label (MIML) learning. We first show that the underlying nature of the annotation task is a typical MIML learning problem. Then, we propose two support vector machine algorithms under the MIML framework for the task. Experimental results on the FlyExpress database (a digital library of standardized Drosophila gene expression pattern images) reveal that the exploitation of MIML framework leads to significant performance improvement over state-of-the-art approaches.

KW - Drosophila

KW - Gene expression pattern

KW - image annotation

KW - machine learning

KW - multi-instance multi-label (MIML) learning

KW - support vector machine

UR - http://www.scopus.com/inward/record.url?scp=81455143767&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=81455143767&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2011.73

DO - 10.1109/TCBB.2011.73

M3 - Article

VL - 9

SP - 98

EP - 112

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 1

M1 - 5753882

ER -