Automated annotation of Drosophila gene expression patterns using a controlled vocabulary

Shuiwang Ji, Liang Sun, Rong Jin, Sudhir Kumar, Jieping Ye

Research output: Contribution to journal (Article)

32 Citations (Scopus)

Abstract

Motivation: Regulation of gene expression in space and time directs its localization to a specific subset of cells during development. Systematic determination of the spatiotemporal dynamics of gene expression plays an important role in understanding the regulatory networks driving development. An atlas for the gene expression patterns of the fruit fly Drosophila melanogaster has been created by whole-mount in situ hybridization, and it documents the dynamic changes of gene expression patterns during Drosophila embryogenesis. The spatial and temporal patterns of gene expression are integrated by anatomical terms from a controlled vocabulary linking together intermediate tissues developed from one another. Currently, the terms are assigned to patterns manually. However, the number of patterns generated by high-throughput in situ hybridization is rapidly increasing. It is, therefore, tempting to approach this problem by employing computational methods.

Results: In this article, we present a novel computational framework for annotating gene expression patterns using a controlled vocabulary. In the currently available high-throughput data, annotation terms are assigned to groups of patterns rather than to individual images. We propose to extract invariant features from images, and construct pyramid match kernels to measure the similarity between sets of patterns. To exploit the complementary information conveyed by different features and incorporate the correlation among patterns sharing common structures, we propose efficient convex formulations to integrate the kernels derived from various features. The proposed framework is evaluated by comparing its annotations with those of human curators, and we report promising performance in terms of F1 score.
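The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of a pyramid match kernel between two sets of feature vectors, plus a fixed-weight convex combination of kernel matrices. It is not the authors' code. The function names, the assumption that features are scaled to [0, 1), the number of pyramid levels, and the hand-picked weights are all illustrative assumptions; the article learns the combination through convex optimization rather than fixing it.

```python
from collections import Counter

import numpy as np


def grid_histogram(points, cells_per_axis):
    """Count the points of an (n, d) array in [0, 1)^d that fall in each grid cell."""
    idx = np.floor(np.asarray(points, dtype=float) * cells_per_axis).astype(int)
    idx = np.clip(idx, 0, cells_per_axis - 1)
    return Counter(map(tuple, idx))


def intersection(h1, h2):
    """Histogram intersection: number of points that can be matched cell by cell."""
    return sum(min(count, h2[cell]) for cell, count in h1.items())


def pyramid_match(x, y, num_levels=4):
    """Unnormalized pyramid match score between two feature sets (assumed in [0, 1)^d).

    Matches first found at a coarser level get a smaller weight (1 / 2**i).
    """
    score, previous = 0.0, 0
    for i in range(num_levels):
        cells = 2 ** (num_levels - 1 - i)  # finest grid first, one cell last
        current = intersection(grid_histogram(x, cells), grid_histogram(y, cells))
        score += (current - previous) / (2 ** i)  # only newly formed matches count
        previous = current
    return score


def normalized_pyramid_match(x, y, num_levels=4):
    """Scale the score so that a set compared with itself scores 1."""
    kxx = pyramid_match(x, x, num_levels)
    kyy = pyramid_match(y, y, num_levels)
    return pyramid_match(x, y, num_levels) / np.sqrt(kxx * kyy)


def combine_kernels(kernels, weights):
    """Convex combination of kernel matrices; weights are given here, not learned."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * np.asarray(k, dtype=float) for w, k in zip(weights, kernels))
```

Because the kernel compares unordered sets of different sizes, it fits the setting described above, where annotation terms are attached to groups of images rather than to single images.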

Original language: English (US)
Pages (from-to): 1881-1888
Number of pages: 8
Journal: Bioinformatics
Volume: 24
Issue number: 17
DOI: 10.1093/bioinformatics/btn347
State: Published - Sep 2008

Fingerprint

Controlled Vocabulary
Thesauri
Drosophilidae
Gene expression
Drosophila
Gene Expression
Annotation
In Situ Hybridization
Throughput
Atlases
Gene Expression Regulation
High Throughput
Drosophila melanogaster
Diptera
Embryonic Development
Term
Computational methods
Fruits
Fruit
kernel

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Automated annotation of Drosophila gene expression patterns using a controlled vocabulary. / Ji, Shuiwang; Sun, Liang; Jin, Rong; Kumar, Sudhir; Ye, Jieping.

In: Bioinformatics, Vol. 24, No. 17, 09.2008, p. 1881-1888.

Ji, Shuiwang ; Sun, Liang ; Jin, Rong ; Kumar, Sudhir ; Ye, Jieping. / Automated annotation of Drosophila gene expression patterns using a controlled vocabulary. In: Bioinformatics. 2008 ; Vol. 24, No. 17. pp. 1881-1888.
@article{ec53f2c317a04085ba0bc257a252e5c2,
title = "Automated annotation of Drosophila gene expression patterns using a controlled vocabulary",
author = "Shuiwang Ji and Liang Sun and Rong Jin and Sudhir Kumar and Jieping Ye",
year = "2008",
month = "9",
doi = "10.1093/bioinformatics/btn347",
language = "English (US)",
volume = "24",
pages = "1881--1888",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "17",

}

TY - JOUR

T1 - Automated annotation of Drosophila gene expression patterns using a controlled vocabulary

AU - Ji, Shuiwang

AU - Sun, Liang

AU - Jin, Rong

AU - Kumar, Sudhir

AU - Ye, Jieping

PY - 2008/9

Y1 - 2008/9

UR - http://www.scopus.com/inward/record.url?scp=50549093271&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=50549093271&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btn347

DO - 10.1093/bioinformatics/btn347

M3 - Article

VL - 24

SP - 1881

EP - 1888

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 17

ER -