A bag-of-words approach for Drosophila gene expression pattern annotation

Shuiwang Ji, Ying Xin Li, Zhi Hua Zhou, Sudhir Kumar, Jieping Ye

Research output: Contribution to journal › Article

33 Citations (Scopus)

Abstract

Background: Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way for studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were annotated with a variable number of anatomical terms manually using a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task.

Results: We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods.

Conclusion: The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance.
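The abstract does not specify which image features or codebook the method uses, so the following is only a minimal sketch of a generic bag-of-words representation, not the authors' implementation. It assumes each image has already been reduced to a set of local feature descriptors and that a codebook of "visual words" is available; the function names `bag_of_words_histogram` and `group_histogram` are hypothetical. Each descriptor is assigned to its nearest codebook word, and the group of images is summarized by a normalized word-count histogram.

```python
import numpy as np


def bag_of_words_histogram(descriptors, codebook):
    """Return a normalized histogram of codebook-word counts.

    descriptors: array of shape (n, d), one local feature per row (assumed input).
    codebook:    array of shape (k, d), one visual word per row (assumed input).
    """
    # Squared Euclidean distance from every descriptor to every codebook word.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Hard assignment: each descriptor votes for its nearest word.
    nearest = np.argmin(dists, axis=1)
    counts = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    total = counts.sum()
    # Normalize so groups with different numbers of descriptors are comparable.
    return counts / total if total > 0 else counts


def group_histogram(image_descriptor_sets, codebook):
    """Pool the descriptors of all images in a group into one histogram."""
    pooled = np.vstack(image_descriptor_sets)
    return bag_of_words_histogram(pooled, codebook)


if __name__ == "__main__":
    # Toy data: two visual words and a group of two images.
    codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
    image_a = np.array([[0.0, 1.0], [9.0, 9.0]])
    image_b = np.array([[1.0, 0.0], [0.0, 0.0]])
    print(group_histogram([image_a, image_b], codebook))
```

Because the histogram has a fixed length regardless of how many images a group contains, a group-level vector like this could be passed to any multi-label classifier over the controlled vocabulary terms; which classifier the paper uses is not stated in the abstract.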

Original language: English (US)
Article number: 119
Journal: BMC Bioinformatics
Volume: 10
DOI: 10.1186/1471-2105-10-119
State: Published - Apr 21 2009

Fingerprint

Drosophilidae
Gene expression
Drosophila
Gene Expression
Annotation
Thesauri
Computational methods
Genes
Controlled Vocabulary
Throughput
Computational Methods
Genome
Term
Embryogenesis
Embryonic Development
Regulatory Networks
Embryo
Embryonic Structures
Comparative Analysis
High Throughput

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Structural Biology
  • Applied Mathematics

Cite this

A bag-of-words approach for Drosophila gene expression pattern annotation. / Ji, Shuiwang; Li, Ying Xin; Zhou, Zhi Hua; Kumar, Sudhir; Ye, Jieping.

In: BMC Bioinformatics, Vol. 10, 119, 21.04.2009.

@article{18ff7c48935d432e85fab572aaced7a2,
title = "A bag-of-words approach for Drosophila gene expression pattern annotation",
author = "Shuiwang Ji and Li, {Ying Xin} and Zhou, {Zhi Hua} and Sudhir Kumar and Jieping Ye",
year = "2009",
month = "4",
day = "21",
doi = "10.1186/1471-2105-10-119",
language = "English (US)",
volume = "10",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - A bag-of-words approach for Drosophila gene expression pattern annotation

AU - Ji, Shuiwang

AU - Li, Ying Xin

AU - Zhou, Zhi Hua

AU - Kumar, Sudhir

AU - Ye, Jieping

PY - 2009/4/21

Y1 - 2009/4/21

UR - http://www.scopus.com/inward/record.url?scp=65549129744&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=65549129744&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-10-119

DO - 10.1186/1471-2105-10-119

M3 - Article

C2 - 19383139

AN - SCOPUS:65549129744

VL - 10

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 119

ER -