Drosophila gene expression pattern annotation using sparse features and term-term interactions

Shuiwang Ji, Lei Yuan, Ying Xin Li, Zhi Hua Zhou, Sudhir Kumar, Jieping Ye

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

25 Citations (Scopus)

Abstract

Drosophila gene expression pattern images document the spatial and temporal dynamics of gene expression, and they are valuable tools for elucidating gene functions, interactions, and networks during Drosophila embryogenesis. To support text-based pattern searching, human curators manually annotate the images in the Berkeley Drosophila Genome Project (BDGP) study with ontology terms. Because the number of images needing text descriptions is increasing rapidly, we present a systematic approach for automating this task. We consider both an improved feature representation and a novel learning formulation to boost annotation performance. For feature representation, we adapt the bag-of-words scheme commonly used in visual recognition problems so that the image group information in the BDGP study is retained. Moreover, images from multiple views can be integrated naturally in this representation. To reduce the quantization error caused by the bag-of-words representation, we propose an improved feature representation scheme based on sparse learning. For the learning formulation, we propose a local regularization framework that explicitly incorporates the correlations among terms, and we show that the resulting optimization problem admits an analytical solution. Experimental results show that the representation based on sparse learning significantly outperforms the bag-of-words representation, and that incorporating term-term correlations consistently improves annotation performance.
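
The contrast between the two feature representations in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the random codebook and patch descriptors, the ISTA solver, the penalty weight `lam`, the max-pooling step, and all function names are assumptions chosen for illustration only. Bag-of-words assigns each local descriptor to its single nearest codeword, which introduces quantization error, while sparse coding reconstructs each descriptor from a few weighted codewords.

```python
import numpy as np


def bow_histogram(patches, codebook):
    """Hard-assign each patch to its nearest codeword and count assignments."""
    # Squared Euclidean distances, shape (n_patches, n_codewords).
    d2 = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.bincount(nearest, minlength=codebook.shape[0]).astype(float)


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_objective(patches, codebook, codes, lam):
    """0.5 * ||X - A D||_F^2 + lam * ||A||_1 for codes A and codebook D."""
    residual = patches - codes @ codebook
    return 0.5 * (residual ** 2).sum() + lam * np.abs(codes).sum()


def sparse_codes(patches, codebook, lam=0.1, n_iter=200):
    """Minimize the lasso objective over the codes with ISTA (an assumed solver)."""
    # Step size 1/L, where L is the largest eigenvalue of D D^T.
    lipschitz = np.linalg.norm(codebook, ord=2) ** 2
    codes = np.zeros((patches.shape[0], codebook.shape[0]))
    for _ in range(n_iter):
        grad = (codes @ codebook - patches) @ codebook.T
        codes = soft_threshold(codes - grad / lipschitz, lam / lipschitz)
    return codes


def group_feature(codes):
    """Pool per-patch codes into one vector per image group.

    Max pooling of absolute values is an assumption made for this sketch.
    """
    return np.abs(codes).max(axis=0)


# Hypothetical data: 8 unit-norm codewords and 30 patch descriptors, 16-dim each.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
patches = rng.normal(size=(30, 16))

hist = bow_histogram(patches, codebook)           # bag-of-words feature
codes = sparse_codes(patches, codebook, lam=0.1)  # sparse codes per patch
feature = group_feature(codes)                    # pooled sparse feature
```

In this sketch both `hist` and `feature` have one entry per codeword, so either could be passed to the same downstream classifier; only the way descriptors are mapped onto the codebook differs.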

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 407-415
Number of pages: 9
DOI: https://doi.org/10.1145/1557019.1557068
State: Published - 2009
Event: 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09 - Paris, France
Duration: Jun 28 2009 to Jul 1 2009

Fingerprint

Gene expression
Genes
Ontology

Keywords

  • Gene expression pattern
  • Image annotation
  • Bag-of-words
  • Regularization
  • Sparse learning

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Ji, S., Yuan, L., Li, Y. X., Zhou, Z. H., Kumar, S., & Ye, J. (2009). Drosophila gene expression pattern annotation using sparse features and term-term interactions. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 407-415) https://doi.org/10.1145/1557019.1557068

@inproceedings{bf97495f628e4ce28221ddc4673fc0eb,
title = "Drosophila gene expression pattern annotation using sparse features and term-term interactions",
keywords = "Gene expression pattern, Image annotation, bag-of-words, Regularization, Sparse learning",
author = "Shuiwang Ji and Lei Yuan and Li, {Ying Xin} and Zhou, {Zhi Hua} and Sudhir Kumar and Jieping Ye",
year = "2009",
doi = "10.1145/1557019.1557068",
language = "English (US)",
isbn = "9781605584959",
pages = "407--415",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY  - GEN
T1  - Drosophila gene expression pattern annotation using sparse features and term-term interactions
AU  - Ji, Shuiwang
AU  - Yuan, Lei
AU  - Li, Ying Xin
AU  - Zhou, Zhi Hua
AU  - Kumar, Sudhir
AU  - Ye, Jieping
PY  - 2009
Y1  - 2009
KW  - Gene expression pattern
KW  - Image annotation
KW  - Bag-of-words
KW  - Regularization
KW  - Sparse learning
UR  - http://www.scopus.com/inward/record.url?scp=70350677092&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=70350677092&partnerID=8YFLogxK
U2  - 10.1145/1557019.1557068
DO  - 10.1145/1557019.1557068
M3  - Conference contribution
AN  - SCOPUS:70350677092
SN  - 9781605584959
SP  - 407
EP  - 415
BT  - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ER  -