The GNAT library for local and remote gene mention normalization

Jörg Hakenberg, Martin Gerner, Maximilian Haeussler, Illés Solt, Conrad Plake, Michael Schroeder, Graciela Gonzalez, Goran Nenadic, Casey M. Bergman

Research output: Contribution to journal (Article)

39 Citations (Scopus)

Abstract

Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a Tap-20 score of 0.1987.

Original language: English (US)
Article number: btr455
Pages (from-to): 2769-2771
Number of pages: 3
Journal: Bioinformatics
Volume: 27
Issue number: 19
DOI: https://doi.org/10.1093/bioinformatics/btr455
State: Published - Oct 2011

Fingerprint

Data Mining
Text Mining
Libraries
Normalization
Genes
Gene
Web services
Proteins
Protein
Data mining
Named Entity Recognition
Text Retrieval
Pipelines
Java
Web Services
Data analysis
Names
Databases

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine (all)

Cite this

Hakenberg, J., Gerner, M., Haeussler, M., Solt, I., Plake, C., Schroeder, M., ... Bergman, C. M. (2011). The GNAT library for local and remote gene mention normalization. Bioinformatics, 27(19), 2769-2771. [btr455]. https://doi.org/10.1093/bioinformatics/btr455

The GNAT library for local and remote gene mention normalization. / Hakenberg, Jörg; Gerner, Martin; Haeussler, Maximilian; Solt, Illés; Plake, Conrad; Schroeder, Michael; Gonzalez, Graciela; Nenadic, Goran; Bergman, Casey M.

In: Bioinformatics, Vol. 27, No. 19, btr455, 10.2011, p. 2769-2771.

Hakenberg, J, Gerner, M, Haeussler, M, Solt, I, Plake, C, Schroeder, M, Gonzalez, G, Nenadic, G & Bergman, CM 2011, 'The GNAT library for local and remote gene mention normalization', Bioinformatics, vol. 27, no. 19, btr455, pp. 2769-2771. https://doi.org/10.1093/bioinformatics/btr455
Hakenberg J, Gerner M, Haeussler M, Solt I, Plake C, Schroeder M et al. The GNAT library for local and remote gene mention normalization. Bioinformatics. 2011 Oct;27(19):2769-2771. btr455. https://doi.org/10.1093/bioinformatics/btr455
Hakenberg, Jörg ; Gerner, Martin ; Haeussler, Maximilian ; Solt, Illés ; Plake, Conrad ; Schroeder, Michael ; Gonzalez, Graciela ; Nenadic, Goran ; Bergman, Casey M. / The GNAT library for local and remote gene mention normalization. In: Bioinformatics. 2011 ; Vol. 27, No. 19. pp. 2769-2771.
@article{7a5e0d9b0b0a47f69e50a59531cb1b02,
title = "The GNAT library for local and remote gene mention normalization",
author = "J{\"o}rg Hakenberg and Martin Gerner and Maximilian Haeussler and Ill{\'e}s Solt and Conrad Plake and Michael Schroeder and Graciela Gonzalez and Goran Nenadic and Bergman, {Casey M.}",
year = "2011",
month = "10",
doi = "10.1093/bioinformatics/btr455",
language = "English (US)",
volume = "27",
pages = "2769--2771",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "19",

}

TY - JOUR

T1 - The GNAT library for local and remote gene mention normalization

AU - Hakenberg, Jörg

AU - Gerner, Martin

AU - Haeussler, Maximilian

AU - Solt, Illés

AU - Plake, Conrad

AU - Schroeder, Michael

AU - Gonzalez, Graciela

AU - Nenadic, Goran

AU - Bergman, Casey M.

PY - 2011/10

Y1 - 2011/10

N2 - Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a Tap-20 score of 0.1987.

UR - http://www.scopus.com/inward/record.url?scp=80053441509&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053441509&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btr455

DO - 10.1093/bioinformatics/btr455

M3 - Article

VL - 27

SP - 2769

EP - 2771

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 19

M1 - btr455

ER -