The GNAT library for local and remote gene mention normalization

Jörg Hakenberg; Martin Gerner; Maximilian Haeussler; Illés Solt; Conrad Plake; Michael Schroeder; Graciela Gonzalez; Goran Nenadic; Casey M. Bergman

doi:10.1093/bioinformatics/btr455

Research output: Contribution to journal › Article › peer-review

53 Scopus citations

Abstract

Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a Tap-20 score of 0.1987.

Original language	English (US)
Article number	btr455
Pages (from-to)	2769-2771
Number of pages	3
Journal	Bioinformatics
Volume	27
Issue number	19
DOIs	https://doi.org/10.1093/bioinformatics/btr455
State	Published - Oct 2011

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{7a5e0d9b0b0a47f69e50a59531cb1b02,
title = "The GNAT library for local and remote gene mention normalization",
abstract = "Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a Tap-20 score of 0.1987.",
author = "J{\"o}rg Hakenberg and Martin Gerner and Maximilian Haeussler and Ill{\'e}s Solt and Conrad Plake and Michael Schroeder and Graciela Gonzalez and Goran Nenadic and Bergman, {Casey M.}",
note = "Funding Information: Funding: Biotechnology and Biological Sciences Research Council (CASE studentship to M.G., grant BB/G000093/1 to C.M.B., G.N.); the European Commission (grant HEALTH-F4-2008-223210 to C.M.B.); German Academic Exchange Service (DAAD) to I.S.",
year = "2011",
month = oct,
doi = "10.1093/bioinformatics/btr455",
language = "English (US)",
volume = "27",
pages = "2769--2771",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "19",
}

TY  - JOUR
T1  - The GNAT library for local and remote gene mention normalization
AU  - Hakenberg, Jörg
AU  - Gerner, Martin
AU  - Haeussler, Maximilian
AU  - Solt, Illés
AU  - Plake, Conrad
AU  - Schroeder, Michael
AU  - Gonzalez, Graciela
AU  - Nenadic, Goran
AU  - Bergman, Casey M.
N1  - Funding Information: Funding: Biotechnology and Biological Sciences Research Council (CASE studentship to M.G., grant BB/G000093/1 to C.M.B., G.N.); the European Commission (grant HEALTH-F4-2008-223210 to C.M.B.); German Academic Exchange Service (DAAD) to I.S.
PY  - 2011/10
Y1  - 2011/10
N2  - Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a Tap-20 score of 0.1987.
AB  - Summary: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a Tap-20 score of 0.1987.
UR  - http://www.scopus.com/inward/record.url?scp=80053441509&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=80053441509&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btr455
DO  - 10.1093/bioinformatics/btr455
M3  - Article
C2  - 21813477
AN  - SCOPUS:80053441509
SN  - 1367-4803
VL  - 27
SP  - 2769
EP  - 2771
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 19
M1  - btr455
ER  -
