A SNPshot of PubMed to associate genetic variants with drugs, diseases, and adverse reactions

Jörg Hakenberg; Dmitry Voronov; Võ Hà Nguyên; Shanshan Liang; Saadat Anwar; Barry Lumpkin; Robert Leaman; Luis Tari; Chitta Baral

doi:10.1016/j.jbi.2012.04.006

Research output: Contribution to journal › Article › peer-review

33 Scopus citations

Abstract

Motivation: Genetic factors determine differences in pharmacokinetics, drug efficacy, and drug responses between individuals and sub-populations. Wrong dosages of drugs can lead to severe adverse drug reactions in individuals whose drug metabolism drastically differs from the "assumed average." Databases such as PharmGKB are excellent sources of pharmacogenetic information on enzymes, genetic variants, and drug response affected by changes in enzymatic activity. Here, we seek to aid researchers, database curators, and clinicians in their search for relevant information by automatically extracting these data from literature.

Approach: We automatically populate a repository of information on genetic variants, relations to drugs, occurrence in sub-populations, and associations with disease. We mine textual data from PubMed abstracts to discover such genotype-phenotype associations, focusing on SNPs that can be associated with variations in drug response. The overall repository covers relations found between genes, variants, alleles, drugs, diseases, adverse drug reactions, populations, and allele frequencies. We cross-reference these data to EntrezGene, PharmGKB, PubChem, and others.

Results: The performance regarding entity recognition and relation extraction yields a precision of 90-92% for the major entity types (gene, drug, disease), and 76-84% for relations involving these types. Comparison of our repository to PharmGKB reveals a coverage of 93% of gene-drug associations in PharmGKB and 97% of the gene-variant mappings based on 180,000 PubMed abstracts.

Availability: http://bioai4core.fulton.asu.edu/snpshot.
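The two kinds of figures reported in the Results (precision of extracted items, and coverage of a reference database) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the placeholder gene and drug identifiers, and the toy sets are all hypothetical, and the sketch only assumes the usual set-overlap definitions of both measures.

```python
# Minimal sketch of precision and coverage as set overlaps.
# All names and data below are hypothetical placeholders, not taken from the paper.

def precision(predicted: set, gold: set) -> float:
    """Fraction of predicted items that also appear in the gold annotations."""
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


def coverage(extracted: set, reference: set) -> float:
    """Fraction of reference-database items that were also extracted."""
    if not reference:
        return 0.0
    return len(extracted & reference) / len(reference)


# Toy gene-drug pairs "extracted" from text (placeholders).
extracted_pairs = {
    ("GENE_A", "drug_x"),
    ("GENE_B", "drug_y"),
    ("GENE_C", "drug_y"),
    ("GENE_A", "drug_z"),
    ("GENE_D", "drug_w"),
}

# Toy manual annotations used to judge the extracted pairs (placeholders).
gold_pairs = {
    ("GENE_A", "drug_x"),
    ("GENE_B", "drug_y"),
    ("GENE_C", "drug_y"),
}

# Toy reference database of known associations (placeholders).
reference_pairs = {
    ("GENE_A", "drug_x"),
    ("GENE_B", "drug_y"),
    ("GENE_C", "drug_y"),
    ("GENE_E", "drug_v"),
}

if __name__ == "__main__":
    print(f"precision: {precision(extracted_pairs, gold_pairs):.2f}")
    print(f"coverage:  {coverage(extracted_pairs, reference_pairs):.2f}")
```

Under these definitions, precision is computed relative to what was extracted, while coverage is computed relative to what the reference database contains, so the two numbers can differ even on the same set of extracted pairs.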

Original language	English (US)
Pages (from-to)	842-850
Number of pages	9
Journal	Journal of Biomedical Informatics
Volume	45
Issue number	5
DOIs	https://doi.org/10.1016/j.jbi.2012.04.006
State	Published - Oct 2012

Keywords

Databases
Information extraction
Pharmacogenomics
Text mining

ASJC Scopus subject areas

Health Informatics
Computer Science Applications

Cite this

@article{d01e3815d3004958a6ed9a3373dcade8,

title = "A SNPshot of PubMed to associate genetic variants with drugs, diseases, and adverse reactions",

abstract = "Motivation: Genetic factors determine differences in pharmacokinetics, drug efficacy, and drug responses between individuals and sub-populations. Wrong dosages of drugs can lead to severe adverse drug reactions in individuals whose drug metabolism drastically differs from the {"}assumed average.{"} Databases such as PharmGKB are excellent sources of pharmacogenetic information on enzymes, genetic variants, and drug response affected by changes in enzymatic activity. Here, we seek to aid researchers, database curators, and clinicians in their search for relevant information by automatically extracting these data from literature. Approach: We automatically populate a repository of information on genetic variants, relations to drugs, occurrence in sub-populations, and associations with disease. We mine textual data from PubMed abstracts to discover such genotype-phenotype associations, focusing on SNPs that can be associated with variations in drug response. The overall repository covers relations found between genes, variants, alleles, drugs, diseases, adverse drug reactions, populations, and allele frequencies. We cross-reference these data to EntrezGene, PharmGKB, PubChem, and others. Results: The performance regarding entity recognition and relation extraction yields a precision of 90-92% for the major entity types (gene, drug, disease), and 76-84% for relations involving these types. Comparison of our repository to PharmGKB reveals a coverage of 93% of gene-drug associations in PharmGKB and 97% of the gene-variant mappings based on 180,000 PubMed abstracts. Availability: http://bioai4core.fulton.asu.edu/snpshot.",

keywords = "Databases, Information extraction, Pharmacogenomics, Text mining",

author = "J{\"o}rg Hakenberg and Dmitry Voronov and Nguy{\^e}n, {V{\~o} H{\`a}} and Shanshan Liang and Saadat Anwar and Barry Lumpkin and Robert Leaman and Luis Tari and Chitta Baral",

note = "Funding Information: We kindly acknowledge funding by the National Science Foundation (VN, SL, BL), Science Foundation Arizona (LT, RL), Fulbright International Student Program Russia (DV), and Arizona State University (JH, CB). We would like to thank the anonymous reviewers whose suggestions helped to improve this manuscript.",

year = "2012",

month = oct,

doi = "10.1016/j.jbi.2012.04.006",

language = "English (US)",

volume = "45",

pages = "842--850",

journal = "Journal of Biomedical Informatics",

issn = "1532-0464",

publisher = "Academic Press Inc.",

number = "5",

}

TY - JOUR

T1 - A SNPshot of PubMed to associate genetic variants with drugs, diseases, and adverse reactions

AU - Hakenberg, Jörg

AU - Voronov, Dmitry

AU - Nguyên, Võ Hà

AU - Liang, Shanshan

AU - Anwar, Saadat

AU - Lumpkin, Barry

AU - Leaman, Robert

AU - Tari, Luis

AU - Baral, Chitta

N1 - Funding Information: We kindly acknowledge funding by the National Science Foundation (VN, SL, BL), Science Foundation Arizona (LT, RL), Fulbright International Student Program Russia (DV), and Arizona State University (JH, CB). We would like to thank the anonymous reviewers whose suggestions helped to improve this manuscript.

PY - 2012/10

Y1 - 2012/10

N2 - Motivation: Genetic factors determine differences in pharmacokinetics, drug efficacy, and drug responses between individuals and sub-populations. Wrong dosages of drugs can lead to severe adverse drug reactions in individuals whose drug metabolism drastically differs from the "assumed average." Databases such as PharmGKB are excellent sources of pharmacogenetic information on enzymes, genetic variants, and drug response affected by changes in enzymatic activity. Here, we seek to aid researchers, database curators, and clinicians in their search for relevant information by automatically extracting these data from literature. Approach: We automatically populate a repository of information on genetic variants, relations to drugs, occurrence in sub-populations, and associations with disease. We mine textual data from PubMed abstracts to discover such genotype-phenotype associations, focusing on SNPs that can be associated with variations in drug response. The overall repository covers relations found between genes, variants, alleles, drugs, diseases, adverse drug reactions, populations, and allele frequencies. We cross-reference these data to EntrezGene, PharmGKB, PubChem, and others. Results: The performance regarding entity recognition and relation extraction yields a precision of 90-92% for the major entity types (gene, drug, disease), and 76-84% for relations involving these types. Comparison of our repository to PharmGKB reveals a coverage of 93% of gene-drug associations in PharmGKB and 97% of the gene-variant mappings based on 180,000 PubMed abstracts. Availability: http://bioai4core.fulton.asu.edu/snpshot.

AB - Motivation: Genetic factors determine differences in pharmacokinetics, drug efficacy, and drug responses between individuals and sub-populations. Wrong dosages of drugs can lead to severe adverse drug reactions in individuals whose drug metabolism drastically differs from the "assumed average." Databases such as PharmGKB are excellent sources of pharmacogenetic information on enzymes, genetic variants, and drug response affected by changes in enzymatic activity. Here, we seek to aid researchers, database curators, and clinicians in their search for relevant information by automatically extracting these data from literature. Approach: We automatically populate a repository of information on genetic variants, relations to drugs, occurrence in sub-populations, and associations with disease. We mine textual data from PubMed abstracts to discover such genotype-phenotype associations, focusing on SNPs that can be associated with variations in drug response. The overall repository covers relations found between genes, variants, alleles, drugs, diseases, adverse drug reactions, populations, and allele frequencies. We cross-reference these data to EntrezGene, PharmGKB, PubChem, and others. Results: The performance regarding entity recognition and relation extraction yields a precision of 90-92% for the major entity types (gene, drug, disease), and 76-84% for relations involving these types. Comparison of our repository to PharmGKB reveals a coverage of 93% of gene-drug associations in PharmGKB and 97% of the gene-variant mappings based on 180,000 PubMed abstracts. Availability: http://bioai4core.fulton.asu.edu/snpshot.

KW - Databases

KW - Information extraction

KW - Pharmacogenomics

KW - Text mining

UR - http://www.scopus.com/inward/record.url?scp=84865981409&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84865981409&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2012.04.006

DO - 10.1016/j.jbi.2012.04.006

M3 - Article

C2 - 22564364

AN - SCOPUS:84865981409

SN - 1532-0464

VL - 45

SP - 842

EP - 850

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

IS - 5

ER -
