Deep neural networks and distant supervision for geographic location mention extraction

Arjun Magge, Davy Weissenbacher, Abeed Sarker, Matthew Scotch, Graciela Gonzalez-Hernandez

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Motivation: Virus phylogeographers rely on DNA sequences of viruses and the locations of the infected hosts found in public sequence databases like GenBank for modeling virus spread. However, the locations in GenBank records are often only at the country or state level, and may require phylogeographers to scan the journal articles associated with the records to identify more localized geographic areas. To automate this process, we present a named entity recognizer (NER) for detecting locations in biomedical literature. We built the NER using a deep feedforward neural network to determine whether a given token is a toponym or not. To overcome the limited human-annotated data available for training, we use distant supervision techniques to generate additional samples to train our NER.

Results: Our NER achieves an F1-score of 0.910 and significantly outperforms the previous state-of-the-art system. Using the additional data generated through distant supervision further boosts the performance of the NER, achieving an F1-score of 0.927. The NER presented in this research improves over previous systems significantly. Our experiments also demonstrate the NER's capability to embed external features to further boost the system's performance. We believe that the same methodology can be applied for recognizing similar biomedical entities in scientific literature.
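The abstract does not say how the distantly supervised samples were produced. As a rough illustration only, one common form of distant supervision labels tokens by matching them against a gazetteer of known place names, yielding noisy per-token toponym labels that a token classifier could train on. The sketch below assumes that approach; the function name `distant_labels`, the tiny `GAZETTEER` set, and the example sentence are all hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def distant_labels(
    tokens: List[str],
    gazetteer: Set[Tuple[str, ...]],
    max_len: int = 3,
) -> List[int]:
    """Label each token 1 if it falls inside a gazetteer match, else 0.

    Matching is case-insensitive and greedy: at each position the longest
    gazetteer entry (up to max_len tokens) wins.
    """
    labels = [0] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in gazetteer:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                labels[j] = 1
            i += matched
        else:
            i += 1
    return labels


# Hypothetical gazetteer entries, stored as lowercase token tuples.
GAZETTEER = {("hong", "kong"), ("vietnam",), ("guangdong",)}

tokens = "Samples were collected in Hong Kong and Guangdong".split()
labels = distant_labels(tokens, GAZETTEER)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Labels produced this way are noisy (ambiguous names and missing gazetteer entries both cause errors), which is why such samples are typically used to supplement, not replace, human-annotated data.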

Original language: English (US)
Pages (from-to): i565-i573
Journal: Bioinformatics
Volume: 34
Issue number: 13
State: Published - Jul 1 2018

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
