@article{36aea5cb3cb048f488eb9a4aeb8ff649,
title = "Deep neural networks and distant supervision for geographic location mention extraction",
abstract = "Motivation: Virus phylogeographers rely on DNA sequences of viruses and the locations of the infected hosts found in public sequence databases like GenBank for modeling virus spread. However, the locations in GenBank records are often only at the country or state level, and may require phylogeographers to scan the journal articles associated with the records to identify more localized geographic areas. To automate this process, we present a named entity recognizer (NER) for detecting locations in biomedical literature. We built the NER using a deep feedforward neural network to determine whether a given token is a toponym or not. To overcome the limited human annotated data available for training, we use distant supervision techniques to generate additional samples to train our NER. Results: Our NER achieves an F1-score of 0.910 and significantly outperforms the previous state-of-the-art system. Using the additional data generated through distant supervision further boosts the performance of the NER achieving an F1-score of 0.927. The NER presented in this research improves over previous systems significantly. Our experiments also demonstrate the NER{\textquoteright}s capability to embed external features to further boost the system{\textquoteright}s performance. We believe that the same methodology can be applied for recognizing similar biomedical entities in scientific literature.",
author = "Arjun Magge and Davy Weissenbacher and Abeed Sarker and Matthew Scotch and Graciela Gonzalez-Hernandez",
note = "Funding Information: Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases (NIAID) of the National Institutes of Health (NIH) under grant number R01AI117011. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. AM designed and trained the neural network, ran the experiments, performed the error analysis and wrote most of the manuscript. DW proposed the idea of using distant supervision for improving the CRF NER{\textquoteright}s performance in the previous manuscript, created the distant supervision dataset, supervised the experiments and wrote revisions of the manuscript. AS reviewed, restructured and contributed many sections and revisions of the manuscript. MS and GG provided overall guidance on the work and edited the final manuscript. The authors would also like to acknowledge Karen O{\textquoteright}Connor, Megan Rorison and Briana Trevino for their efforts in the annotation processes.
The authors are grateful to the anonymous reviewers for their valuable feedback and comments to improve the quality of the paper. The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research. Publisher Copyright: {\textcopyright} The Author(s) 2018. Published by Oxford University Press. All rights reserved.",
year = "2018",
month = jul,
day = "1",
doi = "10.1093/bioinformatics/bty273",
language = "English (US)",
volume = "34",
pages = "i565--i573",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "13",
}