A distributional semantics approach to simultaneous recognition of multiple classes of named entities

Siddhartha Jonnalagadda, Robert Leaman, Trevor Cohen, Graciela Gonzalez

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    7 Scopus citations

    Abstract

    Named Entity Recognition and Classification has been studied for the last two decades. Because semantic features require a large amount of training time and are slow at inference, existing tools apply features and rules mainly at the word level or use lexicons. Recent advances in distributional semantics allow us to efficiently create paradigmatic models that encode word order. We used Sahlgren et al.'s permutation-based variant of the Random Indexing model to create a scalable and efficient system that simultaneously recognizes multiple entity classes mentioned in natural language. The system is validated on the GENIA corpus, which has annotations for 46 biomedical entity classes and supports nested entities. Using distributional semantics features only, it achieves an overall micro-averaged F-measure of 67.3% based on fragment matching, with performance ranging from 7.4% for "DNA substructure" to 80.7% for "Bioentity".
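
To make the idea of a permutation-based Random Indexing model more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each word gets a sparse random index vector, that a neighbour's index vector is permuted by rotating it according to its position relative to the focus word, and that the rotated vectors are summed into the focus word's context vector. The names `DIM`, `NONZERO`, `WINDOW`, `make_index_vector`, `rotate`, `train` and `cosine` are hypothetical, the parameter values are toy values, and rotation is only one possible choice of permutation.

```python
import random
from collections import defaultdict

DIM = 64       # vector dimensionality (hypothetical toy value)
NONZERO = 4    # number of +1/-1 entries per index vector (hypothetical)
WINDOW = 2     # context words considered on each side (hypothetical)


def make_index_vector(rng):
    """Return a sparse ternary vector: a few +1/-1 entries, the rest 0."""
    vec = [0] * DIM
    for i, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def rotate(vec, offset):
    """Permute a vector by rotating its coordinates by `offset` places.

    A negative offset rotates the other way, so a word seen to the left
    of the focus word is encoded differently from the same word seen to
    the right. This is how word order enters the model in this sketch.
    """
    k = offset % len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def train(sentences, seed=0):
    """Build index vectors and order-sensitive context vectors."""
    rng = random.Random(seed)
    index = {}
    for tokens in sentences:
        for tok in tokens:
            if tok not in index:
                index[tok] = make_index_vector(rng)

    context = defaultdict(lambda: [0] * DIM)
    for tokens in sentences:
        for i, focus in enumerate(tokens):
            for offset in range(-WINDOW, WINDOW + 1):
                j = i + offset
                if offset == 0 or not 0 <= j < len(tokens):
                    continue
                # Permute the neighbour's index vector by its relative
                # position, then add it to the focus word's context vector.
                permuted = rotate(index[tokens[j]], offset)
                ctx = context[focus]
                for d in range(DIM):
                    ctx[d] += permuted[d]
    return index, dict(context)


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy usage: "p53" and "ras" occur in the same ordered contexts.
sentences = [
    ["the", "p53", "gene", "is", "expressed"],
    ["the", "ras", "gene", "is", "expressed"],
]
index, context = train(sentences)
print(cosine(context["p53"], context["ras"]))
```

In this toy corpus the two gene names appear with identical neighbours at identical positions, so their context vectors coincide and the similarity is close to 1.0. Words that are interchangeable in this way are the kind of paradigmatic relation the abstract refers to. How such vectors are turned into features for the entity classifier is not described in the abstract and is not sketched here.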

    Original language: English (US)
    Title of host publication: Computational Linguistics and Intelligent Text Processing - 11th International Conference, CICLing 2010, Proceedings
    Pages: 224-235
    Number of pages: 12
    DOI: https://doi.org/10.1007/978-3-642-12116-6_19
    State: Published - Dec 29 2010
    Event: 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010 - Iasi, Romania
    Duration: Mar 21 2010 to Mar 27 2010

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 6008 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010
    Country: Romania
    City: Iasi
    Period: 3/21/10 to 3/27/10

    Keywords

    • Biomedical
    • Classification
    • Distributional
    • Entity
    • GENIA
    • Multiple
    • Named
    • Recognition
    • Semantics

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Jonnalagadda, S., Leaman, R., Cohen, T., & Gonzalez, G. (2010). A distributional semantics approach to simultaneous recognition of multiple classes of named entities. In Computational Linguistics and Intelligent Text Processing - 11th International Conference, CICLing 2010, Proceedings (pp. 224-235). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6008 LNCS). https://doi.org/10.1007/978-3-642-12116-6_19