A distributional semantics approach to simultaneous recognition of multiple classes of named entities

Siddhartha Jonnalagadda, Robert Leaman, Trevor Cohen, Graciela Gonzalez

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

7 Citations (Scopus)

Abstract

Named Entity Recognition and Classification has been studied for the last two decades. Because semantic features require a large amount of training time and are slow at inference, existing tools apply features and rules mainly at the word level or use lexicons. Recent advances in distributional semantics allow us to efficiently create paradigmatic models that encode word order. We used Sahlgren et al.'s permutation-based variant of the Random Indexing model to create a scalable and efficient system that simultaneously recognizes multiple entity classes mentioned in natural language. The system is validated on the GENIA corpus, which has annotations for 46 biomedical entity classes and supports nested entities. Using distributional semantics features only, it achieves an overall micro-averaged F-measure of 67.3% based on fragment matching, with performance ranging from 7.4% for "DNA substructure" to 80.7% for "Bioentity".
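The abstract names permutation-based Random Indexing as the core model but gives no implementation details. The following is a minimal Python sketch of the general idea, assuming the order-encoding scheme in the spirit of Sahlgren et al.: it is not the authors' system. Each word receives a sparse ternary random index vector, and a word's context vector accumulates its neighbors' index vectors, each cyclically rotated by the neighbor's relative position so that word order is encoded. The dimensionality, number of nonzero entries, window size, seed, and all function names (`make_index_vector`, `rotate`, `train`, `cosine`) are hypothetical choices for illustration.

```python
import math
import random


def make_index_vector(rng, dim, nonzeros):
    """Sparse ternary random index vector: `nonzeros` positions set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec


def rotate(vec, shift):
    """Cyclic right shift by `shift` positions (negative values shift left).

    Rotation is the permutation used here to mark a neighbor's relative position.
    """
    shift %= len(vec)
    return vec[-shift:] + vec[:-shift] if shift else list(vec)


def train(sentences, dim=512, nonzeros=8, window=2, seed=42):
    """Build order-aware context vectors (hypothetical parameters, illustration only)."""
    rng = random.Random(seed)
    index, context = {}, {}
    # Assign every distinct word a fixed random index vector and an empty context vector.
    for sent in sentences:
        for word in sent:
            if word not in index:
                index[word] = make_index_vector(rng, dim, nonzeros)
                context[word] = [0] * dim
    # Add each neighbor's index vector, rotated by its offset from the focus word.
    for sent in sentences:
        for i, word in enumerate(sent):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or not 0 <= j < len(sent):
                    continue
                permuted = rotate(index[sent[j]], offset)
                ctx = context[word]
                for k, x in enumerate(permuted):
                    if x:
                        ctx[k] += x
    return index, context


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

For example, `train([["IL-2", "gene", "expression"], ["IL-4", "gene", "expression"]])` gives "IL-2" and "IL-4" identical context vectors, because both occur with the same neighbors at the same offsets. By contrast, a word seen only before "X" and a word seen only after "X" end up with different vectors, which is the word-order sensitivity that the permutation adds. This sketch does not show how the paper turned such vectors into entity-recognition features.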

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 224-235
Number of pages: 12
Volume: 6008 LNCS
DOIs: https://doi.org/10.1007/978-3-642-12116-6_19
State: Published - 2010
Event: 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010 - Iasi
Duration: Mar 21 2010 to Mar 27 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6008 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010
City: Iasi
Period: 3/21/10 to 3/27/10

Fingerprint

Semantics
Named Entity Recognition
Substructure
Indexing
Natural Language
Annotation
Fragment
Permutation
DNA
Model
Class
Corpus
Training

Keywords

  • Biomedical
  • Classification
  • Distributional
  • Entity
  • GENIA
  • Multiple
  • Named
  • Recognition
  • Semantics

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Jonnalagadda, S., Leaman, R., Cohen, T., & Gonzalez, G. (2010). A distributional semantics approach to simultaneous recognition of multiple classes of named entities. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6008 LNCS, pp. 224-235). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6008 LNCS). https://doi.org/10.1007/978-3-642-12116-6_19

A distributional semantics approach to simultaneous recognition of multiple classes of named entities. / Jonnalagadda, Siddhartha; Leaman, Robert; Cohen, Trevor; Gonzalez, Graciela.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6008 LNCS 2010. p. 224-235 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6008 LNCS).

Jonnalagadda, S, Leaman, R, Cohen, T & Gonzalez, G 2010, A distributional semantics approach to simultaneous recognition of multiple classes of named entities. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 6008 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6008 LNCS, pp. 224-235, 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010, Iasi, 3/21/10. https://doi.org/10.1007/978-3-642-12116-6_19
Jonnalagadda S, Leaman R, Cohen T, Gonzalez G. A distributional semantics approach to simultaneous recognition of multiple classes of named entities. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6008 LNCS. 2010. p. 224-235. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-12116-6_19
Jonnalagadda, Siddhartha ; Leaman, Robert ; Cohen, Trevor ; Gonzalez, Graciela. / A distributional semantics approach to simultaneous recognition of multiple classes of named entities. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6008 LNCS 2010. pp. 224-235 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{b8b2363c350a4822b42c8a835da87b61,
title = "A distributional semantics approach to simultaneous recognition of multiple classes of named entities",
abstract = "Named Entity Recognition and Classification has been studied for the last two decades. Because semantic features require a large amount of training time and are slow at inference, existing tools apply features and rules mainly at the word level or use lexicons. Recent advances in distributional semantics allow us to efficiently create paradigmatic models that encode word order. We used Sahlgren et al.'s permutation-based variant of the Random Indexing model to create a scalable and efficient system that simultaneously recognizes multiple entity classes mentioned in natural language. The system is validated on the GENIA corpus, which has annotations for 46 biomedical entity classes and supports nested entities. Using distributional semantics features only, it achieves an overall micro-averaged F-measure of 67.3{\%} based on fragment matching, with performance ranging from 7.4{\%} for {"}DNA substructure{"} to 80.7{\%} for {"}Bioentity{"}.",
keywords = "Biomedical, Classification, Distributional, Entity, GENIA, Multiple, Named, Recognition, Semantics",
author = "Siddhartha Jonnalagadda and Robert Leaman and Trevor Cohen and Graciela Gonzalez",
year = "2010",
doi = "10.1007/978-3-642-12116-6_19",
language = "English (US)",
isbn = "3642121152",
volume = "6008 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "224--235",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - A distributional semantics approach to simultaneous recognition of multiple classes of named entities

AU - Jonnalagadda, Siddhartha

AU - Leaman, Robert

AU - Cohen, Trevor

AU - Gonzalez, Graciela

PY - 2010

Y1 - 2010

AB - Named Entity Recognition and Classification has been studied for the last two decades. Because semantic features require a large amount of training time and are slow at inference, existing tools apply features and rules mainly at the word level or use lexicons. Recent advances in distributional semantics allow us to efficiently create paradigmatic models that encode word order. We used Sahlgren et al.'s permutation-based variant of the Random Indexing model to create a scalable and efficient system that simultaneously recognizes multiple entity classes mentioned in natural language. The system is validated on the GENIA corpus, which has annotations for 46 biomedical entity classes and supports nested entities. Using distributional semantics features only, it achieves an overall micro-averaged F-measure of 67.3% based on fragment matching, with performance ranging from 7.4% for "DNA substructure" to 80.7% for "Bioentity".

KW - Biomedical

KW - Classification

KW - Distributional

KW - Entity

KW - GENIA

KW - Multiple

KW - Named

KW - Recognition

KW - Semantics

UR - http://www.scopus.com/inward/record.url?scp=78650456992&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650456992&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-12116-6_19

DO - 10.1007/978-3-642-12116-6_19

M3 - Conference contribution

SN - 3642121152

SN - 9783642121159

VL - 6008 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 224

EP - 235

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -