Evaluating distributional semantic and feature selection for extracting relationships from biological text

Ehsan Emadzadeh, Siddhartha Jonnalagadda, Graciela Gonzalez

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    The constant flow of biomolecular findings being published each day challenges our ability to develop methods to automatically extract the knowledge expressed in text to potentially influence new discoveries. Finding relations between the biological entities (e.g. proteins and genes) in text is a challenging task. To facilitate the extraction process, a relation can be decomposed into a trigger and the complementary arguments (e.g. theme, site). Several approaches have been proposed based on machine learning which generally use a common set of features for all trigger types. Here we evaluate the impact of applying a feature selection method for trigger classification. Our proposed method uses a greedy feature selection algorithm to find an optimal set of attributes for each trigger type. We show that using the customized set of features can improve classification results significantly (up to 53.96% in f-measure). In addition, we evaluated different settings for including semantic features in the classifiers. We found that using semantic features can improve classification results and found the best setting for each trigger type.
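The per-trigger-type selection described in the abstract can be illustrated with a minimal sketch of greedy forward selection. The abstract does not say which greedy variant, feature set, or classifier the authors used, so everything named here is an assumption: the trigger types (`Binding`, `Phosphorylation`), the feature names, and the `TOY_WEIGHTS` table are hypothetical, and `toy_score` is a stand-in for what would in practice be something like a cross-validated F-measure of a trained classifier.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical per-trigger-type feature utilities (NOT from the paper).
# A real score would come from evaluating a classifier on held-out data.
TOY_WEIGHTS = {
    "Binding": {"token": 0.30, "pos_tag": 0.10,
                "dependency_path": 0.20, "semantic_neighbors": -0.05},
    "Phosphorylation": {"token": 0.40, "pos_tag": -0.02,
                        "dependency_path": 0.0, "semantic_neighbors": 0.15},
}


def toy_score(trigger_type: str) -> Callable[[FrozenSet[str]], float]:
    """Return a stand-in scoring function for one trigger type."""
    weights = TOY_WEIGHTS[trigger_type]
    return lambda subset: sum(weights[f] for f in subset)


def greedy_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 1e-9,
) -> Tuple[List[str], float]:
    """Add the single best feature per round until no feature improves the score."""
    selected: List[str] = []
    remaining = list(features)
    best = score(frozenset())
    while remaining:
        # Score every candidate subset formed by adding one remaining feature.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score - best <= min_gain:
            break  # no remaining feature gives a meaningful improvement
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


def select_for(trigger_type: str) -> Tuple[List[str], float]:
    """Run the selection separately for one trigger type."""
    features = list(TOY_WEIGHTS[trigger_type])
    return greedy_forward_selection(features, toy_score(trigger_type))


if __name__ == "__main__":
    for trigger_type in TOY_WEIGHTS:
        chosen, _ = select_for(trigger_type)
        print(trigger_type, chosen)
```

Running the selection once per trigger type is what lets each type end up with a different feature subset. With these toy weights the semantic feature is kept for one type and dropped for the other. Greedy search of this kind finds a locally good subset and is not guaranteed to find the globally best one.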

    Original language: English (US)
    Title of host publication: Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011
    Pages: 66-71
    Number of pages: 6
    DOI: https://doi.org/10.1109/ICMLA.2011.65
    State: Published - Dec 1 2011
    Event: 10th International Conference on Machine Learning and Applications, ICMLA 2011 - Honolulu, HI, United States
    Duration: Dec 18 2011 → Dec 21 2011

    Publication series

    Name: Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011
    Volume: 2

    Other

    Other: 10th International Conference on Machine Learning and Applications, ICMLA 2011
    Country: United States
    City: Honolulu, HI
    Period: 12/18/11 → 12/21/11

    Keywords

    • Distributional Semantic
    • Feature selection
    • NLP
    • Relation Extraction

    ASJC Scopus subject areas

    • Computer Science Applications
    • Human-Computer Interaction

    Cite this

    Emadzadeh, E., Jonnalagadda, S., & Gonzalez, G. (2011). Evaluating distributional semantic and feature selection for extracting relationships from biological text. In Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011 (pp. 66-71). [6147050] (Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011; Vol. 2). https://doi.org/10.1109/ICMLA.2011.65