Semi-Automated Clinical Lexicon Induction and Its Use in Cohort Selection from Clinical Notes

Samarth Rawal, Ashok Prakash, Soumya Adhya, Sidharth Kulkarni, Saadat Anwar, Chitta Baral, Murthy Devarakonda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Special-purpose lexicons are invaluable in biomedical natural language processing. They are especially crucial for a task such as the 13-criteria cohort identification from clinical notes proposed in the N2C2 2018 Track 1 Challenge. While manually developed lexicons helped us achieve high performance, the process was ad hoc and nonreproducible. This paper presents a semi-automated lexicon induction method, using Logistic Regression (LR) and word embeddings, which brings rigor to the process. The key idea is to use n-grams in the training corpus as features of LR and to take the features (n-grams) with the most impact on the outcome as the lexicon. The semi-automatically generated lexicons achieved an overall F-measure of 0.9166, versus 0.9003 with manually generated lexicons. This study therefore shows that lexicons generated with a semi-automated approach can retain performance while making the process rigorous and reproducible.
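
The key idea in the abstract (n-grams as LR features, with the highest-impact features kept as the lexicon) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy notes and labels are invented, the helper names `ngrams`, `train_lr`, and `induce_lexicon` are hypothetical, "impact" is approximated by the size of the learned positive weight, and the word-embedding step and the manual review implied by "semi-automated" are not shown.

```python
import math
from collections import defaultdict

def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) in a note."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}

def train_lr(docs, labels, epochs=200, lr=0.5):
    """Fit logistic regression on binary n-gram features with batch gradient ascent."""
    feats = [ngrams(d) for d in docs]
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        grad = defaultdict(float)
        grad_bias = 0.0
        for f, y in zip(feats, labels):
            z = bias + sum(weights[g] for g in f)
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability
            err = y - p                       # log-likelihood gradient term
            grad_bias += err
            for g in f:
                grad[g] += err
        for g, v in grad.items():
            weights[g] += lr * v / len(docs)
        bias += lr * grad_bias / len(docs)
    return weights, bias

def induce_lexicon(weights, k=5):
    """Keep the k n-grams with the largest positive weights as candidate lexicon entries."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [g for g, w in ranked[:k] if w > 0]

# Invented toy notes; label 1 means the (hypothetical) criterion is met.
notes = [
    "patient reports heavy alcohol use daily",
    "history of alcohol abuse noted",
    "admits heavy drinking and alcohol abuse",
    "patient denies tobacco use",
    "no history of chest pain",
    "patient reports mild headache daily",
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_lr(notes, labels)
lexicon = induce_lexicon(weights, k=5)
print(lexicon)
```

In a semi-automated workflow, a ranked list like this would be a set of candidates for a person to review rather than a finished lexicon.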

Original language: English (US)
Title of host publication: 2020 IEEE International Conference on Healthcare Informatics, ICHI 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728153827
State: Published - Nov 2020
Event: 8th IEEE International Conference on Healthcare Informatics, ICHI 2020 - Virtual, Oldenburg, Germany
Duration: Nov 30 2020 to Dec 3 2020

Publication series

Name: 2020 IEEE International Conference on Healthcare Informatics, ICHI 2020

Conference

Conference: 8th IEEE International Conference on Healthcare Informatics, ICHI 2020
Country: Germany
City: Virtual, Oldenburg
Period: 11/30/20 to 12/3/20

Keywords

  • clinical text
  • cohort selection
  • hybrid methods
  • lexicon induction
  • semi-automated

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Decision Sciences (miscellaneous)
  • Modeling and Simulation
  • Medicine (miscellaneous)
  • Health Informatics
  • Health (social science)
