Using natural language processing and machine learning to classify health literacy from secure messages: The ECLIPPSE study

Renu Balyan; Scott A. Crossley; William Brown; Andrew J. Karter; Danielle McNamara; Jennifer Y. Liu; Courtney R. Lyles; Dean Schillinger

doi:10.1371/journal.pone.0212488

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Limited health literacy is a barrier to optimal healthcare delivery and outcomes. Current measures requiring patients to self-report limitations are time-consuming and may be considered intrusive by some. This makes widespread classification of patient health literacy challenging. The objective of this study was to develop and validate “literacy profiles” as automated indicators of patients’ health literacy to facilitate a non-intrusive, economical and more comprehensive characterization of health literacy among a health care delivery system’s membership. To this end, three literacy profiles were generated based on natural language processing (combining computational linguistics and machine learning) using a sample of 283,216 secure messages sent from 6,941 patients to their primary care physicians. All patients were participants in Kaiser Permanente Northern California’s DISTANCE Study. Performance of the three literacy profiles was compared against a gold standard of patient self-reported health literacy. Associations were analyzed between each literacy profile and patient demographics, health outcomes and healthcare utilization. T-tests were used for numeric data such as A1C, Charlson comorbidity index and healthcare utilization rates, and chi-square tests for categorical data such as sex, race, poor adherence and severe hypoglycemia. Literacy profiles varied in their test characteristics, with C-statistics ranging from 0.61 to 0.74. Relations between literacy profiles and health outcomes revealed patterns consistent with previous health literacy research: patients whose literacy profiles indicated limited health literacy (a) were older and more likely to be of minority status; (b) had poorer medication adherence and glycemic control; and (c) exhibited higher rates of hypoglycemia, comorbidities and healthcare utilization. This represents the first successful attempt to employ natural language processing to estimate health literacy. Literacy profiles can offer an automated and economical way to identify patients with limited health literacy and greater vulnerability to poor health outcomes.

Original language	English (US)
Article number	e0212488
Journal	PloS one
Volume	14
Issue number	2
DOIs	https://doi.org/10.1371/journal.pone.0212488
State	Published - Feb 2019

ASJC Scopus subject areas

General

Cite this

@article{5a3b7c7833a94b8aa1e3e696b9b8cc5c,

title = "Using natural language processing and machine learning to classify health literacy from secure messages: The ECLIPPSE study",

abstract = "Limited health literacy is a barrier to optimal healthcare delivery and outcomes. Current measures requiring patients to self-report limitations are time-consuming and may be considered intrusive by some. This makes widespread classification of patient health literacy challenging. The objective of this study was to develop and validate ``literacy profiles'' as automated indicators of patients{\textquoteright} health literacy to facilitate a non-intrusive, economic and more comprehensive characterization of health literacy among a health care delivery system{\textquoteright}s membership. To this end, three literacy profiles were generated based on natural language processing (combining computational linguistics and machine learning) using a sample of 283,216 secure messages sent from 6,941 patients to their primary care physicians. All patients were participants in Kaiser Permanente Northern California{\textquoteright}s DISTANCE Study. Performance of the three literacy profiles were compared against a gold standard of patient self-reported health literacy. Associations were analyzed between each literacy profile and patient demographics, health outcomes and healthcare utilization. T-tests were used for numeric data such as A1C, Charlson comorbidity index and healthcare utilization rates, and chi-square tests for categorical data such as sex, race, poor adherence and severe hypoglycemia. Literacy profiles varied in their test characteristics, with C-statistics ranging from 0.61--0.74. Relations between literacy profiles and health outcomes revealed patterns consistent with previous health literacy research: patients identified via literacy profiles indicative of limited health literacy: (a) were older and more likely of minority status; (b) had poorer medication adherence and glycemic control; and (c) exhibited higher rates of hypoglycemia, comorbidities and healthcare utilization. This represents the first successful attempt to employ natural language processing to estimate health literacy. Literacy profiles can offer an automated and economical way to identify patients with limited health literacy and greater vulnerability to poor health outcomes.",

author = "Renu Balyan and Crossley, {Scott A.} and William Brown and Karter, {Andrew J.} and Danielle McNamara and Liu, {Jennifer Y.} and Lyles, {Courtney R.} and Dean Schillinger",

note = "Publisher Copyright: {\textcopyright} 2019 Balyan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",

year = "2019",

month = feb,

doi = "10.1371/journal.pone.0212488",

language = "English (US)",

volume = "14",

journal = "PloS one",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "2",

}

TY - JOUR

T1 - Using natural language processing and machine learning to classify health literacy from secure messages

T2 - The ECLIPPSE study

AU - Balyan, Renu

AU - Crossley, Scott A.

AU - Brown, William

AU - Karter, Andrew J.

AU - McNamara, Danielle

AU - Liu, Jennifer Y.

AU - Lyles, Courtney R.

AU - Schillinger, Dean

N1 - Publisher Copyright: © 2019 Balyan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2019/2

Y1 - 2019/2

N2 - Limited health literacy is a barrier to optimal healthcare delivery and outcomes. Current measures requiring patients to self-report limitations are time-consuming and may be considered intrusive by some. This makes widespread classification of patient health literacy challenging. The objective of this study was to develop and validate “literacy profiles” as automated indicators of patients’ health literacy to facilitate a non-intrusive, economic and more comprehensive characterization of health literacy among a health care delivery system’s membership. To this end, three literacy profiles were generated based on natural language processing (combining computational linguistics and machine learning) using a sample of 283,216 secure messages sent from 6,941 patients to their primary care physicians. All patients were participants in Kaiser Permanente Northern California’s DISTANCE Study. Performance of the three literacy profiles were compared against a gold standard of patient self-reported health literacy. Associations were analyzed between each literacy profile and patient demographics, health outcomes and healthcare utilization. T-tests were used for numeric data such as A1C, Charlson comorbidity index and healthcare utilization rates, and chi-square tests for categorical data such as sex, race, poor adherence and severe hypoglycemia. Literacy profiles varied in their test characteristics, with C-statistics ranging from 0.61–0.74. Relations between literacy profiles and health outcomes revealed patterns consistent with previous health literacy research: patients identified via literacy profiles indicative of limited health literacy: (a) were older and more likely of minority status; (b) had poorer medication adherence and glycemic control; and (c) exhibited higher rates of hypoglycemia, comorbidities and healthcare utilization. This represents the first successful attempt to employ natural language processing to estimate health literacy. Literacy profiles can offer an automated and economical way to identify patients with limited health literacy and greater vulnerability to poor health outcomes.

AB - Limited health literacy is a barrier to optimal healthcare delivery and outcomes. Current measures requiring patients to self-report limitations are time-consuming and may be considered intrusive by some. This makes widespread classification of patient health literacy challenging. The objective of this study was to develop and validate “literacy profiles” as automated indicators of patients’ health literacy to facilitate a non-intrusive, economic and more comprehensive characterization of health literacy among a health care delivery system’s membership. To this end, three literacy profiles were generated based on natural language processing (combining computational linguistics and machine learning) using a sample of 283,216 secure messages sent from 6,941 patients to their primary care physicians. All patients were participants in Kaiser Permanente Northern California’s DISTANCE Study. Performance of the three literacy profiles were compared against a gold standard of patient self-reported health literacy. Associations were analyzed between each literacy profile and patient demographics, health outcomes and healthcare utilization. T-tests were used for numeric data such as A1C, Charlson comorbidity index and healthcare utilization rates, and chi-square tests for categorical data such as sex, race, poor adherence and severe hypoglycemia. Literacy profiles varied in their test characteristics, with C-statistics ranging from 0.61–0.74. Relations between literacy profiles and health outcomes revealed patterns consistent with previous health literacy research: patients identified via literacy profiles indicative of limited health literacy: (a) were older and more likely of minority status; (b) had poorer medication adherence and glycemic control; and (c) exhibited higher rates of hypoglycemia, comorbidities and healthcare utilization. This represents the first successful attempt to employ natural language processing to estimate health literacy. Literacy profiles can offer an automated and economical way to identify patients with limited health literacy and greater vulnerability to poor health outcomes.

UR - http://www.scopus.com/inward/record.url?scp=85061967086&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061967086&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0212488

DO - 10.1371/journal.pone.0212488

M3 - Article

C2 - 30794616

AN - SCOPUS:85061967086

SN - 1932-6203

VL - 14

JO - PloS one

JF - PloS one

IS - 2

M1 - e0212488

ER -
