All you have to do is change one word in your research and you would have done something useful. Prediction to Prevention.
Prediction of 30-Day Readmission After Stroke Using Machine Learning and Natural Language Processing
- 1Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- 2Department of Neurology, Biological Sciences, Division and Center for Healthcare Delivery Science and Innovation, University of Chicago, Chicago, IL, United States
- 3Department of Neurology, University of Chicago, Chicago, IL, United States
Background and Purpose: This study aims to determine whether machine learning (ML) and natural language processing (NLP) from electronic health records (EHR) improve the prediction of 30-day readmission after stroke.
Methods: Among index stroke admissions between 2011 and 2016 at an academic medical center, we abstracted discrete data from the EHR on demographics, risk factors, medications, hospital complications, and discharge destination and unstructured textual data from clinician notes. Readmission was defined as any unplanned hospital admission within 30 days of discharge. We developed models to predict two separate outcomes, as follows: (1) 30-day all-cause readmission and (2) 30-day stroke readmission. We compared the performance of logistic regression with advanced ML algorithms. We used several NLP methods to generate additional features from unstructured textual reports. We evaluated the performance of prediction models using a five-fold validation and tested the best model in a held-out test dataset. Areas under the curve (AUCs) were used to compare discrimination of each model.
Results: In a held-out test dataset, advanced ML methods along with NLP features out performed logistic regression for all-cause readmission (AUC, 0.64 vs. 0.58; p < 0.001) and stroke readmission prediction (AUC, 0.62 vs. 0.52; p < 0.001).
Conclusion: NLP-enhanced machine learning models potentially advance our ability to predict readmission after stroke. However, further improvement is necessary before being implemented in clinical practice given the weak discrimination.
Introduction
Nearly 800,000 patients experience a stroke each year in the USA (1). The cost of initial admissions for stroke averages US$20,000 while readmissions cost on average US$10,000 (1–3). Reduction in readmission is, thus, an important target to reduce healthcare costs and improve patient care. However, several studies have demonstrated that available prediction models for readmission perform modestly (4, 5). A better understanding of the causes leading to readmission and better prediction tools may allow hospital systems to better allocate resources to the patients who are most at risk for readmission (6, 7).
Prior efforts to stratify risk of readmission have utilized basic statistical models, such as logistic regression, with modest results (AUC range: 0.53–0.67) (5, 7, 8). However, these studies do not report results on a separate held out dataset thereby not addressing the generalizability of these results. Also, since these methods are trained and validated on the same datasets, the results are highly prone to be inflated due to overfitting. Furthermore, logistic regression base models are incapable of properly weighing the interactions between the complex variables in additive analyses (4, 9).
Machine learning (10) (ML) has emerged as a new statistical approach to overcome the limitation of non-linearity and improve predictive analysis in healthcare. Advanced ML methods have shown to be superior for predicting readmission in patients with heart failure (11). Furthermore, natural language processing (NLP) methods can be utilized to automatically extract much of the rich but difficult-to-access medical information that is often buried in unstructured text notes within electronic health records (EHR). There has been widespread interest to use ML in conjunction with NLP to build clinical tools for cohort construction, clinical trials, and clinical decision support (9, 12). There has been, however, no study to use NLP of clinical notes and ML to predict readmissions after stroke. We, therefore, sought to evaluate advanced ML algorithms that incorporate NLP features of textual data in the EHR to improve prediction of 30-day readmission after stroke. We also seek to evaluate our models on a separate held out dataset in order to test the generalizability of our results.
More at link.
No comments:
Post a Comment