Deans' stroke musings: Prediction of 30-Day Readmission After Stroke Using Machine Learning and Natural Language Processing

Changing stroke rehab and research worldwide now. Time is Brain! Trillions and trillions of neurons DIE each day because there are NO effective hyperacute therapies besides tPA (only 12% effective). I have 523 posts on hyperacute therapy, enough for researchers to spend decades proving them out. These are my personal ideas and blog on stroke rehabilitation and stroke research. Do not attempt any of these without checking with your medical provider. Unless you join me in agitating, when you need these therapies they won't be there.

What this blog is for:

My blog is not to help survivors recover; it is to have the 10 million yearly stroke survivors light fires underneath their doctors, stroke hospitals and stroke researchers to get stroke solved. 100% recovery. The stroke medical world is completely failing at that goal; they don't even have it as a goal. Shortly after getting out of the hospital and getting NO information on the process or protocols of stroke rehabilitation and recovery, I started searching on the internet and found that no other survivor received useful information. This is an attempt to cover all stroke rehabilitation information that should be readily available to survivors so they can talk with informed knowledge to their medical staff. It lays out what needs to be done to get stroke survivors closer to 100% recovery. It's quite disgusting that this information is not available from every stroke association and doctors' group.

Thursday, July 15, 2021

Prediction of 30-Day Readmission After Stroke Using Machine Learning and Natural Language Processing

All you have to do is change one word in your research and you would have done something useful. Prediction to Prevention.

Prediction of 30-Day Readmission After Stroke Using Machine Learning and Natural Language Processing

Christina M. Lineback¹, Ravi Garg², Elissa Oh², Andrew M. Naidech¹,², Jane L. Holl² and Shyam Prabhakaran³*

¹Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
²Department of Neurology, Biological Sciences Division and Center for Healthcare Delivery Science and Innovation, University of Chicago, Chicago, IL, United States
³Department of Neurology, University of Chicago, Chicago, IL, United States

Background and Purpose: This study aims to determine whether machine learning (ML) and natural language processing (NLP) from electronic health records (EHR) improve the prediction of 30-day readmission after stroke.

Methods: Among index stroke admissions between 2011 and 2016 at an academic medical center, we abstracted discrete data from the EHR on demographics, risk factors, medications, hospital complications, and discharge destination, as well as unstructured textual data from clinician notes. Readmission was defined as any unplanned hospital admission within 30 days of discharge. We developed models to predict two separate outcomes: (1) 30-day all-cause readmission and (2) 30-day stroke readmission. We compared the performance of logistic regression with advanced ML algorithms. We used several NLP methods to generate additional features from unstructured textual reports. We evaluated the performance of prediction models using five-fold cross-validation and tested the best model in a held-out test dataset. Areas under the curve (AUCs) were used to compare the discrimination of each model.

Results: In a held-out test dataset, advanced ML methods along with NLP features outperformed logistic regression for all-cause readmission (AUC, 0.64 vs. 0.58; p < 0.001) and stroke readmission prediction (AUC, 0.62 vs. 0.52; p < 0.001).

Conclusion: NLP-enhanced machine learning models potentially advance our ability to predict readmission after stroke. However, given the weak discrimination, further improvement is necessary before these models are implemented in clinical practice.
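To put the AUC numbers above in perspective: an AUC is the chance that a randomly chosen readmitted patient gets a higher risk score than a randomly chosen patient who was not readmitted. A value of 0.5 is a coin flip and 1.0 is perfect ranking. The sketch below computes that directly in plain Python. The function name `auc` and the toy labels and scores are made up for illustration and are not data from the study.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random positive case outscores a random negative case.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = readmitted within 30 days, 0 = not readmitted.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.7]
toy_auc = auc(labels, scores)  # 8 of the 9 positive/negative pairs are ranked correctly
print(f"toy AUC = {toy_auc:.3f}")
```

Read this way, the reported 0.64 means the best model ranks a readmitted patient above a non-readmitted one about 64% of the time.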

Introduction

Nearly 800,000 patients experience a stroke each year in the USA (1). The cost of initial admissions for stroke averages US$20,000 while readmissions cost on average US$10,000 (1–3). Reduction in readmission is, thus, an important target to reduce healthcare costs and improve patient care. However, several studies have demonstrated that available prediction models for readmission perform modestly (4, 5). A better understanding of the causes leading to readmission and better prediction tools may allow hospital systems to better allocate resources to the patients who are most at risk for readmission (6, 7).

Prior efforts to stratify risk of readmission have utilized basic statistical models, such as logistic regression, with modest results (AUC range: 0.53–0.67) (5, 7, 8). However, these studies do not report results on a separate held-out dataset and therefore do not address the generalizability of their results. Also, because these models were trained and validated on the same datasets, the reported results are prone to inflation from overfitting. Furthermore, logistic regression-based models that treat predictors additively cannot properly weigh the interactions between complex variables (4, 9).
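The point about interactions can be shown with a toy case. This is a minimal sketch, not the study's model or data: two hypothetical binary risk factors `a` and `b`, where the outcome occurs only when exactly one of them is present. A purely additive logistic regression cannot rank these cases at all, while the same model given an explicit `a*b` interaction term ranks them perfectly. The helper names, learning rate and step count are all assumptions chosen for the illustration.

```python
import math


def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(rows, labels, lr=0.4, steps=5000):
    """Batch gradient descent on the logistic loss; returns [bias, w1, w2, ...]."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def linear_scores(weights, rows):
    """Log-odds for each row; the ranking is the same as for probabilities."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], x)) for x in rows]


# Hypothetical toy data: outcome is 1 only when exactly one factor is present.
additive_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

# Same cases with an explicit interaction column a*b added.
interaction_rows = [(a, b, a * b) for a, b in additive_rows]

auc_additive = auc(labels, linear_scores(fit_logistic(additive_rows, labels), additive_rows))
auc_interaction = auc(labels, linear_scores(fit_logistic(interaction_rows, labels), interaction_rows))
print(auc_additive, auc_interaction)
```

Tree-based and other non-linear learners can pick up this kind of pattern without anyone hand-writing the interaction column, which is the usual argument for trying them alongside logistic regression.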

Machine learning (ML) (10) has emerged as a new statistical approach to overcome the limitations of linear models in handling non-linear relationships and to improve predictive analysis in healthcare. Advanced ML methods have been shown to be superior for predicting readmission in patients with heart failure (11). Furthermore, natural language processing (NLP) methods can be utilized to automatically extract much of the rich but difficult-to-access medical information that is often buried in unstructured text notes within electronic health records (EHR). There has been widespread interest in using ML in conjunction with NLP to build clinical tools for cohort construction, clinical trials, and clinical decision support (9, 12). There has been, however, no study using NLP of clinical notes and ML to predict readmissions after stroke. We, therefore, sought to evaluate advanced ML algorithms that incorporate NLP features of textual data in the EHR to improve prediction of 30-day readmission after stroke. We also sought to evaluate our models on a separate held-out dataset in order to test the generalizability of our results.
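The evaluation design the authors describe (five-fold validation on the development data, then one final check on a held-out test set) can be sketched as index bookkeeping. This is only an illustration under stated assumptions: the 20% test fraction, the random seed and the function names `split_held_out` and `k_folds` are hypothetical, since the text quoted here does not give the actual split sizes.

```python
import random


def split_held_out(n, test_fraction=0.2, seed=0):
    """Shuffle admission indices and set aside a test set that is never used for tuning."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]  # (development indices, held-out indices)


def k_folds(indices, k=5):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


# Hypothetical cohort of 100 index admissions.
development, held_out = split_held_out(100)
for fold_number, (train, validation) in enumerate(k_folds(development), start=1):
    # A model would be fit on `train` and scored on `validation` here.
    print(fold_number, len(train), len(validation))
# Only the single best model from the loop above is then scored once on `held_out`.
```

Keeping the held-out indices out of every fold is what guards against the inflated results the introduction warns about.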