Abstract

The proportional recovery rule asserts that most stroke survivors recover a fixed proportion of lost function. To the extent that this is true, recovery from stroke can be predicted accurately from baseline measures of acute post-stroke impairment alone. Reports are rapidly accumulating that baseline scores explain more than 80%, and sometimes more than 90%, of the variance in patients’ recoveries. Here, we show that these headline effect sizes are likely inflated. The key effects in this literature are typically expressed as, or reducible to, correlation coefficients between baseline scores and recovery (outcome scores minus baseline scores). Using formal analyses and simulations, we show that these correlations will be extreme when outcomes are substantially less variable than baselines, which they often will be in practice regardless of the real relationship between outcomes and baselines. We show that these effect sizes are likely to be over-optimistic in every empirical study we found that reported enough information for us to make the judgement, and argue that the same is likely to be true in other studies as well. The implication is that recovery after stroke may not be as proportional as recent studies suggest.

Introduction

Clinicians and researchers have long known that stroke patients’ initial symptom severity is related to their longer-term outcomes (Jongbloed, 1986). Recent studies have suggested that this relationship is stronger than previously thought: that most patients recover a fixed proportion of lost function. Studies supporting this ‘proportional recovery rule’ are rapidly accumulating (Stinear, 2017): in five studies since 2015 (Byblow et al., 2015; Feng et al., 2015; Winters et al., 2015; Buch et al., 2016; Stinear et al., 2017), researchers used the Fugl-Meyer scale to assess patients’ upper limb motor impairment within 2 weeks of stroke onset (‘baselines’), and then again either 3 or 6 months post-stroke (‘outcomes’). The results were consistent with earlier observations (Prabhakaran et al., 2008; Zarahn et al., 2011) that most patients recovered ∼70% of lost function. Taken together, these studies report highly consistent recovery in over 500 patients, across different countries with different approaches to rehabilitation, regardless of the patients’ ages at stroke onset, stroke type, sex, or therapy dose (Stinear, 2017). There is also increasing evidence that the rule captures recovery from post-stroke impairments of lower limb function (Smith et al., 2017), attention (Marchi et al., 2017; Winters et al., 2017), and language (Lazar et al., 2010; Marchi et al., 2017), and may even apply generally across cognitive domains (Ramsey et al., 2017). Even rats appear to recover proportionally after stroke (Jeffers et al., 2018).
Strikingly, many of these studies report that baseline scores predict 80–90%, or more, of the variance in empirical recovery. For predictions of behavioural responses in humans, these effect sizes are unprecedented. Recently, Winters and colleagues (2015) reported that recovery predicted from baseline scores explained 94% of the variance in the empirical recovery of 146 stroke patients. Like many related reports (Stinear, 2017), this study also reported a group of 65 ‘non-fitters’, who did not make the predicted recovery. But if non-fitters can be distinguished at the acute stage, as this and other studies suggest (Stinear, 2017), the implication is that we can predict most patients’ recovery near-perfectly from baseline scores alone. Stroke researchers are used to thinking of recovery as a complex, multi-factorial process (Nelson et al., 2016). If the proportional recovery rule is as powerful as it seems, post-stroke recovery is simpler and more consistent than previously thought.
In what follows, we argue that the empirical support for proportional recovery is weaker than it seems. These results are typically expressed as, or reducible to, correlations between baselines and recovery (outcomes minus baselines). Such analyses pose well-known challenges that statisticians have discussed for decades (Lord, 1956; Oldham, 1962; Cronbach and Furby, 1970; Hayes, 1988; Tu et al., 2005). Much of this discussion has focused on problems induced by measurement noise, and measurement noise was also the focus of the only prior application of that discussion to the proportional recovery rule (Krakauer and Marshall, 2015). Here, we argue that empirical studies of proportional recovery after stroke are likely to be confounded, quite independently of measurement noise.
Our argument is that (i) correlations between baselines and recovery are spurious when they are stronger than correlations between baselines and outcomes; (ii) this is likely when outcomes are less variable than baselines, which (iii) will often happen in practice, whether or not recovery is proportional. This argument follows from a formal analysis of correlations between baselines and recovery, which we introduce below and illustrate with examples. Armed with that analysis, we then re-examine the empirical support for the proportional recovery rule.

The relationships between baselines, outcomes, and recovery

For the sake of brevity, we define ‘baselines’ = X, ‘outcomes’ = Y, and ‘change’ (recovery) = Δ, i.e. Δ = Y − X. The ‘correlation between baselines and outcomes’ is r(X,Y), and the ‘correlation between baselines and change’ is r(X,Δ). Finally, we define the ‘variability ratio’ as the ratio of the standard deviation (σ) of Y to the standard deviation of X: σ_Y/σ_X.
X and Y are construed as lists of scores, with each entry being the performance of a single patient at the specified time point. We assume that higher scores imply better performance, so r(X,Δ) will be negative if recovery is proportional (to lost function). One can equally substitute ‘lost function’ (e.g. maximum score minus actual score) for ‘baseline score’; this makes r(X,Δ) positive if recovery is proportional, but is otherwise equivalent.
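A minimal sketch of these definitions in Python, assuming that scores are held in plain lists of equal length (one entry per patient). The helper names `pearson` and `summarise` are hypothetical, introduced only for illustration; only the standard library is used.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def summarise(X, Y):
    """Return r(X,Y), r(X,Δ) and the variability ratio σ_Y/σ_X.

    X holds baselines and Y outcomes for the same patients, in the same order.
    """
    delta = [y - x for x, y in zip(X, Y)]  # change (recovery): Δ = Y - X
    return {
        "r_XY": pearson(X, Y),
        "r_XD": pearson(X, delta),
        # sample standard deviations; the n - 1 factor cancels in the ratio
        "ratio": statistics.stdev(Y) / statistics.stdev(X),
    }
```

For example, with hypothetical baselines X = [10, 20, 30, 40, 50] and every patient recovering exactly 70% of lost function toward an assumed maximum score of 66, `summarise` returns r(X,Y) = 1, r(X,Δ) = −1 and a variability ratio of 0.3, up to floating-point rounding.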

Strong correlations imply the potential for accurate predictions

Strong correlations between any two variables typically imply that we can use either variable to predict the other. Out-of-sample predictions should tend toward the least-squares line defined by the original (in-sample) correlation. Some empirical studies use this logic to derive ‘predicted recovery’ (pΔ) from the least-squares line for r(X,Δ), reporting r(pΔ,Δ) instead of r(X,Δ) (Winters et al., 2015; Marchi et al., 2017). Since pΔ is a linear function of X, the magnitudes of r(X,Δ) and r(pΔ,Δ) are identical (see Fig. 1 and Supplementary material, proposition 8 in Appendix A), so the preference for either expression over the other is arguably cosmetic.
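A minimal sketch of how predicted recovery can be derived, assuming ordinary least squares of Δ on X; the helper names `fit_line` and `predicted_recovery` are hypothetical and not taken from the cited studies. Because pΔ = a + bX, r(pΔ,Δ) equals r(X,Δ) up to the sign of the fitted slope b.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for y ≈ a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predicted_recovery(X, Y):
    """Regress recovery on baselines and return (pΔ, Δ).

    pΔ lies on the least-squares line for Δ on X, so it is a linear
    function of X. If the fitted slope is exactly 0, pΔ is constant and
    its correlation with Δ is undefined.
    """
    delta = [y - x for x, y in zip(X, Y)]  # recovery: Δ = Y - X
    a, b = fit_line(X, delta)
    p_delta = [a + b * x for x in X]
    return p_delta, delta
```

For any baselines and outcomes with a non-zero fitted slope, `abs(pearson(p_delta, delta))` and `abs(pearson(X, delta))` agree up to floating-point rounding, which is why reporting one rather than the other changes nothing substantive.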
Figure 1
A canonical example of spurious r(X, Δ). Baseline scores are uncorrelated with outcomes (A), but appear to be strongly correlated with recovery (B). That correlation can be used to derive predicted recovery, which is strongly correlated with empirical recovery (C), but predicted outcomes, derived from that predicted recovery, are still uncorrelated with empirical outcomes (D).
Nevertheless, the correlation between predicted and empirical data is a common measure of predictive accuracy: the stronger the correlation, the better the predictions. Very strong correlations are unusual when predicting behavioural performance in humans—both because behaviour itself is complex, and because of measurement noise in behavioural assessment. Once r(pΔ,Δ) > ∼0.95, for example (Winters et al., 2015), this prognostic problem has seemingly been ‘solved’ more accurately than many might have thought possible.

r(X,Δ) is spurious when (non-trivially) stronger than r(X,Y)

Recovery is precisely the difference between baselines and outcomes. When r(X,Δ) is strong, implying that we can predict recovery accurately given baselines, it is tempting to assume that we can also predict outcomes equally accurately, by simply adding predicted recovery to baselines. More formally, the assumption is that r(X + pΔ,Y) ≈ r(pΔ,Δ). This assumption is wrong.
In fact, r(X + pΔ,Y) ≈ r(X,Y) (see Fig. 1 and Supplementary material, proposition 8 in Appendix A): because X + pΔ is itself a linear function of X, the two correlations share the same magnitude. When recovery is predicted from baselines, the correlation between ‘baselines plus predicted recovery’ and outcomes is never stronger than the correlation between baselines and outcomes. When r(X,Δ) is (substantially) stronger than r(X,Y), r(X,Δ) is ‘spurious’, because it encourages an over-optimistic impression of how predictable outcomes are, given baselines.
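A toy illustration of this point, in the spirit of Fig. 1, using hypothetical scores (not data from any study) in which outcomes are nearly unrelated to baselines. The sketch assumes ordinary least squares for the line relating Δ to X and uses only the Python standard library.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for y ≈ a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical toy scores: outcomes barely related to baselines.
X = [10, 20, 30, 40, 50]
Y = [52, 48, 55, 49, 51]
delta = [y - x for x, y in zip(X, Y)]               # recovery: Δ = Y - X

a, b = fit_line(X, delta)                           # least-squares line for Δ on X
p_delta = [a + b * x for x in X]                    # predicted recovery pΔ
p_outcome = [x + pd for x, pd in zip(X, p_delta)]   # predicted outcome X + pΔ

r_XY = pearson(X, Y)                    # about -0.06
r_XD = pearson(X, delta)                # about -0.99
r_pred_outcome = pearson(p_outcome, Y)  # about +0.06: same magnitude as r_XY
```

Here the fitted slope b is −1.01, so X + pΔ = a + (1 + b)X has a tiny negative slope: r(X + pΔ,Y) flips the sign of r(X,Y) but keeps exactly its magnitude, while r(X,Δ) looks impressively strong.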

The canonical example of spurious r(X,Δ)

The canonical example of spurious r(X,Δ) is when X and Y are independent random variables with the same variance: σ_Y/σ_X ≈ 1 and r(X,Y) ≈ 0, but r(X,Δ) ≈ −0.71 (Oldham, 1962). This r(X,Δ) suggests that we can predict recovery relatively well, but we cannot use ‘predicted recovery’ to predict outcomes equally well (Fig. 1).
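The value −0.71 follows directly from the definitions: for independent X and Y with common variance σ², Cov(X, Y − X) = −σ² and Var(Y − X) = 2σ², so r(X,Δ) = −σ²/(σ·√2·σ) = −1/√2 ≈ −0.71. A minimal simulation sketch of this case, using only the Python standard library; the mean of 30, standard deviation of 10, sample size and random seed are arbitrary assumptions, not values from any study.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def simulate_canonical(n=50_000, mean=30.0, sd=10.0, seed=1):
    """Draw independent baselines X and outcomes Y with equal variance.

    Returns r(X,Y) and r(X,Δ), where Δ = Y - X.
    """
    rng = random.Random(seed)
    X = [rng.gauss(mean, sd) for _ in range(n)]
    Y = [rng.gauss(mean, sd) for _ in range(n)]  # drawn independently of X
    delta = [y - x for x, y in zip(X, Y)]
    return pearson(X, Y), pearson(X, delta)


r_XY, r_XD = simulate_canonical()
# Analytic values for comparison: r(X,Y) = 0 and r(X,Δ) = -1/sqrt(2)
```

With a sample this large, the simulated r(X,Y) sits close to zero and r(X,Δ) close to −0.71, even though baselines carry no information about outcomes.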
Krakauer and Marshall (2015) recently argued that this scenario has little relevance to (most) empirical studies of recovery after stroke. This is because: (i) spurious r(X,Δ) only emerge here when r(X,Y) is weak; and (ii) empirical r(X,Y) are usually strong, because X and Y are dependent, repeated measurements from the same patients. If spurious r(X,Δ) only or mainly emerged when σ_Y/σ_X ≈ 1 and r(X,Y) ≈ 0, they might indeed be irrelevant in practice. Unfortunately, spurious r(X,Δ) also emerge in another scenario, which is very common in studies of recovery after stroke.

Spurious r(X,Δ) are likely when σ_Y/σ_X is small

For any X and Y, it can be shown that: