So, you goddamn fuckers are trying to justify complete failure in getting survivors to 100% recovery? And proportional recovery is your excuse? Using calculations and graphs to make it seem that your epic failure is logical?
The goal is always 100% recovery, NOT THE FUCKING TYRANNY OF LOW EXPECTATIONS you are trying to pass off as expected. I would have you fired on the spot. Using Fugl-Meyer for anything in stroke is the height of stupidity, nothing objective in it, so nothing is repeatable.
Recovery after stroke: not so proportional after all?
Brain, Volume 142, Issue 1, January 2019, Pages 15–22, https://doi.org/10.1093/brain/awy302
Published: 07 December 2018
(1)
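The display for Equation 1 did not survive in this excerpt. Assuming it is the standard identity for the correlation between a baseline X and the change Δ = Y − X (an assumption, although it reproduces the values quoted later in the text), it can be written as:

```latex
r(X,\Delta) \;=\;
\frac{\sigma_Y \, r(X,Y) - \sigma_X}
     {\sqrt{\sigma_Y^{2} + \sigma_X^{2} - 2\,\sigma_X \sigma_Y \, r(X,Y)}},
\qquad \Delta = Y - X
```

Dividing the numerator and denominator by σX gives a form that depends only on r(X,Y) and the ratio k = σY/σX, namely r(X,Δ) = (k·r(X,Y) − 1) / √(k² + 1 − 2k·r(X,Y)). This is why a single ratio axis suffices in Figure 2.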
Figure 2

The relationship between r(X,Y), r(X,Δ) and σY/σX. Note that the x-axis is log-transformed to ensure symmetry around 1; when X and Y are equally variable, log(σY/σX) = 0. Supplementary material, proposition 7 in Appendix A, provides a justification for unambiguously using a ratio of standard deviations in this figure, rather than σY and σX as separate axes. The two major regimes of Equation 1 are also marked in red. In Regime 1, Y is more variable than X, so Y contributes more variance to Δ, and r(X,Δ) ≈ r(X,Y). In Regime 2, X is more variable than Y, so X contributes more variance to Δ, and r(X,Δ) ≈ r(X,−X) (i.e. −1). The transition between the two regimes, when the variability ratio is not dramatically skewed either way, also allows for spurious r(X,Δ). For the purposes of illustration, the figure also highlights six points of interest on the surface, marked A–F; examples of simulated recovery data corresponding to these points are provided in Fig. 3.

Figure 3
![Exemplar points on the surface in Fig. 2. Simulated recovery data, corresponding to the points A–F marked on the surface in Fig. 1. (A) Baselines and outcomes are entirely independent [r(X,Y) = 0], yet r(X,Δ) is relatively strong; this is the canonical example of mathematical coupling, first introduced by Oldham (1962). (B) Recovery is constant with minimal noise, so baselines and outcomes are equally variable (σY/σX ≈ 1) and recovery is unrelated to baseline scores (r(X, Δ) ≈ 0). (C and D) Outcomes are more variable than baselines (σY/σX ≈ 5), and r(X,Δ) converges to r(X,Y). (E) Recovery is 70% of lost function, so outcomes are less variable than baselines (σY/σX ≈ 0.3); even with shuffled outcomes data (F) baselines and recovery still appear to be strongly correlated.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/brain/142/1/10.1093_brain_awy302/1/m_awy302f3.png?Expires=1569259452&Signature=MW8JIMOS2VHeLZjy~GYmAzIHukZTD7mhAcBpDeozIemPfiYJujK91~bxZOHmaXGPf4pCIrjHIV8MuifvbqhLgcjDYTElMYILUZgr-sAc9oCztDdHMSSUGD9XNjhxiY06gc7WBfZlS9QXEMqk9eQHSU-8H~zzpLWop1sSfOFX0KlW9wjJqkgNLsyDixjCKNDoa0otWhFzBxH9aOzluIbB5ZrwQylqU6Cc0V3QJ4N749gD3tc~XcOdf0ApH08b-sTBlJnc1BGO2pl677Ewa6PlpHSSsiYSw~bp4Eu8PZYI2PHZFa2QEbrfz-xloyD5ObTZ825Yn9K1h2GcetxCYhe9Vw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Exemplar points on the surface in Fig. 2. Simulated recovery data, corresponding to the points A–F marked on the surface in Fig. 2. (A) Baselines and outcomes are entirely independent [r(X,Y) = 0], yet r(X,Δ) is relatively strong; this is the canonical example of mathematical coupling, first introduced by Oldham (1962). (B) Recovery is constant with minimal noise, so baselines and outcomes are equally variable (σY/σX ≈ 1) and recovery is unrelated to baseline scores [r(X,Δ) ≈ 0]. (C and D) Outcomes are more variable than baselines (σY/σX ≈ 5), and r(X,Δ) converges to r(X,Y). (E) Recovery is 70% of lost function, so outcomes are less variable than baselines (σY/σX ≈ 0.3); even with shuffled outcomes data (F) baselines and recovery still appear to be strongly correlated.
When σY/σX is large, Y contributes more variance to Y − X (Δ), and r(X,Δ) ≈ r(X,Y); this is Regime 1. Points C and D illustrate the convergence (Fig. 3C and D). By contrast, when σY/σX is small, X contributes more variance to Y − X, and r(X,Δ) ≈ r(X,−X), i.e. −1 (Supplementary material, Appendix A, theorem 2); this is Regime 2, where the confound emerges. Point E, near Regime 2, corresponds to data in which all patients recover proportionally (Δ = 70% of lost function; Fig. 3E). Here, σY/σX is already small enough (0.3) to be dangerous: after randomly shuffling Y, r(X,Y) ≈ 0, but r(X,Δ) is almost unaffected (Point F, and Fig. 3F). In other words, if the proportional recovery rule is even approximately right, empirical data may enter territory on the surface in Fig. 2 where over-optimistic r(X,Δ) are likely.
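The behaviour at Points E and F can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: the sample size, the uniform baseline distribution, the noise level and all names are assumptions chosen for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def sd_ratio(x, y):
    """sigma_Y / sigma_X (the 1/n factors cancel in the ratio)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    return math.sqrt(ss_y / ss_x)


def change(x, y):
    """Recovery, defined as outcome minus baseline."""
    return [yi - xi for xi, yi in zip(x, y)]


rng = random.Random(0)   # fixed seed so the run is repeatable
MAX_SCORE = 66           # top of the Fugl-Meyer upper-extremity scale
NOISE_SD = 1.0           # assumed noise level; not specified in the text

# Point E: every patient recovers 70% of lost function (plus a little noise).
# Scores are not clamped to the scale in this sketch.
baseline = [rng.uniform(0, 65) for _ in range(1000)]
outcome = [x + 0.7 * (MAX_SCORE - x) + rng.gauss(0, NOISE_SD) for x in baseline]

# Point F: the same outcomes, randomly reassigned to patients.
shuffled = outcome[:]
rng.shuffle(shuffled)

ratio = sd_ratio(baseline, outcome)
r_xy = pearson(baseline, outcome)
r_xd = pearson(baseline, change(baseline, outcome))
r_xy_shuffled = pearson(baseline, shuffled)
r_xd_shuffled = pearson(baseline, change(baseline, shuffled))

print(f"sd ratio      : {ratio:.2f}")
print(f"original data : r(X,Y) = {r_xy:.2f}, r(X,delta) = {r_xd:.2f}")
print(f"shuffled data : r(X,Y) = {r_xy_shuffled:.2f}, r(X,delta) = {r_xd_shuffled:.2f}")
```

Under these assumptions the ratio comes out near 0.3, shuffling drives r(X,Y) towards zero, and r(X,Δ) stays strongly negative in both cases.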
σY/σX may be small, whether or not recovery is proportional
Proportional recovery implies small σY/σX, but small σY/σX does not imply proportional recovery; for example, constant recovery with ceiling effects will produce the same result. To illustrate this, we ran 1000 simulations in which: (i) 1000 baseline scores were drawn randomly with uniform probability from the range 0–65 (i.e. impaired on the 66-point Fugl-Meyer upper-extremity scale); (ii) outcome scores were calculated as the baseline scores plus half the scale’s range (33); and (iii) outcome scores greater than 66 were set to 66 (i.e. a hard ceiling). Mean r(X,Y) and r(X,Δ) were calculated both before and after shuffling the outcomes data for each simulation. After shuffling, r(X,Y) ≈ 0 and r(X,Δ) = −0.88: ceiling effects make σY/σX small enough to encourage spurious r(X,Δ). And just as importantly, before shuffling, r(X,Y) = 0.89 and r(X,Δ) = −0.90: even when r(X,Δ) is not spurious [because r(X,Y) is similarly strong], we cannot conclude that recovery is really proportional.
Re-examining the empirical literature on proportional recovery
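The constant-recovery simulation with a hard ceiling, described just above, can be sketched as follows. This is an approximation under stated assumptions: integer baseline scores, 200 runs rather than the 1000 reported, and hypothetical names throughout. It should give values in the same neighbourhood as those reported, but the exact figures depend on details the text does not specify.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


MAX_SCORE = 66     # hard ceiling of the scale
GAIN = 33          # constant recovery: half the scale's range
N_PATIENTS = 1000
N_SIMS = 200       # the text reports 1000 runs; fewer keep this sketch quick


def one_run(rng):
    """Return r(X,Y), r(X,delta) before shuffling, then the same after shuffling."""
    x = [rng.randint(0, 65) for _ in range(N_PATIENTS)]   # assumed integer scores
    y = [min(xi + GAIN, MAX_SCORE) for xi in x]           # constant gain, then ceiling
    y_shuffled = y[:]
    rng.shuffle(y_shuffled)
    delta = [yi - xi for xi, yi in zip(x, y)]
    delta_shuffled = [yi - xi for xi, yi in zip(x, y_shuffled)]
    return (pearson(x, y), pearson(x, delta),
            pearson(x, y_shuffled), pearson(x, delta_shuffled))


rng = random.Random(42)   # fixed seed, arbitrary choice
runs = [one_run(rng) for _ in range(N_SIMS)]
mean_r_xy, mean_r_xd, mean_r_xy_shuf, mean_r_xd_shuf = (
    sum(column) / N_SIMS for column in zip(*runs)
)

print(f"before shuffling: r(X,Y) = {mean_r_xy:.2f}, r(X,delta) = {mean_r_xd:.2f}")
print(f"after shuffling : r(X,Y) = {mean_r_xy_shuf:.2f}, r(X,delta) = {mean_r_xd_shuf:.2f}")
```

No patient in this simulation recovers proportionally, yet r(X,Δ) is strongly negative both before and after the outcomes are shuffled.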
The relationships between r(X,Y), r(X,Δ) and σY/σX merit a re-examination of the empirical support for the proportional recovery rule. In the only study we found that reports individuals’ behavioural data, Zarahn and colleagues (2011) consider 30 patients’ recoveries from hemiparesis after stroke. Across the whole sample, r(X,Y) = 0.80 and r(X,Δ) = −0.49; after removing seven non-fitters, r(X,Y) = 0.75 and r(X,Δ) = −0.95. Removing the non-fitters increases the apparent predictability of recovery but reduces the predictability of outcomes (and reduces σY/σX from 0.88 to 0.36). Notably, the residuals for both correlations are identical (Fig. 4), and in fact this is always true (Supplementary material, proposition 9 in Appendix A). r(X,Δ) has the same errors as r(X,Y), but a larger effect size: r(X,Δ) is over-optimistic.
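The figures quoted for Zarahn and colleagues can be checked against the relationship between the three quantities. The sketch below assumes Equation 1 is the standard identity for the correlation of X with Y − X, written in terms of the ratio k = σY/σX; the function name is hypothetical.

```python
import math


def r_x_delta(r_xy, ratio):
    """Correlation between baseline X and change Y - X.

    r_xy  : correlation between baselines and outcomes, r(X,Y)
    ratio : sigma_Y / sigma_X
    Undefined when ratio == 1 and r_xy == 1 (the change has zero variance).
    """
    return (ratio * r_xy - 1.0) / math.sqrt(ratio ** 2 + 1.0 - 2.0 * ratio * r_xy)


whole_sample = r_x_delta(0.80, 0.88)   # all 30 patients
fitters_only = r_x_delta(0.75, 0.36)   # after removing seven non-fitters

print(round(whole_sample, 2), round(fitters_only, 2))   # prints -0.49 -0.95
```

Both results match the reported correlations, which suggests that the jump from −0.49 to −0.95 is driven largely by the drop in σY/σX rather than by any improvement in how well baselines predict outcomes.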
Figure 4

r(X,Y) and r(X,Δ) have the same residuals. Left: Least squares linear fits for analyses relating baselines to (top) outcomes and (bottom) recovery, using the fitters’ data reported by Zarahn et al. (2011). Middle: Plots of residuals relative to each least squares line, against the fitted values in each case. Right: A scatter plot of the residuals from the model relating baselines to change, against the residuals from the model relating baselines to outcomes: the two sets of residuals are the same.
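A short least-squares argument shows why the two sets of residuals must coincide. This is a sketch in ad hoc notation, not the appendix proof, which is not reproduced here. Regressing Y on X and Δ = Y − X on X gives:

```latex
\begin{aligned}
Y &= a + bX + e,      &\quad b  &= \frac{\operatorname{Cov}(X,Y)}{\sigma_X^{2}}, \\
\Delta &= a' + b'X + e', &\quad b' &= \frac{\operatorname{Cov}(X,\,Y - X)}{\sigma_X^{2}} = b - 1, \\
a' &= \bar{\Delta} - b'\bar{X} = (\bar{Y} - \bar{X}) - (b - 1)\bar{X} = \bar{Y} - b\bar{X} = a, \\
e' &= (Y - X) - a - (b - 1)X = Y - a - bX = e .
\end{aligned}
```

The residuals are identical, but they are compared against different total variances: r² = 1 − Var(e)/Var(Y) for outcomes and 1 − Var(e)/Var(Δ) for recovery. Whenever Δ is more variable than Y, the same errors yield a larger apparent effect size for r(X,Δ).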
Several recent studies report interquartile ranges, rather than standard deviations, for their fitter patients’ baselines and outcomes. Accepting some room for error, we can also estimate σY/σX from those interquartile ranges. In one case (Winters et al., 2015), r(X,Δ) = −0.97 and σY/σX = 0.158, while in another (Veerbeek et al., 2018), σY/σX = 0.438 and r(X,Δ) ≈ −0.88. In both cases, Equation 1 implies that r(X,Δ) would be at least as strong as that reported, regardless of r(X,Y): these reported r(X,Δ) do not tell us how predictable outcomes actually were, given baseline scores.
Many studies in this literature only relate baselines to recovery through multivariable models (Buch et al., 2016; Marchi et al., 2017; Winters et al., 2017); in these studies, we cannot demonstrate confounds directly with Equation 1. Nevertheless, these studies are also probably confounded, because any inflation in one variable’s effect size will inflate the multivariable model’s effect size as well. As discussed in the previous section, empirical studies of recovery after stroke should tend to encourage small σY/σX, whether or not recovery is proportional. Consequently, the null hypothesis will rarely be that r(X,Δ) ≈ 0. For example, in the only multivariable modelling study, which reports IQRs for its fitter-patients’ baselines and outcomes (Stinear et al., 2017), σY/σX ≈ 0.48, which implies that the weakest r(X,Δ) was −0.88, for any positive value of r(X,Y).
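The "at least as strong" statements above follow from asking how weak r(X,Δ) can be for a fixed ratio. Assuming Equation 1 is the standard identity for the correlation of X with Y − X, and writing k = σY/σX < 1, the least negative value is −√(1 − k²), reached when r(X,Y) = k. The sketch below uses hypothetical function names and checks the closed form against a brute-force search for the ratios quoted above.

```python
import math


def r_x_delta(r_xy, k):
    """Correlation of X with Y - X, given r(X,Y) and k = sigma_Y / sigma_X."""
    return (k * r_xy - 1.0) / math.sqrt(k * k + 1.0 - 2.0 * k * r_xy)


def weakest_r_x_delta(k):
    """Closed-form least negative r(X,delta) for a ratio k < 1."""
    return -math.sqrt(1.0 - k * k)


def weakest_by_grid(k, steps=2001):
    """Brute-force check: scan r(X,Y) over a grid and keep the least negative value."""
    grid = [-0.999 + 1.998 * i / (steps - 1) for i in range(steps)]
    return max(r_x_delta(r, k) for r in grid)


studies = [
    ("Winters et al., 2015", 0.158),
    ("Veerbeek et al., 2018", 0.438),
    ("Stinear et al., 2017", 0.48),
]
for name, k in studies:
    print(f"{name}: closed form {weakest_r_x_delta(k):.2f}, grid {weakest_by_grid(k):.2f}")
```

For a ratio of 0.48 the bound is about −0.88, in line with the figure quoted for Stinear and colleagues. For the other two ratios the bound is at least as strong as the correlations those studies reported.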
Finally, while r(X,Δ) can be misleading if it is extreme relative to r(X,Y), the reverse is also true. One study in this literature, which uses outcomes as the dependent variable rather than recovery (Feng et al., 2015), reports that r(X,Y) ≈ 0.8 and σY/σX = 1.2 in their ‘combined’ group of 76 patients. By Equation 1, r(X,Δ) = −0.05: i.e. recovery was uncorrelated with baseline scores. These authors only reported proportional recovery in a subsample of their patients (but not the information we need to re-examine that claim), but their full sample seems better described by constant recovery (as in Fig. 3B).
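As a worked check of that figure, assuming Equation 1 is the standard identity for the correlation of X with Y − X, written with k = σY/σX, and using the rounded inputs quoted above:

```latex
r(X,\Delta)
= \frac{k\,r(X,Y) - 1}{\sqrt{k^{2} + 1 - 2k\,r(X,Y)}}
= \frac{(1.2)(0.8) - 1}{\sqrt{1.44 + 1 - 1.92}}
= \frac{-0.04}{\sqrt{0.52}}
\approx -0.055
```

This is in line with the reported −0.05; the small difference is what one would expect from rounding in the published inputs.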
Discussion
The proportional recovery rule is striking because it implies that recovery is simple and consistent across patients (non-fitters notwithstanding), and because that implication appears to be justified by strong empirical results (Stinear, 2017). We contend that the empirical support for the rule is weaker than it seems.
In summary, our argument is that r(X,Δ) is spurious when stronger than r(X,Y), and that the conditions that encourage spurious r(X,Δ) will be common in empirical studies of recovery after stroke, whether or not recovery is really proportional. Many empirical r(X,Δ) in this literature appear to be spurious in this sense. And in any case, strong r(X,Δ) are insufficient evidence for proportional recovery even if they are not spurious [because r(X,Y) is similarly strong].
The only previous discussion of the risk of spurious r(X,Δ), in analyses of recovery after stroke (Krakauer and Marshall, 2015), concluded that this risk is small provided the tools used to measure post-stroke impairment are reliable: i.e. so long as measurement noise is minimal. Crucially, our analysis applies entirely regardless of measurement noise. We contend that the risk of spurious r(X,Δ) is significant, if there are ceiling effects on the scale used to measure post-stroke impairment, and if most patients improve between baseline and subsequent assessments. These criteria will usually be met in practice, because every practical measurement of post-stroke impairment employs a finite scale, and because non-fitters, who do not make the predicted recovery, are removed prior to calculating r(X,Δ).
We are not suggesting that there is anything wrong with the practice of distinguishing fitters from non-fitters. Indeed, our results prove that this work may be valid regardless of our other concerns. Non-fitters do not recover as predicted; by definition, they contribute the largest, negative residuals to r(X,Δ). Since the residuals for r(X,Y) and r(X,Δ) are identical (Fig. 4 and Supplementary material, proposition 9 in Appendix A), the same patients will be placed in the same subgroups regardless of which correlation is used, and biomarkers that distinguish those subgroups at the acute stage [i.e. avoid the circularity of relying on observed recovery (Stinear, 2017)] will be equally accurate regardless of our other concerns. However, extreme r(X,Δ) for patients classified as fitters will naturally encourage the assumption that those fitters’ outcomes are largely determined by initial symptom severity. If this assumption is true, therapeutic interventions must be largely ineffective (or at least redundant) for these patients. Our analysis suggests that this assumption is wrong.
Nevertheless, we are not claiming that the proportional recovery rule is wrong. Our analysis suggests that empirical studies to date do not demonstrate that the rule holds, or how well, but we could only confirm that r(X,Δ) was over-optimistic in one study, which reported individual patient data. And while we have also shown that extreme r(X,Δ) and r(X,Y) can result from non-proportional (constant) recovery, this is simply one plausible alternative hypothesis about how patients recover.
Quite how to interpret empirical recovery with confidence in this domain remains an open question: we have articulated a problem here, hoping that recognition of the problem will motivate work to solve it. But we can make some recommendations for future studies in the field.
First, these studies should report r(X,Δ), r(X,Y), and σY/σX, for those patients deemed to recover proportionally. Despite our concerns about r(X,Δ), we do learn something when r(X,Y) is strong, but r(X,Δ) is weak, as in Feng and colleagues’ (2015) results discussed above, which appeared to be better explained by constant recovery than by proportional recovery.
Second, future studies should consider explicitly testing the hypothesis that recovery depends on baseline scores (Oldham, 1962; Hayes, 1988; Tu et al., 2005; Tu and Gilthorpe, 2007; Chiolero et al., 2013). These tests sensibly acknowledge that the null hypothesis is rarely r(X,Δ) ≈ 0 in these analyses. However, they do not address the proper measurement and interpretation of effect sizes, which is our primary concern here; somewhat paradoxically, this means that they may be less useful in larger samples than in smaller samples (Friston, 2012; Lorca-Puls et al., 2018).
Those hypothesis tests will also all be confounded by ceiling effects. We recommend that future studies should measure the impact of such effects, perhaps by reporting the shapes of the distributions of X and Y (greater asymmetry implying more prominent ceiling effects). Future studies should also attempt to minimize ceiling effects. One approach might be to remove patients whose outcomes are at ceiling: though certainly inefficient, this does at least remove the spurious r(X,Δ) in our simulations of constant recovery (see above). However, it may be difficult to determine which patients to remove in practice; the Fugl-Meyer scale, for example, imposes item-level ceiling effects, which could distort σY/σX well below the maximum score. A better, though also more complex alternative, may be to use assessment tools expressly designed to minimize ceiling effects, or to add such tools to those currently in use.
More generally, we may need to replace correlations with alternative methods, which can provide less ambiguous evidence for the proportional recovery rule. One principled alternative might use Bayesian model comparison to adjudicate between different forward or generative models of the data at hand: i.e. using the empirical data to quantify evidence for or against competing hypotheses about the nature of recovery, which may or may not be conserved across patients. We hope that this paper will encourage work to develop such methods, delivering better evidence for (or against) the proportional recovery rule.
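To make the idea of comparing generative models concrete, the sketch below pits a proportional-recovery model against a constant-recovery model on simulated data. It is an illustration only, not a method proposed in the paper. It uses the Bayesian information criterion (BIC) as a crude stand-in for model evidence, assumes Gaussian noise and ignores ceiling effects; all names and parameter values are hypothetical.

```python
import math
import random

MAX_SCORE = 66   # top of the Fugl-Meyer upper-extremity scale


def gaussian_bic(residuals, n_params):
    """BIC under Gaussian noise; n_params counts fitted parameters incl. the noise variance."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n            # ML estimate of noise variance
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return n_params * math.log(n) - 2 * log_lik


def fit_proportional(x, y):
    """Model: recovery = beta * (MAX_SCORE - baseline) + noise; least-squares beta."""
    lost = [MAX_SCORE - xi for xi in x]
    delta = [yi - xi for xi, yi in zip(x, y)]
    beta = sum(l * d for l, d in zip(lost, delta)) / sum(l * l for l in lost)
    return beta, [d - beta * l for l, d in zip(lost, delta)]


def fit_constant(x, y):
    """Model: recovery = c + noise; the least-squares c is the mean recovery."""
    delta = [yi - xi for xi, yi in zip(x, y)]
    c = sum(delta) / len(delta)
    return c, [d - c for d in delta]


# Simulated patients who really do recover 70% of lost function (assumed noise sd = 3).
rng = random.Random(1)
x = [rng.uniform(0, 65) for _ in range(200)]
y = [xi + 0.7 * (MAX_SCORE - xi) + rng.gauss(0, 3) for xi in x]

beta, residuals_prop = fit_proportional(x, y)
c, residuals_const = fit_constant(x, y)
bic_prop = gaussian_bic(residuals_prop, 2)     # beta + noise variance
bic_const = gaussian_bic(residuals_const, 2)   # c + noise variance

print(f"proportional: beta = {beta:.2f}, BIC = {bic_prop:.1f}")
print(f"constant    : c = {c:.1f}, BIC = {bic_const:.1f}")
print("preferred model:", "proportional" if bic_prop < bic_const else "constant")
```

Unlike r(X,Δ), this comparison scores each hypothesis by how well it predicts the data, so a strong correlation that arises only from a small σY/σX does not automatically count in favour of proportional recovery. A serious analysis would also need a model of ceiling effects and a proper treatment of priors.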