More on the "Truth Wearing Off" and my advice to epidemiologists of all ages
Andrew Gelman, a Professor of Statistics at Columbia, has a new post discussing the New Yorker article I mentioned last week. I highly recommend that you look at the article he wrote in American Scientist discussing the statistical challenges in estimating small effects.
My favorite passage: "Statistical power refers to the probability that a study will find a statistically significant effect if one is actually present. For a given true effect size, studies with larger samples have more power. As we have discussed here, “underpowered” studies are unlikely to reach statistical significance and, perhaps more importantly, they drastically overestimate effect size estimates. Simply put, the noise is stronger than the signal."
Thus, when small 'underpowered' studies do find a statistically significant effect, the observed effect has to be very large to clear the significance threshold. So the small studies that reach significance report overestimates of the true effect.
One thing we know about before-after, quasi-experimental studies, which are commonly used in assessing infection prevention interventions, is that they are underpowered and over-estimate the effect compared to randomized trials. Power is derived from sample size, effect size AND study design, among other things.
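To make that concrete, here is a minimal simulation sketch in Python. All of the numbers (baseline infection risk, true effect, sample size) are invented for illustration, not taken from any real study; the point is only to show how the significance filter behaves in a small two-arm comparison.

```python
# Minimal sketch of the "significance filter": with a modest true effect and a
# small sample, only exaggerated estimates reach p < 0.05.
# Baseline rate, effect size, and sample size are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

true_reduction = 0.10   # assume a true 10% relative reduction in infections
baseline_rate = 0.05    # assume 5% of control patients get infected
n_per_arm = 200         # a small, underpowered two-arm study

n_sims = 10_000
sig_estimates = []
for _ in range(n_sims):
    control = rng.binomial(1, baseline_rate, n_per_arm)
    treated = rng.binomial(1, baseline_rate * (1 - true_reduction), n_per_arm)
    table = [[control.sum(), n_per_arm - control.sum()],
             [treated.sum(), n_per_arm - treated.sum()]]
    _, p, _, _ = stats.chi2_contingency(table)
    if p < 0.05 and treated.mean() < control.mean():
        # relative reduction observed in this "positive" study
        sig_estimates.append(1 - treated.mean() / control.mean())

print(f"Power (share of runs significant, right direction): "
      f"{len(sig_estimates) / n_sims:.1%}")
print(f"True relative reduction:                    {true_reduction:.0%}")
print(f"Mean reduction among 'significant' studies: {np.mean(sig_estimates):.0%}")
```

In a setup like this, the handful of runs that cross p < 0.05 report reductions several times larger than the truth, which is exactly the pattern Gelman and Weakliem describe.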
QE studies in our field also suffer from publication bias, since many are completed by clinicians who won't go to the trouble of reporting a negative study. How many papers have you read in ICHE/AJIC/CID reporting that active detection and isolation (ADI) for MRSA didn't work? Even if ADI for MRSA is the greatest control measure ever, which it might be, given a normal distribution of benefit you would expect some studies to be negative, would you not? Where are they?
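A quick sketch of that file-drawer effect, again with entirely assumed numbers (true effect, study-to-study noise, and the threshold for what counts as "publishable"): if small QE studies scatter widely around a modest true benefit and only the impressive ones get written up, the literature will both lack negative studies and overstate the benefit.

```python
# Sketch of the file-drawer problem: many small before-after evaluations of an
# intervention with a modest true benefit, but only the impressive results get
# written up. Effect size, noise, and the publication cutoff are assumptions.
import numpy as np

rng = np.random.default_rng(1)

true_effect = 0.10    # assume a true 10% relative reduction
study_noise = 0.25    # assumed study-to-study noise in small QE studies
n_studies = 1_000

observed = rng.normal(true_effect, study_noise, n_studies)  # observed reductions
published = observed[observed > 0.20]  # assume only "impressive" drops get reported

print(f"True effect:                {true_effect:.0%}")
print(f"Studies that look negative: {(observed <= 0).mean():.0%}")
print(f"Studies that get published: {len(published) / n_studies:.0%}")
print(f"Average published 'effect': {published.mean():.0%}")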
Even when negative studies do appear (e.g., Harbarth's 2008 JAMA study or Charlie Huskins' hopefully soon-to-be-published STAR-ICU trial), they are often not believed, or are even assumed to be flawed! Why? Nothing works 100% of the time, and a negative study is NOT an erroneous result. A negative study is certainly not prima facie evidence of a flawed study. You must use all of the data, assess it based on quality and power, and look for publication bias. (This is my advice to epidemiologists of all ages.)
So, are we over-estimating the benefits of ADI and other interventions used in infection prevention?
Link: Gelman's post
Gelman and Weakliem, American Scientist, 2009 (PDF)
Yes, we are.
Regression to the mean also plays a huge role...think about what factors must be at work for a hospital to invest in any complex, resource-intensive intervention. Such interventions are almost always undertaken in response to a big problem (e.g., an outbreak), so the baseline (pre-intervention) observations are extreme observations.
All hospital outbreaks eventually run their course, and whatever bundle of interventions we decide to implement in the heat of the moment will end up being credited for the decline. And naturally we are eager to publish the results. It's what Bob Weinstein calls "riding the epidemic curve to glory".
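A minimal sketch of that dynamic, using hypothetical Poisson infection counts rather than any real hospital data: hospitals that act only after an unusually bad quarter will see their counts fall the next quarter even when the "bundle" changes nothing about the underlying rate.

```python
# Sketch of regression to the mean: hospitals launch the intervention bundle
# only after an unusually bad quarter, so the next quarter looks better even
# though the underlying rate never changed. All numbers are invented.
import numpy as np

rng = np.random.default_rng(7)

n_hospitals = 10_000
mean_count = 10          # assumed long-run mean infections per quarter

before = rng.poisson(mean_count, n_hospitals)  # the quarter that triggers action
after = rng.poisson(mean_count, n_hospitals)   # next quarter, same true rate

outbreak = before >= 15  # only hospitals with an "outbreak" quarter intervene

apparent_drop = (before[outbreak] - after[outbreak]) / before[outbreak]
print(f"Hospitals intervening after an outbreak quarter: {outbreak.mean():.0%}")
print(f"Apparent reduction credited to the bundle:       {apparent_drop.mean():.0%}")
print("True change in the underlying infection rate:   0%")
```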