Sunday, May 03, 2015

Replication is hard; understanding what that means is even harder

Bad news for psychology -- only 39 of 100 published findings were replicated in a recent coordinated effort.
Nature | News: An ambitious effort to replicate 100 research findings in psychology ended last week — and the data look worrying. Results posted online on 24 April, which have not yet been peer-reviewed, suggest that key findings from only 39 of the published studies could be reproduced. ...
The article goes on:
But the situation is more nuanced than the top-line numbers suggest (See graphic, 'Reliability test'). Of the 61 non-replicated studies, scientists classed 24 as producing findings at least “moderately similar” to those of the original experiments, even though they did not meet pre-established criteria, such as statistical significance, that would count as a successful replication.  [ Yeah, right. ]
This makes me suspect bounded cognition -- humans trusting their post hoc stories and intuition instead of statistical criteria chosen before planned replication attempts.

The most tragic thing about Ioannidis's work on low replication rates and wasted research funding is that while medical researchers might pay lip service to his results (which are highly cited), they typically have not actually grasped the implications for their own work. In particular, they typically have not updated their posteriors to reflect the low reliability of research results, even in the top journals.
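What "updating their posteriors" means can be made concrete with a minimal Bayes'-theorem sketch. The numbers below are purely illustrative assumptions (a 10% prior that a tested hypothesis is true, 80% power, alpha = 0.05), not figures from the replication project:

```python
# Illustrative Bayes calculation: how believable is a "significant" finding?
# All inputs are hypothetical assumptions, not data from the studies above.
def prob_true_given_significant(prior, power, alpha):
    """P(hypothesis true | result significant), by Bayes' theorem."""
    true_pos = power * prior          # true effects that reach significance
    false_pos = alpha * (1 - prior)   # null effects that reach significance
    return true_pos / (true_pos + false_pos)

ppv = prob_true_given_significant(prior=0.1, power=0.8, alpha=0.05)
print(round(ppv, 3))   # 0.64 -> roughly a third of "discoveries" are false
```

Under these assumptions a p < 0.05 result deserves only about 64% credence, which is the kind of adjustment a reader of the top journals should be making.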


MUltan said...


" they typically have not actually grasped the implications for their own work"

Most of them don't know how to use Bayes' theorem even in the simplest case, such as computing predictive value from the sensitivity and specificity of a test. Many of them confuse statistical significance with effect size. Would you please spell out the implications in a post?

Mark Woodley said...

I would recommend reading the article by David Colquhoun that he linked in the comments of the news article:
Its rough contention is that widespread misunderstanding of what p-values mean yields a false discovery rate of about 30% even in the best-case scenario.
As for the claim that 24 of the studies were "moderately similar": are they claiming that, even knowing the effect size and p-value from the original study, these new studies were underpowered? If so, what was the point? If not, why start talking about almost replicating?
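The underpower question above can be checked directly: given the original study's effect size, the sample needed for an adequately powered replication follows from a standard calculation. A minimal sketch using the normal approximation for a two-group comparison; the effect size d = 0.5 is a hypothetical stand-in, not one from the replication project:

```python
# Rough sample size per group for a two-group replication,
# normal approximation (Cohen's d, two-sided test).
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.8):
    """Subjects needed per group to detect effect_size_d at given power."""
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) / effect_size_d) ** 2
    return ceil(n)

# A hypothetical original effect of d = 0.5, replicated at 80% power:
print(n_per_group(0.5))   # 63 per group
```

If a replication team ran fewer subjects than this kind of calculation calls for, "underpowered" is a fair complaint; if they ran more, a non-significant result is genuinely informative.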

RealityIsComplicated said...

Always a good read:

outside_observer said...

What does it even mean to say that the non-replicating experiment had "similar results" to the original? Could it be something as simple as that the overall "effect" is in the same direction as in the original? For many designs, wouldn't that happen half the time anyway?

steve hsu said...

I don't know for sure -- let's see what the Science paper says when it comes out.
