Monday, July 20, 2015

What is medicine’s 5 sigma?

Editorial in the Lancet, reflecting on the Symposium on the Reproducibility and Reliability of Biomedical Research held April 2015 by the Wellcome Trust.
What is medicine’s 5 sigma?

... much of the [BIOMEDICAL] scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, [BIOMEDICAL] science has taken a turn towards darkness. As one participant put it, “poor methods get results”. The Academy of Medical Sciences, Medical Research Council, and Biotechnology and Biological Sciences Research Council have now put their reputational weight behind an investigation into these questionable research practices. The apparent endemicity of bad research behaviour is alarming. In their quest for telling a compelling story, scientists too often sculpt data to fit their preferred theory of the world. ...

One of the most convincing proposals came from outside the biomedical community. Tony Weidberg is a Professor of Particle Physics at Oxford. ... the particle physics community ... invests great effort into intensive checking and rechecking of data prior to publication. By filtering results through independent working groups, physicists are encouraged to criticise. Good criticism is rewarded. The goal is a reliable result, and the incentives for scientists are aligned around this goal. Weidberg worried we set the bar for results in biomedicine far too low. In particle physics, significance is set at 5 sigma—a p value of 3 × 10–7 or 1 in 3·5 million (if the result is not true, this is the probability that the data would have been as extreme as they are). The conclusion of the symposium was that something must be done ...
I once invited a famous evolutionary theorist (MacArthur Fellow) at Oregon to give a talk in my institute, to an audience of physicists, theoretical chemists, mathematicians and computer scientists. The Q&A was, from my perspective, friendly and lively. A physicist of Hungarian extraction politely asked the visitor whether his models could ever be falsified, given the available field (ecological) data. I was shocked that he seemed shocked to be asked such a question. Later I sent an email thanking the speaker for his visit and suggesting he come again some day. He replied that he had never been subjected to such aggressive and painful attack and that he would never come back. Which community of scientists is more likely to produce replicable results?

See also Medical Science? and Is Science Self-Correcting?

To answer the question posed in the title of the post / editorial, an example of a statistical threshold which is sufficient for high confidence of replication is the p < 0.5 x 10^{-8} significance requirement in GWAS. This is basically the traditional p < 0.05 threshold corrected for multiple testing of 10^6 SNPs. Early "candidate gene" studies which did not impose this correction have very low replication rates. See comment below for what this implies about the validity of priors based on biological intuition.

I discuss this a bit with John Ioannidis in the video below.


Blog Archive

Labels