Thursday, September 22, 2016

Annals of Reproducibility in Science: Social Psychology and Candidate Gene Studies

Andrew Gelman offers a historical timeline for the reproducibility crisis in Social Psychology, along with some juicy insight into the "one funeral at a time" manner in which academic science often advances.
OK, that was a pretty detailed timeline. But here’s the point. Almost nothing was happening for a long time, and even after the first revelations and theoretical articles you could still ignore the crisis if you were focused on your research and other responsibilities. ...

Then, all of a sudden, the world turned upside down.

If you’d been deeply invested in the old system, it must be pretty upsetting to think about change. Fiske is in the position of someone who owns stock in a failing enterprise, so no wonder she wants to talk it up. The analogy’s not perfect, though, because there’s no one for her to sell her shares to. What Fiske should really do is cut her losses, admit that she and her colleagues were making a lot of mistakes, and move on. She’s got tenure and she’s got the keys to PPNAS, so she could do it. Short term, though, I guess it’s a lot more comfortable for her to rant about replication terrorists and all that.

... Why do I go into all this detail? Is it simply mudslinging? Fiske attacks science reformers, so science reformers slam Fiske? No, that’s not the point. The issue is not Fiske’s data processing errors or her poor judgment as journal editor; rather, what’s relevant here is that she’s working within a dead paradigm. A paradigm that should’ve been dead back in the 1960s when Meehl was writing on all this, but which in the wake of Simonsohn, Button et al., Nosek et al., is certainly dead today. It’s the paradigm of the open-ended theory, of publication in top journals and promotion in the popular and business press, based on “p less than .05” results obtained using abundant researcher degrees of freedom. It’s the paradigm of the theory that in the words of sociologist Jeremy Freese, is “more vampirical than empirical—unable to be killed by mere data.”

... In her article that was my excuse to write this long post, Fiske expresses concerns for the careers of her friends, careers that may have been damaged by public airing of their research mistakes. Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways. ...
An old-timer who has seen it all before comments:
ex-social psychologist says:
September 21, 2016 at 5:36 pm

Former professor of social psychology here, now happily retired after an early buyout offer. If not so painful, it would almost be funny at how history repeats itself: This is not the first time there has been a “crisis” in social psychology. In the late 1960s and early 1970s there was much hand-wringing over failures of replication and the “fun and games” mentality among researchers; see, for example, Gergen’s 1973 article “Social psychology as history” in JPSP, 26, 309-320, and Ring’s (1967) JESP article, “Experimental social psychology: Some sober questions about some frivolous values.” It doesn’t appear that the field ever truly resolved those issues back when they were first raised–instead, we basically shrugged, said “oh well,” and went about with publishing by any means necessary.

I’m glad to see the renewed scrutiny facing the field. And I agree with those who note that social psychology is not the only field confronting issues of replicability, p-hacking, and outright fraud. These problems don’t have easy solutions, but it seems blindingly obvious that transparency and open communication about the weaknesses in the field–and individual studies–is a necessary first step. Fiske’s strategy of circling the wagons and adhering to a business-as-usual model is both sad and alarming.

I took early retirement for a number of reasons, but my growing disillusionment with my chosen field was certainly a primary one.
Geoffrey Miller also contributes:
Geoffrey Miller says:
September 21, 2016 at 8:43 pm

There’s also a political/ideological dimension to social psychology’s methodological problems.

For decades, social psych advocated a particular kind of progressive, liberal, blank-slate ideology. Any new results that seemed to support this ideology were published eagerly and celebrated publicly, regardless of their empirical merit. Any results that challenged it (e.g. by showing the stability or heritability of individual differences in intelligence or personality) were rejected as ‘genetic determinism’, ‘biological reductionism’, or ‘reactionary sociobiology’.

For decades, social psychologists were trained, hired, promoted, and tenured based on two main criteria: (1) flashy, counter-intuitive results published in certain key journals whose editors and reviewers had a poor understanding of statistical pitfalls, (2) adherence to the politically correct ideology that favored certain kinds of results consistent with a blank-slate, situationist theory of human nature, and derogation of any alternative models of human nature (see Steven Pinker’s book ‘The blank slate’).

Meanwhile, less glamorous areas of psychology such as personality, evolutionary, and developmental psychology, intelligence research, and behavior genetics were trundling along making solid cumulative progress, often with hugely greater statistical power and replicability (e.g. many current behavior genetics studies involve tens of thousands of twin pairs across several countries). But do a search for academic positions in the APS job ads for these areas, and you’ll see that they’re not a viable career path, because most psych departments still favor the kind of vivid but unreplicable results found in social psych and cognitive neuroscience.

So, we’re in a situation where the ideologically-driven, methodologically irresponsible field of social psychology has collapsed like a house of cards … but nobody’s changed their hiring, promotion, or tenure priorities in response. It’s still fairly easy to make a good living doing bad social psychology. It’s still very hard to make a living doing good personality, intelligence, behavior genetic, or evolutionary psychology research.

In the title of this post I mention Candidate Gene Studies. Forget, for the moment, about goofy Social Psychology experiments conducted on undergraduates. Much more money was wasted in the early 21st century on under-powered genomics studies that looked for gene-trait associations using small samples. Researchers, overconfident in their vaunted biological or biochemical intuition, performed studies using p < 0.05 thresholds that produced (ultimately false) associations between candidate genes and a variety of traits. According to Ioannidis, almost none of these results replicate (more). When I first became aware of GWAS almost a decade ago, the field was in disarray, with some journals still publishing results at the p < 0.05 threshold while others had adopted the corrected p < 5E-08 = 0.05 / 1E06 "genome-wide significance" threshold (based on multiple testing correction for roughly 1E06 SNPs)! The latter results routinely replicate, as expected.
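To see how dramatic the difference is, here is a minimal Python sketch (my own illustrative numbers, not from the post): simulate a million null SNPs, none of which truly affects the trait, and count how many clear each threshold. At p < 0.05 you expect roughly 50,000 spurious "hits"; at the genome-wide threshold you expect essentially none.

```python
# Minimal sketch (illustrative numbers, not from the post): why a candidate-gene-era
# p < 0.05 threshold breaks down when ~1 million markers are tested at once.
import numpy as np

rng = np.random.default_rng(0)

n_snps = 1_000_000                    # assume ~1E06 roughly independent common SNPs
alpha_nominal = 0.05                  # candidate-gene-era threshold
alpha_gwas = alpha_nominal / n_snps   # Bonferroni correction -> 5E-08 genome-wide significance

# Simulate p-values under the global null: no SNP actually affects the trait.
p_null = rng.uniform(0.0, 1.0, size=n_snps)

print(f"genome-wide threshold:  {alpha_gwas:.0e}")                           # 5e-08
print(f"false hits at p < 0.05: {int((p_null < alpha_nominal).sum()):,}")    # ~50,000 expected
print(f"false hits at p < 5E-8: {int((p_null < alpha_gwas).sum()):,}")       # ~0 expected
```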

Clearly, many researchers fundamentally misunderstood basic statistics, or at least were grossly overconfident in their priors for no good reason. But as of today, genomics has corrected its practices, and although no one wants to dwell on the 5+ years' worth of non-replicable published results, science is at least moving forward. I hope Social Psychology and other problematic areas (such as biomedical research) can self-correct their practices as genomics has.

See also One funeral at a time?


Bonus Feature!
Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature

Denes Szucs, John PA Ioannidis
doi: http://dx.doi.org/10.1101/071530

We have empirically assessed the distribution of published effect sizes and estimated power by extracting more than 100,000 statistical records from about 10,000 cognitive neuroscience and psychology papers published during the past 5 years. The reported median effect size was d=0.93 (inter-quartile range: 0.64-1.46) for nominally statistically significant results and d=0.24 (0.11-0.42) for non-significant results. Median power to detect small, medium and large effects was 0.12, 0.44 and 0.73, reflecting no improvement through the past half-century. Power was lowest for cognitive neuroscience journals. 14% of papers reported some statistically significant results, although the respective F statistic and degrees of freedom proved that these were non-significant; p value errors positively correlated with journal impact factors. False report probability is likely to exceed 50% for the whole literature. In light of our findings the recently reported low replication success in psychology is realistic and worse performance may be expected for cognitive neuroscience.
From the paper (FRP = False Report Probability, the probability that the null hypothesis is true given a statistically significant finding):
... In all, the combination of low power, selective reporting and other biases and errors that we have documented in this large sample of papers in cognitive neuroscience and psychology suggest that high FRP are to be expected in these fields. The low reproducibility rate seen for psychology experimental studies in the recent Open Science Collaboration (Nosek et al. 2015a) is congruent with the picture that emerges from our data. Our data also suggest that cognitive neuroscience may have even higher FRP rates, and this hypothesis is worth evaluating with focused reproducibility checks of published studies. Regardless, efforts to increase sample size, and reduce publication and other biases and errors are likely to be beneficial for the credibility of this important literature.
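To make the FRP claim concrete, here is a minimal sketch of the standard calculation (the usual Ioannidis-style positive-predictive-value arithmetic; the 1-in-10 prior below is my own illustrative assumption, not a figure from the paper). With the reported median power of 0.44 for medium effects, more than half of nominally significant results would be expected to be false.

```python
# Minimal sketch of the False Report Probability calculation; the 1-in-10 prior
# below is an illustrative assumption, not a number taken from the paper.
def false_report_probability(power: float, prior_h1: float, alpha: float = 0.05) -> float:
    """P(null is true | result is significant), ignoring bias and selective reporting."""
    true_positives = power * prior_h1            # significant results from real effects
    false_positives = alpha * (1.0 - prior_h1)   # significant results from null effects
    return false_positives / (true_positives + false_positives)

# Using the paper's median power for medium effects (0.44) and an assumed
# 1-in-10 prior that the tested effect is real:
print(f"FRP ~ {false_report_probability(power=0.44, prior_h1=0.10):.2f}")  # ~0.51
```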
