Sunday, February 26, 2017

Perverse Incentives and Replication in Science

Here's a depressing but all too common pattern in scientific research:
1. Study reports results that reinforce the dominant, politically correct narrative.

2. Study is widely cited in other academic work, lionized in the popular press, and used to advance real world agendas.

3. Study fails to replicate, but no one (except a few careful and independent thinkers) notices.
For numerous examples, see, e.g., any of Malcolm Gladwell's books :-(

A recent example: the idea that the collective intelligence of groups (i.e., their ability to solve problems and accomplish assigned tasks) is not primarily dependent on the cognitive ability of the individuals in the group.

It seems plausible to me that by adopting certain best practices for collaboration one can improve group performance, and that diversity of knowledge base and personal experience could also enhance performance on certain tasks. But recent results in this direction were probably oversold, and seem to have failed to replicate.

James Thompson has given a good summary of the situation.

Parts 1 and 2 of our story:
MIT Center for Collective Intelligence: ... group-IQ, or “collective intelligence” is not strongly correlated with the average or maximum individual intelligence of group members but is correlated with the average social sensitivity of group members, the equality in distribution of conversational turn-taking, and the proportion of females in the group.
Is it true? The original paper on this topic, from 2010, has been cited 700+ times. See here for some coverage on this blog when it originally appeared.

Below is the (only independent?) attempt at replication, with strongly negative results. The first author is a regular (and very insightful) commenter here -- I hope he'll add his perspective to the discussion. Have we reached part 3 of the story?
Smart groups of smart people: Evidence for IQ as the origin of collective intelligence in the performance of human groups

Timothy C. Bates a,b,⁎, Shivani Gupta a
a Department of Psychology, University of Edinburgh
b Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh

What allows groups to behave intelligently? One suggestion is that groups exhibit a collective intelligence accounted for by number of women in the group, turn-taking and emotional empathizing, with group-IQ being only weakly-linked to individual IQ (Woolley, Chabris, Pentland, Hashmi, & Malone, 2010). Here we report tests of this model across three studies with 312 people. Contrary to prediction, individual IQ accounted for around 80% of group-IQ differences. Hypotheses that group-IQ increases with number of women in the group and with turn-taking were not supported. Reading the mind in the eyes (RME) performance was associated with individual IQ, and, in one study, with group-IQ factor scores. However, a well-fitting structural model combining data from studies 2 and 3 indicated that RME exerted no influence on the group-IQ latent factor (instead having a modest impact on a single group test). The experiments instead showed that higher individual IQ enhances group performance such that individual IQ determined 100% of latent group-IQ. Implications for future work on group-based achievement are examined.

From the paper:
Given the ubiquitous importance of group activities (Simon, 1997) these results have wide implications. Rather than hiring individuals with high cognitive skill who command higher salaries (Ritchie & Bates, 2013), organizations might select-for or teach social sensitivity thus raising collective intelligence, or even operate a female gender bias with the expectation of substantial performance gains. While the study has over 700 citations and was widely reported to the public (Woolley, Malone, & Chabris, 2015), to our knowledge only one replication has been reported (Engel, Woolley, Jing, Chabris, & Malone, 2014). This study used online (rather than in-person) tasks and did not include individual IQ. We therefore conducted three replication studies, reported below.

... Rather than a small link of individual IQ to group-IQ, we found that the overlap of these two traits was indistinguishable from 100%. Smart groups are (simply) groups of smart people. ... Across the three studies we saw no significant support for the hypothesized effects of women raising (or men lowering) group-IQ: All male, all female and mixed-sex groups performed equally well. Nor did we see any relationship of some members speaking more than others on either higher or lower group-IQ. These findings were weak in the initial reports, failing to survive incorporation of covariates. We attribute these to false positives. ... The present findings cast important doubt on any policy-style conclusions regarding gender composition changes cast as raising cognitive-efficiency. ...

In conclusion, across three studies groups exhibited a robust cognitive g-factor across diverse tasks. As in individuals, this g-factor accounted for approximately 50% of variance in cognition (Spearman, 1904). In structural tests, this group-IQ factor was indistinguishable from average individual IQ, and social sensitivity exerted no effects via latent group-IQ. Considering the present findings, work directed at developing group-IQ tests to predict team effectiveness would be redundant given the extremely high utility, reliability, validity for this task shown by individual IQ tests. Work seeking to raise group-IQ, like research to raise individual IQ might find this task achievable at a task-specific level (Ritchie et al., 2013; Ritchie, Bates, & Plomin, 2015), but less amenable to general change than some have anticipated. Our attempt to manipulate scores suggested that such interventions may even decrease group performance. Instead, work understanding the developmental conditions which maximize expression of individual IQ (Bates et al., 2013) as well as on personality and cultural traits supporting cooperation and cumulation in groups should remain a priority if we are to understand and develop cognitive ability. The present experiments thus provide new evidence for a central, positive role of individual IQ in enhanced group-IQ.
Meta-Observation: Given the 1-2-3 pattern described above, one should be highly skeptical of results in many areas of social science and even biomedical science (see link below). Serious researchers (i.e., those who actually aspire to participate in Science) in fields with low replication rates should (as a demonstration of collective intelligence!) do everything possible to improve the situation. Replication should be considered an important research activity, and should be taken seriously.

Most researchers I know in the relevant areas have not yet grasped that there is a serious problem. They might admit that "some studies fail to replicate" but don't realize the fraction might be in the 50 percent range!
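
To see how a fraction near 50 percent could plausibly arise, here is a back-of-the-envelope sketch in Python. It computes the share of statistically significant findings that reflect a real effect, given a prior probability that a tested hypothesis is true, the statistical power of the study, and the significance threshold. The function name and all three input values are assumptions chosen for illustration; none of them come from the paper quoted above or from any measured replication rate.

```python
def positive_predictive_value(prior, power, alpha):
    """Share of significant findings that reflect a true effect.

    prior: assumed fraction of tested hypotheses that are actually true
    power: assumed probability a study detects a true effect
    alpha: significance threshold (false positive rate under the null)
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: 1 in 10 tested hypotheses is true, studies have
# 50% power, and the usual 0.05 threshold is used.
ppv = positive_predictive_value(prior=0.10, power=0.50, alpha=0.05)
print(f"Share of significant findings that are true: {ppv:.2f}")
```

Under these assumed inputs, only a little over half of the significant findings are real, so roughly half would be expected to fail a well-powered replication. Different assumptions about the prior or the power shift the number considerably, which is why this is only a rough guide.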

More on the replication crisis in certain fields of science.
