Friday, August 03, 2012

Correlation, Causation and Personality

A new paper from my collaborator James Lee. Ungated copy here, with commentary from other researchers, including Judea Pearl.
Correlation and Causation in the Study of Personality 
Abstract: Personality psychology aims to explain the causes and the consequences of variation in behavioural traits. Because of the observational nature of the pertinent data, this endeavour has provoked many controversies. In recent years, the computer scientist Judea Pearl has used a graphical approach to extend the innovations in causal inference developed by Ronald Fisher and Sewall Wright. Besides shedding much light on the philosophical notion of causality itself, this graphical framework now contains many powerful concepts of relevance to the controversies just mentioned. In this article, some of these concepts are applied to areas of personality research where questions of causation arise, including the analysis of observational data and the genetic sources of individual differences.
From the conclusions:
... This article is in part an effort to unify the contributions of three innovators in causal reasoning: Ronald Fisher, Sewall Wright, and Judea Pearl 
Fisher began his career at a time when the distinction between correlation and causation was poorly understood and indeed scorned by leading intellectuals. Nevertheless, he persisted in valuing this distinction. This led to his insight that randomization of the putative cause—whether by the deliberate introduction of ‘error’, as his biologist colleagues thought of it, or ‘beautifully . . . by the meiotic process’—in fact reveals more than it obscures. His subsequent introduction of the average excess and average effect is perhaps the first explicit use of the distinction between correlation and causation in any formal scientific theory. 
Structural equation modelers will know Wright—Fisher’s great rival in population genetics—as the ingenious inventor of path analysis. Wright’s diagrammatic approach to cause and effect serves as a conceptual bridge toward Pearl’s graphical formalization, which has greatly extended the innovations developed by both of the population-genetic pioneers. 
The fruitfulness of Pearl’s graphical framework when applied to the problems discussed in this article bears out its utility to personality psychology. Perhaps the most surprising instance of the theory’s fruitfulness concerns the role of colliders. Although obscure before Pearl’s seminal work, this role turns out to be obvious in retrospect and a great aid to the understanding of covariate choice, assortative mating, selection bias, and a myriad of other seemingly unrelated problems. This article has surely only scratched the surface of the ramifications following from our recognition of colliders.
Conspicuous from these accolades by his absence is Charles Spearman—the inventor of factor analysis and thereby a founder of personality psychology. Spearman (1927) did conceive of his g factor as a hidden causal force. However, new and brilliant ideas are often only partially understood, even by their authors. After a century of theoretical scrutiny and empirical applications, common factors appear to be more plausibly defended as mild formalizations of folk-psychological terms than as causal forces uncovered by matrix algebra. I have thus advocated a sharp distinction between the measurement of personality traits (factor analysis) and the study of their causal relations (graphical SEM).
... The puzzle is that by using common factors in our causal explanations, we seem to be retreating from this reductionistic approach. A single node called g sending an arrow to a single node called liberalism is surely an approximation to the true and extraordinarily more complicated graph entangling the various physical mechanisms that underlie mental characteristics. Why this compromise? Is it sensible to test models of ethereal emergent properties shoving and being shoved by corporeal bits of matter—or, perhaps even worse, by other emergent properties? If we are committing to a calculus of causation, should we not also discard the convenient fictions of folk psychology? 
The answer to this puzzle may be that reductionistic decomposition is not always the royal road to scientific understanding. ... [[In physics we refer to "effective descriptions" or "effective degrees of freedom" appropriate to a particular scale of dynamics or organization -- no need to invoke quarks to explain the mechanics of a baseball.]]
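The collider point in the conclusions is easy to check numerically. Below is a minimal sketch of my own (not from the paper): two independent traits x and y both feed into a third variable z, and selecting on z manufactures a correlation between them. The function names, the noise level, and the selection threshold are all illustrative assumptions.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_collider(n=20000, threshold=1.0, seed=0):
    """Return (r in the whole sample, r among units selected on the collider)."""
    rng = random.Random(seed)
    # x and y are generated independently: neither causes the other.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    # z is a collider: both x and y send an arrow into it (x -> z <- y).
    z = [xi + yi + rng.gauss(0, 0.5) for xi, yi in zip(x, y)]
    r_all = pearson(x, y)
    # Conditioning on the collider: keep only units with high z.
    kept = [(xi, yi) for xi, yi, zi in zip(x, y, z) if zi > threshold]
    r_selected = pearson([k[0] for k in kept], [k[1] for k in kept])
    return r_all, r_selected

r_all, r_selected = simulate_collider()
print(f"whole sample:         r = {r_all:+.3f}")
print(f"selected on collider: r = {r_selected:+.3f}")
```

In the full sample the correlation should be near zero, while in the selected subsample it should be clearly negative, even though nothing causal connects x and y. This is the same mechanism behind selection bias and ill-chosen covariates.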
From the author's response to commentary:
... It was the genius of Darwin to realize the power of explanation (4): phenotypes and environments cohere in such an uncanny way because nature is a statistician who has allowed only a subset of the logically possible combinations to persist over time. 
Although phenotypes are what nature selects, it cannot be phenotypes alone that preserve the record of natural selection. Phenotypes typically lack the property that variations in them are replicated with high fidelity across an indefinite number of generations. DNA, however, does have this property— hence the memorable phrase “the immortal replicator” (Dawkins, 1976). If DNA is furthermore causally efficacious, such that the possession of one variant rather than another has phenotypic consequences that are reasonably robust, then we have the potential for natural selection to bring about a lasting correlation between environmental demands and the causes of adaptation to those very same demands. 
When statistically controlling fitness, nature does not actually use the [causal] average effect of any allele. If an allele has a positive average excess in [is correlated with] fitness, for any reason whatsoever, it will tend to displace its alternatives. Nevertheless, it seems to be the case that nature correctly picks out alleles for their effects often enough; the results are evident in the living world all around us. Davey Smith and I are confident that where nature has succeeded, patient and ingenious human scientists will be able to follow.
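The distinction in that last paragraph can be illustrated with a toy model. This is my own hedged sketch, not anything from the paper: a haploid population with two loci, where only allele B affects fitness and allele A merely tends to travel with it. "Average excess" is simplified here to mean fitness of A carriers minus the population mean, and the linkage probabilities (0.9 and 0.1) and selection coefficient s are assumptions chosen for illustration.

```python
import random

def simulate_hitchhiking(n=50000, s=0.2, seed=1):
    """One round of selection in a haploid toy population.

    Returns (p, p_next, excess, w_bar) for allele A, which has no
    causal effect on fitness by construction.
    """
    rng = random.Random(seed)
    a_alleles, fitness = [], []
    for _ in range(n):
        b = 1 if rng.random() < 0.5 else 0
        # A is correlated with B (assumed linkage), but is otherwise inert.
        a = 1 if rng.random() < (0.9 if b else 0.1) else 0
        a_alleles.append(a)
        fitness.append(1.0 + s * b)  # fitness depends on B only
    n_carriers = sum(a_alleles)
    carrier_fitness = sum(w for a, w in zip(a_alleles, fitness) if a)
    p = n_carriers / n                    # current frequency of A
    w_bar = sum(fitness) / n              # population mean fitness
    excess = carrier_fitness / n_carriers - w_bar  # correlational quantity
    p_next = carrier_fitness / sum(fitness)        # fitness-weighted frequency
    return p, p_next, excess, w_bar

p, p_next, excess, w_bar = simulate_hitchhiking()
print(f"frequency of A before selection: {p:.3f}")
print(f"frequency of A after selection:  {p_next:.3f}")
print(f"average excess of A in fitness:  {excess:+.3f}")
```

Allele A rises in frequency although intervening on it would change nothing. In this toy model the change in frequency equals p * excess / w_bar exactly, so selection tracks the correlation, not the causal effect.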
For more on Judea Pearl's work, see the earlier post: Beyond Bayes: causality vs correlation.


  1. This paper is a really great one to read, think about, and discuss with colleagues! Alex Weiss and I were lucky enough to write a little commentary. I think that if people take this new philosophy about knowledge onboard (for that is what is being presented), there will be a sea change in how we think about knowledge.

    That's a rather large claim, but I notice that many thoughtful people struggle to defend a view, only to be stumped by the retort that "You know that we can't prove anything for sure, don't you?", which leads them to "feel" that one cannot reasonably defend one conclusion as objectively better than another.

    Two mind-altering findings presented in this paper are, firstly, that a correlation is an "unresolved causal nexus": crudely, if there's a correlation, there's a cause nearby. The second is that causal models can be compared: even though we will never know if we are at the truth, we can make choices that take us in the right direction.
    So... now the reply to people who say "You know you can never be sure you're right, and that makes comparisons odious?" is "But we know we can reliably determine which of any two ideas is closer to the truth. So let's get back to the argument: you claim..." The post-modern playing field where all concepts are merely constructions is supplanted with an objective playing field, one with room for both humility and the search for truth.
    I thoroughly enjoyed Professor Lee's paper! Most fun of all: to think about how you could know you were going in the right direction, but not know the destination - a recipe for infinite pleasure :-)

  2. It is very weird for a physicist to learn that there are whole fields in which people had given up the idea of causation and retreated wholly to correlation. We work in exactly the opposite limit!

    I think Pearl's contribution is conceptual clarity -- adopting his language makes it easier to communicate about causal models. This is necessary in the complex contexts that arise in social science, but not so necessary in physical science. However, I would be hard pressed to justify any of this as a real breakthrough in epistemology or the scientific method.

    Despite what some people (overly optimistic Bayesians, AI enthusiasts, etc.) might think, the scientific method (i.e., discerning the underlying causal rules in Nature) has yet to be fully systematized and probably cannot really ever be fully mechanized. (How does one systematize selection of priors?) Even "rationality" has never been satisfactorily defined.

    See link for related discussion. I highly recommend Eric Baum's What is Thought?

  3. If I can ever get around to learning how to create commutative diagrams, I'll send you a little of my craziness. In the meantime, I read this as a tutorial for a statistical-modeling layperson. Is that what you would say it is? The use of DAGs is a hugely simplifying feature. You suppose from the get-go a set of causal relations, then look to the data to validate them. If, on the other hand, you start with a graph on n vertices [i.e. you start with a set of variables, and assume that they are all correlated with one another], and look to the data to exclude directed edges, how well can you identify your causal edges?

  4. Fred__R 9:36 AM

    "causal models can be compared: even though we will never know if we are at the truth, we can make choices that take us in the right direction."

    Although I'm too ignorant to really understand this discussion, this sounds exactly like what I took post-modernists like Richard Rorty to mean.

  5. Fred__R 9:38 AM

    Although I suppose Rorty would say that this only holds true given a certain field of choices. If somebody develops a whole new constellation of alternative explanations which are even better, you might find that what was the right direction before turns out to have been the wrong direction. Maybe that's the difference?