Physicist, Startup Founder, Blogger, Dad

Saturday, July 10, 2010

Beyond Bayes: causality vs correlation

A draft paper by Harvard graduate student James Lee (student of Steve Pinker; I'd love to post the paper here but don't know yet if that's OK) got me interested in the work of statistical learning pioneer Judea Pearl. I found the essay Bayesianism and Causality, or, why I am only a half-Bayesian (excerpted below) a concise, and provocative, introduction to his ideas.

Pearl is correct to say that humans think in terms of causal models, rather than in terms of correlation. Our brains favor simple, linear narratives. The effectiveness of physics is a consequence of the fact that descriptions of natural phenomena are compressible into simple causal models. (Or, perhaps it just looks that way to us ;-)

Judea Pearl: I turned Bayesian in 1971, as soon as I began reading Savage’s monograph The Foundations of Statistical Inference [Savage, 1962]. The arguments were unassailable: (i) It is plain silly to ignore what we know, (ii) It is natural and useful to cast what we know in the language of probabilities, and (iii) If our subjective probabilities are erroneous, their impact will get washed out in due time, as the number of observations increases.

Thirty years later, I am still a devout Bayesian in the sense of (i), but I now doubt the wisdom of (ii) and I know that, in general, (iii) is false. Like most Bayesians, I believe that the knowledge we carry in our skulls, be its origin experience, schooling or hearsay, is an invaluable resource in all human activity, and that combining this knowledge with empirical data is the key to scientific enquiry and intelligent behavior. Thus, in this broad sense, I am still a Bayesian. However, in order to be combined with data, our knowledge must first be cast in some formal language, and what I have come to realize in the past ten years is that the language of probability is not suitable for the task; the bulk of human knowledge is organized around causal, not probabilistic relationships, and the grammar of probability calculus is insufficient for capturing those relationships. Specifically, the building blocks of our scientific and everyday knowledge are elementary facts such as “mud does not cause rain” and “symptoms do not cause disease” and those facts, strangely enough, cannot be expressed in the vocabulary of probability calculus. It is for this reason that I consider myself only a half-Bayesian. ...
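To make the "mud does not cause rain" example concrete, here is a minimal sketch. It is an illustration only, not something taken from Pearl's essay: the probabilities are made up, and the function names and model labels are hypothetical. It builds one joint distribution over rain and mud, then asks what happens when mud is set by hand (written do(mud) in the notation associated with Pearl's work) under two different causal stories. Both stories share the same joint, so they give the same observational answer, yet they disagree about the intervention.

```python
from fractions import Fraction as F

# Made-up numbers, chosen only for illustration.
p_rain = F(3, 10)
p_mud_given_rain = {1: F(9, 10), 0: F(1, 10)}

def joint(rain, mud):
    """Joint probability P(rain, mud), shared by both causal stories."""
    pr = p_rain if rain else 1 - p_rain
    pm = p_mud_given_rain[rain] if mud else 1 - p_mud_given_rain[rain]
    return pr * pm

def p_rain_given_mud(mud):
    """Observational P(rain=1 | mud), read directly off the joint."""
    return joint(1, mud) / (joint(0, mud) + joint(1, mud))

def p_rain_do_mud(mud, model):
    """Interventional P(rain=1 | do(mud)) under a named causal story."""
    if model == "rain->mud":
        # Forcing mud leaves rain's own mechanism untouched.
        return p_rain
    if model == "mud->rain":
        # Here rain's mechanism takes mud as its input.
        return p_rain_given_mud(mud)
    raise ValueError(model)

print(p_rain_given_mud(1))            # same for both stories
print(p_rain_do_mud(1, "rain->mud"))  # rain is indifferent to forced mud
print(p_rain_do_mud(1, "mud->rain"))  # rain follows forced mud
```

The joint distribution alone cannot tell the two stories apart; the extra information is which variable's mechanism survives the intervention.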

10 comments:

Hye said...

definitely provocative -- Even the example you gave above has pretty big holes in it. I think a lot of issues stem back to semantics and how prior knowledge is cast. When he claims that "mud does not cause rain" is a "fact", it assumes definitions for "fact" and "cause" that may preclude expressing that statement using standard probability. MY interpretation would be (i) we have not observed a mechanism for mud to "cause" rain and (ii) we have had a lot of possible observations where mud could "cause" rain. Therefore, the prior probability that "mud causes rain" is quite low (how low depends on your belief system).

There is a claim here that the language of probability is not suitable. I would argue that translating human "knowledge" into probability is the difficult part. It is not an issue with mathematical formalism, it is an issue with the way our brains are wired (i.e. they do not naturally use formal logic).

Ben Espen said...

Steve, your first link to Pearl's paper is missing an "h" at the beginning.

silkop said...

I find Pearl's claims about the "lack of expressiveness" of probability distributions for "causal" knowledge unfortunate. Of course you can put up a probability distribution that shows that "symptoms do not cause disease". It's just that it doesn't suffice to examine "the population with symptoms" and "the population with disease" together, rather you must examine "the population made diseased" and "the population not made diseased".

I wish Pearl would show mercy to his readers and be more straightforward about what he is really trying to do with his approach, which seems to be inventing a new formal notation that could be used for a (toy?) "causal algebra" that allows deriving causal statements from other such statements. The same *can* be done with "classic" probability distributions, depending on what meanings you attach to propositions. Pearl's examples of where the "simple probabilistic reasoning" breaks down and a new "revolutionary" notation is supposedly required appear to be based on misunderstanding what the propositions stand for. That said, I don't deny that his notation might be more convenient once you get used to it. It's just much less a matter of "fundamental philosophy" or a "breakthrough" than he makes it appear to be. I view Pearl as I would view the inventor of a new programming language, not as I would view Turing.

By the way, the straightforward way to express causality seems to me simpler than it might appear to an old philosopher-statistician (such as Pearl). Causality is really a bunch of if-then formulas aka algorithms stored in our heads, constructed based on actions followed by observations. "Causality" is thus a fairly subjective and practical concept and mathematics/physics are doing great without it. It's common for us to discover new "causal factors" (refine models). At other times we consciously drop out known causal factors (simplify models), which is where we are left with the "hand-waving" causality upon which all daily reasoning and engineering depends. There are two basic ways to explore causality: deduce from known causal connections some new "interesting" ones (e.g. run a simulation) or employ controlled experimentation (including the possibility of "nature's" own controlled experiments) to reconstruct the finer "algorithm" from observed data after fiddling with the parts that you are curious about.

silkop said...

Hye, I very much agree with the last part. Probability theory is really just glorified counting. The hard part is to know what (not) to count. Anyway, nothing forbids us to focus on counting "events of type Y which occurred after we did X", and thus getting into probabilistic statements about "causes".

My impression is that Pearl has fallen in love with his ideas so deeply that he constructs a reality around them (in particular, making bold ontological claims about what supposedly "can" and "cannot" be done). Or perhaps he is just sly and putting a shiny wrapper on what would otherwise be a rather boring theoretical framework (I doubt it). Either way, I think he's lost direction, and after 10 more years it might be time to admit it.

Why doesn't the word "simulation" occur a single time in his paper? Why *must* a useful approach for dealing with causality be expressed as a formal algebra and not, more trivially, as a set of programs within a computer? While Pearl sticks to his "cognitively formidable" concepts of the paper-and-pen variety, the rest of the world moves on, computing, tweaking and experimenting whenever it helps to figure out "what happens when" (i.e. causality).
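The "counting" and "set of programs" view in this comment can be sketched as a small simulation. This is only an illustration under assumed numbers: the toy world, its rates (10% disease, 80% and 5% symptom rates) and the function names are all hypothetical. Disease produces symptoms, never the reverse. Counting disease among people who happen to have symptoms gives one number; counting disease among people whose symptoms we switched on ourselves gives another.

```python
import random

def sample_person(rng, force_symptom=None):
    """One draw from a toy world where disease produces symptoms."""
    disease = rng.random() < 0.1
    if force_symptom is None:
        # Symptoms arise naturally, more often in the diseased.
        symptom = rng.random() < (0.8 if disease else 0.05)
    else:
        # We set the symptom ourselves; disease is unaffected.
        symptom = force_symptom
    return disease, symptom

def disease_rate_among_symptomatic(n, force_symptom=None, seed=0):
    """Fraction of symptomatic people who have the disease."""
    rng = random.Random(seed)
    people = [sample_person(rng, force_symptom) for _ in range(n)]
    diseased = [d for d, s in people if s]
    return sum(diseased) / len(diseased)

observed = disease_rate_among_symptomatic(100_000)
intervened = disease_rate_among_symptomatic(100_000, force_symptom=True)
print(f"observed:   {observed:.3f}")
print(f"intervened: {intervened:.3f}")
```

With these assumed rates the observed fraction should land near 0.64 and the intervened one near the 0.10 base rate. Both are plain frequencies; what differs is which events were counted.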

glory said...

http://cscs.umich.edu/~crshalizi/weblog/664.html - "Praxis and Ideology in Bayesian Data Analysis"

cosma shalizi going off on bayesians :P

cheers!

CW said...

It seems to me that anyone who examines everyday causal reasoning ought to spend a significant effort considering its relationship(s) to careful analysis using the laws of physics. It is rather ironic that from a physicist's point of view, everyday causal reasoning is not fundamental, rigorous, or even particularly clear and coherent in many (most?) cases. I should elaborate on this a bit, but I need to get some sleep tonight.

Eric Foss said...

Remember Daniel Pearl - the reporter who got his head cut off after being kidnapped in Pakistan? That's Judea Pearl's son. If you look at the dedication in "Causality", it says something like "for my wonderful Danny".

steve hsu said...

I think what Pearl did is nice, but closer to inventing a notation than to solving any fundamental problem.

I notice no one has commented on his third assertion, which relates to whether two Bayesians must agree in the limit of infinite shared data. (Aumann agreement theorem, intersubjective agreement, yada yada.)
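For reference, here is a minimal sketch of the benign case in which priors do wash out: two analysts with different Beta priors watching the same coin flips. It says nothing about the situations Pearl has in mind when he calls (iii) false in general. The priors, the 30% heads rate and the function names are assumptions made for illustration.

```python
from fractions import Fraction as F

def posterior_mean(prior_a, prior_b, heads, tails):
    """Mean of the Beta(prior_a + heads, prior_b + tails) posterior."""
    return F(prior_a + heads, prior_a + prior_b + heads + tails)

def disagreement(n):
    """Gap between two analysts after n shared flips, 30% of them heads."""
    heads = 3 * n // 10
    tails = n - heads
    optimist = posterior_mean(20, 2, heads, tails)  # prior mean 10/11
    skeptic = posterior_mean(1, 1, heads, tails)    # uniform prior
    return optimist - skeptic

gaps = [disagreement(n) for n in (0, 10, 100, 10_000)]
for n, gap in zip((0, 10, 100, 10_000), gaps):
    print(n, float(gap))
```

The gap shrinks as the shared data grows because both posteriors are dominated by the same counts. That depends on the quantity being identifiable from the data both analysts see.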

Yan Zhang said...

I have two issues:

1) I've always felt thinking in quantum mechanics at least overlaps with thinking in a Bayesian sense, or at least the weaker mode of thinking in the language of probabilities (uh say... wave functions?), and the "cleaner" causal model is basically what happens when lots of these probabilities get sent close to 0 or 1 (I know this is an oversimplification, but bear with me). So I feel what you say about physics is more in line with, say, classical mechanics than with the more modern paradigms? My argument fails, of course, if it is the case that physicists still mostly think in a "classical mechanics way" even though they're working on models more complex than that, and I haven't talked to enough physicists to have a good idea of what goes on in their heads.

2) I feel that our brains *do* work in terms of correlation and context (at least on the fundamental building block level) instead of direct "A causes B" type reasoning -- my toy example would be the way we form synaptic links around concepts when new information reinforces them, which seems to me like adding weight to the probability of firing adjacent neurons, assuming that when we "think" we're kind of doing a random walk on the neurons. To me, it seems that it isn't our *brains* that favor simple, linear narratives; rather, we favor these narratives when we try to explain things (including when we explain them to ourselves). Alternatively, I think that we "think" in two ways, one at a very fundamental level under the hood and one at a higher level after we convert to some sort of language (this includes teaching others in a spoken language or ourselves in an internal dialogue). I think the latter way of thinking suits the "simple, linear narratives" you're talking about, but the former, when we break it down to fundamental building blocks that we still don't completely understand, may very well act in terms of correlation and probabilities. Of course, this may be exactly what you meant when you said "our brains favor..."

Neither of these is necessarily at odds with your stances; it really depends on your wording.

-Yan
