Monday, December 01, 2008

Frequentists vs Bayesians

Noted Berkeley statistician David Freedman recently passed away. I recommend the essay below if you are interested in the argument between frequentists (objectivists) and Bayesians (subjectivists). I never knew Freedman, but based on his writings I think I would have liked him very much -- he was clearly an independent thinker :-)

In everyday life I tend to be sympathetic to the Bayesian point of view, but as a physicist I am willing to entertain the possibility of true quantum randomness.

I wish I understood better some of the foundational questions mentioned below. In the limit of infinite data will two Bayesians always agree, regardless of priors? Are exceptions contrived?

Some issues in the foundation of statistics

Abstract: After sketching the conflict between objectivists and subjectivists on the foundations of statistics, this paper discusses an issue facing statisticians of both schools, namely, model validation. Statistical models originate in the study of games of chance, and have been successfully applied in the physical and life sciences. However, there are basic problems in applying the models to social phenomena; some of the difficulties will be pointed out. Hooke’s law will be contrasted with regression models for salary discrimination, the latter being a fairly typical application in the social sciences.

...The subjectivist position seems to be internally consistent, and fairly immune to logical attack from the outside. Perhaps as a result, scholars of that school have been quite energetic in pointing out the flaws in the objectivist position. From an applied perspective, however, the subjectivist position is not free of difficulties. What are subjective degrees of belief, where do they come from, and why can they be quantified? No convincing answers have been produced. At a more practical level, a Bayesian’s opinion may be of great interest to himself, and he is surely free to develop it in any way that pleases him; but why should the results carry any weight for others? To answer the last question, Bayesians often cite theorems showing "inter-subjective agreement:" under certain circumstances, as more and more data become available, two Bayesians will come to agree: the data swamp the prior. Of course, other theorems show that the prior swamps the data, even when the size of the data set grows without bounds-- particularly in complex, high-dimensional situations. (For a review, see Diaconis and Freedman, 1986.) Theorems do not settle the issue, especially for those who are not Bayesians to start with.

My own experience suggests that neither decision-makers nor their statisticians do in fact have prior probabilities. A large part of Bayesian statistics is about what you would do if you had a prior. For the rest, statisticians make up priors that are mathematically convenient or attractive. Once used, priors become familiar; therefore, they come to be accepted as "natural" and are liable to be used again; such priors may eventually generate their own technical literature. ...

It is often urged that to be rational is to be Bayesian. Indeed, there are elaborate axiom systems about preference orderings, acts, consequences, and states of nature, whose conclusion is-- that you are a Bayesian. The empirical evidence shows, fairly clearly, that those axioms do not describe human behavior at all well. The theory is not descriptive; people do not have stable, coherent prior probabilities.

Now the argument shifts to the "normative:" if you were rational, you would obey the axioms, and be a Bayesian. This, however, assumes what must be proved. Why would a rational person obey those axioms? The axioms represent decision problems in schematic and highly stylized ways. Therefore, as I see it, the theory addresses only limited aspects of rationality. Some Bayesians have tried to win this argument on the cheap: to be rational is, by definition, to obey their axioms. ...

How do we learn from experience? What makes us think that the future will be like the past? With contemporary modeling techniques, such questions are easily answered-- in form if not in substance.

·The objectivist invents a regression model for the data, and assumes the error terms to be independent and identically distributed; "iid" is the conventional abbreviation. It is this assumption of iid-ness that enables us to predict data we have not seen from a training sample-- without doing the hard work of validating the model.

·The classical subjectivist invents a regression model for the data, assumes iid errors, and then makes up a prior for unknown parameters.

·The radical subjectivist adopts an exchangeable or partially exchangeable prior, and calls you irrational or incoherent (or both) for not following suit.

In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved; although the connection with data analysis often remains to be established. And an enormous amount of fiction has been produced, masquerading as rigorous science. [!!!]


Steve said...

The theorem he is quoting is Aumann's Agreement Theorem.

Anonymous said...

I've never understood why it's called quantum randomness rather than quantum uncertainty.

With no information on laws of motion or initial conditions, our R-squared for a coin flip is 0%. But as science progresses, we acquire a better model and more information on initial conditions, and our R-squared increases. (Yes, this is circular logic here, since a better model is defined as one with higher R-squared.)

A hundred years ago, we knew of atoms, but not of quarks. As we've delved down further, we have generally been increasing our R-squared for the universe.
What makes us think
1) That we won't discover more and more (infinitely more?) layers of complexity and detail?
2) That our R-squared won't asymptote up to 100%, ever growing but never actually reaching it?

Steve Hsu said...

If qm is correct there is an *irreducible* uncertainty in predictions about the future even if we know the initial state and laws of evolution *perfectly*.

If I create a spin polarized along the +x axis (so I know the state perfectly), then immediately measure its polarization along the z axis, I have a 50-50 chance of measuring +z or -z. This uncertainty (or randomness) in the outcome is irreducible as far as we know, and there are deep theorems (Bell's inequality) that make it so.
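The 50-50 figure in the spin example can be checked with a few lines of arithmetic. This is only a minimal sketch, assuming the standard textbook convention that the +x state is an equal superposition of +z and -z and that outcome probabilities follow the Born rule; the names `up_z`, `down_z`, `plus_x` and `born_probability` are illustrative and do not come from the post.

```python
import math

# Spin-1/2 states written as (amplitude for +z, amplitude for -z).
# Assumption: standard convention, |+x> = (|+z> + |-z>) / sqrt(2).
up_z = (1 + 0j, 0j)
down_z = (0j, 1 + 0j)
s = 1 / math.sqrt(2)
plus_x = (s + 0j, s + 0j)

def born_probability(outcome, state):
    """Born rule: probability = |<outcome|state>|^2."""
    amplitude = sum(o.conjugate() * c for o, c in zip(outcome, state))
    return abs(amplitude) ** 2

p_up = born_probability(up_z, plus_x)      # chance of measuring +z
p_down = born_probability(down_z, plus_x)  # chance of measuring -z
print(p_up, p_down)
```

The state is fully specified, yet the calculation yields only probabilities for the two outcomes. The sketch reproduces what the theory predicts; it does not show that the randomness is irreducible.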

CW said...

For a long time I've had this sense that certain questions about the foundations of statistics must eventually converge with questions about the foundations of physics, at least insofar as one cares about effectively applying statistics to the physical world.

silkop said...

As a professor of physics, can you point me to some explanation of why another (now dead) professor of physics, E. T. Jaynes, was wrong in accusing QM proponents of committing what he called the "mind projection fallacy"? (By this he meant assuming that randomness is a property of nature instead of trying to reduce uncertainty like all other scientists have done to good effect).

As for the empirical evidence that Bayesian theory is not "descriptive" wrt people's reasoning, I suspect that it refers to the research in cognitive psychology conducted by Kahneman and Tversky. What the experiments have shown is that people are not Bayesian/rational on the level assumed by the researchers. In particular, certain kinds of information are seemingly not processed according to the model. However, they have not really shown (nor do I believe it is possible to show) that the experimental subjects are not Bayesian at a deeper level, with their seemingly "wrong" answers being just the result of drawing upon MORE information than an experimenter explicitly allowed them to use. The explanation could then be that people are in fact Bayesian, but choose (for hidden Bayesian reasons) not to pay attention to the instructions of the experimenter. In other words, the experimenter's assumed model of a "Bayesian" solution to the posed problem is too naive.

Carson C. Chow said...

It is possible to subscribe to Bayesian principles without necessarily being purely subjective or objective. I put myself into that category. In my view, using Bayesian analysis simply means that we can assign a probability to any quantity or parameter. For example, suppose we want to fit a model with parameters to data. The Bayesian way is the most straightforward to me. You assign some prior to the parameters and update with the data to get a posterior for the parameter. In the frequentist approach, the parameters cannot have a probability distribution so you must do some statistical test against some null hypothesis.

I use Bayesian methods to make straightforward statements about the probability that a model explains the data. Frequentists must stand on their heads for these questions. They must have some null hypothesis and then assign significance by finding the probability that the data would arise from the null hypothesis. That seems backwards to me.

In Bayesian analysis, you can ask questions more directly and you only have to remember one equation. Classical statistics is often more art than science. In Bayesian analysis the prior is explicit whereas in classical statistics it gets hidden in the tests you use, the null hypothesis and criteria for significance.

For maximum likelihood estimation, Bayesian inference gets the same answer as the classical approach if you use a nonnormalizable flat prior. In some sense, Bayesian analysis simply turns the likelihood function into a proper probability.
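The flat-prior remark can be illustrated on a toy problem. What follows is a rough sketch, assuming a coin with unknown bias theta, 7 heads in 10 flips (made-up numbers), and a grid approximation to the posterior; `grid_posterior` and `flat_prior` are hypothetical helper names. On the interval (0, 1) a flat prior is actually normalizable; the nonnormalizable case arises for unbounded parameters, which this sketch does not cover.

```python
import math

def grid_posterior(heads, flips, prior, grid_size=1000):
    """Posterior over a grid of theta values: prior times likelihood, normalized."""
    # Interior grid points only, so log(theta) and log(1 - theta) stay finite.
    thetas = [i / grid_size for i in range(1, grid_size)]
    weights = []
    for theta in thetas:
        log_like = heads * math.log(theta) + (flips - heads) * math.log(1 - theta)
        weights.append(prior(theta) * math.exp(log_like))
    total = sum(weights)
    posterior = [w / total for w in weights]
    return thetas, posterior

def flat_prior(theta):
    # Same weight everywhere: the posterior is just the normalized likelihood.
    return 1.0

heads, flips = 7, 10  # illustrative data, not from the post
thetas, posterior = grid_posterior(heads, flips, flat_prior)
map_estimate = thetas[posterior.index(max(posterior))]  # posterior mode
mle = heads / flips                                      # classical estimate
print(map_estimate, mle)
```

With the flat prior the posterior mode lands on the same value as the maximum likelihood estimate, and the normalized weights are the likelihood function turned into a probability distribution over theta.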

Finally, as long as the prior has support over the entire domain of the likelihood function, then given enough data, the prior will be overwhelmed. I'm sure you can find high dimensional counterexamples but given the curse of dimensionality, you really can't fit high dimensional models anyway, using Bayesian methods or otherwise. In my experience, with current computational capability, 10 parameters is about the most we can do right now.
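A one-parameter sketch of the prior being overwhelmed, under stated assumptions: two hypothetical Bayesians hold opposed Beta priors on a coin's bias, both with full support on (0, 1), and they see the same data with 30% heads. The priors, the data, and the helper name `posterior_mean` are all invented for illustration. The conjugate Beta update gives the posterior mean in closed form.

```python
def posterior_mean(a, b, heads, flips):
    """Mean of the Beta(a + heads, b + tails) posterior for a Beta(a, b) prior."""
    return (a + heads) / (a + b + flips)

optimist = (20, 2)   # prior belief: the coin mostly lands heads
pessimist = (2, 20)  # prior belief: the coin mostly lands tails

gaps = []
for flips in (10, 1000, 100000):
    heads = round(0.3 * flips)  # same observed data for both Bayesians
    m1 = posterior_mean(*optimist, heads, flips)
    m2 = posterior_mean(*pessimist, heads, flips)
    gaps.append(abs(m1 - m2))
    print(flips, round(m1, 4), round(m2, 4))
```

The disagreement shrinks like 18 / (22 + flips) here, so the two posteriors merge as the data grow. This covers only the low-dimensional case; it says nothing about the high-dimensional counterexamples Freedman refers to.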

Carson C. Chow said...

If I'm not mistaken Steve, I think Bell's inequality only applies to entangled states. I think in your example, you couldn't distinguish QM from a hidden variable theory.

Steve Hsu said...

CC: Thanks for your Bayesian comments -- I think I have the same (pragmatic) view of it as you do.

The spin example I gave is an example of true randomness *assuming* you accept that qm (as conventionally formulated) is true. Bell's inequality, which does as you note make use of entanglement, has been tested experimentally and excludes local hidden variable models which would be deterministic alternatives to standard qm.
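For readers who want to see the size of the violation, here is a small sketch of the CHSH form of Bell's inequality. It assumes the standard textbook result that spin measurements on an entangled singlet pair along directions at angles a and b have correlation -cos(a - b), and that any local hidden variable model keeps the CHSH combination at or below 2. The angle choices and function names are illustrative, not taken from the discussion.

```python
import math

def singlet_correlation(angle_a, angle_b):
    """Quantum prediction for a singlet pair (assumed textbook formula)."""
    return -math.cos(angle_a - angle_b)

def chsh(a, a_prime, b, b_prime, corr):
    """CHSH combination |E(a,b) - E(a,b') + E(a',b) + E(a',b')|."""
    return abs(corr(a, b) - corr(a, b_prime)
               + corr(a_prime, b) + corr(a_prime, b_prime))

# Measurement angles that maximize the quantum value.
s_quantum = chsh(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4,
                 singlet_correlation)
local_bound = 2.0  # the most any local hidden variable model can reach
print(s_quantum, local_bound)
```

The quantum value comes out to 2 times the square root of 2, about 2.83, above the local bound of 2. The experiments test exactly this gap.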

It sounds like you think cases where data fails to swamp prior are contrived, but it sounded like Freedman may not have felt that way... Someday I will look at his article with Diaconis on this subject.


Re: Jaynes, I'm not sure whether he followed the Bell results and experiments. People have analyzed these experiments against other uncertainties (e.g., from the apparatus) and conclude that nature really does violate Bell, so no local hidden variable theory is possible, so there is true unpredictability in Nature (randomness). You might find the comments (by Shimony) on p.54 of this document of interest:*.pdf

Finally, I suspect there are lab experiments where people actually violate hard consistency conditions that would be required of a Bayesian. (The only way out being that perhaps they assume God intervenes in the experiment and probabilities of, e.g., dice throws, are not as given by the experimenter :-)

silkop said...

I would be very surprised if Jaynes wasn't aware of Bell's work when he wrote his critique in 2003. Jaynes made abundant references to other scientists, based on detailed historical research, so it's hard to accept that he'd just choose to ignore those who did not fit his views (although it is entirely possible he misunderstood some, possibly in assuming that they misunderstood him).

I vaguely remember that Jaynes explicitly opposed Shimony (perhaps even ridiculed his ideas, as he was fond of doing in general). Now I'm inspired to look for more concrete references (Jaynes on Bell, Jaynes on Shimony)...

silkop said...

Ok, based on this article of 1990, it is certain that Jaynes was aware of Bell's work and unimpressed by his reasoning, although he doesn't seem to provide a direct strong argument against it:

I'm not a physicist, so it's over my head when Jaynes (or anyone else) jumps into physics equations. That's also why I'd love to learn what a well-informed contemporary physicist who really understands Jaynes would have to say about his arguments. Because it seems that QM is the *only* field from which Jaynes's interpretation (which he himself credited to Laplace/Jeffreys/Cox) could be seriously challenged.

Steve Hsu said...


Thanks for that link -- I will have a look, time permitting :-)

I should mention that there are people in quantum foundations who would like to characterize quantum uncertainty as describing the state of knowledge of the observer, and not a fundamental aspect of the universe. (Quantum Bayesians!) I don't quite see how those approaches can work, and I think even those people have some problems with Bell...

silkop said...

It's nice to know that there are "quantum Bayesians". Ok, so now I'll just subscribe to your blog and watch for deep insights. ;-)

I also found a reference to Shimony on page 21 of this paper:

My memory served me quite well. Jaynes: "The conflict is not between Shimony and me, but between Shimony and the English language."

CW said...

Regarding the Bayesian approach to the interpretation of QM, I believe the go-to guys are Carlton Caves, Chris Fuchs, and their collaborators (co-conspirators?) in this area. NOTE: They clearly are both endowed with a robust sense of humor.

From Caves' home page:

Ask, "How does the incredible variety of the world come to be?" and a disembodied voice replies enigmatically, "Rolls of the quantum dice." Puzzled, ask, "But what does that mean?" and the answer is whispered soundlessly in your ear, "Rolls of the quantum dice plumb a bottomless well of information, giving rise to the variety of the world around you." Still puzzled, but sensing the important question, ask haltingly, "So, where is that well?" and no answer comes forth, not even a whisper, for there is no answer, only a description, the phrase that Gertrude Stein used to damn her native Oakland, "There is no there there."

Steve Hsu said...

silkop: I just looked at the relevant part of the first Jaynes paper you referenced (p.10), and I have to say I am a bit disappointed. He is asserting a Bayesian alternative to standard qm (what we usually call probabilities are not really objective probabilities), but admits he can't formulate it...

"This is not an auspicious time to be making public announcements of startling revolutionary new scientific discoveries so it is rather a relief that we have none to announce. To exhibit the variables of that deeper hypothesis space explicitly is a job for the future..."

CW: Thanks for that quote from Caves. He and I at least agree on the implications of the standard theory (random quantum outcomes determine almost everything about the world around us)! Everett *does* have an explanation for the "infinite well of quantum dice rolls" that Caves asks about (and the answer is truly objective and frequentist, modulo some technical problems I and others have written about), but I doubt Caves likes the answer :-)

CW said...

From Chris Fuchs' home page (see 'Talks' at the bottom):

Being Bayesian in a Quantum World
(PowerPoint, 4,502 KB, 39 slides)

"This is a talk dedicated to Anton Zeilinger on the occasion of his 60th birthday. In it, I strive to make quantum mechanics look as much like Bayesian probability theory as possible. The shape of the gap where the two structures differ—it is speculated—gives the clearest indication yet of what the "reality" underlying quantum theory may actually look like."

Steve Hsu said...

I've discussed this issue with both Caves and Fuchs (both have been here at UO recently) and have to say I am unconvinced, but ready to examine any substantive proposal...

Carson C. Chow said...

Well they show inconsistency if you have an infinite number of parameters. My point is that you couldn't even remotely map out the posterior space for much higher than 10 parameters today and there are probably not enough bits or time in the known universe to do a computation for 100 parameters. So, in a practical sense, we only use Bayesian analysis when it is consistent, at least I do for the simple models I try to fit.

Jeff said...

A nice thing about particle physics (and in contrast to just about everything else) is that one can very nearly make identical, repeated experiments. More specifically, one can quantify essentially all the uncertainties associated with each observation/experiment. Because of this, results in particle physics are more amenable to a frequentist interpretation of statistics.

CW said...

It's worth quoting Nassim Nicholas Taleb in this context, from his recent essay on


I spent a long time believing in the centrality of probability in life and advocating that we should express everything in terms of degrees of credence, with unitary probabilities as a special case for total certainties and null for total implausibility. Critical thinking, knowledge, beliefs—everything needed to be probabilized. Until I came to realize, twelve years ago, that I was wrong in this notion that the calculus of probability could be a guide to life and help society. Indeed, it is only in very rare circumstances that probability (by itself) is a guide to decision making. It is a clumsy academic construction, extremely artificial, and nonobservable. Probability is backed out of decisions; it is not a construct to be handled in a stand-alone way in real-life decision making. It has caused harm in many fields.

The disputes between frequentists and Bayesians seem (to me) to be more than slightly related to the considerations that led Taleb to this conclusion.

Anonymous said...

Taleb is also fond of saying that the normal distribution is idiotic/evil and chiding people smarter than himself as ivory tower academicians - while he offers no superior substitute. So far he has failed to notice that he is 1) boring 2) conceited 3) incompetent to tackle the topic.

If you want to learn about applications and usefulness of probability theory, there are better sources than Taleb's stories of personal and "everyone else's" failure.
