Showing posts with label bayes. Show all posts

Saturday, April 15, 2017

History of Bayesian Neural Networks



This talk gives the history of neural networks in the framework of Bayesian inference. Deep learning is (so far) quite empirical in nature: things work, but we lack a good theoretical framework for understanding why or even how. The Bayesian approach offers some progress in these directions, and also toward quantifying prediction uncertainty.

I was sad to learn from this talk that David MacKay passed away last year, from cancer. I recommended his book Information Theory, Inference, and Learning Algorithms back in 2007.

Yarin Gal's dissertation Uncertainty in Deep Learning, mentioned in the talk.
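
One concrete recipe for the prediction-uncertainty point above, in the spirit of the dropout-as-approximate-Bayesian-inference idea developed in Gal's work, is Monte Carlo dropout: keep dropout switched on at prediction time and treat repeated stochastic forward passes as samples from an approximate predictive distribution. Here is a minimal numpy sketch of the mechanics only (toy, untrained weights; the layer sizes and names are my own illustration, not anything from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer regression net with fixed (here random) weights;
# in a real use case these would come from training with dropout.
W1, b1 = rng.normal(size=(1, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 1)), np.zeros(1)

def stochastic_forward(x, p_drop=0.5):
    """One forward pass with a fresh dropout mask (dropout kept ON at test time)."""
    h = np.tanh(x @ W1 + b1)
    mask = rng.random(h.shape) > p_drop      # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)            # inverted-dropout rescaling
    return h @ W2 + b2

x = np.array([[0.3]])
samples = np.stack([stochastic_forward(x) for _ in range(200)])
print("predictive mean:", samples.mean(axis=0).ravel())
print("predictive std (uncertainty):", samples.std(axis=0).ravel())
```

The spread across the stochastic passes is what gets reported as the model's uncertainty about its own prediction.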

I suppose I can thank my Caltech education for a quasi-subconscious understanding of neural nets despite never having worked on them. They were in the air when I was on campus, due to the presence of John Hopfield (he co-founded the Computation and Neural Systems PhD program at Caltech in 1986). See also Hopfield on physics and biology.

Amusingly, I discovered this talk via deep learning: YouTube's recommendation engine, powered by deep neural nets, suggested it to me this Saturday afternoon :-)

Wednesday, June 26, 2013

Kolmogorov, Solomonoff, and de Finetti

This is a nice historical article that connects a number of key figures in the history of probability and information theory. See also Frequentists and Bayesians, Jaynes and Bayes, and On the Origin of Probability in Quantum Mechanics.
Induction: From Kolmogorov and Solomonoff to De Finetti and Back to Kolmogorov
John J. McCall, DOI: 10.1111/j.0026-1386.2004.00190.x

ABSTRACT This paper compares the solutions to “the induction problem” by Kolmogorov, de Finetti, and Solomonoff. Brief sketches of the intellectual history of de Finetti and Kolmogorov are also composed. Kolmogorov's contributions to information theory culminated in his notion of algorithmic complexity. The development of algorithmic complexity was inspired by information theory and randomness. Kolmogorov's best-known contribution was the axiomatization of probability in 1933. Its influence on probability and statistics was swift, dramatic, and fundamental. However, Kolmogorov was not satisfied by his treatment of the frequency aspect of his creation. This in time gave rise to Kolmogorov complexity. De Finetti, on the other hand, had a profound vision early in his life which was encapsulated in his exchangeability theorem. This insight simultaneously resolved a fundamental philosophical conundrum—Hume's problem, and provided the bricks and mortar for de Finetti's constructive probabilistic theory. Most of his subsequent research involved extensions of his representation theorem. De Finetti was against determinism and celebrated quantum theory, while Kolmogorov was convinced that in every seemingly indeterministic manifestation there lurked a hidden deterministic mechanism. Solomonoff introduced algorithmic complexity independently of Kolmogorov and Chaitin. Solomonoff's motivation was firmly focused on induction. His interest in induction was to a marked extent sparked by Keynes’ 1921 seminal book. This interest in induction has never faltered, remaining prominent in his most recent research. The decisive connection between de Finetti and Kolmogorov was their lifelong interest in the frequency aspect of induction. Kolmogorov's solution to the problem was algorithmic complexity. De Finetti's solution to his frequency problem occurred early in his career with the discovery of the representation theorem. In this paper, we try to explain these solutions and mention related topics which captured the interest of these giants.
Excerpt below from the paper. I doubt Nature (evolution) uses Solomonoff's Universal Prior (determined by minimum length programs), as it is quite expensive to compute. I think our priors and heuristics are much more specialized and primitive.
... There are a host of similarities and equivalences joining concepts like Shannon entropy, Kolmogorov complexity, maximum entropy, Bayes etc. Some of these are noted. In short, the psychological aspects of probability and induction emphasized by de Finetti, Ramsey and Keynes may eventually emerge from a careful and novel neuroscientific study. In this study, neuronal actors would interact as they portray personal perception and memories. In this way, these versatile neuronal actors comprise the foundation of psychology.

... In comparing de Finetti and Kolmogorov one becomes entangled in a host of controversial issues. de Finetti was an indeterminist who championed a subjective inductive inference. Kolmogorov sought determinism in even the most chaotic natural phenomena and reluctantly accepted a frequency approach to statistics, an objective science. In the 1960s Kolmogorov questioned his frequency position and developed algorithmic complexity, together with Solomonoff and Chaitin. With respect to the frequency doubts, he would have been enlightened by de Finetti’s representation theorem. The KCS research remains an exciting research area in probability, statistics and computer science. It has raised a whole series of controversial issues. It appears to challenge both the frequency and Bayesian schools: the former by proclaiming that the foundations of uncertainty are to be found in algorithms, information and combinatorics, rather than probability; the latter by replacing the Bayes prior which differs across individuals in accord with their distinctive beliefs with a universal prior applicable to all and drained of subjectivity.
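
For reference, the universal prior in question weights every string by the total weight of the programs that produce it on a universal prefix machine U; in one standard form,

```latex
m(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-|p|} \;\approx\; 2^{-K(x)}
```

where |p| is the program length in bits and K(x) is the prefix Kolmogorov complexity (the two sides agree up to a constant factor, by the coding theorem). The sum over programs is only semicomputable, not computable, which is the strong version of my "expensive to compute" complaint above.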

Saturday, July 10, 2010

Beyond Bayes: causality vs correlation

A draft paper by Harvard graduate student James Lee (student of Steve Pinker; I'd love to post the paper here but don't know yet if that's OK) got me interested in the work of statistical learning pioneer Judea Pearl. I found the essay Bayesianism and Causality, or, why I am only a half-Bayesian (excerpted below) a concise, and provocative, introduction to his ideas.

Pearl is correct to say that humans think in terms of causal models, rather than in terms of correlation. Our brains favor simple, linear narratives. The effectiveness of physics is a consequence of the fact that descriptions of natural phenomena are compressible into simple causal models. (Or, perhaps it just looks that way to us ;-)

Judea Pearl: I turned Bayesian in 1971, as soon as I began reading Savage’s monograph The Foundations of Statistical Inference [Savage, 1962]. The arguments were unassailable: (i) It is plain silly to ignore what we know, (ii) It is natural and useful to cast what we know in the language of probabilities, and (iii) If our subjective probabilities are erroneous, their impact will get washed out in due time, as the number of observations increases.

Thirty years later, I am still a devout Bayesian in the sense of (i), but I now doubt the wisdom of (ii) and I know that, in general, (iii) is false. Like most Bayesians, I believe that the knowledge we carry in our skulls, be its origin experience, schooling or hearsay, is an invaluable resource in all human activity, and that combining this knowledge with empirical data is the key to scientific enquiry and intelligent behavior. Thus, in this broad sense, I am still a Bayesian. However, in order to be combined with data, our knowledge must first be cast in some formal language, and what I have come to realize in the past ten years is that the language of probability is not suitable for the task; the bulk of human knowledge is organized around causal, not probabilistic relationships, and the grammar of probability calculus is insufficient for capturing those relationships. Specifically, the building blocks of our scientific and everyday knowledge are elementary facts such as “mud does not cause rain” and “symptoms do not cause disease” and those facts, strangely enough, cannot be expressed in the vocabulary of probability calculus. It is for this reason that I consider myself only a half-Bayesian. ...
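
A toy simulation makes the point concrete: in observational data, mud and rain "predict" each other in both directions, but intervening on mud (Pearl's do-operator) leaves rain untouched, while intervening on rain still produces mud. The conditional probabilities alone cannot tell the two cases apart. This is my own illustrative sketch, not Pearl's example, and the little generative model is made up:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Assumed toy causal model: rain -> mud, plus a little mud from other sources.
rain = rng.random(N) < 0.3
mud  = np.where(rain, rng.random(N) < 0.9, rng.random(N) < 0.1)

# Observational conditionals look symmetric: each variable "predicts" the other.
print("P(mud | rain) =", mud[rain].mean())      # high
print("P(rain | mud) =", rain[mud].mean())      # also high

# Intervene: do(mud = 1). We set mud ourselves, so rain is untouched --
# there is no arrow from mud to rain in the causal graph.
print("P(rain | do(mud=1)) =", rain.mean())     # just the base rate, ~0.3

# Intervene: do(rain = 1). Mud responds, because rain causes mud.
mud_do_rain = rng.random(N) < 0.9
print("P(mud | do(rain=1)) =", mud_do_rain.mean())   # ~0.9
```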

Sunday, May 09, 2010

Climate change priors and posteriors



I recommend this nice discussion of climate change on Andrew Gelman's blog. Physicist Phil, the guest-author of the post, gives his prior and posterior probability distributions for temperature sensitivity as a function of CO2 density. I guess I'm somewhere between the Skeptic's prior and Phil's prior.

As an aside, I think it is worth distinguishing between a situation where one has a high confidence level about a probability distribution (e.g., at an honest casino game like roulette or blackjack) versus in the real world, where even the pdf itself isn't known with any confidence (Knightian uncertainty). Personally, I am in the latter situation with climate science.
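
Here is a small sketch of that distinction (my own toy numbers): in the casino case the distribution is known exactly and the only uncertainty is sampling noise; in the second case we are unsure about the distribution itself, which fattens the predictive spread before any data arrive. Strictly, Knightian uncertainty means not trusting even a distribution over distributions, so the hyperprior below is just a stand-in to separate the two layers:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_bets = 10_000, 100

# Case 1: honest roulette-style bet -- the win probability is known exactly.
p_known = 18 / 38
frac_known = rng.binomial(n_bets, p_known, size=n_sims) / n_bets

# Case 2: we are unsure of the distribution itself. As a toy stand-in, draw
# the win probability from a wide Beta(2, 2) -- one draw per scenario.
# (True Knightian uncertainty would mean not trusting even this hyperprior.)
p_uncertain = rng.beta(2.0, 2.0, size=n_sims)
frac_uncertain = rng.binomial(n_bets, p_uncertain) / n_bets

print("known p:     std of win fraction over 100 bets =", round(frac_known.std(), 3))
print("uncertain p: std of win fraction over 100 bets =", round(frac_uncertain.std(), 3))
```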

Here is an excerpt from a skeptic's comment on the post:

... So where are we on global climate change? We have some basic physics that predicts some warming caused by CO2, but a lot of positive and negative feedbacks that could amplify and attenuate temperature increases. We have computer models we can't trust for a variety of reasons. We have temperature station data that might have been corrupted by arbitrary "adjustments" to produce a warming trend. We have the north polar ice area decreasing, while the south polar ice area is constant or increasing. Next year an earth satellite will launch that should give us good measurements of polar ice thickness using radar. Let's hope that data doesn't get corrupted. We have some alternate theories to explain temperature increases such as cosmic ray flux. All this adds up to a confused and uncertain picture. The science is hardly "settled."

Finally the public is not buying AGW. Anyone with common sense can see that the big funding governments have poured into climate science has corrupted it. Until this whole thing gets an independent review from trustworthy people, it will not enjoy general acceptance. You can look for that at the ballot box next year.

For a dose of (justified?) certitude, see this angry letter, signed by numerous National Academy of Sciences members, that appeared in Science last week. See here for a systematic study of the record of expert predictions about complex systems. Scientists are only slightly less susceptible than others to groupthink.

Thursday, December 11, 2008

Jaynes and Bayes

E.T. Jaynes, although a physicist, was one of the great 20th century proponents of Bayesian thinking. See here for a wealth of information, including some autobiographical essays. I recommend this article on probability, maximum entropy and Bayesian thinking, and this, which includes his recollections of Dyson, Feynman, Schwinger and Oppenheimer.

Here are the first three chapters of his book Probability Theory: the Logic of Science. The historical material in the preface is fascinating.

Jaynes started as an Oppenheimer student, following his advisor from Berkeley to Princeton. But Oppenheimer's mystical adherence to the logically incomplete Copenhagen interpretation (Everett's "philosophic monstrosity") led Jaynes to switch advisors, becoming a student of Wigner.
Edwin T. Jaynes was one of the first people to realize that probability theory, as originated by Laplace, is a generalization of Aristotelian logic that reduces to deductive logic in the special case that our hypotheses are either true or false. This web site has been established to help promote this interpretation of probability theory by distributing articles, books and related material. As Ed Jaynes originated this interpretation of probability theory we have a large selection of his articles, as well as articles by a number of other people who use probability theory in this way.
See Carson Chow for a nice discussion of how Bayesian inference is more like human reasoning than formal logic.
The seeds of the modern era could arguably be traced to the Enlightenment and the invention of rationality. I say invention because although we may be universal computers and we are certainly capable of applying the rules of logic, it is not what we naturally do. What we actually use, as coined by E.T. Jaynes in his iconic book Probability Theory: The Logic of Science, is plausible reasoning. Jaynes is famous for being a major proponent of Bayesian inference during most of the second half of the last century. However, to call Jaynes’s book a book about Bayesian statistics is to wholly miss Jaynes's point, which is that probability theory is not about measures on sample spaces but a generalization of logical inference. In the Jaynes view, probabilities measure a degree of plausibility.

I think a perfect example of how unnatural the rules of formal logic are is to consider the simple implication A -> B, which means: if A is true then B is true. By the rules of formal logic, if A is false then B can be true or false (i.e. a false premise can prove anything). Conversely, if B is true, then A can be true or false. The only valid conclusion you can deduce is that if B is false then A is false. ...

However, people don’t always (seldom?) reason this way. Jaynes points out that the way we naturally reason also includes what he calls weak syllogisms: 1) If A is false then B is less plausible and 2) If B is true then A is more plausible. In fact, it is more likely that we mostly use weak syllogisms, and that interferes with formal logic. Jaynes showed that weak syllogisms as well as formal logic arise naturally from Bayesian inference.

[Carson gives a nice example here -- see the original.]

...I think this strongly implies that the brain is doing Bayesian inference. The problem is that depending on your priors you can deduce different things. This explains why two perfectly intelligent people can easily come to different conclusions. This also implies that reasoning logically is something that must be learned and practiced. I think it is important to know when you draw a conclusion, whether you are using deductive logic or if you are depending on some prior. Even if it is hard to distinguish between the two for yourself, at least you should recognize that it could be an issue.
While I think the brain is doing something like Bayesian inference (perhaps with some kinds of heuristic shortcuts), there are probably laboratory experiments showing that we make a lot of mistakes and often do not properly apply Bayes' theorem. A quick look through the old Kahneman and Tversky literature would probably confirm this :-)
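
As an aside, the second weak syllogism above really does drop out of Bayes' theorem in one line. If we accept the implication A -> B, then P(B|A) = 1, and

```latex
P(A \mid B) \;=\; \frac{P(B \mid A)\, P(A)}{P(B)} \;=\; \frac{P(A)}{P(B)} \;\ge\; P(A)
```

since P(B) <= 1: observing B can only raise the plausibility of A (strictly, whenever P(B) < 1). The first weak syllogism follows the same way by conditioning on not-A.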

Monday, December 01, 2008

Frequentists vs Bayesians

Noted Berkeley statistician David Freedman recently passed away. I recommend the essay below if you are interested in the argument between frequentists (objectivists) and Bayesians (subjectivists). I never knew Freedman, but based on his writings I think I would have liked him very much -- he was clearly an independent thinker :-)

In everyday life I tend to be sympathetic to the Bayesian point of view, but as a physicist I am willing to entertain the possibility of true quantum randomness.

I wish I understood better some of the foundational questions mentioned below. In the limit of infinite data will two Bayesians always agree, regardless of priors? Are exceptions contrived?
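
At least in the simplest settings the answer is yes, and easy to check numerically. Here is a minimal sketch (my own toy example: a Bernoulli parameter with two very different Beta priors); it says nothing about the high-dimensional exceptions Freedman cites in the excerpt below:

```python
import numpy as np

rng = np.random.default_rng(1)
true_p = 0.7   # the "true" coin bias generating the data

# Two Bayesians with very different Beta priors on the same Bernoulli parameter.
priors = {"optimist": (20.0, 2.0), "pessimist": (2.0, 20.0)}

for n in (0, 10, 1_000, 100_000):
    flips = rng.random(n) < true_p
    heads = flips.sum()
    # Conjugate Beta-Bernoulli update: posterior mean = (a + heads) / (a + b + n)
    means = {name: (a + heads) / (a + b + n) for name, (a, b) in priors.items()}
    print(f"n = {n:>6}: ", {name: round(m, 3) for name, m in means.items()})
```

With no data the two posterior means sit near 0.9 and 0.1; by n = 100,000 both are essentially at 0.7 -- the data swamp the prior.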

Some issues in the foundation of statistics

Abstract: After sketching the conflict between objectivists and subjectivists on the foundations of statistics, this paper discusses an issue facing statisticians of both schools, namely, model validation. Statistical models originate in the study of games of chance, and have been successfully applied in the physical and life sciences. However, there are basic problems in applying the models to social phenomena; some of the difficulties will be pointed out. Hooke’s law will be contrasted with regression models for salary discrimination, the latter being a fairly typical application in the social sciences.


...The subjectivist position seems to be internally consistent, and fairly immune to logical attack from the outside. Perhaps as a result, scholars of that school have been quite energetic in pointing out the flaws in the objectivist position. From an applied perspective, however, the subjectivist position is not free of difficulties. What are subjective degrees of belief, where do they come from, and why can they be quantified? No convincing answers have been produced. At a more practical level, a Bayesian’s opinion may be of great interest to himself, and he is surely free to develop it in any way that pleases him; but why should the results carry any weight for others? To answer the last question, Bayesians often cite theorems showing "inter-subjective agreement": under certain circumstances, as more and more data become available, two Bayesians will come to agree: the data swamp the prior. Of course, other theorems show that the prior swamps the data, even when the size of the data set grows without bounds -- particularly in complex, high-dimensional situations. (For a review, see Diaconis and Freedman, 1986.) Theorems do not settle the issue, especially for those who are not Bayesians to start with.

My own experience suggests that neither decision-makers nor their statisticians do in fact have prior probabilities. A large part of Bayesian statistics is about what you would do if you had a prior. For the rest, statisticians make up priors that are mathematically convenient or attractive. Once used, priors become familiar; therefore, they come to be accepted as "natural" and are liable to be used again; such priors may eventually generate their own technical literature. ...

It is often urged that to be rational is to be Bayesian. Indeed, there are elaborate axiom systems about preference orderings, acts, consequences, and states of nature, whose conclusion is-- that you are a Bayesian. The empirical evidence shows, fairly clearly, that those axioms do not describe human behavior at all well. The theory is not descriptive; people do not have stable, coherent prior probabilities.

Now the argument shifts to the "normative:" if you were rational, you would obey the axioms, and be a Bayesian. This, however, assumes what must be proved. Why would a rational person obey those axioms? The axioms represent decision problems in schematic and highly stylized ways. Therefore, as I see it, the theory addresses only limited aspects of rationality. Some Bayesians have tried to win this argument on the cheap: to be rational is, by definition, to obey their axioms. ...

How do we learn from experience? What makes us think that the future will be like the past? With contemporary modeling techniques, such questions are easily answered-- in form if not in substance.

· The objectivist invents a regression model for the data, and assumes the error terms to be independent and identically distributed; "iid" is the conventional abbreviation. It is this assumption of iid-ness that enables us to predict data we have not seen from a training sample -- without doing the hard work of validating the model.

· The classical subjectivist invents a regression model for the data, assumes iid errors, and then makes up a prior for unknown parameters.

· The radical subjectivist adopts an exchangeable or partially exchangeable prior, and calls you irrational or incoherent (or both) for not following suit.

In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved; although the connection with data analysis often remains to be established. And an enormous amount of fiction has been produced, masquerading as rigorous science. [!!!]
