In the reference lists below, references with "✶" are particularly recommended. Kruschke refers to John Kruschke's Doing Bayesian Data Analysis (2nd edition), the recommended text for STSCI 4780. Rethinking refers to McElreath's Statistical Rethinking. BDA3 refers to the third edition of Bayesian Data Analysis by Gelman et al. (2013), the standard statistics PhD-level reference on Bayesian data analysis.
Slides: Lec01-BigPicture.pdf
Sharon McGrayne's 2012 book, ✶ The Theory That Would Not Die, does not cover the Bayesian approach from a technical perspective, as we did in Lec01, but it provides a popular-level treatment of the history of Bayesian inference and its place in modern science. This book was a New York Times "Editor's Choice" selection; see the 2011 pre-publication review. For more info about the book, see Sharon Bertsch McGrayne's "The Theory..." web page. The book was timed to come out just before the 250th anniversary of the publication of Bayes's paper presenting a special case of what came to be called Bayes's theorem. McGrayne was the dinner speaker at Bayes 250 Day, a meeting held by the International Society for Bayesian Analysis (ISBA) in 2013 to honor the anniversary.
I can't resist a personal, astronomical note on the book. McGrayne interviewed me as part of her research for it. I get a brief mention (for helping to introduce Bayesian methods into modern astronomy). But our main correspondence and discussion was about Laplace's work, especially a famous calculation he did estimating the mass of Saturn. He used Bayes's theorem to estimate Saturn's mass from noisy data. His late-18th century parlance for what we today call a credible region went as follows:
Applying to them my formulae of probability I find that it is a bet of 11,000 against one that the error of this result is not 1/100 of its value....
Indeed, today's best estimate is well within a percent of Laplace's estimate; he would have won that bet! See the Lec02 notes below for more from and about Laplace.
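To unpack Laplace's statement in modern terms (this is my gloss, not Laplace's own computation): odds of 11,000 against 1 correspond to a probability of 11000/11001 ≈ 0.99991 that the fractional error is under 1/100; if the posterior for Saturn's mass is roughly normal, that is about a ±3.9σ credible region. A minimal Python check:

```python
from scipy.stats import norm

# Laplace's bet: 11,000 against 1 that the error is under 1/100.
odds = 11_000
p = odds / (odds + 1)        # probability ~ 0.99991

# For an approximately normal posterior, this probability corresponds
# to a central +/- z-sigma credible region with:
z = norm.ppf(0.5 + p / 2)    # ~ 3.9
print(f"P(|error| < 1%) = {p:.5f} -> about {z:.2f} sigma")
```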
The main technical part of Lec01 concerned a key distinction between Bayesian and frequentist approaches: both approaches involve computing averages and integrals, but over different spaces. Bayesian probabilities typically involve sums or averages over the hypothesis or parameter space; frequentist probabilities instead sum or average over the data or sample space. Two papers of mine elaborate on this distinction:
- The Promise of Bayesian Inference for Astrophysics | SpringerLink — An old (1991) paper using some simple examples to highlight this difference. For a longer version, see: The Promise... (unabridged).
- Bayesian Astrostatistics: A Backward Look to the Future | SpringerLink — This focuses more on concepts than calculations, and addresses some misunderstandings about the difference between Bayesian and frequentist approaches to quantifying uncertainty.
If you are very familiar with frequentist statistics, you may appreciate this old paper by Ed Jaynes, which offers several example calculations displaying the different results one can get when adopting Bayesian vs. frequentist approaches to simple problems; this paper inspired my 1991 paper cited above:
- Confidence Intervals vs Bayesian Intervals | SpringerLink — Also available as item 32 in the "E. T. Jaynes: Articles" section of Probability Theory As Extended Logic - Bayesian resources at Washington U.
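To make the parameter-space vs. sample-space contrast concrete, here is a toy coin-flipping sketch (my own illustration with made-up data, not an example from any of the papers above). The Bayesian probability integrates over the parameter with the data held fixed; the frequentist probability sums over hypothetical data sets with the parameter held fixed:

```python
from scipy import stats

n, k = 20, 14  # hypothetical data: 14 heads in 20 flips

# Bayesian: average over the PARAMETER space, data held fixed.
# With a uniform prior the posterior for theta is Beta(k+1, n-k+1);
# e.g., the posterior probability that the coin favors heads:
post = stats.beta(k + 1, n - k + 1)
print("P(theta > 0.5 | data) =", 1 - post.cdf(0.5))

# Frequentist: average over the SAMPLE space, parameter held fixed.
# e.g., a p-value for theta = 0.5 sums the probabilities of
# hypothetical data sets at least as extreme as the observed one:
pval = 1 - stats.binom(n, 0.5).cdf(k - 1)   # P(K >= k | theta = 0.5)
print("P(K >= 14 | theta = 0.5) =", pval)
```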
Slides: Lec02-Logic+Probability.pdf
- Propositions and arguments
- Valid vs. sound arguments
- Propositional logic: unary and binary operations/connectives, truth tables (see the short sketch after this list)
- Inductive vs. deductive logic
- Probability theory as a calculus for inductive logic
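The truth-table and validity ideas above are easy to check by brute force. A small sketch (mine, not from the slides) that tabulates some binary connectives and verifies that modus ponens is valid by exhaustively enumerating truth assignments:

```python
from itertools import product

# Truth table for some binary connectives; material implication A -> B
# is defined truth-functionally as (not A) or B.
print("A     B     A and B  A or B  A -> B")
for A, B in product([True, False], repeat=2):
    implies = (not A) or B
    print(f"{A!s:5} {B!s:5} {str(A and B):8} {str(A or B):7} {implies}")

# A valid argument form never has true premises and a false conclusion.
# Modus ponens: premises (A -> B) and A; conclusion B.
assert all(B for A, B in product([True, False], repeat=2)
           if ((not A) or B) and A)
```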
- For a brief overview of the historical connections between logic and probability theory, see: ✶ "Bayesian Methods: General Background" by Ed Jaynes (1986).
- The earliest extensive work using probability theory in this way to tackle data analysis problems is Laplace's Théorie analytique des probabilités (TAP). Despite its enormous importance, to my knowledge TAP has never been translated into English. Laplace summarized his main ideas in Essai philosophique sur les probabilités (Philosophical Essay on Probabilities, 1814), which is available in English. A fairly recent translation is by Andrew Dale: Philosophical Essay on Probabilities. This is a free download via the CULib SpringerLink subscription, but it is not for the faint of heart. Laplace offers the essay as a non-mathematical description of his approach, but he accomplishes this by the artifice of writing out key equations in words! For further details about Laplace's work in probability theory, see Laplace on probability and statistics (part of a site on the history of probability hosted by the Xavier University CS department).
- The approach Laplace adopted came to be known as the method of inverse probability. It's what we would today call parametric Bayesian inference, with use of uniform priors by default—setting the posterior to be proportional to the likelihood function, "inverting" the conditional (written out in modern notation below). If you're interested in the complicated historical path from "inverse probability" to "Bayesian inference," see "When did Bayesian inference become 'Bayesian'?" by Stephen E. Fienberg (2006). Spoiler: Fisher appears to have invented the modern usage of "Bayesian," intending it as somewhat of an insult, referring to methods different from the fiducial probability approach he advocated, which failed to develop a following. (Fisher, probably the most influential statistician of the 20th century, had a reputation for being more than a bit ornery!) The modern (non-derogatory!) usage only became common in the statistics literature in the 1960s.
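  In modern notation (my summary of the standard result, not a quote from Laplace), the "inversion" is just Bayes's theorem with a constant prior, so the posterior is the normalized likelihood:

  ```latex
  % Bayes's theorem for a parameter theta and data D:
  \[
    p(\theta \mid D)
      = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta')\, p(\theta')\, \mathrm{d}\theta'} ,
  \]
  % and with a uniform prior, p(theta) = const, this reduces to the
  % normalized likelihood, i.e., Laplace's "inverse probability":
  \[
    p(\theta \mid D)
      = \frac{p(D \mid \theta)}{\int p(D \mid \theta')\, \mathrm{d}\theta'} .
  \]
  ```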
- When scientific interest in Bayesian methods revived in the early 20th century, key early figures expounding the "probability theory as generalized logic" viewpoint were Harold Jeffreys (geophysicist, mathematician), John Maynard Keynes (economist, mathematician), and Rudolf Carnap (philosopher). Among the writings of these thinkers, Jeffreys's Theory of Probability offers the most accessible and useful account of this viewpoint for scientist and engineer readers (philosophers may prefer Carnap and Keynes). A trio of leading French Bayesian statisticians recently looked back at this historically important book: "Harold Jeffreys's Theory of Probability Revisited". Jeffreys's book played a major role in my own early education in Bayesian methods. It's still a good read.
- The clearest and most forceful mid-20th century advocates of the probability-as-logic viewpoint were two physicists, Richard Cox (Richard Threlkeld Cox - Wikipedia) and Ed Jaynes (Edwin Thompson Jaynes - Wikipedia). They built on the legacies of Laplace and Jeffreys. An accessible early exposition is the article, "How Does the Brain Do Plausible Reasoning?", by Ed Jaynes (1957; the link is to a 1988 reprint of the original report). For a brief (4pp) overview of the line of argument, see section 3 of my first Bayesian publication: "From Laplace to Supernova SN 1987A: Bayesian Inference in Astrophysics" (1990; note the errata at the end, and be patient with the sometimes-polemical tone, reflecting the controversial status of Bayesian methods ca. 1990; this paper became half of my PhD thesis).
- The most thorough recent treatment is in the first two chapters of Jaynes's posthumously published book, Probability Theory: The Logic of Science (PTLOS) (on course reserve). ✶ These two chapters are available online via Washington University, where Jaynes worked and where his last PhD student, G. Larry Bretthorst, maintains an archive of Jaynes's work: Probability Theory As Extended Logic. More chapters from a pre-publication version of the book are available via archive.org. This book is truly outstanding, though it was written mostly before the advent of powerful computational methods for Bayesian inference, and thus is not the most practical book on modern Bayesian data analysis.
- For a fun read about Jaynes's book, see the entertaining review by Stanford (formerly Cornell) statistician/mathematician Persi Diaconis: ✶ "A Frequentist Does This, A Bayesian That":
There are many places in which I want to yell at him. He's so full of himself. That's what makes the book so terrific. It's the real thing—the best introduction to Bayesian statistics that I know.
- An alternative viewpoint on Bayesian probability theory is the subjective Bayes approach. This considers probability theory as a kind of calculus for guiding bets based on personal beliefs about uncertain outcomes. This approach arose nearly contemporaneously with the "probability as logic" approach in the early 20th century, but in a different literature—in econometrics and statistics. The two approaches are close in spirit and execution, although they take different attitudes toward how objective or subjective calculations should be (especially in regard to assigning priors). In particular, part of the subjective Bayes literature is concerned with prior elicitation—techniques for working with domain experts to build priors that attempt to quantify expert pre-experiment knowledge (a toy sketch follows below).
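  As a toy illustration of elicitation (entirely my own sketch; the numbers and the quantile-matching criterion are hypothetical, not drawn from Lindley or Kadane), one common device is to ask an expert for a best guess and a 90% interval for an uncertain proportion, then fit a Beta prior to those statements:

  ```python
  from scipy import stats, optimize

  # Hypothetical expert answers for an uncertain proportion:
  guess, lo, hi = 0.30, 0.15, 0.50   # best guess and 90% interval

  def mismatch(params):
      """Squared mismatch between Beta(a, b) and the elicited statements."""
      a, b = params
      prior = stats.beta(a, b)
      return ((prior.mean() - guess) ** 2
              + (prior.ppf(0.05) - lo) ** 2
              + (prior.ppf(0.95) - hi) ** 2)

  res = optimize.minimize(mismatch, x0=[2.0, 5.0],
                          bounds=[(0.01, None), (0.01, None)])
  a, b = res.x
  print(f"elicited prior: Beta({a:.2f}, {b:.2f})")
  ```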
- Prominent founders of subjective Bayes include Frank P. Ramsey (Wikipedia) and Bruno de Finetti (Wikipedia); prominent early proponents of this viewpoint in applied statistics include Leonard "Jimmie" Savage (Wikipedia) and Dennis Lindley (Wikipedia). Abraham Wald (Wikipedia) also played an important role, establishing connections between this approach and (frequentist) decision theory. Lindley was active until his death, at 90, in 2013, and was probably the most influential of the early "neo-Bayesians" in terms of impact on applied statistics.
- Two particularly clear recent books explaining the subjective Bayes outlook are Dennis Lindley's ✶ Understanding Uncertainty (a free download via CULib) and Jay Kadane's Principles of Uncertainty (2011; on course reserve and also available as a free PDF via Kadane's book site archived at archive.org). Lindley's book is largely nontechnical; Kadane's is mathematical, but with an emphasis on concepts, especially in the early chapters (I'd give it a ✶, but it's more technical than other starred resources). See also CHANCE magazine's interview: "Discussing Principles of Uncertainty with Jay Kadane". Lindley published an article-length overview: ✶ "The Philosophy of Statistics" (2000; this was published in The Statistician, a Royal Statistical Society journal intended for a wide, not-too-technical audience; this journal became the magazine Significance in 2004, a joint publication of the RSS and the American Statistical Association).
- The objective Bayes viewpoint rests on the subjective Bayes foundational arguments (optimal betting), but seeks to use rule-based priors as much as possible (a standard example of such a rule appears below). It has a lot of overlap with the probability-as-logic viewpoint. Many practitioners who favor logical Bayes foundations would nevertheless happily accept "objective Bayes" as a fair description of their approach.
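  A standard example of such a rule (a textbook fact, not taken from the lecture slides) is Jeffreys's invariance prior, proportional to the square root of the Fisher information; for the success probability of n Bernoulli trials it gives:

  ```latex
  % Fisher information for theta from n Bernoulli trials:
  %   I(theta) = n / (theta (1 - theta)).
  % Jeffreys's rule, p(theta) proportional to sqrt(I(theta)), then gives
  \[
    p(\theta) \propto \theta^{-1/2}\,(1-\theta)^{-1/2},
    \qquad \text{i.e.}\ \theta \sim \mathrm{Beta}\!\left(\tfrac{1}{2},\tfrac{1}{2}\right).
  \]
  ```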