    Bayesian Thinking for People Who Hated Statistics

    By ProfitlyAI · March 16, 2026 · 13 Mins Read


    Lecture hall, Tuesday morning. The professor uncaps a marker and writes across the whiteboard: P(A|B) = P(B|A) · P(A) / P(B). Your hand copies the formula. Your mind checks out somewhere around the vertical bar.

    If that memory just surfaced, you're in good company. Research suggests up to 80% of students experience some form of statistics anxiety. For many, it's the strongest predictor of their course grade (stronger than prior math ability, according to a University of Kansas study).

    Here's what most statistics courses never mention: you've been doing Bayesian reasoning since childhood. The formula on the whiteboard wasn't teaching you something new. It was burying something you already understood under a pile of notation.


    The Problem That Broke 82% of Doctors

    Try this before reading further.

    One percent of women aged 40 who participate in routine screening have breast cancer. A mammogram correctly identifies cancer 80% of the time. It also produces a false alarm 9.6% of the time, flagging cancer when none exists.

    A woman gets a positive mammogram. What's the probability she actually has cancer?

    Take a moment.

    In 1978, researchers at Harvard Medical School posed a similar base-rate problem to 60 physicians and medical students. Only 18% arrived at the correct answer. Nearly half guessed 95%.

    The actual answer for the mammogram problem: 7.8%.

    The trick is to count instead of calculate. Take 10,000 women:

    • 100 have cancer (that's 1%)
    • Of those 100, 80 test positive (80% sensitivity)
    • Of the 9,900 cancer-free women, about 950 get a false positive (9.6%)

    Total positive mammograms: 80 + 950 = 1,030.

    Women who actually have cancer among the positives: 80.

    Probability: 80 ÷ 1,030 ≈ 7.8%.

    The false positives from the huge healthy group swamp the true positives from the small cancer group. Image by the author.

    No Greek letters required. Just counting.

    In Python, it's four lines:

    prior = 0.01           # 1% base rate
    sensitivity = 0.80     # P(positive | cancer)
    false_pos = 0.096      # P(positive | no cancer)

    posterior = (sensitivity * prior) / (
        sensitivity * prior + false_pos * (1 - prior)
    )
    print(f"{posterior:.1%}")  # 7.8%

    German psychologist Gerd Gigerenzer spent decades studying this exact failure. When he and Ulrich Hoffrage rewrote probability problems using natural frequencies (counting actual individuals instead of juggling percentages), correct responses among naive participants jumped from the single digits to nearly 50%. Same math, different representation. The bottleneck was never intelligence. It was the format.
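    The natural-frequency version translates directly to code as well. A minimal counting sketch, using the article's numbers and a round population of 10,000:

```python
# Count people instead of juggling percentages (natural frequencies).
population = 10_000
with_cancer = int(population * 0.01)             # 100 women have cancer
true_positives = int(with_cancer * 0.80)         # 80 of them test positive
without_cancer = population - with_cancer        # 9,900 cancer-free women
false_positives = round(without_cancer * 0.096)  # ~950 false alarms

posterior = true_positives / (true_positives + false_positives)
print(f"{true_positives} of {true_positives + false_positives} "
      f"positives have cancer: {posterior:.1%}")
```

    Same 7.8% as the formula, but every intermediate value is a count you can picture.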


    You’ve Been Bayesian Your Entire Life

    You do this calculation unconsciously every day.

    Your friend recommends a restaurant. "Best pad thai in the city," she says. You open Google Maps: 4.2 stars, 1,200 reviews. Your prior (she knows Thai food, she's been right before) meets the evidence (solid but not stellar reviews from strangers). Your updated belief: probably good, worth trying. You go.

    That's Bayes' theorem in three seconds. Prior belief + new evidence = updated belief.

    A noise at 3 AM. Your prior: the cat knocked something over (this happens twice a week). The evidence: it sounds like glass shattering, not a soft thud. Your posterior shifts. You get up to check. If you find the cat standing next to a broken vase, whiskers twitching, your belief updates again. Prior confirmed. Back to sleep.

    You check the weather app: 40% chance of rain. You look outside at a blue sky with no clouds on the horizon. Your internal model disagrees with the app. You grab a light jacket but leave the umbrella.

    You get an email from your CEO asking you to buy gift cards. Your prior: she has never made a request like this before. The evidence: the email came from a Gmail address, the grammar feels off, the tone is wrong. Your posterior: almost certainly phishing. You don't click.
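    The gift-card email runs through the same four-line calculation as the mammogram. The inputs below are invented for illustration (a modest prior that an odd request is phishing, and evidence that is 500 times more likely under phishing than under a legitimate request); the structure of the update is the point:

```python
# Hypothetical numbers: the shape of the update matters, not the inputs.
prior = 0.05                  # assumed prior that an unusual request is a scam
p_evidence_if_phish = 0.50    # Gmail address + off grammar, if it IS phishing
p_evidence_if_legit = 0.001   # same evidence from the real CEO: very unlikely

posterior = (p_evidence_if_phish * prior) / (
    p_evidence_if_phish * prior + p_evidence_if_legit * (1 - prior)
)
print(f"P(phishing | evidence) = {posterior:.1%}")
```

    A strong likelihood ratio swamps a modest prior, which is why you delete the email without hesitation.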

    None of these feel like statistics. They feel like common sense. That's the point.

    The formula on the whiteboard was just notation for what your brain does between sensing a problem and making a decision.

    The perceived gap between "statistics" and "common sense" is an artifact of how statistics is taught. Start with the formula, and you get confusion. Start with the intuition, and the formula writes itself.


    Why Your Statistics Course Got It Backwards

    This isn't a fringe critique. The statistics establishment itself has started saying it out loud.

    In 2016, the American Statistical Association (ASA) released its first formal guidance on a specific statistical method in 177 years of existence. The target: p-value misuse. Among the six principles: p-values don't measure the probability that a hypothesis is true, and the 0.05 significance threshold is conventional and arbitrary.

    Three years later, 854 scientists signed a Nature commentary titled "Scientists Rise Up Against Statistical Significance." The same issue of The American Statistician carried 43 papers on what comes after p < 0.05.

    Image by the author.

    The core structural problem, as biostatistician Frank Harrell at Vanderbilt describes it: frequentist statistics asks "how surprising is my data, assuming nothing interesting is happening?" That's P(data | hypothesis). What you actually want is: "given this data, how likely is my hypothesis?" That's P(hypothesis | data).

    These are not the same question. Confusing them is what mathematician Aubrey Clayton calls "Bernoulli's Fallacy," an error he traces to a specific mistake by Jacob Bernoulli in the 18th century that has been baked into curricula ever since.

    How deep does this confusion go? A 2022 study found that 73% of statistics methodology instructors (not students, instructors) endorsed the most common misinterpretation of p-values, treating them as P(hypothesis | data).

    "P-values condition on what is unknown and don't condition on what is known. They are backwards probabilities."

    Frank Harrell, Vanderbilt University

    The downstream result: a replication crisis. The Reproducibility Project attempted to replicate 100 published psychology studies. Roughly 60% failed. Replicated effects were, on average, half the originally reported size. P-hacking (adjusting an analysis until p < 0.05 appears) was identified as a primary driver.


    Bayes in Five Minutes, No Formulas

    Every Bayesian calculation has exactly three parts.

    The Prior. What you believed before seeing any evidence. In the mammogram problem, it's the 1% base rate. In the restaurant decision, it's your friend's track record. Priors aren't guesses; they can incorporate decades of data. They're your starting point.

    The Likelihood. How probable is the evidence you observed, under each possible state of reality? If cancer is present, how likely is a positive test? (80%.) If absent, how likely? (9.6%.) The ratio of these two numbers (80 ÷ 9.6 ≈ 8.3) is the likelihood ratio. It measures the diagnostic power of the evidence: how much should this evidence move your belief?
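    The likelihood ratio makes updating mechanical in the odds form of Bayes' theorem: posterior odds = prior odds × likelihood ratio. A sketch with the mammogram numbers:

```python
# Odds form of Bayes: posterior odds = prior odds x likelihood ratio.
prior = 0.01                      # 1% base rate
likelihood_ratio = 0.80 / 0.096   # ~8.3: diagnostic power of a positive test

prior_odds = prior / (1 - prior)  # 1 to 99
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)
print(f"{posterior:.1%}")  # 7.8%, matching the counting approach
```

    An 8.3x likelihood ratio sounds decisive, but multiplied against 1-to-99 prior odds it still leaves the posterior below 8%.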

    The Posterior. Your updated belief after combining prior with evidence. This is what you care about. In the mammogram case: 7.8%.

    That's the whole framework. Prior × Likelihood = Posterior (after normalizing). The formula P(A|B) = P(B|A) · P(A) / P(B) is shorthand for "update what you believed based on what you just learned."

    One crucial rule: a strong prior needs strong evidence to move. If you're 95% sure your deployment is safe and a single noisy alert fires, your posterior barely budges. But if three independent monitoring systems all flag the same service at 3 AM, the evidence overwhelms the prior. Your belief shifts fast. This is why patterns matter more than single data points, and why accumulating evidence is more powerful than any single test.
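    The tug of war between prior and evidence can be made numerical. A sketch under assumed numbers: you start 95% sure the deployment is safe, and each independent alert is taken to be five times more likely if the service is actually broken:

```python
def update(p_broken: float, likelihood_ratio: float) -> float:
    """One Bayesian update in odds form."""
    odds = p_broken / (1 - p_broken) * likelihood_ratio
    return odds / (1 + odds)

p_broken = 0.05  # 95% sure the deployment is safe
for alert in range(1, 4):
    p_broken = update(p_broken, likelihood_ratio=5.0)
    print(f"after alert {alert}: P(broken) = {p_broken:.0%}")
```

    One alert barely moves the strong prior (to about 21%), but three consistent alerts push it past 85%: single data points lose to patterns.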


    The PRIOR Framework: Bayesian Reasoning at Work

    Here's a five-step process you can apply at your desk on Monday morning. No statistical software required.

    P: Pin Your Prior

    Before looking at any data, write down what you believe and why. Force a number. "I think there's a 60% chance the conversion drop is caused by the new checkout flow." This prevents anchoring to whatever the data shows first.

    Worked example: Your team's A/B test reports a 12% lift in sign-ups. Before interpreting it, ask: what was your prior? If nine out of ten similar experiments at your company produced lifts under 5%, a 12% result deserves scrutiny, not celebration. Your prior says large effects are rare here.
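    That scrutiny can be quantified. The inputs below are assumptions for illustration only: a 10% prior that a large effect is real, a 60% chance of measuring a 12% lift if it is real, and an 8% chance of measuring one from noise alone:

```python
# Hypothetical inputs: what a 12% measured lift is worth when large effects are rare.
prior_real = 0.10      # 1 in 10 past experiments showed a genuinely large effect
p_obs_if_real = 0.60   # chance of measuring a lift this big if the effect is real
p_obs_if_noise = 0.08  # chance of measuring a lift this big from noise alone

posterior = (p_obs_if_real * prior_real) / (
    p_obs_if_real * prior_real + p_obs_if_noise * (1 - prior_real)
)
print(f"P(effect is real | 12% measured lift) = {posterior:.0%}")
```

    Under these assumptions the headline lift is closer to a coin flip (about 45%) than a finding, which is exactly what the prior warned you about.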

    R: Rate the Evidence

    Ask two questions:

    • If my belief is correct, how likely is this evidence?
    • If my belief is wrong, how likely is this evidence?

    The ratio matters more than either number alone. A ratio near 1 means the evidence is equally consistent with both explanations (it's weak, barely worth updating on). A ratio of 8:1 or higher means the evidence strongly favors one side. Move your belief accordingly.

    I: Invert the Question

    Before concluding anything, check: am I answering the question I care about? "What's the probability of seeing this data if my hypothesis were true" is not "what's the probability my hypothesis is true given this data." The first is a p-value. The second is what you want. Confusing them is the single most common statistical error in published research.

    O: Output Your Updated Belief

    Combine prior and evidence. Strong evidence with a high likelihood ratio shifts your belief substantially. Ambiguous evidence barely touches it. State the result explicitly: "I now estimate a 35% chance this effect is real, down from 60%."

    You don't need exact numbers. Even rough categories (unlikely, plausible, likely, near-certain) beat binary thinking (significant vs. not significant).

    R: Rinse and Repeat

    Your posterior today becomes tomorrow's prior. Run a follow-up experiment. Examine a different data cut. Every piece of evidence refines the picture. The discipline: never throw away your accumulated knowledge and start from scratch with each new dataset.
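    Rinse-and-repeat is just chained updates, each posterior feeding the next round as the prior. A sketch with hypothetical likelihood ratios from three follow-up analyses, one of which weakens the belief:

```python
def update(belief: float, likelihood_ratio: float) -> float:
    """Carry a belief through one piece of evidence (odds form of Bayes)."""
    odds = belief / (1 - belief) * likelihood_ratio
    return odds / (1 + odds)

belief = 0.60  # today's prior
for likelihood_ratio in [0.5, 3.0, 4.0]:  # hypothetical evidence strengths
    belief = update(belief, likelihood_ratio)
    print(f"evidence with LR {likelihood_ratio}: belief is now {belief:.0%}")
```

    The belief dips, then recovers and climbs; nothing is ever discarded, only reweighted.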

    Image by the author.

    From Spam Filters to Sunken Submarines

    Bayesian reasoning isn't just a thinking tool. It runs in production systems processing billions of decisions.

    Spam filtering. In August 2002, Paul Graham published "A Plan for Spam," introducing Bayesian classification for email. The system assigned each word a probability of appearing in spam versus legitimate mail (the likelihood), combined it with the base rate of spam (the prior), and computed a posterior for each message. Graham's filter caught spam at a 99.5% rate with zero false positives on his personal corpus. Every major email provider now uses some descendant of this approach.
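    Graham's rule for combining per-word probabilities is short enough to sketch. It assumes words are independent (the "naive" in naive Bayes); the per-word numbers below are made up:

```python
from math import prod

def spam_probability(word_probs):
    """Combine per-word P(spam | word) values with the naive Bayes rule
    Paul Graham used in 'A Plan for Spam' (assumes words are independent)."""
    s = prod(word_probs)                 # evidence for spam
    h = prod(1 - p for p in word_probs)  # evidence for legitimate mail
    return s / (s + h)

# Two strong spam words outweigh one innocent-looking word:
print(spam_probability([0.99, 0.90, 0.20]))
```

    A single innocuous word can't rescue a message full of near-certain spam tokens, which is what makes the filter hard to fool with padding.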

    Hyperparameter tuning. Bayesian optimization has replaced grid search at companies running expensive training jobs. Instead of exhaustively testing every setting combination, it builds a probabilistic model of which configurations will perform well (the prior), evaluates the most promising candidate, observes the result, and updates (posterior). Each iteration makes a smarter choice. For a model that takes hours to train, this can cut tuning time from weeks to days.

    Uncertainty quantification. Probabilistic programming frameworks like PyMC and Stan build models that output full probability distributions instead of single numbers. Rather than "the coefficient is 0.42," you get "the coefficient falls between 0.35 and 0.49 with 95% probability." This is a Bayesian credible interval. Unlike a frequentist confidence interval, it actually means what most people think a confidence interval means: there's a 95% chance the true value is in that range.
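    For simple conjugate models, a credible interval needs nothing heavier than the standard library. A sketch for a conversion rate: a flat Beta(1, 1) prior, hypothetical data of 42 conversions in 100 trials, and Monte Carlo samples from the Beta posterior:

```python
import random

random.seed(0)

conversions, misses = 42, 58  # hypothetical A/B data
# Beta(1 + conversions, 1 + misses) is the posterior under a flat Beta(1, 1) prior.
samples = sorted(
    random.betavariate(1 + conversions, 1 + misses) for _ in range(100_000)
)
lo, hi = samples[2_500], samples[97_500]  # central 95% of the posterior
print(f"95% credible interval for the rate: ({lo:.2f}, {hi:.2f})")
```

    The interval reads the way people want it to: given this data and prior, there's a 95% probability the true conversion rate lies between the two endpoints.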

    But the most dramatic Bayesian success story involves a nuclear submarine at the bottom of the Atlantic.

    In May 1968, the USS Scorpion failed to arrive at its home port in Norfolk, Virginia. Ninety-nine men aboard. The Navy knew the sub was somewhere in the Atlantic, but the search area spanned thousands of square miles of deep ocean floor.

    Mathematician John Craven took a different approach than grid-searching the ocean. He assembled experts and had them assign probabilities to nine failure scenarios (hull implosion, torpedo malfunction, navigation error). He divided the search area into grid squares and assigned each a prior probability based on the combined estimates.

    Then the search began. Each time a crew cleared a grid square and found nothing, Craven updated the posteriors. Empty square 47? Probability mass shifted to the remaining squares. Each failed search was not a wasted effort. It was evidence, systematically narrowing the possibilities.

    Every grid square that turned up empty wasn't a failure. It was data.

    The method pinpointed the Scorpion within 220 yards of the predicted location, on the ocean floor at 10,000 feet. The same Bayesian search approach later located a hydrogen bomb lost after a 1966 B-52 crash near Palomares, Spain, and helped find the wreckage of Air France Flight 447 in the deep Atlantic in 2011.
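    The update step behind this kind of search is a few lines. A sketch with four hypothetical grid squares and an assumed 90% chance of detecting the wreck when searching the right square:

```python
def search_update(priors, searched, detection_prob):
    """Posterior over grid squares after an unsuccessful search of one square.

    A failed search doesn't zero out the searched square (the wreck could
    have been missed); it shrinks its probability and renormalizes.
    """
    post = list(priors)
    post[searched] *= (1 - detection_prob)
    total = sum(post)
    return [p / total for p in post]

beliefs = [0.4, 0.3, 0.2, 0.1]  # hypothetical priors over four squares
beliefs = search_update(beliefs, searched=0, detection_prob=0.9)
print([round(b, 2) for b in beliefs])  # [0.06, 0.47, 0.31, 0.16]
```

    After one empty pass over the most likely square, the probability mass flows to its neighbors, and the second square becomes the next best place to look.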


    Return to the mammogram problem for a moment.

    The reason 82% of doctors got it wrong wasn't arithmetic. It was that nobody taught them to ask the one question that matters: how common is this condition in the population being tested?

    That question (the prior) is the most neglected step in data interpretation. Skip it, and you mistake a false alarm for a diagnosis, a noisy experiment for a real effect, a coincidence for a pattern.

    Every statistic you encounter this week is a mammogram result. The headline claiming a drug "doubles your risk." The A/B test with p = 0.03. The performance review based on a single quarter of data.

    Each is evidence. None is a conclusion.

    The conclusion requires what you've always had: what you knew before you saw the number. Your statistics professor just never gave you permission to use it.


    References

    1. Casscells, W., Schoenberger, A., & Graboys, T.B. (1978). "Interpretation by Physicians of Clinical Laboratory Results." New England Journal of Medicine, 299(18), 999-1001.
    2. Gigerenzer, G. & Hoffrage, U. (1995). "How to Improve Bayesian Reasoning Without Instruction: Frequency Formats." Psychological Review, 102, 684-704.
    3. American Statistical Association (2016). "The ASA Statement on Statistical Significance and P-Values." The American Statistician, 70(2), 129-133.
    4. Amrhein, V., Greenland, S., & McShane, B. (2019). "Scientists Rise Up Against Statistical Significance." Nature, 567, 305-307.
    5. Open Science Collaboration (2015). "Estimating the Reproducibility of Psychological Science." Science, 349(6251), aac4716.
    6. Graham, P. (2002). "A Plan for Spam."
    7. Harrell, F. (2017). "My Journey from Frequentist to Bayesian Statistics." Statistical Thinking.
    8. Clayton, A. (2021). Bernoulli's Fallacy: Statistical Illogic and the Crisis of Modern Science. Columbia University Press.
    9. Badenes-Ribera, L., et al. (2022). "Persistent Misconceptions About P-Values Among Academic Psychologists." PMC.
    10. Kalid Azad. "An Intuitive (and Short) Explanation of Bayes' Theorem." BetterExplained.
    11. Wikipedia contributors. "Bayesian Search Theory." Wikipedia.


