
The Probability Sense – Does the Brain Compute Probabilities?

Psychophysics shows neat successes in cue combination and movement control. Behavioural records also show base-rate neglect and step-hold updates. We map the contradiction and set out tests that could link beliefs to neural mechanisms without overstretching the modelling language.

[Image: silhouette walking through an abstract tunnel of probability and neural data]

Before it was a number, ‘probability’ meant something a magistrate could live with. From the Latin probabilis, a claim was ‘approvable’ because a credible witness backed it. That old meaning, tied to judgement rather than arithmetic, sits under a modern puzzle: neuroscientists test for a ‘probability sense’ with lab tasks defined by objective frequencies, then explain the results with a theory about subjective belief. The two do not line up cleanly.

When We Talk About Probability

Start with the name. ‘Probability’ did not begin as maths. In early legal and philosophical use, it meant an opinion worthy of approval. A judge looked at a speaker’s standing and the quality of the testimony. That made the claim ‘probable’ in the everyday sense, plausible to accept.

Aristotle, the Greek philosopher, mapped the territory by separating what is certain, what is probable, and what lies in chance. For him, the probable lived in rhetoric and practical judgement. No tables of odds. No long-run frequencies. No formal calculation at all.

That origin matters because it set up a split that never fully closed.

Much later, when mathematicians built a clean formalism, they gave the field rules for how probabilities behave, but stayed quiet on what a probability is. Philosophers kept arguing over that point. Neuroscience, which borrows the maths, often treats the argument as settled when it is not.

Working definition for this article

Where needed, we will use ‘frequentist’ to mean probability as a property of the world measured by long-run relative frequency, and ‘Bayesian’ to mean probability as a degree of belief that an agent updates with evidence. Both definitions are live in the literature and both appear in the experiments we cover.

The Original Meaning of ‘Probable’

  • Etymology: From the Latin probabilis.
  • Original Meaning: ‘Approvable’ or ‘plausible’; a claim backed by a credible witness, not by mathematical frequency.
  • Implication: Probability began as a measure of the authority behind testimony, not an objective feature of the world.

The Split Personality of a Number

The familiar story starts in 1654, when Blaise Pascal and Pierre de Fermat, French mathematicians, answered a gambler’s question about a game that stopped midway. That exchange helped push probability from talk into arithmetic.

Within decades, Christiaan Huygens, a Dutch physicist, wrote the first textbook. In 1713, Jakob Bernoulli, a Swiss mathematician, formalised the law of large numbers, which anchors the frequentist view: repeat an experiment often enough, and the relative frequency will settle near a stable value.

A second track began when Thomas Bayes, an English clergyman-mathematician, and later Pierre-Simon Laplace, a French mathematician, treated probability as a belief that can be revised with new data. Bayes’ rule is the short recipe. Take a prior belief about how likely a claim is, multiply by the strength of the new evidence, and renormalise to get a posterior belief. In plain terms, start with a view, weigh what arrives, and move accordingly.
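Bayes’ rule is compact enough to run. The sketch below, in Python, applies the recipe to a toy coin problem; the two hypotheses, the priors and the flip counts are all invented for illustration, not taken from any study discussed here.

```python
# Bayes' rule as described above: weigh a prior by the evidence, renormalise.

def bayes_update(prior, likelihood):
    """Posterior probabilities from matched lists of priors and likelihoods."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Prior belief: 'coin is fair' (0.9) vs 'coin is biased towards heads' (0.1).
prior = [0.9, 0.1]

# Likelihood of seeing 8 heads in 10 flips under each hypothesis
# (binomial probabilities for p = 0.5 and p = 0.8, precomputed).
likelihood = [0.0439, 0.3020]

print(bayes_update(prior, likelihood))  # ~[0.57, 0.43]: strong evidence moves
                                        # the belief, but the prior still counts
```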

The unifying mathematical language came with Andrey Kolmogorov’s axioms in the 1930s, the Soviet mathematician’s rules for how probabilities must add and normalise. They do not tell us which interpretation is correct. So the field sits with two live meanings under one formal roof.

That unresolved split is not a footnote. Modern experiments mix the two levels all the time. A display might contain dots that move in a coherent direction 60 per cent of the time, which is an objective frequency in the stimulus, while the model used to read the subject’s responses is written in the language of subjective belief. The mapping between the two is not automatic.

Two Meanings of Probability

| Aspect | Frequentist View | Bayesian View |
| --- | --- | --- |
| Core Definition | Probability is an objective property of the world, measured by the long-run relative frequency of an event. | Probability is a subjective degree of belief an agent holds about a claim. It is a property of the observer. |
| What It Measures | The stable outcome of a repeatable physical process, like flipping a coin many times. | An agent’s confidence in a hypothesis, which is updated as new evidence arrives. |
| Key Question | ‘How often does this event happen in the long run?’ | ‘How strongly should I believe this is true, given the available evidence?’ |
| Use in Neuroscience | Used to define the objective statistics of a stimulus, such as the percentage of dots moving coherently in a display. | The dominant language used to model the brain’s internal processes of belief updating and inference. |

Neuroscience experiments often test the brain’s supposed Bayesian mechanisms using stimuli defined by objective frequencies, creating a conceptual tension.

The Brain as a Calculating Machine

The idea that perception amounts to inference predates the current theory by a century and a half. Hermann von Helmholtz, a German physician-physicist, argued that the brain performs ‘unconscious inference’. It infers the likely causes of its sensations from incomplete and noisy signals.

The modern ‘Bayesian Brain’ hypothesis is the idea that the brain represents uncertainty explicitly and updates it with something like Bayes’ rule. In this framing, the brain does not just store a single best guess, it keeps a distribution over possibilities and shifts that distribution as new signals arrive.

A plausible mechanism was proposed by Alexandre Pouget, Wei Ji Ma, and Jeffrey Beck. In their ‘probabilistic population codes’ (PPCs), neural variability is treated as signal. A population of neurons represents a probability distribution. The peak marks the best estimate, while the overall activity level tracks certainty.

In probabilistic population codes, probabilities are represented in the log domain. Because synapses sum inputs, a pool of neurons can add log likelihoods from different cues. In ordinary space, that addition corresponds to multiplying likelihoods, which is the step required by Bayes’ rule. The combination that looks like heavy maths is implemented as a weighted neural sum.
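A minimal sketch of that step, assuming Gaussian likelihoods and a grid of candidate stimulus values; the measurements and noise levels below are invented. Adding the two log-likelihood arrays plays the role of the weighted neural sum, and exponentiating recovers the multiplied likelihoods.

```python
import numpy as np

stimulus_values = np.linspace(-10, 10, 201)  # candidate stimulus values

def log_likelihood(measurement, sigma):
    """Log likelihood of each candidate value under Gaussian noise."""
    return -0.5 * ((stimulus_values - measurement) / sigma) ** 2

# Two cues report the same stimulus with different reliabilities.
log_vision = log_likelihood(measurement=2.0, sigma=1.0)  # reliable cue
log_touch = log_likelihood(measurement=4.0, sigma=2.0)   # noisier cue

# The 'neural' operation: simple addition in the log domain.
log_combined = log_vision + log_touch

# Back in probability space, that sum is a product of likelihoods.
combined = np.exp(log_combined - log_combined.max())
combined /= combined.sum()

print(round(stimulus_values[np.argmax(combined)], 2))  # ~2.4, pulled towards
                                                       # the more reliable cue
```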

The grandest extension of this family is Karl Friston’s free energy principle: in compact terms, a living system minimises the gap between what it expects and what it sees, or else fails to keep itself within the bounds that define it as that system. In cognitive terms, the brain reduces ‘surprise’ by updating its internal model of the world and by acting to sample data that fit its expectations. The principle is deliberately general. That breadth is its attraction and its weak point.
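In symbols, one compact textbook form (notation varies across papers, and this is a simplification of Friston’s full account) writes variational free energy as an upper bound on surprise:

```latex
% F bounds surprise, -ln p(o). Minimising F shrinks the gap between the
% internal model q(s) and the true posterior p(s|o), and keeps sensory
% outcomes o unsurprising.
F \;=\; D_{\mathrm{KL}}\!\left[\, q(s) \,\|\, p(s \mid o) \,\right] \;-\; \ln p(o) \;\;\ge\;\; -\ln p(o)
```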

The Path to a Probabilistic Brain

Step 1: The Intellectual Ancestor

In the 1860s, Hermann von Helmholtz proposes that perception is a process of ‘unconscious inference’, where the brain infers the most likely cause of its noisy sensory data. This is the conceptual origin of the modern theory.

Step 2: The Formal Hypothesis

In the late 20th century, this idea is formalised as the ‘Bayesian Brain’ hypothesis. The central claim is that the brain explicitly represents information as probability distributions and updates them using rules that approximate Bayesian inference.

Step 3: A Plausible Mechanism

To make the theory biologically feasible, researchers including Pouget, Ma, and Beck propose Probabilistic Population Codes (PPCs). This model shows how a group of neurons can collectively represent a probability distribution and perform Bayesian calculations through simple linear summation.

The Cognitive Bias Machine

Many Bayesian models call the brain ‘near-optimal’: given noisy signals, behaviour should match that of an ideal observer, an idealised decision-maker that makes the best possible use of all the available information. Psychology documents the opposite in many tasks.

Two standard examples make the contrast clear.

Conservatism: People often underweight new evidence relative to Bayes’ rule. They update, but not enough. Present a prior, present strong new data, and many subjects move less than the normative calculation predicts. This happens reliably across tasks that ask for probability judgements.
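A worked toy version, in the style of the classic ‘bookbag and poker chips’ studies: the urn mixtures and the draw below are invented, but they show how far the normative posterior moves, which is the benchmark conservative subjects fall short of.

```python
from math import comb

# Urn A is 70% red chips, urn B is 30% red; both start equally likely.
p_red_a, p_red_b = 0.7, 0.3
reds, blues = 8, 4  # observed draw: 8 red, 4 blue

like_a = comb(12, reds) * p_red_a**reds * (1 - p_red_a)**blues
like_b = comb(12, reds) * p_red_b**reds * (1 - p_red_b)**blues

posterior_a = 0.5 * like_a / (0.5 * like_a + 0.5 * like_b)
print(round(posterior_a, 3))  # ~0.967: Bayes says near-certainty, while
                              # conservative subjects report far smaller shifts
```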

Base-rate neglect: When offered a vivid description of a case and a known base rate for the class, people tend to follow the description and ignore the base rate. Again, that is a direct breach of the normative rule. The prior matters in Bayes’ rule. In these tasks, many subjects act as if it does not.
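The taxicab problem is the standard worked case, and its usual numbers make the breach concrete. The code simply runs the normative calculation that typical answers ignore.

```python
# 85% of the city's cabs are Green, 15% Blue. A witness who is correct
# 80% of the time says the cab in an accident was Blue.
prior_blue = 0.15
hit = 0.80          # P(witness says Blue | cab is Blue)
false_alarm = 0.20  # P(witness says Blue | cab is Green)

posterior_blue = (prior_blue * hit) / (
    prior_blue * hit + (1 - prior_blue) * false_alarm
)
print(round(posterior_blue, 2))  # 0.41 -- yet answers near 0.8 are typical,
                                 # echoing the witness and dropping the base rate
```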

These are not random fumbles. They are patterned failures. If the brain were tuned for near-optimal probabilistic inference, we would not expect such stable deviations in so many domains. That mismatch is the first major crack in the ‘probability sense’ as a literal engine.

“These biases are not random noise but predictable patterns of sub-optimal performance, which stand as a direct empirical challenge to the strong optimality claims of many Bayesian models.”

A Disagreement on How the Sums Are Done

Most standard models assume that belief updates are continuous, or at least occur trial by trial. Evidence is supposed to accumulate smoothly towards a new estimate. Even when a task is noisy, the internal variable is assumed to drift and settle without long periods of flatlining.

C. Randy Gallistel, an American psychologist-neuroscientist, and his colleagues report something else. In tasks where the underlying probability changes over time, subjects often hold one estimate steady through many trials, then jump to a new value in a single step. This step-hold pattern points to a different computation. Rather than adding a small increment on every trial, the brain may be watching for a change-point and only then rewriting its estimate. That is not a small tweak to a continuous picture. It is a different mechanism.

The implications are practical. A change-point detector can be frugal and robust in environments where most short-run fluctuations are noise. Continuous integration can smooth noise, but it can also lag behind genuine shifts. Which mechanism dominates may depend on the task structure and on the training history of the subject.
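A toy contrast makes the difference visible. Both estimators below are illustrative caricatures, not Gallistel’s actual model, and the learning rate, window and threshold are arbitrary choices.

```python
import random

# A Bernoulli probability that jumps from 0.25 to 0.75 mid-sequence.
random.seed(1)
outcomes = [random.random() < 0.25 for _ in range(200)] + \
           [random.random() < 0.75 for _ in range(200)]

# 1. Continuous (delta-rule) integration: a small nudge on every trial.
rate, p_cont, trace_cont = 0.05, 0.5, []
for x in outcomes:
    p_cont += rate * (x - p_cont)
    trace_cont.append(p_cont)

# 2. Step-hold: keep the estimate fixed until the recent run of outcomes
#    strays too far from it, then rewrite it in a single jump.
window, threshold = 20, 0.2
p_step, trace_step, recent = 0.5, [], []
for x in outcomes:
    recent = (recent + [x])[-window:]
    freq = sum(recent) / len(recent)
    if len(recent) == window and abs(freq - p_step) > threshold:
        p_step, recent = freq, []  # jump, then start watching afresh
    trace_step.append(p_step)

# The continuous trace drifts through many distinct values; the step-hold
# trace holds flat and changes only a handful of times.
print(len(set(round(v, 3) for v in trace_cont)))
print(len(set(round(v, 3) for v in trace_step)))
```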

The ‘Just-So Story’ Problem

The Bayesian Brain wears two hats. Sometimes it is sold as a literal mechanism: neurons encode priors and likelihoods and compute posteriors. Sometimes it is framed as a convenient description at a high level: the brain behaves as if it were doing Bayesian inference, whether or not any circuit implements those variables in that way.

That flip has costs.

A theory that can be flexed to fit any pattern of results by altering invisible priors, likelihoods, or utilities is hard to falsify. The danger is post-hoc rescue. If an experiment finds a bias, the model adds a different prior. If an update looks too slow, the model adds a cost of change. The fit improves, but the core claim remains untouched. Critics call this a ‘just-so story’ problem.

There is also the biological bill. Exact Bayesian inference scales badly with problem size. Real neurons have non-linear dynamics and noise that is not always well behaved. Approximate schemes can be efficient, and the PPC story shows how linear operations might get you far, but the more complex the model, the more assumptions it needs about noise, connectivity and coding. Those assumptions are testable. Many remain untested.

The ‘Just-So Story’ Problem

  • The Problem: When experimental data does not fit the Bayesian Brain model, the core theory is rarely questioned or rejected.
  • The Tactic: Instead, researchers can adjust the model’s unobservable assumptions, such as the ‘prior belief’, until it explains the unexpected result after the fact.
  • The Consequence: This flexibility makes the hypothesis difficult to falsify. Critics argue it risks becoming a story that can be adapted to fit any behaviour, rather than a predictive scientific claim.

Four Competing Theories on the Record

The conflicting evidence and deep-seated contradictions do not allow for a single conclusion. The record supports at least four distinct and plausible working theories.

Theory 1: The Literal Engine

This is the strongest interpretation. It suggests that the ‘probability sense’ is real and the Bayesian Brain hypothesis is a literal description of a neural mechanism. The brain is a probabilistic inference engine. The best evidence for this view comes from sensory and motor tasks.

In experiments on multisensory integration, for example, people combine cues from vision and touch in a way that is statistically optimal, weighting each sense by its reliability, exactly as a Bayesian model predicts. That near-optimal performance, combined with the plausible neural implementation offered by Probabilistic Population Codes, suggests the brain really is built to compute with probabilities. The main evidence against this view is the mountain of data on cognitive biases, which show systematically sub-optimal reasoning.
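The benchmark itself fits in a few lines. A minimal sketch of inverse-variance weighting, with made-up cue positions and noise levels rather than values from any published study:

```python
def combine(mu_vision, sigma_vision, mu_touch, sigma_touch):
    """Optimal (inverse-variance weighted) fusion of two Gaussian cues."""
    r_v, r_t = 1 / sigma_vision**2, 1 / sigma_touch**2
    mu = (r_v * mu_vision + r_t * mu_touch) / (r_v + r_t)
    sigma = (r_v + r_t) ** -0.5  # fused estimate beats either cue alone
    return mu, sigma

# Vision says the bar is 55 mm wide (sharp); touch says 65 mm (fuzzy).
mu, sigma = combine(55.0, 2.0, 65.0, 4.0)
print(round(mu, 1), round(sigma, 2))  # 57.0 1.79: the estimate leans
                                      # towards the more reliable sense
```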

Theory 2: The Heuristic Approximation

This theory offers a middle path. It suggests the brain does not run perfect Bayesian calculations. Instead, it uses a collection of simpler, computationally cheaper shortcuts, or heuristics, that produce ‘good enough’ answers.

In the stable and well-defined contexts of many lab experiments, the output of these heuristics might look very close to the Bayesian optimum. In more complex or novel situations, they break down and produce the systematic biases we see. This theory’s main strength is that it can account for both optimal-looking behaviour and systematic irrationality. Its main weakness is explaining why the output of simple shortcuts so often matches the precise quantitative predictions of the optimal models; it makes the near-optimality seem like a remarkable coincidence.

Theory 3: The Emergent Property

This theory proposes there is no dedicated, specific ‘probability sense’ at all. What we are observing is an emergent property. It arises from the complex, real-time interaction of multiple general-purpose cognitive systems like attention, working memory, and sensory processing.

Neuroimaging studies show that decision-making under uncertainty activates a broad, distributed network of brain regions involved in many different functions, not a single ‘probability centre’. Just as our ‘number sense’ appears to emerge from interactions in a wider network, so too might our ability to handle uncertainty. The challenge for this view is the evidence for neural specialisation. If there are ‘number neurons’ that are specifically tuned to quantity, it is plausible that similar specialisation exists for representing probabilistic information.

Theory 4: The Descriptive Metaphor

This final theory reframes the question entirely. It argues that the Bayesian Brain is not a falsifiable, mechanistic hypothesis. It is a powerful normative metaphor and a formal language. Its scientific value is not that it is a literally true description of what is happening in our neurons. Its value is that it provides a rational benchmark, a description of what an ideal observer should do, which allows researchers to precisely measure and describe the performance of a biological system.

The very critiques of unfalsifiability that damage the literal interpretation actually support this one. A language does not need to be falsifiable, only useful. The main argument against this view is that the framework has been remarkably successful at predicting behaviour across a huge range of domains, suggesting it has captured a deep, unifying principle of brain function, not just a convenient way of speaking.

The evidence on the record does not resolve the conflict between these four views. It fuels it.

Four Competing Theories on the Record

The conflicting evidence does not point to one conclusion, but instead supports four distinct and plausible working theories.

Theory 1: The Literal Engine

Core Claim

The brain is a literal probabilistic inference machine. Neural circuits are structured to perform computations that approximate optimal Bayesian inference.

Key Evidence

Human performance in tasks like combining sensory cues is statistically near-optimal, exactly as a Bayesian model predicts. The theory is supported by a plausible neural implementation model (Probabilistic Population Codes).

Theory 2: The Heuristic Approximation

Core Claim

The brain uses a collection of simple, computationally cheap shortcuts (heuristics) to produce ‘good enough’ answers. In some lab contexts, this behaviour can look near-optimal.

Key Evidence

This theory’s main strength is its ability to account for the wide range of documented cognitive biases that show systematically sub-optimal reasoning. This is more evolutionarily plausible than a single, complex inference engine.

Theory 3: The Emergent Property

Core Claim

There is no dedicated ‘probability sense’. Instead, probabilistic reasoning is an emergent property arising from the interaction of general-purpose systems like attention, working memory, and sensory processing.

Key Evidence

Neuroimaging shows that decision-making under uncertainty activates a broad, distributed network of brain regions, not a single ‘probability centre’.

Theory 4: The Descriptive Metaphor

Core Claim

The Bayesian Brain is not a falsifiable, mechanistic hypothesis. It is a powerful normative metaphor and a formal language used to describe what an ideal observer should do.

Key Evidence

The critiques that the framework is unfalsifiable and can be adjusted to fit any data support this view. A language does not need to be falsifiable, only useful.

Sources

Sources include: classical philosophical texts such as Aristotle’s Rhetoric and Posterior Analytics; historical works on the origins of probability including the Pascal–Fermat correspondence (1654), Jakob Bernoulli’s Ars Conjectandi (1713), and Thomas Bayes’ posthumous essay (1763); Andrey Kolmogorov’s Foundations of the Theory of Probability (1933); Hermann von Helmholtz’s 19th-century writings on perception as unconscious inference; modern neuroscience and computational modelling research on the Bayesian Brain, including the work of Alexandre Pouget, Wei Ji Ma, Jeffrey Beck, and Karl Friston’s Free Energy Principle; psychological literature on cognitive biases, especially studies of conservatism and base-rate neglect; and experimental research by C. R. Gallistel and colleagues on step-like belief updating. The investigation also draws on secondary analyses and reviews in cognitive science and philosophy of mind addressing the debate over whether the Bayesian Brain is a literal mechanism, a heuristic, an emergent property, or a descriptive metaphor.

What we still do not know

  • Do neural populations encode full probability distributions or only summary statistics such as a mean and a measure of spread? A clear dissociation task is needed with matched means and variances, but different skew or kurtosis.
  • When task statistics change, is belief updating genuinely step-like, or are our measurements too coarse to see short continuous drifts? High-temporal-resolution recordings could settle this.
  • Can we isolate a physical substrate for a ‘prior’ and manipulate it causally without disrupting the processing of incoming evidence? Training plus targeted stimulation would be one route.
  • In subjects who are near-optimal in cue combination but show base-rate neglect, do the same networks light up, or do biases signal a shift to a different control regime?
  • Which parts of the Bayesian Brain framework make predictions risky enough to fail clearly if they are wrong? Until we have more failures, the theory will continue to look too flexible.


