A set of outcomes may be finite or infinite.
For now, we consider finite sample spaces.
To talk about the uncertain outcomes, we assign
credibility to each outcome according to our
intuition of proportion.^{1}

Suppose $\Omega $ is a finite set.
A function $p: \Omega \to \R $ is a probability
distribution *on* $\Omega $ if it is
nonnegative (i.e., $p(\omega ) \geq 0$ for every
$\omega \in \Omega $) and

\[ \textstyle \sum_{\omega \in \Omega } p(\omega ) = 1 \]
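
The two conditions in this definition can be checked mechanically. Here is a minimal sketch in Python, assuming we represent a function on a finite $\Omega $ as a dictionary from outcomes to numbers. The name `is_distribution` and the example values are hypothetical and chosen only for illustration. Exact rationals are used so that the sum is not affected by floating-point rounding.

```python
from fractions import Fraction

def is_distribution(p):
    """Return True if p (a dict: outcome -> number) is nonnegative and sums to 1."""
    nonnegative = all(value >= 0 for value in p.values())
    return nonnegative and sum(p.values()) == 1

# Illustrative sample space {"a", "b", "c"} with exact rational values.
example = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

print(is_distribution(example))                                      # True
print(is_distribution({"a": Fraction(1, 2), "b": Fraction(1, 3)}))   # False: sums to 5/6
print(is_distribution({"a": Fraction(3, 2), "b": Fraction(-1, 2)}))  # False: negative value
```

The last example sums to $1$ but fails the nonnegativity condition, so both conditions are needed.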

We call $p(\omega )$ the probability of the outcome $\omega $.

There are two usual meanings of the word “probability”. The first is its intuitive interpretation as frequency: the fraction of times that an outcome $\omega $ occurs if we are able to repeat the scenario producing the outcomes many times. This is the so-called frequentist viewpoint.
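
The frequency interpretation can be illustrated by simulation. The sketch below, in Python, repeats a two-outcome scenario many times and reports the fraction of trials in which each outcome occurred. The function name `empirical_frequencies`, the weights, the seed, and the number of trials are all assumptions made for this illustration.

```python
import random
from collections import Counter

def empirical_frequencies(outcomes, weights, trials, seed=0):
    """Fraction of trials in which each outcome occurred."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = rng.choices(outcomes, weights=weights, k=trials)
    counts = Counter(draws)
    return {w: counts[w] / trials for w in outcomes}

# Two equally weighted outcomes, repeated many times.
freqs = empirical_frequencies([0, 1], [0.5, 0.5], trials=100_000)
print(freqs)
```

With many trials, we expect each fraction to be close to the weight assigned to that outcome; the fractions themselves are nonnegative and sum to $1$.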

The trouble is that some scenarios are not
“repeatable” (e.g., whether or not it will rain
*tomorrow*).
Thus, it is sometimes natural to think of
probabilities as beliefs
or degrees of belief
which are updated according to particular rules.^{2}
This is the so-called Bayesian
viewpoint.

This second interpretation matches the English etymology: the word probability has its roots in the English word probable, which has the Middle English sense “worthy of belief”. The probability of an outcome models how worthy of belief it is, relative to other outcomes. In the case of flipping a coin, or rolling a die, we may assert that all outcomes are equally worthy of belief.

If a first outcome has a larger probability than a second outcome, we call the first more probable (or more likely) than the second. Similarly, we call the second outcome less probable (or less likely) than the first outcome.

*Probabilities for flipping a coin.*
Suppose we model flipping a coin, as before, with the sample space $\set{0,1}$.
We may model both heads and tails as equally
worthy of belief.
Thus we would like to pick two nonnegative
numbers $p(0)$ and $p(1)$ with $p(0) = p(1)$
and $p(0) + p(1) = 1$.
Consequently, we define $p(0) = p(1) = 1/2 $.
We often refer to this particular model as a
fair coin.
Neither heads nor tails is *more* or
*less* probable.
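
A model in which all outcomes are equally worthy of belief can be built the same way for any finite sample space: give each of the $n$ outcomes the value $1/n$. Here is a minimal sketch in Python. The helper name `uniform_distribution` is hypothetical, and distributions are represented as dictionaries for illustration.

```python
from fractions import Fraction

def uniform_distribution(sample_space):
    """Assign the value 1/n to each of the n outcomes of a finite sample space."""
    outcomes = list(sample_space)
    if not outcomes:
        raise ValueError("the sample space must be nonempty")
    return {w: Fraction(1, len(outcomes)) for w in outcomes}

# The fair coin on the sample space {0, 1}.
fair_coin = uniform_distribution([0, 1])
print(fair_coin[0], fair_coin[1])  # 1/2 1/2
```

The values are nonnegative and sum to $n \cdot (1/n) = 1$, so the result is a distribution.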

*Probabilities for rolling a die.*
Suppose we model rolling a die, as before, with the sample space $\set{1, 2, 3, 4, 5,
6}$.
We may model each side of the die as equally
likely to face up.
Thus we want nonnegative numbers $p(1), p(2), p(3), p(4),
p(5), p(6)$ that sum to $1$ and satisfy

\[ p(1) = p(2) = p(3) = p(4) = p(5) = p(6) \]

Consequently, we choose $p(\omega ) = 1/6$ for each $\omega \in \Omega $. A Bayesian interpretation is that, prior to the roll, each outcome is equally worthy of belief.

Other terminology for probability distribution
includes distribution,
probability mass function,
pmf,
proportion distribution,
and probabilities.^{3}

Suppose $p: \Omega \to \R $ is a distribution.
Then $p(\omega ) \leq 1$ for all $\omega \in
\Omega $.

Let $\omega \in \Omega $.
We claim

\[ \textstyle p(\omega ) \leq \sum_{t \in \Omega } p(t) = 1 \]

This holds because $p(t) \geq 0$ for every $t \in \Omega $, so dropping the terms with $t \neq \omega $ from the sum cannot increase it.

Consequently, the range of $p$ is contained in $[0,1]$. For this reason, we often introduce a distribution on the finite sample space $\Omega $ with the notation $p: \Omega \to [0,1]$, to remind ourselves that $\range(p) \subset [0,1]$.
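
The argument can be traced on a small example. The Python sketch below splits the total sum into $p(\omega )$ and the sum over all other outcomes, then checks that the second part is nonnegative, which forces $p(\omega ) \leq 1$. The name `mass_elsewhere` and the example values are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def mass_elsewhere(p, omega):
    """Sum of p(t) over all outcomes t other than omega."""
    return sum(p[t] for t in p if t != omega)

# Illustrative distribution on the sample space {"a", "b", "c"}.
p = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

for omega in p:
    rest = mass_elsewhere(p, omega)
    assert rest >= 0               # each p(t) is nonnegative
    assert p[omega] + rest == 1    # the full sum is 1
    assert p[omega] <= 1           # hence p(omega) = 1 - rest <= 1
```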

- Future editions may drop the dependence on real
numbers, and use intuition of repeated trials to
introduce *rational* probability distributions.
- Future editions may elaborate on the justification for these rules, according to Keynes and Jaynes.
- Many authors reserve the term *probability mass function* for the case in which $\Omega \subset \R $.