A set of outcomes may be finite or infinite. For now, we consider finite sample spaces. To reason about uncertain outcomes, we assign a credibility to each outcome according to our intuition of proportion.
Suppose $\Omega $ is a finite set.
A function $p: \Omega \to \R $ is a probability
distribution on $\Omega $ if it is
nonnegative (i.e., $p(\omega ) \geq 0$ for every
$\omega \in \Omega $) and
\[
\textstyle
\sum_{\omega \in \Omega } p(\omega ) = 1
\]
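For instance, here is a small distribution on a hypothetical three-outcome sample space (an illustration of the definition, not a model used elsewhere in this section):
\[
\Omega = \set{a, b, c}, \qquad p(a) = \tfrac{1}{2}, \quad p(b) = \tfrac{1}{3}, \quad p(c) = \tfrac{1}{6}
\]
Each value is nonnegative and $\tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{6} = 1$, so $p$ is a probability distribution on $\Omega $.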
There are two usual meanings of the word “probability”. The first is the intuitive interpretation as frequency: the fraction of times an outcome $\omega $ would occur if we could repeat the scenario producing the outcomes many times. This is the so-called frequentist viewpoint.
The trouble is that some scenarios are not “repeatable” (e.g., whether or not it will rain tomorrow). Thus, it is sometimes natural to think of probabilities as beliefs, or degrees of belief, which are updated according to particular rules. This is the so-called Bayesian viewpoint.
This second interpretation matches the English etymology: the word probability has its roots in the English word probable, which has the Middle English sense “worthy of belief”. The probability of an outcome models how worthy of belief it is, relative to other outcomes. In the case of flipping a coin or rolling a die, we may assert that all outcomes are equally worthy of belief.
If a first outcome has a larger probability than a second outcome, we call the first more probable (or more likely) than the second. Similarly, we call the second outcome less probable (or less likely) than the first outcome.
Probabilities for flipping a coin. Suppose we model flipping a coin, as before, with the sample space $\set{0,1}$. We may model heads and tails as equally worthy of belief. Thus we want two nonnegative numbers $p(0)$ and $p(1)$ with $p(0) = p(1)$ and $p(0) + p(1) = 1$. Consequently, we define $p(0) = p(1) = 1/2 $. We often refer to this particular model as a fair coin. Neither heads nor tails is more or less probable.
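Fairness is a modeling choice, not a consequence of the definition. For instance (a hypothetical variant of the model above), a biased coin with
\[
p(1) = \tfrac{2}{3}, \qquad p(0) = \tfrac{1}{3}
\]
is also a probability distribution on $\set{0,1}$: both values are nonnegative and they sum to one.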
Probabilities for rolling a die.
Suppose we model rolling a die, as before, with the sample space $\set{1, 2, 3, 4, 5,
6}$.
We may model each side of the die as equally
likely to face up.
Thus we want nonnegative numbers $p(1), p(2), p(3), p(4),
p(5), p(6)$, summing to one, with
\[
p(1) = p(2) = p(3) = p(4) = p(5) = p(6)
\]
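Since the six values are equal and must sum to one, each is forced to be $1/6$:
\[
\textstyle
1 = \sum_{\omega = 1}^{6} p(\omega ) = 6\, p(1), \qquad \text{so} \qquad p(\omega ) = \tfrac{1}{6} \text{ for every } \omega \in \Omega
\]
We often refer to this particular model as a fair die.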
Other terminology for probability distribution includes distribution, probability mass function, pmf, proportion distribution, and probabilities. Note that, for every $\omega \in \Omega $,
\[ \textstyle p(\omega ) \leq \sum_{t \in \Omega } p(t) = 1 \]
This holds because $p(t) \geq 0$ for every $t \in \Omega $. Consequently, the range of $p$ is contained in $[0,1]$. For this reason, we often introduce a distribution on the finite sample space $\Omega $ with the notation $p: \Omega \to [0,1]$, to remind ourselves that $\range(p) \subset [0,1]$.
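As a quick computational check of the definition (a minimal sketch, assuming Python with a floating-point tolerance for the sum; the function name is hypothetical, not from this text):

```python
import math

def is_distribution(p):
    """Check that p maps each outcome of a finite sample space
    to a nonnegative value, and that the values sum to one."""
    nonnegative = all(value >= 0 for value in p.values())
    sums_to_one = math.isclose(sum(p.values()), 1.0)
    return nonnegative and sums_to_one

# The fair coin and fair die models from this section:
fair_coin = {0: 1/2, 1: 1/2}
fair_die = {omega: 1/6 for omega in range(1, 7)}

assert is_distribution(fair_coin)
assert is_distribution(fair_die)
assert not is_distribution({0: 0.7, 1: 0.7})  # sums to 1.4, not a distribution
```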