Since one and only one outcome occurs, given a distribution on outcomes, we define the probability of a set of outcomes as the sum of their probabilities.
Suppose $p$ is a distribution on a finite set of outcomes $\Omega $. Given an event $E \subset \Omega $, the probability (or chance) of $E$ under $p$ is the sum of the probabilities of the outcomes in $E$. The frequentist interpretation is clear—the probability of an event is the proportion of times any of its outcomes will occur in the long run.
It is common to define a function $P:
\powerset{\Omega } \to \R $ by
\[
P(A) = \sum_{a \in A} p(a) \quad \text{for all } A \subset
\Omega
\]
It is tempting, and therefore common, to write $P(\omega )$ when $\omega \in \Omega $ and one intends to denote $P(\set{\omega })$, which is just $p(\omega )$. In particular, from $P$ we can recover $p$ by evaluating $P$ on singletons, and from $p$ we can recover $P$ by summing.
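The definition translates directly into code. Here is a minimal Python sketch, assuming a distribution stored as a dictionary; the outcome labels and the helper name \texttt{P} are hypothetical choices for this illustration.
\begin{verbatim}
from fractions import Fraction

# A distribution p on a finite set of outcomes, stored as a dict.
# (Hypothetical outcomes and probabilities, chosen for illustration.)
p = {"a": Fraction(1, 5), "b": Fraction(1, 2), "c": Fraction(3, 10)}

def P(A):
    """Probability of an event A (a set of outcomes) under p."""
    return sum(p[a] for a in A)

print(P({"a", "c"}))   # 1/2
\end{verbatim}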
Rolling a die.
We consider the usual model of rolling a fair
die (see Outcome Probabilities).
So we have $\Omega = \set{1, \dots , 6}$ and
$p: \Omega \to [0,1]$ defined by
\[
p(\omega ) = 1/6 \quad \text{for all } \omega \in \Omega
\]
Consider the event that the roll is even, $E = \set{2, 4, 6}$. Then
\[
\textstyle
P(E) = \sum_{\omega \in E} p(\omega ) = p(2) + p(4) + p(6)
= 1/2.
\]
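This computation is easy to check by brute force; the following Python sketch (the names \texttt{omega}, \texttt{p}, and \texttt{E} are our own) enumerates the outcomes in $E$ and sums their probabilities.
\begin{verbatim}
from fractions import Fraction

omega = range(1, 7)                      # outcomes 1, ..., 6
p = {w: Fraction(1, 6) for w in omega}   # fair die

E = {2, 4, 6}                            # event: the roll is even
print(sum(p[w] for w in E))              # 1/2
\end{verbatim}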
Rolling two dice.
We consider the usual model of rolling two dice at once (see Outcome Probabilities). We take $\Omega = \set{(1,1), (1,2), \dots , (6,5), (6,6)}$. In other words, $\Omega $ is $\set{1,2,3,4,5,6}^2$.
Suppose we model a distribution on outcomes $p:
\Omega \to [0,1]$ by defining $p(\omega ) =
1/36$ for each $\omega \in \Omega $.
We use the set $A = \set{(1,4), (2,3), (3,
2), (4,1)}$ for the event corresponding to the
statement that the sum of the two numbers is 5.
In other words,
\[
A = \Set{(\omega _1, \omega _2) \in \Omega }{ \omega _1 +
\omega _2 = 5}
\]
Then
\[
P(A) = p((1,4)) + p((2,3)) + p((3,2)) + p((4,1)) = 4/36 =
1/9.
\]
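Again one can confirm the computation by enumerating the outcome space; in this Python sketch the names \texttt{omega} and \texttt{A} are illustrative.
\begin{verbatim}
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # all 36 ordered pairs
p = {w: Fraction(1, 36) for w in omega}

A = {w for w in omega if sum(w) == 5}   # the two numbers sum to 5
print(sorted(A))                        # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(sum(p[w] for w in A))             # 1/9
\end{verbatim}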
Flipping a coin three times.
We model flipping a coin three times with the
outcome space $\Omega = \set{0,1}^3$.
We interpret $(\omega _1, \omega _2, \omega _3)
\in \Omega $ so that $\omega _1$ is the outcome
of the first flip—heads is 1 and tails is 0.
Suppose we model each outcome as equally
probable, and so put a distribution $p: \Omega
\to [0,1]$ on $\Omega $ satisfying $p(\omega ) =
1/8$ for every $\omega \in \Omega $.
We want to consider all outcomes in which we see at least two heads. Our model is the event $A \subset \Omega $ defined by
\[
A = \set{(1,1,1), (1,1,0), (1,0,1), (0,1,1)}.
\]
Since $\num{A} = 4$, we get $P(A) = 4/8 = 1/2$.
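As before, a short Python sketch (with illustrative names) enumerates $\set{0,1}^3$ and confirms the count.
\begin{verbatim}
from fractions import Fraction
from itertools import product

omega = list(product((0, 1), repeat=3))   # {0,1}^3, 8 outcomes
p = {w: Fraction(1, 8) for w in omega}

A = {w for w in omega if sum(w) >= 2}     # at least two heads
print(sorted(A, reverse=True))   # [(1,1,1), (1,1,0), (1,0,1), (0,1,1)]
print(sum(p[w] for w in A))      # 1/2
\end{verbatim}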
Flipping a coin $n$ times.
We model flipping a coin $n$ times with a
sample space $\Omega = \{0,1\}^n$.
Here, we agree to interpret $(\omega _1, \dots ,
\omega _n) \in \Omega $ so that $\omega _i$ is 1
if the coin lands heads on the $i$th toss and
$0$ if it lands tails; $i = 1, \dots , n$.
The size of $\Omega $ is $2^n$, since
$\num{\{0,1\}} = 2$.
Suppose we choose a distribution $p: \Omega
\to [0,1]$ so that
\[
p(\omega ) = \frac{1}{2^n} \quad \text{for all } \omega \in \Omega .
\]
For $k = 0, \dots , n$, let $H_k$ be the event that exactly $k$ of the $n$ flips land heads,
\[
H_k = \Set{\omega \in \Omega }{\num{\Set{i}{\omega _i = 1}} = k}.
\]
There are ${n \choose k}$ ways to choose which $k$ of the $n$ flips land heads, so $\num{H_k} = {n \choose k}$ and
\[
P(H_k) = \frac{\num{H_k}}{2^n} = \frac{{n \choose k}}{2^n}.
\]
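To make the formula concrete, the following Python sketch compares direct enumeration against ${n \choose k}/2^n$ for a small $n$ (here $n = 5$, an arbitrary choice).
\begin{verbatim}
from itertools import product
from math import comb

n = 5
omega = list(product((0, 1), repeat=n))   # {0,1}^n

# P(H_k) by enumeration versus the formula C(n, k) / 2^n.
for k in range(n + 1):
    H_k = [w for w in omega if sum(w) == k]
    assert len(H_k) == comb(n, k)
    print(k, len(H_k) / 2**n)
\end{verbatim}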
The properties of $p$ ensure that $P$ satisfies (1) $P(A) \geq 0$ for all $A \subset \Omega $, (2) $P(\Omega ) = 1$, and (3) $P(A \cup B) = P(A) + P(B)$ whenever $A \cap B = \varnothing $. The last statement (3) follows from the more general identity, known as the inclusion-exclusion formula,
\[
P(A \cup B) = P(A) + P(B) - P(A \cap B),
\]
which holds for all $A, B \subset \Omega $; when $A$ and $B$ are disjoint, $P(A \cap B) = P(\varnothing ) = 0$.
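The identity is easy to spot-check numerically. This Python sketch verifies it on the two-dice space for one (arbitrary) choice of events $A$ and $B$.
\begin{verbatim}
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), 36)        # uniform distribution

A = {w for w in omega if w[0] == 1}       # first die shows 1
B = {w for w in omega if sum(w) == 5}     # the dice sum to 5

assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))                           # 1/4
\end{verbatim}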
Do all such $P$ satisfying (1)-(3) have a corresponding underlying probability distribution? The answer is easily seen to be yes. Suppose $f: \powerset{\Omega } \to \R $ satisfies (1)-(3). Define $q: \Omega \to \R $ by $q(\omega ) = f(\set{\omega })$. Then $q(\omega ) \geq 0$ by (1), and applying (3) repeatedly gives $\sum_{\omega \in \Omega } q(\omega ) = f(\Omega )$, which is $1$ by (2); so $q$ is a probability distribution. For this reason we call any function satisfying (1)-(3) an event probability function (or a (finite) probability measure).
Disjoint events.
Two events $A$ and $B$ are
disjoint or
mutually exclusive if $A
\cap B = \varnothing$.
Likewise, events $A_1, \dots , A_n$ are disjoint or mutually exclusive if $A_i \cap A_j = \varnothing $ for all $i \neq j$, $i,j \in \set{1, \dots , n}$.
A direct consequence of (3) above is that for disjoint events $A_1, \dots , A_n$,
\[
\textstyle
P(\cup_{i =1 }^{n} A_i) = \sum_{i = 1}^{n} P(A_i).
\]
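For instance, the events that the two dice sum to 5, to 6, or to 7 are disjoint, and (again with illustrative names):
\begin{verbatim}
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), 36)

# Disjoint events: the dice sum to 5, 6, or 7.
A = [{w for w in omega if sum(w) == s} for s in (5, 6, 7)]

union = set().union(*A)
assert P(union) == sum(P(Ai) for Ai in A)
print(P(union))                           # 5/12
\end{verbatim}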
Probability by cases.
Suppose $A_1, \dots , A_n$ partition $\Omega $.
Then for any $B \subset \Omega $,
\[
\textstyle
P(B) = \sum_{i = 1}^{n} P(A_i \cap B).
\]
In particular, for any $A \subset \Omega $, the events $A$ and $\Omega - A$ partition $\Omega $, so
\[
P(B) = P(B \cap A) + P(B \cap (\Omega - A)).
\]
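Here is a Python sketch of probability by cases on the two-dice space, partitioning on the value of the first die (an illustrative choice).
\begin{verbatim}
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), 36)

# Partition omega by the value of the first die.
A = [{w for w in omega if w[0] == i} for i in range(1, 7)]

B = {w for w in omega if sum(w) == 7}     # the dice sum to 7
assert P(B) == sum(P(Ai & B) for Ai in A)
print(P(B))                               # 1/6
\end{verbatim}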
Monotonicity. If $A \subseteq B$, then $P(A) \leq P(B)$. To see this, split $B$ into the disjoint events $A$ and $B - A$; then $P(B) = P(A) + P(B - A) \geq P(A)$ by (3) and (1).
Subadditivity. For $A, B \subset \Omega $, $P(A \cup B) \leq P(A) + P(B)$. This follows from the inclusion-exclusion formula above, since $P(A \cap B) \geq 0$ by (1). The inequality is sometimes referred to as a union bound, in reference to bounding the quantity $P(A \cup B)$.
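Both properties are easy to spot-check; a final Python sketch on the two-dice space, with illustrative events $A \subseteq B$:
\begin{verbatim}
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), 36)

A = {w for w in omega if sum(w) == 5}         # sum is 5
B = {w for w in omega if sum(w) in (5, 7)}    # sum is 5 or 7

assert A <= B and P(A) <= P(B)                # monotonicity
assert P(A | B) <= P(A) + P(B)                # union bound
\end{verbatim}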