Needs: Uncertain Events, Outcome Probabilities, Size of Direct Products.
Needed by: Birthday Probabilities, Conditional Event Probabilities, Generalized Inclusion-Exclusion Formula, Outcome Variables, Probability Measures.

Event Probabilities

Why

Since one and only one outcome occurs, a distribution on outcomes gives a natural way to assign a probability to a set of outcomes: we define the probability of the set as the sum of the probabilities of the outcomes it contains.

Definition

Suppose $p$ is a distribution on a finite set of outcomes $\Omega $. Given an event $E \subset \Omega $, the probability (or chance) of $E$ under $p$ is the sum of the probabilities of the outcomes in $E$. The frequentist interpretation is clear—the probability of an event is the proportion of times any of its outcomes will occur in the long run.

Notation

It is common to define a function $P: \powerset{\Omega } \to \R $ by

\[ P(A) = \sum_{a \in A} p(a) \quad \text{for all } A \subset \Omega \]

We call this function $P$ the event probability function (or the probability measure) associated with $p$. Since it depends on the sample space $\Omega $ and the distribution $p$, we occasionally denote this dependence by $P_{\Omega , p}$ or $P_p$.

It is tempting, and therefore common, to write $P(\omega )$ when $\omega \in \Omega $ and one intends to denote $P(\set{\omega })$, which is just $p(\omega )$. In particular, from $P$ we can compute $p$, and vice versa.
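As a minimal computational sketch (not part of the sheet), we can represent a distribution as a Python dict and compute event probabilities by summation; the names `event_probability` and `distribution_from_measure` are our own illustrative choices.

```python
# A distribution on a finite sample space, as a dict mapping
# each outcome to its probability.
def event_probability(p, event):
    """P(E): the sum of the probabilities of the outcomes in E."""
    return sum(p[omega] for omega in event)

def distribution_from_measure(P, outcomes):
    """Recover p from P by evaluating P on singletons: p(omega) = P({omega})."""
    return {omega: P({omega}) for omega in outcomes}
```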

Examples

Rolling a die. We consider the usual model of rolling a fair die (see Outcome Probabilities). So we have $\Omega = \set{1, \dots , 6}$ and $p: \Omega \to [0,1]$ defined by

\[ p(\omega ) = 1/6 \quad \text{for all } \omega \in \Omega \]

Given the model, the probability of the event $E = \set{2, 4, 6}$ is

\[ \textstyle P(E) = \sum_{\omega \in E} p(\omega ) = p(2) + p(4) + p(6) = 1/2. \]
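The same computation as a short, self-contained sketch (the variable names are ours):

```python
# Fair die: Omega = {1, ..., 6} with p(omega) = 1/6.
p = {omega: 1 / 6 for omega in range(1, 7)}

E = {2, 4, 6}  # the event "the roll is even"
print(sum(p[omega] for omega in E))  # 0.5
```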

Rolling two dice. We consider the usual model of rolling two dice at once (see Outcome Probabilities). We take $\Omega = \set{(1,1), (1,2), \dots , (6,5), (6,6)}$; in other words, $\Omega $ is $\set{1,2,3,4,5,6}^2$. Suppose we model a distribution on outcomes $p: \Omega \to [0,1]$ by defining $p(\omega ) = 1/36$ for each $\omega \in \Omega $. We use the set $A = \set{(1,4), (2,3), (3, 2), (4,1)}$ for the event corresponding to the statement that the sum of the two numbers is 5. In other words,

\[ A = \Set{(\omega _1, \omega _2) \in \Omega }{ \omega _1 + \omega _2 = 5} \]

The probability of $A$ is

\[ P(A) = p((1,4)) + p((2,3)) + p((3,2)) + p((4,1)) = 4/36 = 1/9. \]

Suppose we modify the statement so that $B = \Set{(\omega _1,\omega _2) \in \Omega }{\omega _1 + \omega _2 = 12}$. We have $P(B) = 1/36$. So we have modeled the event that the pips on the two dice sum to 12 as less probable than the event that they sum to 5.
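A small sketch of this example by enumeration (again, the names are our own):

```python
from itertools import product

# Two fair dice: Omega = {1, ..., 6}^2 with the uniform distribution.
Omega = list(product(range(1, 7), repeat=2))
p = {omega: 1 / 36 for omega in Omega}

A = {omega for omega in Omega if sum(omega) == 5}   # pips sum to 5
B = {omega for omega in Omega if sum(omega) == 12}  # pips sum to 12
print(sum(p[w] for w in A))  # 0.111... = 1/9
print(sum(p[w] for w in B))  # 0.0277... = 1/36
```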

Flipping a coin three times. We model flipping a coin three times with the outcome space $\Omega = \set{0,1}^3$. We interpret $(\omega _1, \omega _2, \omega _3) \in \Omega $ so that $\omega _1$ is the outcome of the first flip—heads is 1 and tails is 0. Suppose we model each outcome as equally probable, and so put a distribution $p: \Omega \to [0,1]$ on $\Omega $ satisfying $p(\omega ) = 1/8$ for every $\omega \in \Omega $. We want to consider all outcomes in which we see at least two heads. Our model is the event $A \subset \Omega $ defined by

\[ A = \set{(1,1,1), (1,1,0), (1,0,1), (0,1,1)} \]

Under our chosen distribution, $P(A) = 1/2$.
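A self-contained sketch of the same computation (names ours):

```python
from itertools import product

# Three flips: Omega = {0, 1}^3 with the uniform distribution; 1 is heads.
Omega = list(product([0, 1], repeat=3))
p = {omega: 1 / 8 for omega in Omega}

A = {omega for omega in Omega if sum(omega) >= 2}  # at least two heads
print(sum(p[w] for w in A))  # 0.5
```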

Flipping a coin $n$ times. We model flipping a coin $n$ times with a sample space $\Omega = \{0,1\}^n$. Here, we agree to interpret $(\omega _1, \dots , \omega _n) \in \Omega $ so that $\omega _i$ is 1 if the coin lands heads on the $i$th toss and $0$ if it lands tails; $i = 1, \dots , n$. The size of $\Omega $ is $2^n$, since $\num{\{0,1\}} = 2$. Suppose we choose a distribution $p: \Omega \to [0,1]$ so that

\[ p(\omega ) = \frac{1}{2^n} \quad \text{for all } \omega \in \Omega \]

Now consider the event $H_k$ defined by

\[ H_k = \Set{\omega \in \Omega }{\num{\Set{i}{\omega _i = 1}} = k} \]

so that it contains all outcomes having a total of $k$ heads.
Then

\[ P(H_k) = \frac{\num{H_k}}{2^n} = \frac{{n \choose k}}{2^n} \]
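A quick numerical check of this formula by enumeration, for a small $n$ (a sketch; the names are ours):

```python
from itertools import product
from math import comb

# Check P(H_k) = C(n, k) / 2^n by counting outcomes with exactly k heads.
n = 5
Omega = list(product([0, 1], repeat=n))

for k in range(n + 1):
    H_k = [omega for omega in Omega if sum(omega) == k]
    assert len(H_k) == comb(n, k)
    print(k, len(H_k) / 2**n)
```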

Properties of event probabilities

The properties of $p$ ensure that $P$ satisfies

  1. $P(A) \geq 0$ for all $A \subset \Omega $;
  2. $P(\Omega ) = 1$ (and $P(\varnothing) = 0$);
  3. $P(A) + P(B)$ for all $A, B \subset \Omega $ and $A \cap B = \varnothing$.
The last statement (3) follows from the more general identity, known as the inclusion-exclusion formula,

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

for $A, B \subset \Omega $, by using $P(\varnothing) = 0$ from (2) above.
These three conditions are sometimes called the axioms of probability for finite sets.

Do all such $P$ satisfying (1)-(3) have a corresponding underlying probability distribution? The answer is easily seen to be yes. Suppose $f: \powerset{\Omega } \to \R $ satisfies (1)-(3). Define $q: \Omega \to \R $ by $q(\omega ) = f(\set{\omega })$. Then $q$ is nonnegative by (1), and since the singletons $\set{\omega }$ are disjoint with union $\Omega $, (2) and (3) give $\sum_{\omega \in \Omega } q(\omega ) = f(\Omega ) = 1$; so $q$ is a probability distribution. For this reason we call any function satisfying (1)-(3) an event probability function (or a (finite) probability measure).
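A tiny sketch of this recovery, where for illustration we build $f$ from a known distribution (all names are ours):

```python
# Build an event probability function f from a known distribution p,
# then recover the distribution via singletons: q(omega) = f({omega}).
Omega = [1, 2, 3]
p = {1: 0.5, 2: 0.3, 3: 0.2}

def f(A):
    return sum(p[omega] for omega in A)

q = {omega: f({omega}) for omega in Omega}
assert q == p
```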

Other basic consequences

Disjoint events. Two events $A$ and $B$ are disjoint or mutually exclusive if $A \cap B = \varnothing$. Likewise, the events in a list $A_1, \dots , A_n$ are disjoint or mutually exclusive if $A_i \cap A_j = \varnothing$ for all $i \neq j$, $i,j \in \set{1, \dots , n}$. For disjoint $A_1, \dots , A_n$, a direct consequence of (3) above, applied inductively, is

\[ \textstyle P(\cup_{i =1 }^{n} A_i) = \sum_{i = 1}^{n} P(A_i) \]
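A numerical check of this finite additivity on the two-dice space, as a sketch (names ours):

```python
from itertools import product

# Two fair dice; pairwise-disjoint events are additive under P.
Omega = set(product(range(1, 7), repeat=2))

def P(A):
    return len(A) / 36  # uniform measure

events = [{w for w in Omega if sum(w) == s} for s in (2, 7, 12)]  # disjoint
union = set().union(*events)
assert abs(P(union) - sum(P(A) for A in events)) < 1e-12
```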

Probability by cases. Suppose $A_1, \dots , A_n$ partition $\Omega $. Then for any $B \subset \Omega $,

\[ \textstyle P(B) = \sum_{i = 1}^{n} P(A_i \cap B). \]

Some authors call this the law of total probability. It is easy to see using the distributive laws of set algebra (see Set Unions and Intersections): $B = B \cap \Omega = \cup_{i=1}^{n} (B \cap A_i)$, and the events $B \cap A_i$ are disjoint, so (3) applies. A simple consequence is that for any $A$, $B$

\[ P(B) = P(B \cap A) + P(B \cap (\Omega - A)) \]

since $A, \Omega - A$ partition $\Omega $.
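A sketch checking the law of total probability on the two-dice space, partitioning by the first die (names ours):

```python
from itertools import product

# Partition Omega by the value of the first die and decompose P(B).
Omega = set(product(range(1, 7), repeat=2))

def P(A):
    return len(A) / 36  # uniform measure

B = {w for w in Omega if sum(w) == 7}
parts = [{w for w in Omega if w[0] == i} for i in range(1, 7)]  # a partition
assert abs(P(B) - sum(P(A & B) for A in parts)) < 1e-12
```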

Monotonicity. If $A \subseteq B$, then $P(A) \leq P(B)$. This is easy to see by splitting $B$ into the disjoint events $A$ and $B - A$, and applying (1) and (3).

Subadditivity. For $A, B \subset \Omega $, $P(A \cup B) \leq P(A) + P(B)$. This is easy to see from the inclusion-exclusion formula above, since $P(A \cap B) \geq 0$ by (1). This is sometimes referred to as a union bound, in reference to bounding the quantity $P(A \cup B)$.
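A final sketch illustrating the union bound on the two-dice space, with two overlapping events (names ours):

```python
from itertools import product

# Union bound: P(A | B) <= P(A) + P(B); strict when A and B overlap.
Omega = set(product(range(1, 7), repeat=2))

def P(A):
    return len(A) / 36  # uniform measure

A = {w for w in Omega if w[0] == 1}         # first die shows 1
B = {w for w in Omega if w[0] + w[1] <= 4}  # pips sum to at most 4
assert P(A | B) <= P(A) + P(B)
print(P(A | B), P(A) + P(B))  # 0.25 vs 0.333...
```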

Copyright © 2023 The Bourbaki Authors — All rights reserved — Version 13a6779cc