We want to talk about how knowledge of one aspect of an outcome can give us knowledge about another aspect.
Suppose $\Omega $ is a finite set of outcomes
with event probability function $P$.
Two uncertain events $A$ and $B$ with $P(A),
P(B) > 0$ are
dependent under $P$ if
\[
P(A \mid B) \neq P(A) \quad \text{ or } \quad P(B \mid
A) \neq P(B)
\]
or, equivalently, if
\[
\frac{P(A \cap B)}{P(B)} \neq P(A) \quad \text{ or } \quad
\frac{P(A \cap B)}{P(A)} \neq P(B)
\]
Two events $A$ and $B$ are
independent under $P$ if
\[
P(A \cap B) = P(A)P(B).
\]
As we have seen, in the case that $P(B) \neq 0$, $P(A \cap B) = P(A)P(B)$ is equivalent to $P(A \mid B) = P(A)$. Roughly speaking, we interpret this second expression as encoding the fact that the occurrence of event $B$ does not change the probability (intuitively, the “credibility”) of the event $A$.
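When $\Omega $ is small, this definition can also be checked directly by enumeration. Here is a minimal Python sketch of that idea; the helper names \texttt{prob} and \texttt{independent} are ours, introduced only for illustration.
\begin{verbatim}
# A direct transcription of the definition: an event is a set of
# outcomes, p maps each outcome to its probability, and A and B are
# independent exactly when P(A and B) = P(A)P(B).

def prob(event, p):
    """P(E): total probability of the outcomes in the event E."""
    return sum(p[w] for w in event)

def independent(A, B, p, tol=1e-12):
    """True when P(A and B) = P(A)P(B), up to floating-point error."""
    return abs(prob(A & B, p) - prob(A, p) * prob(B, p)) < tol
\end{verbatim}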
Two coin tosses.
As usual, model flipping a coin twice with the
sample space $\Omega = \set{0,1}^2$.
Define $A = \set{(1,0), (1, 1)}$, the event
that the first toss is heads, and $B =
\set{(0,1), (1,1)}$, the event that the second
toss turns up heads.
Then $A \cap B$ is $\set{(1,1)}$, the event
both tosses turn up heads.
Suppose we put a distribution on $\Omega $ as
usual with $p(\omega ) = 1/4$ for all $\omega
\in \Omega $.
Then $P(A) = 1/2$, $P(B) = 1/2$, and $P(A
\cap B) = 1/4$.
Thus
\[
P(A \cap B) = P(A)P(B),
\]
and the events $A$ and $B$ are independent under $P$.
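As a quick sanity check (a sketch, not part of the formal development), a brute-force enumeration in Python reproduces these numbers; the variable names simply mirror the notation above.
\begin{verbatim}
from itertools import product

Omega = set(product([0, 1], repeat=2))   # {0,1}^2
p = {w: 1/4 for w in Omega}
A = {(1, 0), (1, 1)}                     # first toss is heads
B = {(0, 1), (1, 1)}                     # second toss is heads
P = lambda E: sum(p[w] for w in E)

print(P(A), P(B), P(A & B))              # 0.5 0.5 0.25
print(P(A & B) == P(A) * P(B))           # True
\end{verbatim}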
Three tosses. As usual, model flipping a coin three times with the sample space $\Omega = \set{0,1}^3$ and define $p: \Omega \to [0,1]$ by $p(\omega ) = 1/8$ for all $\omega \in \Omega $. Let $A$ be the event $\set{(1, 1, 0), (1,1,1)}$, the event that the first two tosses turn up heads, and $B = \set{(1, 0, 0), (0, 0, 0)}$, the event that the last two tosses turn up tails. Then $P(A) = P(B) = 2/8 = 1/4$, but $A \cap B = \varnothing$. So $P(A)P(B) = 1/16 \neq 0 = P(A \cap B)$. These are dependent events under the model.
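The same sort of enumeration confirms the dependence; again, this is only a check of the arithmetic above.
\begin{verbatim}
from itertools import product

Omega = set(product([0, 1], repeat=3))   # {0,1}^3
p = {w: 1/8 for w in Omega}
A = {(1, 1, 0), (1, 1, 1)}               # first two tosses heads
B = {(1, 0, 0), (0, 0, 0)}               # last two tosses tails
P = lambda E: sum(p[w] for w in E)

print(P(A) * P(B))                       # 0.0625, i.e. 1/16
print(P(A & B))                          # 0: the intersection is empty
\end{verbatim}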
Rolling two dice. As usual, model rolling two dice with the sample space $\Omega = \Set{(\omega _1, \omega _2)}{ \omega _i \in \set{1, \dots , 6}}$. Define a distribution $p: \Omega \to \R $ by $p(\omega ) = 1/36$. Consider the two events $A = \Set{\omega \in \Omega }{ \omega _1 + \omega _2 > 5}$, “the sum is greater than 5”, and $B = \Set{\omega \in \Omega }{\omega _1 > 3}$, “the number of pips on the first die is greater than 3”. Then $P(A) = 26/36$. Also, since $P(B) = 18/36$ and $P(A \cap B) = 17/36$, we have $P(A \mid B) = 17/18 \neq P(A)$. So these events are dependent. Roughly speaking, knowing $B$ tells us something about $A$; in this case, it “makes $A$ more probable.” In the language used to describe the events, knowledge that the number of pips on the first die is greater than 3 makes it more probable that the sum of the pips on the two dice is greater than 5.
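Once more, enumerating all 36 outcomes in Python (as a check of the counting, not a proof) reproduces the probabilities quoted above.
\begin{verbatim}
from itertools import product

Omega = set(product(range(1, 7), repeat=2))
p = {w: 1/36 for w in Omega}
A = {w for w in Omega if w[0] + w[1] > 5}   # the sum is greater than 5
B = {w for w in Omega if w[0] > 3}          # first die shows more than 3
P = lambda E: sum(p[w] for w in E)

print(P(A))                                 # 26/36, about 0.722
print(P(A & B) / P(B))                      # P(A | B) = 17/18, about 0.944
\end{verbatim}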
Suppose $P$ is a probability measure on a finite sample space $\Omega $. It happens that the independence of $A$ and $B$ under $P$ is sufficient for the independence of $A^c$ and $B$, of $A^c$ and $B^c$, and of $A$ and $B^c$. Here $A^c$ and $B^c$ denote the relative complements of $A$ and $B$ in $\Omega $, respectively.
There are a few ways to see these; here is one:
\[
\begin{aligned}
P(A^c \cap B)
&= P(B) - P(A \cap B)\\
&= P(B) - P(A)P(B)\\
&= (1- P(A))P(B) = P(A^c)P(B).
\end{aligned}
\]
Here are alternative routes, using the more
explicit notation of relative complements.
For the independence of $\relcomplement{A}{\Omega }$ and $B$ (assuming $P(B) > 0$, so that the conditional probabilities are defined), note that since $P(\cdot \mid B)$
is a probability measure and the events $A$
and $\relcomplement{A}{\Omega }$ partition
$\Omega $, we have
\[
P(A \mid B) + P(\relcomplement{A}{\Omega } \mid B) = 1,
\]
so $P(\relcomplement{A}{\Omega } \mid B) = 1 - P(A \mid B) = 1 - P(A) = P(\relcomplement{A}{\Omega })$.
For the independence of $\relcomplement{A}{\Omega }$ and $\relcomplement{B}{\Omega }$, De Morgan's law gives
\[
P(A \cup B) + P(\relcomplement{A}{\Omega } \cap
\relcomplement{B}{\Omega }) = 1,
\]
and, by inclusion-exclusion together with the independence of $A$ and $B$, $P(A \cup B) = P(A) + P(B) - P(A)P(B)$. Therefore
\[
P(\relcomplement{A}{\Omega } \cap \relcomplement{B}{\Omega }) = 1
- P(A) - P(B) + P(A)P(B) = (1 - P(A))(1 - P(B)) = P(\relcomplement{A}{\Omega })P(\relcomplement{B}{\Omega }).
\]
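These identities can also be checked concretely. Here is a small Python enumeration of the two-coin-toss model from above that verifies all three complementary pairs; it is a sanity check on one example, of course, not a substitute for the arguments.
\begin{verbatim}
from itertools import product

Omega = set(product([0, 1], repeat=2))
p = {w: 1/4 for w in Omega}
A = {(1, 0), (1, 1)}
B = {(0, 1), (1, 1)}
Ac, Bc = Omega - A, Omega - B               # relative complements in Omega
P = lambda E: sum(p[w] for w in E)

for X, Y in [(Ac, B), (A, Bc), (Ac, Bc)]:
    print(P(X & Y), P(X) * P(Y))            # 0.25 0.25 in every case
\end{verbatim}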