The mutual information of a joint distribution over two random variables is the relative entropy of the joint distribution relative to the product of its marginal distributions.
Let $A$ and $B$ be two non-empty sets. Let $p_{12}: A \times B \to \R$ be a joint distribution with marginal distributions $p_1: A \to \R$ and $p_2: B \to \R$. The mutual information of $p_{12}$ is $d(p_{12}, p_1 p_2)$, where $d$ denotes the relative entropy.
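For concreteness, a sketch of how the definition unwinds under the assumption that $A$ and $B$ are finite and $d$ is the Kullback–Leibler divergence with the usual convention $d(p, q) = \sum_x p(x) \log \tfrac{p(x)}{q(x)}$:

$$
d(p_{12}, p_1 p_2) \;=\; \sum_{(a,b) \in A \times B} p_{12}(a,b) \, \log \frac{p_{12}(a,b)}{p_1(a)\, p_2(b)},
\qquad
p_1(a) = \sum_{b \in B} p_{12}(a,b), \quad
p_2(b) = \sum_{a \in A} p_{12}(a,b).
$$

In particular, the mutual information vanishes exactly when $p_{12} = p_1 p_2$, i.e. when the two variables are independent.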