Which is the optimal tree to use for tree density approximation?
We want to choose the tree whose corresponding approximator for the given density achieves minimum relative entropy with the given density among all tree density approximators. We call such a density an optimal tree approximator of the given density. We call a tree according to which an optimal tree approximator factors and optimal approximator tree.
Let $g: \R ^n \to [0, 1]$ be a density. A tree $T$ on $\set{1, \dots , n}$ is an optimal approximator tree if and only if it is a maximal spanning tree of the differential mutual information graph of $q$.
First, denote the optimal approximator of $g$
for tree $T$ by $f^*_T$. Recall
\[
f^*_T = f_1 \prod_{i \neq 1} f_{i \mid \pa{i}}
\]
Second, recall $d(g, f) = H(g, f) - H(g)$. Since $H(g)$ does not depend on $f$, $f$ is a minimizer of $d(g, f)$ if and only if it is a minimizer of $H(g, f)$.
Third, express the cross entropy of $f^*_T$
relative to $g$ as
\[
\begin{aligned}
H(q, p^*_T) &= h(q_1) - \sum_{j \neq i} ( \int_{\R ^d}
g(x) \log g_{i \mid \pa_i}(x_i, x_{\pa_i}) dx ) \\
&= H(q_1) - \sum_{i \neq 1} \sum_{a \in A} q(a) (\log
q_{i,\pa_i}(a_i, a_{\pa_i}) - \log q_{\pa{i}}(a_{\pa_i})) \\
&= H(q_1) - \sum_{i \neq 1} \sum_{a \in A} q(a) (\log
q_{i,\pa_i}(a_i, a_{\pa_i}) - \log q_{\pa_i}(a_{\pa_i}) - \log
q_i(a_i) + \log q_{i}(a_i) ) \\
&= \sum_{i = 1}^{n} H(q_i) - \sum_{i \neq 1} I(q_i,
q_{\pa_i}) \\
&= \sum_{i = 1}^{n} H(q_i) - \sum_{\set{i,j} \in T} I(q_i,
q_{j}) \\
\end{aligned}
\]