We want to approximate a given distribution with one which factors according to a tree.
Given $q: A \to [0, 1]$, we want to find a
distribution $p$ on $A$ and tree $T$ on
$\set{1, \dots , n}$ to
\[
\begin{aligned}
\text{minimize} & \quad d_{kl}(q, p) \\
\text{subject to} & \quad p \text{ factors according to } T.
\end{aligned}
\]
Let $A_1, \dots , A_n$ be finite non-empty
sets.
Define $A = \prod_{i = 1}^{n} A_i$.
Let $q: A \to [0, 1]$ a distribution and $T$
a tree on $\set{1, \dots , n}$.
The distribution $p^*_T: A \to [0, 1]$ defined
by
\[
p^*_T = q_1 \prod_{i \neq 1} q_{i \mid \pa{i}}
\]
Let $p: A \to [0, 1]$ be a distribution
which factors according to $T$.
First, express
\[
p = p_1 \prod_{i \neq i} p_{i \mid \pa_i}
\]
Second, recall that the relative entropy of $q$ with $p$ is $H(q, p) - H(q)$. Since $H(q)$ does not depend on $p$, $p$ is a minimizer of the relative of $q$ with $p$ if and only if $p$ is a minimizer of $H(q, p)$.
Third, express
\[
\begin{aligned}
H(q, p) &= - \sum_{a \in A} q(a) \log p(a) \\
&= - \sum_{a \in A} q(a) (\log p_1(a_1) + \sum_{i \neq 1}
\log p_{i \mid \pa_i}(a_i, a_{\pa_i})) \\
&= H(q_1, p_1) + \sum_{i \neq 1} \sum_{a_{\pa_i} \in
A_{\pa_i}} q_{\pa_i}(a_{\pa_i})H(q_{i\mid\pa_i}(\cdot , a_{\pa_i}),
p_{i\mid\pa_i}(\cdot , a_{\pa_i}))
\end{aligned}
\]
Fourth, recall $H(\cdot , \cdot ) \geq 0$ and is zero on repeated pairs. By this, we mean, for example, $H(p_1, p_1) = 0$. So $p_1 = q_1$ and $p_{i \mid \pa_i} = q_{i \mid \pa_i}$ are solutions.
The foregoing proposition states the form of an optimal approximator given a tree. A natural next question is to select the tree.