We want to estimate a random variable $y: \Omega \to \R ^n$ from a random variable $x: \Omega \to \R ^m$ using an estimator $\phi : \R ^m \to \R ^n$ which is affine. In other words, $\phi (\xi ) = A\xi + b$ for some $A \in \R ^{n \times m}$ and $b \in \R ^n$. We will use the mean squared error cost.
We want to find $A$ and $b$ to minimize
\[
\E {\norm{Ax + b - y}^2}.
\]
Expanding $\norm{Ax + b - y}^2 = (Ax + b - y)^\top (Ax + b - y)$ into nine terms and taking expectations, the cost is the sum\[ \mat{ & + & \tr(A \E (xx^\top ) A^\top ) & + & \E (x)^\top A^\top b & - & \tr(A^\top \E (yx^\top )) \\ & + & b^\top A \E (x) & + & b^\top b & - & b^\top \E (y) \\ & - & \tr(A \E (xy^\top ) ) & - & \E (y)^\top b & + & \E (y^\top y) } \]
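The nine-term expansion can be checked numerically. The following NumPy sketch (with illustrative sample arrays and names not from the text) verifies that the expansion, evaluated with empirical moments, agrees exactly with the directly computed sample cost:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, n = 200, 3, 2
x = rng.standard_normal((N, m))
y = x[:, :n] + rng.standard_normal((N, n))  # arbitrary joint samples
A = rng.standard_normal((n, m))
b = rng.standard_normal(n)

# Direct empirical cost: mean of ||A x_i + b - y_i||^2 over the samples.
direct = np.mean(np.sum((x @ A.T + b - y) ** 2, axis=1))

# The nine-term expansion, using empirical (uncentered) moments.
Ex, Ey = x.mean(axis=0), y.mean(axis=0)
Sxx = x.T @ x / N                       # E(x x^T)
Syx = y.T @ x / N                       # E(y x^T)
Eyy = np.mean(np.sum(y * y, axis=1))    # E(y^T y)
expanded = (np.trace(A @ Sxx @ A.T) + 2 * Ex @ A.T @ b
            - 2 * np.trace(A.T @ Syx) + b @ b - 2 * b @ Ey + Eyy)

assert np.isclose(direct, expanded)
```

Because the same empirical moments appear on both sides, this is an exact algebraic identity, not just an approximation.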
The term-by-term gradients with respect to $b$ are\[ \mat{ & + & 0 & + & A \E (x) & - & 0 \\ & + & A \E (x) & + & 2 b & - & \E (y) \\ & - & 0 & - & \E (y) & + & 0 } \]
which sum to $2A\E (x) + 2b - 2\E (y)$. The term-by-term gradients with respect to $A$, arranged in the shape of $A^\top $, are\[ \mat{ && + & \E (xx^\top )A^\top + \E (xx^\top )^\top A^\top & + & \E (x)b^\top & - & \E (yx^\top )^\top \\ & + & \E (x)b^\top & + & 0 & - & 0 \\ & - & \E (xy^\top ) & - & 0 & + & 0 } \]
which sum to $2\E (xx^\top )A^\top + 2\E (x)b^\top - 2\E (xy^\top )$, since $\E (yx^\top )^\top = \E (xy^\top )$ and $\E (xx^\top )$ is symmetric. Setting both gradients to zero and dividing by $2$, we want $A$ and $b$ to satisfy\[ \begin{aligned} A\E (x) + b - \E (y) &= 0 \\ \E (xx^\top )A^\top + \E (x)b^\top - \E (xy^\top ) &= 0 \end{aligned} \]
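The two gradient formulas can be sanity-checked against central finite differences of the cost written in terms of moments (a NumPy sketch; the moment values are made up for illustration). Since the cost is quadratic, central differences are exact up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2

# Made-up moments; E(y^T y) only shifts the cost by a constant.
Ex, Ey = rng.standard_normal(m), rng.standard_normal(n)
M = rng.standard_normal((m, m))
Sxx = M @ M.T + np.outer(Ex, Ex)    # E(x x^T): symmetric PSD second moment
Sxy = rng.standard_normal((m, n))   # E(x y^T)
Eyy = 5.0                           # E(y^T y)

def cost(A, b):
    """E||Ax + b - y||^2 expressed through the moments above."""
    return (np.trace(A @ Sxx @ A.T) + 2 * Ex @ A.T @ b
            - 2 * np.trace(A @ Sxy) + b @ b - 2 * b @ Ey + Eyy)

A = rng.standard_normal((n, m))
b = rng.standard_normal(n)

grad_b = 2 * A @ Ex + 2 * b - 2 * Ey                      # gradient in b
grad_At = 2 * Sxx @ A.T + 2 * np.outer(Ex, b) - 2 * Sxy   # gradient in A^T

eps = 1e-6
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    fd = (cost(A, b + e) - cost(A, b - e)) / (2 * eps)
    assert np.isclose(grad_b[i], fd, atol=1e-4)
for i in range(n):
    for j in range(m):
        E1 = np.zeros((n, m))
        E1[i, j] = eps
        fd = (cost(A + E1, b) - cost(A - E1, b)) / (2 * eps)
        assert np.isclose(grad_At[j, i], fd, atol=1e-4)
```

Note that `grad_At` is shaped like $A^\top$, so the entry matching $\partial/\partial A_{ij}$ is `grad_At[j, i]`.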
From the first equation, $b = \E (y) - A\E (x)$. Substituting into the second,\[ \begin{aligned} \E (xx^\top )A^\top + \E (x)(\E (y) - A \E (x))^\top - \E (xy^\top ) &= 0 \\ \E (xx^\top )A^\top + \E (x)\E (y)^\top - \E (x)\E (x)^\top A^\top - \E (xy^\top ) &= 0 \\ (\E (xx^\top ) - \E (x)\E (x)^\top )A^\top &= \E (xy^\top ) - \E (x)\E (y)^\top \\ \cov(x, x)A^\top &= \cov(x, y). \end{aligned} \]
So, assuming $\cov(x, x)$ is invertible, $A^\top = (\cov(x,x))^{-1}\cov(x, y)$, i.e.\ $A = \cov(y, x)(\cov(x, x))^{-1}$, using $\cov(x, y)^\top = \cov(y, x)$ and the symmetry of $\cov(x, x)$. Then $b = \E (y) - \cov(y, x)(\cov(x,x))^{-1}\E (x)$. To summarize, the optimal affine estimator $\phi (x) = Ax + b$ is\[ \cov(y, x)(\cov(x, x))^{-1} x + \E (y) - \cov(y, x)(\cov(x,x))^{-1}\E (x) \]
or, equivalently,\[ \E (y) + \cov(y, x) (\cov(x, x))^{-1} (x - \E (x)). \]
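As a final check, here is a NumPy sketch (the generating model and names are invented for illustration) that forms $A = \cov(y, x)(\cov(x, x))^{-1}$ and $b = \E (y) - A\E (x)$ from sample covariances, and confirms that they recover an underlying affine relationship between $x$ and $y$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, m, n = 100_000, 3, 2

# Invented generating model: y is affine in x plus independent noise.
true_A = rng.standard_normal((n, m))
true_b = rng.standard_normal(n)
x = rng.standard_normal((N, m)) + 1.0            # nonzero mean on purpose
y = x @ true_A.T + true_b + 0.1 * rng.standard_normal((N, n))

# Sample means and covariances.
Ex, Ey = x.mean(axis=0), y.mean(axis=0)
cov_xx = (x - Ex).T @ (x - Ex) / N               # cov(x, x)
cov_yx = (y - Ey).T @ (x - Ex) / N               # cov(y, x)

# Optimal affine estimator: A = cov(y, x) cov(x, x)^{-1}, b = E(y) - A E(x).
A = cov_yx @ np.linalg.inv(cov_xx)
b = Ey - A @ Ex

# With this much data, the estimator recovers the generating model closely.
assert np.allclose(A, true_A, atol=0.05)
assert np.allclose(b, true_b, atol=0.05)
```

Because the true relationship is itself affine, the affine MMSE estimator coincides with it up to sampling noise; in general the formula gives the best affine approximation regardless of the true joint distribution.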