What is the best linear regressor if we choose according to a squared loss function?
Let $X \in \R ^{n \times d}$ and $y \in \R ^n$. In other words, we have a paired dataset of $n$ records with inputs in $\R ^d$ (the rows of $X$) and outputs in $\R $ (the entries of $y$).
A least squares linear predictor (or linear least squares predictor) is a linear transformation $f: \R ^d \to \R $ (linear over the field $\R $) which minimizes
\[
\frac{1}{n} \sum_{i = 1}^{n} (f(x^i) - y_i)^2,
\]
where $x^i$ denotes the $i$-th row of $X$.
Every linear function from $\R ^d$ to $\R $ has the form $f_\theta (x) = \theta ^\top x$ for a unique $\theta \in \R ^d$, so the set of such functions is in one-to-one correspondence with $\R ^d$. We therefore want to find $\theta \in \R ^d$ to minimize
\[
\frac{1}{n} \norm{X\theta - y}^2.
\]
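As an illustrative sketch (the dataset below is synthetic and the names are hypothetical), the minimizer of this objective can be computed numerically with NumPy's least squares solver:

```python
import numpy as np

# Hypothetical dataset: n = 50 records with d = 3 features.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=n)  # noisy linear outputs

# lstsq minimizes ||X @ theta - y||^2, which has the same minimizer
# as (1/n) * ||X @ theta - y||^2 since 1/n is a positive constant.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def mean_squared_loss(theta):
    """The objective (1/n) * ||X @ theta - y||^2 from the text."""
    return np.mean((X @ theta - y) ** 2)

# The fitted theta can do no worse than the true parameters,
# since it minimizes the objective over all of R^d.
assert mean_squared_loss(theta_hat) <= mean_squared_loss(theta_true) + 1e-12
```

Scaling the objective by $1/n$ does not change the minimizer, which is why an off-the-shelf solver for $\norm{X\theta - y}^2$ suffices.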