
Least Squares Linear Regressors


What is the best linear regressor if we choose according to a squared loss function?


Let $X \in \R ^{n \times d}$ and $y \in \R ^n$. In other words, we have a paired dataset of $n$ records with inputs in $\R ^d$ (the rows of $X$) and outputs in $\R $ (the elements of $y$).

A least squares linear predictor or linear least squares predictor is a linear transformation $f: \R ^d \to \R $ (the field is $\R $) which minimizes

\[ \frac{1}{n} \sum_{i = 1}^{n} (f(x^i) - y_i)^2 \]

over the dataset of pairs $(x^1, y_1), \dots , (x^n, y_n) \in \R ^d \times \R $ where $(x^i)^\top $ is the $i$th row of $X$ for $i = 1, \dots , n$.
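
As a minimal sketch of the objective above, the following Python (NumPy) snippet evaluates the average squared error of a candidate predictor on a small dataset. The function name `empirical_squared_loss`, the toy data, and the two candidate predictors `f` and `g` are illustrative assumptions, not part of the sheet.

```python
import numpy as np

def empirical_squared_loss(f, X, y):
    """Average of (f(x^i) - y_i)^2 over the rows x^i of X and entries y_i of y."""
    n = X.shape[0]
    return sum((f(X[i]) - y[i]) ** 2 for i in range(n)) / n

# Hypothetical toy dataset with n = 3 records and d = 2 features.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

# Two candidate linear functions from R^2 to R (both chosen for illustration).
f = lambda x: 1.0 * x[0] + 2.0 * x[1]   # fits the toy data exactly
g = lambda x: 0.0                        # the zero function

loss_f = empirical_squared_loss(f, X, y)
loss_g = empirical_squared_loss(g, X, y)
print(loss_f, loss_g)
```

On this toy data `f` attains a smaller loss than `g`, so `f` is the better of the two candidates under the squared loss.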

The set of linear functions from $\R ^d$ to $\R $ is in one-to-one correspondence with $\R ^d$. So we want to find $\theta \in \R ^d$ to minimize

\[ \frac{1}{n} \norm{X\theta - y}^2. \]
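
To see why the two objectives agree, assume $\norm{\cdot}$ denotes the Euclidean norm and write the linear function corresponding to $\theta$ as $f(x) = \theta^\top x$. The $i$th entry of $X\theta - y$ is then $(x^i)^\top \theta - y_i = f(x^i) - y_i$, so

\[ \frac{1}{n} \norm{X\theta - y}^2 = \frac{1}{n} \sum_{i = 1}^{n} ((x^i)^\top \theta - y_i)^2 = \frac{1}{n} \sum_{i = 1}^{n} (f(x^i) - y_i)^2. \]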


If the columns of $X$ are linearly independent, so that $X^\top X$ is invertible, there exists a unique linear least squares predictor, and its parameters are given by $(X^\top X)^{-1}X^\top y$.$^{1}$
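
The closed-form expression can be checked numerically. The sketch below is one possible way to do this in Python with NumPy, assuming $X$ has linearly independent columns. It solves the linear system $X^\top X \theta = X^\top y$ with `np.linalg.solve` rather than forming the inverse explicitly, which gives the same $\theta$ when $X^\top X$ is invertible. The function name `least_squares_parameters` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def least_squares_parameters(X, y):
    """Return theta solving X^T X theta = X^T y.

    Assumes the columns of X are linearly independent, so X^T X is invertible.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data (hypothetical): n = 50 records, d = 3 features, noisy linear outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=50)

theta_hat = least_squares_parameters(X, y)

# Cross-check against NumPy's general least squares solver.
theta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

def mean_squared_loss(theta):
    """The objective (1/n) * ||X theta - y||^2 on the synthetic data."""
    return np.mean((X @ theta - y) ** 2)

print(theta_hat)
print(np.allclose(theta_hat, theta_ref))
print(mean_squared_loss(theta_hat) <= mean_squared_loss(theta_true))
```

Solving the system instead of computing $(X^\top X)^{-1}$ is a common numerical choice; it is a design decision of this sketch, not something the sheet prescribes.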

  1. Future editions will include an account.
Copyright © 2023 The Bourbaki Authors. All rights reserved. Version 13a6779cc