Needs:
Normal Linear Model
Regressors
Interpolators
Needed by:
Featurized Probabilistic Linear Models

Normal Linear Model Regressors

Why

There is a natural predictor corresponding to a normal linear model.

Definition

Let $(x: \Omega \to \R ^d, A \in \R ^{n \times d}, e: \Omega \to \R ^n)$ be a normal linear model over the probability space $(\Omega , \mathcal{A} , \mathbfsf{P} )$.

Predictive density

We are modeling the function $h_\omega : \R ^d \to \R $ defined by $h_\omega (a) = \transpose{x(\omega )}a$. The predictive density for a dataset $c^1, \dots , c^m \in \R ^d$ is the conditional density of the random vector $(h_{(\cdot )}(c^1), \dots , h_{(\cdot )}(c^m))$ given $y$.

The predictive density for $c^1, \dots , c^m \in \R ^d$ (with data matrix $C \in \R ^{m \times d}$ whose rows are the $c^i$) is normal with mean

\[ C\Sigma _{x}\transpose{A}\inv{(A\Sigma _{x}\transpose{A} + \Sigma _e)}\gamma , \]

where $\gamma \in \R ^n$ is the observed value of $y$, and covariance

\[ C\Sigma _{x}\transpose{C} - C\Sigma _{x}\transpose{A}\inv{(A\Sigma _{x}\transpose{A} + \Sigma _e)}A\Sigma _{x}\transpose{C}. \]
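As a minimal numerical sketch (the dimensions, covariances, and observed $\gamma $ below are made up for illustration, not taken from the sheet), the mean and covariance can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d parameters, n observations, m query points.
d, n, m = 3, 10, 4

Sigma_x = np.eye(d)              # covariance of x (assumed for illustration)
Sigma_e = 0.1 * np.eye(n)        # covariance of the noise e
A = rng.standard_normal((n, d))  # data matrix of the model
C = rng.standard_normal((m, d))  # rows are the query points c^1, ..., c^m
gamma = rng.standard_normal(n)   # stands in for the observed value of y

# Shared Gram matrix: A Sigma_x A^T + Sigma_e.
G = A @ Sigma_x @ A.T + Sigma_e

# Predictive mean: C Sigma_x A^T (A Sigma_x A^T + Sigma_e)^{-1} gamma.
mean = C @ Sigma_x @ A.T @ np.linalg.solve(G, gamma)

# Predictive covariance:
# C Sigma_x C^T - C Sigma_x A^T (A Sigma_x A^T + Sigma_e)^{-1} A Sigma_x C^T.
cov = C @ Sigma_x @ C.T - C @ Sigma_x @ A.T @ np.linalg.solve(G, A @ Sigma_x @ C.T)

print(mean.shape, cov.shape)  # (4,) (4, 4)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the standard numerically stable way to apply $\inv{(A\Sigma _{x}\transpose{A} + \Sigma _e)}$.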

To see this, define (as usual) $y: \Omega \to \R ^n$ and $z : \Omega \to \R ^m$ by

\[ \begin{aligned} y &= Ax + e \\ z &= Cx. \end{aligned} \]

Recognize $(x, y, z)$ as jointly normal, and use Normal Conditionals.
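In more detail: with $x$ and $e$ zero-mean and independent (the setting the formulas above reflect), the pair $(y, z)$ is jointly normal with

\[ \begin{aligned} \mathop{\textup{Cov}}(y) &= A\Sigma _{x}\transpose{A} + \Sigma _e \\ \mathop{\textup{Cov}}(z, y) &= C\Sigma _{x}\transpose{A} \\ \mathop{\textup{Cov}}(z) &= C\Sigma _{x}\transpose{C}, \end{aligned} \]

so conditioning $z$ on $y = \gamma $ yields exactly the mean and covariance displayed above.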

Predictor

The normal linear model predictor or normal linear model regressor for the normal linear model $(x, A, e)$ is the predictor which assigns to a new point $a \in \R ^d$ the mean of the predictive density at $a$. That is, the predictor $g: \R ^d \to \R $ defined by

\[ g(a) = \transpose{a}\Sigma _{x}\transpose{A}\inv{(A\Sigma _{x}\transpose{A} + \Sigma _e)}\gamma . \]

In the above we have substituted $\transpose{a}$ for $C$. In the case of normal random vectors this predictor coincides with the MAP estimate and the MMSE estimate.1 Notice that $g$ is linear in its argument $a$.
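As a sketch (continuing the hypothetical setup above; the particular numbers are illustrative only), the predictor reduces to an inner product with a fixed weight vector, which makes the linearity in $a$ explicit:

```python
import numpy as np

rng = np.random.default_rng(1)

d, n = 3, 10
Sigma_x = np.eye(d)
Sigma_e = 0.1 * np.eye(n)
A = rng.standard_normal((n, d))
gamma = rng.standard_normal(n)

# Weight vector w = Sigma_x A^T (A Sigma_x A^T + Sigma_e)^{-1} gamma,
# so that g(a) = a^T w.
w = Sigma_x @ A.T @ np.linalg.solve(A @ Sigma_x @ A.T + Sigma_e, gamma)

def g(a):
    """Normal linear model regressor evaluated at a new point a."""
    return a @ w

# Linearity check: g(2 a1 + 3 a2) == 2 g(a1) + 3 g(a2).
a1, a2 = rng.standard_normal(d), rng.standard_normal(d)
assert np.isclose(g(2.0 * a1 + 3.0 * a2), 2.0 * g(a1) + 3.0 * g(a2))
```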

The use of a normal linear model predictor is often called Bayesian linear regression. The word Bayesian refers to treating the parameter vector $x$ of the function as a random variable.


  1. Future editions will discuss this and include a reference to the sheet.