Needs:
Estimators
Probabilistic Errors Linear Model
Needed by:
Normal Linear Model

Probabilistic Linear Model

Why

If we treat the parameters of a linear function as a random variable, an inductor for the predictor is equivalent to an estimator for the parameters.1

Definition

Let $(\Omega, \mathcal{A}, \mathbfsf{P})$ be a probability space. Let $x: \Omega \to \R^d$ be a random vector. Define $g: \Omega \to (\R^d \to \R)$ by $g(\omega)(a) = \transpose{a}x(\omega)$ for $a \in \R^d$. In other words, for each outcome $\omega \in \Omega$, $g_\omega := g(\omega): \R^d \to \R$ is the linear function with parameters $x(\omega)$. $g_\omega$ is the function of interest.

Let $a^1, \dots, a^n \in \R^d$ be a dataset with data matrix $A \in \R^{n \times d}$. Let $e: \Omega \to \R^n$ be a random vector independent of $x$, and define $y: \Omega \to \R^n$ by

\[ y = Ax + e. \]

In other words, $y_i = \transpose{(a^i)}x + e_i$: the $i$th observation is the value of $g$ at $a^i$ plus an error.

We call $(x, A, e)$ a probabilistic linear model. Other terms include linear model, statistical linear model, linear regression model, bayesian linear regression, and bayesian analysis of the linear model.2 We call $x$ the parameters, $A$ a design, $e$ the error or noise vector, and $y$ the observation vector.
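
As a concrete illustration, here is a minimal simulation sketch in Python. The Gaussian choices for $x$ and $e$ and the noise scale are illustrative assumptions only; the definition above places no distributional assumptions on the model.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 3

    A = rng.normal(size=(n, d))    # design matrix: rows are the inputs a^1, ..., a^n
    x = rng.normal(size=d)         # one draw of the random parameters (illustrative Gaussian)
    e = 0.1 * rng.normal(size=n)   # one draw of the error vector, independent of x
    y = A @ x + e                  # observation vector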

One may want an estimator for the parameters $x$ in terms of $y$, or one may be modeling the function $g$ and want to predict $g(a)$ for inputs $a \in \R^d$ not in the dataset.
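
For example, continuing the sketch above, one could use least squares (estimators are treated in their own sheet; least squares is used here purely for illustration) and predict at a new input:

    # least-squares estimate of the parameters from (A, y)
    x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

    a_new = rng.normal(size=d)     # a new input not in the dataset
    g_hat = a_new @ x_hat          # predicted value of g(a_new)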

Inconsistency

In this model, the dataset is assumed to be inconsistent as a result of the random errors. The error vector $e$ may model a variety of sources of error, ranging from inaccuracies in the measurements (or measurement devices) to systematic errors from the “inappropriateness” of using a linear predictor.3 In this case the linear part $\transpose{a}x$ is sometimes called the deterministic effect of the input $a$ on the response.

Moment assumptions

One route to being more specific about the underlying distributions of the random vectors is to give their means and variances. It is common, for example, to specify the mean $\E(e)$ and covariance $\cov(e)$ of the error, often with the assumption $\E(e) = 0$.

Mean and variance

$\E(y) = A\E(x) + \E(e)$4
$\cov(y) = A\cov(x)\transpose{A} + \cov(e)$5
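
Both identities are quick to verify. A sketch, by linearity of expectation and the independence of $x$ and $e$ (which makes the cross-covariance between $Ax$ and $e$ vanish):

\[ \E(y) = \E(Ax + e) = A\E(x) + \E(e) \]

\[ \cov(y) = \cov(Ax) + \cov(e) = A\cov(x)\transpose{A} + \cov(e) \]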

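These identities can also be checked numerically under the illustrative assumptions of the earlier sketch, where $\E(x) = 0$, $\cov(x) = I$, $\E(e) = 0$, and $\cov(e) = 0.01 I$:

    # Monte Carlo check of the moment identities (continuing the sketch above)
    N = 100_000
    X = rng.normal(size=(N, d))          # N draws of the parameters x
    E = 0.1 * rng.normal(size=(N, n))    # N draws of the error e
    Y = X @ A.T + E                      # N draws of the observation y

    print(np.abs(Y.mean(axis=0)).max())  # ~0, since E(y) = A E(x) + E(e) = 0 here
    cov_thy = A @ A.T + 0.01 * np.eye(n)                        # A cov(x) A' + cov(e)
    print(np.abs(np.cov(Y, rowvar=False) - cov_thy).max())      # ~0 as N grows
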
  1. Future editions will offer further discussion.
  2. The word bayesian is in reference to treating the object of interest—$x$—as a random variable.
  3. Future editions will clarify and may excise this sentence.
  4. By linearity. Full account in future editions.
  5. Full account in future editions.
Copyright © 2023 The Bourbaki Authors — All rights reserved — Version 13a6779cc