Needs:
Estimators
Probabilistic Errors Linear Model
Needed by:
Normal Linear Model

Probabilistic Linear Model

Why

If we treat the parameters of a linear function as a random variable, an inductor for the predictor is equivalent to an estimator for the parameters.1

Definition

Let $(\Omega, \mathcal{A}, \mathbfsf{P})$ be a probability space. Let $x: \Omega \to \R^d$ be a random vector. Define $g: \Omega \to (\R^d \to \R)$ by $g(\omega)(a) = \transpose{a}x(\omega)$ for $a \in \R^d$. In other words, for each outcome $\omega \in \Omega$, $g_\omega := g(\omega): \R^d \to \R$ is the linear function with parameters $x(\omega)$. $g_\omega$ is the function of interest.

Let $a^1, \dots, a^n \in \R^d$ be a dataset with data matrix $A \in \R^{n \times d}$. Let $e: \Omega \to \R^n$ be a random vector independent of $x$, and define $y: \Omega \to \R^n$ by

\[ y = Ax + e. \]

In other words, $y_i = \transpose{(a^i)}x + e_i$: the $i$th observation is the value of $g$ at $a^i$ plus an error.

We call $(x, A, e)$ a probabilistic linear model. Other terms include linear model, statistical linear model, linear regression model, bayesian linear regression, and bayesian analysis of the linear model.2 We call $x$ the parameters, $A$ a design, $e$ the error or noise vector, and $y$ the observation vector.
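
As a concrete illustration, here is a minimal simulation sketch in Python. The Gaussian choices for $x$ and $e$ and the noise scale are illustrative assumptions only; the definition above places no distributional assumptions on the model.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 3

    A = rng.normal(size=(n, d))    # design matrix: rows are the inputs a^1, ..., a^n
    x = rng.normal(size=d)         # one draw of the random parameters (illustrative Gaussian)
    e = 0.1 * rng.normal(size=n)   # one draw of the error vector, independent of x
    y = A @ x + e                  # observation vector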

One may want an estimator for the parameters $x$ in terms of $y$, or one may be modeling the function $g$ and want to predict $g(a)$ for inputs $a \in \R^d$ not in the dataset.
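
For example, continuing the sketch above, one could use least squares (estimators are treated in their own sheet; least squares is used here purely for illustration) and predict at a new input:

    # least-squares estimate of the parameters from (A, y)
    x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

    a_new = rng.normal(size=d)     # a new input not in the dataset
    g_hat = a_new @ x_hat          # predicted value of g(a_new)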

Inconsistency

In this model, the dataset is assumed to be inconsistent as a result of the random errors. The error vector $e$ may model a variety of sources of error, ranging from inaccuracies in the measurements (or measurement devices) to systematic errors from the “inappropriateness” of using a linear predictor.3 In this case the linear part $\transpose{a}x$ is sometimes called the deterministic effect of the input $a$ on the response.

Moment assumptions

One route to being more specific about the underlying distributions of the random vectors is to give their means and variances. It is common, for example, to specify the mean $\E(e)$ and covariance $\cov(e)$ of the error, often with the assumption $\E(e) = 0$.

Mean and variance

$\E(y) = A\E(x) + \E(e)$4
$\cov(y) = A\cov(x)\transpose{A} + \cov(e)$5
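
Both identities are quick to verify. A sketch, by linearity of expectation and the independence of $x$ and $e$ (which makes the cross-covariance between $Ax$ and $e$ vanish):

\[ \E(y) = \E(Ax + e) = A\E(x) + \E(e) \]

\[ \cov(y) = \cov(Ax) + \cov(e) = A\cov(x)\transpose{A} + \cov(e) \]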

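These identities can also be checked numerically under the illustrative assumptions of the earlier sketch, where $\E(x) = 0$, $\cov(x) = I$, $\E(e) = 0$, and $\cov(e) = 0.01 I$:

    # Monte Carlo check of the moment identities (continuing the sketch above)
    N = 100_000
    X = rng.normal(size=(N, d))          # N draws of the parameters x
    E = 0.1 * rng.normal(size=(N, n))    # N draws of the error e
    Y = X @ A.T + E                      # N draws of the observation y

    print(np.abs(Y.mean(axis=0)).max())  # ~0, since E(y) = A E(x) + E(e) = 0 here
    cov_thy = A @ A.T + 0.01 * np.eye(n)                        # A cov(x) A' + cov(e)
    print(np.abs(np.cov(Y, rowvar=False) - cov_thy).max())      # ~0 as N grows
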
  1. Future editions will offer further discussion.
  2. The word bayesian is in reference to treating the object of interest—$x$—as a random variable.
  3. Future editions will clarify and may excise this sentence.
  4. By linearity. Full account in future editions.
  5. Full account in future editions.
Copyright © 2023 The Bourbaki Authors — All rights reserved — Version 13a6779cc