If we treat the parameters of a linear
function as a random variable, an inductor for
the predictor is equivalent to an estimator for
the parameters.^{1}

Let $(\Omega , \mathcal{A} , \mathbfsf{P} )$ be a probability space. Let $x: \Omega \to \R ^d$ be a random vector. Define $g: \Omega \to (\R ^d \to \R )$ by $g(\omega )(a) = \transpose{a}x(\omega )$, for $a \in \R ^d$. In other words, for each outcome $\omega \in \Omega $, $g_\omega : \R ^d \to \R $ is a linear function with parameters $x(\omega )$. $g_\omega $ is the function of interest.

Let $a^1, \dots , a^n \in \R ^d$ be a dataset with data matrix $A \in \R ^{n \times d}$, whose $i$th row is $\transpose{(a^i)}$. Let $e: \Omega \to \R ^n$ be a random vector independent of $x$, and define $y: \Omega \to \R ^n$ by

\[ y = Ax + e. \]

In other words, $y_i = \transpose{x}a^i + e_i$.
We call $(x, A, e)$ a
probabilistic linear model.
Other terms include linear
model, statistical linear
model, linear regression
model, Bayesian linear
regression, and Bayesian
analysis of the linear model.^{2}
We call $x$ the parameters, $A$ a
design, $e$ the
error or
noise vector, and $y$ the
observation vector.
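A minimal numerical sketch of drawing one realization of this model (the Gaussian distributions, noise scale, and dimensions here are illustrative assumptions, not part of the definition):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3

# design matrix whose ith row is the data point a^i
A = rng.normal(size=(n, d))
# one draw x(omega) of the random parameter vector
x = rng.normal(size=d)
# noise vector, drawn independently of x
e = 0.1 * rng.normal(size=n)
# observation vector y = Ax + e
y = A @ x + e

# componentwise, y_i = <x, a^i> + e_i
assert np.allclose(y, np.array([x @ A[i] + e[i] for i in range(n)]))
```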

One may want an estimator for the parameters $x$ in terms of $y$, or one may be modeling the function $g$ and want to predict $g_\omega (a)$ for an input $a \in \R ^d$ not in the dataset.
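As a concrete sketch of both uses, ordinary least squares is one common estimator for the parameters given $(A, y)$ (one choice among many, not prescribed by the text; the data below are simulated under illustrative Gaussian assumptions), and the estimate can then be used to predict the response at a new input:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3

A = rng.normal(size=(n, d))                   # design matrix
x_true = rng.normal(size=d)                   # one realization of the parameters
y = A @ x_true + 0.05 * rng.normal(size=n)    # observation vector y = Ax + e

# ordinary least squares estimate of x from (A, y)
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# predict the response at an input a not in the dataset
a_new = rng.normal(size=d)
prediction = a_new @ x_hat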

In this model, the dataset is assumed to be
inconsistent as a result of the random errors.
The error vector $e$ may model
a variety of sources of error, ranging from
inaccuracies in the measurements (or measurement
devices) to systematic errors from the
“inappropriateness” of the use of a linear
predictor.^{3}
In this case, the linear part is sometimes
called the deterministic
effect of the input $a \in \R ^d$
on the response.

One route to being more specific about the underlying distributions of the random vectors is to give their means and covariances. It is common to specify the mean $\E (e)$ of the error. By linearity of expectation,

\[ \E (y) = A\E (x) + \E (e), \]^{4}

and, since $x$ and $e$ are independent,

\[ \cov(y) = A\cov(x)\transpose{A} + \cov(e). \]^{5}
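A short derivation of these identities (standard, using linearity of expectation and the independence of $x$ and $e$):

\begin{align*}
\E (y) &= \E (Ax + e) = A\E (x) + \E (e),\\
\cov(y) &= \cov(Ax + e) = \cov(Ax) + \cov(e) = A\cov(x)\transpose{A} + \cov(e).
\end{align*}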