We model a real-valued output as corrupted by small random errors. This lets us talk about a dataset that is “close” to being consistent with a linear predictor.

Let $(\Omega , \mathcal{A} , \mathbfsf{P} )$ be a probability space. Let $x \in \R ^d$ and $e: \Omega \to \R ^n$. For $A \in \R ^{n \times d}$, define $y: \Omega \to \R ^n$ by $y = Ax + e$. We call $(x, A, e)$ a probabilistic errors linear model. We call $y$ the response vector, $A$ the model matrix and $e$ the error vector.

The most basic distributional assumptions for a probabilistic errors linear model pertain to the expectation and variance. Since $Ax$ is not random, $\E(y) = Ax + \E(e)$ and $\var(y) = \var(e)$, so these assumptions can be stated equivalently for $e$ or for $y$.
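Spelled out, both identities use only linearity of expectation and the fact that $Ax$ does not depend on the outcome $\omega$:

\[
\begin{aligned}
\E(y) &= \E(Ax + e) = Ax + \E(e), \\
\var(y) &= \E\big[(y - \E(y))\transpose{(y - \E(y))}\big] = \E\big[(e - \E(e))\transpose{(e - \E(e))}\big] = \var(e),
\end{aligned}
\]

where the second line uses $y - \E(y) = (Ax + e) - (Ax + \E(e)) = e - \E(e)$.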

If $\E(e) = 0$ and $\var(e) = \sigma^2 I$ (equivalently, $\E(y) = Ax$ and $\var(y) = \sigma^2 I$), then we call $(x, A, e)$ a classical linear model with moment assumptions. Notice that the components of $e$ are assumed uncorrelated and to share a common variance. We have $d + 1$ unknowns: the $d$ entries of $x$ and the scalar parameter $\sigma^2$.

In this case $\E(y_i) = \transpose{a^i}x$, where $\transpose{a^i}$ is the $i$th row of $A$, and so $x$ is called the mean parameter vector and $\sigma^2$ is called the model variance. The model variance measures the variability inherent in the observations. Neither the mean nor the variance of the error depends on the regression vectors $a^i$ or on the parameter vector $x$.
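A quick Monte Carlo check of the moment assumptions: the sketch below draws many realizations of $y = Ax + e$ and compares the sample mean and covariance of $y$ with $Ax$ and $\sigma^2 I$. Gaussian errors, the particular $A$, $x$, $\sigma$, and the helper name `simulate_classical_model` are assumptions made for illustration only; the classical model itself fixes just the first two moments of $e$.

```python
import numpy as np

def simulate_classical_model(A, x, sigma, n_draws, rng):
    """Draw n_draws realizations of y = A x + e with e ~ N(0, sigma^2 I).

    Gaussian errors are an assumption of this sketch; the classical model
    with moment assumptions only requires E(e) = 0 and var(e) = sigma^2 I.
    """
    n = A.shape[0]
    # Each row of `errors` is one independent draw of the error vector e.
    errors = sigma * rng.standard_normal((n_draws, n))
    # Broadcasting adds the fixed mean A x to every draw.
    return A @ x + errors

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])  # hypothetical 3 x 2 model matrix
x = np.array([0.5, -1.0])                              # hypothetical mean parameter vector
sigma = 0.3                                            # hypothetical model standard deviation

Y = simulate_classical_model(A, x, sigma, n_draws=200_000, rng=rng)
print(Y.mean(axis=0))            # should be close to A @ x
print(np.cov(Y, rowvar=False))   # should be close to sigma**2 * I
```

Only the first two moments of the simulated errors matter for this check; any other zero-mean error distribution with covariance $\sigma^2 I$ would give the same limits.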

Consider the two-sample problem, in which we have two populations with unknown mean responses $\alpha_1, \alpha_2 \in \R$. The responses share a common (perhaps unknown) variance $\sigma^2$, and we assume that the errors are uncorrelated.

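We define $y^1 = \alpha_1\mathbf{1} + e^1$ and $y^2 = \alpha_2\mathbf{1} + e^2$, where $y^1, e^1 \in \R^{n_1}$, $y^2, e^2 \in \R^{n_2}$, and $\mathbf{1}$ denotes the all-ones vector of matching length, so that we can stack these and obtain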

\[ y = \bmat{y^1 \\ y^2} = \bmat{\alpha _1\mathbf{1} \\ \alpha _2\mathbf{1} } + \bmat{e^1 \\ e^2}. \]

To cast this in our standard form, we define

\[ A = \transpose{ \bmat{\bmat{1\\0} & \cdots &\bmat{1\\0} & \bmat{0 \\ 1} &\cdots & \bmat{0 \\ 1} } }, \quad x = \bmat{\alpha_1 \\ \alpha_2}, \]

with regression vectors $(1, 0)$ and $(0, 1)$ repeated $n_1$ and $n_2$ times, respectively. An input design for this model specifies a sequence of these two vectors; under the uncorrelated-errors assumption this reduces to dictating how many responses should be collected from each population. The input set here is really $\mathcal{X} = \set{1, 2}$, indexing the two populations. The feature function is $\phi: \mathcal{X} \to \R^2$ defined by $\phi(1) = (1, 0)$ and $\phi(2) = (0, 1)$, so the regression range is $\phi(\mathcal{X}) = \set{(1, 0), (0, 1)}$.
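The construction of $A$ from an input design can be sketched as follows: a design is a sequence of inputs from $\mathcal{X} = \set{1, 2}$, and each input $t$ contributes the row $\phi(t)$. The names `PHI`, `model_matrix`, and `two_sample_design`, as well as the sample sizes and means used, are hypothetical choices for this illustration.

```python
import numpy as np

# Feature function phi: X -> R^2 on the input set X = {1, 2} (population labels).
PHI = {1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0])}

def model_matrix(design):
    """Stack phi(t) as a row of A for each input t in the design sequence."""
    return np.vstack([PHI[t] for t in design])

def two_sample_design(n1, n2):
    """A design taking n1 responses from population 1, then n2 from population 2."""
    return [1] * n1 + [2] * n2

n1, n2 = 3, 2
A = model_matrix(two_sample_design(n1, n2))
alpha = np.array([10.0, 4.0])  # hypothetical means (alpha_1, alpha_2)
print(A)
print(A @ alpha)  # alpha_1 repeated n1 times, then alpha_2 repeated n2 times
```

Reordering the design sequence only permutes the rows of $A$ (and the corresponding entries of $y$), which is why, with uncorrelated errors, the design is determined by the counts $n_1$ and $n_2$.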