# Normal Random Function Regressors

# Why

We use a loss function and a predictive
density to construct a regressor on the domain
of the random process.

# Setup

Let $\ell : \R \times \R \to \R $ be a loss
function.
We choose a predictor to minimize the expected
loss under the predictive density.

## Squared error case

Consider $\ell (\alpha , \beta ) = (\alpha -
\beta )^2$.
The minimum squared error
normal random function predictor or
minimum squared error gaussian
process predictor for dataset $(a^1,
\gamma _1), \dots , (a^n, \gamma _n)$ in $A
\times \R $ is the predictor which minimizes
the expected squared error under the predictive
density.
The minimizer of the expected squared error is
the mean of the predictive density; since the
predictive density is normal, this is the
conditional mean.
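
The fact that the mean minimizes the expected
squared error can be checked directly: writing
$p$ for the predictive density and $\mu $ for its
mean,

\[
\int (\gamma - \beta )^2 p(\gamma )\, d\gamma
= \int (\gamma - \mu )^2 p(\gamma )\, d\gamma
+ (\mu - \beta )^2,
\]

since the cross term $2(\mu - \beta ) \int
(\gamma - \mu ) p(\gamma )\, d\gamma $ vanishes;
the minimum is at $\beta = \mu $.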

## Absolute error case

Consider $\ell (\alpha , \beta ) = \abs{\alpha -
\beta }$.
The minimum absolute deviation
normal random function predictor or
minimum absolute deviation
gaussian process predictor for dataset
$(a^1, \gamma _1), \dots , (a^n, \gamma _n)$ in
$A \times \R $ is the predictor which minimizes
the expected absolute deviation under the
predictive density.
For any density, the minimizer of the expected
absolute deviation is a median.
Since the predictive density is normal, and so
symmetric, the median is the conditional mean.
In other words, the minimum absolute deviation
normal random function predictor coincides with
the minimum squared error normal random function
predictor.
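
The claim that a median minimizes the expected
absolute deviation can be checked by
differentiating under the integral: writing $p$
for the predictive density,

\[
\frac{d}{d\beta } \int \abs{\gamma - \beta }\,
p(\gamma )\, d\gamma
= \int _{-\infty }^{\beta } p(\gamma )\, d\gamma
- \int _{\beta }^{\infty } p(\gamma )\, d\gamma ,
\]

which vanishes exactly when $\beta $ splits the
mass of $p$ in half.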

# Definition

For this reason, the normal
random function predictor or
gaussian process predictor
for dataset $(a^1, \gamma _1), \dots , (a^n,
\gamma _n)$ in $A \times \R $ is $h: A \to \R $
defined by

\[
h(x) = m(x) + \pmat{k(x,a^1) \cdots k(x, a^n)}\invp{\Sigma _{a}
+ \Sigma _{e}}(\gamma - m_{a}).
\]

In other words, $h$ is the regressor which
assigns to each point its conditional mean.
Notice that $h$ is an affine function of
$\gamma $.
If the mean function $m \equiv 0$ then $h$ is
linear in $\gamma $.
This is sometimes called a
linear estimator.
Alternatively, notice (in the zero mean setting)
that $h$ is a linear combination of the $n$
kernel functions $k(\cdot , a^i)$ for $i = 1,
\dots , n$.
Specifically, $h(x) = \sum _{i=1}^{n} c_i k(x,
a^i)$ where $c = \invp{\Sigma _{a} +
\Sigma _{e}}\gamma $.
The process of using a normal random function
predictor is often called
Gaussian process regression
or (especially in spatial statistics)
kriging.
The upside is that a gaussian process predictor
interpolates the data (exactly so when the noise
covariance $\Sigma _{e}$ is zero), is as smooth
as the kernel, and the so-called variance
increases with the distance from the data.
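
These properties can be checked numerically in a
small sketch. The squared exponential kernel is
again an assumption of mine, and the variance
computed below is the usual conditional variance
$k(x, x) - k_x^\top \invp{\Sigma _{a}} k_x$ in
the zero-noise setting.

```python
import numpy as np

def k(x, y, length_scale=1.0):
    # Assumed squared exponential kernel.
    return np.exp(-0.5 * ((x - y) / length_scale) ** 2)

a = np.array([0.0, 1.0, 2.0])        # data sites
gamma = np.array([0.3, -0.1, 0.7])   # observations
Sigma_a = k(a[:, None], a[None, :])
# Zero noise, plus a tiny jitter for numerical stability.
M = np.linalg.inv(Sigma_a + 1e-10 * np.eye(len(a)))

def h(x):
    # Conditional mean: interpolates the data when the noise is zero.
    return k(x, a) @ M @ gamma

def v(x):
    # Conditional variance: zero at the data, increasing away from it.
    kx = k(x, a)
    return k(x, x) - kx @ M @ kx

print([round(h(t), 6) for t in a])   # recovers gamma at the data sites
print(v(1.0), v(5.0))                # near 0 at a data site, larger far away
```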