# Normal Random Function Regressors

# Why

We use a loss function and a predictive
density to construct a regressor on the domain
of the random process.

# Setup

Let $\ell : \R \times \R \to \R $ be a loss
function.
We choose a predictor to minimize the expected
loss under the predictive density.

## Squared error case

Consider $\ell (\alpha , \beta ) = (\alpha -
\beta )^2$.
The minimum squared error
normal random function predictor or
minimum squared error gaussian
process predictor for dataset $(a^1,
\gamma _1), \dots , (a^n, \gamma _n)$ in $A
\times \R $ is the predictor which minimizes
the expected squared error under the predictive
density.
The minimizer of the expected squared error is
the mean of the predictive density; since the
predictive density is normal, this is the
conditional mean.
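
The fact that the mean minimizes the expected
squared error can be checked directly: writing
$p$ for the predictive density and $\mu $ for its
mean,

\[
\int (\gamma - \beta )^2 p(\gamma )\, d\gamma
= \int (\gamma - \mu )^2 p(\gamma )\, d\gamma
+ (\mu - \beta )^2,
\]

since the cross term $2(\mu - \beta ) \int
(\gamma - \mu ) p(\gamma )\, d\gamma $ vanishes;
the minimum is at $\beta = \mu $.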

## Absolute error case

Consider $\ell (\alpha , \beta ) = \abs{\alpha -
\beta }$.
The minimum absolute deviation
normal random function predictor or
minimum absolute deviation
gaussian process predictor for dataset
$(a^1, \gamma _1), \dots , (a^n, \gamma _n)$ in
$A \times \R $ is the predictor which minimizes
the expected absolute deviation under the
predictive density.
For any density, the minimizer of the expected
absolute deviation is a median.
Since the predictive density is normal, and so
symmetric, the median is the conditional mean.
In other words, the minimum absolute deviation
normal random function predictor coincides with
the minimum squared error normal random function
predictor.
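
The claim that a median minimizes the expected
absolute deviation can be checked by
differentiating under the integral: writing $p$
for the predictive density,

\[
\frac{d}{d\beta } \int \abs{\gamma - \beta }\,
p(\gamma )\, d\gamma
= \int _{-\infty }^{\beta } p(\gamma )\, d\gamma
- \int _{\beta }^{\infty } p(\gamma )\, d\gamma ,
\]

which vanishes exactly when $\beta $ splits the
mass of $p$ in half.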

# Definition

For this reason, the normal
random function predictor or
gaussian process predictor
for dataset $(a^1, \gamma _1), \dots , (a^n,
\gamma _n)$ in $A \times \R $ is $h: A \to \R $
defined by

\[
h(x) = m(x) + \pmat{k(x,a^1) \cdots k(x, a^n)}\invp{\Sigma _{a}
+ \Sigma _{e}}(\gamma - m_{a}).
\]

In other words, $h$ is the regressor which
assigns to each point its conditional mean.
Notice that $h$ is an affine function of
$\gamma $.
If the mean function $m \equiv 0$ then $h$ is
linear in $\gamma $.
This is sometimes called a
linear estimator.
Alternatively, notice (in the zero mean setting)
that $h$ is a linear combination of the $n$
kernel functions $k(\cdot , a^i)$ for $i = 1,
\dots , n$.
Specifically, $h(x) = \sum _{i=1}^{n} c_i k(x,
a^i)$ where $c = \invp{\Sigma _{a} +
\Sigma _{e}}\gamma $.
The process of using a normal random function
predictor is often called
Gaussian process regression
or (especially in spatial statistics)
kriging.
The upside is that a gaussian process predictor
interpolates the data (exactly so when the noise
covariance $\Sigma _{e}$ is zero), is as smooth
as the kernel, and the so-called variance
increases with the distance from the data.
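
These properties can be checked numerically in a
small sketch. The squared exponential kernel is
again an assumption of mine, and the variance
computed below is the usual conditional variance
$k(x, x) - k_x^\top \invp{\Sigma _{a}} k_x$ in
the zero-noise setting.

```python
import numpy as np

def k(x, y, length_scale=1.0):
    # Assumed squared exponential kernel.
    return np.exp(-0.5 * ((x - y) / length_scale) ** 2)

a = np.array([0.0, 1.0, 2.0])        # data sites
gamma = np.array([0.3, -0.1, 0.7])   # observations
Sigma_a = k(a[:, None], a[None, :])
# Zero noise, plus a tiny jitter for numerical stability.
M = np.linalg.inv(Sigma_a + 1e-10 * np.eye(len(a)))

def h(x):
    # Conditional mean: interpolates the data when the noise is zero.
    return k(x, a) @ M @ gamma

def v(x):
    # Conditional variance: zero at the data, increasing away from it.
    kx = k(x, a)
    return k(x, x) - kx @ M @ kx

print([round(h(t), 6) for t in a])   # recovers gamma at the data sites
print(v(1.0), v(5.0))                # near 0 at a data site, larger far away
```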