Needs:
Normal Linear Model Regressors
Feature Maps
Needed by:
None.

Featurized Probabilistic Linear Models

Why

A linear model can only represent functions that are linear in the raw inputs. Embedding the dataset with a feature map lets the same machinery fit nonlinear functions of the inputs, since the model remains linear in the features.

Definition

Let $(x: \Omega \to \R ^d, A \in \R ^{n \times d}, e: \Omega \to \R ^n)$ be a probabilistic linear model over the probability space $(\Omega , \mathcal{A} , \mathsf{P} )$. Let $\phi : \R ^d \to \R ^{d'}$ be a feature map.

We call the sequence $(x, A, e, \phi )$ a featurized probabilistic linear model (also embedded probabilistic linear model). We interpret the model as a random field $h: \Omega \to (\R ^d \to \R )$ which is a linear function of the features:

\[ h_{\omega }(a) = \transpose{\phi (a)}x(\omega ). \]

Denote the data matrix of the embedded feature vectors by $\phi (A)$. In other words, $\phi (A) \in \R ^{n \times d'}$ is the matrix whose $i$th row is the feature vector $\transpose{\phi (a_i)}$ of the $i$th datapoint $a_i$. Then $(x, A, e, \phi )$ corresponds to the probabilistic linear model $(x, \phi (A), e)$.
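As a concrete illustration (not from the source), here is a NumPy sketch of a feature map and of assembling $\phi(A)$ row by row, using a hypothetical polynomial embedding for $d = 1$; the names `phi` and `phi_of_A` are ours:

```python
import numpy as np

def phi(a, degree=3):
    """Hypothetical polynomial feature map R -> R^{degree+1}:
    phi(a) = (1, a, a^2, ..., a^degree)."""
    return np.array([a**k for k in range(degree + 1)])

def phi_of_A(A, degree=3):
    """Apply phi row-wise: the i-th row of phi(A) is phi(a_i)^T."""
    return np.vstack([phi(a, degree) for a in A])

A = np.array([0.0, 0.5, 1.0])   # n = 3 scalar datapoints (d = 1)
Phi = phi_of_A(A)               # phi(A), shape (n, d') = (3, 4)
```

Any other embedding (trigonometric, radial basis, learned features) slots in the same way; only `phi` changes.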

Normal case

In the normal (Gaussian) case, the parameter posterior $g_{x \mid y}(\cdot , \gamma )$ is a normal density with mean

\[ \Sigma _{x}\transpose{\phi (A)}\inv{(\phi (A)\Sigma _{x}\transpose{\phi (A)} + \Sigma _{e})} \gamma \]

and covariance

\[ \inv{(\inv{\Sigma _{x}} + \transpose{\phi (A)}\inv{\Sigma _{e}}\phi (A))}. \]
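These two formulas can be sketched directly in NumPy; the toy values for $\phi(A)$, $\Sigma_x$, $\Sigma_e$, and $\gamma$ below are assumptions for illustration only:

```python
import numpy as np

# Hypothetical toy problem: n = 3 observations, d' = 2 features.
Phi = np.array([[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])  # phi(A), n x d'
Sigma_x = np.eye(2)                 # prior covariance of x
Sigma_e = 0.1 * np.eye(3)           # noise covariance
gamma = np.array([0.1, 0.6, 1.1])   # observed data

# Posterior mean: Sigma_x phi(A)^T (phi(A) Sigma_x phi(A)^T + Sigma_e)^{-1} gamma
S = Phi @ Sigma_x @ Phi.T + Sigma_e
post_mean = Sigma_x @ Phi.T @ np.linalg.solve(S, gamma)

# Posterior covariance: (Sigma_x^{-1} + phi(A)^T Sigma_e^{-1} phi(A))^{-1}
post_cov = np.linalg.inv(np.linalg.inv(Sigma_x)
                         + Phi.T @ np.linalg.inv(Sigma_e) @ Phi)
```

By the Woodbury identity the covariance equals $\Sigma_x - \Sigma_x \transpose{\phi(A)} \inv{(\phi(A)\Sigma_x\transpose{\phi(A)} + \Sigma_e)} \phi(A)\Sigma_x$, which is preferable when $d' > n$ since it inverts an $n \times n$ matrix instead of a $d' \times d'$ one.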

The predictive density for $a \in \R ^d$ is normal with mean

\[ \transpose{\phi (a)}\Sigma _{x}\transpose{\phi (A)}\inv{(\phi (A)\Sigma _{x}\transpose{\phi (A)} + \Sigma _{e})}\gamma \]

and covariance

\[ \transpose{\phi (a)}\Sigma _{x}\phi (a) - \transpose{\phi (a)}\Sigma _{x}\transpose{\phi (A)}\inv{(\phi (A)\Sigma _{x}\transpose{\phi (A)} + \Sigma _e)}\phi (A)\Sigma _{x}\phi (a). \]

So the featurized linear regressor is the predictor $h: \R ^d \to \R $ defined by

\[ h(a) = \transpose{\phi (a)}\Sigma _{x}\transpose{\phi (A)}\inv{(\phi (A)\Sigma _{x}\transpose{\phi (A)} + \Sigma _{e})}\gamma . \]
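A minimal sketch of this predictor in NumPy, assuming the data $\gamma$ and the covariances are given; the function name `featurized_regressor` is ours, not from the source:

```python
import numpy as np

def featurized_regressor(Phi, Sigma_x, Sigma_e, gamma):
    """Return h(a) = phi(a)^T Sigma_x phi(A)^T
    (phi(A) Sigma_x phi(A)^T + Sigma_e)^{-1} gamma,
    as a function of the feature vector phi(a)."""
    S = Phi @ Sigma_x @ Phi.T + Sigma_e
    # w is the posterior mean of x; h is linear in the features.
    w = Sigma_x @ Phi.T @ np.linalg.solve(S, gamma)
    def h(phi_a):
        return phi_a @ w
    return h
```

Note that the expensive solve happens once, when the regressor is built; each prediction is then a single inner product with the posterior mean, matching the formula above.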

Copyright © 2023 The Bourbaki Authors — All rights reserved — Version 13a6779cc