A neural network $\nu $ commutes with a neural network $\mu $ if their associated predictors commute as functions.
An autoencoder (or feedforward autoencoder) is a pair of neural networks $((\phi _1, \dots , \phi _k), (\psi _1, \dots , \psi _\ell ))$. If the networks commute and $\dom \phi _1 = \dom \psi _\ell $, we call the autoencoder regular. We call the predictor of the first network the encoder and the predictor of the second network the decoder. We call the image of an input under the encoder an embedding (or feature vector, representation, code).
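As a minimal numerical sketch of these definitions, assuming single-layer linear networks (all names and shapes below are illustrative, not part of the definition):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-layer networks: encoder R^4 -> R^2, decoder R^2 -> R^4.
W1 = rng.normal(size=(2, 4))   # the encoder's only layer, phi_1
W2 = rng.normal(size=(4, 2))   # the decoder's only layer, psi_1

def f(z):
    # Predictor of the first network: the encoder.
    return W1 @ z

def g(x):
    # Predictor of the second network: the decoder.
    return W2 @ x

z = rng.normal(size=4)
embedding = f(z)               # the embedding (feature vector) of z
reconstruction = g(embedding)  # the decoder applied to the embedding
```

Here the domains line up so that $g \circ f$ and $f \circ g$ are both defined, which is the composability that regularity requires.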
Let $(\phi , \psi )$ be regular and let $f: \R ^d \to \R ^k$ be the encoder and $g: \R ^k \to \R ^d$ be the decoder. If $k < d$, we call the autoencoder compressive; otherwise, we call it noncompressive. An autoencoder is perfect if $g \circ f$ is the identity function. Clearly, a compressive autoencoder cannot be perfect: if $g \circ f$ were the identity, $f$ would be a continuous injection $\R ^d \to \R ^k$ with $k < d$, which is impossible.
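The obstruction can be seen concretely in the linear case. A sketch, assuming a compressive linear encoder $W$ with the pseudoinverse as its best linear decoder (the matrices and seeds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 3, 2

# Compressive linear autoencoder: encoder W, decoder pinv(W).
W = rng.normal(size=(k, d))
f = lambda z: W @ z
g = lambda x: np.linalg.pinv(W) @ x

z = rng.normal(size=d)
# g(f(z)) is the orthogonal projection of z onto the row space of W,
# a k-dimensional subspace of R^d, so it differs from z for generic z.
compressive_perfect = np.allclose(g(f(z)), z)

# By contrast, with k = d and an invertible encoder V, the decoder
# V^{-1} makes the (noncompressive) autoencoder perfect.
V = rng.normal(size=(d, d))
noncompressive_perfect = np.allclose(np.linalg.inv(V) @ (V @ z), z)
```

The first flag is false and the second true, matching the claim that only noncompressive autoencoders can be perfect.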
Let us relax this notion of perfection by introducing a similarity function $\ell : \R ^d \times \R ^d \to \R $ (see Similarity Functions). An autoencoder is optimal with respect to $\ell $ if it minimizes $\int_{\R ^d} \ell (g(f(z)), z) \, dz$. This integral may diverge. Even where it converges, an optimal autoencoder need not exist, and when one exists it need not be unique.
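To make the objective tangible, here is a sketch that estimates the reconstruction integral by Monte Carlo, assuming squared Euclidean distance for $\ell$, the linear autoencoder from before, and a restriction of the domain to the compact cube $[-1,1]^d$ (over all of $\R ^d$ this particular integral would diverge); all of these choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 4, 2

# Hypothetical linear autoencoder: encoder W, decoder pinv(W).
W = rng.normal(size=(k, d))
Wp = np.linalg.pinv(W)

def loss(z):
    # ell(g(f(z)), z) with ell the squared Euclidean distance.
    return np.sum((Wp @ (W @ z) - z) ** 2)

# Monte Carlo estimate of the integral restricted to [-1, 1]^d:
# (volume of the cube) times the mean loss over uniform samples.
samples = rng.uniform(-1.0, 1.0, size=(10_000, d))
volume = 2.0 ** d
estimate = volume * np.mean([loss(z) for z in samples])
```

Minimizing such an estimate over a parameterized family is the usual practical surrogate for optimality with respect to $\ell$.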
If we parameterize a family of autoencoders $\set{x_{\theta }}_{\theta \in \Theta }$ by a compact set $\Theta $, ...
It is natural to be interested in compressive autoencoders.