Let $X$ and $Y$ be sets.
A hidden memory chain
(hidden Markov chain,
hidden memory model,
hidden Markov model,
HMM) with
hiddens (or
latents) $X$ and
observations $Y$ of
length $n$ is a joint
distribution $p: X^n \times Y^n \to [0, 1]$
satisfying
\[
p(x, y)
=
f(x_1) g(y_1, x_1) \prod_{i = 2}^{n} h(x_i, x_{i-1}) g(y_i,
x_i).
\]
Clearly, the marginal $p_{1, \dots , n}$ on $X^n$ is a memory chain (see Memory Chains). For this reason, we continue to refer to $h$ as the conditional distribution and to $f$ as the initial distribution. We refer to $g$ as the observation distribution.
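The factorization above can be sketched numerically. The following is a minimal illustration, assuming finite state spaces $X = Y = \{0, 1\}$; the particular values of $f$, $h$, and $g$ are arbitrary choices made for the example, not part of the definition.

```python
import numpy as np
from itertools import product

# Illustrative distributions over X = Y = {0, 1}.
f = np.array([0.6, 0.4])             # initial distribution f(x_1)
h = np.array([[0.7, 0.3],            # conditional h(x_i, x_{i-1}):
              [0.2, 0.8]])           # rows indexed by x_{i-1}, columns by x_i
g = np.array([[0.9, 0.1],            # observation g(y_i, x_i):
              [0.25, 0.75]])         # rows indexed by x_i, columns by y_i

def joint(x, y):
    """p(x, y) = f(x_1) g(y_1, x_1) prod_{i=2}^n h(x_i, x_{i-1}) g(y_i, x_i)."""
    p = f[x[0]] * g[x[0], y[0]]
    for i in range(1, len(x)):
        p *= h[x[i - 1], x[i]] * g[x[i], y[i]]
    return p

# Since p is a joint distribution on X^n x Y^n, summing over all
# sequence pairs of a fixed length n gives 1.
n = 3
total = sum(joint(x, y)
            for x in product(range(2), repeat=n)
            for y in product(range(2), repeat=n))
```

Here `total` comes out to $1$ (up to floating-point error), confirming that the factorization defines a probability distribution whenever $f$, each column of $h$, and each row of $g$ are themselves distributions.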
The word “hidden” refers to the situation in which we observe outcomes $y$, and we hypothesize that they were “generated by” unobserved outcomes $x$.