Since our predictions are often uncertain, we can use the language of probability distributions to characterize them.
Denote the set of probability distributions on a set $X$ by $\Delta(X)$.
A probabilistic classifier $G: A \to \Delta(B)$ is a function from inputs $A$ to probability distributions over the classes $B$.
Given an input $a$, the prediction of $G$ on $a$ is a probability distribution $\hat{p}_a = G(a)$ on $B$.
Given a point classifier $f: A \to B$, we can
define a probabilistic classifier $G: A \to
\Delta(B)$ corresponding to $f$ by
\[
\hat{p}_a(b) =
\begin{cases}
1 & \text{ if } f(a) = b \\
0 & \text{ otherwise.} \\
\end{cases}
\]
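As a minimal sketch of this construction, assume $B$ is finite and given as a Python list, and that a distribution on $B$ is represented as a dictionary from classes to probabilities. The names `point_to_probabilistic`, `parity`, and `classes` are illustrative and do not come from the text.

```python
from typing import Callable, Dict, Hashable, Sequence


def point_to_probabilistic(
    f: Callable[[object], Hashable],
    classes: Sequence[Hashable],
) -> Callable[[object], Dict[Hashable, float]]:
    """Lift a point classifier f: A -> B to a classifier G: A -> Delta(B)."""

    def G(a: object) -> Dict[Hashable, float]:
        predicted = f(a)
        # All probability mass is placed on the single class that f predicts.
        return {b: 1.0 if b == predicted else 0.0 for b in classes}

    return G


# Hypothetical toy point classifier on the integers.
def parity(n: int) -> str:
    return "even" if n % 2 == 0 else "odd"


G = point_to_probabilistic(parity, ["even", "odd"])
p_hat = G(3)  # distribution concentrated on "odd"
```

The result is a valid distribution on $B$ only when $f(a)$ lies in `classes`, which this sketch assumes rather than checks.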
On the other hand, given a probabilistic
classifier $G: A \to \Delta(B)$, we can define
a point classifier $f: A \to B$ by
\[
f(a) = \underset{b \in B}{\text{argmax}} \; \hat{p}_a(b).
\]
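The argmax rule can be sketched as follows, again representing a distribution on $B$ as a dictionary from classes to probabilities. The names `probabilistic_to_point` and `toy_G` are illustrative. The argmax need not be unique; this sketch assumes ties are broken in favor of the class that appears first in the dictionary, which is one possible convention and not one the text specifies.

```python
from typing import Callable, Dict, Hashable


def probabilistic_to_point(
    G: Callable[[object], Dict[Hashable, float]],
) -> Callable[[object], Hashable]:
    """Turn a probabilistic classifier G: A -> Delta(B) into f: A -> B."""

    def f(a: object) -> Hashable:
        p_hat = G(a)
        # Pick the class with the largest predicted probability.
        # Python's max returns the first maximal key when there are ties.
        return max(p_hat, key=p_hat.get)

    return f


# Hypothetical probabilistic classifier that ignores its input.
def toy_G(a: object) -> Dict[str, float]:
    return {"cat": 0.2, "dog": 0.7, "bird": 0.1}


f = probabilistic_to_point(toy_G)
label = f("any input")
```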
We can extend this idea and define a list classifier, which outputs the classes in $B$ sorted by their predicted probability $\hat{p}_a(b)$, from largest to smallest.
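A list classifier of this kind might be sketched as below, with a distribution on $B$ represented as a dictionary from classes to probabilities. The names `probabilistic_to_list` and `toy_G` are illustrative. How classes with equal probability are ordered is an assumption of the sketch: they keep their dictionary order.

```python
from typing import Callable, Dict, Hashable, List


def probabilistic_to_list(
    G: Callable[[object], Dict[Hashable, float]],
) -> Callable[[object], List[Hashable]]:
    """Turn G: A -> Delta(B) into a classifier returning a ranked list of classes."""

    def h(a: object) -> List[Hashable]:
        p_hat = G(a)
        # Sort classes by predicted probability, largest first.
        # sorted is stable, so tied classes keep their original order.
        return sorted(p_hat, key=p_hat.get, reverse=True)

    return h


# Hypothetical probabilistic classifier that ignores its input.
def toy_G(a: object) -> Dict[str, float]:
    return {"cat": 0.2, "dog": 0.7, "bird": 0.1}


h = probabilistic_to_list(toy_G)
ranking = h("any input")
```

The first entry of the ranked list is the class with the largest predicted probability, so under this tie-breaking convention it coincides with the argmax point classifier.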