Empirical Distribution of a Dataset

Why

A natural way to associate a distribution with a dataset is to assign to each outcome a probability proportional to the number of times it appears in the dataset.

Definition

Given a dataset $x_1, \dots , x_n$ in a finite set $X$, the empirical distribution of the dataset is the function $q: X \to \R $ which associates each outcome with the proportion of times it appears in the dataset. In other words, $q$ is defined by

\[ q(a) = \frac{1}{n} \num{\Set*{k \in \set{1, \dots , n}}{x_k = a}}. \]
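
As a small illustration (the particular set and dataset here are assumed for the example, not taken from elsewhere), let $X = \set{1, 2, 3, 4}$ and consider the dataset $(1, 2, 1, 3)$, so that $n = 4$. Counting occurrences gives

\[ q(1) = \frac{2}{4} = \frac{1}{2}, \quad q(2) = \frac{1}{4}, \quad q(3) = \frac{1}{4}, \quad q(4) = 0. \]

An outcome which never appears in the dataset, such as $4$ here, receives probability zero.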

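The function $q$ is a distribution: each value $q(a)$ is nonnegative, and since every index $k \in \set{1, \dots , n}$ is counted for exactly one outcome $a \in X$, the values of $q$ sum to $n/n = 1$.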
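
The definition translates directly into a short computation. The following is a minimal sketch in Python, assuming the dataset is given as a list and the finite set $X$ as a collection of hashable outcomes; the function name `empirical_distribution` and its parameters are hypothetical, chosen only for this illustration. Exact fractions are used so that the proportions sum to exactly one.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(data, outcomes):
    """Return a dict mapping each outcome a in `outcomes` to the
    proportion of entries of `data` equal to a.

    `data` plays the role of x_1, ..., x_n and `outcomes` the role of
    the finite set X (both names are assumptions for this sketch).
    """
    n = len(data)
    if n == 0:
        raise ValueError("the dataset must contain at least one entry")
    outcome_set = set(outcomes)
    if any(x not in outcome_set for x in data):
        raise ValueError("every entry of the dataset must lie in outcomes")
    # Counter returns 0 for outcomes that never appear in the dataset.
    counts = Counter(data)
    return {a: Fraction(counts[a], n) for a in outcome_set}


# Example with X = {1, 2, 3, 4} and the dataset (1, 2, 1, 3).
q = empirical_distribution([1, 2, 1, 3], [1, 2, 3, 4])
total = sum(q.values())
```

Here `q[1]` is the proportion of entries equal to `1`, and `total` is the sum of all proportions, which equals one for any nonempty dataset.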