\(\DeclarePairedDelimiterX{\Set}[2]{\{}{\}}{#1 \nonscript\;\delimsize\vert\nonscript\; #2}\) \( \DeclarePairedDelimiter{\set}{\{}{\}}\) \( \DeclarePairedDelimiter{\parens}{\left(}{\right)}\) \(\DeclarePairedDelimiterX{\innerproduct}[1]{\langle}{\rangle}{#1}\) \(\newcommand{\ip}[1]{\innerproduct{#1}}\) \(\newcommand{\bmat}[1]{\left[\hspace{2.0pt}\begin{matrix}#1\end{matrix}\hspace{2.0pt}\right]}\) \(\newcommand{\barray}[1]{\left[\hspace{2.0pt}\begin{matrix}#1\end{matrix}\hspace{2.0pt}\right]}\) \(\newcommand{\mat}[1]{\begin{matrix}#1\end{matrix}}\) \(\newcommand{\pmat}[1]{\begin{pmatrix}#1\end{pmatrix}}\) \(\newcommand{\mathword}[1]{\mathop{\textup{#1}}}\)
Needs:
Outcome Probabilities
Set Numbers
Needed by:
Distribution Approximators
Empirical Measure

Empirical Distribution of a Dataset

Why

A natural distribution to associate with a dataset assigns to each outcome a probability equal to the proportion of times that outcome appears in the dataset.

Definition

Given a dataset $x_1, \dots , x_n$ in a finite set $X$, the empirical distribution of the dataset is the function $q: X \to \R $ which associates each outcome with the proportion of times it appears in the dataset. In other words, $q$ is defined by

\[ q(a) = \frac{1}{n} \num{\Set*{k \in \set{1, \dots , n}}{x_k = a}}. \]
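
As a small illustration (the particular set and dataset are chosen only for this example), take $X = \set{1, 2, 3}$ and the dataset $x_1 = 1$, $x_2 = 3$, $x_3 = 1$, $x_4 = 1$, so that $n = 4$. The outcome $1$ appears three times, the outcome $3$ appears once, and the outcome $2$ does not appear, so

\[ q(1) = \frac{3}{4}, \quad q(2) = \frac{0}{4} = 0, \quad q(3) = \frac{1}{4}. \]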

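The function $q$ is a distribution. Each value $q(a)$ is nonnegative, and since each index $k \in \set{1, \dots , n}$ is counted for exactly one outcome (namely $a = x_k$), the counts sum to $n$ and so $\sum_{a \in X} q(a) = n/n = 1$.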
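
The definition can be sketched in code as follows. This is a minimal illustration rather than a reference implementation: the function name `empirical_distribution` and its arguments `dataset` and `outcomes` are hypothetical names introduced here, the dataset is assumed to be nonempty, and exact fractions are used so that the sum-to-one property can be checked without rounding error.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(dataset, outcomes):
    """Return q as a dict mapping each outcome a in `outcomes` to the
    proportion of entries of `dataset` equal to a.

    `dataset` plays the role of x_1, ..., x_n and `outcomes` the role of
    the finite set X (both names are assumptions made for this sketch).
    """
    n = len(dataset)
    if n == 0:
        raise ValueError("dataset must be nonempty")

    # Count how many indices k satisfy x_k = a, for each a that appears.
    counts = Counter(dataset)

    # Every entry of the dataset must belong to X.
    unknown = set(counts) - set(outcomes)
    if unknown:
        raise ValueError(f"dataset contains values outside X: {unknown}")

    # Outcomes that never appear have count 0, hence probability 0.
    return {a: Fraction(counts[a], n) for a in outcomes}


q = empirical_distribution([1, 3, 1, 1], [1, 2, 3])
```

Here `q[1]` is `Fraction(3, 4)`, `q[2]` is zero, and `q[3]` is `Fraction(1, 4)`. Returning a value for every element of `outcomes`, including those that never appear in the dataset, keeps the result a function on all of $X$ as in the definition.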

Copyright © 2023 The Bourbaki Authors — All rights reserved — Version 13a6779cc