\(\DeclarePairedDelimiterX{\Set}[2]{\{}{\}}{#1 \nonscript\;\delimsize\vert\nonscript\; #2}\) \( \DeclarePairedDelimiter{\set}{\{}{\}}\) \( \DeclarePairedDelimiter{\parens}{\left(}{\right)}\) \(\DeclarePairedDelimiterX{\innerproduct}[1]{\langle}{\rangle}{#1}\) \(\newcommand{\ip}[1]{\innerproduct{#1}}\) \(\newcommand{\bmat}[1]{\left[\hspace{2.0pt}\begin{matrix}#1\end{matrix}\hspace{2.0pt}\right]}\) \(\newcommand{\barray}[1]{\left[\hspace{2.0pt}\begin{matrix}#1\end{matrix}\hspace{2.0pt}\right]}\) \(\newcommand{\mat}[1]{\begin{matrix}#1\end{matrix}}\) \(\newcommand{\pmat}[1]{\begin{pmatrix}#1\end{pmatrix}}\) \(\newcommand{\mathword}[1]{\mathop{\textup{#1}}}\)
Needs:
Cardinality
Empirical Distribution of a Dataset
Probability Measures
Needed by:
None.

Empirical Measure

Why

Given a dataset whose records are elements of the base set of a measurable space, there is a natural probability measure on that space to associate with the dataset.

Definition

The empirical measure for a dataset in some measurable space is the measure which associates to each event the proportion of the records which are elements of that event.
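
As a small illustration (the dataset here is hypothetical, chosen only for this example), take the four records $(1, 2, 2, 5)$ in the real numbers with the usual Borel events. The event $[2, 3]$ contains the second and third records, so the empirical measure assigns it the number

\[ \frac{2}{4} = \frac{1}{2}, \]

while the event $(5, \infty)$ contains no records and is assigned $0$. Repeated records are counted once for each time they appear.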

Notation

Let $(a^1, \dots , a^n)$ be a dataset in a measurable space $(A, \mathcal{A} )$. Let $P: \mathcal{A} \to [0, 1]$ be the probability measure that assigns to each event $B \in \mathcal{A}$ the number

\[ P(B) = \frac{1}{n} \num{\Set{k \in \set{1, \dots , n}}{a^k \in B}}. \]

Then $P$ is the empirical measure of the dataset.
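
The counting formula above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the sheet: the function name `empirical_measure`, the choice to represent an event $B$ by its indicator function (a predicate on records), and the sample dataset are all assumptions made for the example.

```python
from fractions import Fraction
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def empirical_measure(dataset: Sequence[T]) -> Callable[[Callable[[T], bool]], Fraction]:
    """Return a function P with P(event) = proportion of records in the event.

    Assumption of this sketch: an event B is passed as a predicate that
    returns True exactly for the elements of B.
    """
    n = len(dataset)
    if n == 0:
        raise ValueError("the empirical measure needs at least one record")

    def P(event: Callable[[T], bool]) -> Fraction:
        # Count the indices k with a^k in B, then divide by n.
        count = sum(1 for a in dataset if event(a))
        return Fraction(count, n)

    return P


# Hypothetical dataset of four real-valued records.
P = empirical_measure((1, 2, 2, 5))
print(P(lambda a: 2 <= a <= 3))  # 1/2
print(P(lambda a: True))         # 1 (the whole base set)
```

Exact fractions are used instead of floating-point numbers so that the values for disjoint events add up exactly, as they should for a measure.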

Copyright © 2023 The Bourbaki Authors — All rights reserved — Version 13a6779cc