Needs:
Normal Densities
Maximum Likelihood Densities
Partial Derivatives
Needed by:
Maximum Likelihood Multivariate Normals

Maximum Likelihood Normals

Why

We want to select a normal density that summarizes a dataset well.

Formulation

Let $D = (x^1, \dots , x^n)$ be a dataset in $\R $. We want to select a density from among the normal densities, each of which is specified by a mean and a variance.

Following the principle of maximum likelihood, we want to solve

\[ \begin{array}{ll} \underset{\mu , \sigma \in \R }{\text{maximize}} & \displaystyle \prod_{k = 1}^{n} \frac{1}{\sqrt{2\pi \sigma ^2}} \exp\left( -\frac{(x^k - \mu )^2}{2\sigma ^2} \right) \\[1ex] \text{subject to} & \sigma > 0 \end{array} \]

We call a solution to the above problem a maximum likelihood normal density with respect to the dataset.
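As a sanity check of this formulation, the objective can be evaluated directly. The sketch below (plain Python, with a small hypothetical dataset) maximizes the likelihood by brute-force grid search over $(\mu, \sigma^2)$; the best grid point lands at the sample mean and near the biased sample variance, matching the closed form given in the Solution section.

```python
import math

def likelihood(data, mu, var):
    # product of normal densities: the objective of the maximization problem
    p = 1.0
    for x in data:
        p *= math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return p

# hypothetical dataset, chosen only for illustration
data = [1.0, 2.0, 4.0, 7.0]

# brute-force grid search over (mu, sigma^2) with sigma^2 > 0
candidates = ((m / 10, v / 10) for m in range(-50, 101) for v in range(1, 101))
best_mu, best_var = max(candidates, key=lambda p: likelihood(data, p[0], p[1]))

assert abs(best_mu - 3.5) < 1e-9    # the sample mean
assert abs(best_var - 5.25) < 0.06  # within one grid step of the biased sample variance
```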

Solution

Let $(x^1, \dots , x^n)$ be a dataset in $\R $. Let $f$ be a normal density with mean

\[ \frac{1}{n} \sum_{k = 1}^{n} x^k \]

and variance

\[ \frac{1}{n} \sum_{k = 1}^{n} \left(x^k - \frac{1}{n} \sum_{j = 1}^{n} x^j\right)^2. \]

Then $f$ is a maximum likelihood normal density.
Every normal density has two parameters: the mean and the variance. Since the logarithm is increasing, if the likelihood of one normal density is less than or equal to that of another, then the same ordering holds for their log likelihoods; so it suffices to maximize the log likelihood. Let $f$ be a normal density with parameters $\mu $ and $\sigma ^2$. We express the log likelihood $\ell $ of $f$ by

\[ \ell (\mu , \sigma ^2) = \sum_{k = 1}^{n} \left( -\frac{1}{2\sigma ^2}(x^k - \mu )^2 - \frac{1}{2}\log 2\pi \sigma ^2\right) \]
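As an illustration (plain Python, with a hypothetical dataset): since the logarithm of a product is the sum of the logarithms, the log likelihood can be computed term by term, avoiding the numerical underflow that the raw product suffers on large datasets.

```python
import math

def normal_logpdf(x, mu, var):
    # logarithm of the normal density with mean mu and variance var
    return -0.5 * (x - mu) ** 2 / var - 0.5 * math.log(2 * math.pi * var)

def log_likelihood(data, mu, var):
    # ell(mu, sigma^2): a sum of logs rather than a log of a product
    return sum(normal_logpdf(x, mu, var) for x in data)

data = [1.0, 2.0, 4.0, 7.0]  # hypothetical dataset
# the log of the product of densities equals the sum of the log densities
direct = math.log(math.prod(math.exp(normal_logpdf(x, 3.0, 2.0)) for x in data))
assert abs(direct - log_likelihood(data, 3.0, 2.0)) < 1e-9
```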

The partial derivative of the log likelihood with respect to the mean $(\partial_{\mu } \ell ): \R ^2 \to \R $ is

\[ (\partial_\mu \ell )(\mu , \sigma ^2) = \sum_{k = 1}^{n} \frac{1}{\sigma ^2}(x^k - \mu ) \]

and with respect to the variance $(\partial_{\sigma ^2} \ell ): \R ^2 \to \R $ is

\[ (\partial_{\sigma ^2} \ell )(\mu , \sigma ^2) = \left(\frac{1}{2(\sigma ^2)^{2}}\sum_{k = 1}^{n}(x^k - \mu )^2\right) - \frac{n}{2\sigma ^2} \]
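These two closed-form derivatives can be checked numerically against central finite differences of the log likelihood. A minimal sketch, assuming a small hypothetical dataset:

```python
import math

def log_likelihood(data, mu, var):
    return sum(-0.5 * (x - mu) ** 2 / var - 0.5 * math.log(2 * math.pi * var)
               for x in data)

def d_mu(data, mu, var):
    # closed-form partial derivative with respect to the mean
    return sum((x - mu) / var for x in data)

def d_var(data, mu, var):
    # closed-form partial derivative with respect to the variance
    s = sum((x - mu) ** 2 for x in data)
    return s / (2 * var ** 2) - len(data) / (2 * var)

data = [1.0, 2.0, 4.0, 7.0]  # hypothetical dataset
h = 1e-6
# central finite differences approximate the partial derivatives
fd_mu = (log_likelihood(data, 3.0 + h, 2.0) - log_likelihood(data, 3.0 - h, 2.0)) / (2 * h)
fd_var = (log_likelihood(data, 3.0, 2.0 + h) - log_likelihood(data, 3.0, 2.0 - h)) / (2 * h)
assert abs(fd_mu - d_mu(data, 3.0, 2.0)) < 1e-4
assert abs(fd_var - d_var(data, 3.0, 2.0)) < 1e-4
```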

We are interested in finding $\mu _0 \in \R $ and $\sigma ^2_0 > 0$ at which $\partial_\mu \ell (\mu _0, \sigma ^2_0) = 0$ and $\partial_{\sigma ^2} \ell (\mu _0, \sigma ^2_0) = 0$. So we have two equations. First, notice that for any $\sigma ^2 > 0$, $\partial_\mu \ell $ is zero if and only if its first argument (the mean) is $\frac{1}{n} \sum_{k = 1}^{n} x^k$. Second, notice that for all $\mu $, $\partial_{\sigma ^2}\ell $ is zero if and only if

\[ \sigma ^2 = \frac{1}{n} \sum_{k = 1}^{n} (x^k - \mu )^2. \]

So the pair

\[ \left(\frac{1}{n}\sum_{k = 1}^{n} x^k,\; \frac{1}{n} \sum_{k = 1}^{n} \left(x^k - \frac{1}{n} \sum_{j = 1}^{n} x^j\right)^2\right) \]

is a stationary point of $\ell $.
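The stationary-point claim can be verified directly: plugging the sample mean and the biased (divide-by-$n$) sample variance into the two partial derivatives gives zero. A sketch, again with a hypothetical dataset:

```python
def mle_params(data):
    # closed-form stationary point: sample mean and biased (divide-by-n) variance
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var

data = [1.0, 2.0, 4.0, 7.0]  # hypothetical dataset
mu0, var0 = mle_params(data)

# both partial derivatives of the log likelihood vanish at (mu0, var0)
g_mu = sum((x - mu0) / var0 for x in data)
g_var = sum((x - mu0) ** 2 for x in data) / (2 * var0 ** 2) - len(data) / (2 * var0)
assert abs(g_mu) < 1e-9 and abs(g_var) < 1e-9
```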
Copyright © 2023 The Bourbaki Authors — All rights reserved — Version 13a6779cc