Derivation of GMM

Derivation for a single-variable Gaussian distribution

Consider a generative model: let \(P\) denote the generated probability distribution, \(\{x_i\}_{i=1}^n\) denote the data set, and \(\mu\) and \(\sigma\) denote the parameters of the generated Gaussian distribution. We will use maximum-likelihood estimation to find the parameters of the generated distribution.

Independence Assumption: \(x_i\perp x_j\ \forall i \neq j \ \mbox{s.t.} \ 1\le i,j\le n\)

\[ \begin{align} P(\{x_i\}_{i=1}^n\mid \mu, \sigma) & = \prod_{i=1}^n P(x_i \mid \mu, \sigma ) \\ \mbox{[Take Log]} \quad \log P(\{x_i\}_{i=1}^n\mid \mu, \sigma) & = \sum_{i=1}^n\log{P(x_i \mid \mu, \sigma )} \\ \mbox{[Substitute]}& = \sum_{i=1}^n\log{\left(\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}\right)} \\ \mbox{[Decompose]}& = \sum_{i=1}^n\left\{-\log{(\sqrt{2\pi}\sigma)}-{\frac{(x_i-\mu)^2}{2\sigma^2}}\right\} \\ \end{align} \]

Since \(\log\) is monotonically increasing, maximizing the likelihood is equivalent to minimizing the negative log-likelihood:

\[ \therefore \mathop{\mbox{argmax}}_{\mu,\sigma}{P(\{x_i\}_{i=1}^n\mid \mu, \sigma)} = \mathop{\mbox{argmin}}_{\mu,\sigma}{\sum_{i=1}^n\left\{\log{(\sqrt{2\pi}\sigma)}+{\frac{(x_i-\mu)^2}{2\sigma^2}}\right\}} \]

\[ \begin{align} &\mbox{Let } L = \sum_{i=1}^n\left\{\log{(\sqrt{2\pi}\sigma)}+{\frac{(x_i-\mu)^2}{2\sigma^2}}\right\} \\ \therefore \ & \frac{\partial L}{\partial \mu} = \sum_{i=1}^n\left(-\frac{x_i-\mu}{\sigma^2}\right) \\ &\frac{\partial L}{\partial \sigma} = \sum_{i=1}^n{\left(\frac{1}{\sigma}-\frac{(x_i-\mu)^2}{\sigma^3}\right)} \\ \\ & \mbox{Set the partial derivatives equal to zero:} \\ \therefore \ & \hat{\mu} = \frac{1}{n}\sum_{i=1}^n{x_i} \\ & \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n{(x_i-\hat{\mu})^2} \end{align} \]
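As a sanity check, the closed-form estimators \(\hat{\mu}\) and \(\hat{\sigma}^2\) are easy to verify numerically; a minimal sketch with NumPy (the sample below is synthetic, not data from the text):

```python
import numpy as np

# Synthetic data set {x_i}_{i=1}^n drawn from a known Gaussian
x = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=1000)

n = x.size
mu_hat = x.sum() / n                         # \hat{mu} = (1/n) sum_i x_i
sigma2_hat = ((x - mu_hat) ** 2).sum() / n   # \hat{sigma}^2 = (1/n) sum_i (x_i - \hat{mu})^2

print(mu_hat, sigma2_hat)  # should be close to 2.0 and 1.5^2 = 2.25
```

Note that \(\hat{\sigma}^2\) is the biased maximum-likelihood estimator: it divides by \(n\), which matches `np.var(x)` with its default `ddof=0`, not the unbiased \(n-1\) version.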

Result for a multivariate Gaussian distribution

Multivariate Gaussian probability density function, where \(D\) is the dimension of \(\boldsymbol{x}\), \(\boldsymbol{\mu}\) is the mean vector, and \(\Sigma\) is the covariance matrix:

\[ g(\boldsymbol{x}) = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}}\exp\left\{ -\frac{1}{2} (\boldsymbol{x}- \boldsymbol{\mu})^\text{T}\Sigma^{-1}(\boldsymbol{x}- \boldsymbol{\mu})\right\} \]
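This formula can be transcribed directly and cross-checked against SciPy; a minimal sketch, where `gaussian_pdf` is a hypothetical helper name and the inputs are made-up examples:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Direct transcription of g(x) above; a Cholesky factorization
    would be preferred for numerical stability in practice."""
    D = x.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-mu)^T Sigma^{-1} (x-mu)
    return np.exp(-0.5 * quad) / norm

# Cross-check against SciPy on illustrative inputs
x, mu = np.array([0.5, -1.0]), np.zeros(2)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
print(gaussian_pdf(x, mu, Sigma),
      multivariate_normal.pdf(x, mean=mu, cov=Sigma))  # should match
```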

Result of the EM algorithm

Gaussian Mixture

\[ p(\boldsymbol{x}) = \sum_{k=1}^K {w_k g_k(\boldsymbol{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}, \quad w_k \ge 0, \ \sum_{k=1}^K w_k = 1 \]
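Evaluating the mixture density is just a weighted sum of component densities; a minimal sketch using SciPy's `multivariate_normal`, with made-up parameters for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_pdf(x, weights, mus, Sigmas):
    """p(x) = sum_k w_k g_k(x | mu_k, Sigma_k)."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=S)
               for w, m, S in zip(weights, mus, Sigmas))

# Illustrative two-component 1-D mixture (made-up parameters)
ws = [0.3, 0.7]
mus = [np.array([-2.0]), np.array([1.0])]
Sigmas = [np.eye(1), 4 * np.eye(1)]
print(mixture_pdf(np.array([0.0]), ws, mus, Sigmas))
```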

Latent Variable (the responsibility computed in the E-step)

\[ z_k^i = \frac{w_k \, g_k( \boldsymbol{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) }{ \sum_{l=1}^K{w_l \, g_l( \boldsymbol{x}_i \mid \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l)}} \]

Result (M-step updates)

\[ \begin{align} z_k & = \sum_{i=1}^n{z_k^i} \\ \hat{w}_k & = \frac{z_k}{n} \\ \hat{ \boldsymbol{\mu}}_k & = \frac{1}{z_k}\sum_{i=1}^n{z_k^i \boldsymbol{x}_i} \\ \hat{ \boldsymbol{\Sigma}}_k & = \frac{1}{z_k}\sum_{i=1}^n{z_k^i (\boldsymbol{x}_i-\hat{ \boldsymbol{\mu}}_k) (\boldsymbol{x}_i-\hat{ \boldsymbol{\mu}}_k)^ \text{T}} \end{align} \]
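Putting the E-step (responsibilities) and M-step (the updates above) together gives one EM iteration. A minimal sketch, assuming the data are stacked in an `(n, D)` NumPy array `X`; `em_step` is a hypothetical helper name:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, mus, Sigmas):
    """One EM iteration for a K-component GMM, using the formulas above."""
    n, _ = X.shape
    K = len(weights)

    # E-step: z[i, k] = w_k g_k(x_i) / sum_l w_l g_l(x_i)
    z = np.column_stack([
        weights[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(K)])
    z /= z.sum(axis=1, keepdims=True)

    # M-step: the closed-form updates derived above
    zk = z.sum(axis=0)                  # z_k = sum_i z_k^i
    new_weights = zk / n                # \hat{w}_k = z_k / n
    new_mus = (z.T @ X) / zk[:, None]   # \hat{mu}_k
    new_Sigmas = []
    for k in range(K):
        diff = X - new_mus[k]
        new_Sigmas.append((z[:, k, None] * diff).T @ diff / zk[k])  # \hat{Sigma}_k
    return new_weights, new_mus, new_Sigmas
```

Iterating `em_step` until the log-likelihood stops improving gives the usual fit; in practice one would initialize with k-means and add a small ridge to each covariance for numerical stability.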
