Useful Results for Variational Inference. Matthew J. Beal, Summer 2000 (update history 06/12/00, 01/04/02, dd/mm/yy). Nomenclature and layout taken partly from Bayesian Data Analysis, Gelman et al.

Distribution Notation Parameters Density function Moments, entropy, KL-divergence, etc. Notes
Uniform
$ \theta \sim U(a,b)$
$ p(\theta) = U(\theta\vert a,b)$
boundaries $ a$,$ b$
with $ b>a$
$ p(\theta) = \frac{1}{b-a}, \theta \in [a,b]$
$ H_\theta = \ln(b-a)$
$ \langle \theta \rangle = \frac{a+b}{2}, \langle \theta^2 \rangle - \langle \theta \rangle^2 = \frac{(b-a)^2}{12}$
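These uniform results are easy to confirm numerically; a minimal sketch using scipy.stats (the values of $a$, $b$ below are arbitrary test choices):

```python
import numpy as np
from scipy import stats

a, b = 2.0, 5.0
u = stats.uniform(loc=a, scale=b - a)  # scipy parameterises by left edge and width

entropy = u.entropy()  # should equal ln(b - a)
mean = u.mean()        # should equal (a + b) / 2
var = u.var()          # should equal (b - a)^2 / 12
```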
Exponential
Laplace
$ \theta \sim Laplace(\mu,\lambda)$
$ p(\theta) = Laplace(\theta\vert\mu,\lambda)$
$ \mu$ mean
$ \lambda$ decay scale
$ p(\theta) = \frac{1}{2\lambda} e^{-\frac{\vert\theta-\mu\vert}{\lambda}}$
$ \lambda>0$
$ H_\theta = 1+ \ln(2\lambda)$
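The Laplace entropy can be checked the same way (parameter values arbitrary; the variance $2\lambda^2$ is a standard result not tabulated above):

```python
import numpy as np
from scipy import stats

mu, lam = 0.3, 1.5
laplace = stats.laplace(loc=mu, scale=lam)

entropy = laplace.entropy()  # should equal 1 + ln(2*lambda)
var = laplace.var()          # 2*lambda^2 for the Laplace
```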
Multivariate
normal
$ \boldsymbol{\theta}\sim N(\boldsymbol{\mu},\Sigma)$
$ p(\boldsymbol{\theta}) = N(\boldsymbol{\theta}\vert\boldsymbol{\mu},\Sigma)$
$ \boldsymbol{\mu}$ mean vector
$ \Sigma$ covariance
$ p(\boldsymbol{\theta}) = (2\pi)^{-d/2} \vert\Sigma\vert^{-1/2} e^{-\frac{1}{2} {\rm tr}\left[ \Sigma^{-1} (\boldsymbol{\theta}-\boldsymbol{\mu})(\boldsymbol{\theta}-\boldsymbol{\mu})^\top \right] }$
$ H_{\boldsymbol{\theta}} = \frac{d}{2} (\ln 2\pi e) + \frac{1}{2} \ln \vert\Sigma\vert$
$ {\rm KL}(\tilde{\boldsymbol{\mu}},\tilde{\Sigma}\vert\vert\boldsymbol{\mu},\Sigma) = -\frac{1}{2} \left( \ln \vert\tilde{\Sigma} \Sigma^{-1} \vert + {\rm tr}\left[ I - \left( \tilde{\Sigma} + (\tilde{\boldsymbol{\mu}}-\boldsymbol{\mu})(\tilde{\boldsymbol{\mu}}-\boldsymbol{\mu})^\top \right) \Sigma^{-1} \right] \right) $
$ \langle \boldsymbol{\theta}\rangle = \boldsymbol{\mu}$
$ \langle \boldsymbol{\theta}\boldsymbol{\theta}^\top \rangle - \langle \boldsymbol{\theta}\rangle \langle \boldsymbol{\theta}\rangle^\top = \Sigma$
$ K_{\theta} = \frac{\langle (\theta-\mu)^4 \rangle}{\langle (\theta-\mu)^2 \rangle^2} - 3 = 0$ (excess kurtosis, univariate case)
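The Gaussian KL-divergence above is written in a less common form; it agrees term-by-term with the standard textbook expression, which a quick numerical check confirms (the random parameters below are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
# arbitrary test parameters for the two Gaussians
mu = rng.normal(size=d)
mu_t = rng.normal(size=d)
A = rng.normal(size=(d, d)); Sigma = A @ A.T + d * np.eye(d)
B = rng.normal(size=(d, d)); Sigma_t = B @ B.T + d * np.eye(d)

Sinv = np.linalg.inv(Sigma)
dm = mu_t - mu

# KL in the table's form
_, ld = np.linalg.slogdet(Sigma_t @ Sinv)
kl_table = -0.5 * (ld + np.trace(np.eye(d) - (Sigma_t + np.outer(dm, dm)) @ Sinv))

# standard textbook form for comparison
_, ld1 = np.linalg.slogdet(Sigma)
_, ld2 = np.linalg.slogdet(Sigma_t)
kl_std = 0.5 * (np.trace(Sinv @ Sigma_t) + dm @ Sinv @ dm - d + ld1 - ld2)
```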
Generalised
normal
$ p(\theta) = \frac{2\beta^{\alpha/2}}{\Gamma(\frac{\alpha}{2})}\theta^{\alpha-1}e^{-\beta \theta^2}$
$ \theta,\alpha,\beta>0$
$ H_\theta = \ln \frac{\Gamma(\alpha/2)}{2\beta^{1/2}} - \frac{\alpha-1}{2}\psi(\frac{\alpha}{2}) +\frac{\alpha}{2}$
$ \langle \theta \rangle = \frac{\Gamma(\frac{\alpha+1}{2})}{\Gamma(\frac{\alpha}{2})\, \beta^{1/2}}$
$ \langle \theta^2 \rangle - \langle \theta \rangle^2 = \frac{\alpha}{2\beta} - \left(\frac{\Gamma(\frac{\alpha+1}{2})}{\Gamma(\frac{\alpha}{2})}\right)^2 \frac{1}{\beta}$
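For this density the moments follow from $\langle \theta^n \rangle = \Gamma(\frac{\alpha+n}{2}) / (\Gamma(\frac{\alpha}{2})\beta^{n/2})$; a quadrature check of the normalisation, the mean $\Gamma(\frac{\alpha+1}{2})/(\Gamma(\frac{\alpha}{2})\sqrt{\beta})$, and the second moment $\alpha/(2\beta)$ (test values arbitrary):

```python
import numpy as np
from scipy import integrate, special

alpha, beta = 3.0, 2.0

def pdf(t):
    # generalised normal density from the table
    return (2 * beta ** (alpha / 2) / special.gamma(alpha / 2)
            * t ** (alpha - 1) * np.exp(-beta * t ** 2))

norm, _ = integrate.quad(pdf, 0, np.inf)                     # should be 1
mean, _ = integrate.quad(lambda t: t * pdf(t), 0, np.inf)
m2, _ = integrate.quad(lambda t: t * t * pdf(t), 0, np.inf)

mean_formula = special.gamma((alpha + 1) / 2) / (special.gamma(alpha / 2) * np.sqrt(beta))
m2_formula = alpha / (2 * beta)
```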
Gamma
$ \tau \sim Gamma(\alpha,\beta)$
$ p(\tau) = Gamma(\tau\vert\alpha,\beta)$
shape $ \alpha>0$
inv. scale $ \beta>0$
$ p(\tau) = \frac{\beta^\alpha}{\Gamma(\alpha)} \tau^{\alpha-1} e^{-\beta \tau}$
$ H_\tau = \ln \Gamma(\alpha) -\ln \beta + (1-\alpha) \psi(\alpha) + \alpha$
$ \langle \tau^n \rangle = \frac{\Gamma(\alpha+n)}{\beta^n \Gamma(\alpha)}$
$ \langle (\ln \tau)^n \rangle = \frac{\beta^\alpha}{\Gamma(\alpha)}\frac{\partial^n}{\partial \alpha^n}\left( \frac{\Gamma(\alpha)}{\beta^\alpha} \right)$
$ \langle \tau \rangle = \alpha / \beta $
$ \langle \tau^2 \rangle - \langle \tau \rangle^2 = \alpha / \beta^2$
$ \langle \ln \tau \rangle = \psi(\alpha) - \ln \beta $
$ {\rm KL}(\tilde{\alpha},\tilde{\beta}\vert\vert\alpha,\beta) = \tilde{\alpha}\ln\tilde{\beta}- \alpha\ln\beta - \ln\frac{\Gamma(\tilde{\alpha})}{\Gamma(\alpha)}$
$ + (\tilde{\alpha}-\alpha)(\psi(\tilde{\alpha})-\ln\tilde{\beta}) - \tilde{\alpha}(1-\frac{\beta}{\tilde{\beta}})$
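The Gamma KL-divergence and the $\langle \ln \tau \rangle$ identity can both be verified against numerical integration (the parameter values are arbitrary test choices):

```python
import numpy as np
from scipy import integrate, special, stats

a_t, b_t = 3.0, 2.0   # tilde (e.g. variational) parameters
a, b = 1.5, 0.5       # reference parameters

# closed-form KL from the table
kl = (a_t * np.log(b_t) - a * np.log(b)
      - special.gammaln(a_t) + special.gammaln(a)
      + (a_t - a) * (special.digamma(a_t) - np.log(b_t))
      - a_t * (1 - b / b_t))

# numerical KL = ∫ p̃ ln(p̃/p) by quadrature
p_t = stats.gamma(a_t, scale=1 / b_t)
p = stats.gamma(a, scale=1 / b)
kl_num, _ = integrate.quad(
    lambda t: p_t.pdf(t) * (p_t.logpdf(t) - p.logpdf(t)), 0, np.inf)

# <ln tau> = psi(alpha) - ln(beta)
mean_log = special.digamma(a_t) - np.log(b_t)
mean_log_num = p_t.expect(np.log)
```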
conj. Gaussian precision
$ y=\tau_1 + \tau_2$ where
$ \tau_i \sim Gamma(\alpha,\beta)$ independent, then $ y \sim Gamma(2\alpha,\beta)$
Inverse gamma
conj. Gaussian variance
Wishart
$ W \sim Wishart_\nu(S)$
$ p(W) = Wishart_\nu(W\vert S)$
deg. of freedom $ \nu$
precision matrix $ S$
$ p(W) = \frac{1}{Z_{\nu S}} \vert W\vert^{(\nu-k-1)/2} e^{-\frac{1}{2} {\rm tr}\left[ S^{-1}W \right] }$
$ Z_{\nu S} = 2^{\nu k /2} \pi^{k(k-1)/4} \vert S\vert^{\nu/2} \prod_{i=1}^k \Gamma\left(\frac{\nu+1-i}{2}\right) $
$ H_{W} = \ln Z_{\nu S} - \frac{\nu-k-1}{2} \langle \ln\vert W\vert \rangle + \frac{1}{2}\nu k $
$ \langle W \rangle = \nu S $
$ \langle \ln \vert W\vert \rangle = \sum_{i=1}^k \psi \left(\frac{\nu+1-i}{2}\right) + k \ln 2 + \ln \vert S\vert$
$ {\rm KL}(\tilde{\nu},\tilde{S}\vert\vert\nu,S) = \ln \frac{Z_{\nu S}}{Z_{\tilde{\nu} \tilde{S}}} + \frac{\tilde{\nu}-\nu}{2} \langle \ln \vert W\vert \rangle_{\tilde{\nu} \tilde{S}} + \frac{1}{2} \tilde{\nu}\, {\rm tr}\left[ S^{-1} \tilde{S} - {\rm I} \right] $
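The Wishart moments $\langle W \rangle = \nu S$ and the $\langle \ln\vert W\vert \rangle$ formula can be sanity-checked by Monte Carlo with scipy.stats.wishart, whose (df, scale) parameterisation matches the table (test values arbitrary):

```python
import numpy as np
from scipy import special, stats

rng = np.random.default_rng(1)
k, nu = 3, 7.0
A = rng.normal(size=(k, k))
S = A @ A.T + k * np.eye(k)   # arbitrary positive-definite scale matrix

W = stats.wishart(df=nu, scale=S)
samples = W.rvs(size=200_000, random_state=rng)   # shape (n, k, k)

mean_W = samples.mean(axis=0)                      # should approach nu * S
mean_logdet = np.mean(np.linalg.slogdet(samples)[1])

# closed form from the table
logdet_formula = (sum(special.digamma((nu + 1 - i) / 2) for i in range(1, k + 1))
                  + k * np.log(2) + np.linalg.slogdet(S)[1])
```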
conj. Gaussian precision
Inverse-Wishart
$ W \sim Inv \!\! - \!\! Wishart_\nu(S^{-1})$
$ p(W) = Inv \!\! -\!\! Wishart_\nu(W\vert S)$
deg. of freedom $ \nu$
covariance matrix $ S$
$ p(W) = \frac{1}{Z} \vert W\vert^{-(\nu+k+1)/2} e^{-\frac{1}{2} {\rm tr}\left[ SW^{-1} \right] }$
$ Z = 2^{\nu k /2} \pi^{k(k-1)/4} \prod_{i=1}^k \Gamma\left(\frac{\nu+1-i}{2}\right) \times \vert S\vert^{-\nu/2} $
$ H_{W} = {\rm addition}$
$ \langle W \rangle = (\nu-k-1)^{-1} S$
conj. Gaussian covariance
If $ W^{-1} \sim Wishart_\nu(S)$, then
$ W \sim Inv \!\! - \!\! Wishart_\nu(S^{-1})$
Student-t (1)
$ \theta \sim t_\nu (\mu,\sigma^2)$
$ p(\theta)=t_\nu(\theta\vert\mu,\sigma^2)$
deg. of freedom $ \nu>0$
mean $ \mu$; scale $ \sigma>0$
$ p(\theta)=\frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2) \sqrt{\nu \pi} \sigma} \left( 1+\frac{1}{\nu}\left(\frac{\theta-\mu}{\sigma}\right)^2\right)^{-(\nu+1)/2}$
$ \langle \theta \rangle = \mu$, for $ \nu>1$
$ \langle \theta^2 \rangle - \langle \theta \rangle^2 = \frac{\nu}{\nu-2} \sigma^2$, for $ \nu>2$
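The Student-t moments in parameterisation (1) match scipy.stats.t directly (test values arbitrary):

```python
from scipy import stats

nu, mu, sigma = 5.0, 1.0, 2.0
t = stats.t(df=nu, loc=mu, scale=sigma)
mean = t.mean()   # mu, for nu > 1
var = t.var()     # nu/(nu-2) * sigma^2, for nu > 2
```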
Student-t (2)
$ \theta \sim t(\mu,\alpha,\beta)$
$ p(\theta) = t(\theta\vert\mu,\alpha,\beta)$
shape $ \alpha>0$; mean $ \mu$
$ \rm {scale}^2$ $ \beta>0$
$ p(\theta)=\frac{\Gamma(\alpha+1/2)}{\Gamma(\alpha) \sqrt{2\pi \beta}} \left( 1+\frac{(\theta-\mu)^2}{2 \beta} \right)^{-(\alpha+1/2)}$
$ H_\theta = \left[ \psi(\alpha+\frac{1}{2}) - \psi(\alpha) \right] (\alpha+\frac{1}{2}) + \ln \sqrt{2\beta} B(\frac{1}{2},\alpha)$
$ K_{\theta} = \frac{3}{\alpha-2}$ (relative to Gaussian)
equiv. $ \alpha \rightarrow \frac{\nu}{2}; \beta \rightarrow \frac{\nu}{2} \sigma^2$
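The stated equivalence between the two parameterisations, $\alpha = \nu/2$ and $\beta = \frac{\nu}{2}\sigma^2$ (i.e. $\nu = 2\alpha$, $\sigma^2 = \beta/\alpha$), can be verified by comparing densities pointwise (test values arbitrary):

```python
import numpy as np
from scipy import special, stats

alpha, beta, mu = 2.5, 3.0, 0.7
theta = np.linspace(-5, 5, 101)

# density in parameterisation (2), straight from the table
p2 = (np.exp(special.gammaln(alpha + 0.5) - special.gammaln(alpha))
      / np.sqrt(2 * np.pi * beta)
      * (1 + (theta - mu) ** 2 / (2 * beta)) ** -(alpha + 0.5))

# parameterisation (1) via the equivalence: nu = 2*alpha, sigma^2 = beta/alpha
p1 = stats.t(df=2 * alpha, loc=mu, scale=np.sqrt(beta / alpha)).pdf(theta)
```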
Multivariate
Student-t
$ \boldsymbol{\theta}\sim t_\nu(\boldsymbol{\mu},\Sigma)$
$ p(\boldsymbol{\theta})=t_\nu(\boldsymbol{\theta}\vert\boldsymbol{\mu},\Sigma)$
deg. of freedom $ \nu>0$
mean $ \boldsymbol{\mu}$; $ \rm {scale}^2$ matrix $ \Sigma$
$ p(\boldsymbol{\theta}) = \frac{1}{Z} \left( 1+\frac{1}{\nu}{\rm tr}\left[ \Sigma^{-1}(\boldsymbol{\theta}-\boldsymbol{\mu})(\boldsymbol{\theta}-\boldsymbol{\mu})^\top \right] \right)^{-(\nu+d)/2}$
$ Z=\frac{\Gamma(\nu/2) (\nu\pi)^{d/2}}{\Gamma((\nu+d)/2)}\vert\Sigma\vert^{1/2}$
$ \langle \boldsymbol{\theta}\rangle = \boldsymbol{\mu}$, for $ \nu>1$
$ \langle \boldsymbol{\theta}\boldsymbol{\theta}^\top \rangle - \langle \boldsymbol{\theta}\rangle \langle \boldsymbol{\theta}\rangle^\top = \frac{\nu}{\nu-2} \Sigma$, for $ \nu>2$
$ \nu=1, t_\nu = Cauchy$
$ \nu\rightarrow\infty, t_\nu \rightarrow N(\boldsymbol{\mu},\Sigma)$
Beta
$ \theta \sim Beta(\alpha,\beta)$
$ p(\theta)=Beta(\theta\vert\alpha,\beta)$
prior sample sizes
$ \alpha>0,\beta>0$
$ p(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}$
$ \theta \in [0,1]$
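The Beta row lists only the density; the standard first two moments, $\langle\theta\rangle = \frac{\alpha}{\alpha+\beta}$ and variance $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$, are easy to confirm with scipy.stats (test values arbitrary):

```python
from scipy import stats

a, b = 2.0, 5.0
beta_dist = stats.beta(a, b)
mean = beta_dist.mean()   # alpha / (alpha + beta)
var = beta_dist.var()     # alpha*beta / ((alpha+beta)^2 (alpha+beta+1))
```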
Dirichlet
$ \boldsymbol{\pi}\sim Dirichlet(\boldsymbol{\alpha})$
$ p(\boldsymbol{\pi}) = Dirichlet(\boldsymbol{\pi}\vert\boldsymbol{\alpha})$
prior sample sizes
$ \boldsymbol{\alpha}= \{\alpha_1,\ldots,\alpha_k\}$
$ \alpha_j>0; \alpha_0 = \sum_{j=1}^k \alpha_j$
$ p(\boldsymbol{\pi}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)} \pi_1^{\alpha_1-1}\cdots \pi_k^{\alpha_k-1} $
$ \pi_1,\ldots,\pi_k \geq 0; \sum_{j=1}^k \pi_j = 1$
$ \langle \boldsymbol{\pi}\rangle = \boldsymbol{\alpha}/ \alpha_0$
$ \langle \boldsymbol{\pi}\boldsymbol{\pi}^\top \rangle - \langle \boldsymbol{\pi}\rangle \langle \boldsymbol{\pi}\rangle^\top = \frac{\alpha_0\, {\rm diag}(\boldsymbol{\alpha})- \boldsymbol{\alpha}\boldsymbol{\alpha}^\top}{\alpha_0^2 (\alpha_0+1)}$
$ \langle \ln \pi_j \rangle = \psi(\alpha_j) - \psi(\alpha_0) $
$ {\rm KL}(\tilde{\boldsymbol{\alpha}},\boldsymbol{\alpha}) = \ln \frac{\Gamma(\tilde{\alpha}_0)}{\Gamma(\alpha_0)} + \sum_{j=1}^k \left[ \ln \frac{\Gamma(\alpha_j)}{\Gamma(\tilde{\alpha}_j)} + \left( \tilde{\alpha}_j - \alpha_j \right) \left( \psi(\tilde{\alpha}_j) - \psi(\tilde{\alpha}_0) \right) \right]$
$ H_{\boldsymbol{\pi}}= \sum_{j=1}^k \ln \Gamma(\alpha_j) - \ln \Gamma(\alpha_0) + (\alpha_0 - k)\psi(\alpha_0) - \sum_{j=1}^k (\alpha_j-1)\psi(\alpha_j)$
conj. to multinomial
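The Dirichlet identities $\langle \ln \pi_j \rangle = \psi(\alpha_j) - \psi(\alpha_0)$ and the KL-divergence can be checked by Monte Carlo (test values arbitrary):

```python
import numpy as np
from scipy import special, stats

alpha_t = np.array([2.0, 3.0, 4.0])   # tilde parameters
alpha = np.array([1.0, 1.0, 1.0])
a0_t, a0 = alpha_t.sum(), alpha.sum()

# closed-form KL
kl = (special.gammaln(a0_t) - special.gammaln(a0)
      + np.sum(special.gammaln(alpha) - special.gammaln(alpha_t)
               + (alpha_t - alpha) * (special.digamma(alpha_t) - special.digamma(a0_t))))

# Monte Carlo estimate of E[ln p̃ - ln p] under p̃
rng = np.random.default_rng(2)
pi = stats.dirichlet(alpha_t).rvs(size=200_000, random_state=rng)  # shape (n, k)
kl_mc = np.mean(stats.dirichlet(alpha_t).logpdf(pi.T)
                - stats.dirichlet(alpha).logpdf(pi.T))

# <ln pi_j> identity
mean_log = special.digamma(alpha_t) - special.digamma(a0_t)
mean_log_mc = np.log(pi).mean(axis=0)
```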
Exponential
Family
$ \boldsymbol{\theta}\sim ExpFam_{\boldsymbol{\phi}}(\eta,\boldsymbol{\nu})$
$ p(\boldsymbol{\theta}) = ExpFam_{\boldsymbol{\phi}}(\boldsymbol{\theta}\vert\eta,\boldsymbol{\nu})$
number $ \eta$ and value $ \boldsymbol{\nu}$
of pseudo-observations
$ p(\boldsymbol{\theta}) = \frac{1}{Z_{\boldsymbol{\phi}\eta\boldsymbol{\nu}}}\, g(\boldsymbol{\theta})^\eta\, e^{\boldsymbol{\phi}(\boldsymbol{\theta})^{\top}\boldsymbol{\nu}}$
$ H_{\boldsymbol{\theta}} = \ln Z_{\boldsymbol{\phi}\eta\boldsymbol{\nu}} - \eta \langle \ln g(\boldsymbol{\theta}) \rangle - \boldsymbol{\nu}^\top \langle \boldsymbol{\phi}(\boldsymbol{\theta}) \rangle$

Gamma function: $ \Gamma(x) = \int_0^\infty d\tau \; \tau^{x-1} e^{-\tau}, \ \ \Gamma(\tfrac{1}{2}) = \sqrt{\pi}, \ \ \Gamma(x+1) = x\,\Gamma(x)$.
Digamma function: $ \psi(x) = \frac{\partial}{\partial x} \ln \Gamma(x), \ \ \psi(x+1) = \frac{1}{x} + \psi(x)$.
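These special-function identities are available directly in scipy.special:

```python
import numpy as np
from scipy import special

x = 3.7
gamma_half = special.gamma(0.5)                          # sqrt(pi)
gamma_ratio = special.gamma(x + 1) / special.gamma(x)    # Gamma(x+1) = x Gamma(x)
psi_step = special.digamma(x + 1) - special.digamma(x)   # psi(x+1) - psi(x) = 1/x
```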




Matthew Beal 2005-02-14