

Probability - Normal Distribution

First created in April 2017

Probability Density Function (PDF)

The PDF of a normal distribution $\mathcal{N}(\mu,\sigma^2)$ is $\displaystyle \boxed{p(x~|~\mu,\sigma^2)=\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)} ~.}$

Let us write it as $p(x)$ and prove that the mean is $\mu$ and the variance is $\sigma^2$. When $\mu=0$ and $\sigma\rightarrow 0$, $p(x)\rightarrow\delta(x)$, the Dirac delta function.

For convenience, substitute $\displaystyle y=\frac{x-\mu}{\sqrt{2\sigma^2}}$, so that $\displaystyle y^2=\frac{(x-\mu)^2}{2\sigma^2}$ and $dx=\sqrt{2\sigma^2}~dy$.
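Before going through the derivations, here is a quick numerical sanity check that the boxed density integrates to 1 (a minimal Python sketch; SciPy is assumed to be available, and the values of $\mu$ and $\sigma$ are arbitrary):

```python
import math
from scipy.integrate import quad  # assumes SciPy is installed

def normal_pdf(x, mu, sigma):
    """PDF of N(mu, sigma^2), exactly as boxed above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * sigma ** 2 * math.pi)

mu, sigma = 1.5, 0.7          # arbitrary example parameters
total, _ = quad(normal_pdf, -math.inf, math.inf, args=(mu, sigma))
print(total)                  # ~1.0, confirming normalization
```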


Gaussian integral and related:

$\displaystyle(1):\int_{-\infty}^\infty e^{-y^2}~dy=\sqrt{\pi}.$


$\displaystyle(2):\int_{-\infty}^\infty y~e^{-y^2}~dy=0.~$

Proof: $\displaystyle \frac{d}{dy}\left(-\frac{1}{2}~e^{-y^2}\right)=y~e^{-y^2}~,\quad \Big[-\frac{1}{2}~e^{-y^2}\Big]_{-\infty}^\infty=0 .$


$\displaystyle(3):\int_{-\infty}^\infty y^2~e^{-y^2}~dy=\frac{\sqrt{\pi}}{2}.~$

Proof: Let $u=y$ and $dv=y~e^{-y^2}~dy$, so $du=dy$ and $v=-\frac{1}{2}~e^{-y^2}$. $\displaystyle \int_{-\infty}^\infty~y^2~e^{-y^2}~dy =\int_{-\infty}^\infty u~dv =\Big[uv\Big]_{-\infty}^\infty-\int_{-\infty}^\infty v~du =\left[y\cdot\left(-\frac{1}{2}~e^{-y^2}\right)\right]_{-\infty}^\infty+\int_{-\infty}^\infty\frac{1}{2}~e^{-y^2}~dy =0+\frac{\sqrt{\pi}}{2} =\frac{\sqrt{\pi}}{2} .$
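All three integrals can also be verified numerically (again a sketch assuming SciPy; the expected values are $\sqrt{\pi}$, $0$, and $\sqrt{\pi}/2$):

```python
import math
from scipy.integrate import quad  # assumes SciPy is installed

inf = math.inf
i1, _ = quad(lambda y: math.exp(-y * y), -inf, inf)            # (1): expect sqrt(pi)
i2, _ = quad(lambda y: y * math.exp(-y * y), -inf, inf)        # (2): expect 0 (odd integrand)
i3, _ = quad(lambda y: y * y * math.exp(-y * y), -inf, inf)    # (3): expect sqrt(pi)/2
print(i1, math.sqrt(math.pi))
print(i2)
print(i3, math.sqrt(math.pi) / 2)
```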

Critical Points

$\displaystyle p'(x) =\frac{d}{dx}\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)} =\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}\frac{d}{dx}\frac{-(x-\mu)^2}{2\sigma^2} =\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}\frac{-2(x-\mu)}{2\sigma^2}$

$\displaystyle =\frac{-(x-\mu)}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)} ~,$ which vanishes when $x=\mu$, where the PDF peaks.

$\displaystyle p''(x) =\frac{d}{dx}\left(\frac{-(x-\mu)}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)}\right) =\frac{-1}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)} +\frac{-(x-\mu)}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)}\frac{d}{dx}\frac{-(x-\mu)^2}{2\sigma^2} ~.$

$\displaystyle =\left(\frac{-1}{\sqrt{2\pi}~\sigma^3} +\frac{(x-\mu)^2}{\sqrt{2\pi}~\sigma^5}\right)~e^{-(x-\mu)^2/(2\sigma^2)} =\frac{(x-\mu)^2-\sigma^2}{\sqrt{2\pi}~\sigma^5}~e^{-(x-\mu)^2/(2\sigma^2)} ~.$

When $x=\mu$, $p''(x)<0$ and $p(x)$ is at its maximum. When $x=\mu\pm\sigma$, $p''(x)=0$; these are the points of inflection.
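The locations of the maximum and the inflection points can be confirmed symbolically (a sketch assuming SymPy is available):

```python
import sympy as sp  # assumes SymPy is installed

x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)
p = sp.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sp.sqrt(2 * sigma ** 2 * sp.pi)

# First derivative vanishes only at the peak.
print(sp.solve(sp.diff(p, x), x))        # expected: [mu]
# Second derivative vanishes at the inflection points.
print(sp.solve(sp.diff(p, x, 2), x))     # expected: [mu - sigma, mu + sigma]
```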

Mean

Mean $\displaystyle =\int_{-\infty}^\infty x\cdot p(x)~dx =\int_{-\infty}^\infty x\cdot\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}~dx =\frac{1}{\sqrt{2\sigma^2\pi}}\left[\int_{-\infty}^\infty x~e^{-(x-\mu)^2/(2\sigma^2)}~dx\right]$

$\displaystyle =\frac{1}{\sqrt{2\sigma^2\pi}}\left[\int_{-\infty}^\infty(x-\mu)~e^{-(x-\mu)^2/(2\sigma^2)}~dx+\int_{-\infty}^\infty\mu~e^{-(x-\mu)^2/(2\sigma^2)}~dx\right] =\frac{1}{\sqrt{2\sigma^2\pi}}\left[\int_{-\infty}^\infty\sqrt{2\sigma^2}~y~e^{-y^2}~\sqrt{2\sigma^2}~dy +\mu\int_{-\infty}^\infty e^{-y^2}~\sqrt{2\sigma^2}~dy\right] ~.$

$\displaystyle =\frac{1}{\sqrt{2\sigma^2\pi}}\left[0+\mu~\sqrt{2\sigma^2}\sqrt{\pi}\right] =\mu ~.$

Variance

Variance $\displaystyle =\int_{-\infty}^\infty(x-\mu)^2\cdot p(x)~dx =\int_{-\infty}^\infty(x-\mu)^2\cdot\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}~dx$

$\displaystyle =\frac{1}{\sqrt{2\sigma^2\pi}}\int_{-\infty}^\infty 2\sigma^2~y^2\cdot e^{-y^2}~\sqrt{2\sigma^2}~dy =\frac{2\sigma^2}{\sqrt{\pi}}\int_{-\infty}^\infty y^2~e^{-y^2}~dy =\frac{2\sigma^2}{\sqrt{\pi}}\cdot\frac{\sqrt{\pi}}{2} =\sigma^2 ~.$
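A numerical check of both results, with arbitrary example values of $\mu$ and $\sigma$ (sketch assuming SciPy):

```python
import math
from scipy.integrate import quad  # assumes SciPy is installed

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * sigma ** 2 * math.pi)

mu, sigma = -2.0, 3.0   # arbitrary example parameters
mean, _ = quad(lambda x: x * normal_pdf(x, mu, sigma), -math.inf, math.inf)
var, _  = quad(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), -math.inf, math.inf)
print(mean)   # ~ -2.0  (= mu)
print(var)    # ~  9.0  (= sigma^2)
```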

Error Function

Here is the probability that a sample falls in $[-x,x]$ under the standard normal density $\varphi(x)=\frac{1}{\sqrt{2\pi}}~e^{-x^2/2}$, i.e. a normal distribution with $\mu=0$ and $\sigma^2=1$:

$\displaystyle \int_{-x}^x\varphi(y)~dy =\int_{-x}^x\frac{1}{\sqrt{2\pi}}~e^{-y^2/2}~dy =\frac{1}{\sqrt{2\pi}}~\sqrt{2}\int_{-x/\sqrt{2}}^{x/\sqrt{2}} e^{-t^2}~dt =\frac{1}{\sqrt{\pi}}\int_{-x/\sqrt{2}}^{x/\sqrt{2}} e^{-t^2}~dt ~,$ substituting $y=\sqrt{2}\,t$.

To turn such integrals into a standard function, we define $\displaystyle \boxed{\mathrm{erf}(x) =\frac{2}{\sqrt{\pi}}~\int_{0}^x e^{-t^2}~dt ~}$. With this definition, the probability that a standard normal sample falls in $[-x,x]$ is $\mathrm{erf}(x/\sqrt{2})$, and the probability that it falls in $[a,b]$ is $\frac{1}{2}\left(\mathrm{erf}(b/\sqrt{2})-\mathrm{erf}(a/\sqrt{2})\right)$.

The error function has a domain of $(-\infty,+\infty)$ and range $(-1,1)$.
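A small Python example of using $\mathrm{erf}$ this way, via the standard library's math.erf (the function names prob_within and prob_between are just illustrative):

```python
import math

def prob_within(x):
    """P(-x <= Z <= x) for a standard normal Z, per the derivation above."""
    return math.erf(x / math.sqrt(2))

def prob_between(a, b):
    """P(a <= Z <= b) for a standard normal Z."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))

print(prob_within(1))        # ~0.6827, the familiar 68% within one sigma
print(prob_between(-2, 2))   # ~0.9545, within two sigma
```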

Cumulative Distribution Function (CDF)

CDF $\displaystyle =\int_{-\infty}^x p(t)~dt =\int_{-\infty}^x\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(t-\mu)^2/(2\sigma^2)}~dt =\frac{1}{\sqrt{2\sigma^2\pi}}\int_{-\infty}^y e^{-u^2}\sqrt{2\sigma^2}~du$, where $\displaystyle u=\frac{t-\mu}{\sqrt{2\sigma^2}}$ and the upper limit is $\displaystyle y=\frac{x-\mu}{\sqrt{2\sigma^2}}$.

$\displaystyle =\frac{1}{\sqrt{\pi}}\left(\int_{-\infty}^0 e^{-u^2}~du+\int_0^y e^{-u^2}~du\right) =\frac{1}{\sqrt{\pi}}\left(\frac{1}{2}\int_{-\infty}^\infty e^{-u^2}~du+\frac{1}{2}\int_{-y}^y e^{-u^2}~du\right) =\frac{1}{\sqrt{\pi}}\left(\frac{1}{2}\sqrt{\pi}+\frac{1}{2}\sqrt{\pi}\cdot\mathrm{erf}\left(\frac{x-\mu}{\sqrt{2\sigma^2}}\right)\right) ~.$

$\displaystyle \boxed{\mathrm{CDF}=\frac{1}{2}\left(1+\mathrm{erf}\left(\frac{x-\mu}{\sqrt{2\sigma^2}}\right)\right) ~.}$

Note: this holds even when $y<0$, since $\mathrm{erf}$ is an odd function and the symmetry step above remains valid.
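The boxed CDF can be checked against SciPy's own normal CDF (a sketch; scipy.stats is assumed to be available, and the test points are arbitrary):

```python
import math
from scipy.stats import norm   # assumes SciPy is installed

def normal_cdf(x, mu, sigma):
    """Boxed formula: (1/2)(1 + erf((x - mu) / sqrt(2 sigma^2)))."""
    return 0.5 * (1 + math.erf((x - mu) / math.sqrt(2 * sigma ** 2)))

mu, sigma = 1.0, 2.0   # arbitrary example parameters
for x in (-3.0, 0.0, 1.0, 4.5):
    print(normal_cdf(x, mu, sigma), norm.cdf(x, loc=mu, scale=sigma))  # should agree
```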

Entropy

Information about an event $\omega$ is denoted as $I(\omega)$. If the event is certain, its probability $P(\omega)=1$ and $I(\omega)=0$.

In other words, if the event is already known to have happened, revealing it gives no information. If the event is unlikely to happen, revealing it conveys significant information.

If there are $n$ possible events and the probability that event $\omega_k$ occurs is $P(\omega_k)$, where $k=1,2,\ldots,n$, then $I(\omega_k)=-\log(P(\omega_k))$.

The base of the log is usually 2, measuring information in bits (binary units of information). If the base is $e$ (which physicists usually use), the result is in nats (natural units of information). Since $e>2$, for $p<1$ we have $-\log_e(p)<-\log_2(p)$: the same information measures to a smaller number in nats than in bits, and the same holds for entropy.

This means one nat contains more information than one bit. The exchange rate: one bit (a 50-50 binary outcome) is $\ln 2\approx 0.693$ nat, and one nat is $1/\ln 2\approx 1.443$ bits.
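The exchange rate is just a change of logarithm base, as this small check illustrates:

```python
import math

p = 0.5                       # a 50-50 binary outcome
print(-math.log2(p))          # 1.0 bit
print(-math.log(p))           # ~0.693 nat: the same event, measured in nats
print(1 / math.log(2))        # ~1.443: bits per nat
```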


Entropy is the expected value of $I(\omega_k)$:

$\displaystyle H(X) =E(I(X)) =\sum I(X)\cdot p(x)~\Delta x =\sum-\log(p(x)~\Delta x)\cdot p(x)~\Delta x =-\sum\Big(\log(p(x))\cdot p(x)~\Delta x+\log(\Delta x)\cdot p(x)~\Delta x\Big) ~.$

When $\Delta x\rightarrow 0$, the first term $\displaystyle -\sum\log(p(x))\cdot p(x)~\Delta x$ tends to an integral, while the $\log(\Delta x)$ term diverges and is dropped by convention, giving the differential entropy:

$\displaystyle H(X) =-\int_{-\infty}^\infty p(x)\log_b(p(x))~dx =-\frac{1}{\ln b}\int_{-\infty}^\infty\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)} \ln\left(\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}\right)~dx$

$\displaystyle =-\frac{1}{\ln b}\int_{-\infty}^\infty\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-y^2} \ln\left(\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-y^2}\right)~\sqrt{2\sigma^2}~dy =-\frac{1}{\ln b\sqrt{\pi}}\int_{-\infty}^\infty e^{-y^2}\left(-\frac{1}{2}\ln(2\sigma^2\pi)-y^2\right)~dy$

$\displaystyle =\frac{1}{\ln b\sqrt{\pi}}\left(\int_{-\infty}^\infty\frac{1}{2}\ln(2\sigma^2\pi)~e^{-y^2}~dy+\int_{-\infty}^\infty y^2~e^{-y^2}~dy\right) =\frac{1}{\ln b\sqrt{\pi}}\left(\frac{1}{2}\ln(2\sigma^2\pi)\cdot\sqrt{\pi}+\frac{\sqrt{\pi}}{2}\right) =\frac{1}{2\ln b}\left(\ln(2\sigma^2\pi)+1\right)$

$\displaystyle =\frac{1}{2\ln b}\ln\left(2\sigma^2\pi e\right) =\frac{1}{2}\log_b\left(2\sigma^2\pi e\right) ~.$

Note: $b$ is usually 2, giving the result in the binary information unit, the bit.
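As a final check, the closed-form entropy $\frac{1}{2}\log_b(2\sigma^2\pi e)$ can be compared with a direct numerical integration of $-\int p\log_b p~dx$ (sketch assuming SciPy; the values of $\mu$, $\sigma$, and $b$ are arbitrary choices):

```python
import math
from scipy.integrate import quad  # assumes SciPy is installed

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * sigma ** 2 * math.pi)

def integrand(x, mu, sigma, b):
    # -p(x) * log_b p(x); ln p(x) is written in closed form so tail underflow is harmless
    ln_p = -(x - mu) ** 2 / (2 * sigma ** 2) - 0.5 * math.log(2 * math.pi * sigma ** 2)
    return -normal_pdf(x, mu, sigma) * ln_p / math.log(b)

mu, sigma, b = 0.0, 1.7, 2          # arbitrary parameters; b = 2 measures in bits
numeric, _ = quad(integrand, -math.inf, math.inf, args=(mu, sigma, b))
closed_form = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2, b)
print(numeric, closed_form)         # the two values should agree
```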

End of Article

 
