

Probability - Normal Distribution

First created in April 2017

Probability Density Function (PDF)

The PDF of a normal distribution $\mathcal{N}(\mu,\sigma^2)$ is $\displaystyle \boxed{p(x~|~\mu,\sigma^2)=\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)} ~.}$

Let us write it as $p(x)$ and prove that the mean is $\mu$ and the variance is $\sigma^2$. When $\mu=0$ and $\sigma\rightarrow 0$, $p(x)\rightarrow\delta(x)$, the Dirac delta function.

For convenience, substitute $\displaystyle y=\frac{x-\mu}{\sqrt{2\sigma^2}}$, so that $\displaystyle y^2=\frac{(x-\mu)^2}{2\sigma^2}$ and $dx=\sqrt{2\sigma^2}~dy$.
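Before going through the derivations, here is a quick numerical sanity check that the boxed density integrates to 1 (a minimal Python sketch; SciPy is assumed to be available, and the values of $\mu$ and $\sigma$ are arbitrary):

```python
import math
from scipy.integrate import quad  # assumes SciPy is installed

def normal_pdf(x, mu, sigma):
    """PDF of N(mu, sigma^2), exactly as boxed above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * sigma ** 2 * math.pi)

mu, sigma = 1.5, 0.7          # arbitrary example parameters
total, _ = quad(normal_pdf, -math.inf, math.inf, args=(mu, sigma))
print(total)                  # ~1.0, confirming normalization
```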


Gaussian integral and related:

$\displaystyle(1):\int_{-\infty}^\infty e^{-y^2}~dy=\sqrt{\pi}.$


$\displaystyle(2):\int_{-\infty}^\infty y~e^{-y^2}~dy=0.~$

Proof: $\displaystyle \frac{d}{dy}\left(-\frac{1}{2}~e^{-y^2}\right)=y~e^{-y^2}~,\quad \Big[-\frac{1}{2}~e^{-y^2}\Big]_{-\infty}^\infty=0 .$


$\displaystyle(3):\int_{-\infty}^\infty y^2~e^{-y^2}~dy=\frac{\sqrt{\pi}}{2}.~$

Proof: Let $u=y$ and $dv=y~e^{-y^2}~dy$, so $du=dy$ and $v=-\frac{1}{2}~e^{-y^2}$. $\displaystyle \int_{-\infty}^\infty~y^2~e^{-y^2}~dy =\int_{-\infty}^\infty u~dv =\Big[uv\Big]_{-\infty}^\infty-\int_{-\infty}^\infty v~du =\left[y\cdot\left(-\frac{1}{2}~e^{-y^2}\right)\right]_{-\infty}^\infty+\int_{-\infty}^\infty\frac{1}{2}~e^{-y^2}~dy =0+\frac{\sqrt{\pi}}{2} =\frac{\sqrt{\pi}}{2} .$
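All three integrals can also be verified numerically (again a sketch assuming SciPy; the expected values are $\sqrt{\pi}$, $0$, and $\sqrt{\pi}/2$):

```python
import math
from scipy.integrate import quad  # assumes SciPy is installed

inf = math.inf
i1, _ = quad(lambda y: math.exp(-y * y), -inf, inf)            # (1): expect sqrt(pi)
i2, _ = quad(lambda y: y * math.exp(-y * y), -inf, inf)        # (2): expect 0 (odd integrand)
i3, _ = quad(lambda y: y * y * math.exp(-y * y), -inf, inf)    # (3): expect sqrt(pi)/2
print(i1, math.sqrt(math.pi))
print(i2)
print(i3, math.sqrt(math.pi) / 2)
```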

Critical Points

$\displaystyle p'(x) =\frac{d}{dx}\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)} =\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}\frac{d}{dx}\frac{-(x-\mu)^2}{2\sigma^2} =\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}\frac{-2(x-\mu)}{2\sigma^2}$

$\displaystyle =\frac{-(x-\mu)}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)} ~,$ which vanishes when $x=\mu$, where the PDF peaks.

$\displaystyle p''(x) =\frac{d}{dx}\left(\frac{-(x-\mu)}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)}\right) =\frac{-1}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)} +\frac{-(x-\mu)}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)}\frac{d}{dx}\frac{-(x-\mu)^2}{2\sigma^2} ~.$

$\displaystyle =\left(\frac{-1}{\sqrt{2\pi}~\sigma^3} +\frac{(x-\mu)^2}{\sqrt{2\pi}~\sigma^5}\right)~e^{-(x-\mu)^2/(2\sigma^2)} =\frac{(x-\mu)^2-\sigma^2}{\sqrt{2\pi}~\sigma^5}~e^{-(x-\mu)^2/(2\sigma^2)} ~.$

When $x=\mu$, $p''(x)<0$ and $p(x)$ is at its maximum. When $x=\mu\pm\sigma$, $p''(x)=0$; these are the points of inflection.
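The locations of the maximum and the inflection points can be confirmed symbolically (a sketch assuming SymPy is available):

```python
import sympy as sp  # assumes SymPy is installed

x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)
p = sp.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sp.sqrt(2 * sigma ** 2 * sp.pi)

# First derivative vanishes only at the peak.
print(sp.solve(sp.diff(p, x), x))        # expected: [mu]
# Second derivative vanishes at the inflection points.
print(sp.solve(sp.diff(p, x, 2), x))     # expected: [mu - sigma, mu + sigma]
```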

Mean

Mean $\displaystyle =\int_{-\infty}^\infty x\cdot p(x)~dx =\int_{-\infty}^\infty x\cdot\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}~dx =\frac{1}{\sqrt{2\sigma^2\pi}}\left[\int_{-\infty}^\infty x~e^{-(x-\mu)^2/(2\sigma^2)}~dx\right]$

$\displaystyle =\frac{1}{\sqrt{2\sigma^2\pi}}\left[\int_{-\infty}^\infty(x-\mu)~e^{-(x-\mu)^2/(2\sigma^2)}~dx+\int_{-\infty}^\infty\mu~e^{-(x-\mu)^2/(2\sigma^2)}~dx\right] =\frac{1}{\sqrt{2\sigma^2\pi}}\left[\int_{-\infty}^\infty\sqrt{2\sigma^2}~y~e^{-y^2}~\sqrt{2\sigma^2}~dy +\mu\int_{-\infty}^\infty e^{-y^2}~\sqrt{2\sigma^2}~dy\right] ~.$

$\displaystyle =\frac{1}{\sqrt{2\sigma^2\pi}}\left[0+\mu~\sqrt{2\sigma^2}\sqrt{\pi}\right] =\mu ~.$

Variance

Variance $\displaystyle =\int_{-\infty}^\infty(x-\mu)^2\cdot p(x)~dx =\int_{-\infty}^\infty(x-\mu)^2\cdot\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}~dx$

$\displaystyle =\frac{1}{\sqrt{2\sigma^2\pi}}\int_{-\infty}^\infty 2\sigma^2~y^2\cdot e^{-y^2}~\sqrt{2\sigma^2}~dy =\frac{2\sigma^2}{\sqrt{\pi}}\int_{-\infty}^\infty y^2~e^{-y^2}~dy =\frac{2\sigma^2}{\sqrt{\pi}}\cdot\frac{\sqrt{\pi}}{2} =\sigma^2 ~.$
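A numerical check of both results, with arbitrary example values of $\mu$ and $\sigma$ (sketch assuming SciPy):

```python
import math
from scipy.integrate import quad  # assumes SciPy is installed

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * sigma ** 2 * math.pi)

mu, sigma = -2.0, 3.0   # arbitrary example parameters
mean, _ = quad(lambda x: x * normal_pdf(x, mu, sigma), -math.inf, math.inf)
var, _  = quad(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), -math.inf, math.inf)
print(mean)   # ~ -2.0  (= mu)
print(var)    # ~  9.0  (= sigma^2)
```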

Error Function

Here is the probability that a sample falls in $[-x,x]$ under the standard normal density $\varphi(x)=\frac{1}{\sqrt{2\pi}}~e^{-x^2/2}$, i.e. a normal distribution with $\mu=0$ and $\sigma^2=1$:

$\displaystyle \int_{-x}^x\varphi(y)~dy =\int_{-x}^x\frac{1}{\sqrt{2\pi}}~e^{-y^2/2}~dy =\frac{1}{\sqrt{2\pi}}~\sqrt{2}\int_{-x/\sqrt{2}}^{x/\sqrt{2}} e^{-t^2}~dt =\frac{1}{\sqrt{\pi}}\int_{-x/\sqrt{2}}^{x/\sqrt{2}} e^{-t^2}~dt ~,$ substituting $y=\sqrt{2}\,t$.

To turn such integrals into a standard function, we define $\displaystyle \boxed{\mathrm{erf}(x) =\frac{2}{\sqrt{\pi}}~\int_{0}^x e^{-t^2}~dt ~}$. With this definition, the probability that a standard normal sample falls in $[-x,x]$ is $\mathrm{erf}(x/\sqrt{2})$, and the probability that it falls in $[a,b]$ is $\frac{1}{2}\left(\mathrm{erf}(b/\sqrt{2})-\mathrm{erf}(a/\sqrt{2})\right)$.

The error function has a domain of $(-\infty,+\infty)$ and range $(-1,1)$.
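A small Python example of using $\mathrm{erf}$ this way, via the standard library's math.erf (the function names prob_within and prob_between are just illustrative):

```python
import math

def prob_within(x):
    """P(-x <= Z <= x) for a standard normal Z, per the derivation above."""
    return math.erf(x / math.sqrt(2))

def prob_between(a, b):
    """P(a <= Z <= b) for a standard normal Z."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))

print(prob_within(1))        # ~0.6827, the familiar 68% within one sigma
print(prob_between(-2, 2))   # ~0.9545, within two sigma
```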

Cumulative Distribution Function (CDF)

CDF $\displaystyle =\int_{-\infty}^x p(t)~dt =\int_{-\infty}^x\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(t-\mu)^2/(2\sigma^2)}~dt =\frac{1}{\sqrt{2\sigma^2\pi}}\int_{-\infty}^y e^{-u^2}\sqrt{2\sigma^2}~du$, where $\displaystyle u=\frac{t-\mu}{\sqrt{2\sigma^2}}$ and the upper limit is $\displaystyle y=\frac{x-\mu}{\sqrt{2\sigma^2}}$.

$\displaystyle =\frac{1}{\sqrt{\pi}}\left(\int_{-\infty}^0 e^{-u^2}~du+\int_0^y e^{-u^2}~du\right) =\frac{1}{\sqrt{\pi}}\left(\frac{1}{2}\int_{-\infty}^\infty e^{-u^2}~du+\frac{1}{2}\int_{-y}^y e^{-u^2}~du\right) =\frac{1}{\sqrt{\pi}}\left(\frac{1}{2}\sqrt{\pi}+\frac{1}{2}\sqrt{\pi}\cdot\mathrm{erf}\left(\frac{x-\mu}{\sqrt{2\sigma^2}}\right)\right) ~.$

$\displaystyle \boxed{\mathrm{CDF}=\frac{1}{2}\left(1+\mathrm{erf}\left(\frac{x-\mu}{\sqrt{2\sigma^2}}\right)\right) ~.}$

Note: this holds even when $y<0$, since $\mathrm{erf}$ is an odd function and the symmetry step above remains valid.
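The boxed CDF can be checked against SciPy's own normal CDF (a sketch; scipy.stats is assumed to be available, and the test points are arbitrary):

```python
import math
from scipy.stats import norm   # assumes SciPy is installed

def normal_cdf(x, mu, sigma):
    """Boxed formula: (1/2)(1 + erf((x - mu) / sqrt(2 sigma^2)))."""
    return 0.5 * (1 + math.erf((x - mu) / math.sqrt(2 * sigma ** 2)))

mu, sigma = 1.0, 2.0   # arbitrary example parameters
for x in (-3.0, 0.0, 1.0, 4.5):
    print(normal_cdf(x, mu, sigma), norm.cdf(x, loc=mu, scale=sigma))  # should agree
```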

Entropy

Information about an event $\omega$ is denoted as $I(\omega)$. If the event is certain, its probability $P(\omega)=1$ and $I(\omega)=0$.

In other words, if the event is already known to have happened, revealing it gives no information. If the event is unlikely to happen, revealing it conveys significant information.

If there are $n$ possible events and the probability that event $\omega_k$ occurs is $P(\omega_k)$, where $k=1,2,\ldots,n$, then $I(\omega_k)=-\log(P(\omega_k))$.

The base of the log is usually 2, measuring information in bits (binary units of information). If the base is $e$ (which physicists usually use), the result is in nats (natural units of information). Since $e>2$, for $p<1$ we have $-\log_e(p)<-\log_2(p)$: the same information measures to a smaller number in nats than in bits, and the same holds for entropy.

This means one nat contains more information than one bit. The exchange rate: one bit (a 50-50 binary outcome) is $\ln 2\approx 0.693$ nat, and one nat is $1/\ln 2\approx 1.443$ bits.
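The exchange rate is just a change of logarithm base, as this small check illustrates:

```python
import math

p = 0.5                       # a 50-50 binary outcome
print(-math.log2(p))          # 1.0 bit
print(-math.log(p))           # ~0.693 nat: the same event, measured in nats
print(1 / math.log(2))        # ~1.443: bits per nat
```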


Entropy is the expected value of $I(\omega_k)$:

$\displaystyle H(X) =E(I(X)) =\sum I(X)\cdot p(x)~\Delta x =\sum-\log(p(x)~\Delta x)\cdot p(x)~\Delta x =-\sum\Big(\log(p(x))\cdot p(x)~\Delta x+\log(\Delta x)\cdot p(x)~\Delta x\Big) ~.$

When $\Delta x\rightarrow 0$, the first term $\displaystyle -\sum\log(p(x))\cdot p(x)~\Delta x$ tends to an integral, while the $\log(\Delta x)$ term diverges and is dropped by convention, giving the differential entropy:

$\displaystyle H(X) =-\int_{-\infty}^\infty p(x)\log_b(p(x))~dx =-\frac{1}{\ln b}\int_{-\infty}^\infty\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)} \ln\left(\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}\right)~dx$

$\displaystyle =-\frac{1}{\ln b}\int_{-\infty}^\infty\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-y^2} \ln\left(\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-y^2}\right)~\sqrt{2\sigma^2}~dy =-\frac{1}{\ln b\sqrt{\pi}}\int_{-\infty}^\infty e^{-y^2}\left(-\frac{1}{2}\ln(2\sigma^2\pi)-y^2\right)~dy$

$\displaystyle =\frac{1}{\ln b\sqrt{\pi}}\left(\int_{-\infty}^\infty\frac{1}{2}\ln(2\sigma^2\pi)~e^{-y^2}~dy+\int_{-\infty}^\infty y^2~e^{-y^2}~dy\right) =\frac{1}{\ln b\sqrt{\pi}}\left(\frac{1}{2}\ln(2\sigma^2\pi)\cdot\sqrt{\pi}+\frac{\sqrt{\pi}}{2}\right) =\frac{1}{2\ln b}\left(\ln(2\sigma^2\pi)+1\right)$

$\displaystyle =\frac{1}{2\ln b}\ln\left(2\sigma^2\pi e\right) =\frac{1}{2}\log_b\left(2\sigma^2\pi e\right) ~.$

Note: $b$ is usually 2, giving the result in the binary information unit, the bit.
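As a final check, the closed-form entropy $\frac{1}{2}\log_b(2\sigma^2\pi e)$ can be compared with a direct numerical integration of $-\int p\log_b p~dx$ (sketch assuming SciPy; the values of $\mu$, $\sigma$, and $b$ are arbitrary choices):

```python
import math
from scipy.integrate import quad  # assumes SciPy is installed

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * sigma ** 2 * math.pi)

def integrand(x, mu, sigma, b):
    # -p(x) * log_b p(x); ln p(x) is written in closed form so tail underflow is harmless
    ln_p = -(x - mu) ** 2 / (2 * sigma ** 2) - 0.5 * math.log(2 * math.pi * sigma ** 2)
    return -normal_pdf(x, mu, sigma) * ln_p / math.log(b)

mu, sigma, b = 0.0, 1.7, 2          # arbitrary parameters; b = 2 measures in bits
numeric, _ = quad(integrand, -math.inf, math.inf, args=(mu, sigma, b))
closed_form = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2, b)
print(numeric, closed_form)         # the two values should agree
```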

End of Article

 
