First created in April 2017
The PDF of a normal distribution $\mathcal{N}(\mu,\sigma^2)$ is $\displaystyle \boxed{p(x~|~\mu,\sigma^2)=\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)} ~.}$
Let us write it as $p(x)$ and prove that the mean is $\mu$ and the variance is $\sigma^2$. Note that when $\mu=0$ and $\sigma\rightarrow 0$, $p(x)\rightarrow\delta(x)$, the Dirac delta function.
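As a quick numerical sanity check, here is a minimal Python sketch (assuming SciPy is installed; the parameter values are arbitrary) comparing this formula with `scipy.stats.norm.pdf`:

```python
import math
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """p(x | mu, sigma^2) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

mu, sigma = 1.5, 0.7  # arbitrary example parameters
for x in (-1.0, 0.0, 1.5, 3.0):
    assert math.isclose(normal_pdf(x, mu, sigma), norm.pdf(x, loc=mu, scale=sigma))
```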
For convenience, here $\displaystyle y=\frac{x-\mu}{\sqrt{2\sigma^2}}$, and as a result $\displaystyle y^2=\frac{(x-\mu)^2}{2\sigma^2}$ and $dx=\sqrt{2\sigma^2}~dy$.
Gaussian integral and related:
$\displaystyle(1):\int_{-\infty}^\infty e^{-y^2}~dy=\sqrt{\pi}.~$ This is the Gaussian integral; the standard proof squares the integral and evaluates the resulting double integral in polar coordinates.
$\displaystyle(2):\int_{-\infty}^\infty y~e^{-y^2}~dy=0.~$
Proof: $\displaystyle \frac{d}{dy}\left(-\frac{1}{2}~e^{-y^2}\right)=y~e^{-y^2}~,\quad \Big[-\frac{1}{2}~e^{-y^2}\Big]_{-\infty}^\infty=0 .$
$\displaystyle(3):\int_{-\infty}^\infty y^2~e^{-y^2}~dy=\frac{\sqrt{\pi}}{2}.~$
Proof: Let $u=y$ and $dv=y~e^{-y^2}~dy$, so $du=dy$ and $v=-\frac{1}{2}~e^{-y^2}$. $\displaystyle \int_{-\infty}^\infty~y^2~e^{-y^2}~dy =\int_{-\infty}^\infty u~dv =\Big[uv\Big]_{-\infty}^\infty-\int_{-\infty}^\infty v~du =\left[y\cdot\left(-\frac{1}{2}~e^{-y^2}\right)\right]_{-\infty}^\infty+\int_{-\infty}^\infty\frac{1}{2}~e^{-y^2}~dy =0+\frac{\sqrt{\pi}}{2} =\frac{\sqrt{\pi}}{2} .$
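All three results can be confirmed numerically; a sketch using `scipy.integrate.quad` (assuming SciPy is installed):

```python
import math
from scipy.integrate import quad

inf = math.inf
i1, _ = quad(lambda y: math.exp(-y * y), -inf, inf)          # (1): expect sqrt(pi)
i2, _ = quad(lambda y: y * math.exp(-y * y), -inf, inf)      # (2): expect 0
i3, _ = quad(lambda y: y * y * math.exp(-y * y), -inf, inf)  # (3): expect sqrt(pi)/2

assert math.isclose(i1, math.sqrt(math.pi), rel_tol=1e-6)
assert abs(i2) < 1e-10
assert math.isclose(i3, math.sqrt(math.pi) / 2, rel_tol=1e-6)
```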
$\displaystyle p'(x) =\frac{d}{dx}\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)} =\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}\frac{d}{dx}\frac{-(x-\mu)^2}{2\sigma^2} =\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}\frac{-2(x-\mu)}{2\sigma^2}$
$\displaystyle =\frac{-(x-\mu)}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)} ~,$ which is $0$ when $x=\mu$, where the PDF peaks.
$\displaystyle p''(x) =\frac{d}{dx}\left(\frac{-(x-\mu)}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)}\right) =\frac{-1}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)} +\frac{-(x-\mu)}{\sqrt{2\pi}~\sigma^3}~e^{-(x-\mu)^2/(2\sigma^2)}\frac{d}{dx}\frac{-(x-\mu)^2}{2\sigma^2} ~.$
$\displaystyle =\left(\frac{-1}{\sqrt{2\pi}~\sigma^3} +\frac{(x-\mu)^2}{\sqrt{2\pi}~\sigma^5}\right)~e^{-(x-\mu)^2/(2\sigma^2)} =\frac{(x-\mu)^2-\sigma^2}{\sqrt{2\pi}~\sigma^5}~e^{-(x-\mu)^2/(2\sigma^2)} ~.$
When $x=\mu$, $p''(x)<0$ and $p(x)$ is at its maximum. When $x=\mu\pm\sigma$, $p''(x)=0$; these are the points of inflection.
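The same critical points can be recovered symbolically; a sketch assuming SymPy is available:

```python
import sympy as sp

x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)
p = sp.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sp.sqrt(2 * sp.pi * sigma ** 2)

print(sp.solve(sp.Eq(sp.diff(p, x), 0), x))     # expect [mu]: the peak
print(sp.solve(sp.Eq(sp.diff(p, x, 2), 0), x))  # expect [mu - sigma, mu + sigma]: inflection points
```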
Mean $\displaystyle =\int_{-\infty}^\infty x\cdot p(x)~dx =\int_{-\infty}^\infty x\cdot\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}~dx =\frac{1}{\sqrt{2\sigma^2\pi}}\left[\int_{-\infty}^\infty x~e^{-(x-\mu)^2/(2\sigma^2)}~dx\right]$
$\displaystyle =\frac{1}{\sqrt{2\sigma^2\pi}}\left[\int_{-\infty}^\infty(x-\mu)~e^{-(x-\mu)^2/(2\sigma^2)}~dx+\int_{-\infty}^\infty\mu~e^{-(x-\mu)^2/(2\sigma^2)}~dx\right] =\frac{1}{\sqrt{2\sigma^2\pi}}\left[\int_{-\infty}^\infty\sqrt{2\sigma^2}~y~e^{-y^2}~\sqrt{2\sigma^2}~dy +\mu\int_{-\infty}^\infty e^{-y^2}~\sqrt{2\sigma^2}~dy\right] ~.$
$\displaystyle =\frac{1}{\sqrt{2\sigma^2\pi}}\left[0+\mu~\sqrt{2\sigma^2}\sqrt{\pi}\right] =\mu ~.$
Variance $\displaystyle =\int_{-\infty}^\infty(x-\mu)^2\cdot p(x)~dx =\int_{-\infty}^\infty(x-\mu)^2\cdot\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}~dx$
$\displaystyle =\frac{1}{\sqrt{2\sigma^2\pi}}\int_{-\infty}^\infty 2\sigma^2~y^2\cdot e^{-y^2}~\sqrt{2\sigma^2}~dy =\frac{2\sigma^2}{\sqrt{\pi}}\int_{-\infty}^\infty y^2~e^{-y^2}~dy =\frac{2\sigma^2}{\sqrt{\pi}}\cdot\frac{\sqrt{\pi}}{2} =\sigma^2 ~.$
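Both results (mean $\mu$, variance $\sigma^2$) check out numerically; a sketch with `scipy.integrate.quad` (arbitrary parameters, assuming SciPy):

```python
import math
from scipy.integrate import quad

mu, sigma = 2.0, 1.3  # arbitrary example parameters

def p(x):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

mean, _ = quad(lambda x: x * p(x), -math.inf, math.inf)
var, _ = quad(lambda x: (x - mu) ** 2 * p(x), -math.inf, math.inf)
assert math.isclose(mean, mu, rel_tol=1e-6)         # mean = mu
assert math.isclose(var, sigma ** 2, rel_tol=1e-6)  # variance = sigma^2
```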
Here is the probability that a sample falls in $[-x,x]$ under $\varphi(x)=\frac{1}{\sqrt{2\pi}}~e^{-x^2/2}$, the standard normal distribution with $\mu=0$ and $\sigma^2=1$:
$\displaystyle \int_{-x}^x\varphi(y)~dy =\int_{-x}^x\frac{1}{\sqrt{2\pi}}~e^{-y^2/2}~dy =\frac{1}{\sqrt{2\pi}}~\sqrt{2}\int_{-x/\sqrt{2}}^{x/\sqrt{2}} e^{-t^2}~dt =\frac{1}{\sqrt{\pi}}\int_{-x/\sqrt{2}}^{x/\sqrt{2}} e^{-t^2}~dt ~,$ using the substitution $t=y/\sqrt{2}$ (note that the limits change as well).
To enable us to calculate such probabilities from a single tabulated function, we define $\displaystyle \boxed{\mathrm{erf}(x) =\frac{2}{\sqrt{\pi}}~\int_{0}^x e^{-t^2}~dt ~}$. By symmetry the integral above equals $\mathrm{erf}(x/\sqrt{2})$, and the probability that a standard normal sample falls in $[a,b]$ is $\frac{1}{2}\big(\mathrm{erf}(b/\sqrt{2})-\mathrm{erf}(a/\sqrt{2})\big)$.
The error function has a domain of $(-\infty,+\infty)$ and range $(-1,1)$.
CDF $\displaystyle =\int_{-\infty}^x p(t)~dt =\int_{-\infty}^x\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(t-\mu)^2/(2\sigma^2)}~dt =\frac{1}{\sqrt{2\sigma^2\pi}}\int_{-\infty}^y e^{-u^2}\sqrt{2\sigma^2}~du$, where $\displaystyle u=\frac{t-\mu}{\sqrt{2\sigma^2}}$ and $\displaystyle y=\frac{x-\mu}{\sqrt{2\sigma^2}}$,
$\displaystyle =\frac{1}{\sqrt{\pi}}\left(\int_{-\infty}^0 e^{-u^2}~du+\int_0^y e^{-u^2}~du\right) =\frac{1}{\sqrt{\pi}}\left(\frac{1}{2}\int_{-\infty}^\infty e^{-u^2}~du+\frac{1}{2}\int_{-y}^y e^{-u^2}~du\right) =\frac{1}{\sqrt{\pi}}\left(\frac{1}{2}\sqrt{\pi}+\frac{1}{2}\sqrt{\pi}\cdot\mathrm{erf}\left(\frac{x-\mu}{\sqrt{2\sigma^2}}\right)\right) ~.$
$\displaystyle \boxed{\mathrm{CDF}=\frac{1}{2}\left(1+\mathrm{erf}\left(\frac{x-\mu}{\sqrt{2\sigma^2}}\right)\right) ~.}$
Note: the step $\displaystyle \int_0^y e^{-u^2}~du=\frac{1}{2}\int_{-y}^y e^{-u^2}~du$ works even when $y<0$, since both sides are odd functions of $y$.
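Python's standard library ships `math.erf`, so the boxed CDF can be checked directly against `scipy.stats.norm.cdf`; a sketch (arbitrary parameters, assuming SciPy):

```python
import math
from scipy.stats import norm

def normal_cdf(x, mu, sigma):
    """CDF = (1 + erf((x - mu) / sqrt(2 sigma^2))) / 2."""
    return 0.5 * (1 + math.erf((x - mu) / math.sqrt(2 * sigma ** 2)))

mu, sigma = -0.5, 2.0  # arbitrary example parameters
for x in (-3.0, -0.5, 0.0, 4.0):  # includes x < mu, i.e. y < 0
    assert math.isclose(normal_cdf(x, mu, sigma), norm.cdf(x, loc=mu, scale=sigma))
```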
Information about an event $\omega$ is denoted as $I(\omega)$. If the event is certain, its probability $P(\omega)=1$ and $I(\omega)=0$.
In other words, if the event is known to have happened, revealing it gives no information. If the event is unlikely to happen, revealing it conveys significant information.
If there are $n$ possible events and the probability that event $\omega_k$ occurs is $P(\omega_k)$, where $k=1,2,\ldots,n$, then $I(\omega_k)=-\log(P(\omega_k))$.
The base of the log is usually 2, so that information is measured in bits (binary units of information). If the base is $e$ (which is what physicists usually use), the result is in "nats" (natural units of information). Since $e>2$, we have $-\log_e(p)<-\log_2(p)$ for $p<1$: the same information measures to a smaller numerical value in nats than in bits, and the same holds for entropy.
This means one nat represents more information than one bit. The "exchange rate": one bit (0 or 1 with a 50-50 chance) is $\ln 2\approx 0.693$ nat, and one nat is $\log_2 e\approx 1.443$ bits.
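The exchange rate is just a change of logarithm base:

```python
import math

print(math.log(2))        # 1 bit = ln(2)   ~ 0.693 nat
print(math.log2(math.e))  # 1 nat = log2(e) ~ 1.443 bits
```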
Entropy is the expected value of $I(\omega_k)$:
$\displaystyle H(X) =E(I(X)) =\sum I(X)\cdot p(x)~\Delta x =\sum-\log(p(x)~\Delta x)\cdot p(x)~\Delta x =-\sum\Big(\log(p(x))\cdot p(x)~\Delta x+\log(\Delta x)\cdot p(x)~\Delta x\Big) ~.$
When $\Delta x\rightarrow 0$, the $\log(\Delta x)$ term diverges; differential entropy is defined by dropping it and keeping only $\displaystyle -\sum\log(p(x))\cdot p(x)~\Delta x$, which becomes the integral below.
$\displaystyle H(X) =-\int_{-\infty}^\infty p(x)\log_b(p(x))~dx =-\frac{1}{\ln b}\int_{-\infty}^\infty\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)} \ln\left(\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-(x-\mu)^2/(2\sigma^2)}\right)~dx$
$\displaystyle =-\frac{1}{\ln b}\int_{-\infty}^\infty\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-y^2} \ln\left(\frac{1}{\sqrt{2\sigma^2\pi}}~e^{-y^2}\right)~\sqrt{2\sigma^2}~dy =-\frac{1}{\ln b\sqrt{\pi}}\int_{-\infty}^\infty e^{-y^2}\left(-\frac{1}{2}\ln(2\sigma^2\pi)-y^2\right)~dy$
$\displaystyle =\frac{1}{\ln b\sqrt{\pi}}\left(\int_{-\infty}^\infty\frac{1}{2}\ln(2\sigma^2\pi)~e^{-y^2}~dy+\int_{-\infty}^\infty y^2~e^{-y^2}~dy\right) =\frac{1}{\ln b\sqrt{\pi}}\left(\frac{1}{2}\ln(2\sigma^2\pi)\cdot\sqrt{\pi}+\frac{\sqrt{\pi}}{2}\right) =\frac{1}{2\ln b}\left(\ln(2\sigma^2\pi)+1\right)$
$\displaystyle =\frac{1}{2\ln b}\ln\left(2\sigma^2\pi e\right) ~,\qquad \boxed{H(X)=\frac{1}{2}\log_b\left(2\sigma^2\pi e\right) ~.}$
Note: $b$ is usually 2, for the binary information unit, or "bit".
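A closing numerical check of the boxed entropy (a sketch with SciPy; $\sigma$ is arbitrary): integrate $-p\ln p$ over a generous finite range, compare with $\frac{1}{2}\ln(2\sigma^2\pi e)$, then convert to bits by dividing by $\ln 2$.

```python
import math
from scipy.integrate import quad

mu, sigma = 0.0, 1.7  # arbitrary example parameters

def ln_p(x):
    """ln p(x), written analytically to avoid log(0) underflow in the tails."""
    return -(x - mu) ** 2 / (2 * sigma ** 2) - 0.5 * math.log(2 * math.pi * sigma ** 2)

# Differential entropy in nats: integral of -p(x) ln p(x); the tails beyond 12 sigma are negligible.
h_nats, _ = quad(lambda x: -math.exp(ln_p(x)) * ln_p(x), mu - 12 * sigma, mu + 12 * sigma)
closed_form = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
assert math.isclose(h_nats, closed_form, rel_tol=1e-6)
print(h_nats / math.log(2))  # the same entropy in bits (b = 2)
```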