STAT 400

Mon. February 17th, 2020


Variance and Standard Deviation

We use the concept of variance to measure the deviation of a random variable $X$ from its expected value $\mu$.

Definition. Let $X$ be a discrete random variable with probability mass function $p(x)$ and expected value $\mu$. The variance of $X$, denoted by $V(X)$ or sometimes $\text{Var}(X)$ (or $\sigma_X^2$ or $\sigma^2$), is
$$V(X)=\sum_{y\in D}(y-\mu)^2\cdot p(y)=E[(X-\mu)^2].$$
We define the standard deviation (SD) of $X$ as
$$\sigma_X=\sqrt{\sigma_X^2}=\sqrt{V(X)}.$$
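Since the definition is just a weighted sum over the support, it is easy to check numerically. Below is a minimal Python sketch (the helper names and the die example are illustrative, not from the notes) computing $E(X)$, $V(X)$, and $\sigma_X$ for a pmf stored as a dict:

```python
# A minimal sketch: E(X), V(X), and sigma_X straight from the definitions,
# for a pmf stored as {value: probability}.

def expectation(pmf):
    """E(X) = sum of x * p(x) over the support."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = sum of (x - mu)^2 * p(x) over the support."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def std_dev(pmf):
    """sigma_X = sqrt(V(X))."""
    return variance(pmf) ** 0.5

# Example: a fair six-sided die.
die = {x: 1 / 6 for x in range(1, 7)}
print(expectation(die))  # 3.5
print(variance(die))     # 2.9166... (= 35/12)
print(std_dev(die))      # 1.7078...
```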


Ex. $X$ follows $\text{Bernoulli}(\alpha)$. Calculate $V(X)$.
$$\begin{aligned}
E(X)&=\alpha\\
V(X)&=E[(X-\alpha)^2]\\
&=(0-\alpha)^2\cdot(1-\alpha)+(1-\alpha)^2\cdot\alpha\\
&=\alpha^2(1-\alpha)+\alpha(1-\alpha)^2\\
&=\alpha(1-\alpha)(\alpha+(1-\alpha))\\
&=\alpha(1-\alpha).
\end{aligned}$$
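As a quick sanity check, reusing the `variance` helper sketched above (with an arbitrary test value of $\alpha$):

```python
# Check V(X) = alpha * (1 - alpha) for Bernoulli(alpha); alpha = 0.3 is arbitrary.
alpha = 0.3
bernoulli = {0: 1 - alpha, 1: alpha}
print(variance(bernoulli))   # 0.21
print(alpha * (1 - alpha))   # 0.21
```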


Ex. $X$ follows $\text{Geo}(p)$. Calculate $V(X)$. Recall $E(X)=\frac{1}{p}$, so $\sum_{k=1}^\infty k(1-p)^{k-1}=\frac{E(X)}{p}=\frac{1}{p^2}$; note also that $\frac{d}{dp}\left(k(1-p)^k\right)=-k^2(1-p)^{k-1}$.
$$\begin{aligned}
V(X)&=\sum_{k=1}^\infty \left(k-\frac{1}{p}\right)^2 (1-p)^{k-1}p\\
&=\sum_{k=1}^\infty \left(k^2-\frac{2k}{p}+\frac{1}{p^2}\right)(1-p)^{k-1}p\\
&=\frac{1}{p^2}\left(\sum_{k=1}^\infty (1-p)^{k-1}p\right) - \frac{2}{p}\left(\sum_{k=1}^\infty k(1-p)^{k-1}p\right) + p\left(\sum_{k=1}^\infty k^2(1-p)^{k-1}\right)\\
&=\frac{1}{p^2}(1)-\frac{2}{p}E(X)+p\sum_{k=1}^\infty k^2(1-p)^{k-1}\\
&=\frac{1}{p^2}-\frac{2}{p^2}-p\sum_{k=1}^\infty \frac{d}{dp}\left(k(1-p)^{k}\right)\\
&=-\frac{1}{p^2}-p\,\frac{d}{dp}\left(\sum_{k=1}^\infty k(1-p)^{k}\right)\\
&=-\frac{1}{p^2}-p\,\frac{d}{dp}\left((1-p)\sum_{k=1}^\infty k(1-p)^{k-1}\right)\\
&=-\frac{1}{p^2}-p\,\frac{d}{dp}\left(\frac{1-p}{p^2}\right)\\
&=-\frac{1}{p^2}-p\cdot\frac{p-2}{p^3}\\
&=-\frac{1}{p^2}+\frac{2-p}{p^2}\\
&=\frac{1-p}{p^2}.
\end{aligned}$$
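The closed form can be sanity-checked by truncating the infinite sum at a large $N$ (a sketch; $p$ and $N$ are arbitrary test values):

```python
# Check V(X) = (1 - p) / p**2 for Geo(p) by truncating the series at N;
# the tail is negligible because (1 - p)**(k - 1) decays geometrically.
p = 0.25
N = 2000
mu = 1 / p  # E(X)
trunc_var = sum((k - mu) ** 2 * (1 - p) ** (k - 1) * p for k in range(1, N + 1))
print(trunc_var)         # ~12.0
print((1 - p) / p ** 2)  # 12.0
```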


Proposition. $V(X)=E(X^2)-[E(X)]^2$.
$$\begin{aligned}
V(X)&=E[(X-\mu)^2]\\
&=E(X^2-2\mu X+\mu^2)\\
&=E(X^2)-E(2\mu X)+\mu^2\\
&=E(X^2)-2\mu E(X)+\mu^2\\
&=E(X^2)-2\mu^2+\mu^2\\
&=E(X^2)-\mu^2\\
&=E(X^2)-[E(X)]^2.
\end{aligned}$$
We can use this fact to simplify the calculation of $V(X)$.
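For instance, on the die pmf from the earlier sketch, the shortcut gives the same number as the defining sum:

```python
# Check the shortcut V(X) = E(X^2) - [E(X)]^2 against the defining sum,
# reusing the die pmf and helpers from the sketch above.
ex2 = sum(x ** 2 * p for x, p in die.items())  # E(X^2)
mu = expectation(die)
print(ex2 - mu ** 2)   # 2.9166...
print(variance(die))   # 2.9166..., same value
```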


Variance of Linear Functions

In general, given any function $h$, what does $V(h(X))$ look like?
$$V(h(X))=\sum_{y\in D}\left(h(y)-\mu_{h(X)}\right)^2\cdot p(y).$$
This doesn't tell us much for the general case, since calculating $\mu_{h(X)}$ is not trivial.
However, when $h$ is a linear function (that is, $h(x)=ax+b$), then
$$h(y)-\mu_{aX+b}=ay+b-(a\mu_X+b)=a(y-\mu_X)$$
and
$$\begin{aligned}
V(h(X))&=V(aX+b)\\
&=\sum_{y\in D} a^2(y-\mu_X)^2\cdot p(y)\\
&=a^2\sum_{y\in D} (y-\mu_X)^2\cdot p(y)\\
&=a^2V(X).
\end{aligned}$$

Thus, $V(aX+b)=\sigma_{aX+b}^2=a^2\sigma_X^2$, and taking square roots gives $\sigma_{aX+b}=|a|\,\sigma_X$.
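Again this is easy to verify numerically (a sketch reusing the die pmf from above; $a$ and $b$ are arbitrary test values):

```python
# Check V(aX + b) = a**2 * V(X): push the die pmf through h(x) = a*x + b.
a, b = 3, 5
transformed = {a * x + b: p for x, p in die.items()}
print(variance(transformed))   # 26.25
print(a ** 2 * variance(die))  # 26.25
```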