Variance and Standard Deviation
We use the concept of variance to measure the deviation of a random variable $X$ from its expected value $\mu$.
Definition. Let $X$ be a discrete random variable with probability mass function $p(x)$ and expected value $\mu$. The variance of $X$, denoted by $V(X)$ or sometimes $\text{Var}(X)$ (or $\sigma_X^2$ or $\sigma^2$), is
\[
V(X)=\sum_{y\in D}(y-\mu)^2\cdot p(y)=E\!\left[(X-\mu)^2\right].
\]
We define the standard deviation (SD) of $X$ as
\[
\sigma_X=\sqrt{\sigma_X^2}=\sqrt{V(X)}.
\]
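The definition can be computed directly from a pmf. As an added illustration (the pmf values below are made up for this sketch, not from the notes):

```python
import math

# p(y) for each y in the support D; an illustrative pmf, not from the notes.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

mu = sum(y * p for y, p in pmf.items())               # E(X)
var = sum((y - mu) ** 2 * p for y, p in pmf.items())  # V(X), per the definition
sd = math.sqrt(var)                                   # sigma_X
print(round(mu, 3), round(var, 3), round(sd, 3))      # 1.1 0.49 0.7
```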
Ex. $X$ follows $\text{Bernoulli}(\alpha)$. Calculate $V(X)$.
\[
\begin{aligned}
E(X)&=\alpha\\
V(X)&=E\!\left[(X-\alpha)^2\right]\\
&=(0-\alpha)^2\cdot (1-\alpha)+(1-\alpha)^2\cdot \alpha\\
&=\alpha^2(1-\alpha)+\alpha(1-\alpha)^2\\
&=\alpha(1-\alpha)\big(\alpha+(1-\alpha)\big)\\
&=\alpha(1-\alpha).
\end{aligned}
\]
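The closed form $\alpha(1-\alpha)$ can be checked numerically. A minimal Monte Carlo sketch (an added illustration; the choice $\alpha=0.3$ and sample size are arbitrary):

```python
import random

# For Bernoulli(alpha), the sample variance should approach alpha*(1-alpha).
random.seed(0)
alpha = 0.3
n = 200_000
samples = [1 if random.random() < alpha else 0 for _ in range(n)]

mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n
print(round(var, 3), alpha * (1 - alpha))  # sample variance near 0.21
```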
Ex. $X$ follows $\text{Geo}(p)$. Calculate $V(X)$.
\[
\begin{aligned}
E(X)&=\frac{1}{p}\\
V(X)&=\sum_{k=1}^\infty \left(k-\frac{1}{p}\right)^2 (1-p)^{k-1}p\\
&=\sum_{k=1}^\infty \left(k^2-\frac{2k}{p}+\frac{1}{p^2}\right)(1-p)^{k-1}p\\
&=\frac{1}{p^2}\sum_{k=1}^\infty (1-p)^{k-1}p \;-\; \frac{2}{p}\sum_{k=1}^\infty k(1-p)^{k-1}p \;+\; p\sum_{k=1}^\infty k^2(1-p)^{k-1}\\
&=\frac{1}{p^2}(1)-\frac{2}{p}\,E(X)+p\sum_{k=1}^\infty k^2(1-p)^{k-1}\\
&=\frac{1}{p^2}-\frac{2}{p^2}-p\sum_{k=1}^\infty \frac{d}{dp}\!\left(k(1-p)^{k}\right) \qquad\text{since } \tfrac{d}{dp}\!\left(k(1-p)^k\right)=-k^2(1-p)^{k-1}\\
&=-\frac{1}{p^2}-p\,\frac{d}{dp}\left(\sum_{k=1}^\infty k(1-p)^{k}\right)\\
&=-\frac{1}{p^2}-p\,\frac{d}{dp}\left((1-p)\sum_{k=1}^\infty k(1-p)^{k-1}\right)\\
&=-\frac{1}{p^2}-p\,\frac{d}{dp}\left(\frac{1-p}{p^2}\right) \qquad\text{since } \sum_{k=1}^\infty k(1-p)^{k-1}=\tfrac{1}{p}\,E(X)=\tfrac{1}{p^2}\\
&=-\frac{1}{p^2}-p\left(-\frac{1}{p^2}-\frac{2(1-p)}{p^3}\right)\\
&=-\frac{1}{p^2}+\frac{1}{p}+\frac{2(1-p)}{p^2}\\
&=\frac{1-p}{p^2}.
\end{aligned}
\]
(Interchanging the derivative and the sum is justified here because the series is a power series in $1-p$, convergent for $0<p\le 1$.)
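The result $(1-p)/p^2$ can also be checked by simulation. A sketch (an added illustration; $p=0.25$ and the sample size are arbitrary choices):

```python
import random

# For Geo(p), counting trials up to and including the first success,
# the sample variance should approach (1 - p) / p**2.
random.seed(0)
p = 0.25
n = 200_000

def geometric(p):
    """Count Bernoulli(p) trials until the first success (inclusive)."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

samples = [geometric(p) for _ in range(n)]
mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n
print(round(var, 2), (1 - p) / p**2)  # sample variance near 12.0
```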
Proposition. $V(X)=E(X^2)-[E(X)]^2$.
\[
\begin{aligned}
V(X)&=E\!\left[(X-\mu)^2\right]\\
&=E(X^2-2\mu X+\mu^2)\\
&=E(X^2)-E(2\mu X)+\mu^2\\
&=E(X^2)-2\mu E(X)+\mu^2 \qquad\text{by linearity of expectation}\\
&=E(X^2)-2\mu^2 + \mu^2\\
&=E(X^2)-\mu^2\\
&=E(X^2)-[E(X)]^2.
\end{aligned}
\]
We can use this fact to simplify the calculation of $V(X)$.
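As a quick check of the shortcut formula (an added illustration, reusing the Bernoulli example from above): for $X\sim\text{Bernoulli}(\alpha)$,
\[
E(X^2)=0^2\cdot(1-\alpha)+1^2\cdot\alpha=\alpha,
\qquad
V(X)=E(X^2)-[E(X)]^2=\alpha-\alpha^2=\alpha(1-\alpha),
\]
matching the direct computation, with no need to expand $(X-\alpha)^2$.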
In general, given any function $h$, what does $V(h(X))$ look like?
\[
V(h(X))=\sum_{y\in D}\left(h(y)-\mu_{h(X)}\right)^2\cdot p(y).
\]
This doesn't tell us much in the general case, since calculating $\mu_{h(X)}$ is not trivial.
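The general formula is still easy to evaluate numerically for a concrete pmf and a concrete $h$. A sketch (the pmf and the choice $h(x)=x^2$ are illustrative, not from the notes):

```python
# V(h(X)) computed from the pmf, following the general formula:
# first E(h(X)), then the weighted squared deviations of h(y).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # illustrative pmf
h = lambda x: x * x             # illustrative choice of h

mu_h = sum(h(y) * p for y, p in pmf.items())                 # E(h(X))
var_h = sum((h(y) - mu_h) ** 2 * p for y, p in pmf.items())  # V(h(X))
print(round(mu_h, 4), round(var_h, 4))  # 1.7 2.41
```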
However, when $h$ is a linear function (that is, $h(x)=ax+b$), then
\[
h(y)-\mu_{aX+b}=ay+b-(a\mu_{X}+b)=a(y-\mu_X)
\]
and
\[
\begin{aligned}
V(h(X))&=V(aX+b)\\
&=\sum_{y\in D} a^2(y-\mu_X)^2 \cdot p(y)\\
&=a^2\sum_{y\in D} (y-\mu_X)^2 \cdot p(y)\\
&=a^2V(X).
\end{aligned}
\]
Thus, $V(aX+b)=\sigma_{aX+b}^2=a^2\sigma_{X}^2$, and taking square roots, $\sigma_{aX+b}=|a|\,\sigma_X$.
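The rescaling rule holds exactly for sample variances as well, which makes it easy to verify. A sketch (an added illustration; the die pmf and the values of $a$, $b$ are arbitrary):

```python
import random

# Check that V(aX + b) = a**2 * V(X): rescale a sample of die rolls
# and compare variances. The shift b should drop out entirely.
random.seed(0)
a, b = -3.0, 7.0
samples = [random.randint(1, 6) for _ in range(100_000)]  # a fair die

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

vx = variance(samples)
v_lin = variance([a * x + b for x in samples])
print(round(v_lin / vx, 3))  # ratio equals a**2 = 9.0
```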