STAT 400

Fri. February 21st, 2020


Hypergeometric Distribution

You have a box of $N$ balls. $M$ of them are red ($R$), and the rest ($N - M$) are blue ($B$). If you pick $n$ balls from the box without replacing any, let $X$ be the number of red balls picked.

In this case, we say $X$ follows a hypergeometric distribution: $X$ follows $\text{hypergeometric}(n, M, N)$.

Note that the probability of choosing a red ball changes as different types of balls are removed from the box. This is the property that distinguishes the hypergeometric distribution from the binomial, where every trial has the same success probability.

Ex. Suppose $N = 20$, $M = 12$ (so $N - M = 8$), and $n = 5$. Find the probability mass function of $X$.

Possible values of $X$: $0, 1, 2, \ldots, 5$.
$$\begin{aligned}
p(0) = P(X=0) &= \frac{\binom{8}{5}}{\binom{20}{5}} \\
p(1) = P(X=1) &= \frac{\binom{12}{1}\binom{8}{4}}{\binom{20}{5}} \\
p(2) = P(X=2) &= \frac{\binom{12}{2}\binom{8}{3}}{\binom{20}{5}} \\
p(3) = P(X=3) &= \frac{\binom{12}{3}\binom{8}{2}}{\binom{20}{5}} \\
&\;\;\vdots
\end{aligned}$$
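As a quick sanity check, here is a short Python sketch (standard library only; the variable names are mine) that computes all of these probabilities with `math.comb` and confirms they sum to 1:

```python
from math import comb

N, M, n = 20, 12, 5   # population size, red balls, balls drawn

# p(k) = C(M, k) * C(N - M, n - k) / C(N, n)
pmf = {k: comb(M, k) * comb(N - M, n - k) / comb(N, n) for k in range(n + 1)}

for k, prob in pmf.items():
    print(f"p({k}) = {prob:.4f}")

print("total:", sum(pmf.values()))   # should print 1.0 (up to rounding)
```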


In the general case, a value $k$ of $X$ has to satisfy these constraints for $p(k)$ to be nonzero:
a) $0 \le k \le n$
b) $k \le M$
c) $0 \le n - k \le N - M$

These can be combined into one constraint: $\text{Max}\{0,\ n - N + M\} \le k \le \text{Min}\{n,\ M\}$.

For any $k$ satisfying this constraint, $$p(k) = P(X = k) = \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}.$$

Here, $p$ is the probability mass function for the hypergeometric distribution. It is usually represented by the notation $h(k; n, M, N)$.
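Translated directly into Python, with the combined support constraint from above (a sketch; the function name `h` just mirrors the notation in these notes):

```python
from math import comb

def h(k, n, M, N):
    """Hypergeometric pmf h(k; n, M, N): the probability of drawing
    exactly k red balls when n balls are drawn without replacement
    from N balls, M of which are red."""
    if not (max(0, n - N + M) <= k <= min(n, M)):
        return 0.0   # k outside the support
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

print(h(2, 5, 12, 20))   # p(2) from the example above
```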


Expected Value and Variance

Let $X$ follow $\text{hypergeometric}(n, M, N)$. Then $E(X) = n \cdot \frac{M}{N}$ and $V(X) = \left(\frac{N-n}{N-1}\right) \cdot n \cdot \frac{M}{N} \cdot \left(1 - \frac{M}{N}\right)$.
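These formulas can be checked against `scipy.stats.hypergeom` for the example above. Note that SciPy orders the parameters as (population size, successes in population, draws), which differs from the $(n, M, N)$ notation here; a sketch:

```python
from scipy.stats import hypergeom

N, M, n = 20, 12, 5   # notation from these notes

# SciPy's argument order: (population, successes in population, draws)
mean, var = hypergeom.stats(N, M, n, moments='mv')

print(mean, n * M / N)                                      # both 3.0
print(var, (N - n) / (N - 1) * n * (M / N) * (1 - M / N))   # both ~0.947
```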

Note that if we let $p = \frac{M}{N}$, then $E(X) = np$ and $V(X) = np(1-p) \cdot \left(\frac{N-n}{N-1}\right)$.
$E(X)$ here is exactly the same as that of the binomial distribution, and $V(X)$ is the same as well except for the extra rightmost factor.
If $N$ is much larger than $n$ ($N \gg n$), that factor is very close to $1$. This reflects the fact that when $N \gg n$, the pmf of a hypergeometric distribution can be approximated by a binomial distribution (e.g., using a binomial table), since the chance of success is barely affected by the earlier draws.
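A quick numerical illustration of this approximation (a sketch; the specific numbers $N = 10000$, $M = 6000$, $n = 5$ are mine):

```python
from scipy.stats import hypergeom, binom

N, M, n = 10_000, 6_000, 5   # N >> n, so p = M/N = 0.6 barely changes between draws
p = M / N

for k in range(n + 1):
    exact = hypergeom.pmf(k, N, M, n)   # SciPy order: (k, population, successes, draws)
    approx = binom.pmf(k, n, p)
    print(f"k={k}: hypergeometric {exact:.5f} vs. binomial {approx:.5f}")
```

The printed values nearly coincide, as expected.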


Negative Binomial Distribution

We repeat an experiment independently until we have $r$ successes. Each trial has the same probability of success, $p = P(\{S\})$. Let $X$ be the number of failures before the $r$th success. We say $X$ follows a negative binomial distribution: $X$ follows $\text{NB}(r, p)$.

Ex. Suppose $r = 3$ and $p = \frac{1}{3}$.
Possible values of $X$: $0, 1, 2, \ldots$ (any nonnegative integer).

If $X = 5$, then we know eight trials were performed, and the last one must have been a success. Thus, $P(X=5)$ should be the probability of three successes and five failures occurring, multiplied by the number of ways the other two successes can be placed among the first seven trials: $$P(X = 5) = \binom{7}{2}\left(\frac{1}{3}\right)^3\left(1 - \frac{1}{3}\right)^5.$$
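In code (standard library only; a sketch of this single computation):

```python
from math import comb

r, p, k = 3, 1/3, 5

# C(7, 2) ways to place the other two successes among the first seven trials,
# times the probability of three successes and five failures
prob = comb(k + r - 1, r - 1) * p**r * (1 - p)**k
print(prob)   # ~0.1024
```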


In general, if $X$ follows $\text{NB}(r, p)$, then $$p(k) = P(X = k) = \binom{k+r-1}{r-1} \cdot p^r (1-p)^k.$$
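This convention (counting failures before the $r$th success) matches `scipy.stats.nbinom`, so the formula can be cross-checked as follows (a sketch):

```python
from math import comb
from scipy.stats import nbinom

def nb_pmf(k, r, p):
    # P(X = k) = C(k + r - 1, r - 1) * p^r * (1 - p)^k
    return comb(k + r - 1, r - 1) * p**r * (1 - p)**k

r, p = 3, 1/3
for k in range(6):
    print(nb_pmf(k, r, p), nbinom.pmf(k, r, p))   # the two columns match
```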


Expected Value and Variance

Let $X$ follow $\text{NB}(r, p)$. Then $E(X) = \frac{r(1-p)}{p}$ and $V(X) = \frac{r(1-p)}{p^2}$.
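Again, a quick check against `scipy.stats.nbinom` (a sketch using the $r = 3$, $p = \frac{1}{3}$ example from above):

```python
from scipy.stats import nbinom

r, p = 3, 1/3
mean, var = nbinom.stats(r, p, moments='mv')

print(mean, r * (1 - p) / p)      # both 6.0
print(var, r * (1 - p) / p**2)    # both 18.0
```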