We use random variables to assign numbers to outcomes. This is notated as $X\colon S \to \mathbb{R}$, where $X$ is the random variable and $S$ is the sample space.
Definition. Any random variable whose only values are 0 or 1 is called a Bernoulli random variable.
Definition. A discrete random variable is one whose possible values are countable. (They can be listed in a finite or infinite sequence.) A continuous random variable is a random variable whose possible values are all the numbers in an interval or a union of such intervals, and the probability of each single value is 0 (that is, $P(X = c) = 0$ for every possible value $c$).
Ex. Suppose we test a sequence of batteries until we reach one with an acceptable battery life. We denote an acceptable battery life by $S$ (success) and write $F$ (failure) otherwise. We define a random variable $X$ to be the number of failures in an outcome. We say that $X$ takes countably infinitely many possible values – thus it is discrete. This differs from, for example, a random variable that returns values over the interval of real numbers from 0 to 1, which contains uncountably infinitely many possible values.
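As a quick illustration, the battery-testing experiment can be simulated. The sketch below assumes, purely for illustration, that each battery is independently acceptable with probability 0.9:

```python
import random

def count_failures(p_acceptable=0.9):
    """Simulate testing batteries until one is acceptable.

    Returns the number of failures observed before the first
    success -- one realization of the random variable X.
    (p_acceptable is an assumed, illustrative parameter.)
    """
    failures = 0
    while random.random() >= p_acceptable:  # this battery was unacceptable
        failures += 1
    return failures

# X can take any value in {0, 1, 2, ...} -- countably many.
samples = [count_failures() for _ in range(10)]
print(samples)
```

Each run yields one realization of $X$; there is no finite upper bound on the values it can return, which is exactly what makes its set of possible values countably infinite.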
Once we assign values to outcomes using random variables, the original context of the problem no longer matters.
For example, when tossing a coin we can represent heads as 1 and tails as 0. Similarly, when performing an experiment we can represent success as 1 and failure as 0. We can now use the same expressions to represent both problems – we’ve abstracted the problems.
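A minimal sketch of this abstraction (the probabilities 0.5 and 0.3 below are illustrative assumptions):

```python
import random

def bernoulli(p):
    """One Bernoulli trial: returns 1 with probability p, else 0.

    The same function models a coin toss (p = 0.5) or any
    success/failure experiment -- the context is abstracted away.
    """
    return 1 if random.random() < p else 0

coin_toss = bernoulli(0.5)   # heads -> 1, tails -> 0
experiment = bernoulli(0.3)  # success -> 1, failure -> 0 (p assumed)
print(coin_toss, experiment)
```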
Definition. The probability distribution or probability mass function (pmf) of a discrete random variable $X$ is defined for every number $x$ by $p(x) = P(X = x) = P(\{s \in S : X(s) = x\})$.
By employing random variables, our probability distribution is now simply a function that maps a discrete set of numbers to $[0, 1]$, so it’s easier for us to reason about.
Remark. By properties of probability, $p(x) \ge 0$ and $\sum_x p(x) = 1$.
Ex. We have samples from five blood donors, which we label $a, b, c, d, e$. Only two of them – say $a$ and $b$ – have O+ type blood. We test samples randomly one after another until we find one of type O+. Let $Y$ be the number of tests until we find an O+ type sample. What is the probability distribution for this experiment? Working value by value: $p(1) = P(Y = 1) = \frac{2}{5} = 0.4$; $p(2) = \frac{3}{5} \cdot \frac{2}{4} = 0.3$; $p(3) = \frac{3}{5} \cdot \frac{2}{4} \cdot \frac{2}{3} = 0.2$; $p(4) = \frac{3}{5} \cdot \frac{2}{4} \cdot \frac{1}{3} = 0.1$; and $p(y) = 0$ for any other value $y$.
Now that we have an exhaustive list of input–output pairs, we could present this data in a table, line graph, or histogram for better clarity.
We can also find the value of expressions like $P(Y \le 2)$ and $P(Y > 1)$ by summing the probabilities of the values that satisfy those constraints.
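For instance, using the pmf from the blood-donor experiment – whose values, obtained by multiplying the probabilities of successive draws, are $p(1)=0.4$, $p(2)=0.3$, $p(3)=0.2$, $p(4)=0.1$ – such sums take one line each in Python:

```python
# pmf of Y from the blood-donor example: p(y) = P(Y = y),
# obtained by multiplying probabilities of successive draws.
pmf = {1: 2/5, 2: (3/5)*(2/4), 3: (3/5)*(2/4)*(2/3), 4: (3/5)*(2/4)*(1/3)}

# The Remark's properties hold: p(y) >= 0 and the probabilities sum to 1.
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-9

# P(Y <= 2) and P(Y > 1): sum p(y) over the values satisfying each event.
p_at_most_2 = sum(p for y, p in pmf.items() if y <= 2)
p_more_than_1 = sum(p for y, p in pmf.items() if y > 1)
print(p_at_most_2, p_more_than_1)
```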
Definition. Suppose the probability mass function (pmf) depends on a quantity. Each possible value of this quantity determines a different probability distribution. We call such a quantity a parameter. The collection of all probability distributions corresponding to different values of the parameter is called a family of probability distributions.
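The Bernoulli distributions are one such family, with the success probability $p$ as the parameter. A minimal sketch:

```python
def bernoulli_pmf(p):
    """Return the pmf of a Bernoulli random variable with parameter p.

    Each value of p in [0, 1] picks out one member of the Bernoulli
    family of distributions.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be in [0, 1]")
    return {0: 1 - p, 1: p}

# Three different parameter values -> three distributions in the family.
family = {p: bernoulli_pmf(p) for p in (0.2, 0.5, 0.8)}
print(family)
```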