The geometric distribution
How to model failure using the mathematics of probability
Suppose that you work in sales, and you want to model the number of sales calls that you need to make before you make a sale. It turns out that there is a way to describe this using mathematics!
The geometric distribution is a discrete probability distribution that models the number of binary trials needed for the first success. These trials have only two outcomes (success or failure), and they are assumed to be independent and identically distributed Bernoulli trials.
The probability of success in each trial is denoted by θ.
Hence, the probability of failure in each trial is 1 − θ.
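The trial process described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `run_until_first_success` is hypothetical, and it uses Python's standard `random` module to decide each trial, counting a trial as a success when a uniform draw falls below θ.

```python
import random

def run_until_first_success(theta, rng=random):
    """Run independent Bernoulli(theta) trials until the first success.

    `rng` is anything with a .random() method returning a float in [0, 1),
    such as the `random` module itself or a seeded random.Random instance.
    Returns the list of outcomes, e.g. [False, False, True] means
    two failures followed by one success.
    """
    if not 0 < theta <= 1:
        # theta = 0 would mean the loop never ends
        raise ValueError("theta must be in (0, 1]")
    outcomes = []
    while True:
        success = rng.random() < theta  # True with probability theta
        outcomes.append(success)
        if success:
            return outcomes

# Hypothetical usage: a seeded generator makes the run reproducible.
rng = random.Random(42)
print(run_until_first_success(0.3, rng))
```

The length of the returned list is the number of trials, and the number of `False` entries is the number of failures. Those are exactly the two quantities that the two parameterizations count.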
There are two common ways to parameterize the geometric distribution. Both use the same success probability θ; the difference lies in how the random variable is defined.
1) The first parameterization models the total number of trials, k, needed to get the first success, including the successful trial itself. Note that k starts from 1, because the first trial can be a success. I will denote this random variable as X.
2) The second parameterization models the number of failures, h, before the first success occurs. Note that h starts from 0, because it is possible to have zero failures if the first trial is a success. I will denote this random variable as Y.
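Because the trials are independent, the probability of a particular sequence of outcomes is the product of the per-trial probabilities. For X = k, we need k − 1 failures followed by one success; for Y = h, we need h failures followed by one success. This gives the standard probability mass functions, written here in this post's notation, along with the well-known means:

```latex
\begin{aligned}
P(X = k) &= (1 - \theta)^{k-1}\,\theta, && k = 1, 2, 3, \ldots \\
P(Y = h) &= (1 - \theta)^{h}\,\theta,   && h = 0, 1, 2, \ldots \\
Y &= X - 1, \qquad \mathbb{E}[X] = \frac{1}{\theta}, \qquad \mathbb{E}[Y] = \frac{1 - \theta}{\theta}
\end{aligned}
```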
Both definitions are valid and commonly used, but it’s important to be clear about which one you’re using in a given context. The choice depends on how you want to model the problem at hand.
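One practical reason to be careful is that software libraries pick one convention. For example, SciPy's `scipy.stats.geom` counts trials (its support starts at 1), and shifting it with `loc=-1` gives the failures version. The sketch below avoids any library and writes both PMFs in plain Python; the function names `pmf_trials` and `pmf_failures` are my own and purely illustrative. It checks that the two conventions differ only by a shift of one, and that their means come out as 1/θ and (1 − θ)/θ.

```python
def pmf_trials(k, theta):
    """P(X = k): the first success occurs on trial k (k = 1, 2, ...)."""
    if k < 1:
        return 0.0
    return (1 - theta) ** (k - 1) * theta

def pmf_failures(h, theta):
    """P(Y = h): h failures occur before the first success (h = 0, 1, ...)."""
    if h < 0:
        return 0.0
    return (1 - theta) ** h * theta

theta = 0.25

# The two conventions differ only by a shift of one: Y = X - 1.
for k in range(1, 6):
    assert pmf_trials(k, theta) == pmf_failures(k - 1, theta)

# Truncated sums (the tail beyond 2000 terms is negligible here)
# approximate the means 1/theta and (1 - theta)/theta.
mean_x = sum(k * pmf_trials(k, theta) for k in range(1, 2000))
mean_y = sum(h * pmf_failures(h, theta) for h in range(0, 2000))
print(round(mean_x, 6), round(mean_y, 6))  # 4.0 3.0
```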
This distribution is useful in scenarios where you want to model the number of attempts needed to achieve a success, such as:
the number of coin flips until you get heads.
the number of sales calls until you make a sale.
the number of dates that you go on until you find a compatible romantic partner.
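To make the first two examples concrete, here is a small worked calculation using the trials convention X. For a fair coin, θ = 0.5, so the probability that the first heads appears on the third flip is (0.5)² × 0.5 = 0.125. For sales calls, the close rate θ = 0.1 below is a made-up number for illustration only. The helper `prob_success_within` (a hypothetical name) uses the standard result P(X ≤ k) = 1 − (1 − θ)^k, since the only way to have no success in the first k trials is k failures in a row.

```python
def prob_success_within(k, theta):
    """P(X <= k) = 1 - (1 - theta)^k: at least one success in the first k trials."""
    return 1 - (1 - theta) ** k

# Coin flips: a fair coin has theta = 0.5.
# P(X = 3): two tails, then heads.
first_heads_on_third = (1 - 0.5) ** (3 - 1) * 0.5
print(first_heads_on_third)  # 0.125

# Sales calls: theta = 0.1 is an assumed close rate, not real data.
close_rate = 0.1
print(1 / close_rate)  # expected number of calls, E[X] = 1/theta -> 10.0
print(round(prob_success_within(5, close_rate), 5))  # 0.40951
```

Under that assumed close rate, you would expect to need about 10 calls on average, and you have roughly a 41% chance of making a sale within your first 5 calls.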
All of these examples assume that you want only one success, but that is not always the case. For example, you may want to close multiple sales deals, or you may seek multiple romantic partners. The geometric distribution can be extended to model the number of trials needed to reach several successes rather than just one. This extension is the negative binomial distribution, and I will write about it in a future post.


