The binomial distribution and its 4 key assumptions
A generalization of the Bernoulli distribution
The binomial distribution is a discrete probability distribution that models the number of successes in a fixed number of independent Bernoulli trials[1]. Here is the probability mass function (PMF) of the binomial distribution.
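Written in its standard form, the PMF gives the probability of seeing exactly k successes. The symbols X (the random variable) and k (a particular count of successes) are notation assumed here; n is the number of trials and θ corresponds to 𝛳, the probability of success on each trial.

```latex
P(X = k) = \binom{n}{k}\,\theta^{k}\,(1-\theta)^{n-k}, \qquad k = 0, 1, \dots, n
```

The binomial coefficient \binom{n}{k} = n! / (k!(n-k)!) counts the ways to place k successes among n trials, and θ^k (1−θ)^(n−k) is the probability of any one such arrangement.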
For a random variable to follow a binomial distribution, it must fulfill 4 key assumptions:
Fixed number of trials (n): A binomial random variable counts the successes in a sequence of n Bernoulli trials, where n is a finite and pre-determined number.
Two mutually exclusive outcomes: Each Bernoulli trial has the same two possible outcomes, often labeled as "success"[2] and "failure." These outcomes are mutually exclusive, meaning that one cannot occur if the other does.
Independence of trials: The outcome of one trial does not affect the outcome of any other trial. Each trial is independent of the others.
Constant probability of success (𝛳): The probability of success (𝛳) remains the same for every trial. Consequently, the probability of failure (1−𝛳) also remains constant across all trials.
In summary, for a random process to be modelled by a binomial distribution, you need a fixed number of independent trials, each with only two possible outcomes and the same probability of success.
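To see how these assumptions fit together, here is a minimal Python sketch that builds a binomial random variable directly from them: a fixed n, two outcomes per trial, independent draws, and a constant probability of success. The function names, the values n = 10 and theta = 0.3, and the random seed are illustrative assumptions, not anything prescribed by the distribution itself.

```python
import math
import random


def binomial_pmf(k: int, n: int, theta: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, theta)."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)


def simulate_binomial(n: int, theta: float, rng: random.Random) -> int:
    """Count the successes in n independent Bernoulli trials.

    Each trial has two outcomes (success or failure) and the same
    probability of success, theta, which mirrors the 4 assumptions.
    """
    return sum(1 for _ in range(n) if rng.random() < theta)


if __name__ == "__main__":
    n, theta = 10, 0.3           # illustrative values (assumed)
    rng = random.Random(42)      # fixed seed so the run is repeatable
    draws = [simulate_binomial(n, theta, rng) for _ in range(20_000)]

    # Compare the simulated frequencies with the theoretical PMF.
    for k in range(n + 1):
        simulated = draws.count(k) / len(draws)
        print(f"k={k:2d}  simulated={simulated:.4f}  pmf={binomial_pmf(k, n, theta):.4f}")
```

With enough simulated draws, the two columns should roughly agree. If one of the assumptions were broken, for example by letting theta drift from trial to trial, the simulated frequencies would generally no longer match this PMF.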
Now that I have a Master’s degree in statistics and over a decade of work experience, these 4 assumptions look simple and straightforward. However, it took me a while to grasp all 4 of them intuitively during my first statistics course. In a later post, I will use an illustrative example to make these assumptions more concrete. Please stay tuned!
[1] Recall that the Bernoulli distribution is a discrete probability distribution that models a random variable with only 2 possible outcomes:
"success" (usually denoted by 1),
"failure" (usually denoted by 0).
An example is flipping a coin once; it can either land heads (success) or tails (failure).
[2] I put the words “success” and “failure” in quotation marks, because there is nothing inherently positive or negative about these outcomes. “Success” often denotes the outcome of interest, but it does not have to be a good or preferable outcome. In biostatistics, medicine, and clinical trials, statisticians use the Bernoulli distribution to describe whether a patient dies of a certain disease. Death is a bad outcome, so it is not a success in the regular sense of the word, but it is the outcome that draws attention in this context.