The multinomial distribution
A generalization of the categorical distribution and binomial distribution
In statistics and probability, the multinomial distribution describes the probability of obtaining a specific combination of outcomes when you perform a fixed number of independent trials. Each trial can result in one of several possible categories, with each category having a fixed probability.
For a novice, that is a lot to absorb. Thus, if you are unfamiliar with the multinomial distribution, I encourage you to first learn about the categorical distribution and the binomial distribution. The multinomial distribution is a generalization of both.
The categorical distribution describes the outcome of a single trial with k possible categories. An example is rolling a single die, which has 6 possible outcomes.
The multinomial distribution extends the categorical distribution to multiple trials. It describes the outcomes of n independent trials, each with k possible categories. An example is rolling 3 dice; in this case, n=3, and k=6.
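To make the difference between one trial and several trials concrete, here is a minimal simulation sketch. The helper name `roll_counts` and the seed value are my own choices for illustration; the code assumes a fair die and uses only Python's standard library.

```python
import random

def roll_counts(n, k=6, seed=None):
    """Roll a fair k-sided die n times and return how often each face came up."""
    rng = random.Random(seed)
    counts = [0] * k
    for _ in range(n):
        face = rng.randrange(k)  # one categorical trial: an index from 0 to k-1
        counts[face] += 1
    return counts

# Categorical case: a single trial (n = 1), so exactly one face has a count of 1.
single = roll_counts(1, seed=0)

# Multinomial case: three trials (n = 3, k = 6), so the six counts sum to 3.
three = roll_counts(3, seed=0)

print(single)
print(three)
```

Each result is a list of 6 counts, one per face. With n = 1 the list describes a categorical outcome; with n = 3 it describes a multinomial outcome.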
The binomial distribution deals with multiple trials that have k=2 possible outcomes (like flipping a coin). In an earlier example, I wrote about 4 binomial random variables, each flipping a coin 3 times, resulting in 12 total flips. (If you are not familiar with the binomial distribution, it may help to learn that it is a generalization of the Bernoulli distribution.)
The multinomial distribution handles experiments with two or more possible outcomes per trial (k ≥ 2). For example, one multinomial random variable can represent rolling a fair die 7 times (n=7, k=6). A sample can contain 8 of those random variables, resulting in 8 × 7 = 56 total rolls.
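The distinction between the number of trials and the size of the sample can be sketched in code. This is only an illustration: the function name `multinomial_draw`, the variable names, and the seed are assumptions of mine, and the draw is built from `random.choices` in the standard library rather than from a dedicated statistics package.

```python
import random
from collections import Counter

def multinomial_draw(n, probs, rng):
    """Return one multinomial outcome: the count of each category over n trials."""
    k = len(probs)
    outcomes = rng.choices(range(k), weights=probs, k=n)  # n independent trials
    tally = Counter(outcomes)
    return [tally[i] for i in range(k)]

rng = random.Random(42)
fair_die = [1 / 6] * 6   # k = 6 equally likely categories
n_trials = 7             # rolls per multinomial random variable
sample_size = 8          # number of multinomial random variables in the sample

sample = [multinomial_draw(n_trials, fair_die, rng) for _ in range(sample_size)]
total_rolls = sum(sum(row) for row in sample)

for row in sample:
    print(row)
print(total_rolls)  # 8 variables x 7 rolls each = 56
```

Every row holds 6 counts that sum to 7, and the 8 rows together account for all 56 rolls.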
The multinomial distribution is used in many fields, for example:
Natural Language Processing: To model the frequency of words in a document.
Genetics: To predict the distribution of different genotypes in a population.
Marketing: To analyze customer preferences among multiple product categories.
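For k categories, the probability mass function (PMF) of the multinomial distribution is:

$$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \; p_1^{x_1} \, p_2^{x_2} \cdots p_k^{x_k}$$

where: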
n is the total number of trials
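xᵢ is the number of successes for category i, that is, the number of trials that result in category i
pᵢ is the probability of success for category i, that is, the probability that a single trial results in category i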
It is important to remember the following 2 facts about xᵢ and pᵢ.
The sum of successes for all categories must equal n.
The sum of probabilities for all categories must equal 1.
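As a rough sketch of how the PMF can be evaluated, the function below computes the standard multinomial formula, n! / (x₁! x₂! ⋯ x_k!) multiplied by p₁^x₁ p₂^x₂ ⋯ p_k^x_k. The name `multinomial_pmf` and its argument names are hypothetical, chosen only for this example. Note one design choice: n is not passed in but taken as the sum of the counts, so the first fact holds by construction, while the second fact is checked explicitly.

```python
from math import factorial, isclose, prod

def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` given category probabilities `probs`."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if any(x < 0 for x in counts):
        raise ValueError("counts must be non-negative")
    if not isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")

    n = sum(counts)  # total number of trials
    # Multinomial coefficient n! / (x1! x2! ... xk!), kept as an exact integer.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p ** x for p, x in zip(probs, counts))

# Rolling 3 fair dice and seeing faces 1, 2, and 3 once each:
# 3! / (1! 1! 1!) * (1/6)^3 = 6/216 = 1/36, about 0.0278.
dice_prob = multinomial_pmf([1, 1, 1, 0, 0, 0], [1 / 6] * 6)

# With k = 2 the result matches the binomial PMF: 2 heads in 3 fair flips.
# 3! / (2! 1!) * 0.5^2 * 0.5^1 = 3 * 0.125 = 0.375.
coin_prob = multinomial_pmf([2, 1], [0.5, 0.5])

print(dice_prob, coin_prob)
```

The second call illustrates why the multinomial distribution is a generalization of the binomial distribution: with two categories, the formula reduces to the familiar binomial probability.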
Finally, note that I am using the letter "n" to denote the number of trials; this is NOT the number of multinomial random variables in a sample. (For clarity and brevity, I have deliberately NOT added an extra subscript to denote the sample size. If you choose to do so, use another letter, such as "m".)