The multinomial distribution

A generalization of the categorical distribution and binomial distribution

Jul 14, 2025

In statistics and probability, the multinomial distribution describes the probability of obtaining a specific combination of outcomes when you perform a fixed number of independent trials. Each trial can result in one of several possible categories, with each category having a fixed probability.

For a novice, that is a lot to absorb. Thus, if you are unfamiliar with the multinomial distribution, I encourage you to first learn about the categorical distribution and the binomial distribution. The multinomial distribution is a generalization of both.

The categorical distribution describes the outcome of a single trial with k possible categories. An example is rolling a single die, which has 6 possible outcomes.

The multinomial distribution extends the categorical distribution to multiple trials. It describes the outcomes of n independent trials, each with k possible categories. An example is rolling 3 dice; in this case, n=3, and k=6.

The binomial distribution deals with multiple trials that have k=2 possible outcomes (like flipping a coin). In an earlier example, I wrote about 4 binomial random variables, each flipping a coin 3 times, resulting in 12 total flips. (If you are not familiar with the binomial distribution, it may help to learn that it is a generalization of the Bernoulli distribution.)

The multinomial distribution handles experiments with two or more possible outcomes per trial (k ≥ 2). For example, one multinomial random variable can record how many times each face appears when a fair die is rolled 7 times (n=7, k=6). A sample can contain 8 of those random variables, resulting in 56 total rolls.
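
To make the die example concrete, here is a minimal Python sketch that simulates a sample of 8 multinomial random variables, each built from 7 rolls of a fair die. The helper name `draw_multinomial`, the seed, and the choice of the standard library's `random` module are my own assumptions for illustration; a numerical library would work just as well.

```python
import random
from collections import Counter

def draw_multinomial(n, probs, rng):
    """Run n independent categorical trials and return the count per category.

    Hypothetical helper for illustration: probs[i] is the probability of
    category i, and the returned list has one count per category.
    """
    k = len(probs)
    # Each trial picks one category index in 0..k-1 according to probs.
    outcomes = rng.choices(range(k), weights=probs, k=n)
    counts = Counter(outcomes)
    return [counts[i] for i in range(k)]

rng = random.Random(0)            # fixed seed so the run is repeatable
fair_die = [1 / 6] * 6            # k = 6 equally likely faces

# A sample of 8 multinomial random variables, each with n = 7 rolls.
sample = [draw_multinomial(7, fair_die, rng) for _ in range(8)]

for counts in sample:
    print(counts)                 # six counts that always add up to 7

total_rolls = sum(sum(counts) for counts in sample)
print(total_rolls)                # 8 variables x 7 rolls each
```

Each printed row is one realization of a multinomial random variable: a vector of 6 counts, one per face, rather than a single number.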

The multinomial distribution is used in many fields, such as:

Natural Language Processing: To model the frequency of words in a document.
Genetics: To predict the distribution of different genotypes in a population.
Marketing: To analyze customer preferences among multiple product categories.

The probability mass function (PMF) of the multinomial distribution is:

\(P(X_1 = x_1, X_2 = x_2, \dots, X_k = x_k) = \frac{n!}{x_1!x_2!\dots x_k!}p_1^{x_1}p_2^{x_2}\dots p_k^{x_k}\)

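n is the total number of trials
k is the number of possible categories
xᵢ is the number of trials that result in category i (the number of "successes" for category i)
pᵢ is the probability that a single trial results in category i (the probability of "success" for category i)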
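
As a worked illustration of the PMF (the numbers are my own choice, not from an earlier example), suppose a fair die is rolled n=3 times, so k=6 and every pᵢ = 1/6. The probability of seeing one 1, two 6s, and no other faces is:

\(P(X_1 = 1, X_2 = 0, X_3 = 0, X_4 = 0, X_5 = 0, X_6 = 2) = \frac{3!}{1!\,0!\,0!\,0!\,0!\,2!}\left(\frac{1}{6}\right)^{1}\left(\frac{1}{6}\right)^{2} = 3 \cdot \frac{1}{216} = \frac{1}{72} \approx 0.0139\)

The factor of 3 counts the orderings in which the single 1 can appear among the three rolls; categories with a count of 0 contribute a factor of 1, because 0! = 1 and pᵢ⁰ = 1.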

It is important to remember the following 2 facts about xᵢ and pᵢ.

The sum of successes across all categories must equal n.

\(\sum_{i=1}^{k} x_i = n\)

The sum of probabilities across all categories must equal 1.

\(\sum_{i=1}^{k} p_i = 1\)
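
The PMF and its two constraints can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `multinomial_pmf` and its argument order are hypothetical, and I chose to raise an error when a constraint is violated.

```python
import math

def multinomial_pmf(n, counts, probs):
    """Evaluate the multinomial PMF for the given counts and probabilities.

    Hypothetical helper for illustration. counts[i] plays the role of x_i
    and probs[i] plays the role of p_i.
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length k")
    if any(x < 0 for x in counts):
        raise ValueError("counts must be non-negative")
    # Constraint 1: the counts must add up to the number of trials.
    if sum(counts) != n:
        raise ValueError("the counts must sum to n")
    # Constraint 2: the probabilities must add up to 1 (within rounding).
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("the probabilities must sum to 1")

    # Multinomial coefficient n! / (x_1! x_2! ... x_k!), kept as an integer.
    coefficient = math.factorial(n)
    for x in counts:
        coefficient //= math.factorial(x)

    # Multiply by p_1^x_1 * p_2^x_2 * ... * p_k^x_k.
    probability = float(coefficient)
    for x, p in zip(counts, probs):
        probability *= p ** x
    return probability

# Fair die rolled 7 times: one face appears twice, the other five once each.
die_example = multinomial_pmf(7, [2, 1, 1, 1, 1, 1], [1 / 6] * 6)
print(die_example)    # 2520 / 6**7, approximately 0.0090

# With k = 2 the formula reduces to the binomial PMF:
# 2 heads and 1 tail in 3 flips of a fair coin.
coin_example = multinomial_pmf(3, [2, 1], [0.5, 0.5])
print(coin_example)   # 0.375
```

The coin example shows the sense in which the multinomial distribution generalizes the binomial distribution: with two categories, the multinomial coefficient is the familiar binomial coefficient.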

Finally, note that I am using the letter "n" to denote the number of trials; this is NOT the number of multinomial random variables in a sample. (For clarity and brevity, I have deliberately NOT added an extra subscript to denote the sample size. If you choose to do so, use another letter, such as "m".)
