In statistics and probability, we often randomly draw items out of a group. A common example is choosing names out of a hat to determine the winners of a prize. You can use the NumPy package in Python to do this.
Here is an example of such sampling without replacement. It uses the "random.choice()" function from the NumPy package. Note the "replace=False" option. The following code chooses 3 names out of 5:
import numpy as np

names = ['Anne', 'Bill', 'Cindy', 'Evan', 'Fiona']
# Draw 3 distinct names; replace=False means no name can be picked twice
sample_names = np.random.choice(names, 3, replace=False)
print(sample_names)
Of course, if you want to sample with replacement, you simply use the option "replace=True" instead.
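To make the difference between the two options concrete, here is a minimal sketch that draws from the same list both ways. It assumes NumPy's newer Generator interface (np.random.default_rng), which is not used in the example above; the seed value is arbitrary and only there to make the draw repeatable.

```python
import numpy as np

# Seeded generator so the draws are repeatable (the seed value is arbitrary)
rng = np.random.default_rng(42)

names = ['Anne', 'Bill', 'Cindy', 'Evan', 'Fiona']

# With replacement: a name may appear more than once,
# so the sample can even be larger than the list itself
with_replacement = rng.choice(names, size=8, replace=True)

# Without replacement: every drawn name is distinct
without_replacement = rng.choice(names, size=3, replace=False)

print(with_replacement)
print(without_replacement)
```

Asking for more than 5 names with replace=False raises a ValueError, because there are not enough distinct names to draw.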
There are practical applications of this function in statistics, data analysis, and data science. Here are 3 examples.
A/B Testing
Suppose that you want to test two different versions of a website (A and B) to see which one leads to a higher conversion rate. You can randomly assign incoming users to either website A or website B using np.random.choice(). The option replace=False is suitable for the assignment process, because each user should experience only one version: drawing the users for version A without replacement guarantees that nobody is picked twice, and the remaining users see version B. Random assignment makes the 2 groups similar on average, which reduces bias and allows for a fair comparison.
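One possible way to carry out such an assignment is sketched below. It assumes that the list of users is known in advance; the user IDs are made up for illustration, and the 50/50 split is an assumption rather than a requirement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical user IDs; a real test would use IDs from your own system
user_ids = np.arange(1000, 1010)

# Draw half of the users, without replacement, for version A
group_a = rng.choice(user_ids, size=len(user_ids) // 2, replace=False)

# Everyone who was not drawn for A sees version B
group_b = np.setdiff1d(user_ids, group_a)

print("Version A:", np.sort(group_a))
print("Version B:", group_b)
```

Because the draw is without replacement, no user can land in both groups.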
Survey Research
If you conduct a survey but cannot interview every person in a large population, then you can use np.random.choice() to select a random sample of individuals from the population to participate in the survey. Because every individual has the same chance of being chosen, the sample tends to be representative of the population. The option replace=False ensures that the same individual isn't accidentally contacted multiple times.
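As a rough sketch of the survey case, the code below draws 500 participants from a population of 100,000. Both numbers and the ID scheme are hypothetical, chosen only to illustrate the call.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sampling frame: one ID per person in the population
population_size = 100_000
population_ids = np.arange(population_size)

# Select 500 distinct individuals to contact
sample_size = 500
survey_sample = rng.choice(population_ids, size=sample_size, replace=False)

# Every ID in the sample is unique, so nobody is contacted twice
print(len(np.unique(survey_sample)))
```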
Quality Control
If you work at a manufacturing plant and want to assess the quality of your products, you can use np.random.choice() with replace=False to choose a random sample of products for inspection. Then, each product will have an equal chance of being selected, and no product will be inspected twice in the same sampling effort.
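The equal-chance claim can be checked with a small simulation. The sketch below assumes a hypothetical batch of 10 products and an inspection sample of 3; it repeats the draw many times and records how often each product is picked. With 3 out of 10 inspected, each product should be selected in roughly 30% of the draws.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical batch of 10 products, identified by serial numbers 0 to 9
serials = np.arange(10)
n_inspected = 3
n_trials = 10_000

# Count how often each product ends up in the inspection sample.
# The serial numbers double as array indices here.
counts = np.zeros(len(serials), dtype=int)
for _ in range(n_trials):
    picked = rng.choice(serials, size=n_inspected, replace=False)
    counts[picked] += 1

# Share of draws in which each product was inspected
inclusion_rate = counts / n_trials
print(inclusion_rate)
```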