NumPy Random Explained: Generate Random Arrays & Master Data Distributions in Python

   1. NumPy Random Overview
   2. Generating a Random Array
   3. Data Distribution
     a. Permutation and Shuffle
     b. Normal Distribution
     c. Binomial Distribution
     d. Poisson Distribution
     e. Uniform Distribution
     f. Logistic Distribution
     g. Multinomial Distribution
     h. Exponential Distribution
     i. Chi-Square Distribution
   4. Controlling Reproducibility with seed()

NumPy Random (numpy.random)

Random numbers play a crucial role in everything from data analysis to machine learning models. In NumPy, generating random data is simple and efficient thanks to the powerful numpy.random module.

With NumPy random functions, you can easily generate random numbers, create random arrays, and sample from probability distributions such as normal, binomial, and Poisson. These features are widely used for simulations, testing algorithms, and building data-driven models.

In this NumPy random tutorial, you'll learn how to generate random numbers in NumPy, explore different distributions, and understand when to use each one in practice.

For detailed instructions on how to install and set up NumPy in Python, visit the NumPy installation guide.

If you need random sampling, you can use the numpy.random module.

randint

The randint method generates a random integer from 0 (inclusive) up to, but not including, the specified number:

from numpy import random

x = random.randint(50)
print(x)

The result may vary each time you run the code because it is randomly generated. In the example above, randint returns an integer from 0 to 49; the upper bound, 50, is excluded.
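
The bounds described above can be checked with a short sketch. It uses the size parameter (introduced in the next section) only to draw many values at once, and the variable names are illustrative. It also shows the two-argument form, randint(low, high), where the upper bound is again excluded.

```python
from numpy import random

# Single argument: values come from 0 (inclusive) up to 50 (exclusive).
samples = random.randint(50, size=1000)
print(samples.min() >= 0, samples.max() <= 49)

# Two arguments: values come from low (inclusive) up to high (exclusive),
# so randint(1, 7) behaves like a six-sided die.
die_rolls = random.randint(1, 7, size=1000)
print(die_rolls.min() >= 1, die_rolls.max() <= 6)
```

Both checks hold on every run, even though the individual values differ each time.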

Generating a Random Array

randint

The randint method can also generate a random array of the specified size:

from numpy import random

x = random.randint(100, size=3)  # three integers from 0 to 99
print(x)

[28 45 69]

The results may vary each time you run the code because they are randomly generated. The size parameter specifies the shape of an array.

rand

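The rand method generates a random array of floats drawn uniformly from the interval [0, 1). Unlike most functions in this module, rand has no size parameter; the dimensions of the returned array are passed as separate positional arguments.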

from numpy import random

x = random.rand(2, 3)
print(x)

[[0.02119792 0.38030721 0.92025197]
[0.19259276 0.32973657 0.9148083 ]]

The example above generates a random array with 2 rows and 3 columns. The results may vary each time you run the code because they are randomly generated.
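
As a small sketch of how the shape is passed, the snippet below compares rand with random.random, a related function in the same module that takes the shape as a size tuple instead. The variable names are illustrative.

```python
from numpy import random

# rand takes each dimension as a separate positional argument.
x = random.rand(2, 3)

# random.random produces the same kind of values but takes a size tuple.
y = random.random(size=(2, 3))

print(x.shape, y.shape)
print((x >= 0).all() and (x < 1).all())  # all values lie in [0, 1)
```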

choice

The choice method chooses an element from an array:

from numpy import random

x = random.choice([3, 5, 7, 9])
print(x)

In this run the result was 7. Your result may differ because the element is chosen at random.

You can specify the shape of the array with the size parameter:

from numpy import random

x = random.choice([3, 5, 7, 9], size=(2,2))
print(x)

[[7 5]
[3 7]]
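
The choice method also accepts two optional parameters that the examples above do not use: p, a list of probabilities (one per element), and replace, which controls whether an element can be picked more than once. A minimal sketch, with illustrative probabilities:

```python
from numpy import random

values = [3, 5, 7, 9]

# p gives one probability per element; the probabilities must sum to 1.
weighted = random.choice(values, size=10, p=[0.1, 0.2, 0.3, 0.4])
print(weighted)

# replace=False draws without replacement, so no element repeats.
unique_draw = random.choice(values, size=3, replace=False)
print(unique_draw)
```

With these weights, 9 should appear most often and 3 least often over many draws.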

Data Distribution

The random module can also draw samples from probability distributions and randomly reorder data. Both are widely used in machine learning and statistics.

Permutation and Shuffle

The shuffle and permutation methods randomly reorder the elements of an array. The shuffle method shuffles the elements of an array in place:

from numpy import random
import numpy as np

arr = np.array([2, 4, 6])
random.shuffle(arr)
print(arr)

[4 2 6]

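The permutation method randomly permutes a sequence, or returns a permuted range. It returns the result as a new array, so assign the return value to a variable:

from numpy import random
import numpy as np

x = np.array([2, 4, 6])
y = random.permutation(x)  # returns a shuffled copy; x itself is unchanged
print(y)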

[4 2 6]

The results may vary each time you run the code because they are randomly generated.

Functions like shuffle() and permutation() randomize the order of array elements. Without setting a seed, the result may differ each time you run the code. To ensure reproducibility, call np.random.seed() before calling these functions.

What's the Difference Between Shuffle and Permutation?

shuffle() changes the original array in place; it does not return a new array. permutation() returns a new shuffled array, leaving the original array unchanged. Quick analogy: shuffle() rearranges the items on the spot, while permutation() creates a shuffled copy.
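
The difference can be seen side by side in the sketch below. The array values and variable names are illustrative.

```python
import numpy as np
from numpy import random

original = np.array([2, 4, 6, 8, 10])

# permutation() returns a shuffled copy and leaves its input untouched.
shuffled_copy = random.permutation(original)
print(original)       # still in the original order
print(shuffled_copy)

# shuffle() reorders the array in place and returns None.
in_place = original.copy()
result = random.shuffle(in_place)
print(result)         # None
print(in_place)

# Passing an integer to permutation() shuffles the range 0..n-1.
print(random.permutation(5))
```

A common mistake is writing arr = random.shuffle(arr), which replaces the array with None.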

Normal Distribution

The normal() method draws random samples from a normal (Gaussian) distribution. The data is symmetrically distributed around the mean.

from numpy import random

x = random.normal(size=(2,2))
print(x)

[[ 0.55432657 -0.80872333]
[-0.84319299 -0.53887013]]

The results may vary each time you run the code because they are randomly generated.

You can also use normal to create a random array with a size, mean and standard deviation:

from numpy import random

x = random.normal(loc=5, scale=2, size=(2, 3))
print(x)

[[9.66770454 4.29139593 3.98125015]
[3.19591534 5.3704507 5.30760289]]

The loc parameter is the mean (5 in the example), the scale parameter is the standard deviation (2 in the example), and the size parameter, (2, 3) here, is the shape of the returned array.
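
One way to confirm what loc and scale mean is to draw a large sample and compare its mean and standard deviation with the values passed in. The sketch below fixes a seed (an arbitrary choice of 0) so that the check is repeatable. The sample size is also an arbitrary choice.

```python
from numpy import random

random.seed(0)  # fixed seed so the check below is repeatable

samples = random.normal(loc=5, scale=2, size=100_000)

# With this many samples, the estimates should land close to loc and scale.
print(samples.mean())  # close to 5
print(samples.std())   # close to 2
```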

If you want to visualize your distribution, you can use the seaborn library.

When to Use the Normal Distribution

Use a normal distribution when modeling real-world data where most values cluster around the average and extreme values are rare. Common use cases include:
✓Heights and weights of people
✓Exam scores in large populations

Binomial Distribution

The binomial method draws samples from a binomial distribution. The binomial distribution is a discrete probability distribution that counts the number of successes in n independent trials, where each trial has only two possible outcomes.

from numpy import random

x = random.binomial(n=3, p=0.5, size=5)
print(x)

[0 2 2 2 1]

The parameter n, which denotes the number of independent trials, should be greater than or equal to 0. Floats are accepted, but they will be truncated to integers. The parameter p is the probability of success on each trial and must be between 0 and 1. The size parameter specifies the shape of the returned array.

The results may vary each time you run the code because they are randomly generated.
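
As a rough check of how n and p shape the output, the sketch below simulates 10 fair coin flips many times. Every sample lies between 0 and n, and the average is close to n * p. The seed and sample size are arbitrary choices.

```python
from numpy import random

random.seed(0)  # fixed seed so the check below is repeatable

n, p = 10, 0.5
# Each sample is the number of heads in 10 fair coin flips.
heads = random.binomial(n=n, p=p, size=100_000)

print(heads.min(), heads.max())  # always within 0..n
print(heads.mean())              # close to n * p = 5
```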

When to Use the Binomial Distribution

You should use a binomial distribution when modeling scenarios that involve a fixed number of independent trials, each with two possible outcomes (success or failure), and a constant probability of success. Common use cases include:
✓Flipping a coin a fixed number of times and counting heads
✓Answering yes/no survey questions and counting the number of “yes” responses

Poisson Distribution

The Poisson distribution is a discrete probability distribution. The poisson method has 2 parameters: lam and size. lam is the expected number of events occurring in a fixed time interval, and size is the output shape.

from numpy import random

x = random.poisson(lam=2, size=5)
print(x)

[0 0 4 2 1]

The results may vary each time you run the code because they are randomly generated.
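
A known property of the Poisson distribution is that its mean and its variance both equal lam. The sketch below checks this on a large sample. The seed and sample size are arbitrary choices.

```python
from numpy import random

random.seed(0)  # fixed seed so the check below is repeatable

counts = random.poisson(lam=2, size=100_000)

print(counts.mean())  # close to lam = 2
print(counts.var())   # also close to lam = 2
```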

The difference between the Poisson distribution and the exponential distribution is explained in the Exponential Distribution section.

When to Use the Poisson Distribution

You should use a Poisson distribution when modeling the number of events that occur in a fixed interval of time or space, especially when these events happen independently and at a constant average rate. Common use cases include:

✓Counting the number of cars passing through a toll booth in an hour
✓Counting the number of emails received in a day
✓Counting the number of phone calls at a call center in a fixed period

Uniform Distribution

In a uniform distribution, every value in the range is equally likely. The uniform method draws samples from a uniform distribution. The low parameter is the lower bound (default 0), and the high parameter is the upper bound (default 1). Samples are drawn from the half-open interval [low, high). The size parameter is the shape of the returned array:

from numpy import random

x = random.uniform(size=(2, 2))
print(x)

[[0.82522303 0.96689785]
[0.97534626 0.33729442]]

The results may vary each time you run the code because they are randomly generated.
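
The example above relies on the default bounds. A minimal sketch with explicit bounds, using NumPy's keyword names low and high and an illustrative range of 10 to 20:

```python
from numpy import random

# Samples are drawn from the interval starting at low and ending just below high.
x = random.uniform(low=10, high=20, size=(2, 3))
print(x)
print((x >= 10).all() and (x <= 20).all())
```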

When to Use the Uniform Distribution

You should use a uniform distribution when all outcomes in a given range are equally likely. It is ideal for modeling situations where no particular value is more probable than another. Common use cases include:

✓Simulating a fair die roll
✓Generating random numbers for games or simple simulations
✓Picking a random winner from a group of participants

Logistic Distribution

The logistic method draws samples from a logistic distribution. The logistic distribution is used to model growth.

The logistic method has 3 parameters: loc, scale, and size. loc is the mean of the distribution, and its default value is 0. scale controls the spread of the distribution; it must be non-negative, and its default value is 1. Note that scale is not the standard deviation, which equals scale × π / √3.

from numpy import random

x = random.logistic(loc=3, scale=2, size=(2, 2))
print(x)

[[ 2.17125771 5.84361473]
[13.19869728 3.51022713]]

The results may vary each time you run the code because they are randomly generated.
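
For the logistic distribution, the standard deviation is scale × π / √3 rather than scale itself. The sketch below compares that formula with the standard deviation of a large sample. The seed and sample size are arbitrary choices.

```python
import numpy as np
from numpy import random

random.seed(0)  # fixed seed so the check below is repeatable

loc, scale = 3, 2
samples = random.logistic(loc=loc, scale=scale, size=200_000)

# Theoretical standard deviation of a logistic distribution.
expected_std = scale * np.pi / np.sqrt(3)

print(samples.mean())               # close to loc
print(samples.std(), expected_std)  # the two values should be close
```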

When to Use the Logistic Distribution

You should use a logistic distribution when modeling growth processes or probabilities that follow an S-shaped curve, often called a sigmoid curve. It is similar to the normal distribution but has heavier tails, which means extreme values are more likely. Common use cases include:

✓Modeling population growth or spread of a virus over time
✓Predicting probabilities in logistic regression models
✓Simulating scenarios where outcomes grow rapidly at first, then level off

Multinomial Distribution

The multinomial method draws samples from the multinomial distribution. The multinomial distribution generalizes the binomial distribution: each trial can have more than two possible outcomes. The method has 3 parameters: n, pvals, and size. n is the number of trials. pvals is the list of probabilities of the possible outcomes, which should sum to 1. size is the output shape. Each sample is an array of counts, one per outcome, and the counts sum to n.

from numpy import random

x = random.multinomial(n=3, pvals=[1/3, 1/3, 1/3])
print(x)

[0 2 1]

The results may vary each time you run the code because they are randomly generated.
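
To make the role of each parameter concrete, the sketch below rolls a fair six-sided die 20 times and repeats that experiment 4 times. The die scenario and variable names are illustrative.

```python
from numpy import random

# Roll a fair six-sided die 20 times; repeat the experiment 4 times.
counts = random.multinomial(n=20, pvals=[1/6] * 6, size=4)

print(counts)              # 4 rows, one count per face
print(counts.sum(axis=1))  # every row sums to n = 20
```

Each row is one experiment, and each column holds the number of times that face appeared.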

When to Use the Multinomial Distribution

You should use a multinomial distribution when modeling the outcomes of multiple independent trials, each of which can result in more than two possible categories. Common use cases include:

✓Rolling a die multiple times and counting how many times each face appears
✓Simulating survey results with multiple-choice questions
✓Modeling outcomes in board games or card games with several possible results

Exponential Distribution

The exponential method draws samples from an exponential distribution. It has 2 parameters: scale and size. scale is the inverse of the rate (scale = 1 / rate), and its default value is 1.

from numpy import random

x = random.exponential(scale=5, size=(2, 2))
print(x)

[[ 1.8336431 4.72914034]
[ 1.41002221 14.83021248]]

The results may vary each time you run the code because they are randomly generated.

The exponential distribution describes the time between events, while the Poisson distribution describes the number of events taking place in a given (time) period. The exponential distribution is a continuous probability distribution, whereas the Poisson distribution is a discrete one.
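
The link between the two distributions can be sketched with a small simulation. Waiting times are drawn from an exponential distribution, accumulated into arrival times, and then counted per unit-length interval. The rate of 2 events per unit of time, the seed, and the variable names are assumptions made for illustration.

```python
import numpy as np
from numpy import random

random.seed(0)  # fixed seed so the check below is repeatable

rate = 2.0  # assumed average of 2 events per unit of time

# Exponential: waiting times between consecutive events (scale = 1 / rate).
gaps = random.exponential(scale=1 / rate, size=200_000)
arrival_times = np.cumsum(gaps)

# Count the events in each complete unit-length interval.
n_intervals = int(arrival_times[-1])
counts = np.bincount(arrival_times.astype(int), minlength=n_intervals)[:n_intervals]

print(gaps.mean())    # close to 1 / rate = 0.5
print(counts.mean())  # close to rate = 2
print(counts.var())   # also close to 2, as expected for Poisson counts
```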

When to Use the Exponential Distribution

You should use an exponential distribution when modeling the time between independent events that occur at a constant average rate. Common use cases include:

✓Measuring the time between arrivals of customers at a store
✓Modeling the lifetime of light bulbs or electronic components
✓Predicting waiting times in queues or call centers

Chi-Square Distribution

The chi-square distribution is the probability distribution of the sum of the squares of df independent standard normal random variables. The NumPy chisquare method draws samples from a chi-square distribution. It has 2 parameters: df and size. df is the number of degrees of freedom, that is, the number of independent random variables being squared and summed.

from numpy import random

x = random.chisquare(df=2, size=(2, 3))
print(x)

[[0.14248928 1.72224754 0.6367974 ]
[3.29349291 3.20701167 4.95018512]]
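
The definition can be checked by building chi-square samples by hand: square df standard normal values and add them up. The sketch below compares that construction with chisquare. The choice of df=3, the seed, and the sample size are arbitrary.

```python
from numpy import random

random.seed(0)  # fixed seed so the check below is repeatable

df = 3

# Manual construction: sum of squares of df independent standard normal values.
z = random.normal(size=(100_000, df))
manual = (z ** 2).sum(axis=1)

# Direct draw from the chi-square distribution.
direct = random.chisquare(df=df, size=100_000)

print(manual.mean(), direct.mean())  # both close to df
```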

When to Use the Chi-square Distribution

You should use a chi-square distribution when working with variance or goodness-of-fit tests, especially to test how well observed data fits expected probabilities. Common use cases include:

✓Testing whether a coin is biased based on multiple flips
✓Comparing counts in different categories, like favorite ice cream flavors among a group
✓Comparing sample variances in experiments

Controlling Randomness with seed()

numpy.random.seed() initializes the random number generator to a fixed state. This is useful when you want your random operations (like shuffle() or permutation()) to produce the same results every time, which makes experiments and tutorials reproducible.

import numpy as np

np.random.seed(42) # Set the seed
arr = np.array([1, 2, 3, 4])
shuffled = np.random.permutation(arr)
print(shuffled) # Always outputs the same shuffled array

[2 4 1 3]

Because the seed is fixed, this code produces the same output every time you run it.
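
A quick way to see the effect of the seed is to set it twice and compare the results. The sketch below also shows np.random.default_rng, the Generator interface that NumPy recommends for new code. Treat that part as an optional alternative to seed(), not something the examples above require.

```python
import numpy as np

np.random.seed(42)
first = np.random.permutation([1, 2, 3, 4])

np.random.seed(42)  # resetting the seed restarts the same sequence
second = np.random.permutation([1, 2, 3, 4])

print(np.array_equal(first, second))  # True

# Alternative: a local Generator object that carries its own seed.
rng = np.random.default_rng(42)
print(rng.permutation([1, 2, 3, 4]))
```

A seeded Generator keeps the random state in one object instead of changing the global state used by every np.random function.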

Conclusion

NumPy's random module is a powerful tool for generating and working with random data in Python. From simple array creation to complex statistical distributions, it provides everything you need for simulations, analysis, and machine learning. By understanding functions like seed(), shuffle(), and various distributions, you can build more robust and reproducible data-driven applications.

You can explore the full NumPy tutorial.