Sampling Methods in MATLAB
In this article I will cover some practical concepts around sampling methods to generate random numbers in MATLAB. As I have written before, one characteristic of many discrete event simulations is that they include elements that are random or stochastic. For example, interarrival times of entities or how long it takes to perform a certain activity by a server.
In such cases, a probability distribution could be used to represent those statistics. However, on other occasions it could be acceptable to model something that appears to be probabilistic as deterministic. By deciding to use an average value for example. The decision on whether to use one approach or the other is normally influenced by the variation seen in the system; so it is important to have a feel for what this variation may be. Remember that all models are approximations, and the aim is to develop a model that is appropriate for its intended purpose.
Some Principles
Random sampling is used within a discrete simulation so as to produce, from a probability distribution, a set of samples that have two properties:
MATLAB has built-in functions that allow for random sampling, and we will see this explained further down in this article. However, it is worth mentioning that random sampling algorithms are based on a two-stage process which is:
Random Number Generation
As explained in this page, MATLAB offers a number of methods to generate random numbers. In this article I will focus on two, rand for uniformly distributed random numbers and randn normally distributed random numbers. For controlling the repeatability of results, there exists the rng function that we can use to set the random generation seed and algorithm.
rand
rand is a function that can be used to generate random numbers from a uniform distribution. This one is particularly useful when trying to sample numbers from a another distribution. See below for an example that generates 1000 pairs of random numbers using the default setting. A histogram of values and a scatter plot are shown to have a quick overview of the values distribution. There are formal statistical methods to test random number generators (most notably chi-square and Kolmogorov-Smirnov tests); the following charts are used for illustration purposes.
A = rand(2,1000);
X = A(1,:);
Y = A(2,:);
figure(1);
scatter(X,Y);
figure(2);
n = histogram(X,30);
The example below is for generating 1000 pairs of numbers using the combined multiple recursive algorithm:
rng(5, 'combRecursive');
m = rand(2,1000);
X = m(1, :);
Y = m(2, :);
figure(1);
scatter(X,Y);
figure(2);
n = histogram(X,30);
randn
With randn we can generate random numbers from a normal distribution. The example below illustrates the generation of 10000 pairs of random numbers using the default setting.
rv = randn(10000,2);
X = rv(:, 1);
Y = rv(:, 2);
figure(1);
scatter(X,Y);
figure(2);
n = histogram(X,30);
Random Sampling
Top-Hat Sampling
Most random sampling algorithms are based on a method called "top-hat sampling", which relies on the conversion of a histogram into its cumulative probability form.
The name "top-hat" comes from the metaphor of drawing counters out of a hat. Take 100 counters and number them in the same proportion as the histogram would indicate. So if a value in the histogram represents 5% of the total, we would mark 5 counters with that value. If there was another value that represents 20% of the total, then we would mark another 20 values and so on. If we were going to place the counters then in the hat, shake it and then take one counter out, the hope is that is that if enough samples are taken, the sample histogram would be the same as the base histogram.
Naturally, in discrete event simulations hats are not needed, random numbers are used instead. These are then converted into the correct distribution by using a look-up table.
In MATLAB we can achieve this with the handy function randsample.
Given an array of values X, whose probability is known and stored in another array P, we can sample one random number as below:
P = [0.3 0.2 0.5];
X = [1 2 3];
randomValue = randsample(X, 1, true, P);
We can generate more values as needed by passing down the desired value on the second parameter. To illustrate this, I created a histogram based on 10000 random samples based on the example above:
Recommended by LinkedIn
Sampling from histograms
When data is available, we can use the fitdist function to generate numbers from a distribution that approximates that of our sample data. It is advisable to have an understanding of the the distribution we want to use; fortunately for us MATLAB provides ample resources in their help site.
Say we have been measuring the cycle time of a certain activity for a number of months and we've got the following data (units are days):
How can we generate random numbers that fit such distribution?
First of all we need to think about what distribution that is suitable for our needs. Given that our data is recorded in days, using positive integer numbers, we could argue that a discrete distribution is possibly a good fit. Out of the distributions we can choose in MATLAB, we can employ the Negative Binomial Distribution for this example.
The code to generate a random number from such distribution is as follows:
dataSample = [12; 22; 28; 10; ...];
pd = fitdist(dataSample, 'NegativeBinomial');
rn = random(pd);
In order to test if the distribution is a good fit for generating random numbers we can calculate the different statistical moments and see if they match the ones from our sample (that is average, median and standard deviation).
rn = random(pd,1000); % generate a 1000x1000 matrix of random numbers
fprintf('Average = %g \n', mean(rn,'all'));
fprintf('Median = %g \n', median(rn,'all'));
fprintf('Standard Deviation = %g \n', std(rn(:)));
The snippet above returns...
Average = 14.6243
Median = 14
Standard Deviation = 7.05002
Now that we have seen an example on how to generate random numbers from a discrete distribution, let's go over some useful distributions to use in discrete event simulations.
Useful Distributions
Negative Exponential
The Negative Exponential distribution is normally used to model interarrival times for generating entities (i.e. customers arriving to a supermarket, trucks that arrive to a depot, etc.)
The following code generates a sample of 10000 exponentially distributed random numbers with mean 5.
rn = random('Exponential',5,100);
Triangular
The triangular distribution is often used when it is possible to estimate a maximum, minimum and mode for a distribution but data is scarce or non-existent. This distribution can be used instead of a beta distribution in risk analysis.
The following snippet shows to create a triangular distribution with low limit equals to 5, highest of 30 and a mode value of 10. This distribution is then used to generate a 100x100 matrix of random numbers.
pd = makedist('Triangular','a',5,'b',10,'c',30);
rn = random(pd,100);
x = 0:1:30;
pdf_tri = pdf(pd, x);
plot(x,pdf_tri,'LineWidth',2);
hold;
figure(2);
n = histogram(rn(:),10);
The following charts were generated by the snippet above. They show the probability density function and the histogram of the 10000 random numbers.
Conclusion
This article covered few ways of generating random numbers in MATLAB and sampling from known distributions. As a DES modeler, it is crucial to know how to do this in order to better represent variation in the system of study. Luckily for us, MATLAB has a comprehensive library at our disposal that makes this task really easy.
Hopefully the methods shown above can serve as an introduction to the subject. For a more thorough overview of sampling methods I recommend taking a look at Michael Pidd's Computer simulation in management science.