Monte Carlo Simulations
What is Monte Carlo Simulations?
One of the main motivations to switch from spreadsheet-type tools (such as Microsoft Excel) to a program like R is for simulation modeling. R allows us to repeat the same (potentially complex and detailed) calculations with different random values over and over again.
Within the same software, we can then summarize and plot the results of these replicated calculations. Monte Carlo methods are used to perform this type of analysis they randomly sample from a set of values in order to generate and summarize a distribution of some statistic related to the sampled quantities.
Randomness
Random processes are an important aspect of simulation modeling. A random process is one that produces a different result each time it is run according to some rules. They are inextricably tied to the concept of uncertainty, you have no idea what will happen the next time the process is run.
There are two basic ways to introduce randomness in R
Random deviattes
Resampling
Random Deviates
Each person alive at the start of the year has the option of living or dying at the conclusion of the year. There are two possible endings here, and each person has an 80% probability of surviving. survive is the outcome of a binomial random process in which there were n individuals alive at the start of this year and p is the probability that any one of them would live to the next year.
In R, we can simulate a binomial random process with p=0.8 and n=100.
rbinom(n= 1, size =100,
prob= 0.8)
## [1] 80
At this time I got 73, but we almost certainly get different number than this one.
With a little tinkering, we can also plot it
survivors = rbinom(1000,
100, 0.8)
hist(survivors,
col = "skyblue")
We could also used other processes like log normal
The log normal process is another random process. It creates random numbers using a log of the values that is regularly distributed, with a mean of log mean and a standard deviation of log sd.
hist(rlnorm(1000,0,0.1),col="skyblue")
Need for sampling
There are several situations in probability, and more broadly in machine learning, where an analytical solution cannot be calculated immediately. In Machine Learning, a problem of class imbalance exists. In fact, some would argue that for most practical probabilistic models, accurate inference is impossible.
The desired calculation is usually a sum of discrete distributions or an integral of continuous distributions, and thus is computationally difficult. For a variety of reasons, such as the huge number of random variables, the stochastic nature of the domain, noise in the data, a shortage of observations, and more, the calculation may be intractable.
Resampling
Using random deviates to generate fresh random numbers is excellent, but what if we already have a set of numbers to which we want to add randomness? We can utilize resampling techniques to do this. To sample size elements from the vector x in R, use the sample() function.
##Resampling of 1 to 10
sample(x = 1:10, size =5)
## [1] 4 3 10 9 2
Sample with replacement
sample(x = c("a","b","c"), size = 10, replace = T)
## [1] "a" "a" "a" "b" "a" "b" "a" "c" "b" "b"
Sample with set probalilities
sample(x = c("live","die"),size = 10, replace = T, prob = c(0.8,0.2))
## [1] "live" "live" "die" "die" "live" "live" "live" "die" "live" "live"
Reproducing Randomness
We may want to receive the same precise random integers each time we run our script for reproducibility. To do so, we must first set the random seed, which is the starting point of our computer’s random number generator.
set.seed(1234)
rnorm(1)
## [1] -1.207066
Let’s try without random seed
rnorm(1)
## [1] 0.2774292
Each time we get different result.
Replication
To use Monte Carlo methods, we need to be able to replicate some random
process many times. There are two main ways this is commonly done,
either with replicate()
or with for()
loops.
The replicate() functions executes same expression many times and returns the output from each execution. Say we have a vector x, which represents 40 observations of an animal length(mm).
x = rnorm(30, 500,40)
We want to create the mean length sampling distribution “by hand.” We can take a random sample, determine the mean, and then repeat the process as many times as necessary.
Replication with “for” loop
A loop is a command in programming that repeats itself until it reaches a specified point. R has a few types of loops, repeat(), while(), and for(). for() loops are among the most common in simuation modeling.For each value in a vector, a for() loop performs an operation for the number of times you specify.
For loop syntax
for(var in seq){ expression(var) }
for( i in 1:5){
print(i^2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
nt = 100
N= NULL
N[1] = 1000
for(t in 2:nt){
N[t] = (N[t-1]*1.1*rlnorm(1,0,0.1))*(1-0.08)
}
Let’s plot it
plot(N, type= "l", pch = 15, xlab = "Year", ylab = "Abundance")
Summarization of simulation
After replicating a calculation many times, we will need to summarize the results.
Simulating Based Learning
mu = 500
sig = 30
random = rnorm(100,mu,sig)
p = seq(0.01, 0.99, 0.01)
random_q = quantile(random,p)
normal_q = qnorm(p,mu,sig)
plot(normal_q~random_q)
abline(c(0,1))
q = seq(400,600,10)
random_cdf = ecdf(random)
random_p =random_cdf(q)
normal_p = pnorm(q,mu,sig)
plot(normal_p~q, type= "l", col = "blue")
points(random_p~q,col = "red")
Comments