Random deviations of pert distribution - r

I'm running a Monte Carlo simulation in R with the following code:
A)
mc_matrix = 1
for (i in 1:1000000){
mc_sample = rpert(n=1,min=629,max=1049,mode=739)
mc_matrix = rbind(mc_matrix, mc_sample)
}
mean(mc_matrix)
B)
mean_of_matrix = rpert(1000000, min=629, max=1049, mode=739)
Shouldn't these two pieces of code give the same result? Why am I not getting the same average with so many samples from the distribution?

First, it would be good to let everybody know which packages you need. In your case it is the package "freedom".
Then, the newest version expects the arguments x.min, x.max and x.mode.
In a Monte Carlo simulation you simulate random variables to calculate, for example, the mean, as in your example. The problem is that this is just an asymptotic approximation of the distribution.
If you try this with the built-in rnorm(n) function you get different results for the mean, even though the true mean of every simulated normally distributed random variable is the same.
So if you try
mean(rnorm(10000))
mean(rnorm(10000))
the results will slightly differ.
Most programming languages, R included, have a built-in pseudo-random number generator. If you need the same random numbers again and again, you can use the function set.seed(seed) to start the generator at the same point.
Try
set.seed(100)
mean(rnorm(1000))
set.seed(100)
mean(rnorm(1000))
and you will get the same result both times.
You can try this with your example, but the results will still differ slightly, because the first example does more than the second: it builds the sample one draw at a time and starts from mc_matrix = 1, so that extra 1 is included in the average, while the second example just draws all the values in one call. But you are right that, given the same random variables, the mean should be the same, because it's the same calculation.
That's a basic principle of Monte Carlo simulation: simulate n random variables, with n large, to approximate an asymptotic distribution.
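To make the point about sampling noise concrete for the PERT case, here is a minimal sketch. It assumes the standard PERT definition (a rescaled beta distribution with shape weight 4) and uses base R's rbeta, so no extra package is needed. The helper name rpert_sketch is hypothetical and is not the rpert from the package.

```r
# Hypothetical helper: standard PERT as a rescaled beta distribution
# (assumes the usual shape weight of 4; this is not the package's rpert).
rpert_sketch <- function(n, min, max, mode) {
  shape1 <- 1 + 4 * (mode - min) / (max - min)
  shape2 <- 1 + 4 * (max - mode) / (max - min)
  min + (max - min) * rbeta(n, shape1, shape2)
}

set.seed(100)
n <- 100000
run1 <- rpert_sketch(n, min = 629, max = 1049, mode = 739)
run2 <- rpert_sketch(n, min = 629, max = 1049, mode = 739)

theoretical_mean <- (629 + 4 * 739 + 1049) / 6  # PERT mean: (min + 4*mode + max) / 6
std_error <- sd(run1) / sqrt(n)                 # Monte Carlo standard error of mean(run1)

mean(run1)               # close to theoretical_mean, but not equal to it
mean(run2)               # a slightly different value again
mean(run1) - mean(run2)  # difference is of the order of std_error
```

Two runs without a common seed should agree only up to a few standard errors, so a small difference between two averages is expected even with a large n.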

Related

How to partition variance among parameters in a Monte Carlo simulation?

I have some doubts about how best to extract information from a Monte Carlo simulation. I will simplify the problem here, using some R pseudocode, but the question is more general.
Let's say I have a function with three parameters, each of them with a mean and SD. For this example I will use normal distributions, but for more general cases let's assume one of them is another distribution.
f(x,y,z) = rnorm(n, mean_x, sd_x) * rnorm(n, mean_y, sd_y) * rnorm(n, mean_z, sd_z)
I am using a Monte Carlo simulation to quantify the uncertainty, which is quite straightforward. I am interested, though, in understanding what % of the total uncertainty corresponds to each parameter, done computationally and not analytically.
One way I have envisioned would be:
Define total uncertainty as the 0.05-0.95 quantiles of the full MC simulation, and this would be 100% of the model uncertainty.
Do a new simulation, but now set one parameter fixed using the mean value, such as:
f(x,y,z) = mean_x * rnorm(n, mean_y, sd_y) * rnorm(n, mean_z, sd_z)
The difference between the "total uncertainty" and the 0.05-0.95 quantiles of this simulation would then be the amount of uncertainty related to this specific parameter.
Repeat for the other parameters.
I know this is a simplification as it ignores interactions among parameters, but would this be correct? The problem in question is slightly more complex than this so other analytical approaches are not that feasible.
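The procedure described in the question can be sketched as follows. All parameter values are hypothetical, and reusing the same draws for every scenario (rather than running a fresh simulation each time) is a design choice of this sketch, made so that the comparison is not blurred by extra sampling noise.

```r
set.seed(1)
n <- 100000

# Hypothetical parameter values, chosen only for illustration
mean_x <- 10; sd_x <- 1
mean_y <- 5;  sd_y <- 1
mean_z <- 2;  sd_z <- 0.3

x <- rnorm(n, mean_x, sd_x)
y <- rnorm(n, mean_y, sd_y)
z <- rnorm(n, mean_z, sd_z)

# Width of the 0.05-0.95 quantile interval
width <- function(v) unname(diff(quantile(v, c(0.05, 0.95))))

total   <- width(x * y * z)       # all parameters vary
fixed_x <- width(mean_x * y * z)  # x fixed at its mean
fixed_y <- width(x * mean_y * z)  # y fixed at its mean
fixed_z <- width(x * y * mean_z)  # z fixed at its mean

# Share of the total interval width that disappears when a parameter is fixed
share <- c(x = total - fixed_x, y = total - fixed_y, z = total - fixed_z) / total
round(100 * share, 1)
```

Note that these shares generally do not add up to 100%, because interval widths do not combine additively; that is one sign of the simplification mentioned in the question.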

Generate multivariate nonnormal random numbers in R

Background
I want to generate multivariate random numbers with a fixed covariance matrix. For example, I want to generate 2-dimensional data with covariance = 0.5 and variance = 1 in each dimension. The first marginal of the data is a normal distribution with mean = 0, sd = 1, and the second is an exponential distribution with rate = 2.
My attempt
My idea is to generate correlated multivariate normal random numbers and then transform them to any other distribution by inverse transform sampling.
Below is an example that transforms 2-dimensional normal random numbers into a norm(0,1) + exp(2) pair:
library(MASS)  # provides mvrnorm
# generate correlated bivariate normal data; data[,1] and data[,2] are standard normal
data <- mvrnorm(n = 1000,mu = c(0,0), Sigma = matrix(c(1,0.5,0.5,1),2,2))
# empirical CDF of dimension 2, evaluated at the data itself
exp_cdf = ecdf(data[,2])
Fn = exp_cdf(data[,2])
# inverse transform sampling to get an exponential distribution with rate = 2
# (the 10^(-5) keeps log() away from 0 when Fn = 1)
x = -log(1-Fn + 10^(-5))/2
mean(x);cor(data[,1],x)
Out:
[1] 0.5035326
[1] 0.436236
From the outputs, the new x is a set of exponential(rate = 2) random numbers. Also, x and data[,1] have a correlation of 0.43, which is not very close to my original setting of 0.5. That may be an issue: I think the covariance of the generated sample should stay closer to the initial setting. In general, I think my method is not very good; maybe you have some better code snippets.
My question
As a statistics graduate, I know that in theory there exist 10+ methods to generate multivariate random numbers. In this post, I want to collect a bunch of code snippets that do it automatically using packages or handy . Then I will compare them on different aspects, such as running time and quality of the data. Any ideas are appreciated!
Note
Some users think I am asking for a package recommendation. However, I am not looking for any recommendation. I already know the commonly used statistical theorems and R packages. I just want to know how to generate multivariate random numbers with a fixed covariance matrix in a decent way, with a code example that generates norm + exp random numbers. I think there must be more powerful code snippets that do this well, so I am asking for help!
Sources:
generating-correlated-random-variables, math
use copulas to generate multivariate random numbers, stackoverflow
Ross simulation, theoretical book
R CRAN distribution task View
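
One variation on the inverse-transform attempt above is a Gaussian copula. It swaps the empirical CDF for the exact normal CDF (pnorm) and the hand-written inverse for qexp. The sketch below is only an illustration under those assumptions, with a hypothetical seed and sample size, not a definitive method.

```r
library(MASS)  # provides mvrnorm

set.seed(2024)
n   <- 10000
rho <- 0.5  # correlation of the latent bivariate normal

z <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2, 2))

x1 <- z[, 1]             # first marginal: standard normal
u  <- pnorm(z[, 2])      # exact normal CDF: uniform on (0, 1)
x2 <- qexp(u, rate = 2)  # second marginal: exponential with rate 2

mean(x2)              # should be close to 1 / rate = 0.5
cor(z[, 1], z[, 2])   # latent correlation, close to rho
cor(x1, x2)           # correlation after the transform
```

The Pearson correlation after a nonlinear transform is generally somewhat smaller than the latent normal correlation, so a value below 0.5 is expected here rather than a sign of a bug. To hit a target correlation on the final scale, the latent rho would have to be set somewhat higher than the target.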

Generate beta-binomial distribution from existing vector

Is it possible to/how can I generate a beta-binomial distribution from an existing vector?
My ultimate goal is to generate a beta-binomial distribution from the below data and then obtain the 95% confidence interval for this distribution.
My data are body condition scores recorded by a veterinarian. The values of body condition range from 0-5 in increments of 0.5. It has been suggested to me here that my data follow a beta-binomial distribution, discrete values with a restricted range.
set1 <- as.data.frame(c(3,3,2.5,2.5,4.5,3,2,4,3,3.5,3.5,2.5,3,3,3.5,3,3,4,3.5,3.5,4,3.5,3.5,4,3.5))
colnames(set1) <- "numbers"
I see that there are multiple functions which appear to be able to do this, betabinomial() in VGAM and rbetabinom() in emdbook, but my stats and coding knowledge is not yet sufficient to be able to understand and implement the instructions provided on the function help pages, at least not in a way that has been helpful for my intended purpose yet.
We can look at the distribution of your variable; the y-axis is the probability:
x1 = set1$numbers*2
h = hist(x1,breaks=seq(0,10))
bp = barplot(h$counts/length(x1),names.arg=(h$mids+0.5)/2,ylim=c(0,0.35))
You can try to fit it, but you have too few data points to estimate the 3 parameters needed for a beta-binomial. Hence I fix the probability so that the mean is the mean of your scores, and looking at the distribution above it seems ok:
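library(bbmle)    # mle2
library(emdbook)  # dbetabinom
library(MASS)     # fitdistr
# negative log-likelihood of the beta-binomial for the doubled scores x1
mtmp <- function(prob,size,theta) {
  -sum(dbetabinom(x1,prob,size,theta,log=TRUE))
}
# prob and size are passed in through data=; theta is the parameter being estimated
m0 <- mle2(mtmp,start=list(theta=100),
  data=list(size=10,prob=mean(x1)/10),control=list(maxit=1000))
THETA=coef(m0)[1]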
We can also use a normal distribution:
normal_fit = fitdistr(x1,"normal")
MEAN=normal_fit$estimate[1]
SD=normal_fit$estimate[2]
Plot both of them:
lines(bp[,1],dbetabinom(1:10,size=10,prob=mean(x1)/10,theta=THETA),
col="blue",lwd=2)
lines(bp[,1],dnorm(1:10,MEAN,SD),col="orange",lwd=2)
legend("topleft",c("normal","betabinomial"),fill=c("orange","blue"))
I think you are actually ok with using a normal estimation and in this case it will be:
normal_fit$estimate
mean sd
6.560000 1.134196
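To get from the normal fit to the interval asked for in the question, one option is to take the central 95% range of the fitted normal and convert it back to the original score scale. This is a minimal sketch that assumes "95% interval" means the range covering 95% of the fitted distribution, not a confidence interval for the mean. It uses base R only, and the variable names are hypothetical.

```r
scores <- c(3,3,2.5,2.5,4.5,3,2,4,3,3.5,3.5,2.5,3,3,3.5,3,3,4,3.5,3.5,4,3.5,3.5,4,3.5)
x1 <- scores * 2  # same doubling as above, so values are whole numbers

mu    <- mean(x1)
sigma <- sqrt(mean((x1 - mu)^2))  # maximum-likelihood sd (divides by n), as fitdistr reports

# central 95% range of the fitted normal, on the doubled scale
range_doubled <- qnorm(c(0.025, 0.975), mean = mu, sd = sigma)

# back on the original 0-5 body condition scale
range_scores <- range_doubled / 2
range_scores
```

Because the scores are recorded in steps of 0.5, the end points would in practice be rounded to the nearest allowed score.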

Generate random data from arbitrary CDF in R?

I have an arbitrary CDF that is applied to a point estimate. I have a number of these point estimates with associated CDFs, for which I need to simulate random data for a Monte Carlo simulation.
I generate the CDF by fitting a spline to the arbitrary points provided in a table. For example, the 0.1 quantile is 0.13 * point estimate, and the 0.9 quantile is 7.57 * point estimate. It is fairly crude and is based on a large study comparing these models to real-world systems -- ignore that for now please.
I fit the CDF using a spline fit as shown here.
If I take the derivative of this, I get the shape of the pdf (image).
I modified the function "samplepdf" found here, Sampling from an Arbitrary Density, as follows:
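samplecdf <- function(n, cdf, spdf.lower = -Inf, spdf.upper=Inf) {
  my_fun <- match.fun(cdf)
  # invert the CDF at u by finding the root of cdf(t) - u
  invcdf <- function(u) {
    subcdf <- function(t) my_fun(t) - u
    # endsign() is not defined in this snippet; presumably it comes with the original samplepdf code
    if (spdf.lower == -Inf)
      spdf.lower <- endsign(subcdf, -1)
    if (spdf.upper == Inf)
      spdf.upper <- endsign(subcdf)
    return(uniroot(subcdf, c(spdf.lower, spdf.upper))$root)
  }
  # inverse transform sampling: push uniform draws through the inverse CDF
  sapply(runif(n), invcdf)
}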
This seems to work OK: when I compare the quantiles estimated from the randomly generated data, they are fairly close to the initial values. However, when I look at the histogram, something funny is happening at the tail, where it looks like my function is consistently generating more values than it should according to the pdf. The function does this across all my point estimates, and even though the individual quantiles seem close, I can tell that the overall Monte Carlo simulation gives higher estimates for the 50th percentile than I expect. Here is a plot of my histogram of the random samples.
Any tips or advice would be very welcome. I think the best route would be to fit an exponential distribution to the CDF, but I'm struggling to do that. All "fitting" assumes that you have data that needs to be fitted -- this is more arbitrary than that.
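One alternative to root-finding on the fitted CDF is to interpolate the quantile table directly and push uniform draws through it. The sketch below is only an illustration: the point estimate is hypothetical, and only the 0.1 and 0.9 multipliers come from the description above, while every other table row is a made-up placeholder. It uses a monotone spline (splinefun with method = "hyman") so the interpolated quantile function cannot decrease.

```r
set.seed(42)

point_estimate <- 100  # hypothetical point estimate

# Quantile table: only the 0.1 and 0.9 multipliers (0.13 and 7.57) come from
# the question; every other row is a made-up placeholder.
p    <- c(0, 0.10, 0.25, 0.50, 0.75, 0.90, 1)
mult <- c(0, 0.13, 0.40, 1.00, 2.50, 7.57, 20)

# Monotone spline through (p, mult): an approximation of the quantile function
qfun <- splinefun(p, mult, method = "hyman")

n <- 100000
draws <- point_estimate * qfun(runif(n))  # inverse transform sampling

# Sample quantiles should reproduce the corresponding table rows
quantile(draws, c(0.1, 0.5, 0.9)) / point_estimate
```

Because the table here spans probabilities 0 to 1, no extrapolation is needed. With a table that stops short of the tails, the tail behaviour would have to be chosen explicitly, which may be where the excess tail mass in the histogram comes from.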

Trying to do a simulation in R

I'm pretty new to R, so I hope you can help me!
I'm trying to do a simulation for my Bachelor's thesis, where I want to simulate how a stock evolves.
I've done the simulation in Excel, but the problem is that I can't make that large of a simulation, as the program crashes! Therefore I'm trying in R.
The stock evolves as follows (everything except $\epsilon$ consists of constants which are known):
$$W_{t+\Delta t} = W_t e^{r \Delta t}\left(1+\pi\left(\exp\left((\sigma \lambda -0.5\sigma^2) \Delta t+\sigma \epsilon_{t+\Delta t} \sqrt{\Delta t}-1\right)\right)\right)$$
The only thing here which is stochastic is $\epsilon$, which is represented by a Brownian motion with N(0,1).
What I've done in Excel:
Made 100 samples with a size of 40. All these samples are standard normal distributed: N(0,1).
Then these outcomes are used to calculate how the stock is affected by them (the normal distribution represents the shocks from the economy).
My problem in R:
I've used the sample function:
x <- sample(norm(0,1), 1000, T)
So I have 1000 samples, which are normally distributed. Now I don't know how to put these results into the formula I have for the evolution of my stock. Can anyone help?
Using R for (discrete) simulation
There are two aspects to your question: conceptual and coding.
Let's deal with the conceptual first, starting with the meaning of your equation:
1. Conceptual issues
The first thing to note is that your evolution equation is continuous in time, so running your simulation as described above means accepting a discretisation of the problem. Whether or not that is appropriate depends on your model and how you have obtained the evolution equation.
If you do run a discrete simulation, then the key decision you have to make is what stepsize $\Delta t$ you will use. You can explore different step-sizes to observe the effect of step-size, or you can proceed analytically and attempt to derive an appropriate step-size.
Once you have your step-size, your simulation consists of pulling new shocks (samples of your standard normal distribution), and evolving the equation iteratively until the desired time has elapsed. The final state $W_t$ is then available for you to analyse however you wish. (If you retain all of the $W_t$, you have a distribution of the trajectory of the system as well, which you can analyse.)
So:
Your $x$ are a sampled distribution of your shocks, i.e. they are $\epsilon_{t=0}$, the shocks at the first time step.
To simulate the evolution of the $W_t$, you will need some initial condition $W_0$. What this is depends on what you're modelling. If you're modelling the likely values of a single stock starting at an initial price $W_0$, then your initial state is a 1000 element vector with constant value.
Now evaluate your equation, plugging in all your constants, $W_0$, and your initial shocks $\epsilon_0 = x$ to get the distribution of prices $W_1$.
Repeat: sample $x$ again -- this is now $\epsilon_1$. Plugging this in, gives you $W_2$ etc.
2. Coding the simulation (simple example)
One of the useful features of R is that most operators work element-wise over vectors.
So you can pretty much type in your equation more or less as it is.
I've made a few assumptions about the parameters in your equation, and I've ignored the $\pi$ function -- you can add that in later.
So you end up with code that looks something like this:
dt <- 0.5 # step-size
r <- 1 # parameters
lambda <- 1
sigma <- 1 # std deviation
w0 <- rep(1,1000) # presumed initial condition -- prices start at 1
# Show an example iteration -- incorporate into one line for production code...
x <- rnorm(1000,mean=0,sd=1) # random shock
w1 <- w0*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*x*sqrt(dt) -1)) # evolution
When you're ready to let the simulation run, then merge the last two lines, i.e. include the sampling statement in the evolution statement. You then get one line of code which you can run manually or embed into a loop, along with any other analysis you want to run.
# General simulation step (set w <- w0 once before the first step)
w <- w*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*rnorm(1000,mean=0,sd=1)*sqrt(dt) -1))
You can also easily visualise the changes and obtain summary statistics (5-number summary):
hist(w)
summary(w)
Of course, you'll still need to work through the details of what you actually want to model and how you want to go about analysing it --- and you've got the $\pi$ function to deal with --- but this should get you started toward using R for discrete simulation.
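Putting the single step above into a loop could look like the sketch below. It follows the same update rule as the example code (the $\pi$ term is still left out), uses the same placeholder parameter values, and assumes 40 time steps. The names W, n_paths and n_steps are hypothetical.

```r
set.seed(123)

# Placeholder parameters, same values as in the example above
dt <- 0.5; r <- 1; lambda <- 1; sigma <- 1

n_paths <- 1000  # number of simulated trajectories
n_steps <- 40    # number of time steps (assumed)

# One row per time point, one column per trajectory; row 1 holds W_0 = 1
W <- matrix(NA_real_, nrow = n_steps + 1, ncol = n_paths)
W[1, ] <- 1

for (k in 1:n_steps) {
  eps <- rnorm(n_paths)  # fresh shocks for this step
  W[k + 1, ] <- W[k, ] * exp(r * dt) *
    (1 + exp((sigma * lambda - 0.5 * sigma^2) * dt + sigma * eps * sqrt(dt) - 1))
}

summary(W[n_steps + 1, ])  # distribution of the final values
```

Keeping the whole matrix rather than only the last row makes it possible to plot individual trajectories, for example with matplot(W[, 1:10], type = "l", log = "y").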
