Simulation in R, for loop

I am trying to simulate the data 10 times in R, but I have not figured out how to do it. The code is shown below; you can run it in R straight away! When I run it, it gives me 5 values of "w" as output. I think this is only one simulation, but what I actually want is 10 different simulations of those 5 numbers.
I know I need to write a for loop for this, but I haven't managed to. Could anyone help, please?
# simulate 10 times
# try N = 10, for loop?
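# initial values W0 and E
W0=1000 # R is case-sensitive: the formula below uses W0, so naming this w0 gives an "object not found" error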
E= 1000
data = c(-0.02343731, 0.045509474 ,0.076144158,0.09234636,0.0398257)
constant = exp(cumsum(data))
exp.cum = cumsum(1/constant)
w=constant*(W0 - exp.cum)- E
w

You'll want to generate new values of data in each simulation. Do this within the curly brackets that follow the for loop. Then, before closing the curly brackets, be sure to save your statistical output in the appropriate position of an object, such as a vector. Here is a simple example:
W0=1000
E= 1000
n_per_sim <- 5
num_sims <- 10
set.seed(12345) #setting a seed makes the results reproducible
sim_output_1 <- rep(NA, times = num_sims) #This creates a vector of 10 NA values
for (sim_number in 1:num_sims){ #this starts your for loop
data <- rnorm(n=n_per_sim, mean=10, sd=2) #generate your data
average <- mean(data)
sim_output_1[sim_number] <- average #this is where you store your output for each simulation
}
sim_output_1 #Now you can see the average from each simulation
If you want to save five values from each simulation, you can use a matrix instead of a vector, as shown here:
matrix_output <- matrix(NA, ncol=n_per_sim, nrow=num_sims) #This creates a 10x5 matrix
for (sim_number in 1:num_sims){ #this starts your for loop
data <- rnorm(n=n_per_sim, mean=10, sd=2) #generate your data (these mean/sd values are only illustrative; choose ones that fit your problem)
constant = exp(cumsum(data))
exp.cum = cumsum(1/constant)
w=constant*(W0 - exp.cum)- E
matrix_output[sim_number, ] <- w #this is where you store your output for each simulation
}
matrix_output #Now you can see the five w values from each simulation (one row per simulation)
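The same 10 x 5 matrix can also be built without an explicit loop by wrapping one simulation in a function and calling replicate(). This is a minimal sketch: simulate_w is a hypothetical helper name, and the normal distribution with mean 0.05 and sd 0.03 is an assumption chosen to resemble the values in the question, not something the question specifies.

```r
W0 <- 1000
E <- 1000
n_per_sim <- 5
num_sims <- 10
set.seed(12345)

# Hypothetical helper: runs one simulation and returns its 5 values of w
simulate_w <- function(n, W0, E) {
  data <- rnorm(n, mean = 0.05, sd = 0.03)  # assumed distribution; replace with one that fits your data
  constant <- exp(cumsum(data))
  exp.cum <- cumsum(1 / constant)
  constant * (W0 - exp.cum) - E
}

# replicate() returns one column per simulation, so transpose to get one row per simulation
w_matrix <- t(replicate(num_sims, simulate_w(n_per_sim, W0, E)))
dim(w_matrix)  # 10 5
w_matrix
```

Keeping the simulation in a function means the same code can be called from a for loop or from replicate(), whichever you find easier to read.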

Related

Generating n new datasets by randomly sampling existing data, and then applying a function to new datasets

For a paper I'm writing I have subsetted a larger dataset into 3 groups, because I thought the strength of correlations between 2 variables in those groups would differ (they did). I want to see if subsetting my data into random groupings would also significantly affect the strength of correlations (i.e., whether what I'm seeing is just an effect of subsetting, or if those groupings are actually significant).
To this end, I am trying to generate n new data frames by randomly sampling 150 rows from an existing dataset, and then to calculate correlation coefficients for two variables in each of those n new data frames, saving the correlation coefficient and significance in a new file.
But, HOW?
I can do it manually, e.g., with dplyr, something like
newdata <- sample_n(Random_sample_data, 150)
output <- cor.test(newdata$x, newdata$y, method="kendall")
I'd obviously like to not type this out 1000 or 100000 times, and have been trying things with loops and lapply (see below) but they've not worked (undoubtedly due to something really obvious that I'm missing!).
Here I have tried to assign each row to one of 10 random groups, and then to compute correlations between x and y within those groups:
Random_sample_data<-select(Range_corrected, x, y)
cat <- sample(1:10, 1229, replace=TRUE)
Random_sample_cats<-cbind(Random_sample_data,cat)
correlation <- function(c) {
c <- cor.test(x,y, method="kendall")
return(c)
}
b<- daply(Random_sample_cats, .(cat), correlation)
Error message:
Error in cor.test(x, y, method = "kendall") :
object 'x' not found
Once you have the code for what you want to do once, you can put it in replicate to do it n times. Here's a reproducible example using built-in data:
library(dplyr) # provides sample_n()
result = replicate(n = 10, expr = {
newdata <- sample_n(mtcars, 10)
output <- cor.test(newdata$wt, newdata$qsec, method="kendall")
})
replicate will save the result of the last line of the expression (output <- ...) for each replication and will attempt to simplify the results. In this case cor.test returns a list (an object of class htest), so replicate simplifies the results to a matrix with one row per list component and 10 columns (1 column per replication).
You may want to clean up the results a little bit so that, e.g., you only save the p-value. Here, we store only the p-value, so the result is a vector with one p-value per replication, not a matrix:
result = replicate(n = 10, expr = {
newdata <- sample_n(mtcars, 10)
cor.test(newdata$wt, newdata$qsec, method="kendall")$p.value
})
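The question also asked for the correlation coefficient and its significance to be saved to a file. A minimal sketch of that, with some assumptions: it uses base R's sample() on row indices instead of dplyr::sample_n(), the built-in mtcars data stands in for Random_sample_data, and one_rep, n_reps, n_rows and the output file name are hypothetical placeholders. exact = FALSE makes cor.test() use the normal approximation, which avoids the "cannot compute exact p-value with ties" warning on mtcars.

```r
set.seed(1)
n_reps <- 100   # number of random subsamples (placeholder)
n_rows <- 20    # rows per subsample (150 in the question; mtcars only has 32 rows)

# Hypothetical helper: one random subsample, returning Kendall's tau and its p-value
one_rep <- function(df, size) {
  newdata <- df[sample(nrow(df), size), ]   # random rows, without replacement
  ct <- cor.test(newdata$wt, newdata$qsec, method = "kendall", exact = FALSE)
  c(tau = unname(ct$estimate), p.value = ct$p.value)
}

# replicate() gives a 2 x n_reps matrix; transpose it so each row is one replication
res <- as.data.frame(t(replicate(n_reps, one_rep(mtcars, n_rows))))
head(res)

out_file <- file.path(tempdir(), "kendall_results.csv")  # hypothetical output path
write.csv(res, out_file, row.names = FALSE)
```

Returning a small named vector from each replication (rather than the whole htest object) lets replicate() simplify to a plain numeric matrix, which converts cleanly to a data frame.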

How can I repeat these two lines of code 100+ times?

I'm still new to the programming world and am looking for some guidance on a model I am building for the growth of individual animals over time.
The goal for the code I'm working with is to
i) Generate random starting sizes of animals from a given distribution
ii) Give each of these individuals a starting growth rate from a given distribution
iii) Calculate new size of individual after 1 year
iv) Assign a new growth rate from above distribution
v) Calculate the new size of individual after another year.
So far I have the code below. What I want to do is repeat the last two lines of code x times without having to run the code over and over by hand.
# Generate starting lengths
lengths <- seq(from=4.4, to=5.4, by =0.1)
# Generate starting ks (growth rate)
ks <- seq(from=0.0358, to=0.0437, by =0.0001)
#Create individuals
create.inds <- function(id = NaN, length0=NaN, k1=NaN){
inds <- data.frame(id=id, length0 = length0, k1=k1)
inds
}
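# Generate individuals
n.initial <- 100 # number of individuals (n.initial was never defined in the original code)
inds <- create.inds(id=1:n.initial,
length0=sample(lengths, n.initial, replace=TRUE),
k1=sample(ks, n.initial, replace=TRUE))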
# Calculate new lengths based on last and 2nd last columns and insert into next column
inds[,ncol(inds)+1] <- 326*(1-exp(-(inds[,ncol(inds)])))+
(inds[,ncol(inds)-1]*exp(-(inds[,ncol(inds)])))
# Calculate new ks and insert into last column
inds[,ncol(inds)+1] <- sample(ks, 100, replace=TRUE)
Any and all assistance would be appreciated. Also, if you think there is a better way to write this, please let me know.
I think what you are asking for is a simple loop:
for (i in 1:100) { #replace 100 with the number of times you want this to execute
inds[,ncol(inds)+1] <- 326*(1-exp(-(inds[,ncol(inds)])))+
(inds[,ncol(inds)-1]*exp(-(inds[,ncol(inds)])))
# Calculate new ks and insert into last column
inds[,ncol(inds)+1] <- sample(ks, 100, replace=TRUE)
}
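On the question of a better way to write this: storing lengths in a preallocated matrix (one row per individual, one column per year) avoids growing the data frame by two columns on every iteration. A minimal sketch, reusing the question's lengths, ks and the asymptotic length 326 from its growth formula; n_ind, n_years, L_inf, size and k are hypothetical names.

```r
set.seed(1)
lengths <- seq(from = 4.4, to = 5.4, by = 0.1)       # possible starting lengths
ks <- seq(from = 0.0358, to = 0.0437, by = 0.0001)   # possible growth rates
n_ind <- 100     # number of individuals
n_years <- 100   # number of years to simulate
L_inf <- 326     # asymptotic length from the question's formula

# size[i, yr] is the length of individual i after yr - 1 years; column 1 holds starting lengths
size <- matrix(NA_real_, nrow = n_ind, ncol = n_years + 1)
k <- matrix(NA_real_, nrow = n_ind, ncol = n_years)  # growth rate used in each year
size[, 1] <- sample(lengths, n_ind, replace = TRUE)

for (yr in seq_len(n_years)) {
  k[, yr] <- sample(ks, n_ind, replace = TRUE)       # new growth rate each year
  size[, yr + 1] <- L_inf * (1 - exp(-k[, yr])) + size[, yr] * exp(-k[, yr])
}
size[1:5, 1:4]   # first five individuals: starting length and first three years
```

Keeping lengths and growth rates in separate matrices means column positions no longer have to be computed with ncol(inds), and the number of years is fixed up front.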

two sided ks test loop, get p.value

I have a column of data from which I am taking random subsamples of 50%.
I'm running a two-sided KS test to compare the distribution of the 50% subsample against 100% of the data, to see if the distribution is still a significant fit.
To meet my objectives, I want to run this in a loop, say 1000 times, to get an average p-value from 1000 random subsamples. These lines of code give me a single p-value for a random 50% subset of my sample:
dat50=dat[sample(nrow(dat),replace=F,size=0.50*nrow(dat)),]
ks.test(dat[,1],dat50[,1], alternative="two.sided")
I need code that will run this 1000 times, saving the resulting (different) p-value each time in a column which I can then average. The code I'm trying to get to work looks like this:
x <- numeric(100)
for (i in 1:100){
x<- ks.test(dat[,7],dat50[,7], alternative="two.sided")
x<-x$p.value
}
However, this does not store multiple p-values.
I also tried this:
get.p.value <- function(df1, df2) {
x <- rf(5, df1=df1, df2=df2)
p.value <- ks.test(dat[,6],dat50[,6], alternative="two.sided")$p.value
}
replicate (2000, get.p.value(df1 = 5, df2 = 10))
I hope that is clear; I would really appreciate any help solving this!
In your for loop you are overwriting x in each iteration meaning that you will only save the p-value for the last iteration. Try this instead:
x <- numeric(100)
for (i in seq_along(x))
x[i] <- ks.test(dat[,7], dat[sample(nrow(dat), replace=FALSE, size=0.5*nrow(dat)),7])$p.value
You can get the same result using replicate with:
replicate(100, ks.test(dat[,7], dat[sample(nrow(dat), replace=FALSE, size=0.5*nrow(dat)),7])$p.value)
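To finish the question's goal of averaging the p-values, here is a self-contained sketch with simulated data. dat is a hypothetical one-column data frame standing in for the real data, and half_sample_p is a hypothetical helper. ks.test() may warn that p-values are approximate in the presence of ties (every value in the subsample also occurs in the full data); the sketch silences those warnings with suppressWarnings(), which you may prefer not to do.

```r
set.seed(2024)
dat <- data.frame(v = rnorm(400))   # hypothetical stand-in for the real data
n_reps <- 1000

# Hypothetical helper: p-value for one random 50% subsample against the full column
half_sample_p <- function(x) {
  sub <- x[sample(length(x), size = 0.5 * length(x), replace = FALSE)]
  suppressWarnings(ks.test(x, sub, alternative = "two.sided")$p.value)
}

p_values <- replicate(n_reps, half_sample_p(dat[, 1]))
mean(p_values)   # average p-value over all subsamples
```

The important change from the loop in the question is that the subsample is drawn inside the repeated code, so each replication tests a different subsample.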

How to Generate Normal Random Samples within Mean±3Sigma

I want to draw normal random numbers into an array of dimension 100 x 8 x 5000 with a specific mean (M) and standard deviation (S), but I want them to lie only within the range M ± 3S, so that there are no outliers in my array beyond those limits.
Any suggestions? I want to write an R program based on this array for some simulation studies. I am using the following R code to generate my data set:
for(i in 1:5000){
for(j in 1:8){
Dat[,j,i]=rnorm(100,mean=muu[j],sd=sigma[j])
}
}
Now I want to get rid of the values that fall outside muu ± 3*sigma in the data above. The discarded values must of course be replaced with fresh values so that the dimensions of the Dat array stay intact.
First Solution
Here is a start but I bet there is a more elegant solution.
First generate a sample, then subset it to the desired range. Of course, you have to adjust the values to your needs.
set.seed(123)
rs <- rnorm(10000, mean = 10, sd = 3)
rs1 <- rs[ rs >= 1 & rs <= 19 ] # keep values within mean ± 3 sd, i.e. 10 ± 9
Second (better) solution
I don't think my first solution works well here, because subsetting changes the number of values. I have just written some code that might suit your purposes. Here are the steps:
create an array of NAs with the required dimensions
fill it with random numbers
create a logical vector where TRUEs are for the desired conditions
subset the data based on that vector and replace the values where it is TRUE (the out-of-range values) with the mean used to generate the samples
data <- array(NA, dim = c(100, 8, 5000))
for(i in 1:5000){
data[ , , i] <- rnorm(800, 3, 1)
}
bound <- 3 + c(-1, 1)*3*1
pr <- data < bound[1] | data > bound[2] # TRUE for values outside mean ± 3 sd
data[pr] <- 3
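The question asked for discarded values to be replaced with fresh draws rather than with the mean. One way to do that (a sketch, not taken from either answer) is to redraw only the out-of-range values until none remain, which amounts to sampling from a normal distribution truncated at mean ± 3 sd. The values of muu and sigma below are hypothetical placeholders for the question's vectors, and rnorm_within3sd is a hypothetical helper name.

```r
set.seed(42)
n_row <- 100; n_col <- 8; n_rep <- 5000
muu <- seq(1, 8)          # hypothetical means, one per column
sigma <- rep(2, n_col)    # hypothetical standard deviations, one per column

# Draw n values from N(mu, sd), redrawing any value outside mu ± 3 sd
rnorm_within3sd <- function(n, mu, sd) {
  x <- rnorm(n, mean = mu, sd = sd)
  out <- abs(x - mu) > 3 * sd
  while (any(out)) {                        # only about 0.3% of draws need redrawing
    x[out] <- rnorm(sum(out), mean = mu, sd = sd)
    out <- abs(x - mu) > 3 * sd
  }
  x
}

Dat <- array(NA_real_, dim = c(n_row, n_col, n_rep))
for (i in seq_len(n_rep)) {
  for (j in seq_len(n_col)) {
    Dat[, j, i] <- rnorm_within3sd(n_row, muu[j], sigma[j])
  }
}
```

Because each rejected value is replaced by a new draw, the array keeps its full 100 x 8 x 5000 shape and no value is pinned to the mean.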

Storing For Loop values after simulation

I'm brand new to R and trying to implement a simple model (which I will extend later) that deals with corporate bond defaults.
For starters, I'm using only two clients.
Parameters:
- two clients (which I name "A" and "B")
- a cash flow of $10,000 will be received from each client if they do not default within 10 years
- pulling together concepts using standard normal random variables, dependent uniform random variables and Gaussian copulas
- run some number of simulations
- store the sum of the Client A and Client B cash flows in a vector named "result"
- finally, take the average of the result vector
My code is:
# define variables
nSim <- 5 # number of simulations
rho <- 0.3 # rho
lambda <- 0.01 # default intensity
T <- 10 # time to default
for (i in 1:nSim){
# Step 1: generate 2 independent standard normal random variables
z1 <- rnorm(1, mean=0, sd=1)
z2 <- rnorm(1, mean=0, sd=1)
# Step 2: map the normals into correlated normals
# by Cholesky decomposition of the correlation matrix
# w1 = z1
# w2 = rho(z1)+sqrt(1-(rho^2))*z2
w1 <- z1
w2 <- rho*z1 - sqrt(1-(rho^2))*z2
# Step 3: using the correlated normals, generate two dependent uniform variables
u <- runif(1, min=0, max=1)
v <- runif(1, min=0, max=1)
# Step 4: using the dependent uniforms, generate two dependent exponentials
tau.A <- (-1/lambda)*log(u)
tau.B <- (-1/lambda)*log(v)
payout.A <- if (tau.A > 10) {10000} else {0}
payout.B <- if (tau.B > 10) {10000} else {0}
result[i] = (payout.A[i] + payout.B[i])
}
# calculate expected value of portfolio
mean(result)
When I run this code, I get NA as the result and can't figure out why (again, I'm brand new to R). I don't think each simulation value is being stored in the result vector, but I don't know how to diagnose the problem.
Thanks in advance to anyone who can help!
--Sarah
Everything works until the result[i] = (payout.A[i] + payout.B[i]) line. The problem is that you never create result.
Before your for loop, add the line:
result <- vector('numeric', length = nSim)
This creates a vector of 0s of length nSim. In R it is best to preallocate the space instead of dynamically growing a vector with c().
No, the NA comes from the [i] indexing in the result[i] = (payout.A[i] + payout.B[i]) line.
Indexing with [i] is fine for result, but not for the two payout variables: each is a single value that is regenerated in every iteration, so payout.A[i] and payout.B[i] are NA for every i greater than 1. Simply remove that indexing:
result[i] <- payout.A + payout.B
This fixes the NA. If you wish to keep each payout in its own vector, you need to assign it as such, but it seems that you don't.
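Putting both answers' fixes together, plus one change the answers do not mention: Step 3 in the question draws u and v with runif(), which are independent of w1 and w2, so the correlation (the Gaussian copula) never reaches the default times. Applying the standard normal CDF, pnorm(), to the correlated normals gives dependent uniforms. A minimal sketch using the question's parameters, with two assumptions: nSim is raised to 10000 so the average is more stable, and the horizon is stored in a hypothetical variable named horizon instead of T (T is also shorthand for TRUE).

```r
set.seed(1)
nSim <- 10000    # number of simulations
rho <- 0.3       # correlation between the two latent normals
lambda <- 0.01   # default intensity
horizon <- 10    # years; T in the question
cash <- 10000    # cash flow per client if no default within the horizon

result <- numeric(nSim)   # preallocate the output vector
for (i in seq_len(nSim)) {
  # Step 1: two independent standard normals
  z1 <- rnorm(1)
  z2 <- rnorm(1)
  # Step 2: correlated normals via the Cholesky factor of the correlation matrix
  w1 <- z1
  w2 <- rho * z1 + sqrt(1 - rho^2) * z2
  # Step 3: dependent uniforms through the standard normal CDF (Gaussian copula)
  u <- pnorm(w1)
  v <- pnorm(w2)
  # Step 4: dependent exponential default times
  tau.A <- -log(u) / lambda
  tau.B <- -log(v) / lambda
  payout.A <- if (tau.A > horizon) cash else 0
  payout.B <- if (tau.B > horizon) cash else 0
  result[i] <- payout.A + payout.B
}
mean(result)   # estimated expected portfolio cash flow
```

Each client individually survives the horizon with probability exp(-lambda * horizon), so the average should land near 2 * 10000 * exp(-0.1); the correlation affects how often both default together, not that mean.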
