How to create a loop to find the mean of values found by a nested loop? - r

Below I have code that finds the relative standard deviation of a bootstrap population that were bootstrapped from sample sizes ranging between 2 and 30.
I would like to create a loop that runs this loop for 10 iterations, finding the mean standard deviation for each sample size (2->30), and puts it into a data frame, so instead of the output being n 2:30 with the subsequent standard deviation, the standard deviation is instead a mean standard deviation (from 10 loops). I hope that makes sense.
n_range <- 2:29
bResultsRan <- vector("double", 28)
set.seed(30)
for (b in n_range) {
bRowsRan<-Random[sample(nrow(Random), b), ]
base <- read.table("base.csv", header=T, sep="," )
base$area<-5036821
base$quadrea <- base$area * 16
bootRan <- boot(data=bRowsRan$count, average, R=1000)
base$data<- bootRan$t
base$popsize<-(base$data*base$quadrea)
bValue <- sd(base$popsize)/mean(base$popsize)
bResultsRan[[b - 1]] <- bValue}
BRRan <- data.frame(n = n_range, bResultsRan)
plot(BRRan)

Related

Using R loop to find probability

I know the basic loop format, but I'm unsure how to incorporate 'population' into the loop to find the probability of collecting a sample with a mean of 42 or larger.
Use a loop to find out the probability of collecting a sample (n=10) with a mean of 42 (or larger) from the dataset produced by the following code:
set.seed(1)
population<-rnorm(n=500,mean=35,sd=10)
One approach to this problem is to repeatedly sample from population and compute the frequency that the mean of these samples is greater than or equal to 42.
set.seed(1);
population <- rnorm(n=500, mean=35, sd=10)
nsim <- 100000 # the number of time we will do this
vec_mean <- numeric(nsim) # a vector to hold the sample means
for (i in 1:nsim) {
samp <- sample(population, size = 10, replace = TRUE)
vec_mean[i] <- mean(samp)
}
sum(vec_mean >= 42) / nsim
# [1] 0.01727
This can be interpreted as the (frequentist) probability of collecting a sample of size 10 from this population with a mean of 42 or larger.

for loop in R — generating and calculate different results

Giving rnorm(100) I need to create a for loop in R to calculate the mean and the standard deviation generating 100 different numbers each time, and store those results in a vector.
How can I accomplish that?
so far
a <- rnorm(100) # generate 100 random numbers with normal distribution
sample <- 100 # number of samples
results <- rep(NA, 100) # vector creation for storing the results
for (i in 1:sample){
results[i] <- as.data.frame() # then i stuck here, lol, the most important part
}
This function will return the mean, standard deviation and sample size. The only argument required is the sample size (n).
results <- function(n) {
data.frame(Mean = mean(rnorm(n)),
Stdev = sd(rnorm(n)),
n = n
) -> out
return(out)
}

Change certain values of a vector based on mean and standard deviation of its subsets

I am trying to inject anomalies into a dataset, essentially changing certain values, based on a condition. I have a dataset, there are 10 subsets. The condition is that anomalies would be 2.8-3 times the standard deviation of each segment away from the mean of that subset. For that, I am dividing the dataset into 10 equal parts, then calculating the mean and standard deviation of each subset, and changing certain values by putting them 3 standard deviations of that subset away from the mean of that subset. The code looks like the following:
set.seed(1)
x <- rnorm(sample(1:35000, 32000, replace=F),0,1) #create dataset
y <- cumsum(x) #cumulative sum of dataset
j=1
for(i in c(1:10)){
seg = y[j:j+3000] #name each subset seg
m = mean(seg) #mean of subset
print(m)
s = sd(seg) # standard deviation of subset
print(s)
o_data = sample(j:j+3000,10) #draw random numbers from j to j + 3000
print(o_data)
y[o_data] = m + runif(10, min=2.8, max=3) * s #values = mean + 2.8-3 * sd
print(y[o_data])
j = j + 3000 # increment j
print(j)
}
The error I get is that standard deviation is NA, so I am not able to set the values.
What other approach is there by which I can accomplish the task? I have the inject anomalies which are 2.8-3 standard deviations away from the rolling mean essentially.
You have a simple error in your code. when you wrote
seg = y[j:j+3000] I believe that you meant seg = y[j:(j+3000)]
Similarly o_data = sample(j:j+3000,10) should be o_data = sample(j:(j+3000),10)

Generate random values in R with a defined correlation in a defined range

For a science project, I am looking for a way to generate random data in a certain range (e.g. min=0, max=100000) with a certain correlation with another variable which already exists in R. The goal is to enrich the dataset a little so I can produce some more meaningful graphs (no worries, I am working with fictional data).
For example, I want to generate random values correlating with r=-.78 with the following data:
var1 <- rnorm(100, 50, 10)
I already came across some pretty good solutions (i.e. https://stats.stackexchange.com/questions/15011/generate-a-random-variable-with-a-defined-correlation-to-an-existing-variable), but only get very small values, which I cannot transform so the make sense in the context of the other, original values.
Following the example:
var1 <- rnorm(100, 50, 10)
n <- length(var1)
rho <- -0.78
theta <- acos(rho)
x1 <- var1
x2 <- rnorm(n, 50, 50)
X <- cbind(x1, x2)
Xctr <- scale(X, center=TRUE, scale=FALSE)
Id <- diag(n)
Q <- qr.Q(qr(Xctr[ , 1, drop=FALSE]))
P <- tcrossprod(Q) # = Q Q'
x2o <- (Id-P) %*% Xctr[ , 2]
Xc2 <- cbind(Xctr[ , 1], x2o)
Y <- Xc2 %*% diag(1/sqrt(colSums(Xc2^2)))
var2 <- Y[ , 2] + (1 / tan(theta)) * Y[ , 1]
cor(var1, var2)
What I get for var2 are values ranging between -0.5 and 0.5. with a mean of 0. I would like to have much more distributed data, so I could simply transform it by adding 50 and have a quite simililar range compared to my first variable.
Does anyone of you know a way to generate this kind of - more or less -meaningful data?
Thanks a lot in advance!
Starting with var1, renamed to A, and using 10,000 points:
set.seed(1)
A <- rnorm(10000,50,10) # Mean of 50
First convert values in A to have the new desired mean 50,000 and have an inverse relationship (ie subtract):
B <- 1e5 - (A*1e3) # Note that { mean(A) * 1000 = 50,000 }
This only results in r = -1. Add some noise to achieve the desired r:
B <- B + rnorm(10000,0,8.15e3) # Note this noise has mean = 0
# the amount of noise, 8.15e3, was found through parameter-search
This has your desired correlation:
cor(A,B)
[1] -0.7805972
View with:
plot(A,B)
Caution
Your B values might fall outside your range 0 100,000. You might need to filter for values outside your range if you use a different seed or generate more numbers.
That said, the current range is fine:
range(B)
[1] 1668.733 95604.457
If you're happy with the correlation and the marginal distribution (ie, shape) of the generated values, multiply the values (that fall between (-.5, +.5) by 100,000 and add 50,000.
> c(-0.5, 0.5) * 100000 + 50000
[1] 0e+00 1e+05
edit: this approach, or any thing else where 100,000 & 50,000 are exchanged for different numbers, will be an example of a 'linear transformation' recommended by #gregor-de-cillia.

Storing For Loop values after simulation

I'm brand new to R and trying to implement a simple model (which I will extend later) that deals with corporate bond defaults.
For starters, I'm using only two clients.
Parameters:
- two clients (which I name "A" and "B")
- a cash flow of $10,000 will be received from each client if they do not default within 10 years
- pulling together concepts using standard normal random variables, dependent uniform random variables and Gaussian copulas
- run some number of simulations
- store the sum of Client A cash flow plus Client B cash flow and store in a vector named "result"
- finally, take the average of the result vector
My code is:
# define variables
nSim <- 5 # of simulations
rho <- 0.3 # rho
lambda <- 0.01 # default intensity
T <- 10 # time to default
for (i in 1:nSim){
# Step 1: generate 2 independent standard normal random variables
z1 <- rnorm(1, mean=0, sd=1)
z2 <- rnorm(1, mean=0, sd=1)
# Step 2: map the normals into correlated normals
# by Cholesky composition of the correlation matrix
# w1 = z1
# w2 = rho(z1)+sqrt(1-(rho^2))*z2
w1 <- z1
w2 <- rho*z1 - sqrt(1-(rho^2))*z2
# Step 3: using the correlated normals, generate two dependent uniform variables
u <- runif(1, min=0, max=1)
v <- runif(1, min=0, max=1)
# Step 4: using the dependent uniforms, generate two dependent exponentials
tau.A <- (-1/lambda)*log(u)
tau.B <- (-1/lambda)*log(v)
payout.A <- if (tau.A > 10) {10000} else {0}
payout.B <- if (tau.B > 10) {10000} else {0}
result[i] = (payout.A[i] + payout.B[i])
}
# calculate expected value of portfolio
mean(result)
When I run this code, I'm getting an error of "NA" and can't figure out why (again, I'm brand new to R). I don't think each of the simulation values is being stored in the results vector, but don't know how to diagnose the problem.
Thanks in advance to anyone who can help!
--Sarah
Everything works until the results[i] <- (payout.A[i] + payout.B[i]) line. The problem is you never set results.
Before your for loop, add the line:
results <- vector('numeric', length = nSim)
This will create a vector of 0s with a length of nSim. In R is is best to preallocate the space instead of dynamically growing a vector using c().
No the problem is the presence of the [i] assignments in the results[i] <- (payout.A[i] + payout.B[i]) line.
The [i] assignment is okay for the results parameter but not the two payout parameters because each of these are being generated in each loop. So simply remove them to form the line:
results[i] <- (payout.A + payout.B)
will solve your issue. If you wish to keep each payout in its own vector then you need to assign it as such, but it seems that you don't.

Resources