how to create a for loop in R to run a simple random sample and calculate the average of each set - r

data = read.csv(file= "~/Downloads/data.csv")
temp=(data$temp)
n=75
N=length(temp)
s=sample(1:N, n)
ybar=mean(temp[s])
I want to run the sample 100 times where n is 75. Then calculate average of each sample, and subtract each average from a set number (50).

Maybe a short loop is the way to go. Notice the bracket on the left-side of the equals sign in the last line of code - it's the key to using a loop for your calculation!
# set a seed - always a good idea when using randomness like 'sample()'
set.seed(123)
# pre-allocate an "empty" vector to fill in with results
ybar_vec = vector(length=n)
# do your calculation "n" times
for(i in 1:n) {
s = sample(N)
ybar_vec[i] = 50 - mean(temp[s]) # store i^th calc as i^th element of ybar_vec
}

Related

How can I make my for loop in R run faster? Can I vectorize this?

#Start: Initialize values
#For each block lengths (BlockLengths) I will run 10 estimates (ThetaL). For each estimate, I simulate 50000 observarions (Obs). Each estimate is calculated on the basis of the blocklength.
Index=0 #Initializing Index.
ThetaL=10 #Number of estimations of Theta.
Obs=50000 #Sample size.
Grp=vector(length=7) #Initializing a vector of number of blocks. It is dependent on block lengths (see L:15)
Theta=matrix(data=0,nrow=ThetaL,ncol=7) #Initializing a matrix of the estimates of Thetas. There are 10 for each block length.
BlockLengths<-c(10,25,50,100,125,200,250) #Setting the block lengths
for (r in BlockLengths){
Index=Index+1
Grp[Index]=Obs/r
for (k in 1:ThetaL){
#Start: Constructing the sample
Y1<-matrix(data=0,nrow=Obs,ncol=2)
Y1[1,]<-runif(2,0,1)
Y1[1,1]<--log(-(Y1[1,1])^2 +1)
Y1[1,2]<--log(-(Y1[1,2])^2 +1)
for (i in 2:Obs)
{
Y1[i,1]<-Y1[i-1,2]
Y1[i,2]<-runif(1,0,1)
Y1[i,2]<--log(-(Y1[i,2])^2 +1)
}
X1 <- vector(length=Obs)
for (i in 1:Obs){
X1[i]<-max(Y1[i,])
}
#End: Constructing the sample
K=0 #K will counts number of blocks with at least one exceedance
for (t in 1:Grp[Index]){ #For loop from 1 to number of groups
a=0
for (j in (1+r*(t-1)):(t*r)){ #Loop for the sample within each group
if (X1[j]>quantile(X1,0.99)){ #If a value exceeds high threshold, we add 1 to some variable a
a=a+1
}
}
if(a>=1){ #For the group, if a is larger than 1, we have had a exceedance.
K=K+1 #Counts number of blocks with at least one exceedance.
}
}
N<-sum(X1>=quantile(X1,0.99)) #Summing number of exceedances
Theta[k,Index]<- (1/r) * ((log(1-K/Grp[Index])) / (log(1-N/Obs))) #Estimate
#Theta[k,Index]<-K/N
}
}
I have been running the above code without errors and it took me about 20 minutes, but I want to run the code for larger sample and more repetitions, which makes the run time absurdly large. I tried to only have the necessary part inside the loops to optimize it a little. Is it possible to optimize it even further or should I use another programming language as I've read R is bad for "for loop". Will vectorization help? In case, how can I vectorize the code?
First, you can define BlockLengths before Grp and Theta as both of them depend on it's length:
Index = 0
ThetaL = 2
Obs = 10000
BlockLengths = c(10,25)
Grp = vector(length = length(BlockLengths))
Theta = matrix(data = 0, nrow = ThetaL, ncol = length(BlockLengths))
Obs: I decreased the size of the operation so that I could run it faster. With this specification, your original loop took 24.5 seconds.
Now, for the operation, there where three points where I could improve:
Creation of Y1: the second column can be generated at once, just by creating Obs random numbers with runif(). Then, the first column can be created as a lag of the second column. With only this alteration, the loop ran in 21.5 seconds (12% improvement).
Creation of X1: you can vectorise the max function with apply. This alteration saved further 1.5 seconds (6% improvement).
Calculation of K: you can, for each t, get all the values of X1[(1+r*(t-1)):(t*r)], and run the condition on all of them at once (instead of using the second loop). The any(...) does the same as your a>=1. Furthermore, you can remove the first loop using lapply vectorization function, then sum this boolean vector, yielding the same result as your combination of if(a>=1) and K=K+1. The usage of pipes (|>) is just for better visualization of the order of operations. This by far is the more important alteration, saving more 18.4 seconds (75% improvement).
for (r in BlockLengths){
Index = Index + 1
Grp[Index] = Obs/r
for (k in 1:ThetaL){
Y1 <- matrix(data = 0, nrow = Obs, ncol = 2)
Y1[,2] <- -log(-(runif(Obs))^2 + 1)
Y1[,1] <- c(-log(-(runif(1))^2 + 1), Y1[-Obs,2])
X1 <- apply(Y1, 1, max)
K <- lapply(1:Grp[Index], function(t){any(X1[(1+r*(t-1)):(t*r)] > quantile(X1,0.99))}) |> unlist() |> sum()
N <- sum(X1 >= quantile(X1, 0.99))
Theta[k,Index] <- (1/r) * ((log(1-K/Grp[Index])) / (log(1-N/Obs)))
}
}
Using set.seed() I got the same results as your original loop.
A possible way to improve more is substituting the r and k loops with purrr::map function.

How can I repeat these two lines of code 100+ times?

I'm still new to the programming world and looking for some guidance on a model I am building for individual animal growths over time.
The goal for the code I'm working with is to
i) Generate random starting sizes of animals from a given distribution
ii) Give each of these individuals a starting growth rate from a given distribution
iii) Calculate new size of individual after 1 year
iv) Assign a new growth rate from above distribution
v) Calculate the new size of individual after another year.
So far I have the code below, and what I want to do is repeat the last two lines of code x amount of times without I having to physically run the code over and over.
# Generate starting lengths
lengths <- seq(from=4.4, to=5.4, by =0.1)
# Generate starting ks (growth rate)
ks <- seq(from=0.0358, to=0.0437, by =0.0001)
#Create individuals
create.inds <- function(id = NaN, length0=NaN, k1=NaN){
inds <- data.frame(id=id, length0 = length0, k1=k1)
inds
}
# Generate individuals
inds <- create.inds(id=1:n.initial,
length=sample(lengths,100,replace=TRUE),
k1=sample(ks, 100, replace=TRUE))
# Calculate new lengths based on last and 2nd last columns and insert into next column
inds[,ncol(inds)+1] <- 326*(1-exp(-(inds[,ncol(inds)])))+
(inds[,ncol(inds)-1]*exp(-(inds[,ncol(inds)])))
# Calculate new ks and insert into last column
inds[,ncol(inds)+1] <- sample(ks, 100, replace=TRUE)
Any and all assistance would be appreciated, also if you think there is a better way to write this please let me know.
i think what you are asking is a simple loop:
for (i in 1:100) { #replace 100 with the desired times you want this to excecute
inds[,ncol(inds)+1] <- 326*(1-exp(-(inds[,ncol(inds)])))+
(inds[,ncol(inds)-1]*exp(-(inds[,ncol(inds)])))
# Calculate new ks and insert into last column
inds[,ncol(inds)+1] <- sample(ks, 100, replace=TRUE)
}

How to Generate Normal Random Samples within Mean±3Sigma

I want to draw normal random numbers in an array of order ((100*8)*5000) with a specific Mean (M) and Standard Deviation (S) but I want them to be only within the range M±3S, so that I don't have any outliers in my array exceeding those limits.
Any Suggestion? I want to write a program in R based on this array for some simulation studies. I am using following R Code to generate my Data Set:
for(i in 1:5000){
for(j in 1:8){
Dat[,j,i]=rnorm(100,mean=muu[j],sd=sigma[j])
}
}
Now, We want to get rid of those values which are higher than muu±3sigma in the above data. Definitely, We have to replace discarded values with fresh values so that the dimension of the Dat array keep intact.
First Solution
Here is a start but I bet there is a more elegant solution.
First generate a sample next step is to subset it to your desired values. Of course you have to adjust values to your desire.
set.seed(123)
rs <- rnorm(10000, mean = 10, sd = 3)
rs1 <- rs[ rs >= -19 & rs <= 19 ]
Second (better) solution
I think my first solutions didn't work so well. I have just written some code that might be perfect for your purposes. Here are the steps.
create an array of NAs with the required dimensions
fill it with random numbers
create a logical vector where TRUEs are for the desired conditions
subset the data based on that vector and replace the values where TRUE is TRUE (pardon my words game) with the mean used to generate samples
data <- array(NA, dim = c(100, 8, 5000))
for(i in 1:5000){
data[ , , i] <- rnorm(800, 3, 1)
}
bound <- 3 + c(-1, 1)*3*1
pr <- data <= bound[1] | data >= bound[2]
data[pr] <- 3

R: Find the maximum value of each simulated iteration in R

I need to record the maximum value of 5 observations, simulated 10,000 times. This is in order to find the mean of these maximum values.
My current code looks like this:
mc.sim1 = function(){
mu = 20; sigma = 4
(rnorm(5, mu, sigma))
}
m = replicate(10000, mc.sim1())
I'm at a loss to explain how to record the maximum of each iteration, rather than the maximum of the whole simulated set using the 'max' function.
If to follow your code, use dim to understand what m is.
it is a matrix, with a row for each observation of the simulation, and a column for each replication.
a really simple and intuitive loop can look like
recorded_max=numeric(dim(m)[2])
for (i in 1:dim(m)[2])
{
recorded_max[i]=max(m[,i])
}
Just a simple sapply would work
mc.sim1 = function(...){
mu = 20; sigma = 4
max(rnorm(5, mu, sigma))
}
maxdraw <- sapply(seq.int(10000), mc.sim1)
At the end maxdraw will be a vector of length 10,000 with the maximum of the five random normal variables from each iteration.
I would recommend using apply to calculate the maximum value per column (since each column represents an iteration in m):
colMax <- apply(m, 2, max)
Also note #shadow's comment below about the colMaxs function from the matrixStats package.

Plot a table of binomial distributions in R

For a game design issue, I need to better inspect binomial distributions. Using R, I need to build a two dimensional table that - given a fixed parameters 'pool' (the number of dice rolled), 'sides' (the number of sides of the die) has:
In rows --> minimum for a success (ranging from 0 to sides, it's a discrete distribution)
In columns --> number of successes (ranging from 0 to pool)
I know how to calculate it as a single task, but I'm not sure on how to iterate to fill the entire table
EDIT: I forgot to say that I want to calculate the probability p of gaining at least the number of successes.
Ok, i think this could be a simple solution. It has ratio of successes on rows and success thresholds on dice roll (p) on columns.
poolDistribution <- function(n, sides=10, digits=2, roll.Under=FALSE){
m <- 1:sides
names(m) <- paste(m,ifelse(roll.Under,"-", "+"),sep="")
s <- 1:n
names(s) <- paste(s,n,sep="/")
sapply(m, function(m.value) round((if(roll.Under) (1 - pbinom(s - 1, n, (m.value)/sides))*100 else (1 - pbinom(s - 1, n, (sides - m.value + 1)/sides))*100), digits=digits))
This gets you half of the way.
If you are new to R, you might miss out on the fact that a very powerful feature is that you can use a vector of values as an index to another vector. This makes part of the problem trivially easy:
pool <- 3
sides <- 20 # <cough>D&D<cough>
# you need to strore the values somewhere, use a vector
NumberOfRollsPerSide <- rep(0, sides)
names(NumberOfRollsPerSide) <- 1:sides # this will be useful in table
## Repeast so long as there are still zeros
## ie, so long as there is a side that has not come up yet
while (any(NumberOfRollsPerSide == 0)) {
# roll once
oneRoll <- sample(1:sides, pool, TRUE)
# add (+1) to each sides' total rolls
# note that you can use the roll outcome to index the vector. R is great.
NumberOfRollsPerSide[oneRoll] <- NumberOfRollsPerSide[oneRoll] + 1
}
# These are your results:
NumberOfRollsPerSide
All you have left to do now is count, for each side, in which roll number it first came up.

Resources