Error when using a for loop to produce simulations - r

I'm trying to use a for loop to simulate 1,000 portfolios with 3 bonds in each portfolio and find the probability that exactly two of the three bonds default.
Here's my code (with comments):
#Reproducibility
set.seed(33)
#Number of trials
n<-1000
#Initialize variables
numberofdefaults<-0
counter<-0
portfolio <- 0
for (i in 1:n){
portfolio[i] <- rbinom(3, 1, prob = 0.127) # generate three random binomial deviates with probability of success ("default" in my case) 0.127 and store them in a vector
numberofdefaults[i] <- sum(portfolio[i] == 1) # find the number of defaults in the vector (1 for default) and add them up
if (numberofdefaults[i] == 2) { # if number of defaults is 2, then add 1 to the counter
counter<-counter+1
}
}
When I execute the code, I keep getting an error message: number of items to replace is not a multiple of replacement length
Thanks so much for taking the time. Any suggestions would be appreciated.

Your code is not working as intended. portfolio is an atomic vector, so portfolio[i] refers to a single element. rbinom(3, 1, prob = 0.127) returns three values, and you are attempting to cram all three into that one element (the ith element in that iteration of the for loop). R warns you that it can't do that and stores only the first value each time. Instead, make portfolio a list, where each element can hold a whole vector.
set.seed(33)
#Number of trials
n<-1000
#Initialize variables
numberofdefaults<-0
counter<-0
portfolio <- list() # Change this
for (i in 1:n){
portfolio[[i]] <- rbinom(3, 1, prob = 0.127) # Change this
numberofdefaults[i] <- sum(portfolio[[i]] == 1) # Change this
if (numberofdefaults[i] == 2) {
counter<-counter+1
}
}
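For comparison, the same estimate can be sketched without a loop. This is a minimal sketch and not part of the answer above: it assumes the three bonds default independently with probability 0.127, so the number of defaults per portfolio is one Binomial(3, 0.127) draw. The names defaults, estimate, and exact are illustrative, and because the random numbers are drawn in a different order, the result will not match the loop's counter exactly.

```r
set.seed(33)
n <- 1000                                   # number of simulated portfolios
p <- 0.127                                  # default probability of one bond

# One draw per portfolio: number of defaults among 3 independent bonds
defaults <- rbinom(n, size = 3, prob = p)

estimate <- mean(defaults == 2)             # simulated P(exactly two defaults)
exact    <- dbinom(2, size = 3, prob = p)   # closed form: 3 * p^2 * (1 - p)
```

Comparing estimate with exact gives a quick sanity check on the simulation.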

Related

How can I make my for loop in R run faster? Can I vectorize this?

#Start: Initialize values
#For each block length (BlockLengths) I will run 10 estimates (ThetaL). For each estimate, I simulate 50000 observations (Obs). Each estimate is calculated on the basis of the block length.
Index=0 #Initializing Index.
ThetaL=10 #Number of estimations of Theta.
Obs=50000 #Sample size.
Grp=vector(length=7) #Initializing a vector of number of blocks. It is dependent on block lengths (see L:15)
Theta=matrix(data=0,nrow=ThetaL,ncol=7) #Initializing a matrix of the estimates of Thetas. There are 10 for each block length.
BlockLengths<-c(10,25,50,100,125,200,250) #Setting the block lengths
for (r in BlockLengths){
Index=Index+1
Grp[Index]=Obs/r
for (k in 1:ThetaL){
#Start: Constructing the sample
Y1<-matrix(data=0,nrow=Obs,ncol=2)
Y1[1,]<-runif(2,0,1)
Y1[1,1]<--log(-(Y1[1,1])^2 +1)
Y1[1,2]<--log(-(Y1[1,2])^2 +1)
for (i in 2:Obs)
{
Y1[i,1]<-Y1[i-1,2]
Y1[i,2]<-runif(1,0,1)
Y1[i,2]<--log(-(Y1[i,2])^2 +1)
}
X1 <- vector(length=Obs)
for (i in 1:Obs){
X1[i]<-max(Y1[i,])
}
#End: Constructing the sample
K=0 #K counts the number of blocks with at least one exceedance
for (t in 1:Grp[Index]){ #For loop from 1 to number of groups
a=0
for (j in (1+r*(t-1)):(t*r)){ #Loop for the sample within each group
if (X1[j]>quantile(X1,0.99)){ #If a value exceeds high threshold, we add 1 to some variable a
a=a+1
}
}
if(a>=1){ #For the group, if a is at least 1, we have had an exceedance.
K=K+1 #Counts number of blocks with at least one exceedance.
}
}
N<-sum(X1>=quantile(X1,0.99)) #Summing number of exceedances
Theta[k,Index]<- (1/r) * ((log(1-K/Grp[Index])) / (log(1-N/Obs))) #Estimate
#Theta[k,Index]<-K/N
}
}
I have been running the above code without errors, and it takes about 20 minutes. I want to run it for a larger sample and more repetitions, which makes the run time absurdly long. I tried to keep only the necessary parts inside the loops to optimize it a little. Is it possible to optimize it further, or should I use another programming language, as I've read that R is slow with for loops? Will vectorization help? If so, how can I vectorize the code?
First, you can define BlockLengths before Grp and Theta, as both of them depend on its length:
Index = 0
ThetaL = 2
Obs = 10000
BlockLengths = c(10,25)
Grp = vector(length = length(BlockLengths))
Theta = matrix(data = 0, nrow = ThetaL, ncol = length(BlockLengths))
Note: I decreased the size of the problem so that I could run it faster. With this specification, your original loop took 24.5 seconds.
Now, for the operation, there were three points I could improve:
Creation of Y1: the second column can be generated at once by drawing Obs random numbers with runif(). The first column is then a lag of the second column. With this alteration alone, the loop ran in 21.5 seconds (12% improvement).
Creation of X1: you can replace the loop over rows with apply(), which computes the maximum of each row. This alteration saved a further 1.5 seconds (6% improvement).
Calculation of K: for each t, you can take all the values X1[(1+r*(t-1)):(t*r)] and test the condition on all of them at once (instead of using the inner loop). any(...) does the same job as your a>=1. Furthermore, you can replace the outer loop with lapply() and then sum the resulting logical vector, which yields the same result as your combination of if(a>=1) and K=K+1. The pipes (|>) are used only to make the order of operations easier to read. This is by far the most important alteration, saving a further 18.4 seconds (75% improvement).
for (r in BlockLengths){
Index = Index + 1
Grp[Index] = Obs/r
for (k in 1:ThetaL){
Y1 <- matrix(data = 0, nrow = Obs, ncol = 2)
Y1[,2] <- -log(-(runif(Obs))^2 + 1)
Y1[,1] <- c(-log(-(runif(1))^2 + 1), Y1[-Obs,2])
X1 <- apply(Y1, 1, max)
K <- lapply(1:Grp[Index], function(t){any(X1[(1+r*(t-1)):(t*r)] > quantile(X1,0.99))}) |> unlist() |> sum()
N <- sum(X1 >= quantile(X1, 0.99))
Theta[k,Index] <- (1/r) * ((log(1-K/Grp[Index])) / (log(1-N/Obs)))
}
}
Using set.seed() I got the same results as your original loop.
A possible further improvement is to replace the r and k loops with the purrr::map function.
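The code above still recomputes quantile(X1, 0.99) once per block. A further hedged sketch, not taken from the answer: compute the threshold once, take row maxima with pmax(), and count blocks by reshaping the exceedance flags into a matrix with one column per block. The helper names simulate_X and estimate_theta are hypothetical, and the sketch assumes the number of observations is divisible by the block length.

```r
# Hypothetical helper: max of consecutive pairs, mirroring the Y1/X1 construction
simulate_X <- function(obs) {
  y <- -log(1 - runif(obs + 1)^2)     # obs + 1 transformed uniforms
  pmax(y[-(obs + 1)], y[-1])          # element-wise max of (lagged, current) values
}

# Hypothetical helper: one Theta estimate for block length r
# (assumes length(x) is divisible by r)
estimate_theta <- function(x, r) {
  obs <- length(x)
  grp <- obs / r
  threshold <- quantile(x, 0.99)                # computed once, not once per block
  exceed <- matrix(x > threshold, nrow = r)     # column t holds block t
  K <- sum(colSums(exceed) > 0)                 # blocks with at least one exceedance
  N <- sum(x >= threshold)                      # total number of exceedances
  (1 / r) * log(1 - K / grp) / log(1 - N / obs)
}

set.seed(1)
X1 <- simulate_X(10000)
theta <- sapply(c(10, 25), function(r) estimate_theta(X1, r))
```

For brevity, the usage lines reuse one simulated sample for both block lengths, whereas the loops above draw a fresh sample for every estimate. Calling simulate_X() inside the sapply() would restore that behaviour.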

How to access values of predict() function in R for storage?

I'm training a K-nearest neighbors model for a class. The catch is that they ask us to train it with the whole database, except for the row being predicted.
My plan is to initialize a vector for storage and run a for loop to loop over every row omitting that specific row for training, then appending the prediction value to the vector, and calculating accuracy after the loop:
results <- c()
for (i in nrow(data) {
model.kknn <- train.kknn(data[-i,11]~., data = data[-i,1:10],kmax = 7, scale = TRUE)
pred <- predict(model,data[i,1:10])
results <- c(results,pred)
}
I'm expecting the vector results to be a series of 1s and 0s. However, I tried looping just the first row and the value of results is 2.
When printing pred the value is:
[1] 1
Levels: 0 1
Any idea how I can get the 1 to append to the vector results?
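The value 2 is the factor's internal integer code: predict() returns a factor with levels 0 and 1, and storing it in a plain vector keeps the code of level "1", which is 2, rather than the label. Convert the prediction with as.character() before storing it. Also specify 1:N in the for() part, and rather than "growing" a vector, initialize one of the appropriate length and fill it in.
N <- nrow(data)
results <- character(N)
for (i in 1:N) {
model.knn <- train.kknn(data[-i,11]~., data=data[-i,1:10], kmax=7, scale=TRUE)
results[i] <- as.character(predict(model.knn, data[i,1:10,drop=FALSE]))
}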
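The value 2 reported in the question comes from how R stores factors. Here is a small, self-contained illustration, using a hand-made factor as an assumed stand-in for the kknn prediction (pred, codes, labels, and results are illustrative names):

```r
# Stand-in for the prediction printed in the question: value "1", levels 0 and 1
pred <- factor("1", levels = c("0", "1"))

codes  <- as.integer(pred)      # internal code: position of "1" among the levels
labels <- as.character(pred)    # the label itself

# Storing the label rather than the code in a preallocated vector
results <- character(3)
results[1] <- as.character(pred)

# Numeric 0/1 values, if those are needed for computing accuracy
as_number <- as.numeric(as.character(pred))
```

Converting with as.character() first and only then with as.numeric() recovers the label. Calling as.numeric() directly on the factor would return the code.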

Looping through specific parts of a nested list

I am creating a function that multiplies my data sets by random proportions and sums the results, creating a mixture of my data weighted by these proportions.
For example, if I have 4 data sets, I create 4 random proportions that sum to 100, multiply each data set by its proportion, and sum the result.
Besides that, I want my function to iterate through my data sets and also through my proportions, so as to permute through all possible combinations of proportion and data set.
A sample data set can be loaded as follows:
library(LCF)
data(stdmix)
My function currently stands at this point:
library(combinat)
props <- function(corr.spec.standards = specdat, size, nprop){
if (size < 2) stop("number must be greater than 1")
## create progress bar
try(pb <- txtProgressBar(min = 1, max = nprop, style = 3), silent = TRUE)
## initial loop for proportions
for (i in 1:nprop) {
prop <- sample.int(100, size = size)
prop <- (prop/sum(prop))
permut <- permn(prop)
## permutation loop
for (i in permut[[i]]) {
mapply(`*`,permut, rep(specdat[i]$data$corr.spec$cor.absorption,each=length(permut)))
}
}
My problem is that specdat is a nested list, which in this example is a list of 8, and the only members to be multiplied by the permutation are specdat[i]$data$corr.spec$cor.absorption.
Thus my question is: how do I loop through a (very) nested list, operating only on a specific member of the list?
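One possible approach, sketched under assumptions: extract the member of interest from every top-level element with lapply() and [[ / $, then combine the extracted vectors with the proportions. The list built below is a hypothetical stand-in that only mimics the data$corr.spec$cor.absorption path from the question. It is not the real stdmix data, and the proportions are fixed rather than random so the result is easy to check.

```r
# Hypothetical stand-in for specdat: 3 standards, each with a nested data frame
specdat <- lapply(1:3, function(k) {
  list(
    name = paste0("std", k),
    data = list(
      corr.spec = data.frame(
        energy         = 1:4,
        cor.absorption = k * c(0.1, 0.2, 0.3, 0.4)
      )
    )
  )
})

# Pull only the nested member out of every element
absorption <- lapply(specdat, function(s) s$data$corr.spec$cor.absorption)

# Weight each extracted vector by its proportion and sum them into one mixture
prop <- c(0.5, 0.3, 0.2)
mixture <- Reduce(`+`, Map(`*`, absorption, prop))
```

Note that specdat[[i]] returns the ith element itself, while specdat[i] returns a list of length one, so $data on the latter yields NULL.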

Creating an R loop for simulated trials of an experiment

I have an experiment whose outcomes are all equally likely. The sample space has about 68 billion outcomes (4^18). I am trying to create a loop that, for x trials, generates a list of x random numbers from 1 to 4^18. I want the function to have an argument that lets me choose the number of trials. Here is what I have.
N = 4^18;N
trialsample <- function(x){
Trials <- list()
for(i in 1:x) {
Trials[[i]] <- round(runif(1, 1, N))
}
}
test <- trialsample(5)
test
NULL
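The NULL arises because the last expression evaluated in the function is the for loop, and a for loop evaluates to NULL, so Trials is never returned. A minimal sketch of one way to fix this, drawing all values in a single vectorized call and returning them. The argument name n_outcomes is a hypothetical addition, and floor(...) + 1 is swapped in for round() so that every integer from 1 to N is targeted with equal probability.

```r
N <- 4^18   # size of the sample space

# The value of the last expression is what the function returns
trialsample <- function(x, n_outcomes = N) {
  floor(runif(x, min = 0, max = n_outcomes)) + 1   # x draws from 1..n_outcomes
}

set.seed(1)
test <- trialsample(5)
```

If a list is really needed, as.list(test) converts the vector afterwards.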

simulation of binomial distribution and storing value in matrix in r

set.seed(123)
for(m in 1:40)
{
u <- rbinom(1e3,40,0.30)
result[[m]]=u
}
result
for (m in 1:40) if (any(result[[m]] == 1)) break
m
m is the exit time for the company; it gives a different result as we change the probability. Using this m as the exit, I have to find whether there was a funding round in between, so I created a random binomial draw with some probability: when I get a 1, that means there is a funding round (j). If there is a funding round, I have to find the limit of the round using the random uniform distribution. I am not sure if the code is right for rbinom and whether it runs until m.
I am getting the y value for all 40 iterations. I need it to go to the next loop iteration when I get rbinom == 1. I am trying to store the values in a matrix, but they are not getting stored either. Please help me with that.
mat1<- matrix(0,nrow = 40,ncol = 2) #empty matrix
for(j in 1:m) {
k<- if(any(rbinom(1e3,40,0.42)==1)) #funding round
{
y<- runif(j, min = 0, max = 1) #lower and upper bound
mat1[l][0]<-j
mat1[l][1]<-y #matrix storing the value
}
}
resl
mat1
y
The answer to your first question:
result <- vector("list",40)
for(m in 1:40)
{
u <- rbinom(1e3,40,0.05)
print(u)
result[[m]]=u
}
u
The second question is not clear. Could you rephrase it?
To generate 40 vectors of random binomial numbers you don't need a loop at all, use ?replicate.
u <- replicate(40, rbinom(1e3, 40, 0.05))
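As a hedged illustration of what replicate() returns here and how the exit index m from the question could be read off it (has_one is an illustrative name, and the probability 0.05 is the one used in this answer, not the 0.30 from the question):

```r
set.seed(123)
u <- replicate(40, rbinom(1e3, 40, 0.05))   # 1000 x 40 matrix, one column per m

# Which columns contain at least one draw equal to 1?
has_one <- colSums(u == 1) > 0

# Index of the first such column (NA if no column qualifies)
m <- which(has_one)[1]
```

Because each call to rbinom() returns a vector of the same length, replicate() simplifies the result to a matrix instead of a list.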
As for your second question, there are several problems with your code. I will try to address them; it will be up to you to say whether the proposed corrections are right.
The following does basically nothing
for(k in 1:40)
{
n<- (any(rbinom(1e3,40,0.05)==1)) # n is TRUE/FALSE
}
k # at this point, equal to 40
There are better ways of creating a T/F variable.
#matrix(0, nrow = 40,ncol = 2) # wrong, don't use list()
matrix(0, nrow = 40,ncol = 2) # or maybe NA
Then you set l=0 when indices in R start at 1. Anyway, I don't believe you'll need this variable l.
if(any(rbinom(1e3,40,0.30)==1)) # rarely TRUE (about 1% of the time), left as an exercise
# in probability theory
Then, finally,
mat1[l][0]<-j # index `0` doesn't exist
Please revise your code, and tell us what you want to do, we're glad to help.
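Two points from this thread can be sketched in code. First, the chance that any(rbinom(1e3, 40, 0.30) == 1) is TRUE can be computed exactly with dbinom(). Second, a matrix row is filled with mat1[row, ] rather than mat1[l][0]. The simulation part is only a guess at the intent: it assumes one Bernoulli draw per period decides whether a funding round occurs, and m, p_round, and the column names are assumptions made for illustration.

```r
# Probability that at least one of 1000 Binomial(40, 0.30) draws equals exactly 1
p_one <- dbinom(1, size = 40, prob = 0.30)
p_any <- 1 - (1 - p_one)^1000

set.seed(123)
m <- 40            # assumed exit time, for illustration only
p_round <- 0.42    # assumed per-period probability of a funding round

# NA marks periods without a funding round
mat1 <- matrix(NA_real_, nrow = m, ncol = 2,
               dimnames = list(NULL, c("period", "limit")))
for (j in 1:m) {
  if (rbinom(1, 1, p_round) == 1) {    # one Bernoulli draw per period
    mat1[j, ] <- c(j, runif(1))        # fill row j: period index and uniform limit
  }
}

# Keep only the periods in which a funding round occurred
rounds <- mat1[!is.na(mat1[, "period"]), , drop = FALSE]
```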
