subscript out of bounds R for large for-loop - r

I have a for loop that works perfectly for a relatively small number of repetitions (times = 10 or 100). But for larger values of "times" I get an error when filling in the matrix: subscript out of bounds (see the code and the explanation below).
Error:
Error in M_zp_var[j, (1:n)] : subscript out of bounds
To fill the matrix I use a second for-loop in the first one
M_zp<-matrix(numeric(1),B,N)
for(j in 1:dim(M_zp)[1]){ M_zp[j,]<-Z_pi(x,y) }
and I even tried it with
M_zp<-t(replicate(B, Z_pi(x,y)))
instead, but it does not work either.
As I said, the code works if my outer for loop is small. I do not understand why it should behave differently when I choose a larger value for "times" (1000, 5000).
I know what "subscript out of bounds" means.
I hope you can help me, I would be so grateful! :(
Code:
T_p<- matrix(numeric(1),times,Bvar)
H0 <- numeric(times)
for(i in 1:times){
# Each loop starts with a new random vector
x<-rnorm(n) #x<- rdist(n,...); #
y<-rnorm(m) #y<- rdist(m,...); #
# Order statistic with T_pi's T(Z(pi))
n<-length(x)
m<-length(y)
N<-n+m
#Permutate (x,y) B times
**M_zp<-matrix(numeric(1),B,N)
for(j in 1:dim(M_zp)[1]){ M_zp[j,]<-Z_pi(x,y) }** #see below for Z_pi() funct.
#M_zp<-t(replicate(B, Z_pi(x,y)))
M_zp_var<-unique(M_zp)
Bf<-dim(unique(M_zp))[1]
#Test statistic computation for each one of the Permutations in M_zp_var
T_pvec<-numeric(Bvar)
for(j in 1:Bvar){
xp<-M_zp_var[j,(1:n)]
yp<-M_zp_var[j,((n+1):(n+m))]
m_xp<- mstern(xp)
m_yp<- mstern(yp)
T_pvec[j]<-(sd(xp)-sd(yp))/sqrt(m_xp/n+m_yp/m)
}
T_p[i,]<-sort(T_pvec)
}
Z_pi<-function(x,y){
n<-length(x)
m<-length(y)
x<-sort(x)
y<-sort(y)
wicy<-function(s){ wi<-which(y==s); return(wi)}
wicx<-function(s){ wi<-which(x==s); return(wi)}
r<-sample(1:min(n,m),1) #integer between 1 and min(n,m) = number of entries to interchange between x and y
zwy<-sample(y,r) #vector of length r drawn randomly from y = entries to interchange with x
wy<-unlist(lapply(zwy,wicy))
zwx<-sample(x,r)
wx<-unlist(lapply(zwx,wicx))
yp<-y
xp<-x
yp[wy]<-zwx
xp[wx]<-zwy
zp<-c(sort(xp), sort(yp))
return(zp)
}
Explanation:
The loop simulates a permutation test "times" times (Monte Carlo). Each iteration generates two random samples and then permutes them randomly B=16000 times (each permutation is produced by the function Z_pi()). It then keeps only the distinct permutations (unique()) and calculates a test statistic for each of them. The code stops working at the marked lines (in bold), which involve the function Z_pi().
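The question does not show how Bvar is defined. One possible explanation, assuming Bvar is a fixed number chosen before the loop, is that unique() returns a different number of rows (Bf) in each iteration, so the more iterations are run, the more likely it is that some iteration has Bf smaller than Bvar and M_zp_var[j, ] is indexed past its last row. The sketch below uses a small hypothetical matrix in place of M_zp to reproduce the message and to show a loop that only visits rows that exist.

```r
set.seed(1)

# Hypothetical stand-in for M_zp: 20 rows built from only a few distinct patterns
M_zp <- matrix(sample(1:2, 40, replace = TRUE), nrow = 20, ncol = 2)
M_zp_var <- unique(M_zp)
Bf <- nrow(M_zp_var)   # number of distinct rows, at most 4 here
Bvar <- 10             # assumption: fixed in advance and larger than Bf

# Indexing past the last row reproduces the error message
msg <- tryCatch(M_zp_var[Bvar, 1], error = function(e) conditionMessage(e))
print(msg)

# Looping over the rows that actually exist avoids the error
T_pvec <- numeric(Bf)
for (j in seq_len(Bf)) {
  T_pvec[j] <- sum(M_zp_var[j, ])  # placeholder for the real test statistic
}
```

If this is the cause, T_p would also need a different container (for example a list), because the number of statistics per iteration is then no longer constant.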

Related

For Loop : number of items to replace is not a multiple of replacement length

I am working on an R Markdown document using a DICE (climate-economy) model.
I have a function YPC that depends on two variables tt and r. My goal is to obtain a data frame that contains all possible images of YPC (for all possible values of r and tt), with the cell YPC[i,j] containing the value of YPC for i=tt and j=r. For each line of my matrix (each value of tt), I want every cell after a certain value r=l[tt] to be filled with the value 0.
Example : l[1]=100 ; Then all the cells on the first line after the 100th column should be filled with zero.
I create my matrix and define its cells as follows
YPC<-matrix(nrow = NT, ncol = l[100])
for (i in 1:NT){
for (j in 1:l[100]) {
if (j<=l[i]) {
YPC[i,j] <- fYPC(Y, r=j, tt=i)}
else{
YPC[i,j] <- 0}
}
}
When I print the generated cells, I see that it works well up to the l[i] value. At the l[i] value, it prints "Inf" and a warning message: number of items to replace is not a multiple of replacement length. At the end of the computation, I get a "no loop for break/next, jumping to top level" error.
I am not posting all of my code here, as it is quite long and only a few parts seem relevant.
Also, a reproducible example is hard to provide, as I don't use a regular dataset here. I only have simple parameter values for the model, which my code uses to build several equations.
EDIT
Thanks for your answers.
Here is how fYPC is defined :
fYPC <- function(Y, r, tt=NULL){
  if (is.null(tt)){
    output=(exp(Y/l+et2*sqrt(2)*erfinv(2*r/l-1))+exp(Y/l+et2*sqrt(2)*erfinv(2*(r-1)/l-1)))
  }else {
    output=(exp(Y[tt]/l[tt]+et2*sqrt(2)*erfinv(2*r/l[tt]-1))+exp(Y[tt]/l[tt]+et2*sqrt(2)*erfinv(2*(r-1)/l[tt]-1)))
  }
  return(output)
}
In the is.null part, Y is a value so the output is also of length 1.
In the else part, Y is a vector, then Y[tt] and l[tt] are length 1 and so is the final output
About NT, you can actually replace it by 100. And l[i] is defined as follows (with pop0, popasym, popadj respectively 100, 1000, 0.134) :
l = pop0
for(i in 2:NT) l[i] <- l[i-1] * (popasym / l[i-1])**popadj
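As a hedged illustration of where the "Inf" may come from: the question does not show erfinv, so the sketch below assumes it is the inverse error function, written here in terms of qnorm. Under that assumption, the argument 2*r/l[tt]-1 is exactly 1 when r equals l[tt], and the inverse error function of 1 is infinite.

```r
# Assumed definition of the inverse error function (not shown in the question)
erfinv <- function(x) qnorm((1 + x) / 2) / sqrt(2)

# Population path as defined in the question
pop0 <- 100; popasym <- 1000; popadj <- 0.134; NT <- 100
l <- pop0
for (i in 2:NT) l[i] <- l[i - 1] * (popasym / l[i - 1])^popadj

# First row (tt = 1): at r = 100 the argument reaches exactly 1
arg_at_boundary <- 2 * 100 / l[1] - 1
at_boundary <- erfinv(arg_at_boundary)       # infinite
just_below  <- erfinv(2 * 99 / l[1] - 1)     # finite
print(c(at_boundary, just_below))
```

This only addresses the "Inf"; it does not by itself explain the replacement-length warning.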

Forloop, Conditional Statements, Storing Data

What I am trying to accomplish in the for loop, and am still having difficulty with:
prb[i]=(prb[i-1]*ER)+b[i]
prb[1]=(prb[0]*ER)+b[1]
prb[2]=(prb[1]*ER)+b[2]
and then output prb[1], prb[2], prb[3], ... from the left-hand side of the equation.
I also want to define SS so that it reflects prb at the previous time step (i.e. prb[i-1]).
I have attempted to save the results from my for loop in the empty vectors. However, the values written to the vectors are all the same (i.e. those from the first iteration), and the loop does not appear to have the additive effect I am aiming for. Something appears to be wrong with the logic of my code. I would like the value computed in one iteration to be used in the next iteration of the for loop. Does anyone have any ideas or solutions to this problem? Best!
Matthew
#Parameters
c=0.2
A=5
d=8
d0=5
s=0.5
e=0.1
p=0.6
ER=e/A
#Colonization Equation Probabilities
C2 = c*A*exp(-d*s/d0) #ML to SS
#Empty Vectors
l=vector(mode="numeric", length=100) #open vectors to store the different probability values from the forloop
b=vector(mode="numeric", length=100)
prb=(l*ER) + b #total probability of SS being colonized
#Island States
ML=1
SS=prb
n.I=c(ML, SS)
#Forloop and Conditional Statements
for(i in 2:101) {
(SS[i]=prb[i-1])
(prb[i]=(l[i]*ER)+b[i])
if (SS < 1 ) {
(l[i]=prb[i-1])
} else if(SS < 1){
b[i]=C2
}
}
I copied your code and ran it. My observations are:
First of all, in R, vector indices start at 1, not 0.
So SS=prb[0:1] does not pick up a zeroth element: the index 0 is dropped and only prb[1] is returned, which is just 0.
Since the line that populates prb only runs when SS is >= 1, it can never run under this condition. As long as prb[1] is not 1 or larger you cannot alter it, so you are stuck in a vicious circle.
Even if the condition were satisfied (with a different condition, of course), prb[0] does not exist in the first iteration, so prb[0]*ER just yields numeric(0), which is an empty vector.
So first you should fix your vector indices, and then you should change your condition so that it allows the recursion to get started.
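The indexing point can be sketched as follows. This is only an illustration under stated assumptions: b is filled with the constant C2 in every step and prb[1] plays the role of the "time 0" value, neither of which is specified in the question. The parameter c is renamed cc here so that it does not mask R's c() function.

```r
# Parameters from the question (c renamed to cc)
cc <- 0.2; A <- 5; d <- 8; d0 <- 5; s <- 0.5; e <- 0.1
ER <- e / A
C2 <- cc * A * exp(-d * s / d0)

n_steps <- 100
b <- rep(C2, n_steps)     # assumption: the same colonization term in every step
prb <- numeric(n_steps)
prb[1] <- b[1]            # assumption: starting value, stored at index 1

# Index 0 selects nothing in R
empty <- prb[0] * ER      # numeric(0)

# Each step reuses the value stored in the previous step
for (i in 2:n_steps) {
  prb[i] <- prb[i - 1] * ER + b[i]
}
print(head(prb))
```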

the for loop shows NA for every iteration except for the last one

I wanted to run a t.test on every row of the matrix I have constructed. Then I tried to pull out the confidence interval limits and save them in separate vectors, with one value per iteration. However, after I run the code I get L1=NA NA NA....8.155677. I would be grateful if you could point out the mistakes.
(I understand there are numerous ways to write this code cleaner but, I tried to write it step-by-step.)
set.seed(1234)
n= 24 # sample size or a number of RV's
N=100 # number of extractions or a number of sums for each rv
X=rnorm(N*n, 9, 1.5 ) # generate rv's
XMat=matrix(X,nrow=N)
#Problem Part:
L1=c()
L2=c()
for(i in N)
{
s=XMat[i,1:n]
K=t.test(s,conf.level=0.95)
M=K$conf.int
l1=M[1]
l2=M[2]
L1[i]=l1
L2[i]=l2
}
Change the loop control to:
for (i in seq(N))
In your code, for(i in N) runs the loop for a single value of i (i = N = 100), which is why every entry except the last one is NA.
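A minimal sketch of the difference, using the data setup from the question. The helper names visited_wrong, visited_right and ci are introduced here for illustration only; seq_len(N) is used as a close equivalent of seq(N).

```r
set.seed(1234)
n <- 24
N <- 100
XMat <- matrix(rnorm(N * n, 9, 1.5), nrow = N)

# for (i in N) visits only the value N itself
visited_wrong <- c()
for (i in N) visited_wrong <- c(visited_wrong, i)

# for (i in seq_len(N)) visits 1, 2, ..., N
visited_right <- c()
for (i in seq_len(N)) visited_right <- c(visited_right, i)

# Loop-free alternative: one confidence interval per row
ci <- t(apply(XMat, 1, function(s) t.test(s, conf.level = 0.95)$conf.int))
L1 <- ci[, 1]
L2 <- ci[, 2]
```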

Indexing variables in R

I am normally a Maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In Maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims <- 10
# R is not zero-indexed, so add one (this avoids M[k] being 0)
M <- rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create lists that hold one vector per simulation
x <- list()
X <- list()
for(k in 1:sims){
  x[[k]] <- rep(NA, M[k])
  X[[k]] <- rep(NA, M[k])
  for(i in 1:M[k]){
    x[[k]][i] <- runif(1, min=0, max=1)
    if(x[[k]][i] >= 0 && x[[k]][i] <= 0.1056379){
      X[[k]][i] <- rlnorm(1, 6.228244, 0.3565041)
    } else {
      X[[k]][i] <- rlnorm(1, 8.910837, 1.1890874)
    }
  }
}
This will work, and I think it is what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning to use data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!
Let me start with a few remarks and then show you how your problem can be solved using R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
set.seed(1234) # reset the seed so that x2 is drawn from the same random stream as x1
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that indeed x[k] is not a valid variable name in R. [ is used for indexing, so x[k] extracts the element k from a vector x. Since you try to assign a vector of length larger than 1 to a single element, the assignment fails: you get an error if x does not exist yet, and otherwise a warning, with only the first value being stored. What you probably want to use is a list, as you will see in the example below.
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min=0, max=1)
  # x <= 0.1056379 selects the first log-normal, as in the if/else of the question
  X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
              rlnorm(m, 8.910837, 1.1890874))
  X
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: it creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores it in x. Then it creates a vector X that contains log-normally distributed random values. The parameters of the distribution used for each entry depend on the corresponding value in x.
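To connect this back to the Maple notation v[1][n] from the question, here is a small sketch of list indexing in R. The vectors and the lengths in M are hypothetical values chosen for illustration, and runif stands in for the real formula.

```r
# A list can hold vectors of different lengths
v <- list()
v[[1]] <- c(10, 20, 30)
v[[2]] <- c(1, 2)

# [[ extracts the vector itself, then [ picks its nth element
n <- 2
second_of_first <- v[[1]][n]

# Single brackets return a sub-list, not the vector
sub <- v[1]

# lapply returns one vector per element of M, each with the requested length
M <- c(3, 0, 5)                          # hypothetical lengths
X <- lapply(M, function(m) runif(m))
print(lengths(X))
```

Note that lapply also copes with a length of 0, which a loop written as for(i in 1:M[k]) does not, because 1:0 counts down from 1 to 0.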

vector-matrix multiplication in r

I want to multiply a matrix by each of 1000 random values so as to get 1000 different resulting matrices.
I'm running the following code :
Threshold <- runif(1000,min=0,max=1) #Generating 1000 random variables so that we can see 1000 multiple results of Burstscore
Burstscore <- matrix(data=0,nrow=nrow(Fm2),ncol=ncol(Fpre2))
#Calculating the final burst score
for (i in 1:nrow(Fm2)){
for (j in 1:ncol(Fpre)){ #Dimensions of all the matrices (Fpre,Fm,Growth,TD,Burstscore) are 432,24
{
Burstscore[i,j]= ((as.numeric(Threshold))*(as.numeric(Growth[i,j]))) + ((1-(as.numeric(Threshold)))*(as.numeric(TD[i,j])))
}
}
}
I'm getting the following error -
'Error in Burstscore[i, j] = ((as.numeric(Threshold)) * (as.numeric(Growth[i, :
number of items to replace is not a multiple of replacement length'
You are trying to put 1000 values into a single cell of the Burstscore matrix (as you are multiplying each [i,j] element by the entire "Threshold" vector). Apart from this, your code contains unnecessary elements (brackets and as.numeric() calls). And, of course, as said above, it is not fully reproducible, so I had to "invent" several matrices.
I guess that what you want to do is the following:
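Threshold <- runif(1000,min=0,max=1)
# Invented matrices with the dimensions given in the question
Growth <- matrix(runif(432*24), ncol=24)
TD <- matrix(runif(432*24), ncol=24)
# One result matrix per threshold value
Burstscore <- vector("list", length(Threshold))
for (i in seq_along(Threshold)) {
  Burstscore[[i]] <- (Threshold[i]*Growth) + ((1-Threshold[i])*TD)
}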
In R, it would be even more elegant to use lapply():
Burstscore <- lapply(Threshold, function(x) (x*Growth)+((1-x)*TD))
Finally, I suggest you also put a more meaningful title to your question, so it could potentially be helpful to others also.
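As a small-scale sketch of the problem and the fix described in this answer: the 2x2 matrices and the three threshold values below are hypothetical stand-ins for the 432x24 matrices and the 1000 thresholds.

```r
set.seed(42)
threshold <- runif(3)                 # stand-in for the 1000 thresholds
growth <- matrix(1:4, nrow = 2)       # stand-in for Growth
td <- matrix(4:1, nrow = 2)           # stand-in for TD

# Assigning several values to a single matrix cell fails
m <- matrix(0, nrow = 2, ncol = 2)
msg <- tryCatch({
  m[1, 1] <- threshold * growth[1, 1]
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# One full matrix per threshold, stored in a list
scores <- lapply(threshold, function(x) x * growth + (1 - x) * td)
print(dim(scores[[1]]))
```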
