Simulation in a while loop - r

There might already be some threads on while loops, but I am struggling with them. It would be great if someone could help an R beginner out.
So I am trying to run 10000 simulations from an out-of-sample regression forecast using the forecast parameters (mean and sd). Thankfully, my data is normal.
This is what I have
N<-10000
i<-1:N
k<-vector(,N)
while(i<N+1){k(,i)=vector(,rnorm(N,mean=.004546,sd=.00464163))}
...and I get this error
Error in vector(, rnorm(5000, mean = 0.004546, sd = 0.00464163)) :
invalid 'length' argument
In addition: Warning message:
In while (i < N + 1) { : the condition has length > 1 and only the first element will be used
I can't seem to get my head around it.

There is no reason to create a loop here. If you want 10000 samples from a normal distribution with mean = 0.004546 and sd = 0.00464163 in the vector k, just do:
k <- rnorm(10000,mean = 0.004546, sd = 0.00464163)
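A quick sketch of the vectorised approach. The set.seed() call is an addition (not in the question) used only so the draws are reproducible, and the summary lines just confirm that the sample looks as expected.

```r
set.seed(42)  # added only for reproducibility
k <- rnorm(10000, mean = 0.004546, sd = 0.00464163)

length(k)  # 10000 draws in one numeric vector, no loop needed
mean(k)    # close to 0.004546
sd(k)      # close to 0.00464163
```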

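Try this; the counter i starts as a single number and is incremented on every pass:
N <- 10
i <- 1
k <- matrix(0, 1, N)  # preallocate a 1 x N matrix to hold the draws
while (i < N + 1) {
  k[i] <- rnorm(1, mean = .004546, sd = .00464163)  # one draw per iteration
  i <- i + 1  # without this the loop would never end
}
print(k)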

To solve your problem, use Esben Friis' answer. You are taking a hard approach to an easy problem.
To address the questions you had about the error messages, however:
Error in vector(, rnorm(5000, mean = 0.004546, sd = 0.00464163)) :
invalid 'length' argument
This is the wrong way to go: vector(mode, length) creates an empty vector of a given length, and its length argument must be a single number. Passing 5000 random values as the length is what triggers "invalid 'length' argument". You may be thinking of the as.vector() function:
as.vector(rnorm(5000, mean = 0.004546, sd = 0.00464163))
However, this is not needed: rnorm() already returns a vector of type double, so as.vector() just returns the same values and changes nothing.
It is best to simply use:
rnorm(5000, mean=0.004546, sd=0.00464163)
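The difference between vector() and as.vector() can be checked directly. In this small sketch, tryCatch() is used only so that the error from the question's call is captured instead of stopping the script.

```r
# vector(mode, length) builds an empty vector; mode defaults to "logical"
v <- vector("numeric", 3)
print(v)  # 0 0 0

# Passing several numbers as the length reproduces the question's error
err <- tryCatch(vector(, rnorm(5)), error = function(e) e)
print(conditionMessage(err))

# rnorm() already returns a double vector, so as.vector() changes nothing
x <- rnorm(5, mean = 0.004546, sd = 0.00464163)
is.double(x)                # TRUE
identical(as.vector(x), x)  # TRUE
```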
Further:
In addition: Warning message:
In while(i<N+1){: the condition has length>1 and only the first element will be used
This warning stems from i being the vector 1:N, whose length is greater than 1. while() only tests the first element of its condition, which is the same as testing i[1] (since R 4.2.0 this is an error rather than a warning):
while(i<N+1){ }
#is the same as
while(i[1]<N+1){ }
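Instead, you want a single counter (newVal below) that starts at 1 and is increased by one inside the loop, so that the condition eventually becomes FALSE; in your loop, i never changes. Furthermore, you can use the <= (less than or equal to) operator instead of writing <N+1.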
while(newVal<=N){ }
This approach brings up new problems, which could be solved by using a for() loop instead, but that is out of the scope of the question and, as stated at the beginning, not the right approach to your problem. Hope you learned something and good luck!
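For completeness, here is a minimal sketch of the counter-based while() loop next to the for() alternative, using a scalar counter newVal and preallocating with numeric(N). N is kept small only so the example runs quickly; the vectorised rnorm() call is still the better choice.

```r
N <- 10

# while(): a scalar counter, incremented until it exceeds N
set.seed(1)
k_while <- numeric(N)  # preallocate
newVal <- 1
while (newVal <= N) {
  k_while[newVal] <- rnorm(1, mean = 0.004546, sd = 0.00464163)
  newVal <- newVal + 1
}

# for(): R manages the counter for you
set.seed(1)
k_for <- numeric(N)
for (j in seq_len(N)) {
  k_for[j] <- rnorm(1, mean = 0.004546, sd = 0.00464163)
}

identical(k_while, k_for)  # TRUE: same seed, same sequence of draws
```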

Related

Compute the mean in R subject to conditions

I tried following the advice in this post (Conditional mean statement), but it did not seem to work for me. I get the error: Error in x[j] : only 0's may be mixed with negative subscripts.
I have a data frame called data with a few different columns and many rows. There is one indicator column, z, which takes the value 0 or 1. I want to compute the mean of the column base separately for z = 0 and for z = 1. So I have used the following line of code:
mean(data[data$z==1, data$base], na.rm = TRUE)
But as mentioned, I get the error Error in x[j] : only 0's may be mixed with negative subscripts. I'm unsure why I am getting this error, or what I could/should do instead. I do not actually understand the error.
Thanks.
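Using the comment by GKi worked and solved my problem. In data[data$z==1, data$base], the second index selects columns, so the values stored in base were being used as column positions; a mix of positive and negative values there produces this error. Selecting the column first and then subsetting its rows avoids the problem. In the end I used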
mean(data$base[data$z==1], na.rm = TRUE)
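Both group means can also be computed in one call with tapply(). The small data frame below is hypothetical, made up for illustration; it also reproduces the original error, because the negative values in base end up being used as column positions.

```r
# Hypothetical data: indicator z and a numeric column base with an NA
data <- data.frame(z = c(0, 0, 1, 1, 1), base = c(-1, 3, 2, -4, NA))

# Original attempt: data$base is treated as column indices with mixed signs
err <- tryCatch(mean(data[data$z == 1, data$base], na.rm = TRUE),
                error = function(e) e)
conditionMessage(err)

# Mean of base for z == 1 only
mean(data$base[data$z == 1], na.rm = TRUE)  # -1

# Mean of base for each value of z
tapply(data$base, data$z, mean, na.rm = TRUE)  # z = 0: 1, z = 1: -1
```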

R: invalid 'times' argument calculating CRPS

I'm trying to calculate the CRPS using the verification package in R. The data appears to read in OK, but I get an error when trying to compute the CRPS itself: "invalid 'times' argument". However, all values are real, there are no negative values, and I'm testing for NaN/NA values and ignoring those. Having searched around, I can't find anything that explains why I'm getting this error. I'm reading the data from netCDF files into larger arrays and then computing the CRPS for each grid cell in those arrays.
Any help would be greatly appreciated!
The relevant snippet from the code I'm using is:
## for each grid cell, get obs (wbarray) and 25 ensemble members of forecast eps (fcstarray)
for (x in 1:3600) {
  for (y in 1:1500) {
    obs = wbarray[x, y]
    eps = fcstarray[x, y, 1:25]
    if (!is.na(obs)) {
      print(obs)
      print(eps)
      print("calculating CRPS - real value found")
      crpsfcst = (crpsDecomposition(obs, eps)$CRPS)
      CRPSfcst[x, y, w] = crpsfcst
    }
  }
}
(w is specified in an earlier loop)
And the output I get:
obs: 0.3850737
eps: 0.3382506 0.3466184 0.3508921 0.3428135 0.3416993 0.3423528 0.3307764
0.3372431 0.3394377 0.3398165 0.3414395 0.3531360 0.3319155 0.3453161
0.3362813 0.3449474 0.3340050 0.3278898 0.3380596 0.3379150 0.3429202
0.3467927 0.3419354 0.3472489 0.3550797
"calculating CRPS - real value found"
Error in rep(0, nObs * (nMember +1)) : invalid 'times' argument
Calls: crpsDecomposition
Execution halted
If you type crpsDecomposition at the R prompt, you'll get the source code for the function. The first few lines are:
function (obs, eps)
{
nMember = dim(eps)[2]
nObs <- length(obs)
Since your eps object appears (from your output) to be a one-dimensional vector, it has no dim attribute: dim(eps) is NULL, so dim(eps)[2] is also NULL, which sets nMember to NULL. nObs*(nMember + 1) then evaluates to numeric(0), a zero-length vector, and rep() rejects that as its times argument. I imagine you simply need to re-examine what form eps should take, because it appears that it needs to be a matrix in which each column corresponds to a different ensemble member (and, presumably, each row to an observation).
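The diagnosis can be checked without the verification package. In this sketch the 25 ensemble values are made up (a seq() stand-in for one grid cell), and reshaping into a 1 x 25 matrix is an assumption based on the reading of crpsDecomposition's source: one row per observation, one column per member.

```r
obs <- 0.3850737
eps <- seq(0.33, 0.35, length.out = 25)  # hypothetical stand-in for fcstarray[x, y, 1:25]

# A plain vector has no dim attribute, so the member count comes out as NULL
nMember <- dim(eps)[2]
is.null(nMember)  # TRUE

# NULL + 1 is numeric(0), so rep() receives a zero-length 'times'
times <- length(obs) * (nMember + 1)
length(times)  # 0
err <- tryCatch(rep(0, times), error = function(e) e)
conditionMessage(err)  # "invalid 'times' argument"

# Reshaped: 1 row (one observation) x 25 columns (members)
eps_mat <- matrix(eps, nrow = 1)
dim(eps_mat)[2]  # 25
# then, presumably: crpsDecomposition(obs, eps_mat)
```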

read file from memory for regression (R)

When trying to use the shglm function of the speedglm package, I have a problem. As the file is too large to read into memory, I wanted to use a data-reading function (passed as datafun), as outlined in the help pages for the package. The function is:
make.data <- function(filename, chunksize, ...) {
  conn <- NULL
  function(reset = FALSE) {
    if (reset) {
      if (!is.null(conn)) close(conn)
      conn <<- file(filename, open = "r")
    } else {
      rval <- read.table(conn, nrows = chunksize, ...)
      if (nrow(rval) == 0) {
        close(conn)
        conn <<- NULL
        rval <- NULL
      }
      return(rval)
    }
  }
}
load("ti.RData")
I then take my data frame (called ti) and write it to a table
write.table(ti,"data1.txt",row.names=FALSE,col.names=FALSE)
as in the example here http://www.inside-r.org/packages/cran/speedglm/docs/shglm. Afterwards I run:
da<-make.data("data1.txt",chunksize=10000,col.names=colnames(ti))
rm(ti)
b1<-shglm(T2D~factor(SIBCO)+factor(POCOD),datafun=da,family=binomial())
But I get an error
Error in dev.resids(y, mu, weights) :
argument mu must be a numeric vector of length 1 or length 802
I am happy to upload my data set, but can somebody roughly tell me where to start debugging? I think that when data1.txt is read in through the data function (with read.table), some factors in the original data frame are converted to integers. This is why I put factor() around the variables. Any suggestion would be very helpful.
The short answer is that there is probably something wrong with your input data. Without the data it is hard to say, but based on my experience running shglm as a binomial GLM with factors, this is where I would start.
As a general debugging strategy, you can try something like the following:
add the lines debug(shglm) and options(error=recover) to your script;
turn on the trace=TRUE option for shglm;
start R and load your script with source("myscript.R");
step through the debugger, use ls() to see the variables currently present, and inspect them with dim(), colnames(), etc.
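Before involving shglm at all, it can help to call the data function yourself and inspect each chunk it returns. The sketch below re-declares make.data from the question and runs it on a small hypothetical data frame written to a temporary file. It also illustrates the suspicion in the question: a factor with numeric labels is written out as text and read back by read.table() as an integer column, which is why such columns need factor() again.

```r
make.data <- function(filename, chunksize, ...) {
  conn <- NULL
  function(reset = FALSE) {
    if (reset) {
      if (!is.null(conn)) close(conn)
      conn <<- file(filename, open = "r")
    } else {
      rval <- read.table(conn, nrows = chunksize, ...)
      if (nrow(rval) == 0) {  # end of file: close and signal "no more data"
        close(conn)
        conn <<- NULL
        rval <- NULL
      }
      return(rval)
    }
  }
}

# Hypothetical toy data: a 0/1 response and a numerically labelled factor
ti <- data.frame(T2D = rep(c(0, 1), 5),
                 SIBCO = factor(rep(c(101, 205), each = 5)))
tmp <- tempfile(fileext = ".txt")
write.table(ti, tmp, row.names = FALSE, col.names = FALSE)

da <- make.data(tmp, chunksize = 3, col.names = colnames(ti))
da(reset = TRUE)  # open the connection
chunks <- list()
while (!is.null(chunk <- da())) {  # read until an empty chunk closes the file
  chunks[[length(chunks) + 1]] <- chunk
}

sapply(chunks, nrow)        # chunk sizes 3, 3, 3, 1
class(chunks[[1]]$SIBCO)    # "integer": the factor did not survive the round trip
unlink(tmp)
```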
Now, in my experience shglm returns rather cryptic error messages that may change depending on the size of your input chunks (as this changes the data and the factor levels the model sees). Below I list a couple of things to check in your data and some common errors that I encountered while getting it to work, which may help you get your own model running.
Regarding the data, make sure that:
The dependent variable is 0/1 or a proportion 0 <= y <= 1 (if you have successes and failures, you can use the weights parameter to give the total number of trials and calculate the proportion in the formula, i.e., success/(success + failures)). Otherwise, a common error is:
Error in if (any(y < 0 | y > 1)) stop("y values must be 0 <= y <= 1") :
missing value where TRUE/FALSE needed
Calls: shglm -> eval -> eval
Specify all the levels of the factors (don't forget default values) and make sure that they are sorted, e.g., factor(age, levels = c("24andbelow", "25to49", "50to74", "75andover")); otherwise you will get errors like:
Error in crossprod(weights, y) : non-conformable arguments
Calls: shglm -> crossprod -> crossprod
Error in XTX[rownames(Ax), colnames(Ax)] : subscript out of bounds
Calls: shglm
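To see why fixing the levels matters when data arrive in chunks, here is a small base-R sketch (no speedglm involved). A chunk that happens to contain only two of the four age categories yields a narrower design matrix unless the full level set is supplied, so per-chunk pieces no longer line up. The chunk contents are hypothetical.

```r
age_levels <- c("24andbelow", "25to49", "50to74", "75andover")
chunk_age <- c("25to49", "50to74", "25to49")  # a chunk missing two categories

f_default <- factor(chunk_age)                       # levels inferred from this chunk only
f_fixed   <- factor(chunk_age, levels = age_levels)  # same levels in every chunk

nlevels(f_default)  # 2
nlevels(f_fixed)    # 4

# Intercept plus one dummy column per non-reference level
ncol(model.matrix(~ f_default))  # 2
ncol(model.matrix(~ f_fixed))    # 4
```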
I did not get your specific error, but I got one close enough that I thought I should mention it. It came up when I tried to supply a two-column response (successes and failures, as you can in a regular glm), i.e., cbind(success, failures) ~ factor(var1) + factor(var2):
Error in dev.resids(y, mu, weights) :
argument wt must be a numeric vector of length 1 or length 10
Calls: shglm -> dev.resids
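If the two-column response causes trouble, the alternative from the checklist (a proportion as the response plus the number of trials as weights) fits the same binomial model. The sketch below checks that equivalence with base glm() on made-up data; how shglm itself accepts weights is not shown here and may differ, so check ?shglm.

```r
# Hypothetical grouped data: successes and failures per value of x
d <- data.frame(x = c(0, 1, 2, 3),
                success = c(2, 4, 6, 8),
                failure = c(8, 6, 4, 2))

# Two-column response: successes and failures
fit_cbind <- glm(cbind(success, failure) ~ x, family = binomial(), data = d)

# Proportion response with the number of trials as prior weights
fit_prop <- glm(success / (success + failure) ~ x, family = binomial(),
                weights = success + failure, data = d)

all.equal(coef(fit_cbind), coef(fit_prop))  # TRUE
```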
I guess the main takeaway is to check your input data.

String Loop to Form a Variable name in R

calld=data.frame(matrix(rnorm(1000*50,0,1),1000,50))
for (x in names(calld)) {
assign(paste("calld$",x,sep=""),pnorm(get(paste("calld$",x,sep="")),0,1,lower.tail=T,log.p=F))
}
Error in get(paste("calld$", x, sep = "")) : object 'calld$X1' not found
Am I using the get function correctly? I am trying to build each column name of the data set in a loop with paste and replace that column's existing values by passing them through pnorm (the cumulative normal distribution function). But I keep getting an error. The function works when I call the variable names in the calld data frame directly, so the problem is the concatenation step inside the loop. Where am I going wrong? I appreciate your help.
Update:
I took your advice and re-edited the loop to:
for (n in names(calld)) {
get("calld")[[n]]=pnorm(get("calld")[[n]],0,1,lower.tail=T,log.p=F)
}
Error in get("calld")[[n]] = pnorm(get("calld")[[n]], 0, 1, lower.tail = T, :
target of assignment expands to non-language object
But now I am getting this new error. Everything on the right-hand side of the assignment in the loop works when I test it on its own. The error arises when I assign the result back to the column, replacing the prior values.
Have mercy on kittens!
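You can't use assign or get this way. paste("calld$",x,sep="") produces a string such as "calld$X1", and get() and assign() treat that whole string as the name of a single variable, not as an expression that selects a column; hence "object 'calld$X1' not found". Your updated loop fails for a related reason: the target of an assignment must come down to a variable name such as calld, not a call such as get("calld"). Instead, do: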
calld[] <- lapply(calld, pnorm, mean = 0, sd = 1)
Explanation: calld[] <- replaces all existing columns of calld (whilst retaining its structure as a data.frame) with the results of lapply(calld, pnorm, mean = 0, sd = 1), which cycles through all columns of calld, applying pnorm to each one.
library(fortunes)
fortune(312)
The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad.
-- Greg Snow (in response to a user that wanted to access a column whose name is stored in y via x$y rather than x[[y]])
R-help (February 2012)
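A quick check that the one-liner keeps calld a data frame and matches an explicit loop that uses [[ ]] with the column name held in a variable (the x[[y]] form from the quote above). The seed and the 1000 x 50 size are assumptions for illustration.

```r
set.seed(123)
calld <- data.frame(matrix(rnorm(1000 * 50, 0, 1), 1000, 50))
original <- calld

# Replace every column at once, keeping the data.frame structure
calld[] <- lapply(calld, pnorm, mean = 0, sd = 1)

# Equivalent loop: [[n]] looks up the column whose name is stored in n
calld2 <- original
for (n in names(calld2)) {
  calld2[[n]] <- pnorm(calld2[[n]], mean = 0, sd = 1)
}

is.data.frame(calld)      # TRUE
all.equal(calld, calld2)  # TRUE
```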

Why the parameter I am trying to estimate is "not found"?

I am trying to optimise my likelihood function of R_j and R_m using optim to estimate al_j, au_j, b_j and sigma_j. This is what I did.
a = read.table("D:/ff.txt",header=T)
attach(a)
a
R_j R_m
1 2e-03 0.026567295
2 3e-03 0.009798475
3 5e-02 0.008497274
4 -1e-02 0.012464578
5 -9e-04 0.002896023
6 9e-02 0.000879473
7 1e-02 0.003194435
8 6e-04 0.010281122
The parameters al_j, au_j, b_j and sigma_j need to be estimated.
llik=function(R_j,R_m)
if(R_j< 0)
{
sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2]
}else if(R_j>0)
{
sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2]
}else if(R_j==0)
{
sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j)))
}
start.par=c(al_j=0,au_j=0,sigma_j=0.01,b_j=1)
out1=optim(llik,par=start.par,method="Nelder-Mead")
Error in pnorm(au_j, mean = b_j * R_m, sd = sigma_j) :
object 'au_j' not found
It is difficult to tell where to start on this.
As mac said, your code is difficult to read. It also contains errors.
For example, if you try sum[c(1,2)] you will get an error: you should use sum(c(1,2)). In any case, you seem to be taking the sum in the wrong place. You cannot use if and else if on vectors; you need ifelse. You have nothing to stop the standard deviation from going negative. There is more.
The following code runs without errors (log() will warn "NaNs produced" whenever the search tries au_j below al_j). You will still have to decide whether it does what you want.
a <- data.frame( R_j = c(0.002,0.003,0.05,-0.01,-0.0009,0.09,0.01,0.0006),
                 R_m = c(0.026567295,0.009798475,0.008497274,0.012464578,
                         0.002896023,0.000879473,0.003194435,0.010281122) )
llik = function(x)
{
  al_j = x[1]; au_j = x[2]; sigma_j = x[3]; b_j = x[4]
  sum(
    ifelse(a$R_j < 0,
           log(1/(2*pi*(sigma_j^2))) - (1/(2*(sigma_j^2))*(a$R_j+al_j-b_j*a$R_m))^2,
    ifelse(a$R_j > 0,
           log(1/(2*pi*(sigma_j^2))) - (1/(2*(sigma_j^2))*(a$R_j+au_j-b_j*a$R_m))^2,
           # R_j == 0: log of the probability between al_j and au_j, as in the question
           log(pnorm(au_j, mean=b_j*a$R_m, sd=sqrt(sigma_j^2)) -
               pnorm(al_j, mean=b_j*a$R_m, sd=sqrt(sigma_j^2)))))
  )
}
start.par = c(0, 0, 0.01, 1)
# optim() minimizes by default; fnscale = -1 turns this into maximizing the log-likelihood
out1 = optim(par = start.par, fn = llik, method = "Nelder-Mead", control = list(fnscale = -1))
Let's start with the error message:
Error in pnorm(au_j, mean = b_j * R_m, sd = sigma_j) :
object 'au_j' not found
So R is telling you that when it got to the pnorm call, it couldn't find anything called 'au_j' to use in that call. Your next step should be to look at your function, llik, and try to identify how you expect the variable 'au_j' to be defined within that function.
At this point, the answer should be fairly clear (maybe!). Nowhere in llik is the variable 'au_j' assigned a value. So it won't be 'created' inside the function. R's scoping rules will then cause it to look outside the function in the global environment for something called 'au_j'.
And you might say that this is where things should work, since you assigned 'au_j' a value within start.par. But start.par is a named numeric vector, not a set of variables, and R will not look inside it for an object called 'au_j'.
So the solution here is most likely to rework your function llik so that it takes as arguments everything that it will use, so you're going to add everything in start.par to the arguments of llik. Something like:
llik <- function(par=c(al_j,au_j,sigma_j,b_j),R_j,R_m){...}
and then within llik you'll refer to al_j using par[1] and so forth. Then the optim call should look something like:
optim(start.par,llik,R_j=a$R_j,R_m=a$R_m)
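Because R_j and R_m are now formal arguments of llik, you do need to pass them explicitly in the optim call, as above: attaching a does not help, since inside llik those names refer to the arguments, and an argument that was not supplied raises an error when it is used. Passing the data explicitly is good practice anyway.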
I think I've reconstructed what you're trying to accomplish here (modulo the math, which I haven't even glanced at), but I confess that your code is a bit hard to parse. I would suggest spending some time with the examples in ?optim to make sure you understand how that function is called.
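Here is a minimal sketch of the pattern described above: all parameters in one vector, data passed through optim()'s ... argument. To keep it short and runnable, it swaps in a simplified, hypothetical model (a plain normal regression of R_j on R_m, not the piecewise likelihood from the question) and returns the negative log-likelihood, because optim() minimizes by default. Estimating log(sigma) rather than sigma is a design choice that keeps the standard deviation positive.

```r
a <- data.frame(R_j = c(0.002, 0.003, 0.05, -0.01, -0.0009, 0.09, 0.01, 0.0006),
                R_m = c(0.026567295, 0.009798475, 0.008497274, 0.012464578,
                        0.002896023, 0.000879473, 0.003194435, 0.010281122))

# Hypothetical simplified model: R_j = alpha + b * R_m + N(0, sigma^2) noise
negll <- function(par, R_j, R_m) {
  alpha <- par[1]
  b     <- par[2]
  sigma <- exp(par[3])  # par[3] is log(sigma), so sigma stays positive
  -sum(dnorm(R_j, mean = alpha + b * R_m, sd = sigma, log = TRUE))
}

start.par <- c(alpha = 0, b = 1, log_sigma = log(0.01))
out <- optim(start.par, negll, R_j = a$R_j, R_m = a$R_m, method = "Nelder-Mead")

out$par          # estimates (log_sigma on the log scale)
out$convergence  # 0 means Nelder-Mead reported convergence
```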
