I am using R to compute an instrumental variable regression. Specifically, I am calling felm from package lfe. The response variable is cost, and id and dates are factors that will be used for fixed effects. I am almost sure that my data is not rank deficient in any way, but no matter how I slice it, I keep getting this error:
Error in if (rank == N) return(chol(mat)) : argument is of length zero
When I tried debugging line by line, I was stepping deeper and deeper into various package function calls, and I couldn't make sense of it. Here's summary information about the data frame, limited to just the rows I'm feeding into the call. You can see that the only NA's are in the "cost" field.
Here is the call that generates the "argument of length zero" error.
trial_model = felm(formula = cost ~ covariate.P_t | id + dates | (covariate.TiPt ~ covariate.AiPt) | id, data = rawDataSimple, subset = rows_to_use)
I get the same error when I include this argument:
na.action=na.omit
I get the same error when I also include this argument:
exactDOF="rM"
So I'm stuck. Any thoughts on how to diagnose this problem, either in the felm call or in my data frame?
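One way to diagnose this is to inspect the exact subset felm() sees before fitting. A base-R sketch of those checks (rawDataSimple and rows_to_use are the objects from the question; here they are stood in by a toy data frame so the snippet runs on its own):

```r
# Toy stand-ins for the question's objects, for illustration only
rawDataSimple <- data.frame(cost  = c(1.2, NA, 3.4, 2.2),
                            id    = factor(c("a", "a", "b", "b")),
                            dates = factor(c("d1", "d2", "d1", "d2")))
rows_to_use <- 1:4

# Drop unused factor levels, then check the data felm() will actually use
sub <- droplevels(rawDataSimple[rows_to_use, ])
colSums(is.na(sub))                      # confirm NAs appear only in "cost"
sapply(Filter(is.factor, sub), nlevels)  # factor levels actually present
```

If a factor collapses to a single level in the subset, or a column is entirely NA, that can surface deep inside the package as an opaque error like the one above.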
It turned out to be an issue with the specific versions of R and RStudio that I had installed. We upgraded to the latest R and RStudio, and felm worked just fine after that.
I've gotten a similar error with felm calls in the past at various times for different reasons. For posterity, here is my list of solutions that worked at some point, in increasing order of effort required:
Restarting R/RStudio/Computer
Updating installed packages
Updating RStudio
Updating R
Removing a bad RAM module
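The middle items on that list can be checked and applied from within R itself (these are standard base-R utilities; "lfe" is the package from the question):

```r
R.version.string              # which R version is actually running
packageVersion("lfe")         # installed version of the suspect package
update.packages(ask = FALSE)  # refresh all installed packages
```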
[I updated my answer because I ran into this again.]
Without the ability to post the actual or sandbox data (apologies), I am running the following model using a sample of about 1,000,000 observations, about 150 factor levels (X4 = calendar time dummies), and about 600 clusters (X5 = cross sectional units):
lm <- lm_robust(Y ~ X1 + X2 + X3 + as.factor(X4), cluster = X5, df)
which generates the following error
Error in FUN(newX[, i], ..., rstudio.notebook.executing = FALSE) :
operator needs one or two arguments
What I don't understand is why summary(lm) nevertheless generates output that seems reasonable. My internet searches have not turned up anything posted that addresses or references this issue.
Does anyone have an idea what is going on, or where I could look for what generates this error? Or is there another model I should use?
Thanks for any input!
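For reference, a minimal version of the call that runs cleanly on simulated data under a current estimatr looks like this (variable names mirror the question, the data are invented, and I use the documented clusters argument):

```r
library(estimatr)

# Simulated stand-in for the question's data frame
set.seed(1)
df <- data.frame(Y  = rnorm(300), X1 = rnorm(300), X2 = rnorm(300),
                 X3 = rnorm(300),
                 X4 = sample(1:5, 300, replace = TRUE),   # "time dummies"
                 X5 = sample(1:30, 300, replace = TRUE))  # cluster ids

fit <- lm_robust(Y ~ X1 + X2 + X3 + as.factor(X4), data = df, clusters = X5)
summary(fit)
```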
Using RStudio 2021.09.3 again, but now estimating the above regression under R 4.1.2 with the updated estimatr package (version 0.30.6), there were no errors. In fact, none of the many similar models I subsequently estimated had an issue.
At this point, I assume that there must have been some incompatibilities between some package versions and/or R 4.1.0 and updating R and the packages resolved the problem.
Asking for a friendly soul who knows how to fix this error or who understands why it is appearing.
Not exactly sure what happened, but this error appears every time I try to fit a GAM with random effects (bs="re", with the mgcv package). This is strange since it appears not only with new models but even with models that previously worked (multiple times).
I made sure the data has no NAs, scientific notation, or odd formulas. I am also not using the Date format, to avoid errors, since the model has previously worked as is.
I also tried to transform the data into a data frame via as.data.frame(x) but the same error occurred.
I have been playing a bit with the formula, and it appears that the error occurs whenever the random effects bs="re" are present, either both of them (Site, State) or only one (Site). If I take them completely out of the formula, it works perfectly.
I am thinking it could be:
Some incompatibility with another package I may have installed; I tried to rule this out, with no effect. Removing all the most recently installed packages did not help either, and the error persisted.
Or perhaps an update to the mgcv package?
Update: It works in R itself, just not in RStudio.
Does anyone have an idea on how to fix this or why it is appearing?
The following model was previously working but is not anymore, giving me the mentioned error every single time:
gam_2a <- gam(Total_Items ~ s(DayI0, k = 14) + s(Site, State, bs = "re"), offset(log(EffortDayC)), data = x, family = poisson(link = "log"), method = "REML")
Description of the variables:
Total_Items = Number of items of debris found per event;
DayI0 = Number of days since first clean up (numeric);
Site = Site of sampling (Sites are within States);
State = State of sampling;
EffortDayC = Effort(Length of the beach, number of volunteers, duration of sampling)*DayC(interval of sampling);
The str(x) output below:
[screenshot of the str(x) output]
And the head of the data to understand a bit better:
[screenshots of head(x)]
Sorted! The package agricolae was causing an incompatibility with the mgcv package.
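For anyone hitting a similar package-on-package conflict, two quick base-R checks (assuming the suspect package, here agricolae, is currently attached):

```r
# What is masking what across the attached packages
conflicts(detail = TRUE)

# Unload the suspect package without restarting R
detach("package:agricolae", unload = TRUE)
```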
Good afternoon, all--thank you in advance for your help! I'm somewhat new to R, so my apologies if this is a trivial or otherwise inappropriate question.
TL;DR: I'm trying to determine variable importance (VIM) for factor variables with a random forest model built in randomForestSRC, which is not a built-in feature of that package. Using both the LIME and DALEX packages, I encounter the same error: cannot coerce class 'c("rfsrc", "predict", "class")' to a data.frame. Any assistance resolving this error, or alternate approaches, would be greatly appreciated!
I have a random forest model I've built in R using the randomForestSRC package. The model seems to work great: training and testing went fine, I got the predicted output I needed, and the results are in line with what I would expect. Unfortunately, one of the requirements is that I need to be able to indicate how the model arrived at its conclusions (e.g., I need to also include variable importance as part of the output), for both continuous and factor variables.
This doesn't seem to be a built-in feature of the randomForestSRC package, so I've looked into both the LIME and DALEX packages, both of which should be able to break out VIM from an existing RF model. Unfortunately, neither has native support for randomForestSRC, which means I've needed to supply the prediction functions myself, as recommended by this vignette: https://uc-r.github.io/dalex
model_type.rfsrc <- function(x, ...) {
  return('classification')
}
predict_model.rfsrc <- function(x, newdata, type, ...) {
  as.data.frame(predict(x, newdata, ...))
}
Unfortunately, when running the VIM section of the model (in both LIME and DALEX), I'm asked to pass both the predicted output and the model that created that output. In doing so, it hits an error with the above predict_model function:
error in as.data.frame.default(predict(model, (newdata))):
cannot coerce class 'c("rfsrc", "predict", "class")' to a data.frame
And, like...of course, it can't; it's trying to turn the model itself into a data frame. Unfortunately, while I think I understand why R is giving me that error, that's about as far as I've been able to figure out on my own.
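If it helps, the usual workaround is to coerce the prediction object's probability matrix rather than the whole object. A hedged sketch (the $predicted component is the randomForestSRC convention for class probabilities; run str(predict(...)) on your own model to confirm before relying on it):

```r
predict_model.rfsrc <- function(x, newdata, type, ...) {
  # predict() on an rfsrc forest returns a prediction object, not a matrix
  p <- predict(x, newdata = newdata, ...)
  # p$predicted holds the class-probability matrix for classification forests
  as.data.frame(p$predicted)
}
```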
Additionally, I'm using the randomForestSRC package for two reasons: it doesn't put a limit on the number of factor levels, and it can handle imbalanced data. I'm working with medical data, so both of these are necessary (e.g., there are ~100,000 different medical codes that can be encoded in a single data variable, and the ratio of "people-who-don't-have-this-condition" to "people-who-do-have-this-condition" is frequently 100 to 1). If anyone has suggestions for alternative packages that handle these issues and have built-in VIM functionality (or integrate with DALEX/LIME), that would be fantastic as well.
Thank you all very much for your help!
I am trying to compute a first-order VAR model using the plm package, using first differences for the variables and the instruments. First, I am trying to solve the equation with respect to one variable, as I think the library only works in this fashion. The code I used is the following:
model <- pgmm(variable1 ~ lag(variable2, 1) | lag(variable1, 1), data = d, effect = "twoways", model = "onestep", transformation = "ld")
R returns an error:
Error in solve.default(crossprod(WX, t(crossprod(WX, A2)))) :
system is computationally singular: reciprocal condition number = 1.62316e-2
What is wrong with the equation? I have tried everything that comes to mind. The part after the | sign interests me in particular: I thought the instruments were supposed to go there, but after reading the manual I am not sure exactly what should be put there.
Here is more info regarding the package (pages 23 onward): https://cran.r-project.org/web/packages/plm/vignettes/plm.pdf
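For comparison, the Arellano and Bond employment example from the plm documentation shows what the part after | is for: it declares lags of the dependent variable as GMM instruments (this is the package's own illustrative example, not the questioner's model):

```r
library(plm)
data("EmplUK", package = "plm")

# Arellano-Bond (1991) employment equation from the plm vignette;
# after | the expression lag(log(emp), 2:99) means "use all available
# lags of log(emp) from lag 2 onward as GMM instruments"
ab <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) +
             log(capital) + lag(log(output), 0:1) | lag(log(emp), 2:99),
           data = EmplUK, effect = "twoways", model = "twosteps")
summary(ab)
```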
EDIT: Upon restricting the model to effect = "individual", I got only a warning. I suppose everything is working fine, as the result looks all right.
If you shorten the sample period, it is very likely that the error will not appear. For example, with a period from 1990-2020 it is very likely to produce the error, but if you change the period to 1990-2000 the model will be computed.
My model failed with the following error:
Compiling rjags model...
Error: The following error occured when compiling and adapting the model using rjags:
Error in rjags::jags.model(model, data = dataenv, inits = inits, n.chains = length(runjags.object$end.state), :
Error in node Y[34,10]
Observed node inconsistent with unobserved parents at initialization.
Try setting appropriate initial values.
I have done some diagnosis and found that there was a problem with initial values in chain 3. However, this can happen from time to time. Is there any way to tell run.jags, or JAGS itself, to retry and re-run the model in such cases? For example, to have it make another N attempts to initialize the model properly. That would be a very logical thing to do instead of just failing. Or do I have to do it manually with some tryCatch construct?
P.S.: note that I am currently using run.jags to run JAGS from R.
There is no facility for that provided within runjags, but it would be fairly simple to write yourself like so:
success <- FALSE
while (!success) {
  s <- try(results <- run.jags(...))
  success <- !inherits(s, "try-error")
}
results
[Note that if this model NEVER works, the loop will never stop!]
A better idea might be to specify an initial values function/list that provides initial values that are guaranteed to work (if possible).
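A sketch of that idea (the parameter names mu and tau are hypothetical stand-ins for your model's parameters; runjags accepts an inits function that takes the chain number):

```r
library(runjags)

# Hypothetical inits function: draws fresh, modestly dispersed starting
# values for each chain, so each retry starts from different values
inits <- function(chain) {
  list(mu  = rnorm(1, 0, 1),
       tau = runif(1, 0.1, 10),
       .RNG.name = "base::Mersenne-Twister",
       .RNG.seed = chain)
}

# model and dataenv are the objects from the question
results <- run.jags(model, monitor = c("mu", "tau"),
                    data = dataenv, inits = inits, n.chains = 3)
```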
In runjags version 2, it will be possible to recover successful simulations when some simulations have crashed, so if you ran (say) 5 chains in parallel then if 1 or 2 crashed you would still have 3 or 4. That should be released within the next couple of weeks, and contains a large number of other improvements.
Usually when this error occurs it is an indication of a serious underlying problem. I don't think a strategy of "try again" is useful in general (and especially because default initial values are deterministic).
The default initial values generated by JAGS are given by a "typical" value from the prior distribution (e.g. mean, median, or mode). If it turns out that this is inconsistent with the data then there are typically two possible causes:
A posteriori constraints that need to be taken into account, such as when modelling censored survival data with the dinterval distribution
Prior-data conflict, e.g. the prior mean is so far away from the value supported by the data that it has zero likelihood.
These problems remain the same when you are supplying your own initial values.
If you think you can generate good initial values most of the time, with occasional failures, then it might be worth repeated attempts inside a call to try() but I think this is an unusual case.