margin() function in the R randomForest package not working - r

I am going through the package documentation for randomForest to see what different functions do. I got to the margin() function and the example in the documentation did not work for me.
The code I was trying was
library(randomForest)
set.seed(1)
data(iris)
iris.rf <- randomForest(Species ~ ., iris, keep.forest=FALSE)
plot(margin(iris.rf))
I get the error
Error in unit(c(t, r, b, l), unit) : (list) object cannot be
coerced to type 'double'
I have no idea what is going wrong.
I am using Windows 10, R version 3.3.2 (2016-10-31), and randomForest v. 4.6-12.

I restarted my session and everything worked again.
I must have run something earlier, while working through the documentation, that interfered with how the random forest object was interpreted.

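I just ran into this issue myself. As stated in a comment, the solution is to specify the package whose margin function should be used, in this case randomForest. The unit(c(t, r, b, l), unit) call in the error message suggests that a margin() from another attached package (ggplot2 defines one with those arguments) was masking the one from randomForest, which would also explain why restarting the session helped.
Thus the line of code to change is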
plot(randomForest::margin(iris.rf))
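The masking behaviour behind this fix can be reproduced without randomForest. The following is a minimal sketch in base R, assuming the cause is a second definition of the same name earlier on the search path. It uses sd from stats as a stand-in for margin, and the masking function is made up purely for illustration.

```r
# Define a function that shadows stats::sd, mimicking how another
# package's margin() can shadow randomForest::margin().
sd <- function(x) stop("this is not the sd() you wanted")

# find() lists every place on the search path where the name is defined,
# in lookup order; a bare call uses the first match.
where_defined <- find("sd")
print(where_defined)

# A bare call reaches the masking definition and fails ...
bare_call <- try(sd(c(1, 2, 3)), silent = TRUE)
print(class(bare_call))

# ... while the namespace-qualified call reaches the intended function.
qualified <- stats::sd(c(1, 2, 3))
print(qualified)

# Remove the masking definition again.
rm(sd)
```

In a real session, find("margin") or conflicts(detail = TRUE) should show which attached packages define the name.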

Related

lm_robust() generates "Error in FUN(...): operator needs one or two arguments" but results seem correct

Without the ability to post the actual or sandbox data (apologies), I am running the following model using a sample of about 1,000,000 observations, about 150 factor levels (X4 = calendar time dummies), and about 600 clusters (X5 = cross sectional units):
lm <- lm_robust(Y ~ X1+X2+X3+as.factor(X4), cluster = X5, df)
which generates the following error
Error in FUN(newX[, i], ..., rstudio.notebook.executing = FALSE) :
operator needs one or two arguments
What I don't understand is why summary(lm) still generates output that seems reasonable. My internet search has not turned up anything that addresses or references this issue.
Does anyone have an idea what is going on, or where I could look for what generates this error? Or is there another model I should use?
Thanks for any input!
Re-running the above regression in RStudio 2021.09.3, but now under R 4.1.2 with the updated estimatr package (version 0.30.6), produced no errors. In fact, none of the many similar models I subsequently estimated had an issue.
At this point, I assume that there must have been some incompatibilities between some package versions and/or R 4.1.0 and updating R and the packages resolved the problem.
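When a version mismatch is suspected, it helps to record the R and package versions before and after updating. The following is a minimal sketch; report_versions is a hypothetical helper written for this illustration, not part of estimatr or base R, and the package names passed to it are only examples.

```r
# Hypothetical helper: collect version strings for a set of packages,
# returning NA for any package that is not installed.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# The running R version, then the versions of the packages of interest.
cat(R.version.string, "\n")
versions <- report_versions(c("stats", "estimatr"))
print(versions)
```

sessionInfo() gives a fuller picture, including every attached package and the platform.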

Robust standard error (HC3) using vcovHC(), coeftest for plm object

I want to estimate a fixed effect model and use a robust variance-covariance matrix with the HC3 small-sample adjustment.
For the model itself I use the following lines of code:
require(plm)
require(sandwich)
require(lmtest)
require(car)
QSFE <- plm(log(SPREAD)~PERIOD, data = na.omit(QSREG), index = c("STOCKS", "TIME"), model = "within")
This works fine. To calculate the HC3 robust standard errors, I used the function coeftest with vcovHC in it.
coeftest(x = QSFE, vcov = vcovHC(QSFE, type = "HC3", method = "arellano"))
This does not work. The returned error is:
Error in 1 - diaghat : non-numeric argument to binary operator
The issue is in vcovHC: when type is set to "HC3", it uses the function hatvalues() to calculate "diaghat", which does not support plm objects and returns the error:
Error in UseMethod("hatvalues") :
no applicable method for 'hatvalues' applied to an object of class "c('plm', 'panelmodel')"
Does anyone know how to use the HC3 (or HC2) estimator with plm? I think the problem lies in the hatvalues function used by vcovHC, since HC0 and HC1 work fine because they do not need it.
plm developer here. While the efficiency issue is interesting computationally, from a statistical viewpoint these small-sample corrections are not needed when you have a 300 x 300 panel. You can happily go with HC0 (or, if you definitely want a panel small-sample correction, "sss" (panel DF) would be best anyway, and it is computationally much lighter).
The fact that small-sample corrections become useless when the data size increases is the main reason we did not allocate scarce developer time into making them more efficient.
Also, from a statistical viewpoint please be aware that the properties of "clustering" vcovs like White-Arellano are less than ideal for T ~ N, they are meant for N >> T.
Lastly, one clarification re: your original post: while originally vcovHC is a generic function in the 'sandwich' package, in a panel context the specialized method vcovHC.plm from the 'plm' package is applied.
Better explanation here: https://www.jstatsoft.org/article/view/v082i03
The method supplied by plm for plm objects does not call hatvalues: there is no function hatvalues in package plm, and the word "hatvalues" does not even appear in plm's source code. Be sure to have package plm loaded when you execute coeftest. Also, be sure to have the latest version of plm installed from CRAN (currently, version 2.2-3).
If you have package plm loaded, the code should work. It does with a toy example on my machine. To be sure, you may want to force the use of vcovHC as supplied by plm:
First, try vcovHC(QSFE, type = "HC3", method = "arellano"). If that gives the same error, try plm::vcovHC(QSFE, type = "HC3", method = "arellano").
Next, please try:
coeftest(QSFE, vcov.=function(x) vcovHC(QSFE, method="arellano", type="HC3"))
Edit:
Using the data set supplied, it is clear that dispatching to vcovHC.plm works correctly. Package sandwich is not involved here. The root cause is the memory demand of the function vcovHC.plm with the argument type set to "HC3" (and others). This also explains your comment about the function working for a subset of the data.
Edit2:
Memory demand of vcovHC.plm's small sample adjustments is significantly lower from plm version 2.4-0 onwards (internal function dhat optimized) and the error does not happen anymore.
vcovHC(QSFE, type = "HC3", method = "arellano")
Error in 1 - diaghat : non-numeric argument to binary operator
Called from: omega(uhat, diaghat, df, G)
Browse[1]> diaghat
[1] "Error : cannot allocate vector of size 59.7 Gb\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError: cannot allocate vector of size 59.7 Gb>
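The debugger output above shows why the visible message was misleading: diaghat held a try-error object (a character string) rather than numbers, so 1 - diaghat failed with a type error instead of reporting the underlying memory error. The following is a minimal base R sketch of that mechanism. It also estimates the size of a dense n-by-n matrix of doubles, which is only an assumption about what was being allocated. dense_matrix_gib is a hypothetical helper, not part of plm.

```r
# Simulate a computation that fails inside try(), as in the debugger output.
diaghat <- try(stop("cannot allocate vector of size 59.7 Gb"), silent = TRUE)
print(class(diaghat))

# Arithmetic on the try-error string raises a second, unrelated-looking error;
# tryCatch() captures its message instead of stopping the script.
outer_error <- tryCatch(1 - diaghat, error = function(e) conditionMessage(e))
print(outer_error)

# Hypothetical helper: memory needed for a dense n x n matrix of doubles
# (8 bytes per element), expressed in GiB.
dense_matrix_gib <- function(n) n^2 * 8 / 2^30

# A 300 x 300 panel has at most 90,000 observations.
print(dense_matrix_gib(300 * 300))
```

For 90,000 observations this comes to roughly 60 GiB, the same order of magnitude as the 59.7 Gb in the error message.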

Error in cv.glmnet for poisson with an offset

I am facing an error when trying to run a cross validation on glmnet for family = poisson using an offset.
I managed to replicate the error with the very simple example below:
library(glmnet)
#poisson
N=500; p=20
nzc=5
x=matrix(rnorm(N*p),N,p)
beta=rnorm(nzc)
f = x[,seq(nzc)]%*%beta
mu=exp(f)
y=rpois(N,mu)
exposure=rep(0.5,length(y))
#cross validation
cv=cv.glmnet(x,y,family="poisson",offset=log(exposure),nlambda=50,nfolds=3)
which returns the following error:
Error: No newoffset provided for prediction, yet offset used in fit of
glmnet
I can't figure out what I'm doing wrong here, and I wasn't able to find any help on the internet. Would anyone have ideas?
Thanks a lot!
EDIT: this issue is obsolete. It was linked to version 2.0-12 of the glmnet package and was fixed by updating to version 2.0-13.
This works:
predict(cv,x,newoffset=log(exposure))
From the documentation for glmnet for the offset parameter:
If supplied, then values must also be supplied to the predict
function.
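A sketch of how the offset is carried through to prediction, reusing the simulated data from the question. It assumes a glmnet version in which cv.glmnet itself runs with an offset (the question's edit reports 2.0-13 or later). The seed and the choices s = "lambda.min" and type = "response" are illustrative only. The block is guarded so that it does nothing if glmnet is not installed.

```r
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)

  # Simulated Poisson data, as in the question.
  N <- 500; p <- 20; nzc <- 5
  x <- matrix(rnorm(N * p), N, p)
  beta <- rnorm(nzc)
  mu <- exp(x[, seq(nzc)] %*% beta)
  y <- rpois(N, mu)
  exposure <- rep(0.5, N)

  # Fit with an offset on the log scale.
  cvfit <- glmnet::cv.glmnet(x, y, family = "poisson",
                             offset = log(exposure), nlambda = 50, nfolds = 3)

  # Because an offset was used in the fit, predict() needs newoffset as well.
  pred <- predict(cvfit, newx = x, newoffset = log(exposure),
                  s = "lambda.min", type = "response")
  print(head(pred))
}
```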

Difficulty getting Caret GLM with Repeated CV to execute

I have been doing 10X10-fold cv logistic models for a long time using homebrew code, but recently have figured that it might be nice to let caret handle the messy stuff for me.
Unfortunately, I seem to be missing some of the nuances that caret needs to be able to function.
Specifically, I keep getting this error:
>Error in { : task 1 failed - "argument is not interpretable as logical"
Please see if you can pick up what I am doing wrong...
Thanks in advance!
Data set is located here.
dataset <- read.csv("Sample Data.csv")
library(caret)
my_control <- trainControl(
method="repeatedcv",
number=10,
repeats = 10,
savePredictions="final",
classProbs=TRUE
)
This next block of code was put in there to make caret happy. My original dependent variable was a binary that I had turned into a factor, but caret had issues with the factor levels being "0" and "1". Not sure why.
dataset$Temp <- "Yes"
dataset$Temp[which(dataset$Dep.Var=="0")] <- "No"
dataset$Temp <- as.factor(dataset$Temp)
Now I (try) to get caret to run the 10X10-fold glm model for me...
testmodel <- train(Temp ~ Param.A + Param.G + Param.J + Param.O, data = dataset,
method = "glm",
trControl = my_control,
metric = "Kappa")
testmodel
> Error in { : task 1 failed - "argument is not interpretable as logical"
Though you already found a fix by updating R and caret, I'd like to point out there is (was) a bug in your code which caused the error, and which I can reproduce here with an older version of R and caret:
The savePredictions argument of trainControl is meant to be set to either TRUE or FALSE, not 'final'. It seems you simply confused it with the returnResamp argument, which accepts exactly this value.
BTW: R and caret have restrictions on level names of factors, which is why caret complained when you handed 0 and 1 level names for the dependent variable to it. Using a simple dataset$Dep.Var <- factor(paste0('class', dataset$Dep.Var)) should do the trick in such cases.
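The restriction on level names can be checked in base R before calling caret. The following is a minimal sketch, assuming the requirement is that each level is a syntactically valid R name, which is what make.names() enforces. The dep_var vector is made-up example data.

```r
# Made-up binary outcome standing in for dataset$Dep.Var.
dep_var <- c(0, 1, 1, 0, 1)

# The raw levels "0" and "1" are not valid R names:
# make.names() has to alter them (to "X0" and "X1").
raw_levels <- levels(factor(dep_var))
valid <- raw_levels == make.names(raw_levels)
print(valid)

# Prefixing the values gives levels that make.names() leaves unchanged.
relabelled <- factor(paste0("class", dep_var))
print(levels(relabelled))
print(all(levels(relabelled) == make.names(levels(relabelled))))
```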
I don't have enough reputation to comment, so I am posting this as an answer. I ran your exact code, and it worked fine for me, twice. I did get this warning:
glm.fit: fitted probabilities numerically 0 or 1 occurred
As per the author, this error had something to do with the savePredictions parameter. Have a look at this issue:
https://github.com/topepo/caret/issues/304
Thanks to @Sumedh, I figured that the problem might not be with my code, and I updated all my packages.
Surprise! Now it works. So I wasn't crazy after all.
Sorry all for the fire drill.

Error with R lfe package felm call: rank problems, chol() problems

I am using R to compute an instrumental variable regression. Specifically, I am calling felm from package lfe. The response variable is cost, and id and dates are factors that will be used for fixed effects. I am almost sure that my data is not rank deficient in any way, but no matter how I slice it, I keep getting this error:
Error in if (rank == N) return(chol(mat)) : argument is of length zero
When I tried debugging line by line, I was stepping deeper and deeper into various package function calls, and I couldn't make sense of it. Here's summary information about the data frame, limited to just the rows I'm feeding into the call. You can see that the only NA's are in the "cost" field.
Here is the call that generates the "argument of length zero" error.
trial_model = felm(formula = cost ~ covariate.P_t | id + dates | (covariate.TiPt ~ covariate.AiPt) | id,data=rawDataSimple,subset=rows_to_use)
I get the same error when I include this argument:
na.action=na.omit
I get the same error when I also include this argument:
exactDOF="rM"
So I'm stuck. Any thoughts on how to diagnose this problem, either in the felm call or in my data frame?
It turned out to be an issue with the specific versions of R and R-Studio that I had installed. We upgraded to the latest R and R-Studio, and felm worked just fine after that.
I've gotten a similar error with felm calls in the past at various times for different reasons. For posterity, here is my list of solutions that worked at some point, in increasing order of effort required:
Restarting R/RStudio/Computer
Updating installed packages
Updating RStudio
Updating R
Removing a bad RAM module
[I updated my answer because I ran into this again.]
