Consider the following data:
library(plm)
library(pglm)
data("EmplUK", package="plm")
I will add a new column of randomly placed 0s and 1s. After that I want to fit a random-effects logit model.
df1<-EmplUK
#adding 0's and 1's
df1<-cbind(df1,'binary'=sample(0:1,nrow(df1),replace=TRUE))
#Performing logit regression
pglm(binary~output+wage, data=df1, family=quasibinomial(link='logit'), start = NULL, model = 'random')
And the following error occurs:
Error in maxRoutine(fn = logLik, grad = grad, hess = hess, start = start, :
argument "start" is missing, with no default
I'm not sure exactly what the reason is. I've read about this error, and it seems there are some problems when you try to estimate a 'within' model, but I get this error for every model type. Could you please help me find the cause of this error?
I don't think the quasibinomial family is set up in this function. Inside pglm there is a function, pglm:::starting.values, that looks for specific families:
"binomial"
"ordinal"
"poisson"
"negbin"
"gaussian"
"tobit"
The negative binomial family allows the variance to be modelled, so that may suit your needs; otherwise binomial(link='logit') works fine if there's no evidence of overdispersion.
edit: happy to be corrected on this, I haven't worked with this package before :)
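A quick way to see the mismatch described above, sketched in base R only. It assumes (not verified here) that pglm picks its starting values from the family name; the `supported` vector is simply the list quoted in the answer, and the variable names are illustrative.

```r
# Family names that the answer above reports pglm's starting-value code handling
supported <- c("binomial", "ordinal", "poisson", "negbin", "gaussian", "tobit")

# Base R family objects carry their name in the $family element
fam_quasi <- quasibinomial(link = "logit")$family
fam_binom <- binomial(link = "logit")$family

# The quasibinomial name is not in the list, the binomial name is
quasi_ok <- fam_quasi %in% supported
binom_ok <- fam_binom %in% supported
```

If that assumption holds, no starting values are generated for quasibinomial, which would explain the missing "start" argument; replacing the family in the original call with binomial(link='logit') is the change the answer suggests.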
As stated in the title, running the same glm model with caret returns different accuracies and errors (no error OR glm.fit: fitted probabilities numerically 0 or 1 occurred OR 1: In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == : prediction from a rank-deficient fit may be misleading). If I set the seed and always run it with the seed and then the model, predictably I always get the same accuracy and error (or no error) message.
When running the same model with the glm() function, coefficients are always the same (as with caret), but I never ever get any of the errors in this case. Should I just interpret this as being an issue with resample or may the errors provided by the glm of the caret package have any important meaning, if they depend on seed?
I've searched for this and though I assume it has something to do with resampling, I am not quite able to understand how it works and would like assistance in understanding this. Also, I'm trying to use the caret package for all the modelling, so I would also like some help trying to understand if I should instead start my process by always running glm() instead of through the caret package, as this will always provide me the same error message straight away no matter the seed.
Data is from a client, so I'd prefer not to share it. The formula I'm using is (example) simply train(Y ~ X + Z + A, data = df, method = "glm") for the caret version and glm(Y ~ X + Z + A, data = df, family = binomial()) in the glm() function.
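One possible way to see why the messages could depend on the seed, as a base R sketch. It assumes caret's default resampling (25 bootstrap resamples) is in effect, and it uses made-up toy data rather than the client data; `fit_resample` is a hypothetical helper, not a caret function. Each resample refits glm on a different subset, and a warning that never appears on the full data may appear on some subsets.

```r
# Toy data (hypothetical): a small sample with a strong predictor
set.seed(1)
df <- data.frame(X = rnorm(30), Z = rnorm(30))
df$Y <- rbinom(30, 1, plogis(3 * df$X))

# Refit glm on one bootstrap resample and collect any warnings it raises
fit_resample <- function(data) {
  idx <- sample(nrow(data), replace = TRUE)
  msgs <- character(0)
  withCallingHandlers(
    glm(Y ~ X + Z, data = data[idx, ], family = binomial()),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  msgs
}

# 25 resamples, mirroring the assumed caret default
warnings_by_resample <- lapply(1:25, function(i) fit_resample(df))
n_warned <- sum(lengths(warnings_by_resample) > 0)
```

Changing the seed changes which rows land in each resample, so `n_warned` and the messages themselves can differ between runs even though the final model fitted on all rows is the same.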
I am using SuperLearner R package.
I am trying to generate predicted y values for both train and test set.
After fitting a superlearner model without defining a "newX" to get predictions on the train set first so that I can compute MSE and plot predictions vs. actual Y values, I use "predict" command to predict Y values for the test set by running the following code:
sl.cv<-SuperLearner(Y = label, X = train,
SL.library=c("SL.randomForest", "SL.glmnet", "SL.svm"),
method = "method.NNLS", verbose=TRUE, cvControl=list(V=10))
pred.sl.cv <- predict(sl.cv, newdata=test, onlySL = T)
Then, I get the following error after "predict":
"Error in object$whichScreen : $ operator is invalid for atomic vectors"
I browsed many online sources to learn how to use "predict" after fitting a SuperLearner model, and I am doing just what others do: passing the name of the fitted SuperLearner object (in this case, "sl.cv") followed by the new test set. I didn't even type the $ operator.
Why am I getting this error message? How do I solve this problem?
Another question is: Does adding cvControl=list(V=10) as an option make any change? I think the default setting for SuperLearner model is to conduct 10-fold cross-validation. So, removing "cvControl=list(V=10)" will not change anything, right?
I would appreciate your advice. Thank you!
The problem is you are using matrices for your train and/or test data. You should use a data.frame. So change your code to the following:
sl.cv<-SuperLearner(Y = label, X = as.data.frame(train),
SL.library=c("SL.randomForest", "SL.glmnet", "SL.svm"),
method = "method.NNLS", verbose=TRUE, cvControl=list(V=10))
pred.sl.cv <- predict(sl.cv, newdata=as.data.frame(test), onlySL = T)
Also, make sure your labels are a list.
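The error text itself can be reproduced in base R, which may help explain why the $ operator shows up even though it was never typed. This is only an illustration of the general mechanism on a toy matrix; it does not reproduce SuperLearner's internals.

```r
# A matrix is an atomic vector with dimensions, so $ cannot index it
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
msg <- tryCatch(m$a, error = function(e) conditionMessage(e))

# After conversion to a data.frame the same column lookup works
col_a <- as.data.frame(m)$a
```

Here `msg` holds the "$ operator is invalid for atomic vectors" message, while `col_a` is the first column of the matrix.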
I'm performing some experiments with logistic regression in R with the Auto dataset included in R.
I've split the data into a training part (80%) and a test part (20%), normalizing each part individually.
I can create the model without any problem with the line:
mlr<-glm(mpg ~
displacement + horsepower + weight, data =train)
I can even predict train$mpg with the train set:
trainpred<-predict(mlr,train,type="response")
And with this I calculate the in-sample error:
etab <- table(trainpred, train[,1])
insampleerror<-sum(diag(etab))/sum(etab)
The problem comes when I want to predict with the test set. I use the following line:
testpred<-predict(mlr,test,type="response")
Which gives me this warning:
'newdata' had 79 rows but variables found have 313 rows
It doesn't work, because testpred has the same length as trainpred (it should be shorter). When I try to calculate the test error using testpred with the following line:
etabtest <- table(testpred, test[,1])
I get the following error:
Error in table(testpred, test[, 1]) :
all arguments must have the same length
What am I doing wrong?
I'm answering my own question in case someone has the same problem:
When I pass the arguments to glm I'm saying what I want to predict, namely the Auto$mpg labels using the training data; hence my glm call must be:
attach(Auto)
mlr<-glm(mpg ~
displacement + horsepower + weight, data=Auto, subset=indexes_train)
If I now call predict, table, etc., there is no problem with mismatched sizes. Fixing this mistake made it work for me.
As imo says:
"More importantly, you might check that this creates a logistic regression. I think it is actually OLS. You have to set the link and family arguments."
Set family = 'binomial'.
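A minimal base R sketch of the pattern discussed above. It uses the built-in mtcars data as a stand-in (an assumption, since Auto is not part of base R) with disp, hp and wt in place of displacement, horsepower and weight. The formula refers to bare column names and the data frame is passed through `data`, so predict can look the same columns up in `newdata`.

```r
# 80/20 split of a built-in dataset (stand-in for Auto)
set.seed(42)
idx <- sample(nrow(mtcars), size = round(0.8 * nrow(mtcars)))
train <- mtcars[idx, ]
test <- mtcars[-idx, ]

# No family given, so glm defaults to gaussian: this is ordinary least squares
fit <- glm(mpg ~ disp + hp + wt, data = train)
fit_family <- family(fit)$family

# Predictions for the held-out rows: one value per row of test
testpred <- predict(fit, newdata = test, type = "response")

# Same coefficients as lm, confirming the fit is not a logistic regression
same_as_lm <- isTRUE(all.equal(coef(fit), coef(lm(mpg ~ disp + hp + wt, data = train))))
```

A logistic regression would additionally need a binary response and family = binomial, as the quoted comment points out.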
I have a mixed effect model with binomial outcome fitted with glmer. For plotting purposes I would like to predict population-level values for a small dataset.
Below is an example illustrating my approach:
library(lme4)
data(Orthodont, package="nlme")
silly <- glmer(Sex ~ distance + age + (1|Subject), data=Orthodont, family=binomial)
sillypred <- expand.grid(distance=c(20, 25), age=unique(Orthodont$age))
sillypred$fitted <- predict(silly, sillypred, re.form=NA, type="response")
I get the following warning message:
Warning message:
In model.frame.default(delete.response(Terms), newdata, na.action = na.action, :
variable 'Sex' is not a factor
However, when I check, it looks like it is:
str(Orthodont["Sex"])
The variable fitted is still created and the values make sense, but I'm curious about this message. Is there something I should be concerned about? Otherwise, what is the purpose of this message?
It might seem like a trivial question (after all, it all seems to work), in which case I apologize, but I want to make sure that I don't overlook something important.
This appears to be a harmless bug that will always occur when predicting from a model with a factor response (the problem is that we use the xlev argument to model.frame including the levels of the response variable). I've posted this at https://github.com/lme4/lme4/issues/205 . Thanks for the report!
You could either ignore the warning or coerce the response variable to a binary value (which will give identical results).
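The claim that a binary recoding gives identical results can be checked on a small example. The sketch below uses plain glm on simulated data rather than glmer on Orthodont, so it is only an analogy; the column names are made up. It relies on the documented binomial convention that the first factor level is treated as failure and the others as success.

```r
# Simulated data with a two-level factor response (levels ordered Male, Female)
set.seed(7)
d <- data.frame(x = rnorm(200))
d$sex <- factor(ifelse(runif(200) < plogis(0.5 * d$x), "Female", "Male"),
                levels = c("Male", "Female"))

# Binary version: 1 for the second level, 0 for the first
d$sex01 <- as.integer(d$sex == "Female")

fit_factor <- glm(sex ~ x, data = d, family = binomial)
fit_binary <- glm(sex01 ~ x, data = d, family = binomial)

# Both codings lead to the same estimates
same_coefs <- isTRUE(all.equal(coef(fit_factor), coef(fit_binary)))
```

Applied to the question's data, the analogous step would be to create a 0/1 column from Sex before calling glmer and to use that column as the response.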
I'm using the clmm2 function from the ordinal package in R to fit cumulative link mixed models to my data. It worked fine until I tried to get predicted probabilities. I can't get either standard errors or confidence intervals by specifying se.fit=TRUE and interval=TRUE. It looks like this:
mod1<-clmm2(response~X0+X1+X2+X3+X4+X5+X7+X0*X2*X3+X2*X3*X4+X0:X4, random=X6,
data=df,link ="logistic", threshold ="flexible",
Hess=TRUE, nAGQ=7)
As you can see, there are a bunch of interactions in there (all important). I tried to create a dummy dataset to make my problem reproducible, but clmm2 can't achieve convergence with a simpler dataset. I took the wine dataset included in the ordinal package and changed the formula to mimic my own (I don't think it makes any sense, though):
library(ordinal)
data(wine)
fm1 <- clmm2(rating ~ temp + contact+bottle+temp:contact:bottle+temp:contact+ temp:bottle+bottle:contact,random=judge, data=wine,link ="logistic", threshold ="flexible",
Hess=TRUE, nAGQ=7)
head(do.call("cbind", predict(fm1, se.fit=TRUE, interval=TRUE)))
And then I get this error:
Error in head(do.call("cbind", predict(fm1, se.fit = TRUE, interval = TRUE))) :
error in evaluating the argument 'x' in selecting a method for function 'head' : Error in do.call("cbind", predict(fm1, se.fit = TRUE, interval = TRUE)) : second argument must be a list
My guess is that predict doesn't even compute the SE and CI in a case like this. Does anybody know why? Is there any way to get those values?
Thanks a lot!
The predict method for clmm2 objects does not offer standard errors. See its help page. This is in keeping with the usual practice of R package authors when dealing with mixed effects models.