Getting predicted classes from R glmnet object

I am trying to build a simple multi-class logistic regression model using glmnet in R. However, when I try to predict on the test data and build a contingency table, I get an error. A sample session is reproduced below.
> mat = matrix(1:100,nrow=10)
> test = matrix(1:50,nrow=5)
> classes <- as.factor(11:20)
> model <- glmnet(mat, classes, family="multinomial", alpha=1)
> pred <- predict(model, test)
> table(pred, as.factor(11:15))
Error in table(pred, as.factor(11:15)) :
all arguments must have the same length
Any help will be appreciated. R noob here.
Thanks.

The predict method for a glmnet object takes an argument s, which specifies the value(s) of the regularization parameter at which you want predictions.
(glmnet fits the model for a whole sequence of values of this regularization parameter simultaneously.)
If you don't specify a value for s, predict.glmnet returns predictions for all of those values, which is why the result doesn't line up with your vector of labels. If you want just a single set of predictions, either set a value for s when you call predict, or extract the relevant column after the fact.
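For example, a minimal sketch built on the toy session above (s = 0.01 is only an illustration; in practice you would choose lambda with cv.glmnet, and glmnet will warn that these toy classes are tiny, but the mechanics are the same):

library(glmnet)
mat <- matrix(1:100, nrow = 10)
test <- matrix(1:50, nrow = 5)
classes <- as.factor(11:20)
model <- glmnet(mat, classes, family = "multinomial", alpha = 1)
# type = "class" returns predicted class labels; s picks a single lambda
pred <- predict(model, test, s = 0.01, type = "class")
table(pred, as.factor(11:15))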

Related

Can dismo::evaluate() be used for a model fit with glmnet() or cv.glmnet()?

I'm using the glmnet package to create a species distribution model (SDM) based on a lasso regression. I've successfully fit models using glmnet::cv.glmnet(), and I can use the predict() function to generate predicted probabilities for a given lambda value by setting s = lambda.min and type = "response".
I'm creating several different kinds of SDMs and had been using dismo::evaluate() to generate fit statistics (based on a testing dataset) and thresholds to convert probabilities to binary values. However, when I run dismo::evaluate() with a cv.glmnet (or glmnet) model, I get the following error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix': not-yet-implemented method for <data.frame> %*%
This is confusing to me, as I think the x argument in evaluate() isn't needed when I'm providing a matrix of predictor values at presence locations (p) and another matrix of values at absence locations (a). I'm wondering whether evaluate() simply doesn't work with these types of models? Thanks, and apologies if I've missed something obvious!
After spending more time on this, I don't think dismo::evaluate() works with glmnet objects when supplying p and a as matrices of predictor values: dismo::evaluate() converts them to data.frames before calling the predict() function, while predict for glmnet objects requires a matrix. To solve my problem, I created a new function based on dismo::evaluate() that passes p or a to predict() as a matrix.
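If rewriting the function is more than you need, a lighter workaround is to generate the predictions yourself with the matrix interface glmnet expects and hand the predicted values straight to dismo::evaluate(), which also accepts vectors of predictions for p and a when no model is supplied. A sketch, where cv_fit, pres_mat, and abs_mat are hypothetical stand-ins for your fitted cv.glmnet object and your presence/absence predictor matrices:

library(glmnet)
library(dismo)
# Predict at presence and absence locations using matrices, as glmnet requires
p_pred <- predict(cv_fit, newx = as.matrix(pres_mat), s = "lambda.min", type = "response")
a_pred <- predict(cv_fit, newx = as.matrix(abs_mat), s = "lambda.min", type = "response")
# evaluate() treats p and a as predicted values when no model is given
ev <- evaluate(p = as.vector(p_pred), a = as.vector(a_pred))
threshold(ev)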

SuperLearner Predict error

I am using the SuperLearner R package.
I am trying to generate predicted Y values for both the train and test sets.
I first fit a SuperLearner model without defining "newX", so that I get predictions on the training set and can compute the MSE and plot predictions against the actual Y values. I then use the "predict" command to predict Y values for the test set by running the following code:
sl.cv <- SuperLearner(Y = label, X = train,
                      SL.library = c("SL.randomForest", "SL.glmnet", "SL.svm"),
                      method = "method.NNLS", verbose = TRUE, cvControl = list(V = 10))
pred.sl.cv <- predict(sl.cv, newdata = test, onlySL = TRUE)
Then, I get the following error after "predict":
"Error in object$whichScreen : $ operator is invalid for atomic vectors"
I browsed many online sources to learn how to use "predict" after fitting a SuperLearner model, and I am doing exactly what others do: passing the fitted SuperLearner object (here, "sl.cv") followed by the new test set. I didn't even type a $ operator.
Why am I getting this error message? How do I solve this problem?
Another question: does adding cvControl=list(V=10) as an option change anything? I think 10-fold cross-validation is the default for a SuperLearner model, so removing "cvControl=list(V=10)" should not change anything, right?
I would appreciate your advice. Thank you!
The problem is that you are using matrices for your train and/or test data. You should use a data.frame. Change your code to the following:
sl.cv <- SuperLearner(Y = label, X = as.data.frame(train),
                      SL.library = c("SL.randomForest", "SL.glmnet", "SL.svm"),
                      method = "method.NNLS", verbose = TRUE, cvControl = list(V = 10))
pred.sl.cv <- predict(sl.cv, newdata = as.data.frame(test), onlySL = TRUE)
Also, make sure your labels (Y) are a numeric vector, not a matrix.
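As an aside, the training-set predictions the question asks about are already stored in the fitted object, so the MSE can be computed without a second predict call. A sketch, assuming label is the numeric outcome vector used in the fit:

# SL.predict holds the ensemble's predictions for the training data
train_pred <- sl.cv$SL.predict
mse_train <- mean((label - train_pred)^2)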

Predicting with plm function in R

I was wondering whether it is possible to predict with the plm function from the plm package in R for a new dataset of predictor variables. I have created a model object using:
model <- plm(formula, data, index, model = 'pooling')
Now I'm hoping to predict the dependent variable for a new dataset that has not been used in the estimation of the model. I can do this by using the coefficients from the model object, like so:
col_idx <- c(...)
df <- cbind(rep(1, nrow(df)), df[(1:ncol(df))[-col_idx]])
fitted_values <- as.matrix(df) %*% as.matrix(model_object$coefficients)
Here I first define in col_idx the index columns used in the model and the columns dropped due to collinearity, and then construct a matrix of data to be multiplied by the coefficients from the model. However, errors can creep in much more easily with this manual dropping of columns.
A function designed to do this would make the code a lot more readable. I have also found the pmodel.response() function, but I can only get it to work for the dataset that was used to fit the model.
Any help would be appreciated!
I wrote a function (predict.out.plm) to do out-of-sample predictions after estimating First Differences or Fixed Effects models with plm.
The function is posted here:
https://stackoverflow.com/a/44185441/2409896
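For a pooling model specifically, a lighter alternative is to build the design matrix from the same right-hand side as the fitted formula and keep only the columns matching the coefficients plm retained, which sidesteps the manual dropping of collinear columns. A sketch, where ~ X1 + X2 + X3 stands in for your model's formula and new_df is the new data set:

library(plm)
# Design matrix with the same terms as the fitted formula (includes the intercept)
mm <- model.matrix(~ X1 + X2 + X3, data = new_df)
# Keep only the columns plm actually kept, then multiply by the coefficients
fitted_values <- as.numeric(mm[, names(coef(model))] %*% coef(model))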

Unable to specify type="response" in Caret's predict function

I am trying to plot a ROC curve for my glmnet regression model. In order to do that, I am trying to predict with type = "response" in the predict function:
pred_glmnet_s10_2class <- predict(model_train_glmnet_s10_2class,
                                  newdata = testing_s10_2class,
                                  s = "model_train_glmnet_s10_2class$finalModel$lambdaOpt",
                                  type = "response")
and I get the following error:
Error in predict.train(model_train_glmnet_s10_2class, newdata = testing_s10_2class, :
type must be either "raw" or "prob"
My predictions and class labels are binary 0 and 1 and have been converted to factors. Any help is really appreciated. Also, any ideas on how to plot AUC (area under the ROC curve) vs. number of features? Thanks!
Assuming that model_train_glmnet_s10_2class was generated by train (showing the code would be helpful)...
Calling predict(model_train_glmnet_s10_2class) dispatches to predict.train, which automatically uses the optimal lambda value determined by train. If you want the probabilities, just use type = "prob".
Your syntax is consistent with predict.glmnet, not predict.train.
As noted in the documentation, it is a really bad idea to use model_train_glmnet_s10_2class$finalModel directly for predictions.
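For the ROC curve itself, a minimal sketch using the pROC package (an assumption; any ROC package will do), where the true-class column in testing_s10_2class is assumed to be named Class:

library(caret)
library(pROC)
# predict.train applies the tuned lambda automatically; ask for probabilities
probs <- predict(model_train_glmnet_s10_2class,
                 newdata = testing_s10_2class, type = "prob")
# probs has one column per class level; use the column for your positive class
roc_obj <- roc(response = testing_s10_2class$Class, predictor = probs[, 1])
plot(roc_obj)
auc(roc_obj)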

How to fit a model I built to another data set and get residuals?

I fitted a mixed model to Data A as follows:
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=A)
Next, I want to see how the model fits Data B and also get the estimated residuals. Is there a function in R that I can use to do so?
(I tried the following method but got all new coefficients.)
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=B)
The reason you are getting new coefficients in your second attempt with data=B is that lme fits a new model to whatever data set you supply, using the formula you provide, and stores that fit in the variable model.
To get more information about a model, type summary(model_name). The nlme library includes a method called predict.lme that lets you make predictions from a fitted model. Type predict(my_model) to get predictions for the original data set, or predict(my_model, some_other_data) to generate predictions from the same fitted model for a different data set.
In your case, to get the residuals you just need to subtract the predicted values from the observed values. So use some_other_data$dependent_var - predict(my_model, some_other_data), or in your case B$Y - predict(model, B).
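Putting that together, a short sketch with the objects from the question:

# Predictions on data set B from the model fitted to A
pred_B <- predict(model, newdata = B)
# Residuals: observed minus predicted
resid_B <- B$Y - pred_B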
Your model:
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=A)
Two predictions based on your model:
pred1 <- predict(model, newdata = A, type = "response")
pred2 <- predict(model, newdata = B, type = "response")
missed: a function that calculates the proportion of misclassified observations, with the cut-off set to 0.5 (e.g. predicted positive when the observation was not, and vice versa):
missed <- function(values, prediction) {
  # share of observations whose thresholded prediction disagrees with the truth
  sum((prediction > 0.5) != values) / length(values)
}
missed(A$Y, pred1)
missed(B$Y, pred2)
