Missing values in the outcome of predict.glmnet - r

I've built a lasso model with the glmnet function.
Then I use this model in predict with a test set of 28055 rows, but the prediction output has only 25118 values. I guess it dropped rows with NA because the predictors in the test set have some missing values. I know that with glm one can handle this with na.action = na.pass, but glmnet does not seem to have an equivalent. Any suggestions?
EDIT: my test set does not have any missing values, and neither does my training set.

Related

Can dismo::evaluate() be used for a model fit with glmnet() or cv.glmnet()?

I'm using the glmnet package to create a species distribution model (SDM) based on a lasso regression. I've successfully fit models using glmnet::cv.glmnet(), and I can use the predict() function to generate predicted probabilities for a given lambda value by setting s = lambda.min and type = "response".
I'm creating several different kinds of SDMs and had been using dismo::evaluate() to generate fit statistics (based on a testing dataset) and thresholds to convert probabilities to binary values. However, when I run dismo::evaluate() with a cv.glmnet (or glmnet) model, I get the following error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix': not-yet-implemented method for <data.frame> %*%
This is confusing to me as I think the x argument in evaluate() isn't needed when I'm providing a matrix with predictor values at presence locations (p) and another matrix with values at absence locations (a). I'm wondering whether evaluate() doesn't work with these types of models? Thanks, and apologies if I've missed something obvious!
After spending more time on this, I don't think dismo::evaluate() works with glmnet objects when supplying "p" and "a" as matrices of predictor values. dismo::evaluate() converts them to data.frames before calling the predict() function. To solve my problem, I was able to create a new function based on dismo::evaluate() that supplies p or a as a matrix to the predict() function.
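A minimal sketch of a related workaround, not the custom function described above: call predict() yourself on matrices, then hand the predicted values to dismo::evaluate(), which accepts plain vectors for p and a when no model is supplied. The data and the names x, y, cvfit, p_mat and a_mat are made up for illustration.

```r
library(glmnet)
library(dismo)

set.seed(1)
# Hypothetical predictor matrix and presence (1) / absence (0) response
x <- matrix(rnorm(300 * 4), ncol = 4, dimnames = list(NULL, paste0("v", 1:4)))
y <- rbinom(300, 1, plogis(x[, 1] - x[, 2]))
train_idx <- 1:200
cvfit <- cv.glmnet(x[train_idx, ], y[train_idx], family = "binomial")

# Test-set predictors split into presence and absence rows, kept as matrices
x_test <- x[-train_idx, ]
y_test <- y[-train_idx]
p_mat <- x_test[y_test == 1, , drop = FALSE]
a_mat <- x_test[y_test == 0, , drop = FALSE]

# predict() on a cv.glmnet object returns a one-column matrix; flatten it
p_pred <- as.vector(predict(cvfit, newx = p_mat, s = "lambda.min", type = "response"))
a_pred <- as.vector(predict(cvfit, newx = a_mat, s = "lambda.min", type = "response"))

# With vectors of predicted values, evaluate() never calls predict() itself
e <- dismo::evaluate(p = p_pred, a = a_pred)
e
dismo::threshold(e)
```

Because the conversion to data.frame never happens, the glmnet predict method only ever sees matrices.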

Unable to specify type="response" in Caret's predict function

I am trying to plot a ROC curve for my glmnet regression model. In order to do that, I am trying to predict using type = "response" in predict function:
pred_glmnet_s10_2class <- predict(model_train_glmnet_s10_2class,
                                  newdata = testing_s10_2class,
                                  s = "model_train_glmnet_s10_2class$finalModel$lambdaOpt",
                                  type = "response")
and I get the following error:
Error in predict.train(model_train_glmnet_s10_2class, newdata = testing_s10_2class, :
type must be either "raw" or "prob"
My predictions and class labels are binary 0 and 1 and have been factored. Any help is really appreciated. Also, any ideas on how to plot AUC (Area Under ROC curve) vs number of features? Thanks!
Assuming that model_train_glmnet_s10_2class was generated by train (showing the code would be helpful)...
Calling predict(model_train_glmnet_s10_2class) dispatches to predict.train, which automatically uses the optimal tuning values (including lambda) determined by train. If you want the probabilities, just use type = "prob".
Your syntax is consistent with predict.glmnet, not predict.train.
As the documentation says, it is a really bad idea to use model_train_glmnet_s10_2class$finalModel directly to do the predictions.
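A minimal sketch of that call on made-up data (dat, training, testing and ctrl are hypothetical names, not the asker's objects). It assumes the class levels are valid R names such as "no"/"yes", since caret rejects levels like "0"/"1" when classProbs = TRUE.

```r
library(caret)

set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Two-class outcome with levels that are valid R names
dat$y <- factor(ifelse(dat$x1 + rnorm(n) > 0, "yes", "no"), levels = c("no", "yes"))
training <- dat[1:150, ]
testing  <- dat[151:200, ]

ctrl  <- trainControl(method = "cv", number = 5, classProbs = TRUE)
model <- train(y ~ ., data = training, method = "glmnet", trControl = ctrl)

# predict.train applies the tuned alpha and lambda itself; no s argument is passed
probs <- predict(model, newdata = testing, type = "prob")
head(probs)  # a data frame with one probability column per class level
```

The column for the positive class (here probs$yes) is what an ROC function would take as the score.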

Error with the train function in caret when using gbm with custom weights

I am trying to perform parameter tuning for gbm using the train function in the caret package (RStudio) with a custom weights argument, and I am receiving an error. The error is
Error in {: task 1674 failed - inputs must be factors
The original dataset consists of 1649 observations and is split into training and test sets using a 60/40 split. The tuning parameters are defined using trainControl and a grid for trying different parameter values.
The column weights contains 1 or 10 for the yes/no class and is a numeric vector of values in the data frame. The main function call is given below:
model <- train(train[, predictors], train[, class], method = "gbm", weights = df$weights, trControl = trainControl_obj, tuneGrid = Grid_obj, metric = "ROC")
df$weights is the vector of weights for each observation. The metric "ROC" is used due to class imbalance. The class was converted into a factor containing yes/no values prior to running the model. Also, if I do not specify the weights argument, the function works fine.
I would appreciate it if someone could shed some light on how to overcome this error, or if someone who has had the same trouble could explain how they fixed it.
Thanks.

using predict() and table() in r

I have used glm on the learning data set, which has 49511 observations once NAs are removed.
glmodel<-glm(RESULT ~ ., family=binomial,data=learnfram)
Using that glm, I tried to predict the probability for the test data set, which has 49943 observations without NAs. My resulting prediction has only 49511 elements.
predct<-predict(glmodel, type="response", data=testfram)
Why does the result of predict have 49511 elements instead of 49943?
I want to look for false positives and false negatives. I used table, but it throws an error:
table(testfram$RESULT, predct>0.02)
## Error in table(testfram$RESULT, predct> 0.02) :
## all arguments must have the same length
How can I get my desired result?
You used the wrong parameter name in predict. It should be newdata=, not data=. So the reason you get 49511 elements is that the default for predict when you don't specify new data is to output the predicted values for the data you created the model with. Hence you're getting the predicted values for your original data.
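A small sketch of the difference on toy data (these learnfram and testfram are simulated stand-ins, not the asker's data). predict.glm has no data argument, so data= is silently absorbed by ... and the training-set fitted values come back.

```r
set.seed(1)
# Simulated stand-ins for the learning and test sets
learnfram <- data.frame(x = rnorm(100))
learnfram$RESULT <- rbinom(100, 1, plogis(learnfram$x))
testfram <- data.frame(x = rnorm(40))
testfram$RESULT <- rbinom(40, 1, plogis(testfram$x))

glmodel <- glm(RESULT ~ ., family = binomial, data = learnfram)

# Misspelled argument: ignored, so predictions are for the 100 training rows
wrong <- predict(glmodel, type = "response", data = testfram)
length(wrong)  # 100

# Correct argument: one prediction per row of testfram
right <- predict(glmodel, type = "response", newdata = testfram)
length(right)  # 40

# Both arguments now have the same length, so table() works
table(actual = testfram$RESULT, predicted = right > 0.5)
```

The 0.5 cutoff is arbitrary here; substitute whatever threshold suits the problem.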

Getting predicted classes from R glmnet object

I am trying to build simple multi-class logistic regression models using glmnet in R. However, when I try to predict on the test data and obtain a contingency table, I get an error. A sample session is reproduced below.
> mat = matrix(1:100,nrow=10)
> test = matrix(1:50,nrow=5)
> classes <- as.factor(11:20)
> model <- glmnet(mat, classes, family="multinomial", alpha=1)
> pred <- predict(model, test)
> table(pred, as.factor(11:15))
Error in table(pred, as.factor(11:15)) :
all arguments must have the same length
Any help will be appreciated. R noob here.
Thanks.
The predict method for a glmnet object takes an argument s, which specifies the value(s) of the regularization parameter at which you want predictions.
(glmnet fits the model for several values of this regularization parameter simultaneously.)
So if you don't specify a value for s, predict.glmnet returns predictions for all the values. If you want just a single set of predictions, you need to either set a value for s when you call predict, or you need to extract the relevant column after the fact.
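A minimal sketch of both options, using the built-in iris data rather than the toy matrices from the question (glmnet refuses to fit a multinomial model when a class has only one observation). The value s = 0.01 is an arbitrary choice for illustration, and type = "class" is added so that predict returns labels instead of linear predictors.

```r
library(glmnet)

set.seed(1)
x <- as.matrix(iris[, 1:4])
y <- iris$Species
test_idx <- sample(nrow(x), 30)

fit <- glmnet(x[-test_idx, ], y[-test_idx], family = "multinomial", alpha = 1)

# No s: predictions for every lambda on the path.
# For a multinomial fit this is an array: observations x classes x lambdas.
all_pred <- predict(fit, newx = x[test_idx, ])
dim(all_pred)

# A single lambda plus type = "class" gives one label per test row
pred <- predict(fit, newx = x[test_idx, ], s = 0.01, type = "class")
table(predicted = as.vector(pred), actual = y[test_idx])
```

In practice s would usually come from cv.glmnet (for example s = "lambda.min" on a cv.glmnet object) rather than being fixed by hand.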
