predict function on the 'grplasso' package - r

I use the 'grplasso' package with a train and a test dataset. I find the best lambda (minimum AIC) by fitting the model on the train dataset, and I call this lambda 'lambdaopt'.
BestTrainFit <- grplasso(Outcome ~. , data = traindata, lambda = lambdaopt, model = LogReg(), center = TRUE,standardize = TRUE)
I want to evaluate the model's performance on the test dataset. Which of the two approaches below is correct?
1. Fit the 'grplasso' model again with 'lambdaopt' on the test dataset
BestTestFit <- grplasso(Outcome ~. , data = testdata, lambda = lambdaopt, model = LogReg(), center = TRUE,standardize = TRUE)
p1 = BestTestFit$fitted
2. Use the 'predict' function from the 'grplasso' package with the model fitted on the train dataset
p2 = predict(BestTrainFit,testdata,type = 'response')
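The difference between the two options can be illustrated with a small sketch. It uses base R's glm as a stand-in for grplasso (an assumption made so the example runs without extra packages), and the simulated columns x1 and x2 are hypothetical. The pattern corresponds to option 2: fit on the train dataset only, then predict on the test dataset, so that the test rows never influence the coefficients.

```r
# Simulated binary-outcome data (hypothetical, for illustration only)
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$Outcome <- rbinom(n, 1, plogis(0.5 * dat$x1 - dat$x2))

# Split into train and test rows
train_idx <- sample(n, 140)
traindata <- dat[train_idx, ]
testdata  <- dat[-train_idx, ]

# Fit on the train dataset only (glm stands in for grplasso here)
fit <- glm(Outcome ~ ., data = traindata, family = binomial)

# Out-of-sample predicted probabilities for the test dataset
p2 <- predict(fit, newdata = testdata, type = "response")

# A simple performance measure on unseen data
test_accuracy <- mean((p2 > 0.5) == (testdata$Outcome == 1))
test_accuracy
```

Refitting on the test dataset (option 1) would instead return in-sample fitted values for a different model, which says little about how the train-fitted model generalizes.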

Related

How to obtain prediction from an INLA model in R without fitting it?

Let's imagine I want to fit an INLA model to a large dataset and use the fitted model for prediction. One solution is to add the covariates used for prediction to the data, assigning NA values to the outcome.
In my case, I want to regularly calculate the predicted outcome based on updated covariates. Ideally, I would like to fit the model once, save it, and use it later to make predictions regularly (e.g. as with a linear model fitted by 'lm()' in R and the 'predict()' function).
I haven't found a way to do it in INLA. Below is a simple reproducible example.
library(INLA)
#simulate data
n = 100; a = 1; b = 1; tau = 100
z = rnorm(n)
eta = a + b*z
scale = exp(rnorm(n))
prec = scale*tau
y = rnorm(n, mean = eta, sd = 1/sqrt(prec))
plot(z,y)
#run INLA model
data = list(y=y, z=z)
formula = y ~ 1+z
result = inla(formula, family = "gaussian", data = data)
summary(result)
#define prediction data: observed values plus new covariates with NA outcomes
data.pred = list(z = c(data$z, seq(2,4,length.out=100)),
y = c(data$y, rep(NA,100)))
#run INLA model with prediction
result = inla(formula, family = "gaussian", data = data.pred,
control.compute=list(config = TRUE))
summary(result)
#get posterior samples of the predictions
post.samples <- inla.posterior.sample(n = 10, result = result)
pred <- do.call(cbind,
lapply(post.samples,
function(X) X$latent[startsWith(rownames(X$latent), "Pred")]))

Making a Prediction from a qda function in r

I am attempting to build a QDA model in R. My code for the model is below, and the model works (it makes a prediction for the training data and creates a working confusion matrix).
Model3=qda(TARGET_FLAG~KIDSDRIV+PARENT1+MSTATUS+CAR_USE+TIF+CAR_TYPE
+CLM_FREQ+REVOKED+MVR_PTS+ URBANICITY +SQRT_TRAVTIME +SQRT_BLUEBOOK+SQRT_INCOME
+EDUCATION+JOB, data = train)
Model3
summary(Model3)
predmodel.train.qda = predict(Model3, newdata=train)
table(Predicted=predmodel.train.qda$class, TARGET_FLAG=train$TARGET_FLAG)
predmodel.test.qda = predict(Model3, newdata=modtest)
table(Predicted=predmodel.test.qda$class, TARGET_FLAG=modtest$TARGET_FLAG)
Model3=qda(TARGET_FLAG~KIDSDRIV+PARENT1+MSTATUS+CAR_USE+TIF+CAR_TYPE
+CLM_FREQ+REVOKED+MVR_PTS+ URBANICITY +SQRT_TRAVTIME +SQRT_BLUEBOOK+SQRT_INCOME
+EDUCATION+JOB, data = data)
Model3Prediction <- predict(Model3, type = "response")
data$Model3Prediction=Model3Prediction$class
confusionMatrix(data$Model3Prediction, data$TARGET_FLAG)
This produces the desired result, but when I apply the model to the test data I get the following error:
"Error in $<-.data.frame(*tmp*, P_TARGET_FLAG, value = list(class = c(1L, :
replacement has 2 rows, data has 2141"
test$P_TARGET_FLAG <- predict(Model3, newdata = test, type = "response")
How do I get the model to predict the value of my test data?
I assume you have already split your data into train and test sets, where trainset is a logical vector that is TRUE for the training rows:
trainset = (data)
test = data[!trainset,]
Once that is done, try the code below.
Model3 <- qda(TARGET_FLAG~KIDSDRIV+PARENT1+MSTATUS+CAR_USE+TIF+CAR_TYPE
+CLM_FREQ+REVOKED+MVR_PTS+ URBANICITY +SQRT_TRAVTIME +SQRT_BLUEBOOK+SQRT_INCOME
+EDUCATION+JOB, data = data, subset=trainset)
qda.preds <- predict(Model3, newdata = test)
cm.f <- table(test$TARGET_FLAG, qda.preds$class)
cm.f
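The "replacement has 2 rows" error in the question arises because predict on a qda fit returns a list with two components (class and posterior), and a two-element list cannot be assigned to a single data frame column. A minimal sketch of the fix, using the built-in iris data as a stand-in for the question's data (the column name P_Species is hypothetical):

```r
library(MASS)

# Split the built-in iris data into train and test rows
set.seed(1)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Fit QDA on the training rows only
fit <- qda(Species ~ ., data = train)

# predict() returns a list: $class (factor) and $posterior (matrix)
pred <- predict(fit, newdata = test)
names(pred)

# Assign only the predicted class, not the whole list
test$P_Species <- pred$class

# Confusion matrix on the test rows
table(Predicted = test$P_Species, Actual = test$Species)
```

Applied to the question, the same idea would be to store predict(Model3, newdata = test)$class in test$P_TARGET_FLAG.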

Retrain best model on full dataset in R

I have two models to select from and using some criteria I choose one of the two. (The below is just an example, I know it doesn't make much sense)
library(forecast)
set.seed(4)
sample_dat= sample(1:nrow(cars), 5)
train = cars[-sample_dat, ]
test = cars[sample_dat, ]
models = list(lm(dist ~ speed, train), glm(dist ~ speed, train, family = "poisson"))
test_res = sapply(models, function(x) accuracy(predict(x, test, type = "response"), test$dist)[2]) #Getting the RMSE for each model
best_model = models[which.min(test_res)]
How can I retrain the best model using the full dataset (train + test)? I checked the update and update.formula functions but these don't seem to be updating the data part.
update(best_model[[1]],data = rbind(train,test))
You do not want to change the formula, since that is the best model, but rather update the data.
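A quick way to check that update really refits on the larger dataset is to compare the number of observations before and after. A minimal sketch using the same cars split as in the question, with only the lm candidate for brevity:

```r
# Same split as in the question
set.seed(4)
sample_dat <- sample(1:nrow(cars), 5)
train <- cars[-sample_dat, ]
test  <- cars[sample_dat, ]

# Stand-in for the selected best model
best_model <- lm(dist ~ speed, data = train)

# Refit the same call with the data argument replaced
full_model <- update(best_model, data = rbind(train, test))

nobs(best_model)     # 45 training rows
nobs(full_model)     # all 50 rows of cars
formula(full_model)  # formula is unchanged
```

update re-evaluates the stored call, so any other arguments of the original fit (for example family in a glm) are carried over unchanged.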
Alternatively, in base R following your own logic, first create a list mirroring the models list:
set.seed(4)
sample_dat= sample(1:nrow(cars), 5)
train = cars[-sample_dat, ]
test = cars[sample_dat, ]
models = list(lm(dist ~ speed, train), glm(dist ~ speed, train, family = "poisson"))
model_application = list(as.expression("lm(dist ~ speed, cars)$call"),
as.expression("glm(dist ~ speed, cars, family = 'poisson')$call"))
test_res = sapply(models,
function(x){
# Store a function to calculate the RMSE: rmse => function
rmse <- function(actual_vec, pred_vec){sqrt(mean((pred_vec - actual_vec)**2))}
# Getting the RMSE for each model: numeric scalar => .GlobalEnv
rmse(test$dist, predict(x, newdata = test, type = "response"))
}
)
best_model = models[[which.min(test_res)]]
applied_model <- eval(eval(as.expression(parse(text = model_application[[which.min(test_res)]]))))

How to pass predictions from tuneRanger to confusion matrix?

I am trying to predict a binary outcome (class1 and class2) with the tuneRanger function in R:
library(mlr)
library(tuneRanger)
task = makeClassifTask(data = train, target = "outcome")
estimateTimeTuneRanger(task)
res = tuneRanger(task, measure = list(multiclass.brier),
num.trees = 1000,num.threads = 8, iters = 70)
a<-predict(res$model, newdata = test)
My question is how to get a confusion matrix after this. predict gives me probabilities, and if I use
confusionMatrix(a, test$outcome, positive = 'Class2')
I will have the error: Error: data and reference should be factors with the same levels.
Do I need to define another random forest model and use the optimal parameters from tuneRanger?
Thank you in advance for your attention.
I had the same problem and I used the underlying ranger model:
a <- predict(res$model$learner.model, data = test)
From there you can get a$predictions, which you can use to build the confusion matrix.
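If the predictions are class probabilities, as the question describes, they need to be converted into a factor with the same levels as the reference before a confusion matrix can be built. A minimal base-R sketch, assuming a hypothetical probability matrix probs whose column names are the class labels and a hypothetical reference factor truth:

```r
# Hypothetical predicted probabilities, one column per class
probs <- matrix(c(0.80, 0.20,
                  0.30, 0.70,
                  0.45, 0.55),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("Class1", "Class2")))

# Hypothetical true labels for the same three rows
truth <- factor(c("Class1", "Class2", "Class1"),
                levels = c("Class1", "Class2"))

# Pick the most probable class per row and use the reference levels
pred_class <- factor(colnames(probs)[max.col(probs, ties.method = "first")],
                     levels = levels(truth))

# Plain confusion matrix
cm <- table(Predicted = pred_class, Reference = truth)
cm
```

Because pred_class and truth now share the same levels, the pair can also be passed to caret's confusionMatrix in place of the raw prediction object.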

Pass model formula as argument in R

I need to cross-validate several glmer models on the same data so I've made a function to do this (I'm not interested in preexisting functions for doing this). I want to pass an arbitrary glmer model to my function as the only argument. Sadly, I can't figure out how to do this, and the interwebz won't tell me.
Ideally, I would like to do something like:
model = glmer(y ~ x + (1|z), data = train_folds, family = "binomial")
model2 = glmer(y ~ x2 + (1|z), data = train_folds, family = "binomial")
And then call cross_validation_function(model) and cross_validation_function(model2). The training data within the function is called train_folds.
However, I suspect I need to pass the model formula in a different way, using reformulate.
Here is an example of my function. The project is about predicting autism (ASD) from behavioral features. The data variable is da.
library(pacman)
p_load(tidyverse, stringr, lmerTest, MuMIn, psych, corrgram, ModelMetrics,
caret, boot)
cross_validation_function <- function(model){
#creating folds
participants = unique(da$participant)
folds <- createFolds(participants, 10)
cross_val <- sapply(seq_along(folds), function(x) {
train_folds = filter(da, !(as.numeric(participant) %in% folds[[x]]))
predict_fold = filter(da, as.numeric(participant) %in% folds[[x]])
#model to be tested should be passed as an argument here
train_model <- model
predict_fold <- predict_fold %>%
mutate(predictions_perc = predict(train_model, predict_fold, allow.new.levels = T),
predictions_perc = inv.logit(predictions_perc),
predictions = ifelse(predictions_perc > 0.5, "ASD","control"))
conf_mat <- caret::confusionMatrix(data = predict_fold$predictions, reference = predict_fold$diagnosis, positive = "ASD")
accuracy <- conf_mat$overall[1]
sensitivity <- conf_mat$byClass[1]
specificity <- conf_mat$byClass[2]
fixed_ef <- fixef(train_model)
output <- c(accuracy, sensitivity, specificity, fixed_ef)
})
cross_df <- t(cross_val)
return(cross_df)
}
Solution developed from the comment: using as.formula, a string can be converted into a formula, which can then be passed as an argument to my function in the following way:
cross_validation_function <- function(model_formula){
...
train_model <- glmer(model_formula, data = train_folds, family = "binomial")
...}
formula <- as.formula("y ~ x + (1|z)")
cross_validation_function(formula)
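The same pattern can be tried end to end without any extra packages. The sketch below is only an illustration under stated assumptions: it uses base R's glm in place of glmer, the built-in mtcars data in place of da, and a hypothetical helper cv_accuracy that refits the passed formula on each training fold.

```r
# Hypothetical helper: k-fold cross-validated accuracy for a binary outcome
cv_accuracy <- function(model_formula, data, k = 5) {
  # Name of the response variable, taken from the formula
  response <- all.vars(model_formula)[1]
  # Randomly assign each row to one of k folds
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  sapply(seq_len(k), function(i) {
    train_fold <- data[folds != i, ]
    test_fold  <- data[folds == i, ]
    # The model is refitted on the training fold from the passed formula
    fit <- glm(model_formula, data = train_fold, family = binomial)
    p   <- predict(fit, newdata = test_fold, type = "response")
    mean((p > 0.5) == (test_fold[[response]] == 1))
  })
}

set.seed(1)
acc <- cv_accuracy(as.formula("vs ~ mpg"), mtcars)
acc
```

Passing the formula rather than a fitted model is what allows the refit inside each fold; a model fitted beforehand would already have seen the held-out rows.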
If your aim is to extract the model formula from a fitted model, then you can use
attributes(model)$call[[2]]. You can then use this formula when fitting the model on the CV folds.
mod_formula <- attributes(model)$call[[2]]
train_model = glmer(mod_formula , data = train_data,
family = "binomial")
