Extract the Intercept from a Caret LASSO Model

I'm using the caret package in R to fit a LASSO regression model. My code runs fine; however, I would like to extract the intercept for the final model so I can build a scoring key from the selected predictors and their coefficients.
For example, if "Extraversion" is the variable I am trying to model using survey items, I would like to produce the following scoring key:
Intercept + Survey_Item_1*Slope_1 + Survey_Item_2*Slope_2 + and so on
FWIW, I am able to extract the coefficients for the predictors.
My code for reference:
## Create training & test sets
library(caret)
set.seed(9808)
ind <- sample(0:1, nrow(df), replace = TRUE, prob = c(.75, .25))
train <- df[ind == 0, ]
test  <- df[ind == 1, ]
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5)

## Train the LASSO model
fit.lasso <- train(Extraversion ~ ., data = train, method = "lasso",
                   preProc = c('scale', 'center', 'nzv'), trControl = ctrl)
fit.lasso

## Coefficients of the selected predictors at the tuned fraction
predict.enet(fit.lasso$finalModel, type = 'coefficients',
             s = fit.lasso$bestTune$fraction, mode = 'fraction')

## Fit the model to the test data
lasso_test <- predict(fit.lasso, newdata = test, na.action = "na.pass")
postResample(pred = lasso_test, obs = test[, 1])
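For reference, one way to recover the intercept, sketched under the assumption that the preProc centering above is in effect: with every predictor centered to mean zero, the fitted line passes through (0, mean(y)), so the intercept on the preprocessed scale is simply the training mean of the outcome. This is a property of the centered fit, not an official caret accessor:

## A sketch, assuming the 'center'/'scale' preProc used above
coefs <- predict.enet(fit.lasso$finalModel, type = 'coefficients',
                      s = fit.lasso$bestTune$fraction,
                      mode = 'fraction')$coefficients
intercept <- mean(train$Extraversion, na.rm = TRUE)  # assumption: centered predictors
## Hypothetical scoring key for one preprocessed item vector x:
## score <- intercept + sum(coefs * x)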

Related

DALEX for classification problems

I built a penalized logistic regression model with caret, and then I tried to create an explainer object through DALEX::explain so that I could analyze various aspects of the model afterwards. Perhaps the problem lies in it being a binary classification model.
Here is my reproducible code:
library(caret)
library(DALEX)
library(modelStudio)

set.seed(10)
data <- as.data.frame(mtcars)
data$vs <- as.factor(data$vs)

set.seed(10)
trc <- trainControl(method = "repeatedcv", number = 3, repeats = 4,
                    classProbs = FALSE)
model <- caret::train(vs ~ ., data = data, trControl = trc,
                      family = "binomial", method = "regLogistic")

explainer <- DALEX::explain(
  model = model,
  data = as.data.frame(data[, -which(colnames(data) %in% "vs")]),
  y = as.numeric(as.character(data$vs)),
  predict_function = predict,
  label = "regLogistic")

modelStudio::modelStudio(explainer)
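One direction worth sketching, under the assumption that the failure comes from explain() receiving factor class labels rather than numeric scores (and that regLogistic supports class probabilities): train with classProbs = TRUE, which requires factor levels that are valid R names, and hand DALEX a predict function that returns the probability of the positive class. The level names "zero"/"one" below are made up for illustration:

library(caret)
library(DALEX)
data <- as.data.frame(mtcars)
data$vs <- factor(data$vs, levels = c(0, 1), labels = c("zero", "one"))
trc <- trainControl(method = "repeatedcv", number = 3, repeats = 4,
                    classProbs = TRUE)
model <- caret::train(vs ~ ., data = data, trControl = trc,
                      method = "regLogistic")
explainer <- DALEX::explain(
  model = model,
  data = data[, -which(colnames(data) == "vs")],
  y = as.numeric(data$vs == "one"),
  predict_function = function(m, x) predict(m, x, type = "prob")[, "one"],
  label = "regLogistic")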

Can I use random forest to do feature selection in caret?

I am using the caret R package to do model training; I am totally new to machine learning.
I don't know whether the idea shown below can be used to do feature selection and train a model.
My code is shown below. The idea is that first I will run a random forest in the train function, then select the top 20 most important features (based on the varImp function), and re-train on those top 20 features.
I am not sure whether this method works.
1. First, I will train a random forest model with all features:
library(caret)
library(dplyr)  # for %>% and filter()

ctrl <- trainControl(method = "cv",
                     number = 5,
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE,
                     savePredictions = TRUE)

##################################### train model
set.seed(1234)
model <- train(Bgroup ~ .,
               data = all_data,
               method = "rf", preProc = c("center", "scale", "nzv"),
               trControl = ctrl,
               tuneLength = 10,
               metric = "ROC")

##################################### based on this model, I can get a plot of feature importance
feature_importance <- varImp(model)

##################################### select only the top 20 important features
importance_df <- data.frame(feature_importance$importance,
                            feature = rownames(feature_importance$importance))
top20 <- head(importance_df[order(importance_df[, 1], decreasing = TRUE), ], n = 20) %>%
  .$feature %>%
  gsub("`", '', .)

##################################### top20 data selection
data_top20 <- all_data[, top20]
data_top20$group <- all_data$Bgroup

##################################### re-train the model on these 20 features
set.seed(1234)
model_top20 <- train(group ~ .,
                     data = data_top20,
                     method = "rf", preProc = c("center", "scale", "nzv"),
                     trControl = ctrl,
                     tuneLength = 10,
                     metric = "ROC")

### calculate performance (the saved predictions live on the model object,
### not on the data frame)
a <- filter(model_top20$pred, mtry == 4)
confusionMatrix(a$pred, a$obs, positive = "positive")
Is this idea okay? I would appreciate it if anyone could give me some suggestions.
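As a side note, caret also ships recursive feature elimination (rfe), which wraps the selection step inside resampling, so the top-20 choice does not leak into the performance estimate. A minimal sketch, assuming Bgroup is the two-class outcome column in all_data and the candidate subset sizes are arbitrary:

library(caret)
set.seed(1234)
rfe_ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
rfe_fit <- rfe(x = all_data[, setdiff(names(all_data), "Bgroup")],
               y = all_data$Bgroup,
               sizes = c(10, 20, 30),
               rfeControl = rfe_ctrl)
predictors(rfe_fit)  # the feature subset RFE settles on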

Cross-validation for comparing linear models

I have two linear models to compare: one is a reduced model and the other is the full model. I have already run an F-test comparing the two models, but I do not know how to compare them using 5-fold cross-validation. The following code is what I have in R.
library(carData) # make sure this package is installed!
data("Anscombe") # NOTE: lowercase 'anscombe' is the earlier dataset; this one is capitalized (annoying)
spending = subset(Anscombe, !rownames(Anscombe) %in% c("HI","AK","DC"), c(1,2,4)) ## Data file
mod_full = lm(education ~ urban + income, data = spending) ## Full model
mod_reduced = lm(education ~ income, data = spending) ## Reduced model
FTest = anova(mod_reduced, mod_full)
print(FTest) ## Print out the F-test result
You can use the following code to compare the two models using 5-fold cross-validation:
library(caret)
library(carData)
data("Anscombe")
spending = subset(Anscombe, !rownames(Anscombe) %in% c("HI","AK","DC"), c(1,2,4)) ## Data file

# Settings for 5-fold cross-validation
trainControl <- trainControl(method = "cv", number = 5,
                             savePredictions = TRUE, classProbs = FALSE)

# Full model
set.seed(7) # To have reproducible results
fit.full <- train(education ~ urban + income, data = spending,
                  method = "lm", metric = "RMSE",
                  preProc = c("center", "scale"),
                  trControl = trainControl)

# Reduced model
set.seed(7)
fit.reduced <- train(education ~ income, data = spending,
                     method = "lm", metric = "RMSE",
                     preProc = c("center", "scale"),
                     trControl = trainControl)

# Compare the two models over the same resamples
results <- resamples(list(Full = fit.full, Reduced = fit.reduced))
summary(results)
dotplot(results, scale = "free")

# Correlation between results
modelCor(results)
splom(results)

# Difference in model predictions
diffs <- diff(results)

# Summarize p-values for pairwise comparisons
summary(diffs)
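For what it's worth, caret also wraps this paired comparison in a one-liner; a small sketch, assuming the RMSE metric used above:

## Paired t-test on the resampled RMSE values of the two fits
compare_models(fit.full, fit.reduced, metric = "RMSE")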

Get predictions from each tree in a random forest model in R (using the train function for training and predict for predicting)

I am using the train function to train a random forest model:
fitControl <- trainControl(method = "oob")
tuneGrid <- expand.grid(.mtry = c(15))
rfmod <- train(p ~ x + y + z,
               method = "rf",
               data = train_df,
               tuneGrid = tuneGrid,
               trControl = fitControl,
               importance = TRUE,
               allowParallel = TRUE)
This is simplified code showing the structure of my model; I am using the training data train_df.
I want the prediction from each individual tree. I searched around and tried this:
preds <- predict(rfmod, newdata = test_df[1, ], predict.all = TRUE)
I just used the first row of my test data test_df. After this, when I check preds, it still gives me only one prediction value, while I want the predictions from all the trees.
How can I get the predictions from all the trees?
Thanks in advance!
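One answer-style sketch, rather than a guaranteed fix: caret's predict method for train objects ignores predict.all, but the underlying randomForest object stored in rfmod$finalModel honors it. This assumes method = "rf" and no preProc step (otherwise newdata would have to be preprocessed by hand first):

library(randomForest)
preds <- predict(rfmod$finalModel,
                 newdata = test_df[1, , drop = FALSE],
                 predict.all = TRUE)
preds$individual  # matrix with one column of predictions per tree
preds$aggregate   # the usual averaged or majority-vote prediction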

Cross-validation for linear models in R

I am trying to do cross-validation of a linear model in R using cv.lm. I have tried capturing the output from cv.lm in a separate variable, using something like:
cvOutput <- cv.lm(.....)
However, I cannot extract the predicted values from every fold, as cvOutput seems to have no information about the folds. Is there any way of extracting this?
Try this out. (I used the Caravan dataset from the ISLR package as an example.)
First you partition the data:
library(caret)
library(ISLR) # Caravan lives here

df <- Caravan
inTrain <- createDataPartition(df$Purchase,
                               p = 0.8,
                               list = FALSE)
training <- df[ inTrain, ]
testing  <- df[-inTrain, ]
Then you choose the resampling method:
fitControl <- trainControl(method = "cv", number = 10)
Then you can fit your cross-validated model:
fit <- train(Purchase ~ .,
             data = training,
             method = "glm",  # Purchase is a two-class factor; method = "lm" expects a numeric outcome
             trControl = fitControl)
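To get at the original goal, the held-out predictions from every fold, one approach (a sketch, not the only way) is to tell trainControl to save them; train then records which fold each row belonged to:

fitControl <- trainControl(method = "cv", number = 10,
                           savePredictions = "final")
fit <- train(Purchase ~ .,
             data = training,
             method = "glm",
             trControl = fitControl)
head(fit$pred)                      # pred, obs, rowIndex, Resample (fold)
split(fit$pred, fit$pred$Resample)  # predictions grouped by fold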
