How to pass predictions from tuneRanger to a confusion matrix? - r

I am trying to predict a binary outcome (class1 and class2) with the tuneRanger function in R:
library(mlr)
library(tuneRanger)
task = makeClassifTask(data = train, target = "outcome")
estimateTimeTuneRanger(task)
res = tuneRanger(task, measure = list(multiclass.brier),
                 num.trees = 1000, num.threads = 8, iters = 70)
a <- predict(res$model, newdata = test)
My question is how to get a confusion matrix after this. predict() gives me probabilities, and if I use
confusionMatrix(a, test$outcome, positive = 'Class2')
I get the error: Error: data and reference should be factors with the same levels.
Do I need to define another random forest model and use the optimal parameters from tuneRanger?
Thank you in advance for your attention.

I had the same problem and I used:
a <- predict(res$model$learner.model, data = test)
From there you can get a$predictions, which you can use to build the confusion matrix.
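A minimal sketch of that last step (my addition, not from the original answer; it assumes the tuned forest was fit with probability = TRUE, which tuneRanger uses when optimizing the Brier score, so a$predictions is a probability matrix whose columns are named after the outcome levels):

library(caret)
# ranger's predict method takes `data`, not `newdata`
a <- predict(res$model$learner.model, data = test)
# convert the probability matrix to hard class labels
pred_class <- factor(colnames(a$predictions)[max.col(a$predictions)],
                     levels = levels(test$outcome))
confusionMatrix(pred_class, test$outcome, positive = 'Class2')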

Related

How to create confusion matrix for upsampled ML model

I am using RStudio and the caret package to create predictive models. I want to create a confusion matrix, but I do not know how to access the observed values after resampling has been performed.
I have an imbalanced dataset so I've used upsampling with the following code:
library(caret)
control <- trainControl(method = "LGOCV", number = 1000)
control$sampling = "up"
# I create my predictive model with random forest:
metric = "Accuracy"
set.seed(123)
fit.rand = train(Diet~., data = year3data, method = "ranger", metric = metric, trControl = control)
Now I want to find weighted accuracy using a confusion matrix, but all the code I know of requires input of 'true' values and predicted values. I do not know how to access the true observation values from the upsampled dataset, and I know they won't be the same as those in the original dataset. Below is an example of the code I'd like to use:
confusionMatrix(data = fit.rand$pred, reference = fit.rand$obs, mode = "prec_recall")
The object fit.rand does have fit.rand$pred, but it does not have fit.rand$obs. I would like to know how to access the observations (post-resampling) that were used to create fit.rand please. Thank you!
TLDR -> code and the problem below
library(caret)
control <- trainControl(method = "LGOCV", number = 1000)
control$sampling = "up"
metric = "Accuracy"
set.seed(123)
fit.rand = train(Diet~., data = year3data, method = "ranger", metric = metric, trControl = control)
confusionMatrix(data = fit.rand$pred, reference = fit.rand$obs, mode = "prec_recall")
confusionMatrix is the part of the code I am having a problem with, because fit.rand$obs does not exist. I would like to know how to access the observation values used to create fit.rand, because the resampling process has changed them from the original values in my 'year3data' dataframe.
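A minimal sketch of one way to get at this (my addition, not from the original thread): caret keeps the hold-out predictions and the corresponding observed classes in fit.rand$pred when savePredictions is set in trainControl. Note that up-sampling is applied only inside each training split, so the held-out obs values are the original classes from year3data.

library(caret)
control <- trainControl(method = "LGOCV", number = 1000,
                        sampling = "up",
                        savePredictions = "final")  # keep hold-out predictions
fit.rand <- train(Diet ~ ., data = year3data, method = "ranger",
                  metric = "Accuracy", trControl = control)
# fit.rand$pred has one row per held-out case, with columns
# `pred` (predicted class) and `obs` (observed class)
confusionMatrix(data = fit.rand$pred$pred,
                reference = fit.rand$pred$obs,
                mode = "prec_recall")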

predict function with lasso regression

I am trying to implement lasso regression for my sales prediction problem. I am using the glmnet package and the cv.glmnet function to train the model.
library(glmnet)
set.seed(123)
model = cv.glmnet(as.matrix(x = train[, -which(names(train) %in% "Sales")]),
                  y = train$Sales,
                  alpha = 1,
                  lambda = 10^seq(4, -1, -0.1))
best_lambda = model$lambda.min
lasso_predictions_valid <- predict(model, s = best_lambda, type = "coefficients")
After reading a few articles about implementing lasso regression, I still don't know how to supply the test data on which I want to make predictions. There is a newx argument in the predict function that I also don't understand. In most regression functions there is a newdata or data argument into which we pass our test data.
I think there is an error in your lasso_predictions_valid: you shouldn't put valid$Sales as your newx, as I believe this is the vector of actual sales values.
Once you have created the model with the training set, for newx you need to pass a matrix of the x values you want to make predictions on; in this case, that will be your validation set.
Looking at your example code above, I think your predict line should be something like:
lasso_predictions_valid <- predict(model, s = best_lambda,
                                   newx = as.matrix(valid[, -which(names(valid) %in% "Sales")]),
                                   type = "response")  # "response" returns predictions; "coefficients" would ignore newx
Then you should run your RMSE() line:
RMSE(lasso_predictions_valid, valid$Sales)
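For reference, a self-contained sketch of the whole flow on simulated data (the data here are hypothetical stand-ins, and the RMSE is computed by hand so the sketch doesn't assume caret is loaded):

library(glmnet)
set.seed(123)
# hypothetical stand-ins for the train/valid data frames
make_df <- function(n, p) {
  df <- as.data.frame(matrix(rnorm(n * p), n, p))
  df$Sales <- rowSums(df) + rnorm(n)
  df
}
train <- make_df(200, 5)
valid <- make_df(50, 5)
model <- cv.glmnet(as.matrix(train[, -which(names(train) %in% "Sales")]),
                   y = train$Sales, alpha = 1)
preds <- predict(model, s = model$lambda.min,
                 newx = as.matrix(valid[, -which(names(valid) %in% "Sales")]),
                 type = "response")
sqrt(mean((preds - valid$Sales)^2))  # RMSE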

How to get the predicted class instead of class probabilities?

I have trained a random forest using the caret package for a binary classification task.
library(caret)
set.seed(78)
inTrain <- createDataPartition(disambdata$Response, p=3/4, list = FALSE)
trainSet <- disambdata[inTrain,]
testSet <- disambdata[-inTrain,]
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
grid_rf <- expand.grid(.mtry = c(3,5,7,9))
set.seed(78)
m_rf <- train(Response ~ ., data = trainSet, method = "rf",
              metric = "Kappa", trControl = ctrl, tuneGrid = grid_rf)
The Response variable contains values {Valid, Invalid}.
Using the following I get the class probabilities for the testing data:
pred <- predict.train(m_rf, newdata = testSet,
                      type = "prob", models = m_rf$finalModel)
However I am interested in obtaining the predicted class i.e. Valid or Invalid instead of class probabilities to generate a confusion matrix.
I have already tried the argument type="raw" in the predict.train function but it returns a list of NAs.
By assigning type = "prob" in the predict() function, you are specifically asking for probabilities. Just remove it and it will provide labels:
pred <- predict.train(m_rf, newdata = testSet, models = m_rf$finalModel)
It seems that the caret package (caret_6.0-70) still has an issue with the formula interface. Expanding the formula from Response ~ . to one that explicitly mentions all predictors, like Response ~ MaxLikelihood + n1 + n2 + count, resolves the problem, and predict.train(m_rf, newdata = testSet) returns the predicted class.
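Putting the pieces together, a minimal sketch of the final step (names taken from the question; predict on a train object returns hard class labels by default, i.e. type = "raw"):

pred <- predict(m_rf, newdata = testSet)  # predicted classes, not probabilities
confusionMatrix(pred, testSet$Response)   # both sides share the factor levels {Valid, Invalid}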

Calculate Prediction Intervals of a predicted value using Caret package of R

I used different neural network packages within the caret package for my predictions. The code used with the nnet package is:
library(caret)
# training model using nnet method
data <- na.omit(data)
xtrain <- data[,c("temperature","prevday1","prevday2","prev_instant1","prev_instant2","prev_2_hour")]
ytrain <- data$power
train_model <- train(x = xtrain, y = ytrain, method = "nnet",
                     linout = TRUE, na.action = na.exclude, trace = FALSE)
# prediction using training model created
pred_ob <- predict(train_model, newdata = dframe, type = "raw")
The predict function simply calculates the point prediction, but I also need prediction intervals (2-sigma). On searching, I found a relevant answer at a stackoverflow link, but it does not produce what I need. The solution suggests using the finalModel variable as:
predict(train_model$finalModel, newdata = dframe, interval = "confidence", type = "raw")
Is there any other way to calculate prediction intervals? The training data used is the dput() of my previous question at a stackoverflow link, and the dput() of my prediction data frame (test data) is:
dframe <- structure(list(temperature = 27, prevday1 = 1607.69296666667,
prevday2 = 1766.18103333333, prev_instant1 = 1717.19306666667,
prev_instant2 = 1577.168915, prev_2_hour = 1370.14983583333), .Names = c("temperature",
"prevday1", "prevday2", "prev_instant1", "prev_instant2", "prev_2_hour"
), class = "data.frame", row.names = c(NA, -1L))
UPDATE
I used the nnetpredint package as suggested at the link. To my surprise it results in an error, which I find difficult to debug. Here is my updated code so far:
library(nnetpredint)
nnetPredInt(train_model, xTrain = xtrain, yTrain = ytrain, newData = dframe)
It results in the following error:
Error: Number of observations for xTrain, yTrain, yFit are not the same
[1] 0
I can check that xtrain, ytrain and dframe have the correct dimensions, but I have no idea about yFit. I don't need it according to the examples in the nnetpredint vignette.
caret doesn't generate prediction intervals; that relies on the individual package. If that package cannot do this, then neither can the train objects. I agree that nnetPredInt is the appropriate way to go.
Two other notes:
you most likely should center and scale your data if you have not already (see the sketch just below).
using the finalModel object is somewhat dangerous, since it has no idea what was done to the data (e.g. dummy variables, centering and scaling, or other preprocessing methods) before it was created.
Max
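A one-line illustration of the first note (the preProcess argument is standard caret; applying it to this particular model is my suggestion, not part of the original answer):

# center and scale the predictors inside train(), so the same
# transformation is applied automatically at prediction time
train_model <- train(x = xtrain, y = ytrain, method = "nnet",
                     preProcess = c("center", "scale"),
                     linout = TRUE, trace = FALSE)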
Thanks for your question. A simple answer to your problem: right now the nnetPredInt function only supports the following S3 objects: "nnet", "nn" and "rsnns", produced by different neural network packages. The train function in the caret package returns a "train" object. That's why nnetPredInt doesn't get the yFit vector, i.e. the fitted values of the training dataset, from your train_model.
1. Quick way to use the model from the caret package:
Get the finalModel result from the 'train' object:
nnetObj = train_model$finalModel  # the 'nnet' model which the caret package has found
yPredInt = nnetPredInt(nnetObj, xTrain = xtrain, yTrain = ytrain, newData = dframe)
For example, use the iris dataset and the 'nnet' method from the caret package for regression prediction:
library(caret)
library(nnetpredint)
# Setosa 0 and Versicolor 1
ird <- data.frame(rbind(iris3[,,1], iris3[,,2]), species = c(rep(0, 50), rep(1, 50)))
samp = sample(1:100, 80)
xtrain = ird[samp,][1:4]
ytrain = ird[samp,]$species
# Training
train_model <- train(x = xtrain, y = ytrain, method = "nnet",
                     linout = FALSE, na.action = na.exclude, trace = FALSE)
class(train_model) # [1] "train"
nnetObj = train_model$finalModel
class(nnetObj) # [1] "nnet.formula" "nnet"
# Constructing Prediction Interval
xtest = ird[-samp,][1:4]
ytest = ird[-samp,]$species
yPredInt = nnetPredInt(nnetObj, xTrain = xtrain, yTrain = ytrain, newData = xtest)
# Compare Results: ytest and yPredInt
ytest
yPredInt
2. The hard way
Use the generic nnetPredInt function to pass all the neural-net-specific parameters to the function (a sketch follows the parameter list):
nnetPredInt(object = NULL, xTrain, yTrain, yFit, node, wts, newData, alpha = 0.05, lambda = 0.5, funName = 'sigmoid', ...)
xTrain # Training Dataset
yTrain # Training Target Value
yFit # Fitted Value of the training data
node # Structure of your network, like c(4,5,5,1)
wts # Specific order of weights parameters found by your neural network
newData # New Data for prediction
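A hedged sketch of what that call might look like, reusing nnetObj from the quick example above (the $fitted.values, $n and $wts slots are standard for nnet objects; that this exact call satisfies nnetPredInt's checks is an assumption on my part):

yFit <- nnetObj$fitted.values  # fitted values on the training data
node <- nnetObj$n              # network structure, e.g. c(4, 5, 1)
wts  <- nnetObj$wts            # weight vector in nnet's own ordering
yPredInt <- nnetPredInt(xTrain = xtrain, yTrain = ytrain, yFit = yFit,
                        node = node, wts = wts, newData = xtest)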
Tips:
Right now the nnetpredint package only supports standard multilayer neural network regression with an activated output, not a linear output.
It will support more types of models in the future.
You can use the nnetPredInt function from the nnetpredint package. Check out the function's help page here.
If you are open to writing your own implementation, there is another option. You can get prediction intervals from a trained net using the same implementation you would write for standard non-linear regression (assuming back-propagation was used to do the estimation).
This paper goes through the methodology and is fairly straightforward: http://www.cis.upenn.edu/~ungar/Datamining/Publications/yale.pdf
There are, as with everything, some cons to this approach (outlined in the paper), but it is definitely worth knowing as an option.

How can I run an ordered logistic regression without the function ignoring the weights?

Suppose I have this dataset:
require(rms)
newdata <- data.frame(eduattain = rep(c(1,2,3), times = 2),
                      dadedu = rep(c(1,2,3), each = 2),
                      random = rnorm(6, mean(1000), sd = 50))
I transform both the dependent and independent variables to factors:
newdata$eduattain <- factor(newdata$eduattain, levels = 1:3,
                            labels = c("L1", "L2", "L3"), ordered = TRUE)
newdata$dadedu <- factor(newdata$dadedu, levels = 1:3, labels = c("L1", "L2", "L3"))
and conduct a simple ordinal logistic regression with weights:
model1 <- lrm(eduattain ~ dadedu, data = newdata, weights = random, normwt = T)
Warning message:
In lrm(eduattain ~ dadedu, data = newdata, weights = random, normwt = T) :
currently weights are ignored in model validation and bootstrapping lrm fits
I have reasons to believe that if the weights were being used the results would be quite different.
How can I fix this? Most questions that tackle this warning don't give proper answers to it (here, here, here).
Someone would need to modify the code for validate.lrm and predab.resample in the rms package. The code is on github at https://github.com/harrelfe/rms
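Note that the warning only concerns validation and bootstrapping: the lrm coefficients themselves are estimated with the weights. If all you need is a weighted ordered-logit fit, one hedged alternative (my suggestion, not from the original answer) is MASS::polr, which accepts case weights directly:

library(MASS)
# polr parameterizes the intercepts differently from lrm,
# but the slope estimates correspond
model2 <- polr(eduattain ~ dadedu, data = newdata,
               weights = random, Hess = TRUE)  # Hess = TRUE enables standard errors
summary(model2)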
