Predict outcome in R

I have been using the predict function in R to predict a randomForest model's outcomes for a test set, when it suddenly started returning only the predicted levels instead of the probabilities. I specified the type as response, but it still returns factors. What could possibly cause this?
The data consists of 23 variables, 20 of which are factors (unordered) and two of which are numeric. I am trying to predict whether a product will sell or not (0 or 1). Here is the code for the prediction:
library(randomForest)
rf = randomForest(sold ~., data = train, ntree=200, nodesize=25)
prf <- predict(rf, newdata = test, type ="response")

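Set type = "prob". With a classification forest, type = "response" returns the predicted classes (a factor), while type = "prob" returns a matrix of class probabilities. For example: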
data(iris)
library(randomForest)
set.seed(1234)
train.key = sort(sample(1:dim(iris)[1],100))
iris.train = iris[train.key,]
iris.test = iris[-train.key,]
rf = randomForest(Species ~., data = iris.train)
# the argument is newdata (lower case); a misspelled newData is silently ignored
predicted.prob = predict(rf,newdata=iris.test,type ="prob")
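To make the difference between the two types concrete, here is a minimal sketch on iris. It assumes the randomForest package is installed; the object names (idx, cls, prb) are made up for illustration.

```r
library(randomForest)

set.seed(1234)
idx  <- sample(nrow(iris), 100)          # 100 training rows
rf   <- randomForest(Species ~ ., data = iris[idx, ])
test <- iris[-idx, ]

# type = "response": one predicted class label per test row (a factor)
cls <- predict(rf, newdata = test, type = "response")

# type = "prob": one row per test row, one column per class
prb <- predict(rf, newdata = test, type = "prob")

is.factor(cls)
dim(prb)
colnames(prb)
```

If the outcome is stored as numeric 0/1 rather than as a factor, randomForest fits a regression forest instead, so converting the outcome with as.factor() before fitting is probably what you want for class probabilities.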

Related

Random Forest prediction error in R "No forest component in the object"

I am attempting to use a random forest regressor to classify a raster stack, but an error prevents me from predicting "area_pct". Have I not trained the model properly?
d100 is my dataset with predictor variables d100[,4:ncol(d100)] and prediction variable d100["area_pct"].
#change na values to zero
d100[is.na(d100)] <- 0
set.seed(100)
#split dataset into training (70%) and testing (30%)
id<- sample(2,nrow(d100), replace = TRUE, prob = c(0.7,0.3))
train_100<- d100[id==1,]
test_100 <- d100[id==2,]
Then I train a random forest model with the randomForest package; this appears to work fine:
final_CC_rf_20 = randomForest(x=train_100[,4:ncol(train_100)], y=train_100$area_pct,
xtest=test_100[,4:ncol(test_100)], ytest=test_100$area_pct, mtry=14, importance=TRUE, ntree = 600)
Then I try to predict a raster, using a new raster stack with the predictor variables:
sentinel_2_20 <- stack( paste(getwd(), "Sentinel_SR_clip_20.tif", sep="/") )
area_classified_20_2018 <- predict(object = final_CC_rf_20 , newdata = sentinel_2_20,type = 'response', progress = 'window')
but an error pops up:
#Error in predict.randomForest(object = final_CC_rf_20, newdata = sentinel_2_20, :
# No forest component in the object
Any help would be extremely useful.
The arguments you are passing to predict (for raster data) are not correct. The first argument, object, should be the raster data, and the second argument, model, should be the fitted model. There is no argument newdata.
Another problem is that the model was fitted with keep.forest=FALSE, which is the default when xtest is not NULL. You could set keep.forest=TRUE, but that is generally not a good approach: at prediction time you are no longer evaluating your model, so you should fit it with all the data. Thus, I would suggest fitting your model without xtest, like this:
rfmod <- randomForest(x=d100[,4:ncol(d100)], y=d100$area_pct,
mtry=14, importance=TRUE, ntree = 600)
And then do
p <- predict(sentinel_2_20, rfmod, type='response')
See ?raster::predict or ?terra::predict for working examples
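The keep.forest behaviour can be checked on a small data set. This is only a sketch, assuming the randomForest package is installed; it uses mtcars rather than the raster data from the question, and the names rf_drop and rf_keep are made up.

```r
library(randomForest)

set.seed(100)
x   <- mtcars[, -1]                      # predictors
y   <- mtcars$mpg                        # numeric response, so a regression forest
idx <- sample(nrow(mtcars), 22)          # 22 training rows, 10 test rows

# Supplying xtest makes keep.forest default to FALSE
rf_drop <- randomForest(x = x[idx, ], y = y[idx],
                        xtest = x[-idx, ], ytest = y[-idx], ntree = 50)

# Same call, but explicitly asking to keep the forest
rf_keep <- randomForest(x = x[idx, ], y = y[idx],
                        xtest = x[-idx, ], ytest = y[-idx], ntree = 50,
                        keep.forest = TRUE)

is.null(rf_drop$forest)                  # no forest stored, predict() would fail
is.null(rf_keep$forest)                  # forest stored, predict() can be used
head(rf_drop$test$predicted)             # test-set predictions are still available
```

Without a stored forest, predict() has nothing to run new data through, which is what the "No forest component in the object" message is reporting.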

How do I use predict() on new data for lme4::glmer model?

I have been trying to establish predictive performance (AUC ROC) for a glmer model. When I try and use the predict() function on a test data set, the output for this function is the length of my train data set.
folds = 10;
glmerperf=rep(0,folds); glmperf=glmerperf;
TB_Train.glmer.subset <- TB_Train.glmer %>% select(one_of(subset.vars), IDNO)
TB_Train.glmer.fs <- TB_Train.glmer.subset[,c(1:7, 22)]
TB_Train.glmer.ns <- TB_Train.glmer.subset[, 8:21]
TB_Train.glmer.cns <- TB_Train.glmer.ns %>% scale(center=TRUE, scale=TRUE) %>% cbind(TB_Train.glmer.fs)
foldsamples = caret::createFolds(TB_Train.glmer.cns$Case.Status, k = folds, list = TRUE, returnTrain = FALSE)
for (n in 1:folds)
{
testdata = TB_Train.glmer.cns[foldsamples[[n]],]
traindata = TB_Train.glmer.cns[-foldsamples[[n]],]
GLMER <- lme4::glmer(Case.Status ~ . + (1 | IDNO), data = traindata, family="binomial", control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=1000000)))
glmer.probs <- predict(GLMER, newdata=testdata$Non.TB.Case, type="response")
glmer.ROC <- roc(predictor=glmer.probs, response=testdata$Case.Status, levels=rev(levels(testdata$Case.Status)))
glmerperf[n] <- glmer.ROC$auc
}
prob <- predict(GLMER, newdata=TB_Test.glmer$Non.TB.Case, type="response", re.form=~(1|IDNO))
print(sprintf('Mean AUC ROC of model on test set for GLMER %f', mean(glmerperf)))
Both the prob and glmer.probs objects are the length of the traindata object, despite specifying the newdata argument. I have noticed issues with the predict function in the past, but none as specific as this one.
Also, when the model is run, I get several errors about needing to scale my data (which I already have) and that the model fails to converge. Any ideas on how to fix this? I have already bumped up the iterations and selected a new optimizer.
I figured out that the error arose from using the "." shortcut to specify all predictors for the model.
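One way to avoid the "." shortcut is to build the formula explicitly from the column names. A minimal base R sketch: Case.Status and IDNO come from the question, while the predictor names x1 and x2 are hypothetical stand-ins for the real columns.

```r
# Column names of the modelling data frame (x1 and x2 are hypothetical)
cols <- c("x1", "x2", "Case.Status", "IDNO")

# Fixed-effect predictors: everything except the outcome and the grouping variable
fixed <- setdiff(cols, c("Case.Status", "IDNO"))

# Build the formula with an explicit random intercept term for IDNO
f <- reformulate(c(fixed, "(1 | IDNO)"), response = "Case.Status")
f
```

The resulting formula can then be passed to lme4::glmer in place of the "." version. When predicting, newdata should be a data frame containing those columns rather than a single column, and allow.new.levels = TRUE may be needed if the test set contains IDNO values that were not in the training data.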

How to make a new prediction using rfcv in R

I have used the randomForest (RF) package in R to run random forest cross-validation on protein data with the "rfcv" function.
How can I make predictions for new protein data using the object returned by rfcv?
rfcv cross-validates the model on some data; it reports prediction performance for sequentially reduced sets of predictors.
To predict values for other data, you need to fit a forest with randomForest and use the predict function.
Given a forest rf and some new data newdata, call
predict(rf, newdata)
The detailed docs give this as an example:
library(randomForest)
data(iris)
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob=c(0.8, 0.2))
iris.rf <- randomForest(Species ~ ., data=iris[ind == 1,])
iris.pred <- predict(iris.rf, iris[ind == 2,])
table(observed = iris[ind==2, "Species"], predicted = iris.pred)
## Get prediction for all trees.
predict(iris.rf, iris[ind == 2,], predict.all=TRUE)
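To show how the two steps fit together, here is a rough sketch, assuming the randomForest package is installed. It uses iris in place of the protein data, and the names train, new, cv and pred are made up.

```r
library(randomForest)

set.seed(111)
ind   <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
train <- iris[ind == 1, ]
new   <- iris[ind == 2, ]                # stands in for the new data

# Step 1: cross-validation; the result holds error estimates, not a forest
cv <- rfcv(trainx = train[, -5], trainy = train$Species, cv.fold = 5)
names(cv)                                # includes n.var, error.cv, predicted
cv$error.cv                              # CV error for each number of predictors

# Step 2: fit an actual forest, then predict the new data with it
rf   <- randomForest(Species ~ ., data = train)
pred <- predict(rf, newdata = new)
table(predicted = pred)
```

The object from rfcv is useful for deciding how many predictors to keep; the predictions themselves come from the separately fitted forest.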

How to train non-binary classification rpart with F1 as metric instead of accuracy?

I am using caret for my non-binary (three classes) decision tree classification. My dataset is skewed so I want to use F1 instead of accuracy for my training and testing. How do I set this?
For an MWE, let's predict the cut in the diamonds dataset:
library(ggplot2)
library(caret)
inTrain <- createDataPartition(diamonds$cut, p=0.75, list=FALSE)
training <- diamonds[inTrain,]
testing <- diamonds[-inTrain,]
fitModel <- train(cut ~ ., training, method = "rpart")
How to use F1 here?
The page at http://topepo.github.io/caret/training.html details how to create a new metric for the train function.
You need to create a new function with three parameters:
data - "is a reference for a data frame or matrix with columns called obs and pred for the observed and predicted outcome values (either numeric data for regression or character values for classification)"
lev - "is a character string that has the outcome factor levels taken from the training data. For regression, a value of NULL is passed into the function."
model - "is a character string for the model being used"
The function should calculate the F-score from the observed and predicted labels in the data object and return it as a named value; that name is what you pass as metric.
For example, here is a function that calculates accuracy:
summaryStats <- function (data, lev = NULL, model = NULL) {
  cor <- sum(data$pred==data$obs)
  incor <- sum(data$pred!=data$obs)
  out <- cor/(cor + incor)
  names(out) <- c("acc")
  out
}
Then create a new trainControl object and train your model:
fitControl <- trainControl(summaryFunction = summaryStats)
fitModel <- train(cut ~ ., training, method = "rpart", trControl = fitControl, metric = "acc", maximize=TRUE)
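For the F1 case itself, a summary function along the same lines could look like the sketch below. It is written in base R and makes two assumptions that are not from the answer above: it macro-averages the per-class F1 scores (one of several ways to extend F1 to three classes), and it scores a class as 0 when it has no true positives. The names macroF1 and toy are made up.

```r
# Macro-averaged F1 with the (data, lev, model) signature caret expects.
# `data` needs columns `obs` and `pred`; `lev` holds the class levels.
macroF1 <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(factor(data$obs))
  obs  <- factor(data$obs,  levels = lev)
  pred <- factor(data$pred, levels = lev)
  f1 <- sapply(lev, function(cl) {
    tp <- sum(pred == cl & obs == cl)    # true positives for this class
    fp <- sum(pred == cl & obs != cl)    # false positives
    fn <- sum(pred != cl & obs == cl)    # false negatives
    if (tp == 0) return(0)               # assumed convention: no hits means F1 = 0
    2 * tp / (2 * tp + fp + fn)
  })
  c(F1 = mean(f1))                       # the name "F1" is what metric refers to
}

# Small hand-checkable example with three classes
toy <- data.frame(
  obs  = factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c")),
  pred = factor(c("a", "b", "b", "b", "c", "a"), levels = c("a", "b", "c"))
)
macroF1(toy, lev = levels(toy$obs))
```

It would be plugged in the same way as the accuracy function: trainControl(summaryFunction = macroF1) together with metric = "F1" and maximize = TRUE in the train call.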

predicting outcome with a model in R

I am trying to do a simple prediction using linear regression.
I have a data.frame where some of the items are missing a price (and are therefore recorded as NA).
This apparently doesn't work:
#Simple LR
fit <- lm(Price ~ Par1 + Par2 + Par3, data=combi[!is.na(combi$Price),])
Prediction <- predict(fit, data=combi[is.na(combi$Price),], OOB=TRUE, type = "response")
What should I put instead of data=combi[is.na(combi$Price),]?
Change data to newdata. Look at ?predict.lm to see what arguments predict can take. Unrecognized additional arguments are ignored, so in your case data (and OOB) are ignored and the default is to return predictions on the training data.
Prediction <- predict(fit, newdata = combi[is.na(combi$Price),])
identical(predict(fit), predict(fit, data = combi[is.na(combi$Price),]))
## [1] TRUE
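The effect is easy to see from the lengths of the results. Below is a self-contained sketch with simulated data; the column names follow the question, but the values are invented purely for illustration.

```r
set.seed(1)

# Simulated stand-in for combi: three predictors and a price with some NAs
combi <- data.frame(Par1 = rnorm(20), Par2 = rnorm(20), Par3 = rnorm(20))
combi$Price <- 10 + 2 * combi$Par1 - combi$Par2 + rnorm(20, sd = 0.1)
combi$Price[c(3, 8, 15)] <- NA

known   <- combi[!is.na(combi$Price), ]  # 17 rows with a price
missing <- combi[is.na(combi$Price), ]   # 3 rows to predict

fit <- lm(Price ~ Par1 + Par2 + Par3, data = known)

wrong <- predict(fit, data = missing)    # `data` is ignored: fitted values for `known`
right <- predict(fit, newdata = missing) # predictions for the rows with missing price

length(wrong)
length(right)
```

With data = you get one value per training row, while newdata = gives one value per row that actually needs a prediction.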
