Making a prediction from a qda function in R

I am attempting to build a QDA model in R. My code for the model is below, and the model works (it makes a prediction for the training data and creates a working confusion matrix):
library(MASS)  # provides qda()
Model3 = qda(TARGET_FLAG ~ KIDSDRIV + PARENT1 + MSTATUS + CAR_USE + TIF + CAR_TYPE
             + CLM_FREQ + REVOKED + MVR_PTS + URBANICITY + SQRT_TRAVTIME
             + SQRT_BLUEBOOK + SQRT_INCOME + EDUCATION + JOB, data = train)
Model3
summary(Model3)
predmodel.train.qda = predict(Model3, data=train)
table(Predicted=predmodel.train.qda$class, TARGET_FLAG=train$TARGET_FLAG)
predmodel.test.qda = predict(Model3, newdata=modtest)
table(Predicted=predmodel.test.qda$class, TARGET_FLAG=modtest$TARGET_FLAG)
Model3 = qda(TARGET_FLAG ~ KIDSDRIV + PARENT1 + MSTATUS + CAR_USE + TIF + CAR_TYPE
             + CLM_FREQ + REVOKED + MVR_PTS + URBANICITY + SQRT_TRAVTIME
             + SQRT_BLUEBOOK + SQRT_INCOME + EDUCATION + JOB, data = data)
Model3Prediction <- predict(Model3, type = "response")
data$Model3Prediction=Model3Prediction$class
confusionMatrix(data$Model3Prediction, data$TARGET_FLAG)
This produces the desired effect, but when I apply the model to the test data I get the following error:
"Error in $<-.data.frame(*tmp*, P_TARGET_FLAG, value = list(class = c(1L, :
replacement has 2 rows, data has 2141"
test$P_TARGET_FLAG <- predict(Model3, newdata = test, type = "response")
How do I get the model to predict the value of my test data?

I hope you are already splitting your data into train and test sets -
# e.g. a 70/30 split: trainset is a logical indicator, test is its complement
trainset = sample(c(TRUE, FALSE), nrow(data), replace = TRUE, prob = c(0.7, 0.3))
test = data[!trainset, ]
Once that is done, try the code below.
Model3 <- qda(TARGET_FLAG ~ KIDSDRIV + PARENT1 + MSTATUS + CAR_USE + TIF + CAR_TYPE
              + CLM_FREQ + REVOKED + MVR_PTS + URBANICITY + SQRT_TRAVTIME
              + SQRT_BLUEBOOK + SQRT_INCOME + EDUCATION + JOB,
              data = data, subset = trainset)
qda.preds <- predict(Model3, newdata = test)
cm.f <- table(test$TARGET_FLAG, qda.preds$class)
cm.f
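The quoted error itself comes from the shape of predict.qda's output: it returns a list with components $class and $posterior, and assigning that whole list to a data-frame column is what triggers "replacement has 2 rows, data has 2141". A minimal fix (note that type = "response" is not an argument of predict.qda and is simply ignored) is to keep only the class component:
test$P_TARGET_FLAG <- predict(Model3, newdata = test)$class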

Related

Obtaining randomForest Predictions from a dataset different from the original set

The datasets were prepared as follows:
library(caret)
library(randomForest)
set.seed(2242)
inTrain <- createDataPartition(y = new_train$classe, p= 0.85, list = FALSE)
training <- new_train[inTrain,]
testing <- new_train[-inTrain, ]
Then I trained the random forest model below:
model <- randomForest(classe~., data = training, importance = TRUE)
The predictions are generated successfully:
testingpredictions <- predict(model, testing[ ,-55])
The validation set is prepared as follows, where chosen_columns is a subset of the test set containing only the variables used to train the model:
validation_set <- test[, chosen_columns]
validation_set$new_window<- as.factor(validation_set$new_window)
PROBLEM 1
When I try to predict using the validation set, which is in a different file:
valpredictions <- predict(model, newdata = validation_set, type = "response")
I get the error:
Error in predict.randomForest(model, newdata = validation_set, type = "response") :
Type of predictors in new data do not match that of the training data.
PROBLEM 2
When I use the train function with method = "rf" and method = "gbm", the code runs until the program crashes.
Attempted Solutions
I tried rbind to match the datasets, but I also receive an error:
data <- rbind(train, test)
Error in match.names(clabs, names(xi)) :
names do not match previous names
I also tried coercing all the data into the same classes prior to running the models, but it still yields the same errors.
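The "Type of predictors in new data do not match" error usually means that some columns differ in class or factor levels between the training data and the new data. A minimal sketch of aligning them (an assumption on my part that the shared columns carry the same meaning; classe is the outcome and is skipped):
# Align each shared column's class and factor levels with the training data;
# randomForest's predict() requires the predictor types to match exactly.
common <- setdiff(intersect(names(validation_set), names(training)), "classe")
for (col in common) {
  if (is.factor(training[[col]])) {
    # reuse the training levels so unseen levels become NA instead of erroring
    validation_set[[col]] <- factor(as.character(validation_set[[col]]),
                                    levels = levels(training[[col]]))
  } else if (is.numeric(training[[col]])) {
    validation_set[[col]] <- as.numeric(validation_set[[col]])
  }
}
valpredictions <- predict(model, newdata = validation_set)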

predict function on the 'grplasso' package

I use the 'grplasso' package with train and test datasets. I find the best lambda (minimum AIC) by fitting the model on the train dataset; this lambda is named 'lambdaopt'.
BestTrainFit <- grplasso(Outcome ~. , data = traindata, lambda = lambdaopt, model = LogReg(), center = TRUE,standardize = TRUE)
I want to calculate the model's performance on the test dataset. Which of the ways below is correct?
1. Fitting the 'grplasso' model again with 'lambdaopt' on the test dataset:
BestTestFit <- grplasso(Outcome ~. , data = testdata, lambda = lambdaopt, model = LogReg(), center = TRUE,standardize = TRUE)
p1 = BestTestFit$fitted
2. Using the 'predict' function from the 'grplasso' package:
p2 = predict(BestTrainFit,testdata,type = 'response')
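For what it's worth, a minimal sketch of turning the predicted probabilities into a test-set performance number (assuming Outcome is coded 0/1; the 0.5 cutoff is an arbitrary choice, not from the original post):
p2 <- predict(BestTrainFit, testdata, type = 'response')  # predicted probabilities
pred.class <- ifelse(p2 > 0.5, 1, 0)                      # illustrative 0.5 cutoff
mean(pred.class == testdata$Outcome)                      # test-set accuracy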

How to stack machine learning models in R

I am new to machine learning and R.
I know there is an R package called caretEnsemble, which can conveniently stack models in R. However, this package appears to have some problems when dealing with multi-class classification tasks.
For now, I wrote some code to try to stack the models manually, and here is the example I worked on:
library(caret)
set.seed(123)
library(AppliedPredictiveModeling)
data(AlzheimerDisease)
adData = data.frame(diagnosis, predictors)
inTrain = createDataPartition(adData$diagnosis, p = 3 / 4)[[1]]
training = adData[inTrain,]
testing = adData[-inTrain,]
set.seed(62433)
modelFitRF <- train(diagnosis ~ ., data = training, method = "rf")
modelFitGBM <- train(diagnosis ~ ., data = training, method = "gbm",verbose=F)
modelFitLDA <- train(diagnosis ~ ., data = training, method = "lda")
predRF <- predict(modelFitRF,newdata=testing)
predGBM <- predict(modelFitGBM, newdata = testing)
prefLDA <- predict(modelFitLDA, newdata = testing)
confusionMatrix(predRF, testing$diagnosis)$overall[1]
#Accuracy
#0.7682927
confusionMatrix(predGBM, testing$diagnosis)$overall[1]
#Accuracy
#0.7926829
confusionMatrix(prefLDA, testing$diagnosis)$overall[1]
#Accuracy
#0.7682927
Now I've got three models: modelFitRF, modelFitGBM and modelFitLDA, and three predicted vectors corresponding to these three models on the test set.
Then I create a data frame containing these predicted vectors and the original dependent variable from the test set:
predDF <- data.frame(predRF, predGBM, prefLDA, diagnosis = testing$diagnosis, stringsAsFactors = F)
Then I used this data frame as a new training set to create a stacked model:
modelStack <- train(diagnosis ~ ., data = predDF, method = "rf")
combPred <- predict(modelStack, predDF)
confusionMatrix(combPred, testing$diagnosis)$overall[1]
#Accuracy
#0.804878
Considering that stacking models should usually improve prediction accuracy, I'd like to believe this might be a correct way to stack the models. However, I have doubts, because the predDF here is built from the predictions the three models made on the test set.
I am not sure whether I should use results from the test set and then apply them back to the test set to get the final predictions.
(I am referring to this block below:)
predDF <- data.frame(predRF, predGBM, prefLDA, diagnosis = testing$diagnosis, stringsAsFactors = F)
modelStack <- train(diagnosis ~ ., data = predDF, method = "rf")
combPred <- predict(modelStack, predDF)
confusionMatrix(combPred, testing$diagnosis)$overall[1]
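For comparison, a minimal sketch of one common alternative (not from the original post): carve a separate validation split out of the data, build the meta-features from base-model predictions on that split, and keep the test set untouched until the final evaluation. It reuses adData from above and, for brevity, only two of the three base models:
library(caret)
set.seed(62433)
inBuild <- createDataPartition(adData$diagnosis, p = 0.6)[[1]]
building <- adData[inBuild, ]                  # for the base models
holdout <- adData[-inBuild, ]
inVal <- createDataPartition(holdout$diagnosis, p = 0.5)[[1]]
validation <- holdout[inVal, ]                 # for the meta-model
testing2 <- holdout[-inVal, ]                  # untouched until the end
modelRF <- train(diagnosis ~ ., data = building, method = "rf")
modelLDA <- train(diagnosis ~ ., data = building, method = "lda")
# meta-features come from the validation set, not the test set
metaTrain <- data.frame(RF = predict(modelRF, validation),
                        LDA = predict(modelLDA, validation),
                        diagnosis = validation$diagnosis)
modelStack <- train(diagnosis ~ ., data = metaTrain, method = "rf")
metaTest <- data.frame(RF = predict(modelRF, testing2),
                       LDA = predict(modelLDA, testing2))
combPred <- predict(modelStack, metaTest)
confusionMatrix(combPred, testing2$diagnosis)$overall[1]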

How do I use predict() on new data for lme4::glmer model?

I have been trying to establish the predictive performance (AUC ROC) of a glmer model. When I try to use the predict() function on a test data set, the output of this function has the length of my train data set.
library(dplyr)  # %>% and select()
library(lme4)   # glmer() and glmerControl()
library(pROC)   # roc()
folds = 10;
glmerperf = rep(0, folds); glmperf = glmerperf;
TB_Train.glmer.subset <- TB_Train.glmer %>% select(one_of(subset.vars), IDNO)
TB_Train.glmer.fs <- TB_Train.glmer.subset[,c(1:7, 22)]
TB_Train.glmer.ns <- TB_Train.glmer.subset[, 8:21]
TB_Train.glmer.cns <- TB_Train.glmer.ns %>% scale(center=TRUE, scale=TRUE) %>% cbind(TB_Train.glmer.fs)
foldsamples = caret::createFolds(TB_Train.glmer.cns$Case.Status, k = folds, list = TRUE, returnTrain = FALSE)
for (n in 1:folds)
{
testdata = TB_Train.glmer.cns[foldsamples[[n]],]
traindata = TB_Train.glmer.cns[-foldsamples[[n]],]
GLMER <- lme4::glmer(Case.Status ~ . + (1 | IDNO), data = traindata, family="binomial", control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=1000000)))
glmer.probs <- predict(GLMER, newdata=testdata$Non.TB.Case, type="response")
glmer.ROC <- roc(predictor=glmer.probs, response=testdata$Case.Status, levels=rev(levels(testdata$Case.Status)))
glmerperf[n] <- glmer.ROC$auc
}
prob <- predict(GLMER, newdata=TB_Test.glmer$Non.TB.Case, type="response", re.form=~(1|IDNO))
print(sprintf('Mean AUC ROC of model on test set for GLMER %f', mean(glmerperf)))
Both the prob and glmer.probs objects have the length of the traindata object, despite specifying the newdata argument. I have noticed issues with the predict function in the past, but none as specific as this one.
Also, when the model is run, I get several errors about needing to scale my data (which I have already done) and about the model failing to converge. Any ideas on how to fix this? I have already bumped up the iterations and selected a new optimizer.
Figured out that the error was arising from using the "." shortcut to specify all predictors for the model.
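For reference, a minimal sketch of that fix (the predictor names pred1, pred2 and pred3 are placeholders, not the original columns): spell the fixed effects out instead of using ".", and pass a full data frame as newdata so predict() can match it against the model frame:
GLMER <- lme4::glmer(Case.Status ~ pred1 + pred2 + pred3 + (1 | IDNO),
                     data = traindata, family = "binomial")
# newdata is a whole data frame, not a single column; allow.new.levels lets
# predict() handle IDNO values that were not seen during training
glmer.probs <- predict(GLMER, newdata = testdata, type = "response",
                       allow.new.levels = TRUE)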

Predict outcome in R

I had been using the predict function in R to predict a randomForest model's outcomes for a testing set when it suddenly started returning only the predicted levels instead of the probabilities. I specified the type as response, but it still returns factors. What could possibly cause this?
The data consist of 23 variables, 20 of which are unordered factors and two of which are numeric. I am trying to predict whether a product will sell or not (0 or 1). Here is the code for the prediction:
library(randomForest)
rf = randomForest(sold ~., data = train, ntree=200, nodesize=25)
prf <- predict(rf, newdata = test, type ="response")
set type="prob"
data(iris)
library(randomForest)
set.seed(1234)
train.key = sort(sample(1:dim(iris)[1], 100))
iris.train = iris[train.key, ]
iris.test = iris[-train.key, ]
rf = randomForest(Species ~ ., data = iris.train)
predicted.prob = predict(rf, newdata = iris.test, type = "prob")
