Error in eval(predvars, data, env) : object 'Customer Count' not found - r

I am trying to build a random forest model on a dataset to predict a two-class variable. I have attached the code below, and it comes back with this error even though the variable Customer Count is in the dataset.
This is for my predictive model. I have tried reorganizing the dataset so that Customer Count is not the first variable, and I have also tried trimming the dataset in case its size was the issue.
# Load the dataset and explore
library(readxl)
library(randomForest)  # needed for randomForest() below
rawData <- read_excel("StrippedTransformerModelData.xlsx")
View(rawData)
head(rawData)
str(rawData)
summary(rawData)
# Split into Train and Validation sets
# Training Set : Validation Set = 70 : 30 (random)
set.seed(100)
train <- sample(nrow(rawData), 0.7*nrow(rawData), replace = FALSE)
TrainSet <- rawData[train,]
ValidSet <- rawData[-train,]
summary(TrainSet)
summary(ValidSet)
# Create a Random Forest model with default parameters
model1 <- randomForest(data = TrainSet, Failure ~ ., ntree = 500, mtry = 6, importance = TRUE)
model1
Error in eval(predvars, data, env) : object 'Customer Count' not found.
The variable Customer Count is definitely in the dataset, and I don't know why it is reported as not found.
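A likely culprit, assuming the column really is present, is the space in the name Customer Count: the formula interface expands Failure ~ . into individual variable names, and a name containing a space cannot be looked up again, which produces exactly this kind of "object not found" message. A minimal sketch of two common workarounds, using the object names from the question:
# Workaround 1: make every column name syntactically valid before splitting,
# so "Customer Count" becomes "Customer.Count" and Failure ~ . can resolve it
names(rawData) <- make.names(names(rawData))   # do this before creating TrainSet / ValidSet
# Workaround 2: skip the formula interface and pass x and y directly,
# which never has to parse column names (as.factor() forces classification)
model1 <- randomForest(x = as.data.frame(TrainSet[, setdiff(names(TrainSet), "Failure")]),
                       y = as.factor(TrainSet$Failure),
                       ntree = 500, mtry = 6, importance = TRUE)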

Related

Obtaining randomForest Predictions from a dataset different from the original set

The datasets were prepared as follows:
library(caret)
library(randomForest)
set.seed(2242)
inTrain <- createDataPartition(y = new_train$classe, p= 0.85, list = FALSE)
training <- new_train[inTrain,]
testing <- new_train[-inTrain, ]
Then I trained the random forest model below
model <- randomForest(classe~., data = training, importance = TRUE)
The predictions are generated successfully:
testingpredictions <- predict(model, testing[ ,-55])
The validation set is built as follows, where chosen_columns is a subset of the test set's columns containing only the variables used to train the model:
validation_set <- test[ ,chosen_columns]
validation_set$new_window<- as.factor(validation_set$new_window)
PROBLEM 1
When I try to predict using the validation set, which is in a different file:
valpredictions <- predict(model, newdata = validation_set, type = "response")
I get the error:
Error in predict.randomForest(model, newdata = validation_set, type = "response") :
Type of predictors in new data do not match that of the training data.
PROBLEM 2
When I use the train function with method = "rf" and with method = "gbm", the code runs until the program crashes.
Attempted Solutions
I tried rbind to make the datasets match, but I also receive an error:
data <- rbind(train, test)
Error in match.names(clabs, names(xi)) :
names do not match previous names
I also tried coercing all the data into the same classes before running the models, but it still yields the same errors.
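A hedged sketch of one common remedy for the "Type of predictors in new data do not match" error (not necessarily the fix the author eventually used): make every predictor in the new data carry the same class and, for factors, the same level set as the corresponding training column. The names model, training, classe and validation_set are taken from the question.
# Align the class and factor levels of each predictor with the training data
predictor_names <- setdiff(names(training), "classe")
for (v in predictor_names) {
  if (is.factor(training[[v]])) {
    # same levels, in the same order, as the model saw during training
    validation_set[[v]] <- factor(validation_set[[v]], levels = levels(training[[v]]))
  } else if (is.numeric(training[[v]])) {
    validation_set[[v]] <- as.numeric(validation_set[[v]])
  }
}
valpredictions <- predict(model, newdata = validation_set, type = "response")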

Random Forest prediction error in R "No forest component in the object"

I am attempting to use a random forest regressor to classify a raster stack, but an error prevents the prediction of "area_pct". Have I not trained the model properly?
d100 is my dataset, with predictor variables d100[, 4:ncol(d100)] and response variable d100["area_pct"].
#change na values to zero
d100[is.na(d100)] <- 0
set.seed(100)
#split dataset into training (70%) and testing (30%)
id<- sample(2,nrow(d100), replace = TRUE, prob = c(0.7,0.3))
train_100<- d100[id==1,]
test_100 <- d100[id==2,]
Train the random forest model with the randomForest package; this appears to work fine:
final_CC_rf_20 <- randomForest(x = train_100[, 4:ncol(train_100)], y = train_100$area_pct,
                               xtest = test_100[, 4:ncol(test_100)], ytest = test_100$area_pct,
                               mtry = 14, importance = TRUE, ntree = 600)
Then I try to predict onto a raster. The new raster stack with the predictor variables:
sentinel_2_20 <- stack( paste(getwd(), "Sentinel_SR_clip_20.tif", sep="/") )
area_classified_20_2018 <- predict(object = final_CC_rf_20 , newdata = sentinel_2_20,type = 'response', progress = 'window')
but an error pops up:
#Error in predict.randomForest(object = final_CC_rf_20, newdata = sentinel_2_20, :
# No forest component in the object
Any help would be extremely useful.
The arguments you are using for predict (with raster data) are not correct. The first argument, object, should be the raster data; the second argument, model, should be the fitted model. There is no newdata argument.
Another problem is that you use keep.forest=FALSE, which is the default when xtest is not NULL. You could set keep.forest=TRUE, but that is generally not a good approach, as you should fit your model with all of the data before you make a prediction (at that point you are no longer evaluating your model). Thus, I would suggest fitting your model without xtest, like this:
rfmod <- randomForest(x = d100[, 4:ncol(d100)], y = d100$area_pct,
                      mtry = 14, importance = TRUE, ntree = 600)
And then do
p <- predict(sentinel_2_20, rfmod, type='response')
See ?raster::predict or ?terra::predict for working examples.
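As a quick, hedged sanity check (object names as above): once the model is refitted without xtest, the fitted object should actually contain the forest component that the error message complained about:
# FALSE means the forest was kept, so predict() has something to work with
is.null(rfmod$forest)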

How can I handle a confusionMatrix error when it says my Data is null

I am trying to run a random forest analysis in R. It works well when I fit the model and predict on the test group, but when I run the confusionMatrix it gives me the following error:
Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length
Load the test and training data:
trainData <- read.csv("./pml-training.csv")
testData <- read.csv("./pml-testing.csv")
dim(trainData)
dim(testData)
Data cleaning: variables with near-zero variance, variables that are almost always NA, and columns containing summary statistics or irrelevant data will be removed.
trainClean <- trainData[,colMeans(is.na(trainData))< .9]
trainClean <- trainData[,-c(1:7)]
nvz <- nearZeroVar(trainClean)
trainClean <- trainClean[,-nvz]
dim(trainClean)
Split the data into training (70%) and validation (30%)
inTrain <- createDataPartition(y=trainClean$classe, p=0.7, list=FALSE)
train <- trainClean[inTrain,]
valid <- trainClean[-inTrain,]
# Create a control for 3-fold cross-validation
control <- trainControl(method="cv", number=3, verboseIter = FALSE)
Building the models
Random Forests
# Fit the model on train using random forest
modFit <- train(classe~., data=train, method="rf", trControl=control, tuneLength=5, na.action=na.omit)
modFit
modPredict<- predict(modFit, valid, na.action=na.omit) # predict on the valid data set.
# Turn valid$classe into a factor and check it
valid$classe <- as.factor(valid$classe)
modCM <- confusionMatrix(modPredict, as.factor(valid$classe))
modCM
table(modPredict, valid$classe)
When I check the length of modPredict it is 122, while valid$classe has length 5885. If I try dim() on modPredict, I get NULL. I have tried using na.action=na.omit on the prediction chunk, and I have also tried NOT using na.action=na.omit on either the prediction or the fit chunks.
I checked the train and valid data sets where I split the data, using:
```length(train); length(valid); length(valid$classe); nrow(valid); nrow(train)```
The output is:
[1] 94
[1] 94
[1] 5885
[1] 5885
[1] 13737
I have been struggling with this problem, and with similar problems in my decision tree chunk as well. I don't want people to do my homework for me, but I could use a hint.
Thanks in advance.
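One hedged hint, based only on the code shown above: the second cleaning step starts again from trainData rather than trainClean, so the "mostly NA" column filter from the previous line is thrown away. Columns full of NAs then survive into train and valid, and na.omit quietly drops rows at prediction time, which would explain 122 predictions against 5885 values of valid$classe. A sketch of the kind of change and check this suggests:
# Chain the cleaning steps so each one builds on the previous result
trainClean <- trainData[, colMeans(is.na(trainData)) < .9]
trainClean <- trainClean[, -c(1:7)]   # note: trainClean, not trainData
# After predicting, confirm nothing was silently dropped before calling confusionMatrix()
length(modPredict) == nrow(valid)     # should be TRUE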

Making a Prediction from a qda function in r

I am attempting to make a QDA model in R. My code for the model is below, and the model works (it makes a prediction for the training data and creates a working confusion matrix).
Model3 <- qda(TARGET_FLAG ~ KIDSDRIV + PARENT1 + MSTATUS + CAR_USE + TIF + CAR_TYPE +
              CLM_FREQ + REVOKED + MVR_PTS + URBANICITY + SQRT_TRAVTIME + SQRT_BLUEBOOK +
              SQRT_INCOME + EDUCATION + JOB, data = train)
Model3
summary(Model3)
predmodel.train.qda = predict(Model3, data=train)
table(Predicted=predmodel.train.qda$class, TARGET_FLAG=train$TARGET_FLAG)
predmodel.test.qda = predict(Model3, newdata=modtest)
table(Predicted=predmodel.test.qda$class, TARGET_FLAG=modtest$TARGET_FLAG)
Model3 <- qda(TARGET_FLAG ~ KIDSDRIV + PARENT1 + MSTATUS + CAR_USE + TIF + CAR_TYPE +
              CLM_FREQ + REVOKED + MVR_PTS + URBANICITY + SQRT_TRAVTIME + SQRT_BLUEBOOK +
              SQRT_INCOME + EDUCATION + JOB, data = data)
Model3Prediction <- predict(Model3, type = "response")
data$Model3Prediction=Model3Prediction$class
confusionMatrix(data$Model3Prediction, data$TARGET_FLAG)
This produces the desired effects, but when I apply the model to the test data I get the following error:
"Error in $<-.data.frame(*tmp*, P_TARGET_FLAG, value = list(class = c(1L, :
replacement has 2 rows, data has 2141"
test$P_TARGET_FLAG <- predict(Model3, newdata = test, type = "response")
How do I get the model to predict the value of my test data?
I hope you are already splitting your data into train and test sets, for example:
# trainset: a logical vector marking the training rows (its definition is not shown here)
test <- data[!trainset, ]
Once you are done, try the code below:
Model3 <- qda(TARGET_FLAG ~ KIDSDRIV + PARENT1 + MSTATUS + CAR_USE + TIF + CAR_TYPE +
              CLM_FREQ + REVOKED + MVR_PTS + URBANICITY + SQRT_TRAVTIME + SQRT_BLUEBOOK +
              SQRT_INCOME + EDUCATION + JOB, data = data, subset = trainset)
qda.preds <- predict(Model3, newdata = test)
cm.f <- table(test$TARGET_FLAG, qda.preds$class)
cm.f
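A further hedged note on the original error: predict() on a qda fit returns a list with a class component and a posterior component, so assigning that whole list to a single data-frame column is what triggers the "replacement has 2 rows" message. Keeping only the class labels avoids it:
# store only the predicted classes, not the whole list returned by predict()
test$P_TARGET_FLAG <- predict(Model3, newdata = test)$class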

Multi-group analysis with multiple imputation and weights using lavaan.survey

I am running some multi-group confirmatory factor analyses (CFA) in lavaan in R after multiple imputation.
First, I created a list called Plav to store 5 imputed datasets:
library(lavaan)
library(lavaan.survey)
library(mitools)
library(semTools)
a <- imputationList(Plav) ##Tell R these are plausible values
Survey <- svydesign(ids = ~1, weights = ~wt, data = a) # set the weight
Subsequently, I conducted a multi-group CFA:
# Model without population corrections
fit <- cfa(model, data=Plav[[1]], estimator = 'MLR', missing = 'default', group = 'gender',group.equal = c("loadings"))
# Model with population corrections
fitSurvey <- lavaan.survey(lavaan.fit = fit, survey.design = Survey)
The following error was returned:
Error in FUN(X[[1L]], ...) :
dims [product 1936] do not match the length of object [0]
When I remove the grouping variable and conduct an analysis on the whole sample, no error is returned.
Can anybody explain why this error is returned?
