Unable to train the random forest model in R

I'm trying to train a model on my dataset using R. Following is the code I'm using:

```r
RankFeatureByImportance <- function(logwine_withoutQuality){
  # logwine_withoutQuality$quality <- factor(logwine_withoutQuality$quality)
  # ensure results are repeatable
  set.seed(7)
  # prepare training scheme
  control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
  # train the model
  model <- train(logwine_withoutQuality[, -12],
                 logwine_withoutQuality$quality,
                 method = "lvq", preProcess = "scale", trControl = control)
  # estimate variable importance
  importance <- varImp(model, scale = FALSE)
  # summarize importance
  print(importance)
  # plot importance
  plot(importance)
}
```
But when I run this I get an error like the one below, and I'm unable to understand what the error means.
Following is an image of the dataset I'm using.
There aren't any null values in the data frame.
I'd really appreciate it if someone could kindly help me solve this.
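
One observation worth checking (my note, not part of the original post): "lvq" is a classification method, so train() needs a factor outcome, and the conversion of quality to a factor is commented out in the code above. A minimal check:

```r
# not from the original post: lvq is classification-only, so quality
# must be a factor before train() is called with method = "lvq"
logwine_withoutQuality$quality <- factor(logwine_withoutQuality$quality)
str(logwine_withoutQuality$quality)  # should now show a factor
```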

Related

How can I handle a confusionMatrix error when it says my Data is null

I am trying to run a random forest analysis in R. It works well when I fit the model and predict on the test group, but when I run confusionMatrix it gives me the following error:

```
Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length
```
```r
# load the test and training data
trainData <- read.csv("./pml-training.csv")
testData <- read.csv("./pml-testing.csv")
dim(trainData)
dim(testData)
```

Data cleaning: variables with nearly zero variance or that are almost always NA, and the columns containing summary statistics or irrelevant data, are removed.

```r
trainClean <- trainData[, colMeans(is.na(trainData)) < .9]
trainClean <- trainData[, -c(1:7)]
nvz <- nearZeroVar(trainClean)
trainClean <- trainClean[, -nvz]
dim(trainClean)
```

Split the data into training (70%) and validation (30%):

```r
inTrain <- createDataPartition(y = trainClean$classe, p = 0.7, list = FALSE)
train <- trainClean[inTrain, ]
valid <- trainClean[-inTrain, ]
# create a control for 3-fold cross-validation
control <- trainControl(method = "cv", number = 3, verboseIter = FALSE)
```

Building the models: random forests.

```r
# fit the model on train using random forest
modFit <- train(classe ~ ., data = train, method = "rf", trControl = control,
                tuneLength = 5, na.action = na.omit)
modFit
# predict on the valid data set
modPredict <- predict(modFit, valid, na.action = na.omit)
# turn valid$classe into a factor and check it
valid$classe <- as.factor(valid$classe)
modCM <- confusionMatrix(modPredict, as.factor(valid$classe))
modCM
table(modPredict, valid$classe)
```
When I check the length of modPredict it is 122, while valid$classe has length 5885. If I try dim on modPredict, I get NULL. I have tried using na.action = na.omit on the prediction chunk, and I have also tried not using na.action = na.omit on either the prediction or the fit chunks.
I checked the train and valid data sets after splitting the data, using:

```r
length(train); length(valid); length(valid$classe); nrow(valid); nrow(train)
```

The output is:

```
[1] 94
[1] 94
[1] 5885
[1] 5885
[1] 13737
```
I have been struggling with this problem and similar problems on my decision tree chunk as well. I don't want people to do my homework for me, but I could use a hint.
Thanks in advance
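
A quick diagnostic sketch (my illustration, not from the post): predict() with na.action = na.omit silently drops incomplete rows, so if valid contains NAs the prediction vector will be shorter than nrow(valid), which would match 122 versus 5885:

```r
# not from the original post: count the rows predict() would keep,
# then compare with length(modPredict)
sum(complete.cases(valid))
```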

Model error when using R caret with numeric data

I am new to R, and I got an error while using caret.
```r
# load the libraries
library(mlbench)
library(caret)
# load the dataset
mydata2 <- mydata[1:200, c(52, 56:59)]
mydata2
# prepare training scheme
control <- trainControl(method = "lm", number = 10, repeats = 3)
# train the model
model <- train(MtrRegActNetEngyDailyKwh ~ ., data = mydata2,
               method = "lvq", preProcess = "scale", trControl = control)
# estimate variable importance
importance <- varImp(model, scale = FALSE)
# summarize importance
print(importance)
```
But the result shows nothing, and the plot shows nothing either. Here is an example of my data:

```r
structure(list(MtrRegActNetEngyDailyKwh = c(16.736, 18.093),
               Building = c(6, 6), numberofpeople = c(5, 5),
               pool = c(2, 2), typeofAC = c(1, 1)),
          row.names = 1:2, class = "data.frame")
```

I am not sure why the model does not work. Can I get some help?
Update: I tried the following code, and it works.

```r
model_nnet <- train(trainSetSmall[, predictors], trainSetSmall[, outcomeName], method = 'nnet')
importance <- varImp(model_nnet, scale = FALSE)
plot(importance)
```

I also want to test a 'gbm' model:

```r
model_gbm <- train(trainSetSmall[, predictors], trainSetSmall[, outcomeName], method = 'gbm')
importance2 <- varImp(model_gbm, scale = FALSE)
```

But I got an error message:

```
> importance2 <- varImp(model_gbm, scale=FALSE)
Error in relative.influence(object, n.trees = numTrees) :
  could not find function "relative.influence"
```

I am not sure why it does not work; I just want to try another model. Can I get some help?
As the error states, you are using the wrong kind of model for your data. Learning Vector Quantization is only for classification modelling, not regression modelling, so you need to select a different model for your numeric outcome. See this page of the caret documentation for all the available models in caret, and filter on "Regression" to see all the regression models.
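
For example, here is a minimal sketch that swaps in a regression-capable method (random forest; any caret regression model would do). Note also that trainControl's method expects a resampling scheme such as "repeatedcv", not "lm":

```r
library(caret)
# "lm" is not a resampling method; use, e.g., repeated 10-fold CV
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
# "rf" (random forest) supports regression, unlike "lvq"
model <- train(MtrRegActNetEngyDailyKwh ~ ., data = mydata2,
               method = "rf", preProcess = "scale", trControl = control)
importance <- varImp(model, scale = FALSE)
print(importance)
plot(importance)
```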

Estimate prediction accuracy of a Cox PH model

I would like to develop a Cox proportional hazards model in R, use it to predict on new input, and evaluate the accuracy of the model. For the evaluation I would like to use the Brier score.
```r
# import various packages, needed at some point of the script
library("survival")
library("survminer")
library("prodlim")
library("randomForestSRC")
library("pec")
library("rpart")
library("mlr")
library("Hmisc")
library("ipred")

# load lung cancer data
data("lung")
head(lung)

# recode status variable
lung$status <- lung$status - 1

# delete rows with missing values
lung <- na.omit(lung)

# split data into training and testing: 80% of the sample size
smp_size <- floor(0.8 * nrow(lung))
# set the seed to make the partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(lung)), size = smp_size)
train.lung <- lung[train_ind, ]
test.lung <- lung[-train_ind, ]

# time and failure event
s <- Surv(train.lung$time, train.lung$status)
# create model
cox.ph2 <- coxph(s ~ age + meal.cal + wt.loss, data = train.lung)
# predict
pred <- predict(cox.ph2, newdata = train.lung)
# evaluate
sbrier(s, pred)
```
As an outcome of the prediction I would expect a time (as in, "when does this individual experience failure?"). Instead I get values like this:

```
[1]  0.017576359 -0.135928959 -0.347553969  0.112509137 -0.229301199 -0.131861582  0.044589175  0.002634008
[9]  0.345966978  0.209488560  0.002418358
```

What do these values mean? Furthermore, sbrier does not work; apparently it cannot work with the prediction pred (no surprise there). How do I solve this? How do I make a prediction with cox.ph2, and how can I evaluate the model afterwards?
The predict() function won't return a time value by default; what you are seeing are the linear predictors (type = "lp" is the default). You have to specify the type argument, which takes one of "lp", "risk", "expected", "terms" or "survival". If you want the hazard ratios:

```r
predict(cox.ph2, newdata = test.lung, type = "risk")
```

Note that you want to predict the values on the test set, not the training set.
For predictions in terms of survival time, I have read that you can use AFT models in your case:
https://stats.stackexchange.com/questions/79362/how-to-get-predictions-in-terms-of-survival-time-from-a-cox-ph-model
You can also read this post: Calculate the Survival prediction using Cox Proportional Hazard model in R.
Hope it helps.
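
As for the Brier score itself, here is a minimal sketch of one way to get there (my assumption, not part of the answer above): pec's predictSurvProb() has a coxph method that returns survival probabilities, which ipred's sbrier() can score at a single time point. pec generally wants the model fitted with x = TRUE and y = TRUE:

```r
# a sketch, assuming predictSurvProb (pec) and sbrier (ipred) behave as noted above
cox.ph2 <- coxph(Surv(time, status) ~ age + meal.cal + wt.loss,
                 data = train.lung, x = TRUE, y = TRUE)
eval.time <- median(test.lung$time)                 # pick an evaluation time point
surv.prob <- predictSurvProb(cox.ph2, newdata = test.lung, times = eval.time)
s.test <- Surv(test.lung$time, test.lung$status)
sbrier(s.test, surv.prob[, 1], btime = eval.time)   # Brier score at eval.time
```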

r caretEnsemble - passing a fit param to one specific model in caretList

I have some code which fits several (cross-validated) models to some data, as below.
```r
library(datasets)
library(caret)
library(caretEnsemble)

# load data
data("iris")

# establish cross-validation structure
set.seed(32)
trainControl <- trainControl(method = "repeatedcv",
                             number = 5, repeats = 3,  # 3x 5-fold CV
                             search = "random")

algorithmList <- c('lda',       # Linear Discriminant Analysis
                   'rpart',     # Classification and Regression Trees
                   'svmRadial') # SVM with RBF Kernel

# cross-validate models from algorithmList
models <- caretList(Species ~ ., data = iris, trControl = trainControl, methodList = algorithmList)
```
So far so good. However, if I add 'gbm' to my algorithmList, I get a ton of extraneous log messages, because gbm seems to have a verbose=TRUE default fit param.
According to the caret docs, if I were running train() on method='gbm' by itself (not along with several models trained in a caretList), I could simply add verbose=FALSE to train(), which would flow through to gbm. But this throws an error when I try it in caretList.
So I would like to pass verbose=FALSE (or, in theory, any other fit param) specifically to one particular model from caretList's methodList. How can I accomplish this?
OK, this is actually addressed well in the docs. ?caretList includes:

tuneList: optional, a NAMED list of caretModelSpec objects. This is much more flexible than methodList and allows the specification of model-specific parameters.

And I've confirmed my problem is solved if, instead of:

```r
algorithmList <- c('lda',       # Linear Discriminant Analysis
                   'rpart',     # Classification and Regression Trees
                   'svmRadial', # SVM with RBF Kernel
                   'gbm')       # Gradient-boosted machines
```

I use:

```r
modelTypes <- list(lda       = caretModelSpec(method = "lda"),
                   rpart     = caretModelSpec(method = "rpart"),
                   svmRadial = caretModelSpec(method = "svmRadial"),
                   gbm       = caretModelSpec(method = "gbm", verbose = FALSE))
```

...then the `models <- caretList(...` line goes from:

```r
models <- caretList(... methodList = algorithmList)
```

to:

```r
models <- caretList(... tuneList = modelTypes)
```
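
Putting it together with the setup from the question, the full call would then look like this (a sketch reusing the iris data and trainControl defined above):

```r
# tuneList replaces methodList; everything else stays the same
models <- caretList(Species ~ ., data = iris,
                    trControl = trainControl,
                    tuneList = modelTypes)
```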

Random Forest Predictions

I am looking for some guidance on a homework assignment I am working on for a class. We are given a dataset with 14K observations and are asked to build a prediction model for the last variable, "classe". Using the caret package, I subset the dataset into training and testing sets (4,909 testing observations), pulled out the near-zero-variance variables, and built the model, but when I try to make predictions I only get 97 predictions back. I reviewed the help files but still can't figure out where I am going wrong. Any hints would be appreciated.
Here is the Code:
```r
set.seed(1234)
pml.training <- read.csv("./data/pml-training.csv")

library(caret)
inTrain <- createDataPartition(y = pml.training$classe, p = 0.75, list = FALSE)
training <- pml.training[inTrain, ]
testing <- pml.training[-inTrain, ]

# pull out the near-zero-variance (NZV) variables
nzv <- nearZeroVar(training, saveMetrics = TRUE)
omit <- which(nzv$nzv == TRUE)
training <- training[, -omit]
testing <- testing[, -omit]

# fit the model
modFit <- train(classe ~ ., method = "rf", data = training)
modFit
print(modFit$finalModel)
plot(modFit)

# try to predict on the testing set
pred <- predict(modFit, newdata = testing)
testing$predRight <- pred == testing$classe
print(table(pred, testing$classe))
```
Thanks, Pat C.
Have you checked

```r
sum(complete.cases(subset(testing, select = -classe)))
```

?
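
If that count is smaller than nrow(testing), missing values are the likely culprit: predict() silently drops incomplete rows, so fewer predictions come back than there are test rows. A sketch of one way around it (my illustration, not part of the original answer):

```r
# not from the original answer: drop columns containing NAs in training,
# apply the same selection to testing, then refit and predict
keep <- colSums(is.na(training)) == 0
training <- training[, keep]
testing  <- testing[, keep]
modFit <- train(classe ~ ., method = "rf", data = training)
pred <- predict(modFit, newdata = testing)
length(pred) == nrow(testing)   # should now be TRUE
```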
