Test for Heteroscedasticity on a caret nnet model - r

Is there a function within caret (or another package) that can perform a Breusch-Pagan / Cook-Weisberg test for heteroskedasticity on an 'nnet' model trained using caret?
E.g. something similar to library(car); ncvTest or library(lmtest); bptest for lm objects, but that works on nnet objects created from caret?
Example data
library(caret)
set.seed(4)
n <- 100
x1i <- rnorm(n)
x2i <- rnorm(n)
yi <- rnorm(n)
dat <- data.frame(yi, x1i, x2i)
mod <- train(yi ~., data=dat, method="nnet", trace=FALSE, linout=TRUE)
This produces the plot of fitted vs residuals:

No there is not anything like that in the package right now.

Related

Understanding the "Combine" Function

I am interested in running a Random Forest model on a very large dataset. I have been reading about "parallel computing" in an effort to make the code run faster. I came across this post over here (parallel execution of random forest in R) that had some suggestions:
library(randomForest)
library(doMC)
registerDoMC()
x <- matrix(runif(500), 100)
y <- gl(2, 50)
rf <- foreach(ntree=rep(25000, 6), .combine=randomForest::combine,
.multicombine=TRUE, .packages='randomForest') %dopar% {
randomForest(x, y, ntree=ntree)
}
I am trying to understand what is happening in the above code - my guess is that perhaps 6 Random Forest models (with each Random Forest Model having 25000 trees) are being fit to dataset and then combined into a single model?
I started looking into the "combine()" function in R (https://cran.r-project.org/web/packages/randomForest/randomForest.pdf) - it seems that the "combine()" function is combining several Random Forest models into a single model (here, I think 3 Random Forest models are being combined into a single model):
data(iris)
rf1 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf2 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf3 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf.all <- combine(rf1, rf2, rf3)
print(rf.all)
My Question: Can someone please confirm if I have understood this correctly? In the above code, are 6 Random Forest models being trained in parallel and then combined into a single model - is this correct?
References:
https://stats.stackexchange.com/questions/519640/parallelizing-random-forest-learning-in-r-changes-the-class-of-the-rf-object
https://rpubs.com/chidungkt/315749
https://www.learnbymarketing.com/724/parallel-processing-r-basics/
Yes, I would say yes. foreach's .combine=arguments takes the function given for it to apply on the results the combination.

How to r un loocv in R with naivebayes model?

My following output produce accuracy:
data(iris)
x = iris[,-5]
y = iris$Species
train_control <- trainControl(method="LOOCV")
model <- train(x,y, trControl=train_control, method="nb")
But what i wish to get is the following output with probability each class belong to:
Model=naiveBayes(Species ~., data=iris)
Model
Please include the packages you are using, like:
library(caret)
It looks to me like caret::train method "nb" uses NaiveBayes (from klaR package), while naiveBayes is from the e1071 package.
In any case model$finalmodel contains the model object.

Depth and OOB error of a randomForest and randomForestSRC

Here is my code for random forest and rfsrc in R; Is there anyway to include n_estimators and max_depth like sklearn version in my R code ? Also, How can I plot OBB error vs number of trees plot like this?
set.seed(2234)
tic("Time to train RFSRC fast")
fast.o <- rfsrc.fast(Label ~ ., data = train[(1:50000),],forest=TRUE)
toc()
print(fast.o)
#print(vimp(fast.o)$importance)
set.seed(2367)
tic("Time to test RFSRC fast ")
#data(breast, package = "randomForestSRC")
fast.pred <- predict(fast.o, test[(1:50000),])
toc()
print(fast.pred)
set.seed(3)
tic("RF model fitting without Parallelization")
rf <-randomForest(Label~.,data=train[(1:50000),])
toc()
print(rf)
plot(rf)
varImp(rf,sort = T)
varImpPlot(rf, sort=T, n.var= 10, main= "Variable Importance", pch=16)
rf_pred <- predict(rf, newdata=test[(1:50000),])
confMatrix <- confusionMatrix(rf_pred,test[(1:50000),]$Label)
confMatrix
I appreciate your time.
You need to set block.size=1 , and also take note the sampling is without replacement, you can check the vignette for rfsrc:
Unlike Breiman's random forests, the default action here is sampling
without replacement. Thus out-of-bag (OOB) technically means
out-of-sample, but for legacy reasons we retain the term OOB.
So using an example dataset,
library(mlbench)
library(randomForestSRC)
data(Sonar)
set.seed(911)
trn = sample(nrow(Sonar),150)
rf <- rfsrc(Class ~ ., data = Sonar[trn,],ntree=500,block.size=1,importance=TRUE)
pred <- predict(rf,Sonar[-trn,],block.size=1)
plot(rf$err.rate[,1],type="l",col="steelblue",xlab="ntrees",ylab="err.rate",
ylim=c(0,0.5))
lines(pred$err.rate[,1],col="orange")
legend("topright",fill=c("steelblue","orange"),c("test","OOB.train"))
In randomForest:
library(randomForest)
rf <- randomForest(Class ~ ., data = Sonar[trn,],ntree=500)
pred <- predict(rf,Sonar[-trn,],predict.all=TRUE)
Not very sure if there's an easier to get ntrees error:
err_by_tree = sapply(1:ncol(pred$individual),function(i){
apply(pred$individual[,1:i,drop=FALSE],1,
function(i)with(rle(i),values[which.max(lengths)]))
})
err_by_tree = colMeans(err_by_tree!=Sonar$Class[-trn])
Then plot:
plot(rf$err.rate[,1],type="l",col="steelblue",xlab="ntrees",ylab="err.rate",
ylim=c(0,0.5))
lines(err_by_tree,col="orange")
legend("topright",fill=c("steelblue","orange"),c("test","OOB.train"))

Cross Validation K-Fold with Forecast Package in R

I have created a model in R using the forecast package.
My source of learning this is from here:
https://robjhyndman.com/hyndsight/dailydata/
I am using the last section which includes fourier series as such:
y <- ts(x, frequency=7)
z <- fourier(ts(x, frequency=365.25), K=5)
zf <- fourier(ts(x, frequency=365.25), K=5, h=100)
fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)
After I create this model, is there a way I can do a cross validation k-fold test to determine the error and adjusted error?
I know how to do it with a generalized linear model as such:
library(boot)
lm1 <- glm(ValuePerSqFt ~ Units + SqFt + Boro, data = housing)
lm1cv <- cv.glm(housing, lm1, K=5)
lm1cv$delta
[1] 1870.31 1869.352
This shows the error and adjusted error.
Is there a function in the forecast package that can do this and it will help me compare the accuracy of this model with the glm model?

Using nnet for prediction, am i doing it right?

I'm still pretty new to R and AI / ML techniques. I would like to use a neural net for prediction, and since I'm new I would just like to see if this is how it should be done.
As a test case, I'm predicting values of sin(), based on 2 previous values. For training I create a data frame withy = sin(x), x1 = sin(x-1), x2 = sin(x-2), then use the formula y ~ x1 + x2.
It seems to work, but I am just wondering if this is the right way to do it, or if there is a more idiomatic way.
This is the code:
require(quantmod) #for Lag()
requre(nnet)
x <- seq(0, 20, 0.1)
y <- sin(x)
te <- data.frame(y, Lag(y), Lag(y,2))
names(te) <- c("y", "x1", "x2")
p <- nnet(y ~ x1 + x2, data=te, linout=TRUE, size=10)
ps <- predict(p, x1=y)
plot(y, type="l")
lines(ps, col=2)
Thanks
[edit]
Is this better for the predict call?
t2 <- data.frame(sin(x), Lag(sin(x)))
names(t2) <- c("x1", "x2")
vv <- predict(p, t2)
plot(vv)
I guess I'd like to see that the nnet is actually working by looking at its predictions (which should approximate a sin wave.)
I really like the caret package, as it provides a nice, unified interface to a variety of models, such as nnet. Furthermore, it automatically tunes hyperparameters (such as size and decay) using cross-validation or bootstrap re-sampling. The downside is that all this re-sampling takes some time.
#Load Packages
require(quantmod) #for Lag()
require(nnet)
require(caret)
#Make toy dataset
y <- sin(seq(0, 20, 0.1))
te <- data.frame(y, x1=Lag(y), x2=Lag(y,2))
names(te) <- c("y", "x1", "x2")
#Fit model
model <- train(y ~ x1 + x2, te, method='nnet', linout=TRUE, trace = FALSE,
#Grid of tuning parameters to try:
tuneGrid=expand.grid(.size=c(1,5,10),.decay=c(0,0.001,0.1)))
ps <- predict(model, te)
#Examine results
model
plot(y)
lines(ps, col=2)
It also predicts on the proper scale, so you can directly compare results. If you are interested in neural networks, you should also take a look at the neuralnet and RSNNS packages. caret can currently tune nnet and neuralnet models, but does not yet have an interface for RSNNS.
/edit: caret now has an interface for RSNNS. It turns out if you email the package maintainer and ask that a model be added to caret he'll usually do it!
/edit: caret also now supports Bayesian regularization for feed-forward neural networks from the brnn package. Furthermore, caret now also makes it much easier to specify your own custom models, to interface with any neural network package you like!

Resources