Is there a function within caret (or another package) that can perform a Breusch-Pagan / Cook-Weisberg test for heteroskedasticity on an 'nnet' model trained using caret?
E.g. something similar to library(car); ncvTest or library(lmtest); bptest for lm objects, but that works on nnet objects created from caret?
Example data
library(caret)
set.seed(4)
n <- 100
x1i <- rnorm(n)
x2i <- rnorm(n)
yi <- rnorm(n)
dat <- data.frame(yi, x1i, x2i)
mod <- train(yi ~., data=dat, method="nnet", trace=FALSE, linout=TRUE)
This produces the following plot of fitted values vs. residuals: [plot omitted]
No, there is nothing like that in the package right now.
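That said, the Breusch-Pagan statistic is just an auxiliary regression of the squared residuals on the predictors, so you can compute it by hand. A minimal sketch, assuming the mod and dat objects from above, and that the nnet's in-sample residuals are a reasonable basis for the test:
# Manual Breusch-Pagan-style check on the caret/nnet fit (not a caret API).
# Under homoskedasticity, n * R^2 from the auxiliary regression is
# asymptotically chi-squared with df = number of predictors.
res <- as.vector(residuals(mod$finalModel))  # training residuals of the nnet
aux <- lm(I(res^2) ~ x1i + x2i, data = dat)  # auxiliary regression on predictors
bp  <- nrow(dat) * summary(aux)$r.squared    # LM statistic
pchisq(bp, df = 2, lower.tail = FALSE)       # p-value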
Related
I am interested in running a Random Forest model on a very large dataset. I have been reading about "parallel computing" in an effort to make the code run faster. I came across this post over here (parallel execution of random forest in R) that had some suggestions:
library(randomForest)
library(doMC)
registerDoMC()
x <- matrix(runif(500), 100)
y <- gl(2, 50)
rf <- foreach(ntree=rep(25000, 6), .combine=randomForest::combine,
              .multicombine=TRUE, .packages='randomForest') %dopar% {
  randomForest(x, y, ntree=ntree)
}
I am trying to understand what is happening in the above code - my guess is that 6 Random Forest models (each with 25000 trees) are being fit to the dataset and then combined into a single model?
I started looking into the combine() function in R (https://cran.r-project.org/web/packages/randomForest/randomForest.pdf) - it seems that combine() merges several Random Forest models into a single model (here, I think 3 Random Forest models are being combined into a single model):
data(iris)
rf1 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf2 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf3 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf.all <- combine(rf1, rf2, rf3)
print(rf.all)
My Question: Can someone please confirm if I have understood this correctly? In the above code, are 6 Random Forest models being trained in parallel and then combined into a single model - is this correct?
References:
https://stats.stackexchange.com/questions/519640/parallelizing-random-forest-learning-in-r-changes-the-class-of-the-rf-object
https://rpubs.com/chidungkt/315749
https://www.learnbymarketing.com/724/parallel-processing-r-basics/
Yes, I would say so. foreach's .combine argument takes a function that is applied to combine the results of the iterations - here randomForest::combine, so the six 25000-tree forests are merged into a single forest.
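To see what .combine is doing, here is a conceptually equivalent sequential version (a sketch, using the same x and y as above):
# Fit the six forests one at a time, then merge them exactly as
# .combine=randomForest::combine does inside the foreach loop
fits <- lapply(rep(25000, 6), function(nt) randomForest(x, y, ntree=nt))
rf_seq <- do.call(randomForest::combine, fits)
rf_seq$ntree  # total trees in the merged forest: 6 * 25000 = 150000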
My following code produces accuracy:
data(iris)
x = iris[,-5]
y = iris$Species
train_control <- trainControl(method="LOOCV")
model <- train(x,y, trControl=train_control, method="nb")
But what I wish to get is the following kind of output, with the probability that each observation belongs to each class:
Model=naiveBayes(Species ~., data=iris)
Model
Please include the packages you are using, like:
library(caret)
It looks to me like caret::train with method "nb" uses NaiveBayes (from the klaR package), while naiveBayes is from the e1071 package.
In any case, model$finalModel contains the model object.
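If what you want are the per-class probabilities, caret can return them directly. A minimal sketch, assuming the model object trained above:
# type="prob" asks caret for per-class probabilities instead of hard labels
probs <- predict(model, newdata = x, type = "prob")
head(probs)  # one column per class; each row sums to 1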
Here is my code for randomForest and rfsrc in R. Is there any way to include n_estimators and max_depth like in the sklearn version in my R code? Also, how can I plot an OOB error vs. number of trees plot like this? [example plot omitted]
library(tictoc)           # tic()/toc() timers
library(randomForestSRC)  # rfsrc.fast()
set.seed(2234)
tic("Time to train RFSRC fast")
fast.o <- rfsrc.fast(Label ~ ., data = train[(1:50000),], forest=TRUE)
toc()
print(fast.o)
#print(vimp(fast.o)$importance)
set.seed(2367)
tic("Time to test RFSRC fast ")
#data(breast, package = "randomForestSRC")
fast.pred <- predict(fast.o, test[(1:50000),])
toc()
print(fast.pred)
library(randomForest)  # randomForest(), varImpPlot()
library(caret)         # varImp(), confusionMatrix()
set.seed(3)
tic("RF model fitting without Parallelization")
rf <- randomForest(Label ~ ., data=train[(1:50000),])
toc()
print(rf)
plot(rf)
varImp(rf,sort = T)
varImpPlot(rf, sort=T, n.var= 10, main= "Variable Importance", pch=16)
rf_pred <- predict(rf, newdata=test[(1:50000),])
confMatrix <- confusionMatrix(rf_pred,test[(1:50000),]$Label)
confMatrix
I appreciate your time.
You need to set block.size=1, and also note that the sampling is without replacement; you can check the vignette for rfsrc:
Unlike Breiman's random forests, the default action here is sampling
without replacement. Thus out-of-bag (OOB) technically means
out-of-sample, but for legacy reasons we retain the term OOB.
So using an example dataset,
library(mlbench)
library(randomForestSRC)
data(Sonar)
set.seed(911)
trn = sample(nrow(Sonar),150)
rf <- rfsrc(Class ~ ., data = Sonar[trn,],ntree=500,block.size=1,importance=TRUE)
pred <- predict(rf,Sonar[-trn,],block.size=1)
plot(rf$err.rate[,1], type="l", col="steelblue", xlab="ntrees", ylab="err.rate",
     ylim=c(0, 0.5))
lines(pred$err.rate[,1], col="orange")
legend("topright", fill=c("steelblue","orange"), c("OOB.train","test"))
In randomForest:
library(randomForest)
rf <- randomForest(Class ~ ., data = Sonar[trn,],ntree=500)
pred <- predict(rf,Sonar[-trn,],predict.all=TRUE)
Not very sure if there's an easier way to get the error as a function of the number of trees:
err_by_tree <- sapply(1:ncol(pred$individual), function(i){
  # majority vote over the first i trees for each test observation;
  # sort() before rle() so the longest run is the most frequent vote
  apply(pred$individual[, 1:i, drop=FALSE], 1,
        function(v) with(rle(sort(v)), values[which.max(lengths)]))
})
err_by_tree <- colMeans(err_by_tree != Sonar$Class[-trn])
Then plot:
plot(rf$err.rate[,1], type="l", col="steelblue", xlab="ntrees", ylab="err.rate",
     ylim=c(0, 0.5))
lines(err_by_tree, col="orange")
legend("topright", fill=c("steelblue","orange"), c("OOB.train","test"))
I have created a model in R using the forecast package.
My source of learning this is from here:
https://robjhyndman.com/hyndsight/dailydata/
I am using the last section, which includes Fourier series, as such:
y <- ts(x, frequency=7)
z <- fourier(ts(x, frequency=365.25), K=5)
zf <- fourier(ts(x, frequency=365.25), K=5, h=100)
fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)
After I create this model, is there a way I can do a cross validation k-fold test to determine the error and adjusted error?
I know how to do it with a generalized linear model as such:
library(boot)
lm1 <- glm(ValuePerSqFt ~ Units + SqFt + Boro, data = housing)
lm1cv <- cv.glm(housing, lm1, K=5)
lm1cv$delta
[1] 1870.31 1869.352
This shows the error and adjusted error.
Is there a function in the forecast package that can do this, so I can compare the accuracy of this model with the glm model?
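The forecast package does have tsCV() for rolling-origin (time-series) cross-validation; standard k-fold CV is not appropriate for ordered data. A minimal sketch on the y series from above, refitting auto.arima at each origin and ignoring the xreg terms for simplicity (refitting at every origin can be slow):
library(forecast)
# tsCV() returns the forecast errors at each rolling origin
far <- function(y, h) forecast(auto.arima(y), h = h)
e <- tsCV(y, far, h = 1)
sqrt(mean(e^2, na.rm = TRUE))  # RMSE, comparable across models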
I'm still pretty new to R and AI / ML techniques. I would like to use a neural net for prediction, and since I'm new I would just like to see if this is how it should be done.
As a test case, I'm predicting values of sin(), based on 2 previous values. For training I create a data frame with y = sin(x), x1 = sin(x-1), x2 = sin(x-2), then use the formula y ~ x1 + x2.
It seems to work, but I am just wondering if this is the right way to do it, or if there is a more idiomatic way.
This is the code:
require(quantmod) #for Lag()
require(nnet)
x <- seq(0, 20, 0.1)
y <- sin(x)
te <- data.frame(y, Lag(y), Lag(y,2))
names(te) <- c("y", "x1", "x2")
p <- nnet(y ~ x1 + x2, data=te, linout=TRUE, size=10)
ps <- predict(p, x1=y)
plot(y, type="l")
lines(ps, col=2)
Thanks
[edit]
Is this better for the predict call?
t2 <- data.frame(sin(x), Lag(sin(x)))
names(t2) <- c("x1", "x2")
vv <- predict(p, t2)
plot(vv)
I guess I'd like to see that the nnet is actually working by looking at its predictions (which should approximate a sine wave).
I really like the caret package, as it provides a nice, unified interface to a variety of models, such as nnet. Furthermore, it automatically tunes hyperparameters (such as size and decay) using cross-validation or bootstrap re-sampling. The downside is that all this re-sampling takes some time.
#Load Packages
require(quantmod) #for Lag()
require(nnet)
require(caret)
#Make toy dataset
y <- sin(seq(0, 20, 0.1))
te <- data.frame(y, x1=Lag(y), x2=Lag(y,2))
names(te) <- c("y", "x1", "x2")
#Fit model
model <- train(y ~ x1 + x2, te, method='nnet', linout=TRUE, trace=FALSE,
               #Grid of tuning parameters to try:
               tuneGrid=expand.grid(.size=c(1,5,10), .decay=c(0,0.001,0.1)))
ps <- predict(model, te)
#Examine results
model
plot(y)
lines(ps, col=2)
It also predicts on the proper scale, so you can directly compare results. If you are interested in neural networks, you should also take a look at the neuralnet and RSNNS packages. caret can currently tune nnet and neuralnet models, but does not yet have an interface for RSNNS.
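For example, here is a quick sketch of the same toy fit with neuralnet (assuming the te data frame from above; neuralnet needs complete cases, so the NA rows introduced by Lag() are dropped first, and predict() assumes a recent neuralnet version - older ones use compute() instead):
library(neuralnet)
te2 <- na.omit(te)                  # drop the NA rows from Lag()
nn <- neuralnet(y ~ x1 + x2, data=te2, hidden=10, linear.output=TRUE)
ps2 <- as.vector(predict(nn, te2))  # fitted values on the training data
plot(te2$y, type="l")
lines(ps2, col=3)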
/edit: caret now has an interface for RSNNS. It turns out if you email the package maintainer and ask that a model be added to caret he'll usually do it!
/edit: caret also now supports Bayesian regularization for feed-forward neural networks from the brnn package. Furthermore, caret now also makes it much easier to specify your own custom models, to interface with any neural network package you like!