Which decision-tree algorithm does the R package randomForest use?

R has a package for random forests, named randomForest. Its manual can be found here. The manual does not mention which decision-tree growing algorithm is used. Is it the ID3 algorithm? Is it something else?
Clarification: I am not asking about the meta-algorithm of the random forest itself. That meta-algorithm uses a base decision-tree algorithm for each of the grown trees. For example, in Python's scikit-learn package, the tree algorithm used is CART (as mentioned here).

Related

Is it possible to build a random forest with model-based trees, i.e., `mob()` in the partykit package?

I'm trying to build a random forest using model-based regression trees from the partykit package. I have built a model-based tree using the mob() function with a user-defined fit() function that returns an object at the terminal node.
In partykit there is cforest(), which uses only ctree()-type trees. I want to know if it is possible to modify cforest(), or to write a new function, that builds random forests from model-based trees that return objects at the terminal nodes. I want to use those objects for predictions. Any help is much appreciated. Thank you in advance.
Edit: The tree I have built is similar to the one here -> https://stackoverflow.com/a/37059827/14168775
How do I build a random forest using a tree similar to the one in the above answer?
At the moment, there is no canned solution for general model-based forests using mob(), although most of the building blocks are available. However, we are currently reimplementing the backend of mob() so that we can leverage the infrastructure underlying cforest() more easily. Also, mob() is quite a bit slower than ctree(), which is somewhat inconvenient when learning forests.
The best alternative, currently, is to use cforest() with a custom ytrafo. These can also accommodate model-based transformations, very much like the scores in mob(). In fact, in many situations ctree() and mob() yield very similar results when provided with the same score function as the transformation.
A worked example is available in this conference presentation:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2017).
"Individual Treatment Effect Prediction Using Model-Based Random Forests."
Presented at Workshop "Psychoco 2017 - International Workshop on Psychometric Computing",
WU Wirtschaftsuniversität Wien, Austria.
URL https://eeecon.uibk.ac.at/~zeileis/papers/Psychoco-2017.pdf
The special case of model-based random forests for individual treatment effect prediction has also been implemented in a dedicated package, model4you, which uses the approach from the presentation above and is available from CRAN. See also:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2019).
"model4you: An R Package for Personalised Treatment Effect Estimation."
Journal of Open Research Software, 7(17), 1-6.
doi:10.5334/jors.219
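For orientation, here is a minimal sketch of the model4you workflow. The toy data and variable names (d, trt, z1, z2) are invented for illustration, so check the package documentation for the exact interface:

library("model4you")

## toy data (invented for illustration): outcome y, binary treatment trt,
## and two covariates z1, z2 that may moderate the treatment effect
set.seed(1)
d <- data.frame(trt = factor(rep(c("A", "B"), each = 100)),
                z1 = rnorm(200), z2 = rnorm(200))
d$y <- ifelse(d$trt == "B", 0.5 * d$z1, 0) + rnorm(200)

## base model capturing the overall treatment effect
bmod <- lm(y ~ trt, data = d)

## model-based forest; by default the remaining variables (here z1, z2)
## are used as potential partitioning variables
frst <- pmforest(bmod, ntree = 50)

## personalised coefficients, i.e. an individual treatment effect
## estimate per observation
head(pmodel(frst))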

R alternatives to JAGS/BUGS

Is there an R package I could use for Bayesian parameter estimation as an alternative to JAGS? I found an old question regarding JAGS/BUGS alternatives in R; however, the last post is already nine years old. Are there newer, flexible Gibbs-sampling packages available in R? I want to use one to obtain parameter estimates for novel hierarchical hidden Markov models with random effects, covariates, etc. I highly value the flexibility of JAGS and think that JAGS is simply great; however, I want to write R functions that facilitate model specification, and I am looking for a package that I can use for parameter estimation.
There are some alternatives:
Stan, with the rstan R package. Stan is well optimized, but it cannot fit certain types of models (such as binomial/Poisson mixture models), because it cannot sample discrete latent variables; these have to be marginalized out of the model.
NIMBLE, which lets you write BUGS-like models and customize the samplers from within R (see the sketch below).
If you want highly optimized sampling based on C++, you may want to check the Rcpp-based solutions from Dirk Eddelbuettel.
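As a starting point, here is a minimal NIMBLE sketch. It fits a toy Gaussian model rather than a hidden Markov model and is only meant to show the basic workflow:

library(nimble)

## BUGS-like model code: a simple normal model with unknown mean and sd
code <- nimbleCode({
  mu ~ dnorm(0, sd = 10)
  sigma ~ dunif(0, 10)
  for (i in 1:N) {
    y[i] ~ dnorm(mu, sd = sigma)
  }
})

set.seed(1)
samples <- nimbleMCMC(code,
                      constants = list(N = 50),
                      data = list(y = rnorm(50, mean = 2, sd = 1)),
                      inits = list(mu = 0, sigma = 1),
                      monitors = c("mu", "sigma"),
                      niter = 5000, nburnin = 1000)
summary(samples)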

Which decision tree algorithm is used in randomForest in R?

I would like to know which decision-tree implementation is used to grow the trees in the randomForest package in R. Is it CART, ID3, C4.5, or something else?
According to ?randomForest, the description states:
randomForest implements Breiman's random forest algorithm (based on Breiman and Cutler's original Fortran code) for classification and regression. It can also be used in unsupervised mode for assessing proximities among data points.
The reference given is: Breiman L (2001). "Random Forests". Machine Learning, 45(1), 5-32. doi:10.1023/A:1010933404324.
According to Wikipedia (https://en.wikipedia.org/wiki/Random_forest):
The introduction of random forests proper was first made in a paper by Leo Breiman. This paper describes a method of building a forest of uncorrelated trees using a CART-like procedure. Reference: Breiman L (2001). "Random Forests". Machine Learning, 45(1), 5-32. doi:10.1023/A:1010933404324.
Therefore I would say it is CART.
In R, the randomForest package uses CART-style trees. There is also another package in R, called ranger, which grows the trees at a faster pace.
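A minimal usage sketch of both packages on the built-in iris data:

library(randomForest)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)
print(rf)  # OOB error estimate and confusion matrix

library(ranger)
rg <- ranger(Species ~ ., data = iris, num.trees = 500)
print(rg)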

Random Subspace Method in R

Any idea on how to implement the "Random Subspace Method" (an ensemble method), as described by Ho (1998), in R?
I can't find a package.
Ho, Tin Kam (1998). "The Random Subspace Method for Constructing Decision Forests". IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844.
Practically speaking, this has been (kind of) integrated into the Random Forest (RF) algorithm: it is in fact the random selection of features controlled by the mtry argument of the standard R package randomForest. See the Wikipedia entry on RF, as well as the answer (disclaimer: mine) in the SO thread Why is Random Forest with a single tree much better than a Decision Tree classifier? for more details.
While replicating the exact behavior of this algorithm in the scikit-learn implementation of RF is straightforward (just set bootstrap=False; see the linked thread above), I'll confess that I cannot think of a way to get the same behavior from the randomForest R package, i.e. to "force" it not to use bootstrap sampling, which would make it equivalent to the Random Subspace method. I have tried the combination of replace=FALSE and sampsize=nrow(x) in the randomForest function, but it doesn't seem to work...
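For concreteness, this is the configuration just described; as said, it does not seem to reproduce the pure random-subspace behaviour:

library(randomForest)
## every tree sees all rows exactly once (no bootstrap resampling),
## while mtry still draws a random feature subset at each split
rf <- randomForest(x = iris[, -5], y = iris[, 5],
                   replace = FALSE, sampsize = nrow(iris), mtry = 2)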
All in all, the message here (and arguably the reason why there is not a specific implementation of the method in R or other frameworks) is that, most probably, you will be better off sticking to Random Forests; if you definitely want to experiment with it, AFAIK the only option seems to be Python and scikit-learn.
Found this function in the caret package:

library(caret)

## bagged conditional-inference trees; vars = 2 makes each learner
## see only a random subset of 2 predictors (the random-subspace part)
model <- bag(x = iris[, -5], y = iris[, 5], vars = 2,
             bagControl = bagControl(fit = ctreeBag$fit,
                                     predict = ctreeBag$pred,
                                     aggregate = ctreeBag$aggregate),
             trControl = trainControl(method = "none"))
It supports the vars argument, so you can consider a random subset of variables for each learner; at the same time, bootstrap sampling can be avoided by passing method = "none" in trainControl().
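Once fitted, the ensemble can be used for prediction like any other caret model:

## class predictions from the bagged ensemble
head(predict(model, newdata = iris[, -5]))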

Adaboosting in R with any classifier

There is an implementation of the AdaBoost algorithm in R. See this link; the function is called boosting.
The problem is that this package uses classification trees as the base (weak) learner.
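For reference, a minimal sketch of that boosting() function, assuming the adabag package is the implementation meant here:

library(adabag)
## AdaBoost with classification trees as the built-in weak learner
fit <- boosting(Species ~ ., data = iris, mfinal = 10)
pred <- predict(fit, newdata = iris)
pred$confusion  # confusion matrix on the training data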
Is it possible to substitute the original weak learner with any other (e.g., an SVM or a neural network) using this package?
If not, are there any examples of AdaBoost implementations in R?
Many thanks!
