Random Subspace Method in R

Any idea on how to implement the "Random Subspace Method" (an ensemble method), as described by Ho (1998), in R?
I can't find a package for it.
Ho, Tin Kam (1998). "The Random Subspace Method for Constructing Decision Forests". IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844.

Practically speaking, this has been "integrated" (kind of) into the Random Forest (RF) algorithm: it is in fact the random selection of features, controlled by the mtry argument in the standard R package randomForest. See the Wikipedia entry on RF, as well as the answer (disclaimer: mine) in the SO thread Why is Random Forest with a single tree much better than a Decision Tree classifier? for more details.
While replicating the exact behavior of said algorithm in the scikit-learn implementation of RF is easy and straightforward (just set bootstrap=False; see the linked thread above), I'll confess that I cannot think of a way to get the same behavior from the randomForest R package, i.e. to "force" it not to use bootstrap sampling, which would make it equivalent to the Random Subspace Method. I have tried the combination of replace=FALSE and sampsize=nrow(x) in the randomForest function, but it doesn't seem to work; the attempted call is sketched below.
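For concreteness, this is the combination just mentioned, as a minimal sketch on the built-in iris data (the mtry value is only illustrative):

library(randomForest)

rf <- randomForest(Species ~ ., data = iris,
                   mtry = 2,                # random feature selection per split
                   replace = FALSE,         # no sampling with replacement...
                   sampsize = nrow(iris))   # ...and use all rows for each tree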
All in all, the message here (and arguably the reason why there is no specific implementation of the method in R or other frameworks) is that you will most probably be better off sticking to Random Forests; if you definitely want to experiment with it, AFAIK the only option seems to be Python and scikit-learn.

Found this function in the caret package:

library(caret)  # provides bag(), bagControl() and ctreeBag (the latter needs the party package installed)

model <- bag(x = iris[, -5], y = iris[, 5], vars = 2,
             bagControl = bagControl(fit = ctreeBag$fit,
                                     predict = ctreeBag$pred,
                                     aggregate = ctreeBag$aggregate),
             trControl = trainControl(method = 'none'))
It supports the vars argument, so each learner considers a random subset of the variables; at the same time, bootstrap sampling can be avoided by passing trControl = trainControl(method = 'none').
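A hypothetical usage follow-up, assuming the model object fitted above:

preds <- predict(model, newdata = iris[, -5])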

Related

mlr3 optimized average of ensemble

I am trying to optimize the averaged prediction of two logistic regressions in a classification task, using a superlearner.
My measure of interest is classif.auc.
The mlr3 help file (?mlr_learners_avg) tells me:
Predictions are averaged using weights (in order of appearance in the
data) which are optimized using nonlinear optimization from the
package "nloptr" for a measure provided in measure (defaults to
classif.acc for LearnerClassifAvg and regr.mse for LearnerRegrAvg).
Learned weights can be obtained from $model. Using non-linear
optimization is implemented in the SuperLearner R package. For a more
detailed analysis the reader is referred to LeDell (2015).
I have two questions regarding this information:
When I look at the source code, I think LearnerClassifAvg$new() defaults to "classif.ce". Is that true?
I think I could set it to classif.auc with param_set$values <- list(measure = "classif.auc", optimizer = "nloptr", log_level = "warn").
The help file refers to the SuperLearner package and LeDell (2015). If I understand it correctly, the "AUC-Maximizing Ensembles through Metalearning" solution proposed in that paper is, however, not implemented in mlr3? Or am I missing something? Could this solution be applied in mlr3? In the mlr3 book I found a paragraph about calling an external optimization function; would that be possible for SuperLearner?
As far as I understand it, LeDell (2015) proposes and evaluates a general strategy that optimizes AUC as a black-box function by learning optimal weights. They do not really propose a best strategy or any concrete defaults, so I looked into the defaults of the SuperLearner package's AUC optimization strategy.
Assuming I understood the paper correctly:
The LearnerClassifAvg basically implements what is proposed in LeDell (2015): it optimizes the weights for any metric using non-linear optimization, with LeDell (2015) focusing on the special case of optimizing AUC. As you rightly pointed out, by setting the measure to "classif.auc" you get a meta-learner that optimizes AUC. The optimization routine used deviates between mlr3pipelines and the SuperLearner package: we use NLOPT_LN_COBYLA, whereas SuperLearner "uses the Nelder-Mead method via the optim function to minimize rank loss" (from the documentation).
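For illustration, a minimal stacking sketch, assuming mlr3, mlr3pipelines and mlr3learners are installed; the sonar task and the use of two plain logistic regressions are placeholders for your own setup:

library(mlr3)
library(mlr3pipelines)
library(mlr3learners)   # provides classif.log_reg

# the averaging meta-learner, set to optimize AUC as discussed above
avg = LearnerClassifAvg$new()
avg$param_set$values = list(measure = "classif.auc",
                            optimizer = "nloptr",
                            log_level = "warn")

# stack two cross-validated logistic regressions and average their
# probability predictions
graph = gunion(list(
    po("learner_cv", lrn("classif.log_reg", predict_type = "prob"), id = "lr1"),
    po("learner_cv", lrn("classif.log_reg", predict_type = "prob"), id = "lr2")
  )) %>>%
  po("featureunion") %>>%
  avg

glrn = GraphLearner$new(graph)
glrn$train(tsk("sonar"))
glrn$model   # contains the learned ensemble weights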
So in order to get exactly the same behaviour, you would need to implement a Nelder-Mead bbotk::Optimizer (similar to here) that simply wraps stats::optim with method Nelder-Mead, and carefully compare settings and stopping criteria. I am fairly confident that NLOPT_LN_COBYLA delivers somewhat comparable results; LeDell (2015) has a comparison of the different optimizers for further reference.
Thanks for spotting the error in the documentation. I agree that the description is a little unclear, and I will try to improve it!

Difference between "mlp" and "mlpML"

I'm using the caret package in R to create prediction models for maximum energy demand. What I need to use is a neural network multilayer perceptron, but in caret I found two mlp methods, "mlp" and "mlpML". What is the difference between the two?
I have read the description in a book (Advanced R Statistical Programming and Data Models: Analysis, Machine Learning, and Visualization), but it still doesn't answer my question.
Caret has 238 different models available! However, many of them are just different methods to call the same basic algorithm.
Besides mlp there are nine other methods for calling a multi-layer perceptron, one of which is mlpML. The real difference is only in the parameters of the function call, and which model you need depends on your use case and what you want to adapt about the basic model.
Chances are, if you don't know what mlpML or mlpWeightDecay, etc., does, you are fine just using the basic mlp.
Looking at the official documentation we can see that:
mlp is tuned via a single parameter, size, while mlpML is tuned via layer1, layer2 and layer3; so with the first method you can only tune the overall size of the multi-layer perceptron, while with the second you can tune each layer individually.
Looking at the source code here:
https://github.com/topepo/caret/blob/master/models/files/mlp.R
and here:
https://github.com/topepo/caret/blob/master/models/files/mlpML.R
it seems that the difference is that mlpML allows several hidden layers:
modelInfo <- list(label = "Multi-Layer Perceptron, with multiple layers",
while mlp has a single layer of hidden units.
The official documentation also hints at this difference. In my opinion, it is not particularly useful to have many different models that differ only very slightly, and the documentation does not explain those slight differences well.
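To make the difference concrete, a minimal sketch of both calls on the iris data; the tuning grids are purely illustrative, both methods use the RSNNS package as their backend, and as far as I can tell from the model file a value of 0 drops the corresponding layer in mlpML:

library(caret)

# "mlp": a single hidden layer, tuned only via `size`
fit_mlp <- train(Species ~ ., data = iris, method = "mlp",
                 tuneGrid = expand.grid(size = c(3, 5, 7)))

# "mlpML": up to three hidden layers, each tuned individually
fit_mlpML <- train(Species ~ ., data = iris, method = "mlpML",
                   tuneGrid = expand.grid(layer1 = c(3, 5),
                                          layer2 = c(0, 3),
                                          layer3 = 0))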

R Supervised Latent Dirichlet Allocation Package

I'm using this LDA package for R. Specifically, I am trying to do supervised latent Dirichlet allocation (sLDA). In the linked package, there is an slda.em function. However, what confuses me is that it asks for alpha, eta and variance parameters. As far as I understand, these parameters are unknowns in the model. So my question is: did the author of the package mean these to be initial guesses for the parameters? If so, there doesn't seem to be a way of accessing them from the result of running slda.em.
Aside from coding the extra EM steps in the algorithm, is there a suggested way to guess reasonable values for these parameters?
Since you are trying to generate a supervised model, the typical approach would be to use cross-validation to determine the model parameters: hold out some of the data as your test set, train a model on the remaining data, and evaluate the model's performance, repeating k times. You then continue to repeat with different model parameters to determine which ones result in the best model performance.
In the specific case of slda, I would run demo(slda) to see the author's implementation of it. When you run the demo, you'll see that he sets alpha = 1.0, eta = 0.1 and variance = 0.25. I'd suggest using these as your starting point, and then using cross-validation to find better parameters if you need to improve model performance.
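For reference, a condensed sketch of what demo(slda) does (the poliblog data ship with the lda package; the iteration counts and starting values follow the demo):

library(lda)

data(poliblog.documents)
data(poliblog.vocab)
data(poliblog.ratings)

num.topics <- 10
# random initial values for the topic regression coefficients
params <- sample(c(-1, 1), num.topics, replace = TRUE)

result <- slda.em(documents = poliblog.documents,
                  K = num.topics,
                  vocab = poliblog.vocab,
                  num.e.iterations = 10,
                  num.m.iterations = 4,
                  alpha = 1.0, eta = 0.1,          # the demo's starting values
                  annotations = poliblog.ratings / 100,
                  params = params,
                  variance = 0.25,
                  lambda = 1.0,
                  logistic = FALSE,
                  method = "sLDA")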

parameters C, epsilon as vectors in kernlab's ksvm in R

I am trying to use the ksvm function of the kernlab package in R for epsilon-SVM regression. I want to supply the parameters C (regularization constant) and epsilon (insensitivity) as vectors (vector length = number of training samples). But I am not able to figure out how to do this. Please suggest some way.
Why do you assume that you can do it? According to the documentation of ksvm, you can only weight classes, not particular samples. Such a modification is available in, for example, the Python library scikit-learn (as sample weights).
To artificially implement per-sample C weights you could oversample your data: duplicate each sample in proportion to its desired weight. This will be very inefficient (especially if you have large differences in C values), but it can be applied to almost any SVM library.
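A minimal sketch of that workaround for eps-SVM regression; the data and the weights here are hypothetical, and the weights must be (or be scaled to) integers:

library(kernlab)

set.seed(1)
x <- as.matrix(mtcars[, c("hp", "wt")])    # illustrative predictors
y <- mtcars$mpg                            # illustrative target
w <- sample(1:3, nrow(x), replace = TRUE)  # desired per-sample C weights

# replicate each row in proportion to its weight, then fit a single-C model
idx <- rep(seq_len(nrow(x)), times = w)
fit <- ksvm(x[idx, ], y[idx], type = "eps-svr", C = 1, epsilon = 0.1)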

How to use a glmnet model as a node model for the mob function (of the R package party)?

I am using the mob function of the R package party. My question concerns the model parameter of this function.
How can I define a StatModel object (from the package modeltools), let's call it glmnetModel, so that the node models of the mob estimation are glmnet models? More precisely, I would like to use the cv.glmnet function as the main estimation function in the fit slot of glmnetModel.
One difficulty is extending the reweight function correctly (and maybe the estfun and deviance functions?), as suggested here (section 2.1).
Does anybody have an idea?
NB: I have seen some extensions (for SVM: here), but I am not able to use them correctly.
Thank you very much!
Dominique
I'm not sure whether the inferential framework of the parameter instability tests in the MOB algorithm holds for either glmnet or svm.
The assumption is that the model's objective function (e.g., residual sum of squares or log-likelihood) is additive in the observations and that the corresponding first order conditions are consequently additive as well. Then, under certain weak regularity conditions, a central limit theorem holds for the parameter estimates. This can be extended to a functional central limit theorem upon which the parameter instability tests in MOB are based.
If these assumptions do not hold, the p-values may not be valid and hence may lead to too many or too few splits or a biased splitting variable selection. Whether or not this happens in practice for the models you are interested in, I don't know. You would have to check this - either theoretically (which may be hard) or in simulation studies.
Technically, the reimplementation of mob() in the partykit package makes it much easier to plug in new models; a lot less glue code (and no S4 classes) is necessary now. See vignette("mob", package = "partykit") for details. A rough sketch of that interface with cv.glmnet is given below.
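This is a rough sketch of the fit-function interface of partykit::mob() with cv.glmnet as the node model. The score contributions (estfun) used here are the unpenalized least-squares scores, ignoring the penalty, so the caveats above about the validity of the parameter instability tests apply in full:

library(partykit)
library(glmnet)

glmnet_fit <- function(y, x, start = NULL, weights = NULL, offset = NULL,
                       ..., estfun = FALSE, object = FALSE) {
  # mob() hands over the regressor matrix x including an intercept column;
  # drop it, because glmnet fits its own (unpenalized) intercept
  xx <- x[, colnames(x) != "(Intercept)", drop = FALSE]
  if (is.null(weights)) weights <- rep(1, length(y))
  cv <- glmnet::cv.glmnet(xx, y, weights = weights)
  beta <- coef(cv, s = "lambda.min")
  res <- as.vector(y - predict(cv, newx = xx, s = "lambda.min"))
  list(
    coefficients = setNames(as.vector(beta), rownames(beta)),
    objfun = sum(res^2),                      # residual sum of squares
    estfun = if (estfun) x * res else NULL,   # crude scores, see note above
    object = if (object) cv else NULL
  )
}

# hypothetical usage: partition a penalized regression of y on x1 and x2
# by the variables z1 and z2
# tree <- mob(y ~ x1 + x2 | z1 + z2, data = d, fit = glmnet_fit)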
