R gradient boosting wrapper for homebrew functions

Gradient boosting, like AdaBoost, is built around any weak learner. In that sense a "gradient boosting machine" might be thought of as a generic wrapper, and not necessarily a wrapper for a classification and regression tree (CART) model. When I say "CART" here, referring to the regression case, I mean something other than a constant (mean) fit or a linear or generalized linear fit.
Are there any R libraries that allow me to wrap my own functions into sequential gradient boosting? I'm not looking at "better vs. worse", I'm looking for "minimally capable".
I know that there are many packages that build long sequences of gradient-boosted CART models, and I am not looking for those.
Libraries (afaict) that can't do what I want:
gbm - CART
xgboost - CART
GAMboost - B-splines
Notes:
I'm still looking and will report results that I find.
If my question has the wrong tags, and you think of one that is more apropos, please feel free to share.
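To make "minimally capable" concrete, here is roughly the loop I could write by hand for squared-error loss. Everything in it is illustrative (the function names, the loess base learner in the commented example); it is a sketch, not a package.

    ## Generic boosting loop: fit_fun/predict_fun stand in for any homebrew
    ## weak learner and its prediction method.
    boost <- function(x, y, fit_fun, predict_fun, M = 100, nu = 0.1) {
      f0 <- mean(y)                       # constant initial fit
      f  <- rep(f0, length(y))
      models <- vector("list", M)
      for (m in seq_len(M)) {
        r <- y - f                        # negative gradient of squared error
        models[[m]] <- fit_fun(x, r)      # weak learner fit to the residuals
        f <- f + nu * predict_fun(models[[m]], x)
      }
      structure(list(models = models, init = f0, nu = nu,
                     predict_fun = predict_fun), class = "homebrew_boost")
    }

    predict.homebrew_boost <- function(object, newx, ...) {
      out <- rep(object$init, NROW(newx))
      for (m in object$models) out <- out + object$nu * object$predict_fun(m, newx)
      out
    }

    ## Example with a homebrew base learner (a loess smoother on the residuals):
    # fit_fun     <- function(x, r) loess(r ~ x, span = 0.75)
    # predict_fun <- function(model, newx) predict(model, newdata = newx)
    # bst  <- boost(x, y, fit_fun, predict_fun, M = 50)
    # yhat <- predict(bst, x)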

Related

mlr3 optimized average of ensemble

I am trying to optimize the averaged prediction of two logistic regressions in a classification task using a super learner.
My measure of interest is classif.auc.
The mlr3 help file (?mlr_learners_avg) tells me:
Predictions are averaged using weights (in order of appearance in the data) which are optimized using nonlinear optimization from the package "nloptr" for a measure provided in measure (defaults to classif.acc for LearnerClassifAvg and regr.mse for LearnerRegrAvg). Learned weights can be obtained from $model. Using non-linear optimization is implemented in the SuperLearner R package. For a more detailed analysis the reader is referred to LeDell (2015).
I have two questions regarding this information:
When I look at the source code, I think LearnerClassifAvg$new() defaults to "classif.ce"; is that true?
I think I could set it to classif.auc with param_set$values <- list(measure = "classif.auc", optimizer = "nloptr", log_level = "warn").
The help file refers to the SuperLearner package and LeDell (2015). If I understand it correctly, the "AUC-Maximizing Ensembles through Metalearning" solution proposed in that paper is, however, not implemented in mlr3? Or am I missing something? Could this solution be applied in mlr3? In the mlr3 book I found a paragraph about calling an external optimization function; would that be possible for SuperLearner?
As far as I understand it, LeDell2015 proposes and evaluates a general strategy that optimizes AUC as a black-box function by learning optimal weights. They do not really propose a single best strategy or any concrete defaults, so I looked into the defaults of the SuperLearner package's AUC optimization strategy.
Assuming I understood the paper correctly:
The LearnerClassifAvg basically implements what is proposed in LeDell2015, namely it optimizes the weights for any metric using non-linear optimization. LeDell2015 focuses on the special case of optimizing AUC. As you rightly pointed out, by setting the measure to "classif.auc" you get a meta-learner that optimizes AUC. The default optimization routine differs between mlr3pipelines and the SuperLearner package: we use NLOPT_LN_COBYLA, whereas SuperLearner ... "uses the Nelder-Mead method via the optim function to minimize rank loss" (from its documentation).
So in order to get exactly the same behaviour, you would need to implement a Nelder-Mead bbotk::Optimizer similar to here that simply wraps stats::optim with method Nelder-Mead, and carefully compare settings and stopping criteria. I am fairly confident that NLOPT_LN_COBYLA delivers somewhat comparable results; LeDell2015 includes a comparison of the different optimizers for further reference.
Thanks for spotting the error in the documentation. I agree that the description is a little unclear and I will try to improve it!
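In code, the AUC-optimizing setup discussed above amounts to roughly the following. The parameter names and values are the ones from the question; I have not re-verified every default, so treat it as a sketch.

    library(mlr3pipelines)   # provides LearnerClassifAvg (?mlr_learners_avg)

    # Construct the averaging learner and switch the optimized measure to AUC.
    avg <- LearnerClassifAvg$new()
    avg$param_set$values <- list(
      measure   = "classif.auc",
      optimizer = "nloptr",
      log_level = "warn"
    )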

R: Evaluate Gradient Boosting Machines (GBM) for Regression

Which are the best metrics to evaluate the fit of a GBM model in R (metrics, graphs, ratios)? And how should I interpret them?
I think maybe you are overthinking this one! Take a step back and think about what matters: the error. You have forecasted values and you have observed values; the difference tells you most of what you need to know when comparing across models. Basic measures like MSE, MPE, etc. should do fine. If you are looking to refine within a given model, I would recommend taking a look at the gbm documentation. For example, you can pass your gbm model object to summary() to get the relative influence of each of your variables. Additionally, you can find a lot of information in the documentation, so if you haven't taken a look, I would recommend doing so! I have posted the link at the bottom.
-Carmine
gbm_documentation
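For concreteness, a minimal sketch of the basic error measures and the relative-influence summary mentioned above. The objects fit (a fitted gbm model), test (a held-out data frame), and its response column y are placeholders, not names from the question.

    library(gbm)

    pred <- predict(fit, newdata = test, n.trees = fit$n.trees)

    mse  <- mean((test$y - pred)^2)    # mean squared error
    rmse <- sqrt(mse)                  # same units as the response
    mae  <- mean(abs(test$y - pred))   # mean absolute error

    # Relative influence of each predictor, via summary() on the gbm object.
    summary(fit, plotit = FALSE)

    # An observed-vs-predicted plot often tells you the rest.
    plot(test$y, pred); abline(0, 1)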

Is it possible to build a random forest with model based trees i.e., `mob()` in partykit package

I'm trying to build a random forest using model-based regression trees from the partykit package. I have built a model-based tree using the mob() function with a user-defined fit() function which returns an object at the terminal node.
In partykit there is cforest(), which uses only ctree()-type trees. I want to know if it is possible to modify cforest(), or to write a new function, to build random forests from model-based trees that return objects at the terminal nodes. I want to use the objects in the terminal nodes for predictions. Any help is much appreciated. Thank you in advance.
Edit: The tree I have built is similar to the one here -> https://stackoverflow.com/a/37059827/14168775
How do I build a random forest using a tree similar to the one in above answer?
At the moment, there is no canned solution for general model-based forests using mob() although most of the building blocks are available. However, we are currently reimplementing the backend of mob() so that we can leverage the infrastructure underlying cforest() more easily. Also, mob() is quite a bit slower than ctree() which is somewhat inconvenient in learning forests.
The best alternative, currently, is to use cforest() with a custom ytrafo. These can also accommodate model-based transformations, very much like the scores in mob(). In fact, in many situations ctree() and mob() yield very similar results when provided with the same score function as the transformation.
A worked example is available in this conference presentation:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2017).
"Individual Treatment Effect Prediction Using Model-Based Random Forests."
Presented at Workshop "Psychoco 2017 - International Workshop on Psychometric Computing",
WU Wirtschaftsuniversität Wien, Austria.
URL https://eeecon.uibk.ac.at/~zeileis/papers/Psychoco-2017.pdf
The special case of model-based random forests for individual treatment effect prediction was also implemented in a dedicated package model4you that uses the approach from the presentation above and is available from CRAN. See also:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2019).
"model4you: An R Package for Personalised Treatment Effect Estimation."
Journal of Open Research Software, 7(17), 1-6.
doi:10.5334/jors.219
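For illustration, a rough sketch of the model4you route. The function names pmforest() and pmodel() and the treatment-model setup reflect my reading of that package's interface, so check its documentation before relying on this; the data frame dat and the variables y and trt are placeholders.

    library(model4you)

    # Base model: outcome regressed on a treatment indicator.
    base_mod <- lm(y ~ trt, data = dat)

    # Grow a forest of model-based trees around that base model.
    frst <- pmforest(base_mod, data = dat, ntree = 100)

    # Personalised (per-observation) coefficients from the forest, which play
    # the role of the objects returned at the terminal nodes.
    coefs <- pmodel(frst, newdata = dat)
    head(coefs)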

Comparison of Different Types of Nonlinear Regression Models

Thank you for reading this post.
Various regression models are being applied to estimate a curve (an actually measured ventilation rate).
I compared GLM and GAM models, including polynomial regression, using R.
Are there any other types of regression models that can reproduce that curve well?
Like... Bayesian methods? (In fact, I'm not even sure whether they can be applied here.)
Sincerely.
Loads of nonlinear methods exist, and many would give similar results, but this is a statistics question; it would fit better on Cross Validated. Also, you have to say what you want: interpolation, or even extrapolation? What is your analysis for?
Bayesian methods are used when you have knowledge of the phenomenon prior to the data, or in some cases when you want to apply regularization or graphical models to the data-generating process; I think you are better off leaving Bayesian statistics out here.
Edit:
To be short: if you want a readable formulation of the curve, and you don't have any specific mathematical model in mind, go for a polynomial fit. Other popular choices, which are better for merely plotting the curve rather than reporting it as a mathematical expression, are smoothing splines and LOESS. For further details, maybe ask a new question on stats.stackexchange.com after studying the alternatives a bit more.
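To make those suggestions concrete, a short sketch in R. The data frame d with predictor x and response vent (the measured ventilation rate) is a placeholder, and the polynomial degree and LOESS span are arbitrary choices, not recommendations.

    fit_poly   <- lm(vent ~ poly(x, 3), data = d)        # cubic polynomial
    fit_spline <- smooth.spline(d$x, d$vent)             # smoothing spline
    fit_loess  <- loess(vent ~ x, data = d, span = 0.5)  # LOESS

    # Compare the fitted curves on a grid.
    grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 200))
    plot(d$x, d$vent)
    lines(grid$x, predict(fit_poly, grid), col = "blue")
    lines(predict(fit_spline, grid$x), col = "red")
    lines(grid$x, predict(fit_loess, grid), col = "darkgreen")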

Weights argument in R gbm function

What is the weights argument for in the R gbm function? Does it implement cost-sensitive stochastic gradient boosting?
You may have already read this, but the documentation says that the weights parameter is defined in this way:
an optional vector of weights to be used in the fitting process. Must be positive but do not need to be normalized. If keep.data=FALSE in the initial call to gbm then it is the user's responsibility to resupply the weights to gbm.more.
Thus my interpretation would be that they are standard observation weights as in any statistical model.
Is it cost-sensitive? Good question. I first noticed that one of the main citations for the package is:
B. Kriegler (2007). Cost-Sensitive Stochastic Gradient Boosting Within a Quantitative Regression Framework.
so I figured it does imply cost-sensitivity, but there is no explicit use of that term in the vignette, so it was not immediately apparent.
I did a little bit of a deeper dive though and found some more resources. You can find the equations describing the weights towards the end of this article which describes the package.
I also found this question being asked way back in 2009 on a mailing list, and while there was no response there, I finally found a scholarly article discussing the use of gbm and other R packages for cost-sensitive gradient boosting.
The conclusion is that gbm's quantile loss function is differentiable and can be used in cost-sensitive applications in which over- and under-estimation have different error costs; however, other loss functions (aside from quantile) may be necessary or appropriate in some applications of cost-sensitive gradient boosting.
That paper is centered around gbm but also discusses other packages and if your focus is on cost-sensitive gradient boosting then you may want to look at the others they mention in the paper as well.
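To illustrate both points, a hedged sketch: observation weights passed to gbm plus the quantile loss discussed above. The weighting scheme, column names, and tuning values are placeholders, not recommendations.

    library(gbm)

    # Placeholder weighting scheme: upweight rows flagged as important.
    w <- ifelse(train$important == 1, 5, 1)

    # Quantile loss with alpha > 0.5 penalises under-prediction more heavily,
    # which is one way to encode asymmetric error costs.
    fit <- gbm(
      y ~ .,
      data              = train,
      distribution      = list(name = "quantile", alpha = 0.75),
      weights           = w,
      n.trees           = 1000,
      shrinkage         = 0.01,
      interaction.depth = 3
    )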
