Chain Classifiers in R

Is there any way to perform chain classification in a multi-label classification problem? I have created a binary relevance model using the mlr package, which uses one learner per label to achieve this. But the classification models in binary relevance are independent of each other and do not take the inter-dependencies between labels into account.
It would be really helpful if I could perform chain classification alongside the binary relevance method to improve my model.

Multilabel classification with other algorithms, such as classifier chains, is now available in mlr; check out the updated tutorial: http://mlr-org.github.io/mlr-tutorial/release/html/multilabel/index.html
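For reference, a minimal sketch of a classifier chain in mlr, assuming a recent mlr version (yeast.task is a multilabel example task shipped with the package):

library(mlr)

# base binary learner used for every label in the chain
lrn <- makeLearner("classif.rpart", predict.type = "prob")

# wrap it so each classifier in the chain sees the previous labels as features
cc <- makeMultilabelClassifierChainsWrapper(lrn)

mod  <- train(cc, yeast.task)
pred <- predict(mod, task = yeast.task)
performance(pred, measures = list(multilabel.hamloss))

The chain wrapper is what models the label inter-dependencies that binary relevance ignores.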

Related

Is XGBoost effective for variable selection?

I have since understood the use of XGBoost; I realize this was an amateur question.
Can XGBoost be used for variable elimination and selection, like LASSO, or do we need to use LASSO first to eliminate variables and then use XGBoost for the final prediction?
XGBoost is quite effective for prediction in the presence of redundant variables (features), as the underlying gradient boosting algorithm is itself robust to multicollinearity.
That said, it is highly recommended to remove (or engineer away) redundant features from any training dataset, whichever algorithm you choose (LASSO or XGBoost).
Additionally, you can combine the two methods using ensemble learning.
xgboost also has built-in regularization (like LASSO) that is applied during training.
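A minimal sketch of both points, assuming the xgboost package, a numeric feature matrix X, and 0/1 labels y (the penalty values are illustrative, not tuned):

library(xgboost)

dtrain <- xgb.DMatrix(data = X, label = y)
params <- list(objective = "binary:logistic",
               alpha  = 1,   # L1 penalty, analogous to LASSO
               lambda = 1)   # L2 penalty, analogous to ridge
fit <- xgb.train(params, dtrain, nrounds = 100)

# gain-based importance can serve as a rough variable-selection signal
xgb.importance(model = fit)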

AdaBoost in R with any classifier

There is an implementation of the AdaBoost algorithm in R; see this link, the function is called boosting.
The problem is that this package uses classification trees as the base (weak) learner.
Is it possible to substitute the original weak learner with any other (e.g., an SVM or a neural network) using this package?
If not, are there any examples of AdaBoost implementations in R?
Many thanks!
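To sketch the second question: AdaBoost.M1 can be written around any base learner. Below is a minimal, hedged sketch using e1071's svm; since svm does not accept observation weights, the sketch resamples the training data by weight instead. adaboost_svm and predict_adaboost are hypothetical helper names; X is assumed to be a data.frame and y a two-level factor.

library(e1071)

adaboost_svm <- function(X, y, M = 10) {
  n <- nrow(X)
  w <- rep(1 / n, n)                    # uniform initial weights
  models <- vector("list", M)
  alpha  <- numeric(M)
  for (m in seq_len(M)) {
    idx <- sample(n, n, replace = TRUE, prob = w)  # weighted bootstrap
    models[[m]] <- svm(X[idx, , drop = FALSE], y[idx])
    pred <- predict(models[[m]], X)
    err  <- sum(w * (pred != y)) / sum(w)          # weighted training error
    err  <- min(max(err, 1e-10), 1 - 1e-10)        # guard against 0/1 error
    alpha[m] <- 0.5 * log((1 - err) / err)
    w <- w * exp(alpha[m] * (pred != y))           # upweight misclassified cases
    w <- w / sum(w)
  }
  list(models = models, alpha = alpha, levels = levels(y))
}

predict_adaboost <- function(fit, X) {
  # weighted vote: map the two classes to -1/+1 and take the sign of the sum
  score <- rowSums(sapply(seq_along(fit$models), function(m) {
    fit$alpha[m] * ifelse(predict(fit$models[[m]], X) == fit$levels[2], 1, -1)
  }))
  factor(ifelse(score > 0, fit$levels[2], fit$levels[1]), levels = fit$levels)
}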

Boosted trees and Variable Interactions in R

How can one see, in a boosted trees classification model (AdaBoost), which variables interact with each other and how strongly? I would like to do this with the R gbm package if possible.
To estimate interactions between input variables, you can fit interaction terms with standard tools such as lm: http://www.r-bloggers.com/r-tutorial-series-regression-with-interaction-variables/
You can use ?interact.gbm. See also this Cross Validated question, which points to a vignette on a related technique from the dismo package.
In general, these interactions may not necessarily agree with the interaction terms estimated in a linear model.
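For example, a minimal use of interact.gbm to compute Friedman's H-statistic, assuming a data.frame df with a 0/1 outcome y (settings are illustrative):

library(gbm)

fit <- gbm(y ~ ., data = df, distribution = "bernoulli",
           n.trees = 500, interaction.depth = 3)

# H-statistic for the pairwise interaction between the first two predictors;
# values near 0 mean little interaction, values near 1 a strong one
interact.gbm(fit, data = df, i.var = c(1, 2), n.trees = 500)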

R. How to boost the SVM model

I have built an SVM model using an SVM package in R for a classification problem. I got only 87% accuracy, while a random forest produces around 92.4%.
fit.svm <- svm(modelformula, data = training, gamma = 0.01, cost = 1, cross = 5)
I would like to use boosting to tune this SVM model. Can someone help me tune it?
What are the best parameters I can provide to the SVM method?
An example of boosting for an SVM model would be appreciated.
To answer your first question: the e1071 library in R has a built-in tune() function to perform cross-validation, which will help you select the optimal parameters (cost, gamma, kernel). You can also fit an SVM in R with the kernlab package. You may get different results from the two libraries. Let me know if you need any examples.
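For instance, a minimal grid search with e1071's tune, assuming the same modelformula and training data as in the question (the grids are illustrative):

library(e1071)

tuned <- tune(svm, modelformula, data = training,
              ranges = list(gamma = 10^(-3:0), cost = 10^(0:2)),
              tunecontrol = tune.control(cross = 5))
summary(tuned)
tuned$best.model   # SVM refit with the best gamma/cost combination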
You may want to look into the caret package. It lets you pick from various SVM kernels (see its model list) and run parameter sweeps to find the best model.
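A minimal caret sketch under the same assumptions ("svmRadial" is one of several SVM kernels caret exposes via kernlab):

library(caret)

ctrl <- trainControl(method = "cv", number = 5)
fit  <- train(modelformula, data = training, method = "svmRadial",
              trControl = ctrl, tuneLength = 10)
fit$bestTune   # best sigma/C combination found by the sweep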

prediction intervals with caret

I've been using the caret package in R to run some boosted regression tree and random forest models, and I'm hoping to generate prediction intervals for a set of new cases using the built-in cross-validation routine.
The trainControl function allows you to save the hold-out predictions at each of the n folds, but I'm wondering whether unknown cases can also be predicted at each fold using the built-in functions, or whether I need a separate loop to build the models n times.
Any advice much appreciated.
Check out the R package quantregForest, available on CRAN. It can easily calculate prediction intervals for random forest models. There's a nice paper by the package author explaining the background of the method. (Sorry, I can't say anything about prediction intervals for BRT models; I'm looking for those myself...)
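A minimal quantregForest sketch, assuming a predictor matrix Xtrain, a numeric response ytrain, and new cases Xnew (90% intervals via the 5th and 95th percentiles):

library(quantregForest)

qrf <- quantregForest(x = Xtrain, y = ytrain)
predict(qrf, newdata = Xnew, what = c(0.05, 0.95))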
