All Vs All classification with kernlab in R [closed]

I could not find any documentation on how to perform All vs All multi-class classification with the kernlab package in R. Any kind of help would be appreciated.

Apparently the ksvm function of the package does this automatically, as described here.
This is how to use it (quoted from the link above):
svp <- ksvm(xtrain, ytrain, type = "C-svc", kernel = "vanilladot", C = 100, scaled = c())
And this is the comment below:
"Question 12
Test the ability of an SVM to predict the class of the disease from gene expression. Check the influence of the parameters.
Finally, we may want to predict the type and stage of the diseases. We are then confronted with a multi-class problem, since the variable to predict can take more than two values:
y <- ALL$BT
print(y)
Fortunately, kernlab automatically implements multi-class SVM via an all-versus-all strategy that combines several binary SVMs."
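
For reference, here is a minimal sketch (not taken from the linked tutorial) of a multi-class fit with ksvm. It uses the built-in iris data; because the response is a factor with three levels, kernlab builds the pairwise (one-vs-one) binary SVMs for you.

library(kernlab)

data(iris)
set.seed(1)
train_idx <- sample(nrow(iris), 100)

# Three-class response: ksvm trains the all-vs-all binary classifiers internally
svp <- ksvm(Species ~ ., data = iris[train_idx, ],
            type = "C-svc", kernel = "vanilladot", C = 100)

# Predictions on the held-out rows, compared against the true classes
pred <- predict(svp, iris[-train_idx, ])
table(pred, iris$Species[-train_idx])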

Related

Calculating AWE from mclust package [closed]

Is it possible to calculate the Approximate Weight of Evidence (AWE) from information obtained via the mclust R package?
According to the R documentation, a function awe(tree, data) has been available since version R1.1.7.
From the example on the linked page (reproduced here in case the link breaks):
data(iris)
iris.m <- iris[, 1:4]
awe.val <- awe(mhtree(iris.m), iris.m)
plot(awe.val)
Following the formula from Banfield, J. and Raftery, A. (1993), Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821:
-2*model$loglik + model$d*(log(model$n) + 1.5)
where model is the fitted model for the selected number of clusters. Keeping this question up in the hope that it may help someone in the future.
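
If awe() is not available in your version of mclust, here is a minimal sketch of computing the quantity by hand from a Mclust fit, using the formula quoted above; fit$loglik, fit$df, and fit$n stand in for model$loglik, model$d, and model$n.

library(mclust)

data(iris)
fit <- Mclust(iris[, 1:4])  # model-based clustering; picks the best model by BIC

# AWE as given by the formula quoted above, applied to the selected model
awe_val <- -2 * fit$loglik + fit$df * (log(fit$n) + 1.5)
awe_val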

R influence variables on y target [closed]

One task of machine learning / data science is making predictions. But I also want to gain more insight into the variables of my model.
To get more insights, I tried different methods:
Logistic regression (the output provides some insight into the influence of the different variables, see: Checking interpretation of GLM summary in R)
The xgb.plot.importance function applied to a boosted tree model, see picture below (applied to the Titanic data set).
And I saw a great article (which unfortunately no longer works) on how to explain a boosted tree (see: https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211).
My question: are there other methods to give yourself (or, even better, the business) more insight into which variables have an influence on the target variable? And of course: is the influence positive or negative, and how big is it?
You could also try lasso regression (https://stats.stackexchange.com/questions/17251/what-is-the-lasso-in-regression-analysis), which essentially selects the variables that most influence the response variable.
The glmnet package provides support for this type of regression.
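
A minimal sketch of a lasso fit with glmnet (the data frame df and binary response target are placeholders, not from the question): coefficients shrunk exactly to zero drop out, so the remaining non-zero coefficients indicate which variables influence the target and in which direction.

library(glmnet)

# Placeholder data frame `df` with a binary column `target` (0/1 or two-level factor)
x <- model.matrix(target ~ ., data = df)[, -1]  # predictors as a numeric matrix
y <- df$target

# alpha = 1 gives the lasso penalty; cv.glmnet tunes lambda by cross-validation
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Non-zero coefficients = selected variables; sign and size show the direction of influence
coef(cv_fit, s = "lambda.min")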

stepwise regression using caret in R [closed]

I have used the leaps package in R to perform forward and backward feature selection. However, I want to automate the cross-validation and prediction steps. How can I use forward/backward selection in caret?
In the leaps package you could do it this way:
forward <- regsubsets(x ~ ., data, nvmax = 20, method = "forward")
You should be able to run a stepwise regression in caret::train() with method = "glmStepAIC" from the MASS package. For details, see the list of models supported by caret on the caret documentation website.
The caret test cases for this model are accessible on the caret GitHub repository.
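
A minimal sketch of what that could look like (the data frame df and response y are placeholders): train() handles the cross-validation and prediction, while "glmStepAIC" wraps MASS::stepAIC for the stepwise selection.

library(caret)

# 10-fold cross-validation around the stepwise-selected GLM
ctrl <- trainControl(method = "cv", number = 10)

# Placeholder data frame `df` with response column `y`
fit <- train(y ~ ., data = df,
             method = "glmStepAIC",
             trControl = ctrl,
             trace = FALSE)   # passed on to MASS::stepAIC to silence the step output

# Predictions from the final selected model
predict(fit, newdata = df)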

Sampling weights for subpopulations in R [closed]

I'm working with a large, national survey that was collected using complex survey methods. As such, I need to account for sample weights and other survey design features (e.g., sampling strata). I'm new to this methodology, so apologies if the answers here are obvious.
I've had success running path analysis models using the 'lavaan' package paired with the 'lavaan.survey' package. However, some of my models involve only a subset of the data (e.g., only female participants).
How can I adjust the sample weights to reflect the fact that I am only analyzing a subsample (e.g., females)?
The subset() function in the survey package handles subpopulations correctly, and since lavaan.survey uses the survey package to get the basic standard errors for the population covariance matrix, it should all flow through properly.
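
A minimal sketch under assumed column names (psu_id, strata_id, wt, and sex are placeholders): define the full design first, then subset() the design object rather than the data frame, so the design information is retained for the subpopulation.

library(survey)
library(lavaan.survey)

# Full complex design for the whole sample (placeholder variable names)
des <- svydesign(ids = ~psu_id, strata = ~strata_id, weights = ~wt,
                 data = dat, nest = TRUE)

# Subset the DESIGN, not the data frame, so variance estimation stays correct
des_female <- subset(des, sex == "female")

# Then pair the unweighted lavaan fit with the subsetted design:
# fit     <- lavaan::sem(model, data = dat)
# fit_svy <- lavaan.survey(fit, survey.design = des_female)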

gblinear xgboost in R [closed]

Let's say that a data set has both numeric and categorical features, and I've created an xgboost model using gblinear. I've analyzed the model with xgb.importance; how can I express the weights of the categorical variables?
While XGBoost is often considered a black-box model, you can understand feature importance (for both categorical and numeric features) by averaging the gain of each feature over all splits and all trees.
This is represented in the graph below.
# Get the feature real names
names <- dimnames(trainMatrix)[[2]]
# Compute feature importance matrix
importance_matrix <- xgb.importance(names, model = bst)
# Nice graph
xgb.plot.importance(importance_matrix[1:10,])
In the feature importance plot above, we can see the 10 most important features.
This function gives a colour to each bar: a k-means clustering is applied to group the features by importance.
Alternatively, this can be represented in a tree diagram (see the link above).
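
As a rough sketch of the gblinear case specifically (the data frame df and 0/1 response label are placeholders, not from the question): categorical features have to be one-hot encoded before training, so each level becomes its own column and gets its own coefficient, which is what xgb.importance() reports for a linear booster.

library(xgboost)

# Placeholder data frame `df` with a 0/1 response column `label`;
# model.matrix() one-hot encodes the factor (categorical) columns
X <- model.matrix(label ~ . - 1, data = df)
y <- df$label

dtrain <- xgb.DMatrix(data = X, label = y)
bst <- xgb.train(params = list(booster = "gblinear",
                               objective = "binary:logistic"),
                 data = dtrain, nrounds = 50)

# For a linear booster, each one-hot level appears with its own coefficient (weight)
imp <- xgb.importance(colnames(X), model = bst)
head(imp)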
