I am training an SVM classifier. Right now I have about 4000 features, but many of them are redundant or uninformative. I want to reduce the feature set to roughly 20-50 features using greedy hill climbing, removing one feature at a time.
The removed feature should be the least important one. After training an SVM, how do I get a ranking of feature importance? If I am using libsvm in R, how do I get the weight of each feature, or some other similar indicator of importance? Thanks!
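For a linear kernel, one common importance indicator is the weight vector of the separating hyperplane, which can be recovered from an e1071 fit (e1071 wraps libsvm in R). A minimal sketch on made-up data, assuming the e1071 package and a linear kernel; note the |weight| heuristic does not carry over to RBF or polynomial kernels:

```r
library(e1071)

# Toy data: features 1 and 2 are informative, the other 8 are noise
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 10), n, 10)
y <- factor(ifelse(x[, 1] + 2 * x[, 2] + rnorm(n, sd = 0.5) > 0, "a", "b"))

model <- svm(x, y, kernel = "linear", scale = FALSE)

# For a linear kernel the hyperplane weights are w = t(coefs) %*% SVs;
# |w_j| serves as a rough importance score for feature j
w <- t(model$coefs) %*% model$SVs
importance <- abs(as.vector(w))

# Rank features; in backward elimination you would drop the lowest-ranked
# feature, retrain, and repeat
order(importance, decreasing = TRUE)
```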
I would reduce the dimensionality of the problem first using PCA (Principal Component Analysis), then apply the SVM. See, e.g., Andrew Ng's lecture videos.
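The PCA-then-SVM pipeline suggested here can be sketched with base R's prcomp plus e1071; a rough sketch on toy data, not a tuned recipe (the number of components k is an arbitrary illustrative choice):

```r
library(e1071)

set.seed(1)
x <- matrix(rnorm(200 * 50), 200, 50)
y <- factor(ifelse(x[, 1] - x[, 2] + rnorm(200) > 0, "a", "b"))

# PCA on the training features (centred and scaled)
pc <- prcomp(x, center = TRUE, scale. = TRUE)
k  <- 20                      # number of components to keep (illustrative)
x_red <- pc$x[, 1:k]

model <- svm(x_red, y)

# New data must be projected with the SAME rotation before predicting
x_new     <- matrix(rnorm(5 * 50), 5, 50)
x_new_red <- predict(pc, x_new)[, 1:k]
pred      <- predict(model, x_new_red)
```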
I have built an SVM-RBF model in R using caret. Is there a way of plotting the decision boundary?
I know it is possible with other R packages, but unfortunately I'm forced to use caret because it is the only package I have found that lets me calculate variable importance.
Alternatively, can you suggest a package that plots decision boundaries AND also gives variable importance?
Thank you very much
First of all, unlike some other methods, an SVM does not natively produce feature importances. In your case, the importance score caret reports is calculated independently of the model itself: https://topepo.github.io/caret/variable-importance.html#model-independent-metrics
Second, the decision boundary (or hyperplane) you see in most textbook examples comes from a toy problem with only two or three features. If you have more than three features, visualizing this hyperplane is not trivial.
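For completeness, the model-independent score caret reports for an SVM can be pulled with varImp(); a sketch on made-up data, assuming the caret package (and kernlab, which backs method = "svmRadial"):

```r
library(caret)

set.seed(1)
d <- data.frame(matrix(rnorm(100 * 5), 100, 5))
d$y <- factor(ifelse(d$X1 + rnorm(100) > 0, "yes", "no"))

fit <- train(y ~ ., data = d, method = "svmRadial",
             trControl = trainControl(method = "cv", number = 5))

# For SVMs caret falls back to a model-independent (filter) score
varImp(fit)
```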
I have a dataset containing 13 features and a column which represents the class.
I want to do a binary classification based on the features, but I am using a method which can work only with 2 features. So I need to reduce the features to 2 columns.
My problem is that some of my features are real valued like age, heart rate and blood pressure and some of them are categorical like type of the chest pain etc.
Which method of dimensionality reduction suits my work?
Is PCA a good choice?
If so, how can I use PCA for my categorical features?
I work with R.
You can encode the categorical features as numbers, for example 1 for cat, 2 for dog, and so on.
PCA is a useful dimensionality-reduction method, but it assumes linear structure in the data; you can simply try it and inspect the result. Kernel PCA handles nonlinear data, so you can try that as well.
Other methods include LLE, Isomap, CCA, LDA, ... You can try these and pick whichever gives the best result.
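If you do try PCA, one option worth knowing is dummy (one-hot) encoding via base R's model.matrix, which avoids imposing an artificial ordering on integer category codes; a base-R sketch with hypothetical column names echoing the question:

```r
set.seed(1)
d <- data.frame(
  age        = rnorm(100, 50, 10),
  heart_rate = rnorm(100, 75, 12),
  chest_pain = factor(sample(c("typical", "atypical", "none"), 100,
                             replace = TRUE))
)

# Dummy-encode the factor columns, dropping the intercept
X <- model.matrix(~ . - 1, data = d)

# Scale so binary dummies and real-valued columns are comparable
pc <- prcomp(X, center = TRUE, scale. = TRUE)
d2 <- pc$x[, 1:2]   # the 2-feature representation for your method
```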
Check H2O library for GLRM models (link to docs). It can handle categorical variables.
If that does not work for you, target encoding techniques could be useful before applying PCA.
You can try using CatBoost (https://catboost.ai, https://github.com/catboost/catboost) - a new gradient boosting library with good handling of categorical features.
First of all, sorry about my English; I am Brazilian and still improving it.
I have a hierarchical dataset which I have been using to build flat classification models (NaiveBayes, JRip, J48, SVM)...
For example:
> model <- svm(family ~ ., data = train)
> pred <- predict(model, test[, -ncol(test)])
And then I calculated Precision, Recall and F-measure, ignoring the fact that the dataset is organized hierarchically.
However, now I want to explore the fact that it is hierarchical and obtain different models and results. So what should I do? Considering the same ML algorithms (NaiveBayes, JRip, J48, SVM), how do I create the models? Should I change or include new parameters? Or should I continue as shown in the code before, and just use hierarchical Precision, hierarchical Recall and hierarchical F-measure as evaluation metrics? If so, is there any specific package?
Thanks!
Is it possible to train Hidden Markov Model in R?
I have a set of observations with its corresponding labels. And I need to train HMM in order to get the Markov parameters (i.e. the transition probabilities matrix, emission probabilities matrix and initial distribution). So, I can predict for the future observations.
In other words, I need the opposite of the forward-backward algorithm.
Yes, you can. R is a good tool for simulation and statistical analysis, and there are many nice packages available. You do not need to implement anything yourself (though of course you can); just learn to use them.
An example of using a package.
An example of implementing an HMM is here, where a DNA sequence is modeled with an HMM.
A similar question has been asked here as well.
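As a concrete sketch, the HMM package (one option among several; depmixS4 is another) estimates the transition and emission matrices with Baum-Welch from an initial guess. The states, symbols, and starting matrices below are made up for illustration:

```r
library(HMM)

# Two hidden states, two observable symbols (all values illustrative)
hmm0 <- initHMM(States  = c("Rainy", "Sunny"),
                Symbols = c("walk", "shop"),
                startProbs    = c(0.5, 0.5),
                transProbs    = matrix(c(0.6, 0.4,
                                         0.4, 0.6), 2, byrow = TRUE),
                emissionProbs = matrix(c(0.3, 0.7,
                                         0.7, 0.3), 2, byrow = TRUE))

obs <- c("walk", "walk", "shop", "walk", "shop", "shop", "walk")

# Baum-Welch (EM) re-estimates the transition and emission matrices
fit <- baumWelch(hmm0, obs, maxIterations = 50)
fit$hmm$transProbs
fit$hmm$emissionProbs

# Most likely hidden-state sequence for an observation sequence
viterbi(fit$hmm, obs)
```

Since you already have labels for the hidden states, you can alternatively estimate the matrices directly by counting labelled transitions and emissions (the supervised case), which requires no EM at all.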
Let Y be a binary variable.
If we use logistic regression for modeling, then we can use cv.glm for cross-validation, where we can specify the cost function via the cost argument. By specifying the cost function, we can assign different unit costs to the different types of errors: predicted Yes when the reference is No, or predicted No when the reference is Yes.
I am wondering if I could achieve the same in SVM. In other words, is there a way for me to specify a cost(loss) function instead of using built-in loss function?
Besides the answer by Yueguoguo, there are three more solutions: the standard wrapper approach, hyperplane tuning, and the one built into e1071.
The wrapper approach (available out of the box in, for example, Weka) is applicable to almost all classifiers. The idea is to over- or undersample the data in accordance with the misclassification costs. The learned model, if trained to optimise accuracy, is then optimal under those costs.
The second idea is frequently used in text mining. An SVM's classification is derived from the distance to the hyperplane. For linearly separable problems this distance is {1, -1} for the support vectors, and the classification of a new example is essentially a question of whether its distance is positive or negative. However, one can also shift this threshold: instead of making the decision at 0, move it, for example, towards 0.8. That way the classifications are shifted in one direction or the other, while the general shape of the decision surface is not altered.
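With e1071 the shifting can be sketched via the decision.values attribute, which exposes the signed distance to the hyperplane (a sketch on toy data; which class label the positive side corresponds to is given by the column name of the returned matrix):

```r
library(e1071)

set.seed(1)
x <- matrix(rnorm(200 * 2), 200, 2)
y <- factor(ifelse(x[, 1] + x[, 2] + rnorm(200, sd = 0.3) > 0, "pos", "neg"))

model <- svm(x, y, kernel = "linear")

pred <- predict(model, x, decision.values = TRUE)
d <- attr(pred, "decision.values")   # signed distance to the hyperplane

# The default rule decides at 0; raising the cut-off to 0.8 makes the class
# on the positive side (see colnames(d)) harder to reach
strict_positive <- d > 0.8
```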
Finally, some machine learning toolkits have a built-in parameter for class-specific costs, like class.weights in the e1071 implementation. The name is due to the fact that the term "cost" is already taken there (it is the C regularisation parameter).
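A minimal e1071 sketch of class-specific costs on toy, imbalanced data (the weights scale the cost parameter C per class):

```r
library(e1071)

set.seed(1)
x <- matrix(rnorm(200 * 2), 200, 2)
y <- factor(ifelse(x[, 1] > 0.8, "rare", "common"))   # imbalanced labels

# Misclassifying the rare class costs 5x more than the common one
model <- svm(x, y, kernel = "linear",
             class.weights = c(common = 1, rare = 5))
```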
The SVM's hyperparameters are tuned via cross-validation, backed by the beautiful theoretical foundation of the algorithm. Say an RBF kernel is used; cross-validation then selects the optimal combination of C (cost) and gamma (the kernel parameter) for the best performance, measured by some metric (e.g., misclassification error). In e1071 this is done with the tune function, where the ranges of the hyperparameters as well as the cross-validation setup (i.e., 5-, 10- or more-fold) can be specified.
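A sketch of such a grid search with e1071's tune on toy data (the grid ranges are arbitrary):

```r
library(e1071)

set.seed(1)
x <- matrix(rnorm(200 * 4), 200, 4)
y <- factor(ifelse(x[, 1] - x[, 2] + rnorm(200, sd = 0.5) > 0, "a", "b"))

# 10-fold CV over a grid of cost and gamma for the (default) RBF kernel
tuned <- tune(svm, train.x = x, train.y = y,
              ranges = list(cost = 10^(-1:2), gamma = 10^(-2:0)),
              tunecontrol = tune.control(cross = 10))

summary(tuned)            # CV error for every grid point
best <- tuned$best.model  # refit with the best cost/gamma combination
```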
To obtain comparable cross-validation results using an area-under-curve type of measure, one can train models with different hyperparameter configurations and validate each against sets of pre-labelled data.
Hope the answer helps.