Tuning SVM parameters in R (linear SVM kernel)

What is the difference between tune.svm() and best.svm()?
When we tune the parameters of an SVM kernel, aren't we expected to always choose the best values for our model?
Pardon me, as I am new to R and machine learning.
I noticed that there was no linear kernel option when tuning the SVM. Is there a way to tune my SVM using a linear kernel?

From ETHZ: best.svm() is really just a wrapper for tune.svm(...)$best.model. The help page for tune() will tell you more about the available options.
Be sure to also go through the examples on the help page for tune(). e1071::svm offers linear, radial (the default), sigmoid, and polynomial kernels; see help(svm). For example, to use the linear kernel, the function call has to include the argument kernel = 'linear':
data(iris)
obj <- tune.svm(Species ~ ., data = iris,
                cost = 2^(2:8),
                kernel = "linear")
If you are new to R and would like to train and cross-validate SVM models, you could also check the caret package and its train function, which offers multiple types of kernels. The whole 'topics' section on that site might be of interest, too.
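For example, a minimal sketch with caret's train (this assumes the kernlab backend that method = "svmLinear" uses; the cost grid and 5-fold CV setup are illustrative):

library(caret)
data(iris)

# 5-fold cross-validation over a grid of cost values
ctrl <- trainControl(method = "cv", number = 5)
fit <- train(Species ~ ., data = iris,
             method = "svmLinear",               # linear-kernel SVM via kernlab
             trControl = ctrl,
             tuneGrid = data.frame(C = 2^(2:8)))
fit$bestTune    # the cost value chosen by cross-validation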

Related

Apply LASSO in R using glmnet package for cox model

I want to perform LASSO for a Cox PH model in R for variable selection.
I found the code below somewhere and used it for my analysis, but elsewhere I read that it is for elastic net. Could someone please confirm that I am using the right code?
lasso <- cv.glmnet(xmat, ysurv, alpha = 1, family = 'cox', nfolds = 30)
The help page for cv.glmnet() (type ?cv.glmnet in R, or go through the help system in RStudio) isn't useful here because the alpha parameter is simply passed through to glmnet():
alpha: the elastic-net mixing parameter, with 0 <= alpha <= 1. The penalty is defined as
    (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
alpha = 1 is the lasso penalty, and alpha = 0 the ridge penalty.
So alpha=1 is lasso (as described there), and alpha=0.95 is a mixture that is mostly lasso (L1) with a little bit of ridge (L2) mixed in.
I doubt there's much of a difference between 10-fold and 30-fold cross-validation. The reasons you might want to choose different numbers of folds are (1) computational efficiency (computation goes up with the number of folds unless there is some trick for computing the CV score without refitting the model, as is often the case for LOOCV), and (2) the bias-variance tradeoff; see section 5.1.4 of An Introduction to Statistical Learning with R.
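To make the alpha = 1 behaviour concrete, here is a minimal sketch on simulated data (the toy xmat and ysurv below just mirror the names in the question):

library(glmnet)
library(survival)

set.seed(1)
n <- 100; p <- 20
xmat  <- matrix(rnorm(n * p), n, p)          # toy predictor matrix
ysurv <- Surv(rexp(n), rbinom(n, 1, 0.7))    # toy survival outcome (time, status)

# alpha = 1 gives the pure lasso penalty for the Cox model
cvfit <- cv.glmnet(xmat, ysurv, alpha = 1, family = "cox", nfolds = 10)
coef(cvfit, s = "lambda.min")   # zero rows are the variables dropped by the lasso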
Follow-up questions that are more statistical or data-sciencey than computational should probably go to CrossValidated.

R CRAN Neural Network Package compute vs prediction

I am using R along with the neuralnet package (see the docs: https://cran.r-project.org/web/packages/neuralnet/neuralnet.pdf). I have used the neuralnet function to build and train my model.
Now that I have built my model, I want to test it on real data. Could someone explain whether I should use the compute or the prediction function? I have read the documentation and it isn't clear; both functions seem to do something similar.
Thanks
The short answer is to use compute to do predictions.
You can see an example of using compute on the test set here. We can also see that compute is the right one from the documentation:
compute, a method for objects of class nn, typically produced by neuralnet. Computes the outputs of all neurons for specific arbitrary covariate vectors given a trained neural network.
The above says that you can pass in covariate vectors to compute the output of the neural network, i.e. make a prediction.
On the other hand, prediction does exactly what its title in the documentation says:
Summarizes the output of the neural network, the data and the fitted values of glm objects (if available)
Moreover, it only takes two arguments, the nn object and a list of glm models, so there is no way to pass in the test set in order to make a prediction.
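A minimal sketch of using compute on held-out data (the toy data and network size here are illustrative):

library(neuralnet)

set.seed(1)
train <- data.frame(x1 = runif(100), x2 = runif(100))
train$y <- as.numeric(train$x1 + train$x2 > 1)

# train a small network
nn <- neuralnet(y ~ x1 + x2, data = train, hidden = 3, linear.output = FALSE)

# compute takes covariates only; no outcome column is needed
test <- data.frame(x1 = runif(10), x2 = runif(10))
out  <- compute(nn, test)
out$net.result   # the network's predictions for the test set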

Spark ML Pipeline Logistic Regression Produces Much Worse Predictions Than R GLM

I used an ML Pipeline to run logistic regression models, but for some reason I got worse results than in R. I have done some research, and the only post I found related to this issue is this one. It seems that Spark Logistic Regression returns models that minimize the loss function while the R glm function uses maximum likelihood. The Spark model only got 71.3% of the records right, while R can predict 95.55% of the cases correctly. I was wondering if I did something wrong in the setup and if there is a way to improve the prediction. My Spark code and R code are below.
Spark code
partial model_input
label,AGE,GENDER,Q1,Q2,Q3,Q4,Q5,DET_AGE_SQ
1.0,39,0,0,1,0,0,1,31.55709342560551
1.0,54,0,0,0,0,0,0,83.38062283737028
0.0,51,0,1,1,1,0,0,35.61591695501733
def trainModel(df: DataFrame): PipelineModel = {
  val lr = new LogisticRegression().setMaxIter(100000).setTol(0.0000000000000001)
  val pipeline = new Pipeline().setStages(Array(lr))
  pipeline.fit(df)
}

val meta = NominalAttribute.defaultAttr.withName("label").withValues(Array("a", "b")).toMetadata
val assembler = new VectorAssembler().
  setInputCols(Array("AGE", "GENDER", "DET_AGE_SQ",
                     "QA1", "QA2", "QA3", "QA4", "QA5")).
  setOutputCol("features")

val model = trainModel(model_input)
val pred = model.transform(model_input)
pred.filter("label != prediction").count
R code
lr <- model_input %>%
  glm(data = ., formula = label ~ AGE + GENDER + Q1 + Q2 + Q3 + Q4 + Q5 + DET_AGE_SQ,
      family = binomial)
pred <- data.frame(y = model_input$label, p = fitted(lr))
table(pred$y, pred$p > 0.5)
Feel free to let me know if you need any other information. Thank you!
Edit 9/18/2015: I have tried increasing the maximum number of iterations and decreasing the tolerance dramatically. Unfortunately, it didn't improve the prediction. It seems the model converged to a local minimum instead of the global minimum.
It seems that Spark Logistic Regression returns models that minimize loss function while R glm function uses maximum likelihood.
Minimization of a loss function is pretty much the definition of linear models, and both glm and ml.classification.LogisticRegression are no different here. The fundamental difference between the two is the way that minimization is achieved.
All linear models from ML/MLlib are based on some variant of gradient descent. The quality of a model generated this way varies on a case-by-case basis and depends on the gradient descent and regularization parameters.
R, on the other hand, computes an exact solution which, given its time complexity, is not well suited for large datasets.
As mentioned above, the quality of a model generated using gradient descent depends on the input parameters, so the typical way to improve it is to perform hyperparameter optimization. Unfortunately, the ML version is rather limited here compared to MLlib, but for starters you can increase the number of iterations.

R. How to boost the SVM model

I have made an SVM model using the svm function in R for a classification problem. I got only 87% accuracy, but random forest produces around 92.4%.
fit.svm <- svm(modelformula, data = training, gamma = 0.01, cost = 1, cross = 5)
I would like to use boosting to tune this SVM model. Can someone help me tune it?
What are the best parameters I can provide for the SVM method?
An example of boosting for an SVM model would be appreciated.
To answer your first question.
The e1071 library in R has a built-in tune() function to perform CV. This will help you select the optimal parameters: cost, gamma, and kernel. You can also work with SVMs in R through the kernlab package. You may get different results from the two libraries. Let me know if you need any examples.
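For instance, a minimal sketch with tune.svm (using iris as a stand-in for your data; the parameter ranges are illustrative):

library(e1071)
data(iris)

# grid search over gamma and cost with 5-fold cross-validation
tuned <- tune.svm(Species ~ ., data = iris,
                  gamma = 10^(-3:0), cost = 10^(-1:2),
                  tunecontrol = tune.control(cross = 5))
summary(tuned)            # CV error for each parameter combination
best <- tuned$best.model  # the SVM refit with the winning parameters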
You may want to look into the caret package. It allows you both to pick from various kernels for SVM (see the model list) and to run parameter sweeps to find the best model.

Recursive feature elimination in 'caret' for 'randomForest': set different ntree parameter for the first forest

I am currently trying to optimize a random forest classifier for a very high-dimensional dataset (p > 200k) using recursive feature elimination (RFE). The caret package has a nice implementation for doing this (the rfe() function). However, I am also thinking about optimizing RAM and CPU usage. That's why I wonder whether there is a way to set a different (larger) number of trees to train the first forest (without feature elimination) and to use its importances to build the remaining ones (with RFE) using, for example, 500 trees with 10- or 5-fold cross-validation. I know that this option is available in varSelRF, but what about caret? I didn't manage to find anything regarding this in the manual.
You can do that. The rfFuncs list has an element called fit that defines how the model is fit. One argument to this function, called first, is TRUE for the first fit (there is also a last argument). You can set ntree based on this.
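A minimal sketch of that idea (the ntree values are illustrative, and this assumes the standard rfe()/rfeControl() workflow):

library(caret)
library(randomForest)

myFuncs <- rfFuncs   # start from caret's built-in random forest functions

# grow a larger first forest, smaller ones for the RFE steps
myFuncs$fit <- function(x, y, first, last, ...) {
  randomForest(x, y,
               ntree = if (first) 2000 else 500,
               importance = TRUE, ...)
}

ctrl <- rfeControl(functions = myFuncs, method = "cv", number = 5)
# predictors and outcome below are placeholders for your own data:
# res <- rfe(x = predictors, y = outcome, sizes = c(10, 50, 100), rfeControl = ctrl)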
See the feature selection vignette for more details.
Max
