Implement multi class classification using SVM in R - r

I am trying to implement multi-class classification using SVM with the e1071 package in R. I read in a similar thread that SVM handles this with a one-vs-one classifier by itself in the back end. Is that true?
Also, if I want to run a one-vs-rest classifier, how do I do it? And when I print the summary of the SVM model, it doesn't show anywhere that it used a one-vs-one classifier. How can I confirm that?

Found the answer to my query above. I implemented a one-vs-rest classifier by building binary classifiers on the iris data that ships with R. It has 3 classes, so I built 3 binary classifiers. Below is the code:
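library(e1071)   # svm(), tune.svm()
library(caTools) # sample.split()
data(iris)
head(iris)
table(iris$Species)
nrow(iris)
set.seed(123) # arbitrary seed so that the split is reproducible
# Stratified 70/30 train/test split on Species
index_iris<-sample.split(iris$Species,SplitRatio=.7)
trainset_iris<-iris[index_iris==TRUE,]
testset_iris<-iris[index_iris==FALSE,]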
# Binary problem 1: setosa (1) vs. rest (0)
train_setosa<-trainset_iris
train_setosa$Species<-as.character(train_setosa$Species)
train_setosa$Species[train_setosa$Species!="setosa"]<-'0'
train_setosa$Species[train_setosa$Species=="setosa"]<-'1'
train_setosa$Species<-as.integer(train_setosa$Species)
tune_setosa<-tune.svm(Species~.,data=train_setosa,gamma=10^(-6:-1),cost=10^(-1:1))
summary(tune_setosa)
# The response is numeric 0/1, so svm() fits a regression and predict() returns a continuous score
model_setosa<-svm(Species~.,data=train_setosa,kernel="radial",gamma=.1,cost=10,scale=TRUE,na.action=na.omit)
summary(model_setosa)
predict_setosa<-predict(model_setosa,testset_iris[,-5])
tab_setosa<-table(predict_setosa,testset_iris[,5])
tab_setosa
# Binary problem 2: versicolor (1) vs. rest (0)
train_versicolor<-trainset_iris
train_versicolor$Species<-as.character(train_versicolor$Species)
train_versicolor$Species[train_versicolor$Species!="versicolor"]<-0
train_versicolor$Species[train_versicolor$Species=="versicolor"]<-1
train_versicolor$Species<-as.integer(train_versicolor$Species)
tune_versicolor<-tune.svm(Species~.,data=train_versicolor,gamma=10^(-6:-1),cost=10^(-1:1))
summary(tune_versicolor)
# Numeric 0/1 response again, so the prediction is a continuous score
model_versicolor<-svm(Species~.,data=train_versicolor,kernel="radial",gamma=.1,cost=10,scale=TRUE,na.action=na.omit)
summary(model_versicolor)
predict_versicolor<-predict(model_versicolor,testset_iris[,-5])
tab_versicolor<-table(predict_versicolor,testset_iris[,5])
tab_versicolor
# Binary problem 3: virginica (1) vs. rest (0)
train_virginica<-trainset_iris
train_virginica$Species<-as.character(train_virginica$Species)
train_virginica$Species[train_virginica$Species!="virginica"]<-0
train_virginica$Species[train_virginica$Species=="virginica"]<-1
train_virginica$Species<-as.integer(train_virginica$Species)
tune_virginica<-tune.svm(Species~.,data=train_virginica,gamma=10^(-6:-1),cost=10^(-1:1))
summary(tune_virginica)
# Numeric 0/1 response again, so the prediction is a continuous score
model_virginica<-svm(Species~.,data=train_virginica,kernel="radial",gamma=.1,cost=10,scale=TRUE,na.action=na.omit)
summary(model_virginica)
predict_virginica<-predict(model_virginica,testset_iris[,-5])
tab_virginica<-table(predict_virginica,testset_iris[,5])
tab_virginica
# Combine the three scores and pick, for each test row, the class with the highest score
bind<-cbind(predict_setosa,predict_versicolor,predict_virginica)
classnames = c('setosa', 'versicolor', 'virginica')
a<-classnames[apply(bind,1,which.max)]
b<-cbind(bind,a)
table(b[,4],testset_iris$Species)
But when I compared the confusion matrix of this result with that of the one-vs-one classifier (what svm() does by default for a factor response with more than two classes), one-vs-one gave the better result. I believe that happened because there are only 3 classes in this case, and one-vs-rest works well when the number of classes is large.
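On the original question of how to confirm the one-vs-one behaviour: a minimal sketch, assuming the e1071 package is installed, is to ask predict() for the decision values of a multi-class model. With k classes there should be k(k-1)/2 columns, one per pair of classes. The object names (model_ovo, dv) are only illustrative.

```r
library(e1071)

data(iris)
# Species is a factor with 3 levels, so svm() performs multi-class classification.
model_ovo <- svm(Species ~ ., data = iris, kernel = "radial")

# Request the raw decision values of the underlying binary classifiers.
pred <- predict(model_ovo, iris, decision.values = TRUE)
dv <- attr(pred, "decision.values")

# One column per pair of classes: choose(3, 2) = 3 columns for iris.
colnames(dv)
ncol(dv) == choose(nlevels(iris$Species), 2)
```

If the column count matches choose(k, 2) rather than k, the model was built from pairwise (one-vs-one) classifiers.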

Related

Decisional boundary SVM caret (R)

I have built an SVM-RBF model in R using caret. Is there a way of plotting the decision boundary?
I know it is possible to do so with other R packages, but unfortunately I'm forced to use caret because it is the only package I found that lets me calculate variable importance.
Alternatively, can you suggest a package that can plot the decision boundaries AND also gives the variable importance?
Thank you very much
First of all, unlike some other methods, SVM does not produce feature importance itself. In your case, the importance score caret reports is calculated independently of the method: https://topepo.github.io/caret/variable-importance.html#model-independent-metrics
Second, the decision boundary (or hyperplane) you see in most textbook examples is based on a toy problem with only two or three features. If you have more than three features, it is not trivial to visualize this hyperplane.
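For the two-feature case, a rough sketch of how such a boundary can be drawn is to predict over a regular grid and colour the grid by predicted class. The example below uses e1071 and two iris columns purely as an assumption for illustration. The same grid idea should carry over to a caret model by calling predict() on the train object with the grid as newdata.

```r
library(e1071)

data(iris)
# Keep two predictors only, so the boundary can be drawn in a plane.
dat <- iris[, c("Petal.Length", "Petal.Width", "Species")]
fit <- svm(Species ~ ., data = dat, kernel = "radial")

# Regular grid covering the observed range of both predictors.
grid <- expand.grid(
  Petal.Length = seq(min(dat$Petal.Length), max(dat$Petal.Length), length.out = 100),
  Petal.Width  = seq(min(dat$Petal.Width),  max(dat$Petal.Width),  length.out = 100)
)
grid$pred <- predict(fit, grid)

# Colour the grid by predicted class, then overlay the training points.
plot(grid$Petal.Length, grid$Petal.Width, col = as.integer(grid$pred),
     pch = ".", xlab = "Petal.Length", ylab = "Petal.Width")
points(dat$Petal.Length, dat$Petal.Width, col = as.integer(dat$Species), pch = 19)
```

The places where the background colour changes are the decision boundaries. With more than two features, the remaining ones would have to be fixed at some value or projected away first.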

How to retrain model using old model + new data chunk in R?

I'm currently working on trust prediction in social networks. For obvious reasons, I model this problem as a data stream. What I want to do is to "update" my trained model using the old model plus a new chunk of the data stream. The classifiers that I am using are SVM, NB (e1071 implementation), neural network (nnet) and C5.0 decision tree.
Sidenote: I know that this solution is possible using RMOA package by defining "model" argument in trainMOA function, but I don't think I can use it with those classifiers implementations (if I am wrong please correct me).
According to strange SO rules, I can't post it as comment, so be it.
The classifiers that you've listed need the full data set at the time you train a model, so whenever new data comes in, you should combine it with the previous data and retrain the model. What you are probably looking for is online machine learning. One very popular implementation is Vowpal Wabbit, which also has bindings to R.
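The "combine and retrain" approach can be sketched as follows. This is only an illustration, assuming e1071's naiveBayes and using iris as a stand-in for the stream. The split into old_data and new_chunk is hypothetical.

```r
library(e1071)

data(iris)
set.seed(1)
idx <- sample(nrow(iris))
old_data  <- iris[idx[1:100], ]    # data seen so far
new_chunk <- iris[idx[101:150], ]  # newly arrived chunk of the stream

# Initial model, trained on the data available at first.
model_old <- naiveBayes(Species ~ ., data = old_data)

# When a new chunk arrives, append it and refit from scratch.
all_data  <- rbind(old_data, new_chunk)
model_new <- naiveBayes(Species ~ ., data = all_data)

# The refitted model has now seen every row.
nrow(all_data)
```

Note that model_old is not reused at all here: the cost of each update grows with the total amount of data, which is exactly what online learners avoid.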

Weighting class in machine learning task

I'm trying out a machine learning task (binary classification) using caret and was wondering if there is a way to incorporate information about "uncertain" class, or to weight the classes differently.
As an illustration, I've cut and paste some of the code from the caret homepage working with the Sonar dataset (placeholder code - could be anything):
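library(mlbench) # provides the Sonar dataset
testdat <- get(data(Sonar))
set.seed(946)
# Add a synthetic Source column; A-C are drawn twice as often as D-F
testdat$Source<-as.factor(sample(c(LETTERS[1:6],LETTERS[1:3]),nrow(testdat),replace = TRUE))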
yielding:
summary(testdat$Source)
 A  B  C  D  E  F
49 51 44 17 28 19
after which I would continue with a typical train, tune, and test routine once I decide on a model.
What I've added here is another factor column of a source, or where the corresponding "Class" came from. As an arbitrary example, say these were 6 different people who made their designation of "Class" using slightly different methods and I want to put greater importance on A's classification method than B's but less than C's and so forth.
The actual data are something like this, where there are class imbalances, both among the true/false, M/R, or whatever class, and among these Sources. From the vignettes and examples I have found, at least the former I would address by using a metric like ROC during tuning, but as to how to even incorporate the latter, I'm not sure.
Two plans I have considered:
1. Separating the original data by Source and cycling through the factor levels one at a time, using the current level to build a model and the rest of the data to test it.
2. Instead of classification, turn it into a hybrid classification/regression problem, where I use the ranks of the sources as what I want to model. If A is considered best, then an "A positive" would get a score of +6, an "A negative" a score of -6, and so on. Then perform a regression fit on these values, ignoring the Class column.
Any thoughts? Every search I conduct on classes and weights seems to reference the class imbalance issue, but assumes that the classification itself is perfect (or a standard on which to model). Is it even inappropriate to try to incorporate that information and I should just include everything and ignore the source? A potential issue with the first plan is that the smaller sources account for around a few hundred instances, versus over 10,000 for the larger sources, so I might also be concerned that a model built on a smaller set wouldn't generalize as well as one based on more data. Any thoughts would be appreciated.
There is no difference between weighting "because of importance" and weighting "because of imbalance". These are exactly the same setting: both refer to "how strongly should I penalize the model for misclassifying a sample from a particular class". Thus you do not need any regression (and should not use it! This is a perfectly well-stated classification problem, and you are simply overthinking it); you just need to provide sample weights, that's all. Many models in caret accept this kind of setting, including glmnet, glm, cforest, etc. If you want to use an SVM, you should change package (as ksvm does not support such things), for example to https://cran.r-project.org/web/packages/gmum.r/gmum.r.pdf (for sample or class weighting) or https://cran.r-project.org/web/packages/e1071/e1071.pdf (if it is class weighting).
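A minimal sketch of per-sample weighting by source, using base R's glm() on simulated data so that it runs without extra packages. The data, the labels, and the weight values in source_weight are all made up for illustration (C is trusted more than A, and A more than B, as in the question). In caret, the same vector could be passed through the weights argument of train() for models that accept case weights.

```r
set.seed(946)
n <- 300
dat <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  Source = factor(sample(LETTERS[1:6], n, replace = TRUE))
)
# Hypothetical labels: a noisy function of x1 and x2.
dat$Class <- factor(ifelse(dat$x1 + 0.5 * dat$x2 + rnorm(n) > 0, "M", "R"))

# Hypothetical trust per source (higher = more trusted).
source_weight <- c(A = 5, B = 4, C = 6, D = 3, E = 2, F = 1)
dat$w <- unname(source_weight[as.character(dat$Source)])

# Each row's contribution to the likelihood is scaled by its weight.
fit <- glm(Class ~ x1 + x2, data = dat, family = binomial, weights = w)
coef(fit)
```

Integer weights are used here so that the binomial family does not warn about non-integer counts. Only the relative sizes of the weights matter for the fitted coefficients.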

R - How to get one "summary" prediction map instead for 5 when using 5-fold cross-validation in maxent model?

I hope I have come to the right forum. I'm an ecologist making species distribution models using the maxent (version 3.3.3, http://www.cs.princeton.edu/~schapire/maxent/) function in R, through the dismo package. I have used the argument "replicates = 5" which tells maxent to do a 5-fold cross-validation. When running maxent from the maxent.jar file directly (the maxent software), an html file with statistics will be made, including the prediction maps. In R, an html file is also made, but the prediction maps have to be extracted afterwards, using the function "predict" in the dismo package in r. When I do this, I get 5 maps, due to the 5-fold cross-validation setting. However, (and this is the problem) I want only one output map, one "summary" prediction map. I assume this is possible, although I don't know how maxent computes it. The maxent tutorial (see link above) says that:
"...you may want to avoid eating up disk space by turning off the “write output grids” option, which will suppress writing of output grids for the replicate runs, so that you only get the summary statistics grids (avg, stderr etc.)."
A list of arguments that can be put into R is found in this forum https://groups.google.com/forum/#!topic/maxent/yRBlvZ1_9rQ.
I have tried to use the argument "outputgrids=FALSE" both in the maxent function itself, and in the predict function, but it doesn't work. I still get 5 maps, even though I don't get any errors in R.
So my question is: How do I get one "summary" prediction map instead of the five prediction maps that result from the cross-validation?
I hope someone can help me with this, I am really stuck and haven't found any answers anywhere on the internet. Not even a discussion about this. Hope my question is clear. This is the R-script that I use:
model1<-maxent(x=predvars, p=presence_points, a=target_group_absence, path="//home//...//model1", args=c("replicates=5", "outputgrids=FALSE"))
model1map<-predict(model1, predvars, filename="//home//...//model1map.tif", outputgrids=FALSE)
Best regards,
Kristin
Sorry to be the bearer of bad news, but based on the source code, it looks like Dismo's predict function does not have the ability to generate a summary map.
Nitty-gritty details for those who care: When you call maxent with replicates set to something greater than 1, the maxent function returns a MaxEntReplicates object, rather than a normal MaxEnt object. When predict receives a MaxEntReplicates object, it just iterates through all of the models that it contains and calls predict on them individually.
So, what next? Fortunately, all is not lost! The reason that Dismo doesn't have this functionality is that for most kinds of model-building, there isn't actually a valid way to average parameters across your cross-validation models. I don't want to go so far as to say that that's definitely the case for MaxEnt specifically, but I suspect it is. As such, cross-validation is usually used more as a way of checking that your model building methodology works for your data than as a way of building your model directly (see this question for further discussion of that point). After verifying via cross-validation that models built using a given procedure seem to be accurate for the phenomenon you're modelling, it's customary to build a final model using all of your data. In theory this new model should only be better than models trained on a subset of your data.
So basically, assuming your cross-validated models look reasonable, you can run MaxEnt again with only one replicate. Your final result will be a model accuracy estimate based on the cross-validation and a map based on the second run with all of your data lumped together. Depending on what exactly your question is, there might be other useful summary statistics from the cross-validation that you want to use, but those are all things you've already seen in the html output.
I found this a couple of years late, but you could do something like this:
xm <- maxent(predictors, pres_train) # maxent model for the first partition
px <- predict(predictors, xm, ext=ext, progress='') # prediction map from the first model
px2 <- predict(predictors, xm2, ext=ext, progress='') # prediction map from the second model (xm2 is fitted the same way on the second partition)
models <- stack(px, px2) # stack the prediction maps from all the models
final_map <- mean(px, px2) # cell-wise mean of all the predictions
plot(final_map) # plot the averaged map
Here xm, xm2, ... would be the maxent models for each partition in the cross-validation, and px, px2, ... would be the corresponding predicted maps.
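The averaging step on its own can be tried without MaxEnt. The sketch below assumes the raster package and uses three synthetic layers as stand-ins for the per-fold prediction maps. make_map() is a hypothetical helper, and the random values carry no ecological meaning.

```r
library(raster)

set.seed(1)
# Three synthetic "fold" prediction maps standing in for px, px2, ...
make_map <- function() {
  r <- raster(nrows = 10, ncols = 10)
  setValues(r, runif(ncell(r)))
}
fold_maps <- stack(make_map(), make_map(), make_map())

# Cell-wise mean and standard deviation across the folds.
mean_map <- calc(fold_maps, fun = mean)
sd_map   <- calc(fold_maps, fun = sd)

plot(mean_map)
```

The standard deviation layer gives a rough picture of where the fold models disagree, similar in spirit to the avg and stderr summary grids mentioned in the MaxEnt tutorial.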

Differences in scoring from PMML model on different platforms

I've built a toy Random Forest model in R (using the German Credit dataset from the caret package), exported it in PMML 4.0 and deployed onto Hadoop, using the Cascading Pattern library.
I've run into an issue where Cascading Pattern scores the same data differently (in a binary classification problem) than the same model in R. Out of 200 observations, 2 are scored differently.
Why is this? Could it be due to a difference in the implementation of Random Forests?
The German Credit dataset represents a classification-type problem. The winning score of a classification-type RF model is simply the class label that was the most frequent among member decision trees.
Suppose you have an RF model with 100 decision trees, and 50 decision trees predict "good credit" and another 50 predict "bad credit". It is possible that R and Cascading Pattern resolve such ties differently: one picks the score that is seen first and the other picks the score that is seen last. You could try re-training your RF model with an odd number of member decision trees (i.e. use some value that is not divisible by two, such as 99 or 101).
The PMML specification says to return the score that was seen first. I'm not sure if Cascading Pattern pays any attention to such details. You may want to try out an alternative solution called JPMML-Cascading.
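To see how the tie-breaking rule alone can change the result, here is a small base R illustration with a hypothetical four-tree forest. It is not how either platform is actually implemented, only a sketch of the two rules described above.

```r
# Votes from a hypothetical 4-tree forest: two trees per class, i.e. a tie.
votes <- c("good", "bad", "bad", "good")

# Count the votes, keeping the labels in first-seen order.
counts <- table(factor(votes, levels = unique(votes)))
tied <- names(counts)[counts == max(counts)]

first_seen <- tied[1]             # resolve the tie with the label seen first
last_seen  <- tied[length(tied)]  # resolve the tie with the label seen last

first_seen  # "good"
last_seen   # "bad"
```

The same votes give "good" under one rule and "bad" under the other, which is enough to explain a handful of mismatched rows out of 200.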
Score matching is a big deal. When a model is moved from the scientist's desktop to the production IT deployment environment, the scores need to match. For a classification task, that also includes the probabilities of all target categories. There is sometimes a problem of precision between different implementations/platforms which can result in minimal differences (really minimal). In any case, they also need to be checked.
Obviously, it could also be the case that the model was not represented correctly in PMML ... unlikely with the R PMML package. The other option is that the model is not deployed correctly. That is, the scoring engine Cascading Pattern is using is not interpreting the PMML file properly.
PMML itself has a model element called ModelVerification that allows a PMML file to contain scored data, which can then be used for score matching. This is useful but not necessary, since you should be able to score an already scored dataset and compare the computed results with the expected ones, which you are already doing.
For more on model verification and score matching as well as error handling in PMML, check:
https://support.zementis.com/entries/21207918-Verifying-your-model-in-ADAPA-did-it-upload-correctly-
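A score-matching check of this kind can be sketched in a few lines of base R. The numbers below are made up, and the tolerance tol is an assumption that would need to be chosen for the platforms involved.

```r
# Hypothetical probabilities of the "good" class from two platforms.
expected <- c(0.91, 0.12, 0.50, 0.77)         # e.g. scored in R
computed <- c(0.91, 0.12, 0.50 + 1e-9, 0.70)  # e.g. scored by the deployed engine

tol <- 1e-6  # assumed tolerance for floating-point differences
mismatch <- which(abs(expected - computed) > tol)

# Rows whose scores differ by more than the tolerance.
data.frame(row = mismatch,
           expected = expected[mismatch],
           computed = computed[mismatch])
```

Here the third row differs only at floating-point level and passes, while the fourth row is flagged as a real mismatch.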
