Error in plot, formula missing when using svm - r

I am trying to plot my svm model.
library(foreign)
library(e1071)
x <- read.arff("contact-lenses.arff")
#alt: x <- read.arff("http://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/contact-lenses.arff")
model <- svm(`contact-lenses` ~ . , data = x, type = "C-classification", kernel = "linear")
The contact lens arff is the inbuilt data file in weka.
However, now i run into an error trying to plot the model.
plot(model, x)
Error in plot.svm(model, x) : missing formula.

The problem is that in in your model, you have multiple covariates. The plot() will only run automatically if your data= argument has exactly three columns (one of which is a response). For example, in the ?plot.svm help page, you can call
data(cats, package = "MASS")
m1 <- svm(Sex~., data = cats)
plot(m1, cats)
So since you can only show two dimensions on a plot, you need to specify what you want to use for x and y when you have more than one to choose from
cplus<-cats
cplus$Oth<-rnorm(nrow(cplus))
m2 <- svm(Sex~., data = cplus)
plot(m2, cplus) #error
plot(m2, cplus, Bwt~Hwt) #Ok
plot(m2, cplus, Hwt~Oth) #Ok
So that's why you're getting the "Missing Formula" error.
There is another catch as well. The plot.svm will only plot continuous variables along the x and y axes. The contact-lenses data.frame has only categorical variables. The plot.svm function simply does not support this as far as I can tell. You'll have to decide how you want to summarize that information in your own visualization.

Related

How can I load a library in R to call it from Excel with bert-toolkit?

Bert-toolkit is a very nice package to call R functions from Excel. See: https://bert-toolkit.com/
I have used bert-toolkit to call a fitted neuralnet (avNNnet fitted with Caret) within a wrapper function in R from Excel VBA. This runs perfect. This is the code to load the model within the wrapper function in bert-toolkit:
load("D:/my_model_avNNet.rda")
neuraln <- function(x1,x2,x3){
xx <- data.frame(x1,x2,x3)
z <- predict(my_model_avNNET, xx)
z
}
Currently I tried to do this with a fitted GAM (fitted with package mgcv). Although I do not succeed. If I call the fitted GAM from Excel VBA it gives error 2015. If I call the fitted GAM from a cell it gives #VALUE! At the same time the correct outcome of the calculation is shown in the bert-console!
This is the code to load the model in the wrapperfunction in bert-toolkit:
library(mgcv)
load("D:/gam_y_model.rda")
testfunction <- function(k1,k2){
z <- predict(gam_y, data.frame(x = k1, x2 = k2))
print (z)
}
The difference between the avNNnet-model (Caret) and the GAM-model (mgcv) is that the avNNnet-model does NOT need the Caret library to be loaded to generate a prediction, while the GAM-model DOES need the mgcv library to be loaded.
It seems to be not sufficient to load the mgvc-library in the script with the GAM-model which loads the GAM-model in a wrapper function in bert-toolkit, as I did in the code above. Although the correct outcome of the model is shown in the bert-console. It does not generate the correct outcome in Excel.
I wonder how this is possible and can be solved. It seems to me that maybe there are two instances of R running in bert-toolkit.
How can I load the the mgcv-library in such a way that it can be used by the GAM-model within the function called from Excel?
This is some example code to fit the GAM with mgcv and save to model (after running this code the model can uploaded in bert-toolkit with the code above) :
library(mgcv)
# construct some sample data:
x <- seq(0, pi * 2, 0.1)
x2 <- seq(0, pi * 20, 1)
sin_x <- sin(x)
tan_x2 <- tan(x2)
y <- sin_x + rnorm(n = length(x), mean = 0, sd = sd(sin_x / 2))
Sample_data <- data.frame(y,x,x2)
# fit gam:
gam_y <- gam(y ~ s(x) + s(x2), method = "REML")
# Make predictions with the fitted model:
x_new <- seq(0, max(x), length.out = 100)
x2_new <- seq(0, max(x2), length.out = 100)
y_pred <- predict(gam_y, data.frame(x = x_new, x2 = x2_new))
# save model, to load it later in bert-toolkit:
setwd("D:/")
save(gam_y, file = "gam_y_model.rda")
One of R's signatures is method dispatching where users call the same named method such as predict but internally a different variant is run such as predict.lm, predict.glm, or predict.gam depending on the model object passed into it. Therefore, calling predict on an avNNet model is not the same predict on a gam model. Similarly, just as the function changes due to the input, so does the output change.
According to MSDN documents regarding the Excel #Value! error exposed as Error 2015:
#VALUE is Excel's way of saying, "There's something wrong with the way your formula is typed. Or, there's something wrong with the cells you are referencing."
Fundamentally, without seeing actual results, Excel may not be able to interpret or translate into Excel range or VBA type the result R returns from gam model especially as you describe R raises no error.
For example, per docs, the return value of the standard predict.lm is:
predict.lm produces a vector of predictions or a matrix of predictions...
However, per docs, the return value of predict.gam is a bit more nuanced:
If type=="lpmatrix" then a matrix is returned which will give a vector of linear predictor values (minus any offest) at the supplied covariate values, when applied to the model coefficient vector. Otherwise, if se.fit is TRUE then a 2 item list is returned with items (both arrays) fit and se.fit containing predictions and associated standard error estimates, otherwise an array of predictions is returned. The dimensions of the returned arrays depends on whether type is "terms" or not: if it is then the array is 2 dimensional with each term in the linear predictor separate, otherwise the array is 1 dimensional and contains the linear predictor/predicted values (or corresponding s.e.s). The linear predictor returned termwise will not include the offset or the intercept.
Altogether, consider adjusting parameters of your predict call to render a numeric vector for easy Excel interpretation and not a matrix/array or some other higher dimension R type that Excel cannot render:
testfunction <- function(k1,k2){
z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type=="response")
return(z)
}
testfunction <- function(k1,k2){
z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type=="lpmatrix")
return(z)
}
testfunction <- function(k1,k2){
z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type=="linked")
return(z$fit) # NOTICE fit ELEMENT USED
}
...
Further diagnostics:
Check returned object of predict.glm with str(obj) and class(obj)/ typeof(obj) to see dimensions and underlying elements and compare with predict in caret;
Check if high precision of decimal numbers is the case such as Excel's limits of 15 decimal points;
Check amount of data returned (exceeds Excel's sheet row limit of 220 or cell limit of 32,767 characters?).

Plotting SVM Linear Separator in R

I'm trying to plot the 2-dimensional hyperplanes (lines) separating a 3-class problem with e1071's svm. I used the default method (so there is no formula involved) like so:
library('e1071')
## S3 method for class 'default':
machine <- svm(x, y, kernel="linear")
I cannot seem to plot it by using the plot.svm method:
plot(machine, x)
Error in plot.svm(machine, x) : missing formula.
But I did not use the formula method, I used the default one, and if I pass '~' or '~.' as a formula argument it'll complain about the matrix x not being a data.frame.
Is there a way of plotting the fitted separator/s for the 2D problem while using the default method?
How may I achieve this?
Thanks in advance.
It appears that although svm() allows you to specify your input using either the default or formula method, plot.svm() only allows a formula method. Also, by only giving x to plot.svm(), you are not giving it all the info it needs. It also needs y.
Try this:
library(e1071)
x <- prcomp(iris[,1:4])$x[,1:2]
y <- iris[,5]
df <- data.frame(cbind(x[],y[]))
machine <- svm(y ~ PC1 + PC2, data=df)
plot(machine, data=df)
It appears that your x has more than two feature-variables or columns.
Since plot.svm() plots only 2-Dimensions at a time, you need to specify these dimensions explicitly by providing a formula argument.
Ex:- ## more than two variables: fix 2 dimensions
data(iris)
m2 <- svm(Species~., data = iris)
plot(m2, iris, Petal.Width ~ Petal.Length,slice = list(Sepal.Width = 3, Sepal.Length = 4))
In cases where the data-frames has only two dimensions by default, you can ignore the formula argument.
Ex:- ## a simple example
data(cats, package = "MASS")
m <- svm(Sex~., data = cats)
plot(m, cats)
These details can be found at plot.svm() documentation here https://www.rdocumentation.org/packages/e1071/versions/1.7-3/topics/plot.svm

Plot in SVM model (e1071 Package) using DocumentTermMatrix

i trying do create a plot for my model create using SVM in e1071 package.
my code to build the model, predict and build confusion matrix is
ptm <- proc.time()
svm.classifier = svm(x = train.set.list[[0.999]][["0_0.1"]],
y = train.factor.list[[0.999]][["0_0.1"]],
kernel ="linear")
pred = predict(svm.classifier, test.set.list[[0.999]][["0_0.1"]], decision.values = TRUE)
time[["svm"]] = proc.time() - ptm
confmatrix = confusionMatrix(pred,test.factor.list[[0.999]][["0_0.1"]])
confmatrix
train.set.list and test.set.list contains the test and train set for several conditions. train and set factor has the true label for each set. Train.set and test.set are both documenttermmatrix.
Then i tried to see a plot of my data, i tried with
plot(svm.classifier, train.set.list[[0.999]][["0_0.1"]])
but i got the message:
"Error in plot.svm(svm.classifier, train.set.list[[0.999]][["0_0.1"]]) :
missing formula."
what i'm doing wrong? confusion matrix seems good to me even not using formula parameter in svm function
Without given code to run, it's hard to say exactly what the problem is. My guess, given
?plot.svm
which says
formula formula selecting the visualized two dimensions. Only needed if more than two input variables are used.
is that your data has more than two predictors. You should specify in your plot function:
plot(svm.classifier, train.set.list[[0.999]][["0_0.1"]], predictor1 ~ predictor2)

Conversion to formula in R: Predict Y (factor variable) with X (data frame)

I am working on a classification problem where for my training data I have a data frame X and a factor variable Y, and I would like predict my variable Y from X.
The function cforest from the party package has the following interface
cforest(formula, data = list(), ...)
Where:
formula: a symbolic description of the model to be fit. Note that
symbols like ':' and '-' will not work and the tree will make
use of all variables listed on the rhs of 'formula'.
data: a data frame containing the variables in the model.
However, when I try:
# Build a random set of training vectors X
X <- data.frame(replicate(5, rnorm(2000)))
# Build Y from X
Y <- runif(1)*X[,1]*X[,2]^2+runif(1)*X[,3]/X[,4]
cforest(Y, data = X, ...)
I get an error:
..
10: ParseFormula(formula, data = data)
...
5: cforest(Y, data = X, ...) at ..
From the traceback it looks like I am not using the interface to cforest correctly. I have read about R formulas (?formula and this tutorial, which was very helpful), and I understand the concept abstractly, but I don't know how to convert my prediction problem (which I would write Y ~ X) to the formula syntax.
How can I convert my call to cforest using a formula?
The answer is to use the following syntax:
cf.model = cforest(Y ~ ., data=X, ...)
which basically says "use all variables in the dataframe X when trying to predict Y"

How get plot from nls in R?

In R I use nls to do a nonlinear least-squares fit. How then do I plot the model function using the values of the coefficients that the fit provided?
(Yes, this is a very naive question from an R relative newbie.)
Using the first example from ?nls and following the example I pointed you to line by line achieves the following:
#This is just our data frame
DNase1 <- subset(DNase, Run == 1)
DNase1$lconc <- log(DNase1$conc)
#Fit the model
fm1DNase1 <- nls(density ~ SSlogis(lconc, Asym, xmid, scal), DNase1)
#Plot the original points
# first argument is the x values, second is the y values
plot(DNase1$lconc,DNase1$density)
#This adds to the already created plot a line
# once again, first argument is x values, second is y values
lines(DNase1$lconc,predict(fm1DNase1))
The predict method for a nls argument is automatically returning the fitted y values. Alternatively, you add a step and do
yFitted <- predict(fm1DNase1)
and pass yFitted in the second argument to lines instead. The result looks like this:
Or if you want a "smooth" curve, what you do is to simply repeat this but evaluate the function at more points:
r <- range(DNase1$lconc)
xNew <- seq(r[1],r[2],length.out = 200)
yNew <- predict(fm1DNase1,list(lconc = xNew))
plot(DNase1$lconc,DNase1$density)
lines(xNew,yNew)
coef(x) returns the coefficients for regression results x.
model<-nls(y~a+b*x^k,my.data,list(a=0.,b=1.,k=1))
plot(y~x,my.data)
a<-coef(model)[1]
b<-coef(model)[2]
k<-coef(model)[3]
lines(x<-c(1:10),a+b*x^k,col='red')
For example.
I know what you want (I'm a Scientist). This isn't it, but at least shows how to use 'curve' to plot your fitting function over any range, and the curve will be smooth. Using the same data set as above:
nonlinFit <- nls(density ~ a - b*exp(-c*conc), data = DNase1, start = list(a=1, b=1, c=1) )
fitFnc <- function(x) predict(nonlinFit, list(conc=x))
curve(fitFnc, from=.5, to=10)
or,
curve(fitFnc, from=8.2, to=8.4)
or,
curve(fitFnc, from=.1, to=50) # well outside the data range
or whatever (without setting up a sequence of evaluation points first).
I'm a rudimentary R programmer, so I don't know how to implement (elegantly) something like ReplaceAll ( /. ) in Mathematica that one would use to replace occurrences of the symbolic parameters in the model, with the fitted parameters. This first step works although it looks horrible:
myModel <- "a - b*exp(-c*conc)"
nonlinFit <- nls(as.formula(paste("density ~", myModel)), data = DNase1, start = list(a=1, b=1, c=1) )
It leaves you with a separate 'model' (as a character string), that you might be able to make use of with the fitted parameters ... cleanly (NOT digging out a, b, c) would simply use nonlinFit ... not sure how though.
The function "curve" will plot functions for you.

Resources