Plotting SVM Linear Separator in R

I'm trying to plot the 2-dimensional hyperplanes (lines) separating a 3-class problem with e1071's svm. I used the default method (so there is no formula involved) like so:
library('e1071')
## S3 method for class 'default':
machine <- svm(x, y, kernel="linear")
I cannot seem to plot it by using the plot.svm method:
plot(machine, x)
Error in plot.svm(machine, x) : missing formula.
But I did not use the formula method; I used the default one. If I pass '~' or '~.' as a formula argument, it complains about the matrix x not being a data.frame.
Is there a way of plotting the fitted separator(s) for the 2D problem while using the default method?
How may I achieve this?
Thanks in advance.

It appears that although svm() allows you to specify your input using either the default or formula method, plot.svm() only allows a formula method. Also, by only giving x to plot.svm(), you are not giving it all the info it needs. It also needs y.
Try this:
library(e1071)
x <- prcomp(iris[,1:4])$x[,1:2]
y <- iris[,5]
df <- data.frame(x, y) # keep y as a factor; cbind() would coerce it to numeric
machine <- svm(y ~ PC1 + PC2, data=df)
plot(machine, data=df)

It appears that your x has more than two feature variables (columns).
Since plot.svm() plots only 2-Dimensions at a time, you need to specify these dimensions explicitly by providing a formula argument.
Example (more than two variables: fix two dimensions and slice the rest):
data(iris)
m2 <- svm(Species ~ ., data = iris)
plot(m2, iris, Petal.Width ~ Petal.Length, slice = list(Sepal.Width = 3, Sepal.Length = 4))
In cases where the data frame has only two feature columns, you can omit the formula argument.
Example (a simple two-variable case):
data(cats, package = "MASS")
m <- svm(Sex~., data = cats)
plot(m, cats)
These details can be found in the plot.svm() documentation: https://www.rdocumentation.org/packages/e1071/versions/1.7-3/topics/plot.svm

Related

R Output of fGarch

I am modelling a time series r_t as a GARCH(1,1) process,
r_t = mu + epsilon_t, epsilon_t = sigma_t * z_t,
sigma_t^2 = omega + alpha1 * epsilon_{t-1}^2 + beta1 * sigma_{t-1}^2,
and the z_t are t-distributed.
In R, I do this in the fGarch-package via
model <- garchFit(formula = ~garch(1,1), cond.dist = "std", data=r)
Is this correct?
Now, I would like to understand the output of this to check my formula.
Obviously, model@fit$coefs gives me the coefficients and model@fitted gives me the fitted r_t.
But how do I get the fitted sigma_t and z_t?
I believe that the best way is to define extractor functions when generics are not available and methods when generics already exist.
The first two functions extract the values of interest from the fitted objects.
get_sigma_t <- function(x, ...){
  x@sigma.t
}
get_z_t <- function(x, ...){
  x@fit$series$z
}
Here, a logLik method for objects of class "fGARCH" is defined:
logLik.fGARCH <- function(object, ...){
  object@fit$value
}
Now use the functions, including the method. The data comes from the first example in help("garchFit").
N <- 200
r <- as.vector(garchSim(garchSpec(rseed = 1985), n = N)[,1])
model <- garchFit(~ garch(1, 1), data = r, trace = FALSE)
get_sigma_t(model) # output not shown
get_z_t(model) # output not shown
logLik(model)
#LogLikelihood
# -861.9494
Note also that coef and fitted methods exist, so there is no need for model@fitted or model@fit$coefs as written in the question.
fitted(model) # much simpler
coef(model)
# mu omega alpha1 beta1
#3.541769e-05 1.081941e-06 8.885493e-02 8.120038e-01
The fitted object has a list-like structure. You can find the structure with
str(model)
From the structure, it is easier to extract components with $ or @:
model@fit$series$z
model@sigma.t
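Depending on your fGarch version, there may also be ready-made extractors for exactly these quantities; treat the following as a sketch to verify against your installed documentation rather than a guaranteed API:
sigma_t <- volatility(model, type = "sigma") # conditional standard deviations sigma_t
z_t <- residuals(model, standardize = TRUE)  # standardized innovations z_t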

Remove linear dependent variables while using the bife package

Some pre-programmed models in R (e.g. lm()) automatically remove linearly dependent variables in their regression output. With the bife package, this does not seem to be possible. As stated in the package description on CRAN, on page 5:
If bife does not converge this is usually a sign of linear dependence between one or more regressors
and the fixed effects. In this case, you should carefully inspect your model specification.
Now, suppose the problem at hand involves doing many regressions and one cannot adequately inspect each regression output -- one has to apply some sort of rule of thumb regarding the regressors. What could be some alternatives for removing linearly dependent regressors more or less automatically and achieving an adequate model specification?
I set a code as an example below:
#sample coding
x = 10*rnorm(40)
z = 100*rnorm(40)
df1 = data.frame(a=rep(c(0,1),times=20), x=x, y=x, z=z, ID=c(1:40), date=1, Region=rep(c(1,2,3,4),10))
df2 = data.frame(a=c(rep(c(1,0),times=15),rep(c(0,1),times=5)), x=1.4*x+4, y=1.4*x+4, z=1.2*z+5, ID=c(1:40), date=2, Region=rep(c(1,2,3,4),10))
df3 = rbind(df1, df2)
for(i in 1:4) {
  x = df3[df3$Region == i, ]
  model = bife::bife(a ~ x + y + z | ID, data = x)
  results = data.frame(Region = i)
  results$Model = list(model)
  if (i == 1) {
    df4 = results
    next
  }
  df4 = rbind(df4, results)
}
Error: Linear dependent terms detected!
Since you're only looking at linear dependencies, you could simply leverage methods that detect them, like for instance lm.
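To see the idea, here is a minimal base-R sketch (using the df3 from the question): lm() silently sets linearly dependent terms to NA, so the NA coefficients tell you which regressors to drop.
fit <- lm(a ~ x + y + z, data = df3)
coll <- names(coef(fit))[is.na(coef(fit))] # regressors dropped as collinear
coll
#> [1] "y"   # y is an exact copy of x in this data, so it is flagged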
Here's an example of a solution with the package fixest:
library(bife)
library(fixest)
x = 10*rnorm(40)
z = 100*rnorm(40)
df1 = data.frame(a=rep(c(0,1),times=20), x=x, y=x, z=z, ID=c(1:40), date=1, Region=rep(c(1,2,3,4),10))
df2 = data.frame(a=c(rep(c(1,0),times=15),rep(c(0,1),times=5)), x=1.4*x+4, y=1.4*x+4, z=1.2*z+5, ID=c(1:40), date=2, Region=rep(c(1,2,3,4),10))
df3 = rbind(df1, df2)
vars = c("x", "y", "z")
res_all = list()
for(i in 1:4) {
  x = df3[df3$Region == i, ]
  coll_vars = feols(a ~ x + y + z | ID, x, notes = FALSE)$collin.var
  new_fml = xpd(a ~ ..vars | ID, ..vars = setdiff(vars, coll_vars))
  res_all[[i]] = bife::bife(new_fml, data = x)
}
# Display all results
for(i in 1:4) {
  cat("\n#\n# Region: ", i, "\n#\n\n")
  print(summary(res_all[[i]]))
}
The functions needed here are feols and xpd, both from fixest. Some explanations:
feols, like lm, removes variables on-the-fly when they are found to be collinear. It stores the names of the collinear variables in the slot $collin.var (if none is found, it's NULL).
Contrary to lm, feols also allows fixed-effects, so you can include them when you look for linear dependencies; this way you can spot complex dependencies that also involve the fixed-effects.
I've set notes = FALSE otherwise feols would have prompted a note referring to collinearity.
feols is fast (actually faster than lm for large data sets) so won't be a strain on your analysis.
The function xpd expands the formula and replaces any variable name starting with two dots with the associated argument that the user provides.
When the arguments of xpd are vectors, the behavior is to coerce them with pluses, so if ..vars = c("x", "y") is provided, the formula a ~ ..vars | ID will become a ~ x + y | ID.
Here it replaces ..vars in the formula with setdiff(vars, coll_vars), the vector of variables that were not found to be collinear (see the one-line illustration below).
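A quick illustration of that expansion (a throwaway call, just to show the mechanics; not part of the workflow above):
fixest::xpd(a ~ ..vars | ID, ..vars = c("x", "y"))
#> a ~ x + y | ID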
So you get an algorithm with automatic variable removal before performing bife estimations.
Finally, just a side comment: in general it's better to store results in lists since it avoids copies.
Update
I forgot, but if you don't need bias correction (bife::bias_corr), then you can directly use fixest::feglm which automatically removes collinear variables:
res_bife = bife::bife(a ~ x + z | ID, data = df3)
res_feglm = fixest::feglm(a ~ x + y + z | ID, df3, family = binomial)
rbind(coef(res_bife), coef(res_feglm))
#> x z
#> [1,] -0.02221848 0.03045968
#> [2,] -0.02221871 0.03045990

Is there an R function that solves a second-order linear model?

I'm a beginner in R and programming and struggling with what is probably a simple task.
I've made a code that fits a second-order model and I want to input variables into this model and find the "Y value".
I've tried to use the predict function, but it is actually pretty complex and I can't get anywhere.
I did this so far:
modFOI <- rsm(Rendimento ~ FO(x1,x2,x3,x4) + TWI(x1,x2,x3,x4) + PQ(x1,x2,x3,x4), data = CR) # with interactions
summary(modFOI)
print(modFOI)
With that, I found the second-order (SO) model, but now I want to create variables like x1, x2, x3, input them into the model, and find Y. I would also like to find the optimum Y.
The simplest way to fit a second-order polynomial that I can think of is the following:
DF <- data.frame(x = runif(10, 0, 1),
                 y = runif(10, 0, 1))
mod <- lm(y ~ x + I(x^2), data = DF) # fit via data= so that predict() can match newdata by name
predict(mod, newdata = data.frame(x = c(1, 2, 3, 4, 5)))
NB: when using predict, newdata must be a data.frame, and its variables must have the same names as the variables in the model (here, x).
Hope this helps
The optimum value is shown as the stationary point in the output of summary(modFOI). You may also run steepest(modFOI) to see a trace of the estimated values along the path of steepest ascent.
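If you want that stationary point programmatically rather than reading it off the summary output, one option (a sketch: it assumes your rsm version exposes the canonical analysis in summary(), and that the stationary point lies inside the experimental region) is
xs <- summary(modFOI)$canonical$xs              # stationary point, in coded units
predict(modFOI, newdata = as.data.frame(t(xs))) # predicted Y at that point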
To predict, create a data frame with the desired sets of x values. For example,
testdat <- data.frame(x1 = -1:1, x2 = 0, x3 = 0, x4 = 1)
Then use the predict() function with this as newdata:
predict(modFOI, newdata = testdat)

Error in plot, formula missing when using svm

I am trying to plot my svm model.
library(foreign)
library(e1071)
x <- read.arff("contact-lenses.arff")
#alt: x <- read.arff("http://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/contact-lenses.arff")
model <- svm(`contact-lenses` ~ . , data = x, type = "C-classification", kernel = "linear")
The contact lens arff is the inbuilt data file in weka.
However, now I run into an error trying to plot the model.
plot(model, x)
Error in plot.svm(model, x) : missing formula.
The problem is that in your model, you have multiple covariates. The plot() will only run automatically if your data= argument has exactly three columns (one of which is a response). For example, in the ?plot.svm help page, you can call
data(cats, package = "MASS")
m1 <- svm(Sex~., data = cats)
plot(m1, cats)
So, since you can only show two dimensions on a plot, you need to specify which variables to use for x and y when you have more than two to choose from:
cplus <- cats
cplus$Oth <- rnorm(nrow(cplus))
m2 <- svm(Sex ~ ., data = cplus)
plot(m2, cplus) # error
plot(m2, cplus, Bwt ~ Hwt) # OK
plot(m2, cplus, Hwt ~ Oth) # OK
So that's why you're getting the "Missing Formula" error.
There is another catch as well. The plot.svm will only plot continuous variables along the x and y axes. The contact-lenses data.frame has only categorical variables. The plot.svm function simply does not support this as far as I can tell. You'll have to decide how you want to summarize that information in your own visualization.
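One simple fallback, sketched here as just one possible summary (predict() and table() are standard functions; the choice of visualization is yours): inspect the fitted classes directly instead of a decision-boundary plot.
pred <- predict(model, x)
table(predicted = pred, actual = x$`contact-lenses`)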

How to get a plot from nls in R?

In R I use nls to do a nonlinear least-squares fit. How then do I plot the model function using the values of the coefficients that the fit provided?
(Yes, this is a very naive question from an R relative newbie.)
Using the first example from ?nls and following the example I pointed you to line by line achieves the following:
#This is just our data frame
DNase1 <- subset(DNase, Run == 1)
DNase1$lconc <- log(DNase1$conc)
#Fit the model
fm1DNase1 <- nls(density ~ SSlogis(lconc, Asym, xmid, scal), DNase1)
#Plot the original points
# first argument is the x values, second is the y values
plot(DNase1$lconc,DNase1$density)
#This adds to the already created plot a line
# once again, first argument is x values, second is y values
lines(DNase1$lconc,predict(fm1DNase1))
The predict method for an nls object automatically returns the fitted y values. Alternatively, you can add a step and do
yFitted <- predict(fm1DNase1)
and pass yFitted as the second argument to lines instead. The result is the original scatterplot with the fitted curve drawn through the points.
If you want a "smooth" curve, simply repeat this but evaluate the function at more points:
r <- range(DNase1$lconc)
xNew <- seq(r[1],r[2],length.out = 200)
yNew <- predict(fm1DNase1,list(lconc = xNew))
plot(DNase1$lconc,DNase1$density)
lines(xNew,yNew)
coef(x) returns the coefficients for regression results x. For example:
model <- nls(y ~ a + b*x^k, data = my.data, start = list(a = 0, b = 1, k = 1))
plot(y ~ x, data = my.data)
a <- coef(model)[1]
b <- coef(model)[2]
k <- coef(model)[3]
x_new <- 1:10
lines(x_new, a + b*x_new^k, col = "red")
I know what you want (I'm a Scientist). This isn't it, but it at least shows how to use curve to plot your fitting function over any range, and the curve will be smooth. Using the same data set as above:
nonlinFit <- nls(density ~ a - b*exp(-c*conc), data = DNase1, start = list(a=1, b=1, c=1) )
fitFnc <- function(x) predict(nonlinFit, list(conc=x))
curve(fitFnc, from=.5, to=10)
or,
curve(fitFnc, from=8.2, to=8.4)
or,
curve(fitFnc, from=.1, to=50) # well outside the data range
or whatever (without setting up a sequence of evaluation points first).
I'm a rudimentary R programmer, so I don't know how to implement (elegantly) something like ReplaceAll (/.) in Mathematica, which one would use to replace occurrences of the symbolic parameters in the model with the fitted parameters. This first step works, although it looks horrible:
myModel <- "a - b*exp(-c*conc)"
nonlinFit <- nls(as.formula(paste("density ~", myModel)), data = DNase1, start = list(a=1, b=1, c=1) )
It leaves you with a separate 'model' (as a character string) that you might be able to make use of with the fitted parameters; doing it cleanly (NOT digging out a, b, c) would simply use nonlinFit ... not sure how, though.
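One possibility, sketched under the assumption that the names returned by coef() match the symbols in the model string: parse the string and use substitute() as a ReplaceAll-style substitution, then evaluate the resulting expression wherever you like.
modelExpr <- parse(text = myModel)[[1]]                                   # the model as a language object
fitExpr <- do.call(substitute, list(modelExpr, as.list(coef(nonlinFit)))) # a, b, c replaced by fitted values
conc <- seq(0.5, 10, length.out = 200)
plot(conc, eval(fitExpr), type = "l")                                     # smooth curve of the fitted function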
The function "curve" will plot functions for you.
