How to plot SVM classification hyperplane - r

Here is my sample code for SVM classification.
library(e1071)   # provides svm() and its predict method
train <- read.csv("traindata.csv")
test  <- read.csv("testdata.csv")
svm.fit  <- svm(as.factor(value) ~ ., data = train, kernel = "linear", method = "class")
svm.pred <- predict(svm.fit, test, type = "class")
The feature value in my example is a factor with two levels (true or false). I want to plot a graph of my SVM classifier and separate the observations into two groups: one group for "true" and another for "false". How do I produce a 2D or 3D SVM plot? I tried plot(svm.fit, train), but it doesn't seem to work for me.
There is an answer I found on SO, but I am not clear what t, x, y, z, w, and cl are in that answer:
Plotting data from an svm fit - hyperplane
I have about 50 features in my dataset, of which the last column is the factor. Is there a simple way of doing this, or could anyone help me understand that answer?

The short answer is: you cannot. Your data are 50-dimensional, and you cannot plot 50 dimensions. The only things you can do are rough approximations, reductions and projections, but none of these can actually represent what is happening inside the model. To plot a 2D/3D decision boundary, your data have to be 2D/3D (2 or 3 features, which is exactly what happens in the link provided: they have only 3 features, so they can plot all of them). With 50 features you are left with statistical analysis, not actual visual inspection.
You can obviously take a look at some slices (select 3 features, or the main components of a PCA projection). If you are not familiar with the underlying linear algebra, you can simply use the gmum.r package, which does this for you. Just train the SVM and plot it forcing the "pca" visualization, as here: http://r.gmum.net/samples/svm.basic.html.
library(gmum.r)
# We will perform basic classification on breast cancer dataset
# using LIBSVM with linear kernel
data(svm_breast_cancer_dataset)
# We can pass either formula or explicitly X and Y
svm <- SVM(X1 ~ ., svm.breastcancer.dataset, core="libsvm", kernel="linear", C=10)
## optimization finished, #iter = 8980
pred <- predict(svm, svm.breastcancer.dataset[,-1])
plot(svm, mode="pca")
which gives a 2D plot of the PCA-projected points and their classification. For more examples you can refer to the project website http://r.gmum.net/
However, this only shows the projections of the points and their classification; you cannot see the hyperplane, because it is a high-dimensional object (in your case 49-dimensional), and in such a projection the hyperplane would be ... the whole screen. Not a single pixel would be left "outside". Think of it in these terms: if you have a 3D space with a hyperplane inside, that hyperplane is a 2D plane. If you try to plot it in 1D, you end up with the whole line "filled" by the hyperplane, because no matter where you place the line in 3D, the projection of the 2D plane onto it fills it up; the only other possibility is that the line is perpendicular to the plane, in which case the projection is a single point. The same applies here: if you try to project a 49-dimensional hyperplane onto 3D, you end up with the whole screen "black".
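If you prefer to stay within e1071, a similar rough view can be built by hand: project the features onto the first two principal components, refit a linear SVM in that plane, and plot the resulting decision regions. This is only a sketch, assuming (as in the question) that the training data frame is called train and the label column is value; it shows a projection, not the true 50-dimensional hyperplane.
library(e1071)
X  <- train[, setdiff(names(train), "value")]    # feature columns only
pc <- prcomp(X, scale. = TRUE)                   # PCA of the features
train2d <- data.frame(pc$x[, 1:2], value = as.factor(train$value))
svm2d <- svm(value ~ PC1 + PC2, data = train2d, kernel = "linear")
plot(svm2d, train2d)                             # decision regions in PC space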

Related

Fitting different parts of data to different models in R

I've had a pet personal project trying to dig through my Dad's old thesis from 1972 and to reproduce a computational solution that he derived. His project looked at the kinetics of a transition state for alumina ceramics. After collecting the data, he derived the following model for the kinetic curve of the transition (see the attached image from his thesis).
In case the picture doesn't come through: the data form an S-shaped curve. To the left of the inflection point t*, the data fit the equation
y = A * exp(K*t)
To the right of the inflection point, the data fit the equation
y = 1 - B * exp(-J * t^n)
He wrote a Fortran program (Fortran 68) that does the dynamic modeling and least-squares fitting for this. I am trying to "update" his code to see if I can do it more efficiently in R. So, two questions:
What is the best way to plot his model, i.e. how do I plot two equations in this manner? I feel like I could do it by brute force with base R, but I'm not sure the plot will transition smoothly between the two equations.
In his model, the coefficients A, K, B, J and n, as well as the inflection point t*, are unknown and are optimized by least squares. He does the modeling in Fortran by brute force. Is there a glm or similar solution in R for solving this elegantly?
Here is a sample of the data that he generated:
y <- c(20,30,40,50,55,60,65,70,80,90,100,110,120,150)
t <- c(0.05,0.11,0.185,0.31,0.375,0.445,0.52,0.63,0.8,0.92,0.97,0.98,0.99,0.999)
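One rough route, offered only as a sketch rather than a reconstruction of the original Fortran approach: treat all six unknowns (A, K, B, J, n and the breakpoint t*) as parameters and minimize the residual sum of squares with optim(), since a free breakpoint is awkward to pass to nls(). The starting values below are guesses, and the posted sample values may need rescaling (the right-hand branch can never exceed 1).
model <- function(p, t) {
  A <- p[1]; K <- p[2]; B <- p[3]; J <- p[4]; n <- p[5]; t.star <- p[6]
  ifelse(t <= t.star,
         A * exp(K * t),            # left of the inflection point
         1 - B * exp(-J * t^n))     # right of the inflection point
}
rss <- function(p) sum((y - model(p, t))^2)     # residual sum of squares
fit <- optim(c(A = 1, K = 1, B = 1, J = 1, n = 1, t.star = 0.5), rss)
fit$par                                         # estimated coefficients and breakpoint
plot(t, y)
curve(model(fit$par, x), add = TRUE)            # fitted piecewise curve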

How to get X axis on Fig 5.3 in Elements of Statistical Learning?

I am trying to reproduce Figure 5.3 in Elements of Statistical Learning using the South African Heart Disease data. I have gotten to the point where I can compute the pointwise variances and plot them against "sbp", one of the model's predictor variables. I do it this way partly because my pointwise variance vector has dimension 462 by 1, so the only other thing I can plot it against is one of the predictor variables, in my case "sbp", which contains the same number of data points (462). With that, I get a plot that looks like this:
Eyeballing this plot, I can see knots at the 33rd percentile (123) and the 66th percentile (162) for the cubic spline model with df = 6 - 1 (the -1 because there is an intercept), in agreement with Figure 5.3, which has knots at 0.33 and 0.66 as explained in its caption. I think I am getting close, but my problem is that this is not plotted against an X running from 0 to 1 with 50 points, as the figure describes. The figure itself shows the pointwise variance curves against X on [0, 1].
The code for my figure is written in R and currently attempts only the cubic spline model. If I wanted the natural cubic spline, I would just replace the bs() function used for the cubic spline with ns() to build the required H matrix of basis functions. Here is how I construct the cubic spline model:
library(sqldf)
library(splines)
library(gam)
library(mgcv)
SAheart <- read.table("SAheart.data", sep = ",", header = TRUE, row.names = 1)
SAheart.var<-sqldf("select sbp,tobacco,ldl,famhist,obesity,alcohol,age,chd from SAheart")
attach(SAheart.var)
sbp<-SAheart.var[,1]
tobacco<-SAheart.var[,2]
ldl.bsf<-SAheart.var[,3]
famhist<-SAheart.var[,4]
obesity<-SAheart.var[,5]
alcohol<-SAheart.var[,6]
age<-SAheart.var[,7]
chd<-SAheart.var[,8]
#Ignore these two models since they are simply dummy models for the natural cubic spline and global linear
SAheartGlobalLinear<-gam(chd~ sbp,data=SAheart)
SAheartNaturalCubicSpline<-gam(chd~ns(sbp,df=5),method="REML",data=SAheart)
#SAheartCubicSpline
sbp.bs <- bs(sbp,df=5)
tobacco.bs<-bs(tobacco,df=5)
ldl.bsf.bs<-bs(ldl.bsf,df=5)
famhist<-as.numeric(famhist)-1
obesity.bs<-bs(obesity,df=5)
alcohol.bs<-bs(alcohol,df=5)
age.bs<-bs(age,df=5)
chd.bs<-bs(chd,df=5)
#build required H matrix of basis functions using df=6-1 degrees of freedom
H <-cbind(sbp.bs,tobacco.bs,ldl.bsf.bs,famhist,obesity.bs,age.bs)
#centering the columns of H, intercept column is not centered
#producing another basis of the column space
H<-cbind(rep(1,dim(SAheart)[1]),scale(H,scale=FALSE))
#obtain coefficients with glm.fit
SAheartCubicSpline<-glm.fit(H,chd, family = binomial())
coeff<-SAheartCubicSpline$coefficients
#make the weight matrix W (462 by 462)
W = diag(SAheartCubicSpline$weights)
#construct covariance matrix; solve() gives the matrix inverse
Sigma = solve(t(H)%*%W%*%H)
#note: ^-1 below is an element-wise reciprocal, NOT the matrix inverse,
#so sigma is not the same object as Sigma
sigma = (t(H)%*%W%*%H)^-1
#Calculate pointwise variance for one single predictor "sbp"
pw.var<-diag(H[,2:6]%*%Sigma[2:6,2:6]%*%t(H[,2:6]))
#make plot
plot(sbp,pw.var)
I think I am getting close, but my problem is that this is not being plotted against X from 0 to 1 with 50 points, because my pointwise variance vector has 462 entries. I wonder how plotting pointwise variance against an X made of 50 random draws from U[0, 1] would produce the cubic spline curve seen in Figure 5.3. If possible, I would also like to know how to fit the global cubic polynomial and the global linear model. Otherwise I think I understand the construction; I would just love to know where I am going wrong with the x-axis of Figure 5.3. Thanks in advance!
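A hedged sketch of how that x-axis arises: Figure 5.3 is built from 50 points drawn uniformly on [0, 1] rather than from the 462 SAheart rows, and the pointwise variance at each x is the diagonal of H (H'H)^(-1) H' for the chosen basis H (here a cubic spline with knots at 0.33 and 0.66, i.e. six degrees of freedom including the intercept):
library(splines)
set.seed(1)
x <- sort(runif(50))                          # the U[0,1] x-axis of the figure
H <- cbind(1, bs(x, knots = c(0.33, 0.66)))   # cubic spline basis plus intercept
pw.var <- diag(H %*% solve(crossprod(H)) %*% t(H))
plot(x, pw.var, type = "l", xlab = "X", ylab = "Pointwise variance")
The global linear and global cubic polynomial curves come from replacing H with cbind(1, x) and cbind(1, poly(x, 3)) respectively.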

NMDS ordination interpretation from R output

I have conducted an NMDS analysis and have plotted the output too. However, I am unsure how to actually report the results from R. Which parts from the following output are of most importance? The graph that is produced also shows two clear groups, how are you supposed to describe these results?
MDS.out
Call:
metaMDS(comm = dgge2, distance = "bray")
global Multidimensional Scaling using monoMDS
Data: dgge2
Distance: bray
Dimensions: 2
Stress: 0
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on ‘dgge2’
The most important pieces of information are that the stress is (essentially) zero, which means the fit is perfect, and that there is still no convergent solution. This happens if you have six or fewer observations for two dimensions, or if you have degenerate data. You should not use NMDS in these cases. Current versions of vegan issue a warning for near-zero stress; perhaps you have an outdated version.
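If your data do support an NMDS, two standard checks worth reporting (shown here as a sketch on vegan's built-in dune data rather than your dgge2) are the stress of the best solution and the Shepard plot:
library(vegan)
data(dune)
ord <- metaMDS(dune, distance = "bray", trymax = 50)  # more random starts helps convergence
ord$stress                                            # stress of the best solution
stressplot(ord)                                       # Shepard plot: dissimilarity vs ordination distance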
I think the best way to interpret the result is simply the ordination plot. You can use the plot and text methods provided by the vegan package. Here I am creating a ggplot2 version (to get the legend gracefully):
library(vegan)
library(ggplot2)
data(dune)
ord <- metaMDS(comm = dune)
# extract species and site scores, keeping their labels
ord_spec <- scores(ord, "spec")
ord_spec <- cbind.data.frame(ord_spec, label = rownames(ord_spec))
ord_sites <- scores(ord, "sites")
ord_sites <- cbind.data.frame(ord_sites, label = rownames(ord_sites))
# plot species and site labels in the same NMDS space
ggplot(data = ord_spec, aes(x = NMDS1, y = NMDS2)) +
  geom_text(aes(label = label, col = 'species')) +
  geom_text(data = ord_sites, aes(label = label, col = 'sites'))
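If the two visual clusters correspond to a known grouping, one way to describe them is to draw hulls around the groups and test the separation with a PERMANOVA. This is only a sketch; it uses the built-in dune data and its Management factor as a stand-in for your own grouping variable.
library(vegan)
data(dune)
data(dune.env)
ord <- metaMDS(dune, distance = "bray")
grp <- dune.env$Management                     # stand-in for your own grouping factor
plot(ord, display = "sites")
ordihull(ord, groups = grp, label = TRUE)      # convex hulls around each group
adonis2(dune ~ Management, data = dune.env, method = "bray")   # PERMANOVA test of group separation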

loess predict with new x values

I am attempting to understand how the predict.loess function is able to compute new predicted values (y_hat) at points x that do not exist in the original data. For example (this is a simple example and I realize loess is obviously not needed for an example of this sort but it illustrates the point):
x <- 1:10
y <- x^2
mdl <- loess(y ~ x)
predict(mdl, 1.5)
[1] 2.25
loess regression works by fitting local polynomials around each x, and thus it creates a predicted y_hat at each x. However, because there are no coefficients being stored, the "model" in this case is simply the details of what was used to predict each y_hat, for example the span or degree. When I do predict(mdl, 1.5), how is predict able to produce a value at this new x? Is it interpolating between the two nearest existing x values and their associated y_hat? If so, what are the details behind how it does this?
I have read the cloess documentation online but am unable to find where it discusses this.
However, because there are no coefficients being stored, the "model" in this case is simply the details of what was used to predict each y_hat
Maybe you have used the print(mdl) command, or simply mdl, to see what the model mdl contains, but that is not the whole story. The model is really quite complicated and stores a large number of parameters.
To get an idea of what is inside, you can use unlist(mdl) and see the long list of parameters in it.
This is the part of the manual for the command that describes how it really works:
Fitting is done locally. That is, for the fit at point x, the fit is made using points in a neighbourhood of x, weighted by their distance from x (with differences in ‘parametric’ variables being ignored when computing the distance). The size of the neighbourhood is controlled by α (set by span or enp.target). For α < 1, the neighbourhood includes proportion α of the points, and these have tricubic weighting (proportional to (1 - (dist/maxdist)^3)^3). For α > 1, all points are used, with the ‘maximum distance’ assumed to be α^(1/p) times the actual maximum distance for p explanatory variables.
For the default family, fitting is by (weighted) least squares. For family="symmetric" a few iterations of an M-estimation procedure with Tukey's biweight are used. Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit.
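As a small illustration of the tricubic weighting described above (a rough sketch, not part of the original answer), here are the weights the nearest points would receive when predicting at x0 = 1.5 with span = 0.75 on the toy data:
x <- 1:10; x0 <- 1.5; span <- 0.75
k <- ceiling(span * length(x))          # roughly: the nearest span*n points form the neighborhood
d <- abs(x - x0)                        # distances from the prediction point
nb <- order(d)[1:k]                     # indices of the k nearest points
w <- (1 - (d[nb] / max(d[nb]))^3)^3     # tricubic weights, as in the help page
round(w, 3)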
What I believe is that it fits a polynomial model in the neighborhood of every point (not just a single polynomial for the whole set). The neighborhood does not mean only one point before and one point after: if I were implementing such a function, I would put a large weight on the points nearest to x, lower weights on more distant points, and fit a polynomial to that weighted set.
Then, to predict at a new point x', it would take the polynomial fitted on the neighborhood of the nearest point x, say P, evaluate it at x' to get P(x'), and that would be the prediction.
Let me know if you are looking for anything special.
To better understand what is happening in a loess fit try running the loess.demo function from the TeachingDemos package. This lets you interactively click on the plot (even between points) and it then shows the set of points and their weights used in the prediction and the predicted line/curve for that point.
Note also that the default for loess is to do a second smoothing/interpolating on the loess fit, so what you see in the fitted object is probably not the true loess fitting information, but the secondary smoothing.
Found the answer on page 42 of the manual:
In this algorithm a set of points typically small in number is selected for direct computation using the loess fitting method and a surface is evaluated using an interpolation method that is based on blending functions. The space of the factors is divided into rectangular cells using an algorithm based on k-d trees. The loess fit is evaluated at the cell vertices and then blending functions do the interpolation. The output data structure stores the k-d trees and the fits at the vertices. This information is used by predict() to carry out the interpolation.
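One way to see this at work (a small sketch, not from the manual) is to compare the default surface = "interpolate" against surface = "direct", which recomputes the local regression at every prediction point:
x <- 1:10
y <- x^2
m.interp <- loess(y ~ x)                                              # default: k-d tree + interpolation
m.direct <- loess(y ~ x, control = loess.control(surface = "direct")) # exact local fit at each query point
predict(m.interp, data.frame(x = 1.5))
predict(m.direct, data.frame(x = 1.5))
The two predictions usually agree to several decimal places, which is why the cheaper interpolation is the default.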
I guess that to predict at x, predict.loess runs a regression with some points near x and calculates the y-value at x.
See https://stats.stackexchange.com/questions/223469/how-does-a-loess-model-do-its-prediction

Is there an implementation of loess in R with more than 3 parametric predictors or a trick to a similar effect?

Calling all experts on local regression and/or R!
I have run into a limitation of the standard loess function in R and hope you have some advice. The current implementation supports only 1-4 predictors. Let me set out our application scenario to show why this can easily become a problem as soon as we want to employ globally fit parametric covariables.
Essentially, we have a spatial distortion s(x,y) overlaid over a number of measurements z:
z_i = s(x_i,y_i) + v_{g_i}
These measurements z can be grouped by the same underlying undistorted measurement value v for each group g. The group membership g_i is known for each measurement, but the underlying undistorted measurement values v_g for the groups are not known and should be determined by (global, not local) regression.
We need to estimate the two-dimensional spatial trend s(x,y), which we then want to remove. In our application, say there are 20 groups of at least 35 measurements each, in the most simple scenario. The measurements are randomly placed. Taking the first group as reference, there are thus 19 unknown offsets.
The below code for toy data (with a spatial trend in one dimension x) works for two or three offset groups.
Unfortunately, the loess call fails for four or more offset groups with the error message
Error in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
  only 1-4 predictors are allowed
I tried overriding the restriction and got
k>d2MAX in ehg136. Need to recompile with increased dimensions.
How easy would that be to do? I cannot find a definition of d2MAX anywhere, and it seems this might be hardcoded -- the error is apparently triggered by line #1359 in loessf.f
if(k .gt. 15) call ehg182(105)
Alternatively, does anyone know of an implementation of local regression with global (parametric) offset groups that could be applied here?
Or is there a better way of dealing with this? I tried lme with correlation structures but that seems to be much, much slower.
Any comments would be greatly appreciated!
Many thanks,
David
###
#
# loess with parametric offsets - toy data demo
#
x<-seq(0,9,.1);
x.N<-length(x);
o<-c(0.4,-0.8,1.2#,-0.2 # works for three but not four
); # these are the (unknown) offsets
o.N<-length(o);
f<-sapply(seq(o.N),
function(n){
ifelse((seq(x.N)<= n *x.N/(o.N+1) &
seq(x.N)> (n-1)*x.N/(o.N+1)),
1,0);
});
f<-f[sample(NROW(f)),];
y<-sin(x)+rnorm(length(x),0,.1)+f%*%o;
s.fs<-sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')});
s<-paste(c('y~x',s.fs),collapse='+');
d<-data.frame(x,y,f)
names(d)<-c('x','y',s.fs);
l<-loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d,
span=0.4);
yp<-predict(l,newdata=d);
plot(x,y,pch='+',ylim=c(-3,3),col='red'); # input data
points(x,yp,pch='o',col='blue'); # fit of that
d0<-d; d0$f1<-d0$f2<-d0$f3<-0;
yp0<-predict(l,newdata=d0);
points(x,y-f%*%o); # spatial distortion
lines(x,yp0,pch='+'); # estimate of that
op<-sapply(seq(NCOL(f)),function(i){(yp-yp0)[!!f[,i]][1]});
cat("Demo offsets:",o,"\n");
cat("Estimated offsets:",format(op,digits=1),"\n");
Why don't you use an additive model for this? The mgcv package will handle this sort of model, if I understand your question correctly, just fine. I might have this wrong, but the code you show relates y to x, whereas your question describes z ~ s(x, y) + g. What I show below with gam() models the response z by a spatial smooth in x and y, with g estimated parametrically and stored as a factor in the data frame:
require(mgcv)
m <- gam(z ~ s(x,y) + g, data = foo)
Or have I misunderstood what you wanted? If you want to post a small snippet of data I can give a proper example using mgcv...?
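As a hedged sketch of how that suggestion maps onto the toy demo above (not part of the original answer): the toy distortion is one-dimensional, so s(x) replaces s(x, y), and the offset groups become a factor g, with the zero-offset rows as the reference level. The names g, d.gam and m below are my own.
library(mgcv)
# group id per row: 1..3 for the offset groups, 0 for the reference rows
g <- factor(as.vector(f %*% seq_len(ncol(f))))
d.gam <- data.frame(x = x, y = as.vector(y), g = g)
m <- gam(y ~ s(x) + g, data = d.gam)
plot(m, residuals = TRUE)                   # the estimated smooth, i.e. the spatial distortion
coef(m)[grep("^g", names(coef(m)))]         # estimated offsets relative to the reference group
The coefficients g1-g3 should recover the demo offsets 0.4, -0.8 and 1.2 up to noise.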
