Objective function of an SVM - r

I have used the svm function in the e1071 package of R to model my data with variables selected by my feature selection method. I have obtained predictions from this model using the predict.svm function in the same package. I now want to compute the value of the objective function of the fitted svm model in R. How can I do this?
Below is my code for my first feature selection technique, Information Gain:
library(e1071)      # svm
library(FSelector)  # information.gain, cutoff.k, as.simple.formula
library(caret)      # confusionMatrix

P1 <- Fold1T$Class_NASQ
InfGainF1 <- information.gain(P1 ~ ., Fold1T[, -20])
subset <- cutoff.k(InfGainF1, 8)
f <- as.simple.formula(subset, "P1")
ModelInGF1 <- svm(as.factor(P1) ~ NSDQ.COMP + S.P.100 + S.P.500 + NYSE.COMP + NYSE.A.M.MKT +
                  RSEL.2000 + ALL.ORD + HG.SENG, data = Fold1T[, -20],
                  kernel = "radial", gamma = 0.5, cost = 16)
PredictInGF1 <- predict(ModelInGF1, newdata = Fold1V[, -20])  # note: the argument name is "newdata"
######### Accuracy ########
confusionMatrix(PredictInGF1, P1)
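For clarity, by the objective function I mean the quantity libsvm (which e1071::svm wraps) optimizes. As far as I understand, for C-classification it solves the dual problem

$$\min_{\alpha}\;\tfrac{1}{2}\alpha^{\top}Q\alpha - e^{\top}\alpha,\qquad 0\le\alpha_i\le C,\;\; y^{\top}\alpha=0,\qquad Q_{ij}=y_i y_j K(x_i,x_j),$$

corresponding to the primal objective $\tfrac{1}{2}\lVert w\rVert^{2}+C\sum_i\xi_i$, and the "obj" value libsvm reports is the dual objective at the solution.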
Thanks

While learning about SVR back in 2010 I explored how predicted values are computed. To do this, I went over the file "svminternals.pdf" located in the e1071/doc subfolder and played with my own code (shown after the toy data) using the following data set:
ToyData <- data.frame(X1=c(12.4,14.6,13.4,12.9,15.2,13.6,9.2), X2=c(2.1,9.2,1.9,0.8,1.1,8.6,1.1),Y=c(14.2,16.9,15.5,14.7,17.3,16,10.9))
You may explore the following code to see if it is helpful to you.
# LINEAR KERNEL
library(e1071)
# ToyData is already defined above; alternatively read it back in with read.csv("ToyData.csv")
X <- as.matrix(ToyData[,1:2])
Y <- as.vector(ToyData[,3])
SVRLinear <- svm (X, Y, kernel="linear", epsilon=0.1, cost=1, scale=FALSE)
V <- as.matrix(SVRLinear$SV)
Vt <- t(V)
A <- as.matrix(SVRLinear$coefs)
(r <- SVRLinear$rho)
write.csv(V, file="SVLinear.csv")
write.csv(A, file="CoefsLinear.csv")
F <- (X %*% Vt) %*% A - r
write.csv(F, file="FittedLinear.csv")
# RBF KERNEL: exp(-gamma * ||x - z||^2)
# ToyData is already defined above; alternatively read it back in with read.csv("ToyData.csv")
X <- as.matrix(ToyData[,1:2])
Y <- as.vector(ToyData[,3])
SVRRadial <- svm (X, Y, kernel="radial", epsilon=0.1, gamma=0.1, cost=5, scale=FALSE)
V <- as.matrix(SVRRadial$SV)
A <- as.matrix(SVRRadial$coefs)
(g <- SVRRadial$gamma)
(r <- SVRRadial$rho)
write.csv(V, file="SVRadial.csv")
write.csv(A, file="CoefsRadial.csv")
Kernel <- matrix(0, nrow(X), nrow(V))
for (i in 1:nrow(X)) {
  for (j in 1:nrow(V)) {
    Xi <- X[i, ]
    Vj <- V[j, ]
    XiMinusVj <- Xi - Vj
    SumSqXiMinusVj <- XiMinusVj %*% XiMinusVj
    Kernel[i, j] <- exp(-g * SumSqXiMinusVj)
  }
}
F <- Kernel %*% A - r
write.csv(F, file="FittedRadial.csv")
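If what you need is the value of the objective function itself (as in the original question), the same pieces can be reused. Below is a rough sketch for a binary C-classification model with a radial kernel, where libsvm's dual objective is 0.5*a'Ka - sum(|a|) with a = model$coefs (i.e. y_i*alpha_i for each support vector). The helper name svm_dual_objective is made up here, the computation happens on the internally scaled data if scaling was left on, and I have not verified it against the "obj" value libsvm prints, so treat it as a starting point rather than a definitive implementation.
svm_dual_objective <- function(model) {
  sv <- as.matrix(model$SV)               # support vectors, as stored by libsvm
  a  <- as.vector(model$coefs)            # y_i * alpha_i for each support vector
  g  <- model$gamma
  K  <- exp(-g * as.matrix(dist(sv))^2)   # radial kernel among the support vectors
  drop(0.5 * crossprod(a, K %*% a)) - sum(abs(a))
}
# e.g. svm_dual_objective(ModelInGF1) for the model in the question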

I want to add how to reproduce the predicted values from the model parameters when the scale option is on. In e1071, data are by default scaled internally (both x and y variables) to zero mean and unit variance; the centering and scaling values are returned and used for later predictions (http://www.inside-r.org/node/57517). Based on the code above, I wrote the following, which may help you.
ToyData <- data.frame(X1=c(12.4,14.6,13.4,12.9,15.2,13.6,9.2), X2=c(2.1,9.2,1.9,0.8,1.1,8.6,1.1),Y=c(14.2,16.9,15.5,14.7,17.3,16,10.9))
X <- as.matrix(ToyData[,1:2])
Y <- as.vector(ToyData[,3])
SVRRadial <- svm (X, Y, kernel="radial", epsilon=0.1, gamma=0.1, cost=5)
pred<-predict(SVRRadial,X)
toys<-ToyData
# scale the features the same way svm() did internally
sc_x <- data.frame(SVRRadial$x.scale)
for (col in row.names(sc_x)) {
  toys[[col]] <- (ToyData[[col]] - sc_x[[col, 1]]) / sc_x[[col, 2]]
}
# compute the predicted values; the method is the same as in the code above
X<-as.matrix(toys[,1:2])
V <- as.matrix(SVRRadial$SV)
A <- as.matrix(SVRRadial$coefs)
g <- SVRRadial$gamma
r <- SVRRadial$rho
Kernel <- matrix(0, nrow(X), nrow(V))
for (i in 1:nrow(X)) {
  for (j in 1:nrow(V)) {
    Xi <- X[i, ]
    Vj <- V[j, ]
    XiMinusVj <- Xi - Vj
    SumSqXiMinusVj <- XiMinusVj %*% XiMinusVj
    Kernel[i, j] <- exp(-g * SumSqXiMinusVj)
  }
}
F <- Kernel %*% A - r
# transform the predictions back from the standardized scale to the original scale
my_pred<-F
sc_y<-data.frame(SVRRadial$y.scale)
my_pred<-my_pred*sc_y[[2]]+sc_y[[1]]
summary(my_pred-pred)
Reference: How to reproduce predict.svm in R?

Related

What is going on inside the varimax function in R?

I have been trying to figure out the core part of the varimax function in R. I found a wiki link that writes out the algorithm, but why is B <- t(x) %*% (z^3 - z %*% diag(drop(rep(1, p) %*% z^2))/p) computed? I am also not sure why the SVD of the matrix B is taken. The iteration step is presumably there to maximize/minimize the variance, and the singular values would really be variances of principal components, but I am unsure about that too. I am pasting the whole code of varimax for convenience, but the relevant part, and therefore my question about what is actually happening under the hood, is the for loop.
function (x, normalize = TRUE, eps = 1e-05)
{
    nc <- ncol(x)
    if (nc < 2)
        return(x)
    if (normalize) {
        sc <- sqrt(drop(apply(x, 1L, function(x) sum(x^2))))
        x <- x/sc
    }
    p <- nrow(x)
    TT <- diag(nc)
    d <- 0
    for (i in 1L:1000L) {
        z <- x %*% TT
        B <- t(x) %*% (z^3 - z %*% diag(drop(rep(1, p) %*% z^2))/p)
        sB <- La.svd(B)
        TT <- sB$u %*% sB$vt
        dpast <- d
        d <- sum(sB$d)
        if (d < dpast * (1 + eps))
            break
    }
    z <- x %*% TT
    if (normalize)
        z <- z * sc
    dimnames(z) <- dimnames(x)
    class(z) <- "loadings"
    list(loadings = z, rotmat = TT)
}
Edit: The algorithm is available in the book "Factor Analysis of Data Matrices" by Paul Horst (published by Holt, Rinehart and Winston), and the actual sources can be found therein. This book is also cited in the documentation of the varimax function in R.
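One hedged way to read the loop (a sketch based on the standard varimax derivation, not something stated in the code itself): write $\Lambda = xT$ for the rotated loadings (z in the code). The raw varimax criterion being maximized is

$$V(\Lambda)=\sum_{j}\Big[\tfrac{1}{p}\sum_{i}\lambda_{ij}^{4}-\Big(\tfrac{1}{p}\sum_{i}\lambda_{ij}^{2}\Big)^{2}\Big],$$

and its gradient with respect to $T$ is proportional to

$$B = x^{\top}\Big(z^{\circ 3} - z\,\operatorname{diag}(\mathbf{1}^{\top}z^{\circ 2})/p\Big),$$

with the cube and square taken elementwise, which is exactly the B in the code. Each iteration then solves the orthogonal Procrustes problem $\max_{T^{\top}T=I}\operatorname{tr}(T^{\top}B)$: with the SVD $B=UDV^{\top}$ the maximizer is $T=UV^{\top}$ and the attained value is the sum of the singular values. That sum is what d <- sum(sB$d) tracks (it is proportional to the varimax criterion as the iteration converges), which is why the loop stops once d no longer increases by a relative factor of eps. So the singular values here are not variances of principal components; their sum is a monotone measure of the rotation criterion itself.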

R function writing - getting errors: "NaNs produced" and "Error in tsort[U + 1] : only 0's may be mixed with negative subscripts"

I am creating an R function that calculates a bootstrapped bias-corrected and accelerated (BCa) interval, without using any pre-installed packages. My code seems to be working, but I am struggling with actually writing the code for the lower and upper limits of the interval. Any suggestions would be helpful.
BCa <- function(stat, X, k, level=0.95, ...){
  if(!is.numeric(k) || k <= 0){
    stop("The number of bootstrap resamples 'k' must be a numeric value greater than 0")
  }
  t.star <- stat(X, ...)
  t.k <- rep(NA, k)
  for(i in 1:k){
    Xi <- sample(X, replace=TRUE)
    t.k[i] <- stat(Xi, ...)
  }
  z0 <- qnorm(mean(t.k < t.star))
  n <- length(X)
  t.minus.j <- rep(NA, n)
  for(j in 1:n){
    Xj <- X[-j]
    t.minus.j[j] <- stat(Xj, ...)
  }
  t.bar.minus <- mean(t.minus.j)
  t.diff <- t.bar.minus - t.minus.j
  a <- ((sum(t.diff^3))/(6*(t.diff^2)^3/2))
  alpha <- 1-level
  tsort <- sort(t.k, decreasing = FALSE)
  L <- pnorm(z0 + ((z0+qnorm(alpha/2))/((1-a)*z0+qnorm(alpha/2))))
  U <- qnorm(z0 + ((z0+qnorm(alpha/2))/((1-a)*z0+qnorm(alpha/2))))
  if(!is.integer(L)){
    L <- floor(L*(k+1))
  }
  if(!is.integer(U)){
    U <- ceiling(U*(k+1))
  }
  lower.limit <- tsort[L]
  upper.limit <- tsort[U+1]
  return(list(t.star=t.star, ci=c(lower.limit, upper.limit)))
}
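For the limits themselves, here is a hedged sketch of the usual BCa formulas (following Efron and Tibshirani; these lines would replace the corresponding ones inside BCa(), keeping the variable names above). Note that the acceleration divides by sum(t.diff^2) raised to the 3/2 power, both adjusted levels go through pnorm, and the denominators are 1 - a*(z0 + z), which differs from the code above:
a  <- sum(t.diff^3) / (6 * (sum(t.diff^2))^(3/2))
zL <- z0 + qnorm(alpha/2)        # lower tail
zU <- z0 + qnorm(1 - alpha/2)    # upper tail
alpha1 <- pnorm(z0 + zL / (1 - a*zL))
alpha2 <- pnorm(z0 + zU / (1 - a*zU))
lower.limit <- quantile(t.k, alpha1, names = FALSE)
upper.limit <- quantile(t.k, alpha2, names = FALSE)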

Multi-data likelihood function and mle2 function from bbmle package in R

I have written a custom likelihood function that fits a multi-data model integrating mark-recapture and telemetry data (sensu Royle et al. 2013, Methods in Ecology and Evolution). The likelihood function is designed to be flexible in terms of whether and how many covariates are specified for different linear models in different likelihood components; this is determined by values supplied as function arguments (i.e., the data matrices "detcovs" and "dencovs" in my code). The likelihood function works when I supply it directly to optimization functions (e.g., optim or nlm), but it does not play nice with the mle2 function in the bbmle package. My problem is that I continually run into the following error: "some named arguments in 'start' are not arguments to the specified log-likelihood function". This is my first attempt at writing custom likelihood functions, so I'm sure there are general coding conventions of which I'm unaware that would make such tasks more efficient and amenable to the mle2 function. Below are my likelihood function, the code creating the starting value objects, and the code calling the mle2 function. Any advice on how to solve the error and general comments on writing cleaner functions are welcome. Many thanks in advance.
Edit: As requested, I have simplified the likelihood function and provided code to simulate reproducible data to which the model can be fit. Included in the simulation code are 2 custom functions and use of the raster function from the raster package. Hopefully, I have sufficiently simplified everything to enable others to troubleshoot. Again, many thanks for your help!
Jared
Likelihood function:
CSCR.RSF.intlik2.EXAMPLE <- function(alpha0,sigma,alphas=NULL,betas=NULL,n0,yscr=NULL,K=NULL,X=X,trapcovs=NULL,Gden=NULL,Gdet=NULL,ytel=NULL,stel=NULL,
dencovs=NULL,detcovs=NULL){
#
# this version of the code handles a covariate on log(Density). This is starting value 5
#
# start = vector of starting values
# yscr = nind x ntraps encounter matrix
# K = number of occasions
# X = trap locations
# Gden = matrix with grid cell coordinates for density raster
# Gdet = matrix with grid cell coordinates for RSF raster
# dencovs = all covariate values for all nGden pixels in density raster
# trapcovs = covariate value at trap locations
# detcovs = all covariate values for all nGrsf pixels in RSF raster
# ytel = nguys x nGdet matrix of telemetry fixes in each nGdet pixels
# stel = home range center of telemetered individuals, IF you wish to estimate it. Not necessary
# alphas = starting values for RSF/detfn coefficients excluding sigma and intercept
# alpha0 = starting values for RSF/detfn intercept
# sigma = starting value for RSF/detfn sigma
# betas = starting values for density function coefficients
# n0 = starting value for number of undetected individuals on log scale
#
n0 = exp(n0)
nGden = nrow(Gden)
D = e2dist(X,Gden)
nGdet <- nrow(Gdet)
alphas = alphas
loglam = alpha0 -(1/(2*sigma*sigma))*D*D + as.vector(trapcovs%*%alphas) # ztrap recycled over nG
psi = exp(as.vector(dencovs%*%betas))
psi = psi/sum(psi)
probcap = 1-exp(-exp(loglam))
#probcap = (exp(theta0)/(1+exp(theta0)))*exp(-theta1*D*D)
Pm = matrix(NA,nrow=nrow(probcap),ncol=ncol(probcap))
ymat = yscr
ymat = rbind(yscr,rep(0,ncol(yscr)))
lik.marg = rep(NA,nrow(ymat))
for(i in 1:nrow(ymat)){
Pm[1:length(Pm)] = (dbinom(rep(ymat[i,],nGden),rep(K,nGden),probcap[1:length(Pm)],log=TRUE))
lik.cond = exp(colSums(Pm))
lik.marg[i] = sum( lik.cond*psi )
}
nv = c(rep(1,length(lik.marg)-1),n0)
part1 = lgamma(nrow(yscr)+n0+1) - lgamma(n0+1)
part2 = sum(nv*log(lik.marg))
out = -1*(part1+ part2)
lam = t(exp(a0 - (1/(2*sigma*sigma))*t(D2)+ as.vector(detcovs%*%alphas)))# recycle zall over all ytel guys
# lam is now nGdet x nG!
denom = rowSums(lam)
probs = lam/denom # each column is the probs for a guy at column [j]
tel.loglik = -1*sum( ytel*log(probs) )
out = out + tel.loglik
out
}
Data simulation code:
library(raster)
library(bbmle)
e2dist <- function (x, y){
i <- sort(rep(1:nrow(y), nrow(x)))
dvec <- sqrt((x[, 1] - y[i, 1])^2 + (x[, 2] - y[i, 2])^2)
matrix(dvec, nrow = nrow(x), ncol = nrow(y), byrow = F)
}
spcov <- function(R) {
v <- sqrt(nrow(R))
D <- as.matrix(dist(R))
V <- exp(-D/2)
cov1 <- t(chol(V)) %*% rnorm(nrow(R))
Rd <- as.data.frame(R)
colnames(Rd) <- c("x", "y")
Rd$C <- as.numeric((cov1 - mean(cov1)) / sd(cov1))
return(Rd)
}
set.seed(1234)
co <- seq(0.3, 0.7, length=5)
X <- cbind(rep(co, each=5),
rep(co, times=5))
B <- 10
co <- seq(0, 1, length=B)
Z <- cbind(rep(co, each=B), rep(co, times=B))
dencovs <- cbind(spcov(Z),spcov(Z)[,3]) # ordered as reading raster image from left to right, bottom to top
dimnames(dencovs)[[2]][3:4] <- c("dencov1","dencov2")
denr.list <- vector("list",2)
for(i in 1:2){
denr.list[[i]] <- raster(
list(x=seq(0,1,length=10),
y=seq(0,1,length=10),
z=t(matrix(dencovs[,i+2],10,10,byrow=TRUE)))
)
}
B <- 20
co <- seq(0, 1, length=B)
Z <- cbind(rep(co, each=B), rep(co, times=B))
detcovs <- cbind(spcov(Z),spcov(Z)[,3]) # ordered as reading raster image from left to right, bottom to top
dimnames(detcovs)[[2]][3:4] <- c("detcov1","detcov2")
detr.list <- vector("list",2)      # list name matched to its use below
trapcovs <- matrix(0,nrow(X),2)    # number of traps; J itself is only defined further down
for(i in 1:2){
detr.list[[i]] <- raster(
list(x=seq(0,1,length=20),
y=seq(0,1,length=20),
z=t(matrix(detcovs[,i+2],20,20,byrow=TRUE)))
)
trapcovs[,i] <- extract(detr.list[[i]],X)
}
alpha0 <- -3
sigma <- 0.15
alphas <- c(1,-1)
beta0 <- 3
betas <- c(-1,1)
pixelArea <- (dencovs$y[2] - dencovs$y[1])^2
mu <- exp(beta0 + as.matrix(dencovs[,3:4])%*%betas)*pixelArea
EN <- sum(mu)
N <- rpois(1, EN)
pi <- mu/sum(mu)
s <- dencovs[sample(1:nrow(dencovs), size=N, replace=TRUE, prob=pi),1:2]
J <- nrow(X)
K <- 10
yc <- d <- p <- matrix(NA, N, J)
D <- e2dist(s,X)
loglam <- t(alpha0 - t((1/(2*sigma*sigma))*D*D) + as.vector(trapcovs%*%alphas))
p <- 1-exp(-exp(loglam))
for(i in 1:N) {
for(j in 1:J) {
yc[i,j] <- rbinom(1, K, p[i,j])
}
}
detected <- apply(yc>0, 1, any)
yscr <- yc[detected,]
ntel <- 5
nfixes <- 100
poss.tel <- which(s[,1]>0.2 & s[,1]<0.8 & s[,2]>0.2 & s[,2]<0.8)
stel.id <- sample(poss.tel,ntel)
stel <- s[stel.id,]
ytel <- matrix(NA,ntel,nrow(detcovs))
d <- e2dist(stel,detcovs[,1:2])
lam <- t(exp(1 - t((1/(2*sigma*sigma))*d*d) + as.vector(as.matrix(detcovs[,3:4])%*%alphas)))
for(i in 1:ntel){
ytel[i,] <- rmultinom(1,nfixes,lam[i,]/sum(lam[i,]))
}
Specify starting values and call mle2 function:
start1 <- list(alpha0=alpha0,sigma=sigma,alphas=alphas,betas=betas,n0=log(N-nrow(yscr)))
parnames(CSCR.RSF.intlik2.EXAMPLE) <- names(start1)
out1 <- mle2(CSCR.RSF.intlik2.EXAMPLE,start=start1,method="SANN",optimizer="optim",
data=list(yscr=yscr,K=K,X=X,trapcovs=trapcovs,Gden=dencovs[,1:2],Gdet=detcovs[,1:2],
ytel=ytel,stel=stel,dencovs=as.matrix(dencovs[,3:4]),detcovs=as.matrix(detcovs[,3:4]))
)
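A hedged guess at the immediate problem: mle2 matches the names in start against the formal arguments of the likelihood function, and vector-valued entries such as alphas and betas are not expanded automatically, so some names in start cannot be matched. The usual workaround, sketched below on a toy likelihood that is not the model from this question, is to write the function optim-style with a single parameter vector, set parnames() on it, pass vecpar=TRUE, and supply a fully flattened named start vector:
library(bbmle)
set.seed(1)
yobs <- rnorm(50, mean = 2, sd = 1)
# optim-style negative log-likelihood: one parameter vector as the only argument
nll <- function(pars) {
  mu    <- pars[1]
  logsd <- pars[2]
  -sum(dnorm(yobs, mean = mu, sd = exp(logsd), log = TRUE))
}
parnames(nll) <- c("mu", "logsd")
fit <- mle2(nll, start = c(mu = 0, logsd = 0), vecpar = TRUE)
coef(fit)
For the model above, this would mean one named element per coefficient in the flattened start vector (e.g. hypothetical names such as alphas1, alphas2, betas1, betas2) and unpacking them from the vector inside the likelihood function.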

Newton Raphson for logistic regression

I wrote code for Newton-Raphson for logistic regression. Unfortunately, on many data sets it does not converge; there is a mistake but I do not know where it is. Can anyone help figure out what the problem is?
First, the data are as follows: y indicates the response (0,1), and Z is a 115*30 matrix of explanatory variables. I need to estimate the 30 parameters.
y = c(rep(0,60),rep(1,55))
X = sample(c(0,1),size=3450,replace=T)
Z = t(matrix(X,ncol=115))
# The code is:
B = matrix(rep(0,30*10), ncol=10)
B[,1] = matrix(rep(0,30), ncol=1)
for(i in 2:10){
  print(i)
  p <- exp(Z %*% as.matrix(B[,i])) / (1 + exp(Z %*% as.matrix(B[,i])))
  v.2 <- diag(as.vector(1 * p*(1-p)))
  score.2 <- t(Z) %*% (y - p) # score function
  increm <- solve(t(Z) %*% v.2 %*% Z)
  B[,i] = as.matrix(B[,i-1]) + increm %*% score.2
  if(B[,i]-B[i-1] == matrix(rep(0.0001,30), ncol=1)){
    return(B)
  }
}
Found it! You're updating p based on B[,i], you should be using B[,i-1] ...
While I was finding the answer, I cleaned up your code and incorporated the results in a function. R's built-in glm seems to work (see below). One note is that this approach is likely to be unstable: fitting a binary model with 30 predictors and only 115 binary responses, and without any penalization or shrinkage, is extremely optimistic ...
set.seed(101)
n.obs <- 115
n.zero <- 60
n.pred <- 30
y <- c(rep(0,n.zero),rep(1,n.obs-n.zero))
X <- sample(c(0,1),size=n.pred*n.obs,replace=TRUE)
Z <- t(matrix(X,ncol=n.obs))
R's built-in glm fitter does work (it uses iteratively reweighted least squares, not N-R):
g1 <- glm(y~.-1,data.frame(y,Z),family="binomial")
(If you want to view the results, library("arm"); coefplot(g1).)
## B_{m+1} = B_m + (X^T V_m X)^{-1} X^T (Y-P_m)
NRfit function:
NRfit <- function(y, X, start, n.iter=100, tol=1e-4, verbose=TRUE) {
  ## used X rather than Z just because it's more standard notation
  n.pred <- ncol(X)
  B <- matrix(NA, ncol=n.iter, nrow=n.pred)
  B[,1] <- start
  for (i in 2:n.iter) {
    if (verbose) cat(i, "\n")
    p <- plogis(X %*% B[,i-1])
    v.2 <- diag(c(p*(1-p)))
    score.2 <- t(X) %*% (y - p) # score function
    increm <- solve(t(X) %*% v.2 %*% X)
    B[,i] <- B[,i-1] + increm %*% score.2
    if (all(abs(B[,i]-B[,i-1]) < tol)) return(B)
  }
  B
}
matplot(res1 <- t(NRfit(y,Z,start=coef(g1))))
matplot(res2 <- t(NRfit(y,Z,start=rep(0,ncol(Z)))))
all.equal(res2[6,],unname(coef(g1))) ## TRUE

customized ridge regression function

I am working on ridge regression and I want to write my own function. I tried the following. It works for an individual value of k, but not for an array/sequence of values.
dt<-longley
attach(dt)
library(MASS)
X<-cbind(X1,X2,X3,X4,X5,X6)
X<-as.matrix(X)
Y<-as.matrix(Y)
sx<-scale(X)/sqrt(nrow(X)-1)
sy<-scale(Y)/sqrt(nrow(Y)-1)
rxx<-cor(sx)
rxy<-cor(sx,sy)
for (k in 0:1){
  res <- solve(rxx + k*diag(rxx)) %*% rxy
  k = k + 0.01
}
I need help with optimizing the code too.
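A hedged sketch of the sequence-of-k part (note that longley itself has no columns X1..X6 or Y; its response is Employed and the other six columns are the predictors, so the names below are adjusted accordingly, and diag(ncol(X)) is used because diag(rxx) returns the diagonal as a vector rather than the identity matrix): build the grid of ridge constants first and sapply over it, rather than modifying k inside for (k in 0:1), which has no effect on the values the loop iterates over.
X <- scale(as.matrix(longley[, -7]))   # the six predictors, standardized
Y <- scale(longley$Employed)           # the response, standardized
rxx <- cor(X)
rxy <- cor(X, Y)
ks <- seq(0, 1, by = 0.01)
ridge.coefs <- sapply(ks, function(k) solve(rxx + k * diag(ncol(X))) %*% rxy)
dim(ridge.coefs)   # one column of standardized coefficients per value of k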
poly.kernel <- function(v1, v2=v1, p=1) {
  ((as.matrix(v1) %*% t(v2)) + 1)^p
}
KernelRidgeReg <- function(TrainObjects, TrainLabels, TestObjects, lambda){
  X <- TrainObjects
  y <- TrainLabels
  kernel <- poly.kernel(X)
  design.mat <- cbind(1, kernel)
  I <- rbind(0, cbind(0, kernel))
  M <- crossprod(design.mat) + lambda*I
  # crossprod(x) is just t(x) %*% x; it just looks neater in my opinion
  M.inv <- solve(M)
  # inverse of M
  k <- as.matrix(diag(poly.kernel(cbind(TrainObjects, TrainLabels))))
  # Removing diag still gives the same MSE, but will output a vector of predictions.
  Labels <- rbind(0, as.matrix(TrainLabels))
  y.hat <- t(Labels) %*% M.inv %*% rbind(0, k)
  y.true <- Y.test   # note: Y.test (the test labels) is taken from the calling environment
  MSE <- mean((y.hat - y.true)^2)
  return(list(MSE=MSE, y.hat=y.hat))
}
The kernel with p=1 will give you ridge regression.
R's built-in solve function can fail when the matrix is singular or near-singular. You may want to write your own function to avoid that.
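On that last point, a rough sketch of one way to guard against it (the helper name safe_solve is made up): solve the linear system directly instead of forming the inverse, and fall back to a Moore-Penrose pseudo-inverse from MASS when the matrix is computationally singular.
library(MASS)  # for ginv
safe_solve <- function(M, b) {
  out <- try(solve(M, b), silent = TRUE)
  if (inherits(out, "try-error")) {
    out <- ginv(M) %*% b   # pseudo-inverse fallback for (near-)singular M
  }
  out
}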
