Reproducing the predict function of the svm method in R

I want to reproduce the predict function for an SVM in R. I found a very nice example in "How to reproduce predict.svm in R?", but it does not work on my data.
The difference is that I have four classes.
At first I received the error "Error in x - x1 : non-numeric argument to binary operator". Following advice from MrFlick, I wrapped all values in as.numeric. This changed the error, so I checked my original data table, and there were a few non-numeric values.
Now I get a different error: "Error in f(y, m) : dims [product 3] do not match the length of object [6]"
My data set is large, so I prepared some sample values to show the problem.
library(e1071)
Cval =100
GammaVal=0.1
sp1a<-as.numeric(c("2.58","0","0","10.85","20.1","0","0","0","0","0","76.03","0","0","28.79","0","2.76","0","0","23.99","0"))
sp1b<-as.numeric(c("135.32","133.82","134.24","132.84","135.11","133.55","132.99","130.25","133.19","132.42","135.8","133.99","133.33","135.52","134.67","134.79","134.32","133.9","135.36","133.14"))
sp1c<-as.numeric(sp1b)/2.3
sp1d<-as.numeric(sp1b)-3.5
sp1e<-as.numeric(sp1a)+1.3
sp1f<-as.numeric(sp1a)*2
data<-data.frame(cbind(sp1a,sp1b,sp1c,sp1d,sp1e,sp1f,class=c(rep(1,4),rep(2,5),rep(3,5),rep(4,6))))
svm_mod = svm(class~.,type="C-classification",data=data,cost = Cval, gamma = GammaVal,cross=10)
summary(svm_mod)
svm_train_pred = predict(svm_mod, data)
self_check_svm_out = cbind(data,svm_train_pred)
tab <- table(pred = svm_train_pred, true = data[,7])
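## my predict functions
# RBF kernel between two numeric vectors
k <- function(x, x1, gamma) {
  return(exp(-gamma * sum((x - x1)^2)))
}
# decision value: kernel-weighted sum over the support vectors minus rho
f <- function(x, m) {
  return(t(m$coefs) %*% as.matrix(apply(m$SV, 1, k, x, m$gamma)) - m$rho)
}
my.predict <- function(m, x) {
  apply(x, 1, function(y) sign(f(y, m)))
}
# the model was trained on six predictors, so pass columns 1:6
table(my.predict(svm_mod, data[, 1:6]), predict(svm_mod, data[, 1:6]))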
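One possible reading of the second error, offered as a sketch and not a confirmed diagnosis: for a model with k classes, e1071 (through libsvm) fits k(k-1)/2 one-vs-one classifiers, so m$rho has k(k-1)/2 entries while m$coefs has k-1 columns. With four classes, t(m$coefs) %*% ... is a 3 x 1 matrix and m$rho has length 6. The base R snippet below uses made-up stand-in values (lhs and rho are hypothetical names, not taken from a fitted model) to reproduce the same kind of message.

```r
# Stand-ins for the shapes in a 4-class model (values are made up):
# t(coefs) %*% kernel_values is a (k - 1) x 1 matrix, rho has k * (k - 1) / 2 entries.
k_classes <- 4
lhs <- matrix(c(0.2, -0.1, 0.4), nrow = k_classes - 1, ncol = 1)
rho <- seq_len(k_classes * (k_classes - 1) / 2) / 10

# Subtracting a length-6 vector from a 3 x 1 matrix is not allowed in R,
# so capture the error message instead of stopping.
msg <- tryCatch(
  {
    lhs - rho
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is indeed the cause, the sign-based predictor from the two-class example does not carry over directly: each of the six pairwise decision values would need its own coefficients and rho entry, followed by a vote across the pairs.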

Related

How to write custom predict function for classification model in R?

I am trying to use the flashlight package with the h2o package. An example of doing this on a regression model can be found here. However, I am trying to make it work for a classification model, and to do so I followed the example in the link. flashlight will work with h2o if you provide your own custom predict function, but the predict function in the example below does not work for classification.
Here is the code I'm using:
library(flashlight)
library(h2o)
h2o.init()
h2o.no_progress()
iris_hf <- as.h2o(iris)
iris_dl <- h2o.deeplearning(x = 1:4, y = "Species", training_frame = iris_hf, seed=123456)
pred_fun <- function(mod, X) as.vector(unlist(h2o.predict(mod, as.h2o(X))))
fl_NN <- flashlight(model = iris_dl, data = iris, y = "Species", label = "NN",
predict_function = pred_fun)
But when I try to check the importance or interactions, I get an error. For example:
light_interaction(fl_NN, type = "H",
pairwise = TRUE)
Throws back the error:
Error: Assigned data predict(x, data = X[, cols, drop = FALSE]) must
be compatible with existing data. Existing data has 22500 rows.
Assigned data has 90000 rows. ℹ Only vectors of size 1 are recycled.
I need to change the predict function somehow to make it work, but I have had no success yet. Any suggestions as to how I could change it?
EDIT UPDATE: I found a custom predict function that works with the light_interaction function:
pred_fun <- function(mod, X) as.vector(unlist(h2o.predict(mod, as.h2o(X))[,2]))
Here the prediction frame is indexed for one specific category. However, this does not work for calculating the importance. For example:
light_importance(fl_NN)
Gives the error:
Warning messages:
1: In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
2: In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
3: In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
4: In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
5: In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors
So I'm still trying to figure this out.

How to load a csv file into R as a factor for use with glmnet and logistic regression

I have a csv file (single column, numeric values) called "y" that consists of zeros and ones where the rows with the value 1 indicate the target variable for logistic regression, and another file called "x" with the same number of rows and with columns of numeric predictor values. How do I load these so that I can then use cv.glmnet, i.e.
x <- read.csv('x',header=FALSE,sep=",")
y <- read.csv('y',header=FALSE )
is throwing an error
Error in y %*% rep(1, nc) :
requires numeric/complex matrix/vector arguments
when I call
cvfit = cv.glmnet(x, y, family = "binomial")
I know that "y" should be loaded as a "factor," but how do I do this? My online searches have found all sorts of approaches that have just confused me. What is the simple one-liner to just load this data ready for glmnet?
cv.glmnet requires x as a numeric matrix and y as a vector or factor, but read.csv returns data frames. Convert them first:
xmat = as.matrix(x)
yvec = as.factor(y[[1]])
Then use
cvfit = cv.glmnet(xmat, yvec, family = "binomial")
If you can provide your data in dput() format, I can give it a try.
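As a minimal sketch of the conversion step, the snippet below reads two tiny made-up tables through the text argument of read.csv (standing in for the real 'x' and 'y' files, which are not available here) and checks the resulting types. It stops before calling cv.glmnet, so it runs in base R alone.

```r
# Made-up stand-ins for the 'x' and 'y' files (4 rows each).
x <- read.csv(text = "1.2,3.4\n0.5,2.2\n2.1,0.3\n1.8,1.1", header = FALSE)
y <- read.csv(text = "0\n1\n1\n0", header = FALSE)

# read.csv always returns a data frame, even for a single column.
print(class(y))

# Predictors as a numeric matrix, response as a two-level factor.
xmat <- as.matrix(x)
yfac <- factor(y[[1]])

print(dim(xmat))
print(levels(yfac))
```

The key point is y[[1]], which pulls the single column out of the data frame as a plain vector before it is turned into a factor.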

Error when running PerformanceAnalytics function in R

I am getting "Error in 1:T : argument of length 0" when running a function from the PerformanceAnalytics package in R. Am I missing a package? Below is my code with the error.
#clean z, all features, alpha = .01, run below
setwd("D:/LocalData/casaler/Documents/R/RESULTS/PLOTS_PCA/CLN_01")
PGFZ_ALL <- read.csv("D:/LocalData/casaler/Documents/R/PG_DEUX_Z.csv", header=TRUE)
options(max.print = 100000) #Sets ability to view all dealer records
pgfzc_all <- PGFZ_ALL
#head(pgfzc_all,10)
library("PerformanceAnalytics")
library("RGraphics")
Loading required package: grid
pgfzc_elev <- pgfzc_all$ELEV
#head(pgfzc_elev,5)
#View(pgfzc_elev)
set.seed(123) #for replication purposes; always use same seed value
cln_elev <- clean.boudt(pgfzc_elev, alpha = 0.01) #set alpha .001 to give the most extreme outliers
Error in 1:T : argument of length 0
It's hard to answer your question without knowing what your data looks like. But I can tell you what throws that error. Looking into the source code of the clean.boudt function I find the following cause of your error:
T = dim(R)[1]
...
for (t in c(1:T)) {
d2t = as.matrix(R[t, ] - mu) %*% invSigma %*% t(as.matrix(R[t,
] - mu))
vd2t = c(vd2t, d2t)
}
...
dim(R)[1] extracts the number of rows of the data supplied to the R argument of the function. For a plain vector, dim() returns NULL, so T is NULL and 1:T fails. Check the class of pgfzc_elev.
The cause of the error is likely your use of $ to subset pgfzc_all.
pgfzc_elev <- pgfzc_all$ELEV
$ returns a plain vector (I reckon of class integer here), which has no dim attribute, and that is why dim(R)[1] does not work in the function.
Instead, subset your object like this, so that it keeps its rows and columns:
pgfzc_elev <- pgfzc_all[, "ELEV", drop = FALSE]
Try that and see if it works.
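The difference between the two ways of subsetting can be checked in base R. The sketch below uses a small made-up data frame (df and its OTHER column are hypothetical, only the ELEV name comes from the question) and reproduces the error message without calling clean.boudt.

```r
# Made-up data frame standing in for pgfzc_all.
df <- data.frame(ELEV = c(10, 12, 11), OTHER = c(1, 2, 3))

# $ returns a plain vector, which has no dim attribute.
v <- df$ELEV
print(dim(v))

# Bracket subsetting with drop = FALSE keeps a one-column data frame.
sub <- df[, "ELEV", drop = FALSE]
print(dim(sub))

# The same pattern as inside the function: T = dim(R)[1], then 1:T.
msg <- tryCatch(1:dim(v)[1], error = function(e) conditionMessage(e))
print(msg)
```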

Predict() with regsubsets

I'm trying to replicate the results from An Introduction to Statistical Learning with Applications in R. Specifically, the Lab in section 6.5.3. I have followed the code in the lab exactly:
library("ISLR")
library("leaps")
set.seed(1)
train = sample(c(TRUE, FALSE), nrow(Hitters), rep = TRUE)
test = (!train)
regfit.best = regsubsets(Salary ~., data = Hitters[train,], nvmax= 19)
test.mat = model.matrix(Salary~., data = Hitters[test,])
val.errors = rep(NA, 19)
for (i in 1:19){
coefi= coef(regfit.best, id = i)
pred=test.mat[,names(coefi)]%*%coefi
val.errors[i]=mean((Hitters$Salary[test]-pred)^2)
}
When I run this I still get the following error:
Warning message:
In Hitters$Salary[test] - pred :
longer object length is not a multiple of shorter object length
Error in mean((Hitters$Salary[test] - pred)^2) :
error in evaluating the argument 'x' in selecting a method for function 'mean': Error: dims [product 121] do not match the length of object [148]
And val.errors is a vector of 19 NAs.
I'm still relatively new to R and to the validation approach, so I'm not sure exactly why the dimensions of these are different.
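It was actually an issue with not carrying over a step from the previous subsection of the lab, which removed the incomplete entries. model.matrix drops rows that contain missing values, while Hitters$Salary[test] keeps them, so the two objects end up with different lengths.
You need to remove rows with missing data. Run "Hitters = na.omit(Hitters)" at the beginning.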
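The length mismatch can be reproduced in base R without the ISLR data. The sketch below uses a small made-up data frame (only the Salary name is borrowed from Hitters) and assumes the default na.action option, under which model.matrix silently drops incomplete rows.

```r
# Made-up data with two missing responses.
df <- data.frame(Salary = c(100, NA, 250, 300, NA),
                 Hits   = c(10, 20, 30, 40, 50))

# model.matrix builds a model frame first, which omits rows containing NA.
mm <- model.matrix(Salary ~ ., data = df)
print(nrow(mm))
print(length(df$Salary))

# After na.omit the design matrix and the response line up again.
clean <- na.omit(df)
mm_clean <- model.matrix(Salary ~ ., data = clean)
print(nrow(mm_clean) == length(clean$Salary))
```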

invalid type (list) message in applying gmm method

The moment condition function is simply exp(-g/r)-1, where g is a known series of daily returns on an AAA-class bond index, and r is the riskiness measure to be estimated through gmm. My code is as follows:
View(Source)
library(gmm)
data(Source)
x <- Source[1:5200,"AAA"]
m <- function(r,x)
{m.1 <- exp(-x[,"AAA"]/r)-1}
summary(gmm(m,x,t0=1,method="BFGS",control=1e-12))
This in turn yields the following error message:
Error in model.frame.default(formula = gmat ~ 1, drop.unused.levels = TRUE) :
invalid type (list) for variable 'gmat'
Could anyone help me figure out what went wrong?
Thanks a lot!
For those kind people who would like to replicate the results, please find attached the source data as mentioned above.
The correct r is 1.590, which can be found with Goal Seek in Excel, using the target function (average(exp(-g/r)-1))^2 and target value 0 (tolerance: 1e-12).
https://docs.google.com/spreadsheets/d/1AnTErQd2jm9ttKDZa7On3DLzEZUWaz5Km3nKaB7K18o/edit?usp=sharing
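As a cross-check that does not depend on gmm, the goal-seek step described above can be sketched in base R with uniroot, which looks for the r where the sample moment mean(exp(-g/r) - 1) crosses zero. The return series g below is made up for illustration and is not the AAA data, and the bracketing interval is an assumption chosen to fit these made-up values.

```r
# Hypothetical return series (made up for illustration, not the AAA data).
g <- c(-1.0, 0.5, 1.0, 2.0, -0.5, 1.5)

# Sample moment condition: equals zero at the solution.
moment <- function(r) mean(exp(-g / r) - 1)

# Goal-seek equivalent: find the root of the moment inside a bracketing interval.
# The moment is positive at r = 0.2 and negative at r = 10 for this series.
sol <- uniroot(moment, interval = c(0.2, 10), tol = 1e-12)
r_hat <- sol$root

print(r_hat)
print(moment(r_hat))
```

With real data the interval would need to be chosen so that the moment changes sign between its endpoints, otherwise uniroot stops with an error.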
