Passing the weights argument to a regression function inside an R function

I am trying to write an R function to run an optionally weighted regression, and I am having difficulty getting the weights variable to work.
Here is a simplified version of the function.
HC <- function(data, FUN, formula, tau = 0.5, weights = NULL){
  if(is.null(weights)){
    est <- FUN(data = data, formula = formula, tau = tau)
    intercept <- est$coef[["(Intercept)"]]
    zeroWorker <- exp(intercept)
  } else {
    est <- FUN(data = data, formula = formula, tau = tau, weights = weights)
    intercept <- est$coef[["(Intercept)"]]
    zeroWorker <- exp(intercept)
  }
  return(zeroWorker)
}
The function works perfectly if I do not use the weights argument.
mod1 <- HC(data = mydata, formula = lin.model, tau = 0.2,
           FUN = rq)
But it throws an error when I use the weights argument.
mod2 <- HC(data = mydata, formula = lin.model, tau = 0.2,
           FUN = rq, weights = weig)
I googled the problem, and this post seems to be the closest to mine, but I could still not get it to work: R : Pass argument to glm inside an R function.
Any help will be appreciated.
My problem can be replicated with:
library("quantreg")
data(engel)
mydata <- engel
mydata$weig <- with(mydata, log(sqrt(income))) # create a fictitious weight variable
lin.model <- foodexp ~ income
mod1 <- HC(data = mydata, formula = lin.model, tau = 0.2,
           FUN = rq) # this works perfectly
mod2 <- HC(data = mydata, formula = lin.model, tau = 0.2,
           FUN = rq, weights = weig) # throws an error
Error in HC(data = mydata, formula = lin.model, tau = 0.2, FUN = rq, weights = weig) :
object 'weig' not found

You have two problems. The error you're encountering occurs because you reference the weig variable without indicating that it comes from the mydata data frame; use mydata$weig instead. That fixes the first error, but you then hit the real one, which is related to how the weights argument is evaluated:
Error in model.frame.default(formula = formula, data = data, weights = substitute(weights), :
invalid type (symbol) for variable '(weights)'
The solution is to add the variable specified in HC's weights argument to the dataframe before passing it to FUN:
HC <- function(data, FUN, formula, tau = 0.5, weights = NULL){
  data$.weights <- weights
  if(is.null(weights)){
    est <- FUN(data = data, formula = formula, tau = tau)
  } else {
    est <- FUN(data = data, formula = formula, tau = tau, weights = .weights)
  }
  intercept <- est$coef[["(Intercept)"]]
  zeroWorker <- exp(intercept)
  return(zeroWorker)
}
Then everything works:
mod2 <- HC(data = mydata, formula = lin.model, tau = 0.2, FUN = rq, weights = mydata$weig)
mod2
# [1] 4.697659e+47
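As a further variation (my own sketch, not part of the answer above), you can accept the weights column by name, so callers never have to reach into the data frame themselves. It reuses the same .weights trick, since rq evaluates its weights argument inside data:
HC2 <- function(data, FUN, formula, tau = 0.5, weights = NULL){
  if(is.null(weights)){
    est <- FUN(data = data, formula = formula, tau = tau)
  } else {
    data$.weights <- data[[weights]]  # copy the named column into the data
    est <- FUN(data = data, formula = formula, tau = tau, weights = .weights)
  }
  exp(est$coef[["(Intercept)"]])
}
mod3 <- HC2(data = mydata, formula = lin.model, tau = 0.2,
            FUN = rq, weights = "weig")  # pass the column name as a string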

Related

How can I train a glmnet model (Poisson family) with an offset term using the caret package in R?

I want to model insurance claim count using a Poisson glmnet. The data I have at hand contains the number of claims for each policy (which is the response variable), some features about the policy (gender, region, etc.) as well as the duration of the policy (in years). I want to include the log-duration as an offset term, as we usually do in actuarial science. With the cv.glmnet function of the glmnet package, it is straightforward:
library(tidyverse)
library(glmnet)

n <- 100
dat <- tibble(
  nb_claims = rpois(n, lambda = 0.5),
  duration = runif(n),
  x1 = runif(n),
  x2 = runif(n),
  x3 = runif(n)
)

fit <- cv.glmnet(
  x = dat %>% dplyr::select(x1, x2, x3) %>% as.matrix(),
  y = dat %>% pull(nb_claims),
  family = "poisson",
  offset = dat %>% pull(duration) %>% log()
)
fit
However, my goal is to train this model using the train function of the caret package, because of the many advantages it offers: validation, preprocessing, and feature selection are much better handled with it. Training a basic glmnet (without an offset term) with caret is straightforward:
library(caret)
fit <- caret::train(
  x = dat %>% dplyr::select(x1, x2, x3) %>% as.matrix(),
  y = dat %>% pull(nb_claims),
  method = "glmnet",
  family = "poisson"
)
fit
Naively, we could try to add the offset argument in the train function:
fit <- caret::train(
  x = dat %>% dplyr::select(x1, x2, x3) %>% as.matrix(),
  y = dat %>% pull(nb_claims),
  method = "glmnet",
  family = "poisson",
  offset = dat %>% pull(duration) %>% log()
)
fit
Unfortunately, this code throws the error Error : No newoffset provided for prediction, yet offset used in fit of glmnet. This happens because caret::train never supplies a value for the newoffset argument of predict.glmnet.
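To see the failure in isolation, here is a hedged sketch on the same toy data: a glmnet model fitted with an offset refuses to predict unless newoffset is supplied, which is exactly what caret's predict step omits.
X <- dat %>% dplyr::select(x1, x2, x3) %>% as.matrix()
g <- glmnet(X, dat$nb_claims, family = "poisson",
            offset = log(dat$duration))
# predict(g, newx = X)  # fails: "No newoffset provided for prediction ..."
head(predict(g, newx = X, newoffset = log(dat$duration), s = 0.01))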
In this book, they show how to add an offset term to a GLM model by modifying the source code of the caret::train function. It works perfectly. However, the predict.glm function is quite different from the predict.glmnet function, because it does not have the newoffset argument. I tried to modify the source code of the caret::train function, but I am having some trouble because I do not know well enough how this function works.
A simple way to achieve this is to pass the offset column as part of x; in each fit and predict call, use the columns of x that are not the offset as the predictors, and pass the offset column itself as offset/newoffset.
In the following example the offset column of x needs to be named "offset" too. This can be changed relatively easily.
To create the function we will reuse many parts from: https://github.com/topepo/caret/blob/master/models/files/glmnet.R
glmnet is peculiar since it needs a loop; the rest is just rinse and repeat from https://topepo.github.io/caret/using-your-own-model-in-train.html#illustrative-example-1-svms-with-laplacian-kernels
family = "poisson" is specified throughout; to change this, adapt the code from https://github.com/topepo/caret/blob/master/models/files/glmnet.R
glmnet_offset <- list(type = "Regression",
                      library = c("glmnet", "Matrix"),
                      loop = function(grid) {
                        alph <- unique(grid$alpha)
                        loop <- data.frame(alpha = alph)
                        loop$lambda <- NA
                        submodels <- vector(mode = "list", length = length(alph))
                        for(i in seq(along = alph)) {
                          np <- grid[grid$alpha == alph[i], "lambda"]
                          loop$lambda[loop$alpha == alph[i]] <- np[which.max(np)]
                          submodels[[i]] <- data.frame(lambda = np[-which.max(np)])
                        }
                        list(loop = loop, submodels = submodels)
                      })
glmnet_offset$parameters <- data.frame(parameter = c('alpha', 'lambda'),
                                       class = c("numeric", "numeric"),
                                       label = c('Mixing Percentage', 'Regularization Parameter'))

glmnet_offset$grid <- function(x, y, len = NULL, search = "grid") {
  if(search == "grid") {
    init <- glmnet::glmnet(Matrix::as.matrix(x[, colnames(x) != "offset"]), y,
                           family = "poisson",
                           nlambda = len + 2,
                           alpha = .5,
                           offset = x[, colnames(x) == "offset"])
    lambda <- unique(init$lambda)
    lambda <- lambda[-c(1, length(lambda))]
    lambda <- lambda[1:min(length(lambda), len)]
    out <- expand.grid(alpha = seq(0.1, 1, length = len),
                       lambda = lambda)
  } else {
    out <- data.frame(alpha = runif(len, min = 0, 1),
                      lambda = 2^runif(len, min = -10, 3))
  }
  out
}
So x[,colnames(x) != "offset"] plays the role of x, while x[,colnames(x) == "offset"] is the offset.
glmnet_offset$fit <- function(x, y, wts, param, last, ...) {
  theDots <- list(...)

  ## pass in any model weights
  if(!is.null(wts)) theDots$weights <- wts

  if(!(class(x)[1] %in% c("matrix", "sparseMatrix")))
    x <- Matrix::as.matrix(x)

  modelArgs <- c(list(x = x[, colnames(x) != "offset"],
                      y = y,
                      alpha = param$alpha,
                      family = "poisson",
                      offset = x[, colnames(x) == "offset"]),
                 theDots)

  out <- do.call(glmnet::glmnet, modelArgs)
  if(!is.na(param$lambda[1])) out$lambdaOpt <- param$lambda[1]
  out
}
glmnet_offset$predict <- function(modelFit, newdata, submodels = NULL) {
  if(!is.matrix(newdata)) newdata <- Matrix::as.matrix(newdata)
  out <- predict(modelFit,
                 newdata[, colnames(newdata) != "offset"],
                 s = modelFit$lambdaOpt,
                 newoffset = newdata[, colnames(newdata) == "offset"],
                 type = "response") # important for the performance measures to be appropriate
  if(is.matrix(out)) out <- out[, 1]
  if(!is.null(submodels)) {
    tmp <- as.list(as.data.frame(predict(modelFit,
                                         newdata[, colnames(newdata) != "offset"],
                                         s = submodels$lambda,
                                         newoffset = newdata[, colnames(newdata) == "offset"],
                                         type = "response"),
                                 stringsAsFactors = TRUE))
    out <- c(list(out), tmp)
  }
  out
}
For some reason I don't yet understand, it does not work without the prob slot:
glmnet_offset$prob <- glmnet_offset$predict

glmnet_offset$tags <- c("Generalized Linear Model", "Implicit Feature Selection",
                        "L1 Regularization", "L2 Regularization", "Linear Classifier",
                        "Linear Regression")

glmnet_offset$sort <- function(x) x[order(-x$lambda, x$alpha), ]

glmnet_offset$trim <- function(x) {
  x$call <- NULL
  x$df <- NULL
  x$dev.ratio <- NULL
  x
}
library(tidyverse)
library(caret)
library(glmnet)

n <- 100
set.seed(123)
dat <- tibble(
  nb_claims = rpois(n, lambda = 0.5),
  duration = runif(n),
  x1 = runif(n),
  x2 = runif(n),
  x3 = runif(n)
)

x <- dat %>%
  dplyr::select(-nb_claims) %>%
  mutate(offset = log(duration)) %>%
  dplyr::select(-duration) %>%
  as.matrix

fit <- caret::train(
  x = x,
  y = dat %>% pull(nb_claims),
  method = glmnet_offset
)
fit
100 samples
  4 predictor

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 100, 100, 100, 100, 100, 100, ...
Resampling results across tuning parameters:

  alpha  lambda        RMSE       Rsquared    MAE
  0.10   0.0001640335  0.7152018  0.01805762  0.5814200
  0.10   0.0016403346  0.7152013  0.01805684  0.5814193
  0.10   0.0164033456  0.7130390  0.01798125  0.5803747
  0.55   0.0001640335  0.7151988  0.01804917  0.5814020
  0.55   0.0016403346  0.7150312  0.01802689  0.5812936
  0.55   0.0164033456  0.7095996  0.01764947  0.5783706
  1.00   0.0001640335  0.7152033  0.01804795  0.5813997
  1.00   0.0016403346  0.7146528  0.01798979  0.5810811
  1.00   0.0164033456  0.7063482  0.01732168  0.5763653

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were alpha = 1 and lambda = 0.01640335.
predict(fit$finalModel, x[,1:3], newoffset = x[,4]) #works
This will not work with preprocessing in caret, since we pass the offset as one of the features. However, it will work with recipes, because there you can select the columns on which preprocessing functions are performed. See this article for details: https://tidymodels.github.io/recipes/articles/Selecting_Variables.html
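For illustration, a hedged sketch of that recipes route, reusing the objects from the example above (I have not exhaustively tested this across caret/recipes versions, so treat the selections as assumptions):
library(recipes)
df <- as.data.frame(x)
df$nb_claims <- dat %>% pull(nb_claims)
rec <- recipe(nb_claims ~ ., data = df) %>%
  step_center(all_predictors(), -offset) %>%  # leave the offset column untouched
  step_scale(all_predictors(), -offset)
fit_rec <- caret::train(rec, data = df, method = glmnet_offset)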
I haven't had time to error-check my code. If any problems occur or if there is a mistake somewhere, please comment. Thanks.
You can also post an issue on the caret GitHub asking for this feature (offset/newoffset) to be added to the model.
I tried to change the model info in a lot of ways, but it kept failing miserably. Below I propose one solution; it may not be the best, but it will get you somewhere if your data is sensible.
In Poisson / negative binomial (and similar) regression, the offset enters the linear predictor with its coefficient fixed at 1; you can read more here and here:
log(E[y | x]) = b0 + b1*x1 + ... + log(tx)
where tx is the offset. In glmnet, there is a penalty factor you can set for each term, and if you let that be 0 for a term, you are basically not penalizing it and it's always included. We can use that for the offset, and you can see this effect only if you use a dataset that makes some sense (note that in your example dataset, the offsets are numbers that make no sense).
Below I use the insurance claims dataset from MASS:
library(tidyverse)
library(glmnet)
library(MASS)

dat <- Insurance
X <- model.matrix(Claims ~ District + Group + Age, data = dat)
Y <- dat$Claims
OFF <- log(dat$Holders)

fit_cv <- cv.glmnet(
  x = X,
  y = Y,
  family = "poisson",
  offset = OFF
)
Now, using caret, I will fit it without any tuning, using the same lambda obtained from the cv.glmnet fit. One thing you should note, too, is that cv.glmnet often uses lambda.1se instead of lambda.min:
fit_c <- caret::train(
  x = cbind(X, OFF),
  y = Y,
  method = "glmnet",
  family = "poisson",
  tuneGrid = data.frame(lambda = fit_cv$lambda.1se, alpha = 1),
  penalty.factor = c(rep(1, ncol(X)), 0),  # glmnet's documented argument name; 0 = do not penalize the offset column
  trControl = trainControl(method = "none")
)
We can see how different the predictions are:
p1 <- predict(fit_cv, newx = X, newoffset = OFF)
p2 <- predict(fit_c, newdata = cbind(X, OFF))
plot(p1, p2)
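As a hedged aside, both tuning choices are stored on the cv.glmnet object, so you can check how far apart they are before fixing tuneGrid at lambda.1se:
c(lambda.min = fit_cv$lambda.min, lambda.1se = fit_cv$lambda.1se)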

How to pass a formula as an argument to a function in R?

How do I pass a formula as an argument in R?
The code below works for the first two cases, but when I pass in the formula, I get an error:
Error in model.frame.default(formula = formula, weights = weights, na.action = na.omit, :
  invalid type (closure) for variable '(weights)'
makeModel <- function(formula, weights) {
  m <- lm(formula, na.action = na.omit, weights = weights)
  return(m)
}

run <- function(t) {
  f <- formula(t$y ~ t$x + t$r)
  m <- lm(t$y ~ t$x + t$r, na.action = na.omit, weights = t$size)
  m <- lm(f, na.action = na.omit, weights = t$size)
  m <- makeModel(f, t$size)
}

l <- 20
x <- seq(0, 1, 1/l)
y <- sqrt(x)
r <- round(runif(n = length(x), min = 0, max = .8))
n <- 1:(l + 1)
size <- n/sum(n)
t <- data.frame(x, y, r, n, size)
run(t)
Edit 1: this code:
makeModel <- function(formula, weights, t) {
  print(class(weights))
  m <- lm(formula, na.action = na.omit, weights = weights, data = t)
  return(m)
}

run <- function(t) {
  f <- formula(y ~ x + r)
  f <- as.formula("t$y~t$x+t$r")
  m <- lm(y ~ x + r, na.action = na.omit, weights = t$size, data = t)
  m <- lm(f, na.action = na.omit, weights = t$size, data = t)
  m <- makeModel(f, t$size, t)
}
produces:
Error in model.frame.default(formula = formula, data = t, weights = weights, :
invalid type (closure) for variable '(weights)'
Edit 2: this works:
makeModel <- function(formula, data) {
  # size is looked up in data first, which is why this works
  m <- lm(formula, na.action = na.omit, weights = size, data = data) # works
  # m <- lm(formula, na.action = na.omit, weights = data$size, data = data) # fails!
  return(m)
}
R is strange!
Does anyone know why the line with weights = data$size fails?
Edit 3: got weights = data$size to work.
makeModel <- function(formula, w, data) {
  print(class(weights))
  m <- lm(formula, na.action = na.omit, weights = size, data = data)      # works
  m <- lm(formula, na.action = na.omit, weights = data$size, data = data) # works
  m <- lm(formula, na.action = na.omit, weights = w, data = data)         # fails
  return(m)
}

run <- function(data) {
  f <- formula(y ~ x + r)
  # f <- as.formula("t$y~t$x+t$r")
  m <- lm(y ~ x + r, na.action = na.omit, weights = data$size, data = data)
  m <- lm(f, na.action = na.omit, weights = data$size, data = data)
  m <- makeModel(f, data$size, data)
}
The last one fails with: Error in eval(extras, data, env) : object 'w' not found
See the examples in ?as.formula. You should not be explicitly referencing variables from a particular object inside the formula. The formula should be abstract, and lm will know which variables to pull from data, which you should specify.
makeModels <- function(formula, data) {
  # size is looked up in data first, which is why this works
  m <- lm(formula, na.action = na.omit, weights = size, data = data)
  return(m)
}

run <- function(t) {
  f <- formula(y ~ x + r)
  m1 <- lm(formula = f, na.action = na.omit, weights = size, data = t)
  m2 <- makeModels(formula = f, data = t)
  return(list(m1, m2))
}

l <- 20
x <- seq(0, 1, 1/l)
y <- sqrt(x)
r <- round(runif(n = length(x), min = 0, max = 0.8))
n <- 1:(l + 1)
size <- n/sum(n)
t <- data.frame(x, y, r, n, size)
run(t)
[[1]]

Call:
lm(formula = f, data = t, weights = t$size, na.action = na.omit)

Coefficients:
(Intercept)            x            r
   0.327154     0.706553    -0.008167

[[2]]

Call:
lm(formula = formula, data = data, weights = size, na.action = na.omit)

Coefficients:
(Intercept)            x            r
   0.327154     0.706553    -0.008167
Avoid assigning an object called t, which collides with the transpose function. Looking at the traceback yields:
makeModel <- function(formula, weights) {
  m <- lm(formula, na.action = na.omit, weights = weights)
  return(m)
}
run <- function(x) {
  f <- formula(x$y ~ x$x + x$r)
  m <- lm(x$y ~ x$x + x$r, na.action = na.omit, weights = x$size)
  m <- lm(f, na.action = na.omit, weights = x$size)
  m <- makeModel(f, x$size)
}

l <- 20
x <- seq(0, 1, 1/l)
y <- sqrt(x)
r <- round(runif(n = length(x), min = 0, max = .8))
n <- 1:(l + 1)
size <- n/sum(n)
x <- data.frame(x, y, r, n, size)

run(x)
#R Error in model.frame.default(formula = formula, weights = weights, na.action = na.omit,  :
#R   invalid type (closure) for variable '(weights)'
traceback()
#R 7: model.frame.default(formula = formula, weights = weights, na.action = na.omit,
#R        drop.unused.levels = TRUE)
#R 6: stats::model.frame(formula = formula, weights = weights, na.action = na.omit,
#R        drop.unused.levels = TRUE)
#R 5: eval(mf, parent.frame())
#R 4: eval(mf, parent.frame())
#R 3: lm(formula, na.action = na.omit, weights = weights) at #3
#R 2: makeModel(f, x$size) at #5
#R 1: run(x)
Now debug(model.frame.default) shows that this line is where it goes wrong, due to these lines and this line. The reason is that it calls
eval(list(weights = weights), environment(formula), environment(formula))
and there is no weights object assigned in the run environment (the environment where the formula is assigned), so it instead finds stats::weights. Three solutions are:
makeModel <- function(formula, weights) {
  environment(formula) <- environment()
  lm(formula, na.action = na.omit, weights = weights)
}
run <- function(x) {
  f <- x$y ~ x$x + x$r
  makeModel(f, x$size)
}
x1 <- run(x)

makeModel <- function(formula, weights) {
  cl <- match.call()
  cl[[1L]] <- quote(lm)
  cl$na.action <- quote(na.omit)
  eval(cl, parent.frame())
}
run <- function(x) {
  f <- x$y ~ x$x + x$r
  makeModel(f, x$size)
}
x2 <- run(x)

makeModel <- function(formula, weights, x) {
  cl <- match.call()
  cl[[1]] <- quote(lm)
  cl$x <- NULL
  cl[c("data", "formula", "na.action")] <-
    list(quote(x), formula, quote(na.omit))
  eval(cl)
}
run <- function(x) {
  f <- y ~ x + r
  makeModel(f, size, x)
}
x3 <- run(x)

stopifnot(all.equal(coef(x1), coef(x2)))
stopifnot(all.equal(coef(x1), coef(x3), check.attributes = FALSE))
As an example, the first solution above implies that
eval(list(weights = weights), environment(formula), environment(formula))
succeeds, since there is now a weights object assigned in the environment of formula. The second solution evaluates the call in the run environment with weights = x$size and thus succeeds. The third one is like Roman Luštrik's answer, though his solution is much cleaner than my third one if you know that the weights argument is always the size column. Here the call is
eval(list(weights = size), data, environment(formula))
which works since size is a column in data.
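A tiny illustration (my own sketch) of the lookup order behind that last call: the data frame is searched before the enclosing environment, so a bare size resolves to the column:
d <- data.frame(y = 1:5, size = 5:1)
eval(quote(size), d, globalenv())
#R [1] 5 4 3 2 1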

Error with bootmer and confint for glmer

I'm running into an error that I can't find any documentation on when I try to bootstrap a glmer object:
glm2 <- glmer(RT ~ valence + location + first_location + Trial_num +
                (1 + Trial_num | id) + (1 | Trial_num),
              family = inverse.gaussian(log),
              control = glmerControl(optimizer = "nloptwrap",
                                     calc.derivs = FALSE), data = df_long)
The error is:
Error in lme4::.simulateFun(object = , :
  could not find function "sfun"
This is regardless of whether I try bootMer or confint:
bootMer_out <- bootMer(glm2,FUN=fixef, nsim=300)
confint_out <- confint(glm2, method="boot")
When I run it as an lmer object I don't have the issue with bootstrapping, i.e.
lm2 <- glmer(RT ~ valence + location + first_location + Trial_num +
               (1 + Trial_num | id) + (1 | Trial_num),
             family = inverse.gaussian(log),
             control = glmerControl(optimizer = "nloptwrap",
                                    calc.derivs = FALSE), data = df_long)
Does it have to do with the link function? Is there a workaround? I couldn't find the function 'sfun' in the simulateFun documentation either. I could always just do the transformation on the data separately and use lmer instead of glmer, but if anyone has some insight that would be great (since I'm curious now).
As pointed out by @user20650, you'll need to add a simulation method for the inverse Gaussian family.
For example, I added these to a branch of my lme4 fork, under predict.R:
rinvgauss <- function(n, mu, lambda) {
  # transcribed from https://en.wikipedia.org/wiki/Inverse_Gaussian_distribution
  nu <- rnorm(n)
  y <- nu^2
  x <- mu + (mu^2 * y)/(2*lambda) - (mu/(2*lambda)) * sqrt(4*mu*lambda*y + mu^2*y^2)
  z <- runif(n)
  ifelse(z <= mu/(mu + x), x, mu^2/x)
}

inverse.gaussian_simfun <- function(object, nsim, ftd = fitted(object),
                                    wts = weights(object)) {
  if (any(wts != 1)) message("using weights as inverse variances")
  dispersion <- sum((weights(object, 'working') *
                       resid(object, 'working')^2)[weights(object, 'working') > 0]) /
    df.residual(object)
  rinvgauss(nsim * length(ftd), mu = ftd,
            lambda = wts/dispersion)
}

# ... skip a few

simfunList <- list(gaussian = gaussian_simfun,
                   binomial = binomial_simfun,
                   poisson = poisson_simfun,
                   Gamma = Gamma_simfun,
                   negative.binomial = negative.binomial_simfun,
                   inverse.gaussian = inverse.gaussian_simfun)
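As a quick sanity check of rinvgauss (my addition, not part of the branch): an inverse Gaussian with mean mu and shape lambda has variance mu^3/lambda, so a large sample should land near mean 3 and variance 27/4 = 6.75.
set.seed(42)
samp <- rinvgauss(1e5, mu = 3, lambda = 4)
c(mean = mean(samp), var = var(samp))  # expect roughly 3 and 6.75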
Here's an example:
# devtools::install_github('aforren1/lme4', ref = 'add_invgauss_simulate')
library(lme4)

set.seed(1)
dat <- data.frame(y = lme4:::rinvgauss(1000, 3, 4),
                  x = runif(1000),
                  subj = factor(rep(1:10, 100)))

mod <- glmer(y ~ x + (1 | subj),
             data = dat,
             family = inverse.gaussian(link = 'log'))

# ~60 secs on my laptop
(boots <- confint(mod, method = 'boot', nsim = 100, parm = 'beta_'))

                  2.5 %   97.5 %
(Intercept)  1.0044813 1.248774
x           -0.2158155 0.161213

(walds <- confint(mod, method = 'Wald', parm = 'beta_'))

                 2.5 %    97.5 %
(Intercept)  1.000688 1.2289971
x           -0.205546 0.1644621
You can see that the bootstrap method gives (roughly) the same results as the Wald method.

Finite mixture of tweedie

I'm trying to estimate a finite mixture of Tweedie (or compound Poisson-gamma) distributions. I have scoured every resource I can think of without finding anything on how to do this.
I am currently trying to use the flexmix package in R, writing a different M-step driver as outlined in the flexmix vignette on pages 12-14. Here is my code, which relies on the cplm package:
tweedieClust <- function(formula = . ~ ., offset = NULL){
  require(tweedie)
  require(cplm)
  require(plyr)
  require(dplyr)
  retval <- new("FLXMC", weighted = TRUE, formula = formula, dist = "tweedie",
                name = "Compound Poisson Clustering")
  retval@defineComponent <- expression({
    predict <- function(x, ...) {
      pr <- mu
    }
    logLik <- function(x, y, ...){
      dtweedie(y, xi = p, mu = mu, phi = phi) %>%
        log
    }
    new("FLXcomponent",
        parameters = list(coef = coef),
        logLik = logLik, predict = predict,
        df = df)
  })
  retval@fit <- function(x, y, w, component) {
    fit <- cpglm(formula = y ~ x, link = "log", weights = w, offset = offset)
    with(list(coef = coef(fit), df = ncol(x), mu = fit$fitted.values,
              p = fit$p, phi = fit$phi),
         eval(retval@defineComponent))
  }
  retval
}
However, this results in the following error:
Error in dtweedie(y, xi = p, mu = mu, phi = phi) :
binary operation on non-conformable arrays
Has anyone done or seen a finite mixture of tweedie distributions? Can you point me in the right direction to accomplish this, using flexmix or otherwise?
The problem is somewhere in the weights part; if you remove it, it works:
tweedieClust <- function(formula = . ~ ., offset = NULL){
  require(tweedie)
  require(statmod)
  require(cplm)
  require(plyr)
  require(dplyr)
  retval <- new("FLXMC", weighted = FALSE, formula = formula, dist = "tweedie",
                name = "Compound Poisson Clustering")
  retval@defineComponent <- expression({
    predict <- function(x, ...) {
      pr <- mu
    }
    logLik <- function(x, y, ...){
      dtweedie(y, xi = p, mu = mu, phi = phi) %>%
        log
    }
    new("FLXcomponent",
        parameters = list(mu = mu, xi = p, phi = phi),
        logLik = logLik, predict = predict,
        df = df)
  })
  retval@fit <- function(x, y, w, component) {
    fit <- cpglm(formula = End ~ ., data = dmft, link = "log")
    with(list(df = ncol(x), mu = fit$fitted.values,
              p = fit$p, phi = fit$phi),
         eval(retval@defineComponent))
  }
  retval
}
Example:
library(flexmix)
data("dmft", package = "flexmix")
m1 <- flexmix(End ~ ., data = dmft, k = 4, model = tweedieClust())
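Once it converges, the usual flexmix accessors apply; for example (a hedged follow-up, not from the original answer):
summary(m1)          # mixture proportions and log-likelihood
table(clusters(m1))  # hard assignments of observations to the k components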

Error with custom SVM model for tuning in caret

I'm trying to follow this link to create a custom SVM and run it through some cross-validations. My primary reason for this is to tune over the Sigma, Cost, and Epsilon parameters in my grid search, and the closest caret model (svmRadial) can tune only two of those.
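As a hedged aside, you can confirm which parameters a built-in method tunes with caret's getModelInfo():
library(caret)
getModelInfo("svmRadial", regex = FALSE)[[1]]$parameters
# lists only sigma and C -- there is no epsilon parameter to tune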
When I attempt to run the code below, I get the following error at every iteration of my grid:
Warning in eval(expr, envir, enclos) :
model fit failed for Fold1.: sigma=0.2, C=2, epsilon=0.1 Error in if (!isS4(modelFit) & !(method$label %in% c("Ensemble Partial Least Squares Regression", :
argument is of length zero
Even when I replicate the code from the link verbatim, I get a similar error, and I'm not sure how to solve it. I found this link, which goes through how the custom models are built; I can see where this error is referenced, but I'm still not sure what the issue is. My code is below:
# Generate tuning criteria across parameters
C <- c(1, 2)
sigma <- c(0.1, 0.2)
epsilon <- c(0.1, 0.2)
grid <- data.frame(C, sigma)

# Parameters
prm <- data.frame(parameter = c("C", "sigma", "epsilon"),
                  class = rep("numeric", 3),
                  label = c("Cost", "Sigma", "Epsilon"))

# Tuning grid
svmGrid <- function(x, y, len = NULL) {
  expand.grid(sigma = sigma,
              C = C,
              epsilon = epsilon)
}

# Fit element function
svmFit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
  ksvm(x = as.matrix(x), y = y,
       type = "eps-svr",
       kernel = rbfdot,
       kpar = list(sigma = param$sigma),
       C = param$C,
       epsilon = param$epsilon,
       prob.model = classProbs,
       ...)
}

# Predict element function
svmPred <- function(modelFit, newdata, preProc = NULL, submodels = NULL)
  predict(modelFit, newdata)

# Sort element function
svmSort <- function(x) x[order(x$C), ]

# Model
newSVM <- list(type = "Regression",
               library = "kernlab",
               loop = NULL,
               parameters = prm,
               grid = svmGrid,
               fit = svmFit,
               predict = svmPred,
               prob = NULL,
               sort = svmSort,
               levels = NULL)

# Train
tc <- trainControl("repeatedcv", number = 2, repeats = 0,
                   verboseIter = TRUE, savePredictions = TRUE)
svmCV <- train(
  Y ~ 1 + X1 + X2,
  data = data_nn,
  method = newSVM,
  trControl = tc,
  preProc = c("center", "scale"))
svmCV
After viewing the second link provided, I decided to try including a label in the model's parameters, and that solved the issue! It's funny that it worked, because the caret documentation says that value is optional, but if it works I can't complain.
# Model
newSVM <- list(label = "My Model",
               type = "Regression",
               library = "kernlab",
               loop = NULL,
               parameters = prm,
               grid = svmGrid,
               fit = svmFit,
               predict = svmPred,
               prob = NULL,
               sort = svmSort,
               levels = NULL)
