"Model is empty! error using tune for svm method in package e1071 - r

I'm trying to tune hyperparameters epsilon and cost using the tune function in e1071, but I keep getting this error whenever I try to expand the ranges of values that I want to test:
"Error in predict.svm(ret, xhold, decision.values = TRUE) :
Model is empty!"
I'm dealing with a regression application, not a classification one, and the data I'm using are density profiles, where "x" describes the position along a board and "y" is the measured density. This is the code I'm using:
model <- tune(svm, y ~ x, data = profiles,
              ranges = list(cost = 2^(0:10), epsilon = 10^(-10:0)),
              tunecontrol = tune.control(cross = 5))
The data is all numeric (doubles) and the problem seems to occur only when I try to test such a large range of values. Has anybody experienced a similar issue?

It may be the range of your cost and epsilon values. I ran into the same problem, i.e. SVM regression with all-numeric data. I was tuning over a range of epsilon values from 0.1 to 10 and got the same "Model is empty!" error. After I reduced the epsilon range to 0.1 to 1, the tuning converged with no errors. There is probably some interaction between cost and epsilon that produces unstable models, i.e. a high cost combined with a high epsilon is not kosher.
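For example, a narrower grid in that spirit could look like the sketch below (the exact ranges are illustrative only, and profiles is the data frame from the question):

library(e1071)
# illustrative, narrower grid: epsilon capped at 1 and cost capped at 2^7
model <- tune(svm, y ~ x, data = profiles,
              ranges = list(cost = 2^(0:7), epsilon = c(0.001, 0.01, 0.1, 0.5, 1)),
              tunecontrol = tune.control(cross = 5))
summary(model)   # inspect which cost/epsilon pair gave the lowest cross-validated error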

Related

Confidence interval of the predicted mean of an LMER object for a large dataset

I would like to get the confidence interval (CI) for the predicted mean of a Linear Mixed Effect Model on a large dataset (~40k rows), which is itself a subset of an even larger dataset. This CI is then used for estimating the uncertainty of another calculation that uses the mean and its related CI as input data.
I managed to create a prediction estimate and interval for the full dataset, but a Prediction Interval is not the same as a CI and is much wider. Besides bootstrapping (which takes far too much time with this much data), I cannot find a method that would allow me to estimate a CI – either because it throws errors or because it only offers Prediction Intervals.
I moved into LME only recently and may therefore have overlooked some obvious method.
Here is what I did so far in more detail:
The input data is confidential and I can therefore unfortunately not share any extract.
But in general, we have one dependent variable (y) representing the probability of an event, two categorical variables (c1 and c2), two continuous variables (x1 and x2), and a weighting factor (w1). Some values in the dataset are missing. The first rows of the data could look like the example below:
c1      c2     x1   x2   w1    y
London  small  1    10   NA    NA
London  small  1    20   NA    NA
London  large  2    10   0.2   0.1
Paris   small  1    10   0.2   0.23
Paris   large  2    10   0.3   0.3
Based on this input data, I am then fitting an LMER model of the following form:
lmer1 <- lme4::lmer( y ~ x1 * poly(x2, 5) + ((x1 * poly(x2 ,5)) | c1),
data = df,
weights = w1,
control = lme4::lmerControl(check.conv.singular = lme4::.makeCC(action = "ignore", tol = 1e-3)))
This runs for some minutes and returns several warnings:
Warning messages:
1: In optwrap(optimizer, devfun, getStart(start, rho$pp), lower = rho$lower, :
   convergence code 5 from nloptwrap: NLOPT_MAXEVAL_REACHED: Optimization stopped
   because maxeval (above) was reached.
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
   unable to evaluate scaled gradient
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
   Model failed to converge: degenerate Hessian with 11 negative eigenvalues
I increased the maxeval parameter, but this still did not get rid of the warnings; despite the warnings, a model is still fitted. I therefore started applying different methods to get a prediction of the mean for the whole dataset and the CI of that mean.
predictInterval
I started with creating a Prediction Interval for the full dataset:
predictions <- merTools::predictInterval(lmer1,
newdata = df,
which = "full",
n.sims = 1000,
include.resid.var = FALSE,
level=0.95,
stat="mean")
However, as stated above, the Prediction Interval is not the same as the CI (see also https://datascienceplus.com/prediction-interval-the-wider-sister-of-confidence-interval/).
I found that the generic predict function has the option to set interval to either "prediction" or "confidence", but this option does not exist for predictions from an LMER object. And I could not find another way to switch from a Prediction Interval to a CI – even though I would believe the data drawn should be sufficient to do this.
confint
I then saw that there is a function called "confint", but when running it I get the following error:
prediction_ci = lme4::confint.merMod(lmer1)
Computing profile confidence intervals ...
Error in zeta(shiftpar, start = opt[seqpar1][-w]) :
  profiling detected new, lower deviance
In addition: Warning messages:
1: In commonArgs(par, fn, control, environment()) :
   maxfun < 10 * length(par)^2 is not recommended.
2: In optwrap(optimizer, devfun, x@theta, lower = x@lower, calc.derivs = TRUE, :
   convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
I found this thread (Error when estimating CI for GLMM using confint()), which said that I need to reduce the "devtol" parameter by profiling with a different tolerance. But doing so results in the same error:
lmer1_devtol = profile(lmer1, devtol = 1e-7)
Error in zeta(shiftpar, start = opt[seqpar1][-w]) :
  profiling detected new, lower deviance
In addition: Warning messages:
1: In commonArgs(par, fn, control, environment()) :
   maxfun < 10 * length(par)^2 is not recommended.
2: In optwrap(optimizer, devfun, x@theta, lower = x@lower, calc.derivs = TRUE, :
   convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
add_ci
I found the function "add_ci", but this again resulted in another error:
predictions_ci = ciTools::add_ci(df, lmer1,
alpha = 0.05)
Error in levelfun(r, n, allow.new.levels = allow.new.levels) : new
levels detected in newdata
I then set the "allow.new.levels" parameter to TRUE, as in the description of the predict function, but the parameter does not seem to be carried through:
predictions_ci = ciTools::add_ci(df, lmer1,
alpha = 0.05,
allow.new.levels = TRUE)
Error in levelfun(r, n, allow.new.levels = allow.new.levels) : new
levels detected in newdata
Diag
I found a method for calculating CIs for the sleepstudy data that builds the fixed-effects design matrix and takes diag() of the resulting covariance product.
Designmat <- model.matrix(as.formula("y ~ x1 * poly(x2, 5)")[-2], df)
predvar <- diag(Designmat %*% vcov(lmer1) %*% t(Designmat))
#With new data
newdat = df
newdat$pred <- predict(lmer1, newdat, allow.new.levels = TRUE)
Designmat <- model.matrix(formula(lmer1)[-2], newdat)
But the diag() method does not work for such a large dataset: Designmat %*% vcov(lmer1) %*% t(Designmat) is an N x N matrix, far too large to hold in memory for ~40k rows.
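One possible workaround (a sketch, not from the posts above) is to avoid forming the N x N matrix altogether: diag(X %*% V %*% t(X)) equals rowSums((X %*% V) * X), so only an N x p product is ever needed. Assuming the lmer1 fit from above and complete cases in the predictors, the fixed-effects CI could be computed roughly like this:

X <- model.matrix(~ x1 * poly(x2, 5), df)           # fixed-effects design matrix
V <- as.matrix(vcov(lmer1))                          # covariance of the fixed effects
predvar <- rowSums((X %*% V) * X)                    # same as diag(X %*% V %*% t(X)), but memory-friendly
newdat <- df
newdat$pred <- predict(lmer1, newdat, re.form = NA)  # population-level (fixed effects only) prediction
newdat$lwr <- newdat$pred - 1.96 * sqrt(predvar)
newdat$upr <- newdat$pred + 1.96 * sqrt(predvar)

Like the diag() version, this only propagates uncertainty in the fixed effects, so it understates the contribution of the random effects.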
bootMer
As said earlier, bootstrapping the confidence interval with bootMer takes too much time for this subset of the data (I started it a day ago and it is still running). I tried parallel processing with the sleepstudy sample data, but this did not increase the speed dramatically, so I assume it will behave the same on my large dataset.
merBoot <- bootMer(lmer1, predict, nsim = 1000, re.form = NA)
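For reference, a parallel call can look roughly like the following sketch (parallel = "multicore" is Unix-only; on Windows a "snow" cluster is needed; nsim is kept small for illustration):

merBoot <- bootMer(lmer1,
                   FUN = function(m) predict(m, newdata = df, re.form = NA),
                   nsim = 200, re.form = NA,
                   parallel = "multicore", ncpus = 4)
# percentile interval of the predicted mean for each row of df
ci <- t(apply(merBoot$t, 2, quantile, probs = c(0.025, 0.975)))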
Others
I have read through all these posts (and more), but none of them helped me get the CI in reasonable time for my case. But maybe I have overlooked something.
https://stats.stackexchange.com/questions/344012/confidence-intervals-from-bootmer-in-r-and-pros-cons-of-different-interval-type
https://stats.stackexchange.com/questions/117641/how-trustworthy-are-the-confidence-intervals-for-lmer-objects-through-effects-pa
How to get coefficients and their confidence intervals in mixed effects models?
Error when estimating CI for GLMM using confint()
https://stats.stackexchange.com/questions/235018/r-extract-and-plot-confidence-intervals-from-a-lmer-object-using-ggplot
How to get confidence intervals for lmer object?
Confidence intervals for the predicted probabilities from glmer object, error with bootMer
https://rdrr.io/cran/ciTools/man/add_ci.lmerMod.html
Error when estimating Confidence interval in lme4
https://fromthebottomoftheheap.net/2018/12/10/confidence-intervals-for-glms/
https://cran.r-project.org/web/packages/merTools/vignettes/Using_predictInterval.html
https://drewtyre.rbind.io/classes/nres803/week_12/lab_12/
Unsurprising to me but unfortunate for you, both the non-convergence of the mixed-model estimation and the difficulty in generating confidence intervals result from misusing a linear model for data with a limited dependent variable. "Despite these warnings, the model is still fitted" is a dangerous practice: predictions should not be taken from a fit that did not converge. As you describe, the dependent variable (y) represents the probability of an event, i.e. a continuous variable between zero and one. Using a linear model to predict a probability amounts to a linear probability regression, which requires censoring the predicted outcomes (e.g. forcing all predicted values greater than .99 to .99 and all predicted values smaller than .01 to .01) and adjusting for heterogeneous variances using weighted least squares (see https://bookdown.org/ccolonescu/RPoE4/heteroskedasticity.html). Letting the continuous variables produce both fixed and random effects also burdens convergence, while some or all of those random effects may not be necessary. The use of weights can also be problematic.
Instead of a linear probability regression, beta regression works best for dependent variables that are proportions or probabilities. Beta regression without random effects can be done with betareg::betareg(); glmmTMB::glmmTMB() handles beta regression with random effects. Start from a simple setting where only the intercept has random effects, such as
glmmTMB(y ~ 1 + x1 * poly(x2, 5) + c2 + (1 | c1), family = beta_family(link = "logit"), data = df)
You may compare the result with glmer() and lmer()
glmer(y ~ 1 + x1 * poly(x2, 5) + c2 + (1 | c1), family = gaussian(link = "logit"), data = df)
lmer(log(y/(1-y)) ~ 1 + x1 * poly(x2, 5) + c2 + (1 | c1), data = df)
glmer() and lmer() with the above specifications are closely related but not identical: lmer() transforms the response itself and assumes log(y/(1-y)) has normal residuals, while glmer() keeps y on its original scale and applies the logit link to its mean. glmmTMB() as specified above instead assumes that y follows a beta distribution. lmer() results are easier to explain and receive wider support from other packages, since the model is linear; on the other hand, glmmTMB() may fit better according to AIC, BIC, and log-likelihood. Note that all three require y strictly inside (0, 1). To accommodate occasional zeros and ones, nudge observations at the boundaries inward by a small tolerance, usually half of the distance from the boundary to its closest observed value (see https://stats.stackexchange.com/questions/109702 and https://graphworkflow.com/eda/bounded01/). For probabilities with many zeros and/or ones, zero-, one-, and zero-one-inflated beta regression can be fitted via gamlss::gamlss(). See Korosteleva, O. (2019). Advanced Regression Models with SAS and R. CRC Press.
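A minimal sketch of that boundary adjustment (squeeze01 is a hypothetical helper, not from any package):

squeeze01 <- function(y) {
  eps0 <- min(y[y > 0], na.rm = TRUE) / 2        # half the distance from 0 to its closest interior value
  eps1 <- (1 - max(y[y < 1], na.rm = TRUE)) / 2  # half the distance from 1 to its closest interior value
  y[which(y == 0)] <- eps0
  y[which(y == 1)] <- 1 - eps1
  y
}
df$y_adj <- squeeze01(df$y)                      # then model y_adj instead of y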
Add random effects for slopes later if likelihood ratio tests say they are necessary. Make sure c1 has enough levels (e.g. more than 10 different cities) to justify a mixed-effects model at all. The {glmmTMB} package extends glm() and glmer(); its alternative, the {brms} package, is built for a Bayesian approach. Note that the weights = argument in glmmTMB(), as in glm(), specifies values that are inversely proportional to the dispersions; they are not automatically scaled to sum to one unless they are integers giving numbers of observation units. You therefore need to find out what w1 stands for and decide how to use it in the model.
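A sketch of that model-building step, assuming the beta-regression specification above and the variable names from the question:

library(glmmTMB)
m0 <- glmmTMB(y ~ x1 * poly(x2, 5) + c2 + (1 | c1),
              family = beta_family(link = "logit"), data = df)
m1 <- glmmTMB(y ~ x1 * poly(x2, 5) + c2 + (1 + x1 | c1),
              family = beta_family(link = "logit"), data = df)
anova(m0, m1)  # likelihood ratio test: keep the random slope only if it clearly improves the fit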
merTools::predictInterval() generates many kinds of intervals for mixed models, some comparable to confidence intervals and prediction intervals in linear models without random effects. However, it supports lmer() model objects only. See https://cran.r-project.org/web/packages/merTools/vignettes/merToolsIntro.html and https://cran.r-project.org/web/packages/merTools/vignettes/Using_predictInterval.html.
predictInterval(lmer(), include.resid.var = F) includes uncertainty from both the fixed and random effects of all coefficients, including the intercept, but excludes the variation from multiple measurements of the same group or individual; this can be considered similar to a prediction interval of a linear model without random effects. predictInterval(lmer(), include.resid.var = F, fix.intercept.variance = T) generates a shorter CI than the above by accounting for the covariance between the fixed and random effects of the intercept. predictInterval(lmer(), include.resid.var = F, ignore.fixed.terms = "(Intercept)") also shortens the CI by removing the uncertainty from the fixed effect of the intercept. If there are no random slopes other than the random intercept, the last two methods are comparable to confidence intervals of linear models without random effects. confint(lmer()) and confint(profile(lmer())) generate confidence intervals of model parameters such as a slope, so they do not produce confidence intervals of predicted outcomes.
You may also find the following functions and packages useful for generating CIs of mixed effect models.
ggeffect() from {ggeffects}; predictions() from {marginaleffects}; margins() and prediction() from {margins} and {prediction}.
They can produce predictions averaged over the observed distribution of the covariates, instead of predictions made by holding some predictors at specific values such as means or modes, which can be misleading and not very useful.
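For instance, a sketch with {ggeffects}, assuming the lmer1 fit from the question (ggpredict() returns predictions with confidence intervals based on the fixed effects):

library(ggeffects)
eff <- ggpredict(lmer1, terms = c("x2 [all]", "x1"))  # predictions with CIs over the observed x2 values
plot(eff)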

LASSO-type regression with a non-negative continuous dependent variable

I am using "glmnet" package (in R) mostly to perform regularized linear regression.
However I am wondering if it can perform LASSO-type regressions with non-negative (integer) continuous (dependent) outcome variable.
I can use family = poisson, but the outcome variable is not specifically "count" variable. It is just a continuous variable with lower limit 0.
I aware of "lower.limits" function, but I guess it is for covariates (independent variables). (Please correct me if my understanding of this function not right.)
I look forward to hearing from you all! Thanks :-)
You are right that setting lower limits in glmnet applies to the covariates. The Poisson family effectively enforces a lower limit of zero because the linear predictor is exponentiated to get back the "counts".
Along those lines, it will most likely work if you transform your response variable. One quick way is to take the log of the response, do the fit, and transform it back; this keeps the back-transformed predictions positive, although you then have to deal with zeros.
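A rough sketch of that idea (x is assumed to be a numeric predictor matrix and y the non-negative response; the offset for handling zeros is an assumption you would tune to the scale of your data):

library(glmnet)
offset <- 1e-3                                       # small constant so that log() is defined at y = 0
fit_log <- cv.glmnet(x, log(y + offset), alpha = 1)
pred <- exp(predict(fit_log, newx = x, s = "lambda.min")) - offset
# exp() keeps the back-transformed values strictly positive; after subtracting the offset
# they can dip below zero only by at most the offset itself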
An alternative is a power transformation. There is a lot to think about here, and I can only demonstrate a two-parameter Box-Cox fit on a public dataset since you did not provide yours:
library(glmnet)
library(mlbench)
library(geoR)
data(BostonHousing)
data = BostonHousing
data$chas=as.numeric(data$chas)
# change it to min 0 and max 1
data$medv = (data$medv-min(data$medv))/diff(range(data$medv))
Then I use a quick approximation via PCA (rather than fitting all the variables) to get suitable lambda1 and lambda2:
bcfit = boxcoxfit(object = data[,14],
xmat = prcomp(data[,-14],scale=TRUE,center=TRUE)$x[,1:2],
lambda2=TRUE)
bcfit
Fitted parameters:
lambda lambda2 beta0 beta1 beta2 sigmasq
0.42696313 0.00001000 -0.83074178 -0.09876102 0.08970137 0.05655903
Convergence code returned by optim: 0
Check lambda2: it is the one that is critical for deciding whether you can get a negative value. It should be rather small.
Create functions for the power transform and its inverse:
bct = function(y,l1,l2){((y+l2)^l1 -1)/l1}
bctinverse = function(y,l1,l2){(y*l1+1)^(1/l1) -l2}
Now we transform the response:
data$medv_trans = bct(data$medv,bcfit$lambda[1],bcfit$lambda[2])
And fit glmnet:
fit = glmnet(x=as.matrix(data[,1:13]),y=data$medv_trans,nlambda=500)
Get predictions over all lambdas, and you can see there's no negative predictions once you transform back:
pred = predict(fit,as.matrix(data[,1:13]))
range(bctinverse(pred,bcfit$lambda[1],bcfit$lambda[2]))
[1] 0.006690685 0.918473356
And let's say we do a fit with cv:
fit = cv.glmnet(x=as.matrix(data[,1:13]),y=data$medv_trans)
pred = predict(fit,as.matrix(data[,1:13]))
pred_transformed = bctinverse(pred, bcfit$lambda[1], bcfit$lambda[2])
plot(data$medv,pred_transformed,xlab="orig response",ylab="predictions")

Using ROC curve to find optimum cutoff for my weighted binary logistic regression (glm) in R

I have built a binary logistic regression for churn prediction in RStudio. Due to the unbalanced data used for this model, I also included weights. I then tried to find the optimal cutoff by trial and error; however, to complete my research I have to incorporate ROC curves to find the optimal cutoff. Below I provide the script I used to build the model (fit2). The weight is stored in 'W'; it states that the cost of wrongly identifying a churner is 14 times as large as the cost of wrongly identifying a non-churner.
#CH1 logistic regression
library(caret)
W = 14
lvl = levels(trainingset$CH1)
print(lvl)
#if positive we give it the defined weight, otherwise set it to 1
fit_wts = ifelse(trainingset$CH1==lvl[2],W,1)
fit2 = glm(CH1 ~ RET + ORD + LVB + REVA + OPEN + REV2KF + CAL + PSIZEF + COM_P_C + PEN + SHOP,
           data = trainingset, weights = fit_wts, family = binomial(link = 'logit'))
# we test it on the test set
predlog1 = ifelse(predict(fit2,testset,type="response")>0.5,lvl[2],lvl[1])
predlog1 = factor(predlog1,levels=lvl)
predlog1
confusionMatrix(predlog1, testset$CH1, positive = lvl[2])
For this research I have also built ROC curves for decision trees using the pROC package. However, the same script does not work for a logistic regression, so I created the ROC curve for the logistic regression using the script below.
prob=predict(fit2, testset, type=c("response"))
testset$prob=prob
library(pROC)
g <- roc(CH1 ~ prob, data = testset)
g
plot(g)
Which resulted in the ROC curve below.
How do I get the optimum cut off from this ROC curve?
Getting the "optimal" cutoff is totally independent of the type of model, so you can get it like you would for any other type of model with pROC. With the coords function:
coords(g, "best", transpose = FALSE)
Or directly on a plot:
plot(g, print.thres=TRUE)
Now, the above simply maximizes the sum of sensitivity and specificity. This is often too simplistic, and you probably need a clear definition of "optimal" that is adapted to your use case. That is mostly beyond the scope of this question, but as a starting point you should take a look at the "Best thresholds" section of the documentation of the coords function for some basic options.
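Since you already encode a 14:1 cost for missing a churner, one option (a sketch, assuming the g and lvl objects from your script) is to pass that cost into coords() via best.weights, which takes the relative cost of a false negative and the prevalence of the positive class:

prev <- mean(testset$CH1 == lvl[2])                      # prevalence of churners in the test set
coords(g, "best", best.method = "youden",
       best.weights = c(14, prev), transpose = FALSE)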

Type measure differences in glmnet package?

What is the difference between using "mse" and "class" in the glmnet package?
log_x <- model.matrix(response~.,train)
log_y <- ifelse(train$response=="good",1,0)
log_cv <- cv.glmnet(log_x,log_y,alpha=1,family="binomial", type.measure = "class")
summary(log_cv)
plot(log_cv)
vs.
log_x <- model.matrix(response~.,train)
log_y <- ifelse(train$response=="good",1,0)
log_cv <- cv.glmnet(log_x,log_y,alpha=1,family="binomial", type.measure = "mse")
summary(log_cv)
plot(log_cv)
I notice that I get a slightly different curve, or smoothness, in my plot, and a few percent difference in accuracy. But for predicting a binomial class response, is one type.measure more appropriate than the other?
It depends on your case study and what you want to learn from your model. From the help files:
    The default is type.measure="deviance", which uses squared-error for gaussian models
    (a.k.a type.measure="mse" there) [...]. type.measure="class" applies to binomial and
    multinomial logistic regression only, and gives misclassification error
Therefore, you have to ask yourself whether, in your problem, you want to minimize the misclassification error or the mean squared error.
There is no straightforward answer to which is best: they are two different statistics by which cross-validation decides which penalization parameter to pick among the models it generates.
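As a quick illustration (simulated data, since yours is not shown), the two measures can select somewhat different penalties:

library(glmnet)
set.seed(1)
x <- matrix(rnorm(500 * 20), 500, 20)
y <- rbinom(500, 1, plogis(x[, 1] - x[, 2]))
cv_class <- cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "class")
cv_mse   <- cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "mse")
c(class = cv_class$lambda.min, mse = cv_mse$lambda.min)   # often similar, but not identical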

Difference between glmnet() and cv.glmnet() in R?

I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet() package, specifically its Poisson family. Here's my code:
# de <- data imported from sql connection
x <- model.matrix(~.,data = de[,2:7])
y <- (de[,1])
reg <- cv.glmnet(x,y, family = "poisson", alpha = 1)
reg1 <- glmnet(x,y, family = "poisson", alpha = 1)
Co <- coef(?reg or reg1?, s = ???)
summ <- summary(Co)
c <- data.frame(Name= rownames(Co)[summ$i],
Lambda= summ$x)
c2 <- c[with(c, order(-Lambda)), ]
The beginning imports a large amount of data from my database in SQL. I then put it in matrix format and separate the response from the predictors.
This is where I'm confused: I can't figure out exactly what the difference is between the glmnet() function and the cv.glmnet() function. I realize that the cv.glmnet() function is a k-fold cross-validation of glmnet(), but what exactly does that mean in practical terms? They provide the same value for lambda, but I want to make sure I'm not missing something important about the difference between the two.
I'm also unclear as to why it runs fine when I specify alpha=1 (supposedly the default), but not if I leave it out?
Thanks in advance!
glmnet() is the fitting function of the {glmnet} R package and can be used to fit regularized regression models such as the lasso. The alpha argument determines what type of model is fit: with alpha = 0 a ridge model is fit, and with alpha = 1 a lasso model is fit.
cv.glmnet() performs cross-validation, by default 10-fold, which can be adjusted using nfolds. A 10-fold CV randomly divides your observations into 10 non-overlapping groups (folds) of approximately equal size; each fold in turn serves as the validation set while the model is fit on the other 9 folds. The bias-variance trade-off is usually the motivation for using such validation methods. In the case of lasso and ridge models, CV helps choose the value of the tuning parameter lambda.
In your example, you can run plot(reg) or look at reg$lambda.min to see the value of lambda that results in the smallest CV error, and then derive the test MSE for that value of lambda. By contrast, glmnet() alone performs ridge or lasso regression over an automatically selected range of lambda values, which may not give the lowest test MSE. Hope this helps!
Between reg$lambda.min and reg$lambda.1se: lambda.min will obviously give you the lowest MSE; however, depending on how flexible you can be with the error, you may want to choose reg$lambda.1se, as this value further shrinks the number of predictors. You may also choose the mean of reg$lambda.min and reg$lambda.1se as your lambda value.
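Tying this back to the coef() line in the question, a short sketch of extracting coefficients from the cross-validated fit at either choice:

Co_min <- coef(reg, s = "lambda.min")   # lambda with the lowest cross-validated error
Co_1se <- coef(reg, s = "lambda.1se")   # sparser model within one standard error of the minimum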
