cv.glmnet and leave-one-out CV in R

I'm trying to use cv.glmnet to find the best lambda for ridge regression in order to predict the class membership of some objects.
The code I have used is:
CVGLM <- cv.glmnet(x, y, nfolds = 34, type.measure = "class", alpha = 0, grouped = FALSE)
I'm not using ordinary K-fold cross-validation because my dataset is too small: I have only 34 rows. So I set nfolds to the number of rows, which gives leave-one-out CV.
Now I have some questions:
1) First of all, does cv.glmnet only tune the hyperparameter lambda, or does it also test the "final model"?
2) Once I have the best lambda, what should I do? Should I use the predict function?
If so, which data should I use, given that I used all of my data to find lambda with LOO CV?
3) How can I calculate R^2 from cv.glmnet?

Here is an attempt to answer your questions:
1) cv.glmnet estimates the performance of each lambda using the cross-validation scheme you specify; it does not test a separate "final model". Here is an example:
library(glmnet)
data(iris)
# find the best lambda for iris species prediction
CVGLM <- cv.glmnet(as.matrix(iris[, -5]),
                   iris[, 5],
                   nfolds = nrow(iris),
                   type.measure = "class",
                   alpha = 0,
                   grouped = FALSE,
                   family = "multinomial")
The misclassification error at the best lambda is the smallest entry of CVGLM$cvm (which holds one CV error per lambda):
min(CVGLM$cvm)
#output
0.06
If you check this independently with your own LOOCV loop at the best lambda:
z <- lapply(1:nrow(iris), function(i) {
  # refit without row i at the chosen lambda
  fit <- glmnet(as.matrix(iris[-i, -5]),
                iris[-i, 5],
                alpha = 0,
                lambda = CVGLM$lambda.min,
                family = "multinomial")
  # predict the left-out row
  pred <- predict(fit, as.matrix(iris[i, -5]), type = "class")
  data.frame(pred = pred[, 1], true = iris[i, 5])
})
z <- do.call(rbind, z)
and check the error rate:
mean(z$pred != z$true)
#output
0.06
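So there is no need to re-test the performance with the same scheme cv.glmnet already used: you get the same result.
2) Once you have the optimal lambda, fit a model on the whole data set with the glmnet function (cv.glmnet has in fact already done this: its glmnet.fit component is fitted on the full data). There is no held-out data left to test on; the cross-validated error is your performance estimate. What you do with the model afterwards is entirely up to you; most people train a model to predict new observations.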
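A minimal sketch of this step, assuming the iris data stand in for your own x and y (10-fold CV is used only to keep it fast; set nfolds to the number of rows for LOO as in the question). Predicting from the cv.glmnet object with s = "lambda.min" uses the full-data fit stored in glmnet.fit, so a separate refit is not strictly needed. new_x is a hypothetical placeholder for genuinely new observations.

```r
library(glmnet)

x <- as.matrix(iris[, -5])
y <- iris[, 5]

set.seed(1)
cv_fit <- cv.glmnet(x, y, family = "multinomial", alpha = 0,
                    type.measure = "class", nfolds = 10)

# Cross-validated misclassification rate at the chosen lambda
cv_error <- min(cv_fit$cvm)

# Hypothetical placeholder for new observations (here just the first five rows)
new_x <- x[1:5, , drop = FALSE]

# predict() on a cv.glmnet object uses the full-data fit (cv_fit$glmnet.fit)
pred_class <- predict(cv_fit, newx = new_x, s = "lambda.min", type = "class")
print(pred_class)
```

If you instead refit with glmnet(x, y, ..., lambda = cv_fit$lambda.min), note that the glmnet documentation advises against supplying a single lambda value and recommends predict() after CV; predicting from the cv.glmnet object sidesteps that.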
3) What would R^2 mean for a classification problem? If you can define it, you can calculate it.
R^2 = explained variation / total variation
What is that in terms of classes?
In any case, R^2 is not used for classification; instead people use AUC, deviance, accuracy, balanced accuracy, Cohen's kappa, Youden's J and so on. Most of these are defined for binary classification, but some are also available for the multinomial case.
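Several of these metrics need nothing more than a confusion table. A base R sketch using made-up labels (pred and truth are hypothetical) that computes accuracy, balanced accuracy (mean per-class recall) and Cohen's kappa. If you want something closer in spirit to R^2, fitted glmnet objects also carry dev.ratio, the fraction of null deviance explained, which for the gaussian family is the usual R^2.

```r
# Hypothetical labels: 'truth' are observed classes, 'pred' are model predictions
truth <- factor(c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"))
pred  <- factor(c("a", "a", "b", "b", "b", "c", "c", "a", "c", "c"),
                levels = levels(truth))

# Confusion table: rows = predicted class, columns = actual class
cm <- table(predicted = pred, actual = truth)
n <- sum(cm)

accuracy <- sum(diag(cm)) / n                 # 0.8

# Balanced accuracy: mean of per-class recall (correct / actual count per class)
recall <- diag(cm) / colSums(cm)
balanced_accuracy <- mean(recall)             # about 0.822

# Cohen's kappa: observed agreement corrected for chance agreement from the margins
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - p_expected) / (1 - p_expected)   # about 0.692
```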
I suggest this as further reading

Related

predict function with lasso regression

I am trying to implement lasso regression for my sales prediction problem. I am using the glmnet package and its cv.glmnet function to train the model.
library(glmnet)
set.seed(123)
model = cv.glmnet(as.matrix(x = train[, -which(names(train) %in% "Sales")]),
y = train$Sales,
alpha = 1,
lambda = 10^seq(4,-1,-0.1))
best_lambda = model$lambda.min
lasso_predictions_valid <- predict(model,s = best_lambda,type = "coefficients")
After reading a few articles about lasso regression, I still don't know how to supply the test data I want predictions for. predict has a newx argument that I don't understand either; most regression functions instead have a newdata or data argument for the test data.
I think there is an error in your lasso_predictions_valid: you shouldn't pass valid$Sales as newx, because that is the actual sales figure (the outcome), not the predictors.
Once you have fitted the model on the training set, newx must be a matrix of the predictor values you want predictions for; here that is your validation set, with the same columns as the training matrix.
Looking at your code, your predict line should be something like the following. Note type = "response": with type = "coefficients", newx is ignored and predict returns the coefficients instead of predictions.
lasso_predictions_valid <- predict(model, s = best_lambda,
                                   newx = as.matrix(valid[, -which(names(valid) %in% "Sales")]),
                                   type = "response")
Then you should run your RMSE() line:
RMSE(lasso_predictions_valid, valid$Sales)
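A self-contained sketch of the whole train/validation workflow, assuming mtcars (predicting mpg) as a hypothetical stand-in for the question's sales data, which isn't available. The points it illustrates are that newx uses the same predictor columns as the training matrix and that type = "response" returns predictions. RMSE is computed directly in base R rather than with the RMSE() helper above, whose package isn't named in the question.

```r
library(glmnet)

# Hypothetical stand-in for the question's train/valid data
set.seed(123)
idx <- sample(nrow(mtcars), 22)
train <- mtcars[idx, ]
valid <- mtcars[-idx, ]
predictors <- setdiff(names(mtcars), "mpg")

# Lasso (alpha = 1) with 5-fold CV on the training rows only
model <- cv.glmnet(as.matrix(train[, predictors]), train$mpg,
                   alpha = 1, nfolds = 5)

# newx must contain the same predictor columns, in the same order, as in training
pred_valid <- predict(model, s = "lambda.min",
                      newx = as.matrix(valid[, predictors]),
                      type = "response")

# Root mean squared error on the validation rows
rmse <- sqrt(mean((valid$mpg - pred_valid[, 1])^2))
rmse
```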

R: coefficients of glmnet::cv.glmnet

As far as I understand, cv.glmnet does K-fold cross-validation, which means that each time it splits the data into a training set and a validation set. For every fixed lambda, it first uses the training data to estimate a coefficient vector, then uses that model to predict the validation set and compute the error.
Hence, for K-fold CV, there are K coefficient vectors (one per training set). So what does
coef(cvfit)
return?
Here is an example:
x <- iris[1:100,1:4]
y <- iris[1:100,5]
y <- factor(y)
fit <- cv.glmnet(data.matrix(x), y, family = "binomial", type.measure = "class",
                 alpha = 1, nfolds = 3, standardize = TRUE)
coef(fit, s = c(fit$lambda.min, fit$lambda.1se))
fit1 <- glmnet(data.matrix(x), y, family = "binomial",
               standardize = TRUE,
               lambda = c(fit$lambda.1se, fit$lambda.min))
coef(fit1)
In fit1 I use the whole dataset as the training set, yet the coefficients of fit1 and fit seem to be exactly the same. Why is that?
Thanks in advance.
Although cv.glmnet checks model performance by cross-validation, the model coefficients it returns for each lambda value come from fitting the model to the full dataset.
The help for cv.glmnet (type ?cv.glmnet) has a Value section that describes the object returned by cv.glmnet. The returned list (fit in your case) includes an element called glmnet.fit, which the help describes like this:
glmnet.fit a fitted glmnet object for the full data.
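A quick check of this, assuming mtcars as example data: coef() on a cv.glmnet object gives the same numbers as coef() on the full-data fit stored in glmnet.fit. As far as I know, the per-fold fits are used only to compute the CV error and are not kept in the returned object. The comparison uses as.vector because the two results can carry different column labels.

```r
library(glmnet)

x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

set.seed(42)
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 5)

# Coefficients reported for the cv.glmnet object ...
b_cv <- coef(cv_fit, s = "lambda.min")
# ... and those of the full-data path fit it stores
b_full <- coef(cv_fit$glmnet.fit, s = cv_fit$lambda.min)

# Compare values only (column labels may differ, e.g. "lambda.min" vs "s1")
coef_match <- isTRUE(all.equal(as.vector(as.matrix(b_cv)),
                               as.vector(as.matrix(b_full))))
coef_match
```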

Query about ridge regression - optimum value of lambda

I have a query about the cv.glmnet() function in R, which is supposed to find the "optimum" value of the parameter lambda for ridge regression.
In the example code below, if you experiment with values of lambda smaller than the one cv.glmnet() gives, you will find that the error sum of squares is actually much smaller than at cv.fit$lambda.min.
I have noticed this with many datasets. Even the example in the well-known book An Introduction to Statistical Learning (ISLR) by Gareth James et al. has this problem (Section 6.6.1, using the Hitters dataset): the value of lambda that actually minimizes the MSE is smaller than the one the book gives. This is true on the training data as well as on new test data.
What is the reason for this? What exactly is cv.fit$lambda.min returning?
Ravi
library(glmnet)
data(mtcars)
y = mtcars$hp
X = model.matrix(hp~mpg+wt+drat, data=mtcars)[ ,-1]
X
lambdas = 10^seq(3, -2, by=-.1)
fit = glmnet(X, y, alpha=0, lambda=lambdas)
summary(fit)
cv.fit = cv.glmnet(X, y, alpha=0, lambda=lambdas)
# what is the optimum value of lambda?
(opt.lambda = cv.fit$lambda.min) # 1.995262
y.pred = predict(fit, s = 0.01, newx = X, exact = TRUE, x = X, y = y) # gives lower SSE
# Sum of Squares Error
(sse = sum((y.pred - y)^2))
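cv.glmnet searches for the lambda that minimizes the cross-validation error, not the MSE on the data used for fitting. On the training data a smaller penalty always fits at least as well, so the in-sample SSE keeps decreasing as lambda shrinks; the CV error instead estimates the error on observations left out of each fit.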
From ?cv.glmnet:
The function runs glmnet nfolds+1 times; the first to get the lambda sequence, and then the remainder to compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds is computed.
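To see why the two criteria disagree, here is a base R sketch on the question's mtcars setup. It swaps in a hand-rolled closed-form ridge (unpenalized intercept, standardized predictors) and explicit leave-one-out CV in place of glmnet and its default 10-fold CV, so the lambda values are not on glmnet's scale (glmnet divides the RSS by 2n). ridge_fit and ridge_predict are hypothetical helpers. The training SSE can only shrink as lambda decreases, so minimizing it always picks the smallest lambda on the grid, whereas the LOOCV error measures prediction on held-out rows.

```r
data(mtcars)
X <- scale(model.matrix(hp ~ mpg + wt + drat, data = mtcars)[, -1])
y <- mtcars$hp
lambdas <- 10^seq(3, -2, by = -0.1)   # decreasing grid, as in the question

# Hypothetical helper: ridge with an unpenalized intercept, solved in closed form
ridge_fit <- function(X, y, lambda) {
  Z <- cbind(1, X)
  D <- diag(c(0, rep(1, ncol(X))))    # zero penalty on the intercept
  solve(crossprod(Z) + lambda * D, crossprod(Z, y))
}
ridge_predict <- function(beta, X) drop(cbind(1, X) %*% beta)

# In-sample SSE for each lambda
train_sse <- sapply(lambdas, function(l) {
  sum((y - ridge_predict(ridge_fit(X, y, l), X))^2)
})

# Leave-one-out CV mean squared error for each lambda
loocv_mse <- sapply(lambdas, function(l) {
  errs <- sapply(seq_along(y), function(i) {
    beta <- ridge_fit(X[-i, , drop = FALSE], y[-i], l)
    y[i] - ridge_predict(beta, X[i, , drop = FALSE])
  })
  mean(errs^2)
})

lambdas[which.min(train_sse)]   # always the smallest lambda on the grid
lambdas[which.min(loocv_mse)]   # CV choice, which need not be the smallest
```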

Choose model by BIC in a stepwise algorithm after choosing model from glmnet

I have data where the number of observations n is smaller than the number of variables p. The response variable is binary. For example:
n <- 10
p <- 100
x <- matrix(rnorm(n*p), ncol = p)
y <- rbinom(n, size = 1, prob = 0.5)
I would like to fit a logistic model to these data, so I used this code:
model <- glmnet(x, y, family = "binomial", intercept = FALSE)
The function returns up to 100 models for different lambda values (the penalty parameter in LASSO regression). I would like to choose the biggest model that has n - 1 or fewer parameters (that is, fewer than the number of observations). Let's say the chosen model is for lambda_opt.
model_one <- glmnet(x, y, family = "binomial", intercept = FALSE, lambda = lambda_opt)
Now I would like to do a second step: use the step function on this model to choose the submodel that is best in terms of BIC (Bayesian Information Criterion). Unfortunately, step doesn't work on objects of class glmnet.
step(model_one, direction = "backward", k = log(n))
How can I perform such a procedure? Is there another function for this specific class (glmnet) that does what I want?
BIC is a fine way to select a penalty parameter from the sequence returned by glmnet; it's faster than cross-validation and works quite well, at least in the settings where I've tried it.
Compute the residual sum of squares for each value of the penalty parameter in the sequence (use predict(model, x) to get the fit).
model$df gives you the degrees of freedom.
Combine those to get a BIC and pick the value of lambda corresponding to the lowest BIC.
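A sketch of this recipe on simulated data shaped like the question's (n = 10, p = 100). Because the response is binary, it uses the binomial deviance from deviance() in place of the residual sum of squares, giving BIC = deviance + log(n) * df; that substitution is my assumption about how to carry the recipe over to a logistic model. y is fixed to five 0s and five 1s so both classes are present.

```r
library(glmnet)

# Simulated data shaped like the question (n < p)
set.seed(42)
n <- 10
p <- 100
x <- matrix(rnorm(n * p), ncol = p)
y <- rep(c(0, 1), each = n / 2)   # fixed so both classes appear

model <- glmnet(x, y, family = "binomial", intercept = FALSE)

# One deviance value per lambda; for a binomial model it stands in for the RSS
dev <- deviance(model)

# BIC-style criterion: deviance + log(n) * (number of nonzero coefficients)
bic <- dev + log(n) * model$df

# Keep only models with fewer than n nonzero coefficients, as in the question
ok <- model$df < n
lambda_bic <- model$lambda[ok][which.min(bic[ok])]
lambda_bic
```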

Performing Anova on Bootstrapped Estimates from Quantile Regression

So I'm using the quantreg package in R to run quantile regression analyses that test how the effects of my predictors vary across the distribution of my outcome.
library(quantreg)
FML <- as.formula(outcome ~ VAR + c1 + c2 + c3)
quantiles <- c(0.25, 0.5, 0.75)
q.Result <- list()
for (i in quantiles) {
  i.no <- which(quantiles == i)
  q.Result[[i.no]] <- rq(FML, tau = i, data, method = "fn", na.action = na.omit)
}
Then I call anova.rq, which runs a Wald test across all the models and outputs a p-value for each covariate, telling me whether that covariate's effect varies significantly across the distribution of my outcome.
anova.Result <- anova(q.Result[[1]], q.Result[[2]], q.Result[[3]], joint=FALSE)
That works just fine. However, for my particular data (and in general?), bootstrapping the estimates and their standard errors is preferable, which I do with a slight modification of the code above:
q.Result <- rq(FML, tau = quantiles, data, method = "fn", na.action = na.omit)
q.Summary <- summary(q.Result, se = "boot", R = 10000, bsmethod = "mcmb",
                     covariance = TRUE)
Here's where I get stuck: quantreg currently cannot perform the anova (Wald) test on bootstrapped estimates. The quantreg documentation specifically states that "extensions of the methods to be used in anova.rq should be made" regarding the bootstrap method.
Looking at the details of the anova.rq method, I can see that it requires two components that are not present in the quantile model when bootstrapping:
1) Hinv (inverse Hessian matrix). The documentation specifically states "note that for se = "boot" there is no way to split the estimated covariance matrix into its sandwich constituent parts."
2) J which, according to the information files, is "Unscaled Outer product of gradient matrix returned if cov=TRUE and se != "iid". The Huber sandwich is cov = tau (1-tau) Hinv %*% J %*% Hinv. as for the Hinv component, there is no J component when se == "boot". (Note that to make the Huber sandwich you need to add the tau (1-tau) mayonnaise yourself.)"
Can I calculate or estimate Hinv and J from the bootstrapped estimates? If not, what is the best way to proceed?
Any help with this is much appreciated. This is my first time posting a question here, though I've benefited greatly from the answers to other people's questions in the past.
For your second question: anova.rq has an R argument for resampling. For example:
anova(object, ..., test = "Wald", joint = TRUE, score = "tau", se = "nid", R = 10000, trim = NULL)
where R is the number of resampling replications for the anowar form of the test, used to estimate the reference distribution of the test statistic. R therefore only matters with test = "anowar", not with the default "Wald".
Just a heads-up: you'll probably get better responses if you include only one question per post.
I consulted a colleague, who confirmed that Hinv and J are unlikely to be 'reverse'-computed from bootstrapped estimates. However, we concluded that estimates from different taus could be compared with a Wald test as follows.
From the rqs object produced by
q.Summary <- summary(q.Result, se = "boot", R = 10000, bsmethod = "mcmb", covariance = TRUE)
extract the bootstrapped beta values of the variable of interest for each tau (here VAR, the first covariate in FML, i.e. column 2 after the intercept):
boot.Bs <- sapply(q.Summary, function(x) x[["B"]][, 2])
B0 <- coef(summary(lm(FML, data)))[2, 1]  # OLS (linear regression) estimate of VAR
Then compute the Wald statistic and get a p-value, with the number of quantiles as the degrees of freedom:
Wald <- sum(apply(boot.Bs, 2, function(x) ((mean(x) - B0)^2) / var(x)))
Pvalue <- pchisq(Wald, ncol(boot.Bs), lower.tail = FALSE)
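A base R sketch of this computation, with simulated bootstrap draws standing in for boot.Bs (one column per tau, one row per bootstrap replicate) and a hypothetical B0. It first reproduces the statistic above, which sums per-tau squared z-scores and so treats the tau-specific estimates as independent. Because estimates at different taus come from the same data and are typically correlated, a variant using the full bootstrap covariance matrix is also shown; that variant is my addition, not part of the solution above.

```r
set.seed(2024)
n_boot <- 1000
taus <- c(0.25, 0.5, 0.75)

# Simulated bootstrap draws (hypothetical): one column per tau
boot.Bs <- matrix(rnorm(n_boot * length(taus),
                        mean = rep(c(0.9, 1.0, 1.2), each = n_boot),
                        sd = 0.1),
                  ncol = length(taus))
B0 <- 1  # hypothetical OLS estimate of VAR

# Statistic from the answer: sum of per-tau squared z-scores (independence across taus)
Wald <- sum(apply(boot.Bs, 2, function(x) (mean(x) - B0)^2 / var(x)))
Pvalue <- pchisq(Wald, df = ncol(boot.Bs), lower.tail = FALSE)

# Variant: quadratic form with the full bootstrap covariance across taus
d <- colMeans(boot.Bs) - B0
Wald_cov <- drop(t(d) %*% solve(cov(boot.Bs), d))
Pvalue_cov <- pchisq(Wald_cov, df = length(d), lower.tail = FALSE)

c(Wald = Wald, Pvalue = Pvalue, Wald_cov = Wald_cov, Pvalue_cov = Pvalue_cov)
```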
You also want to verify that the bootstrapped betas are approximately normally distributed. If you're running many taus, checking all those QQ plots is cumbersome, so just sum them by row:
qqnorm(apply(boot.Bs, 1, sum))
qqline(apply(boot.Bs, 1, sum), col = 2)
This seems to work; if anyone can see anything wrong with my solution, please share.
