Extract coefficients from penalized regression - r

I am using the sparse group lasso, which is a penalized regression method. The package I am using is SGL. I tried to run the examples in R, and the code is given below:
set.seed(1)
n = 50; p = 100; size.groups = 10
index <- ceiling(1:p / size.groups)
X = matrix(rnorm(n * p), ncol = p, nrow = n)
beta = (-2:2)
y = X[,1:5] %*% beta + 0.1*rnorm(n)
data = list(x = X, y = y)
cvFit = cvSGL(data, index, type = "linear")
I tried to extract the regression coefficients from cvFit, but this is what I get:
coef(cvFit)
NULL
Can anyone tell me what is wrong? Thanks in advance.

For a cv.glmnet-style object, this is how you would extract the coefficients at the lambda value with the smallest cross-validation error (note that the object name has to match your fit):
coef(cvFit, s = cvFit$lambda.min)
However, SGL does not appear to provide a coef() method for cvSGL objects, which is why coef(cvFit) returns NULL; in that case you have to pull the coefficients out of the fitted object directly.
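A minimal sketch of that direct extraction, assuming the usual SGL object layout (check str(cvFit) to confirm the component names; treating cvFit$fit$beta as a p x nlambda coefficient matrix and cvFit$lldiff as the cross-validated error per lambda is my assumption, not something shown in the question):
best <- which.min(cvFit$lldiff)      # index of the lambda with the smallest CV error
beta.hat <- cvFit$fit$beta[, best]   # coefficient vector at that lambda
head(beta.hat)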

Related

Misuse predict.rq in the package quantreg?

I am using the quantreg package to predict new data based on a training set. However, I noticed a discrepancy between using predict.rq (or predict) and doing the prediction manually. Here is an example:
The quantile regression setting is
N = 10000
tauList = (1:11)/12
y = rchisq(N,2)
X = matrix( rnorm(3*N) ,nrow = N, ncol = 3 )
fit <- rq( y ~ X-1, tau = tauList, method = "fn")
The new data set I want to predict is
newdata <- matrix( rbeta((3*N),2,2) ,nrow = N,ncol=3 )
I use predict.rq or predict to predict newdata. Both return the same result:
fit_use_predict <- predict.rq( fit, newdata = as.data.frame(newdata) )
Also I manually do the prediction based on the coefficients matrix:
coef_mat <- coef(fit)
fit_use_multiplication <- newdata %*% coef_mat
I expected both to be numerically identical, but they are not:
diff <- fit_use_predict - fit_use_multiplication
print(diff)
The differences are not negligible.
However, when predicting on the original data set X, both approaches give the same result, i.e.,
predict(fit, newdata = data.frame(X)) = X %*% coef_mat ## True
Am I missing something when using the function? Thanks!
A more serious problem here, before we get to prediction, is that the model forces all of the fitted quantile functions through the origin of design space, and since the covariates are centered at the origin, all of the quantile functions are forced to cross there. Even if the X's all lie in the positive orthant, it is quite a strong assumption to say that the distribution of the response is degenerate at the origin.
I think you just have to retain the 'X' name in your data as it was in the training data.
library(quantreg)
N = 10000
tauList = (1:11)/12
y = rchisq(N,2)
X = matrix( rnorm(3*N) ,nrow = N, ncol = 3 )
fit <- rq( y ~ X-1, tau = tauList, method = "fn")
newdata <- matrix( rbeta((3*N),2,2) ,nrow = N,ncol=3 )
fit_use_predict <- predict.rq( fit, newdata = data.frame(X=I(newdata)) )
coef_mat <- coef(fit)
fit_use_multiplication <- newdata %*% coef_mat
diff <- fit_use_predict - fit_use_multiplication
max( abs(diff) )
Output is 0
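The reason this works is that the model formula contains a single matrix-valued term literally named X, so predict() looks for a variable called X in newdata; as.data.frame(newdata) instead produces columns named V1, V2 and V3, which do not match, so the original training X is most likely picked up again from the formula environment and newdata is effectively ignored. Wrapping the new matrix in I() keeps it as one matrix-valued column with the right name. A quick way to see the difference, using the objects above:
str(data.frame(X = I(newdata)))  # a single column "X" holding an n x 3 matrix
str(as.data.frame(newdata))      # three separate columns V1, V2, V3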

Is there a way in R to determine AIC from cv.glmnet?

I am using the glmnet package in R, and not(!) the caret package, for my binary elastic net regression. I have come to the point where I would like to compare models (e.g. lambda set to lambda.1se or lambda.min, and models where k-fold is set to 5 or 10), but I have not yet managed to compute the AICc or BIC for my models. How do I do that? I tried this and this, but it did not work for me; I only get an empty list.
Code:
set.seed(123)
foldid <- sample(rep(seq(10), length.out = nrow(x.train)))
list.of.fits.df <- list()
for (i in 0:10){
  fit.name <- paste0("alpha", i/10)
  list.of.fits.df[[fit.name]] <- cv.glmnet(x.train, y.train, type.measure = "auc",
                                           alpha = i/10, family = "binomial",
                                           nfolds = 10, foldid = foldid, parallel = TRUE)
}
best.fit <- coef(list.of.fits.df[[fit.name]], s = list.of.fits.df[[fit.name]]$lambda.1se)
best.fit.min <- coef(list.of.fits.df[[fit.name]], s = list.of.fits.df[[fit.name]]$lambda.min)
#AICc & BIC
#???
How can I find the AICc and BIC for my best fit model?
You can alter the solution given in this answer slightly to obtain the desired result. The reason it doesn't work "out of the box" is that cv.glmnet returns the result of several fits; the individual fits are stored in fit$glmnet.fit, and we can use this to create a simple function for calculating AICc and BIC.
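For reference, everything the helper below relies on is already stored in the cv.glmnet object; a quick way to check, using one of the fits from the loop above:
names(list.of.fits.df[[fit.name]])             # includes "lambda", "lambda.min", "lambda.1se" and "glmnet.fit"
names(list.of.fits.df[[fit.name]]$glmnet.fit)  # includes "nulldev", "dev.ratio", "df" and "nobs"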
glmnet_cv_aicc <- function(fit, lambda = 'lambda.1se'){
  whlm <- which(fit$lambda == fit[[lambda]])
  with(fit$glmnet.fit,
       {
         tLL <- nulldev - nulldev * (1 - dev.ratio)[whlm]
         k <- df[whlm]
         n <- nobs
         return(list('AICc' = - tLL + 2 * k + 2 * k * (k + 1) / (n - k - 1),
                     'BIC' = log(n) * k - tLL))
       })
}
All we'll then have to do is provide the model and get our estimated AICc.
best.aicc <- glmnet_cv_aicc(list.of.fits.df[[fit.name]])
best.aicc.min <- glmnet_cv_aicc(list.of.fits.df[[fit.name]], 'lambda.min')
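If you also want the criteria for every alpha value fitted in the loop above, you can apply the helper across the whole list; a small sketch using the objects defined earlier:
# AICc and BIC at lambda.1se for each alpha value fitted above
sapply(list.of.fits.df, function(f) unlist(glmnet_cv_aicc(f)))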
For a reproducible example, one could use one of the many examples provided in help(glmnet)
n = 500
p = 30
nzc = trunc(p/10)
x = matrix(rnorm(n * p), n, p)
beta3 = matrix(rnorm(30), 10, 3)
beta3 = rbind(beta3, matrix(0, p - 10, 3))
f3 = x %*% beta3
p3 = exp(f3)
p3 = p3/apply(p3, 1, sum)
g3 = glmnet:::rmult(p3)
set.seed(10101)
cvfit = cv.glmnet(x, g3, family = "multinomial")
print(glmnet_cv_aicc(cvfit))
# Output
#$AICc
#[1] -556.2404
#
#$BIC
#[1] -506.3058
print(glmnet_cv_aicc(cvfit, 'lambda.min'))
# Output
#$AICc
#[1] -601.0234
#
#$BIC
#[1] -506.4068

How to write this function for mixture model in R?

I want to estimate a model in R.
One part of it is a finite mixture model consisting of two OLS regressions.
As a newcomer to R, I don't know how to write this probability density function in R.
I wonder if you can give me some help.
The probability density function is the following:
f(y|x) = (p/σ1)*φ((y - x*b1)/σ1) + ((1-p)/σ2)*φ((y - x*b2)/σ2)
I have used Stata to write an example:
gen double `f1' = normalden($ML_y1, `xb1', exp(`lns1'))
gen double `f2' = normalden($ML_y1, `xb2', exp(`lns2'))
tempvar p
gen double `p' = exp(`lp')/(1 + exp(`lp'))
replace `lnf' = ln(`p'*`f1' + (1 - `p')*`f2')
I wonder if you can show me how to write this function in R.
Thanks a lot, and I am looking forward to your help.
See the function FLXMRglm. The density is evaluated with dnorm:
library(flexmix)
FLXMRglm
# your case
if (family == "gaussian") {
    z@defineComponent <- function(para) {
        predict <- function(x, ...) {
            dotarg = list(...)
            if ("offset" %in% names(dotarg))
                offset <- dotarg$offset
            p <- x %*% para$coef
            if (!is.null(offset))
                p <- p + offset
            p
        }
        logLik <- function(x, y, ...) dnorm(y, mean = predict(x, ...),
            sd = para$sigma, log = TRUE)
        new("FLXcomponent", parameters = list(coef = para$coef,
            sigma = para$sigma), logLik = logLik, predict = predict,
            df = para$df)
    }
    z@fit <- function(x, y, w, component) {
        fit <- lm.wfit(x, y, w = w, offset = offset)
        z@defineComponent(para = list(coef = coef(fit), df = ncol(x) + 1,
            sigma = sqrt(sum(fit$weights * fit$residuals^2/mean(fit$weights))/(nrow(x) -
                fit$rank))))
    }
}
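If you only need the density itself as a plain R function, here is a minimal sketch of the two-component formula from the question (the function and argument names are mine, not part of flexmix); note that (p/σ)*φ((y - x*b)/σ) is just p*dnorm(y, mean = x*b, sd = σ):
# f(y|x) = (p/s1)*phi((y - x b1)/s1) + ((1-p)/s2)*phi((y - x b2)/s2)
mix_density <- function(y, x, b1, b2, s1, s2, p) {
  p * dnorm(y, mean = x %*% b1, sd = s1) +
    (1 - p) * dnorm(y, mean = x %*% b2, sd = s2)
}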

Logistic Regression in R: Optimization Issues concerning Initial Guess

I need to implement a logistic regression manually, using the Score/GMM approach, without the use of GLM. This is because at later stages the model will be much more complicated. Currently I am running into a problem where, for the logistic regression, the optimization procedures are very dependent on the initial point. To illustrate, here is my code using an online dataset. More details about the procedure are in the comments:
library(data.table)
library(nleqslv)
library(Matrix)
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
data_analysis<-data.table(mydata)
data_analysis[,constant:=1]
#Likelihood function for logit
#The logistic regression will regress the binary variable
#admit on a constant and the variable gpa
LL <- function(beta){
  beta = as.numeric(beta)
  data_temp = data_analysis
  mat_temp2 = cbind(data_temp$constant,
                    data_temp$gpa)
  one = rep(1, dim(mat_temp2)[1])
  h = exp(beta %*% t(mat_temp2))
  choice_prob = h/(1+h)
  llf <- sum(data_temp$admit * log(choice_prob)) + (sum((one - data_temp$admit) * log(one - choice_prob)))
  return(-1*llf)
}
#Score to be used when optimizing using LL
#Identical to the Score function below but returns negative output
Score_LL <- function(beta){
  data_temp = data_analysis
  mat_temp2 = cbind(data_temp$constant,
                    data_temp$gpa)
  one = rep(1, dim(mat_temp2)[1])
  h = exp(beta %*% t(mat_temp2))
  choice_prob = h/(1+h)
  resid = as.numeric(data_temp$admit - choice_prob)
  score_final2 = t(mat_temp2) %*% Diagonal(length(resid), x = resid) %*% one
  return(-1*as.numeric(score_final2))
}
#The Score/Deriv/Jacobian of the Likelihood function
Score <- function(beta){
  data_temp = data_analysis
  mat_temp2 = cbind(data_temp$constant,
                    data_temp$gpa)
  one = rep(1, dim(mat_temp2)[1])
  h = exp(beta %*% t(mat_temp2))
  choice_prob = as.numeric(h/(1+h))
  resid = as.numeric(data_temp$admit - choice_prob)
  score_final2 = t(mat_temp2) %*% Diagonal(length(resid), x = resid) %*% one
  return(as.numeric(score_final2))
}
#Derivative of the Score function
Score_Deriv <- function(beta){
  data_temp = data_analysis
  mat_temp2 = cbind(data_temp$constant,
                    data_temp$gpa)
  one = rep(1, dim(mat_temp2)[1])
  h = exp(beta %*% t(mat_temp2))
  weight = (h/(1+h)) * (1 - (h/(1+h)))
  weight_mat = Diagonal(length(weight), x = weight)
  deriv = t(mat_temp2) %*% weight_mat %*% mat_temp2
  return(-1*as.array(deriv))
}
#Quadratic Gain function
#Minimized at Score=0 and so minimizing is equivalent to solving the
#FOC of the Likelihood. This is the GMM approach.
Quad_Gain <- function(beta){
  h = Score(as.numeric(beta))
  return(sum(h*h))
}
#Derivative of the Quadratic Gain function
Quad_Gain_deriv <- function(beta){
  return(2 * t(Score_Deriv(beta)) %*% Score(beta))
}
sol1=glm(admit ~ gpa, data = data_analysis, family = "binomial")
sol2=optim(c(2,2),Quad_Gain,gr=Quad_Gain_deriv,method="BFGS")
sol3=optim(c(0,0),Quad_Gain,gr=Quad_Gain_deriv,method="BFGS")
When I run this code, sol3 matches what glm produces (sol1), but sol2, started from a different initial point, differs from the glm solution by a lot. The same thing happens in my main code with the actual data. One solution is to create a grid and test multiple starting points, but my main data set has 10 parameters, which would make the grid very large and the program computationally infeasible. Is there a way around this problem?
Your code seems overly complicated. The following two functions define the negative log-likelihood and negative score vector for a logistic regression with the logit link:
logLik_Bin <- function (betas, y, X) {
  eta <- c(X %*% betas)
  - sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
}
score_Bin <- function (betas, y, X) {
  eta <- c(X %*% betas)
  - crossprod(X, y - plogis(eta))
}
Then you can use it as follows:
# load the data
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
# fit with optim()
opt1 <- optim(c(0, 0, 0), logLik_Bin, score_Bin, method = "BFGS",  # start at zero (see the note below)
              y = mydata$admit, X = cbind(1, mydata$gre, mydata$gpa))
opt1$par
# compare with glm()
glm(admit ~ gre + gpa, data = mydata, family = binomial())
Typically, for well-behaved covariates (i.e., when you expect the coefficients to lie in the interval [-4, 4]), starting at 0 is a good idea.
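To make the comparison concrete, you can line the two sets of estimates up side by side (a quick sketch using the objects above; fit_glm is just a name I give to the glm() fit):
fit_glm <- glm(admit ~ gre + gpa, data = mydata, family = binomial())
cbind(optim = opt1$par, glm = coef(fit_glm))  # the two columns should agree closely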

Maximum likelihood estimation

I am a new user of R and hope you will bear with me if my question is silly. I want to estimate the following model using the maximum likelihood estimator in R:
y = a + b*(ln(x) - α)
where a, b, and α are parameters to be estimated, and x and y are my data. I tried to use the following code, which I got from the web:
library(foreign)
maindata <- read.csv("C:/Users/NUNU/Desktop/maindata/output2.csv")
h <- subset(maindata, cropid==10)
library(likelihood)
modelfun <- function (a, b, x) { b * (x - a) }
par <- list(a = 0, b = 0)
var <- list(x = "x")
par_lo <- list(a = 0, b = 0)
par_hi <- list(a = 50, b = 50)
var$y <- "y"
var$mean <- "predicted"
var$sd <- 0.815585
var$log <- TRUE
results <- anneal(model = modelfun, par = par, var = var,
source_data = h, par_lo = par_lo, par_hi = par_hi,
pdf = dnorm, dep_var = "y", max_iter = 20000)
The results I am getting are similar even when the data are different, i.e., even when I change the cropid. Also, the predicted values generated are for x rather than y.
I do not know what I missed or where I went wrong. Your help is highly appreciated.
I am not sure whether your model formula will lead to a unique solution, but in general you can find the MLE with the optim function.
Here is a simple example for linear regression with optim:
fn <- function(beta, x, y) {
  a = beta[1]
  b = beta[2]
  sum( (y - (a + b * log(x)))^2 )
}
# generate some data for testing
x = 1:100
# a = 10, b = 3.5
y = 10 + 3.5 * log(x)
optim(c(0, 0), fn, x = x, y = y, method = "BFGS")  # two starting values, one per parameter of fn
You can change the function "fn" to reflect your model formula, e.g.
sum( (y - (YOUR MODEL FORMULA) )^2 )
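For the model in the question, that substitution might look like the sketch below (fn2 is my own name; note that a and α are not separately identified, since only a - b*α enters the fit, so optim will return one of many equivalent solutions):
fn2 <- function(beta, x, y) {
  a <- beta[1]; b <- beta[2]; alpha <- beta[3]
  sum((y - (a + b * (log(x) - alpha)))^2)
}
optim(c(0, 1, 0), fn2, x = x, y = y, method = "BFGS")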
EDIT
I am just giving a simple example of using optim in case you have a custom model formula to optimize. I did not mean to use it for simple linear regression, since lm would be sufficient.
I was a bit surprised that iTech used optim for a problem that is linear in its parameters. With his data for x and y:
> lm(y ~ log(x) )
Call:
lm(formula = y ~ log(x))
Coefficients:
(Intercept)       log(x)
       10.0          3.5
For linear problems with Gaussian errors, the least squares solution is also the maximum likelihood solution.
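As a quick cross-check of the two approaches, the optim sketch above should recover essentially the same coefficients (using the fn, x and y defined earlier):
optim(c(0, 0), fn, x = x, y = y, method = "BFGS")$par  # approximately c(10, 3.5), matching lm()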

Resources