Is there any summary for cv.glmnet function in R? - r

One can perform regular ordinary generalized linear model in R using glm function that has it's own method for summary function and one can summary of the model in which output there are p-values for each variable. Depending on those p-values one can say which variables are statistically significance or not under specific confidence level.
My question is. Is there is the same functionality for cv.glmnet function from glmnet package? I know that after computation I can receive a table with coefficients coef(model, s="lambda.min") where some of them are not zero. So I assume (maybe wrongly) that those non-zero are statistically significance. Am I right? Is there any method that provides p-values or confidence intervals for those coefficients?

Related

Using GAMLSS, the difference between fitDist() and gamlss()

When using the GAMLSS package in R, there are many different ways to fit a distribution to a set of data. My data is a single vector of values, and I am fitting a distribution over these values.
My question is this: what is the main difference between using fitDist() and gamlss() since they give similar but different answers for parameter values, and different worm plots?
Also, using the function confint() works for gamlss() fitted objects but not for objects fitted with fitDist(). Is there any way to produce confidence intervals for parameters fitted with the fitDist() function? Is there an accuracy difference between the two procedures? Thanks!
m1 <- fitDist()
fits many distributions and chooses the best according to a
generalized Akaike information criterion, GAIC(k), wit penalty k for each
fitted parameter in the distribution, where k is specified by the user,
e.g. k=2 for AIC,
k = log(n) for BIC,
k=4 for a Chi-squared test (rounded from 3.84, the 5% critical value of a Chi-squared distribution with 1 degree of fereedom), which is my preference.
m1$fits
gives the full results from the best to worst distribution according to GAIC(k).

Does the function multinom() from R's nnet package fit a multinomial logistic regression, or a Poisson regression?

The documentation for the multinom() function from the nnet package in R says that it "[f]its multinomial log-linear models via neural networks" and that "[t]he response should be a factor or a matrix with K columns, which will be interpreted as counts for each of K classes." Even when I go to add a tag for nnet on this question, the description says that it is software for fitting "multinomial log-linear models."
Granting that statistics has wildly inconsistent jargon that is rarely operationally defined by whoever is using it, the documentation for the function even mentions having a count response and so seems to indicate that this function is designed to model count data. Yet virtually every resource I've seen treats it exclusively as if it were fitting a multinomial logistic regression. In short, everyone interprets the results in terms of logged odds relative to the reference (as in logistic regression), not in terms of logged expected count (as in what is typically referred to as a log-linear model).
Can someone clarify what this function is actually doing and what the fitted coefficients actually mean?
nnet::multinom is fitting a multinomial logistic regression as I understand...
If you check the source code of the package, https://github.com/cran/nnet/blob/master/R/multinom.R and https://github.com/cran/nnet/blob/master/R/nnet.R, you will see that the multinom function is indeed using counts (which is a common thing to use as input for a multinomial regression model, see also the MGLM or mclogit package e.g.), and that it is fitting the multinomial regression model using a softmax transform to go from predictions on the additive log-ratio scale to predicted probabilities. The softmax transform is indeed the inverse link scale of a multinomial regression model. The way the multinom model predictions are obtained, cf.predictions from nnet::multinom, is also exactly as you would expect for a multinomial regression model (using an additive log-ratio scale parameterization, i.e. using one outcome category as a baseline).
That is, the coefficients predict the logged odds relative to the reference baseline category (i.e. it is doing a logistic regression), not the logged expected counts (like a log-linear model).
This is shown by the fact that model predictions are calculated as
fit <- nnet::multinom(...)
X <- model.matrix(fit) # covariate matrix / design matrix
betahat <- t(rbind(0, coef(fit))) # model coefficients, with expicit zero row added for reference category & transposed
preds <- mclustAddons::softmax(X %*% betahat)
Furthermore, I verified that the vcov matrix returned by nnet::multinom matches that when I use the formula for the vcov matrix of a multinomial regression model, Faster way to calculate the Hessian / Fisher Information Matrix of a nnet::multinom multinomial regression in R using Rcpp & Kronecker products.
Is it not the case that a multinomial regression model can always be reformulated as a Poisson loglinear model (i.e. as a Poisson GLM) using the Poisson trick (glmnet e.g. uses the Poisson trick to fit multinomial regression models as a Poisson GLM)?

Convert probabilities from logistic regression to classes

I've fit a multinomial logistic regression using gam() and the mgcv package in R. I then used this model to make predictions on new data. The resulting predictions are probabilities because gam() doesn't have a type="class" option. I'd like to convert these probabilities to one of my three levels for my categorical response variable but I have two questions: 1) How should I determine where make the probability threshold between the three classes? and 2). How can I make a nested ifelse() statement that reclassifies the probabilities according to the thresholds?

Does auto.arima function in R do the differencing for the y- and x-variables before or after estimating the linear regression model?

I'm trying to estimate a linear regression with arima errors, but my regressors are highly collinear and thus the regression model suffers from multicollinearity. Since my ultimate goal is to be able to interpret the individual regression coefficients as elasticities and to use them for ex-ante forecasting, I need to solve the multicollinearity somehow to be able to trust the coefficients of the regressors. I know that transforming the regressor variables eg. by differencing might help to reduce the multicollinearity. And I have also understood that auto.arima performs the same differencing for both the response variable as well as the regressors defined in xreg (see: Do we need to do differencing of exogenous variables before passing to xreg argument of Arima() in R?).
So my question is, does it do the transformation already before estimating the regression coefficients or is the regression estimated using the untransformed data and transformation done only before fitting the arima model to the errors? And if the transformation is done before estimating the regression, what is the script to get those transformed values to a table or something, to be able to run the multicollinearity test on those rather than the original data?
The auto.arima() function will do the differencing before estimation to ensure consistent estimators. The estimation of the regression coefficients and ARIMA model is done simultaneously using MLE.
You can manually difference the data yourself using the diff() function.

Can I test autocorrelation from the generalized least squares model?

I am trying to use a generalized least square model (gls in R) on my panel data to deal with autocorrelation problem.
I do not want to have any lags for any variables.
I am trying to use Durbin-Watson test (dwtest in R) to check the autocorrelation problem from my generalized least square model (gls).
However, I find that the dwtest is not applicable over gls function while it is applicable to other functions such as lm.
Is there a way to check the autocorrelation problem from my gls model?
Durbin-Watson test is designed to check for presence of autocorrelation in standard least-squares models (such as one fitted by lm). If autocorrelation is detected, one can then capture it explicitly in the model using, for example, generalized least squares (gls in R). My understanding is that Durbin-Watson is not appropriate to then test for "goodness of fit" in the resulting models, as gls residuals may no longer follow the same distribution as residuals from the standard lm model. (Somebody with deeper knowledge of statistics should correct me, if I'm wrong).
With that said, function durbinWatsonTest from the car package will accept arbitrary residuals and return the associated test statistic. You can therefore do something like this:
v <- gls( ... )$residuals
attr(v,"std") <- NULL # get rid of the additional attribute
car::durbinWatsonTest( v )
Note that durbinWatsonTest will compute p-values only for lm models (likely due to the considerations described above), but you can estimate them empirically by permuting your data / residuals.

Resources