I want to use a mixed model without a random intercept but with a correlation structure. The reason is to get the AIC to help choose the best correlation structure (e.g., autoregressive versus compound symmetry). So it is essentially a GEE, but GEEs don't allow estimation of the AIC. Models like this are also called covariance pattern models.
The code below simulates random data with a compound symmetry correlation. The model fits both a random intercept and a variance-covariance matrix. Is there any way to switch off the random intercept?
library(MASS)
library(nlme)

# simulate 10 subjects with 4 repeated measures and a compound-symmetric correlation
Sigma = toeplitz(c(1, 0.5, 0.5, 0.5))
data = data.frame(mvrnorm(n=10, mu=1:4, Sigma=Sigma))
data$id = 1:nrow(data)
long = reshape(data, direction='long', varying=list(1:4), v.names='Y')

# compound symmetry correlation within subject
cs = corCompSymm(0.5, form = ~ 1 | id)
model = lme(Y ~ time, random=list(~1|id), data=long, correlation=cs)
summary(model)
If you are solely interested in comparing correlation structures, then I am pretty sure your goal could be served by a generalized least squares model fit with gls:
model = gls(Y~time, data=long, correlation=cs)
summary(model)
AIC(model)
Otherwise, a linear mixed effects model fit with lme must have random effects specified.
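For the original goal of choosing between correlation structures by AIC, a minimal sketch (assuming the simulated long data frame from the question) could fit the same mean model under each candidate structure and compare:
# fit the same mean model under two candidate correlation structures
m_cs  <- gls(Y ~ time, data = long,
             correlation = corCompSymm(form = ~ 1 | id))
m_ar1 <- gls(Y ~ time, data = long,
             correlation = corAR1(form = ~ 1 | id))
AIC(m_cs, m_ar1)  # the lower AIC suggests the better-fitting structure
Because both fits use the same fixed effects, the default REML fits are directly comparable here.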
Is there a way to get the null deviance and df for a generalized linear mixed model fit with glmer()? Is there a reason that this is not included in the summary() output, the way that it is with a glm() object?
You can compute the null deviance by re-fitting the model with an intercept term only, e.g.
library(lme4)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)
gm0 <- update(gm1, . ~ 1 + (1 | herd))
deviance(gm1) ## 73.47
deviance(gm0) ## 92.42 (null deviance)
I'm not sure what you mean by the "null df" for the GLMM; the 'denominator degrees of freedom' measure of effective sample size, which works perfectly for balanced ANOVAs and questionably for linear mixed models [via inclusion/exclusion, Satterthwaite, Kenward-Roger, etc.], is hard to define for GLMMs.
I can think of a couple of reasons that lme4 doesn't automatically do this computation for you:
it could be an expensive re-fit (even for GLMs it does require refitting the model, see here for the code in glm that does it)
it's less obvious for GLMMs what the appropriate null model for comparison is. Do you remove both random and fixed effects and reduce the model to a GLM? Do you keep all of the random effects, or only intercept-level random effects, or some other mixture depending on the context of the question? Making the user do it themselves forces them to make this choice.
(That said, I don't believe that omitting the null deviance was an explicit choice.)
If you do choose to discard all of the random effects (i.e., comparing to deviance(glm(cbind(incidence, size - incidence) ~ period, data = cbpp, family = binomial)) in the example above), you should be able to do a meaningful comparison with a glmer fit, but there are some subtleties: you might want to read the section on Deviance and log-likelihood of GLMMs in ?deviance.merMod.
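For instance, if you take that second route and drop the random effects entirely, a small sketch of the comparison (using the same cbpp example) might look like:
# GLM analogues of the GLMM: same fixed effects, plus an intercept-only null
glm1 <- glm(cbind(incidence, size - incidence) ~ period,
            data = cbpp, family = binomial)
glm0 <- glm(cbind(incidence, size - incidence) ~ 1,
            data = cbpp, family = binomial)
deviance(glm1)  # fixed-effects-only deviance
deviance(glm0)  # GLM null deviance
deviance(gm1)   # GLMM deviance; see ?deviance.merMod for the caveats mentioned above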
I have used caret to build an elastic net model using 10-fold CV and I want to see which coefficients are used in the final model (i.e., the ones that aren't reduced to zero). I have used the following code to view the coefficients; however, this appears to create a data frame of every permutation of coefficient values used, rather than the ones used in the final model:
tr_control = train_control(method="cv", number=10)
formula = response ~ .
model1 = caret::train(formula,
                      data=training,
                      method="glmnet",
                      trControl=tr_control,
                      metric = "Accuracy",
                      family = "binomial")
Then, to extract the coefficients from the final model using the best lambda value, I used the following:
data.frame(as.matrix(coef(model1$finalModel, model1$bestTune$.lambda)))
However, this just returns a data frame of all the coefficients, and I can see different instances where coefficients have been reduced to zero, but I'm not sure which set the final model uses. Using some slightly different code, I get slightly different results; in this instance, none of the coefficients are reduced to zero, which suggests to me that the final model isn't reducing any coefficients to zero:
data.frame(as.matrix(coef(model1$finalModel, model1$bestTune$lambda))) # I have removed the full stop preceding lambda
Basically, I want to know which features are in the final model so I can assess how the model has performed as a feature reduction process (alongside standard model evaluation metrics such as accuracy, sensitivity, etc.).
Since you did not provide any example data, I will post an example based on the built-in iris dataset, slightly modified to better fit your need (a binomial outcome).
First, modify the dataset:
library(caret)
set.seed(5)  # just for reproducibility
iris
irisn <- iris[iris$Species != "virginica", ]
irisn$Species <- factor(irisn$Species, levels = c("versicolor", "setosa"))
str(irisn)
summary(irisn)
Fit the model (the caret function for setting the control parameters for train is trainControl, not train_control):
tr_control = trainControl(method="cv", number=10)
model1 <- caret::train(Species ~ .,
                       data=irisn,
                       method="glmnet",
                       trControl=tr_control,
                       family = "binomial")
You can extract the coefficients of the final model as you already did:
data.frame(as.matrix(coef(model1$finalModel, model1$bestTune$lambda)))
Here, too, the model did not reduce any coefficients to 0. But what if we add a random variable that explains nothing about the outcome?
irisn$new1 <- runif(nrow(irisn))
model2 <- caret::train(Species ~ .,
                       data=irisn,
                       method="glmnet",
                       trControl=tr_control,
                       family = "binomial")
var <- data.frame(as.matrix(coef(model2$finalModel, model2$bestTune$lambda)))
Here, as you can see, the coefficient of the new variable has been shrunk to 0. You can extract the names of the variables retained by the model with:
rownames(var)[var$X1!=0]
Finally, assuming you have a held-out test set test with the true labels in test$outcome, the accuracy metrics can be obtained with
confusionMatrix(predict(model1, test), test$outcome)
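If you do not have a separate test set, the cross-validated accuracy at the selected tuning values is already stored in the train object; a small sketch:
# cross-validated performance for the chosen alpha/lambda combination
merge(model1$bestTune, model1$results)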
I have two models:
model1 = y ~ a + b*c + (1|d)
model2 = y ~ a*e + c + (1|d)
I wanted to compare how they do.
anova(model1, model2)
This is the result:
Why is the p value 0?
Thank you!
Desperate grad student
Hi Desperate Grad Student! Typically, the ANOVA test is used to test the necessity of a complex model with respect to a simpler, more parsimonious model. Since, in your case, you're comparing two models with the same number of parameters, you have 0 degrees of freedom (where df = # of parameters in the complex model - # of parameters in the simpler model). This is why no meaningful p-value is reported for this comparison.
However, since you have the information criteria for both of these models (AIC/BIC), you can use that to compare the two. Here, model 1 is favorable since its AIC and BIC are lower than the IC for model 2.
If you're set on using the ANOVA approach to compare models, consider creating an "intercept only" model using model0 <- y ~ 1 as your basis for comparison.
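A minimal sketch of that comparison with lme4, using placeholder variable names taken from your formulas (the data frame dat is hypothetical):
library(lme4)
# dat is a hypothetical data frame with columns y, a, b, c, e, and the grouping factor d
m0 <- lmer(y ~ 1 + (1 | d), data = dat, REML = FALSE)  # intercept-only baseline
m1 <- lmer(y ~ a + b*c + (1 | d), data = dat, REML = FALSE)
m2 <- lmer(y ~ a*e + c + (1 | d), data = dat, REML = FALSE)
anova(m0, m1)  # likelihood ratio test of model 1 against the baseline
anova(m0, m2)  # likelihood ratio test of model 2 against the baseline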
Previously I used SAS to fit data into nonlinear regression model. SAS was able to produce an analysis of variance table for the model. The table displays the degrees of freedom, sums of squares, and mean squares along with the model F test.
Please refer to Table 69.4 in this pdf file.
Source: https://support.sas.com/documentation/onlinedoc/stat/132/nlin.pdf
How can I re-create something similar in R? Thanks in advance.
I'm not sure what type of nonlinear regression you're interested in, but the general approach would be to run the model and call for a summary. The typical linear model would be:
linearmodel = lm(`outcomevar` ~ `predictorvar`, data = dataset)
linearmodel #gives coefficients
summary(linearmodel) # gives model fit
To capture a nonlinear relationship you can add a polynomial term. A quadratic fit would be
y = b0 + b1*Var + b2*Var^2, or:
nonlinmodel = lm(`outcomevar` ~ `predictorvar` + I(`predictorvar`^2), data = dataset)
nonlinmodel
summary(nonlinmodel)
other methods here: https://data-flair.training/blogs/r-nonlinear-regression/
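If the goal is to reproduce something like the PROC NLIN table for a genuinely nonlinear model fit with nls(), one option is to compute the sums of squares yourself. A minimal sketch, where dat, y, x, and the logistic model are hypothetical placeholders:
# fit a nonlinear model (placeholder: self-starting logistic curve)
fit <- nls(y ~ SSlogis(x, Asym, xmid, scal), data = dat)

y_obs <- dat$y
n <- length(y_obs)
p <- length(coef(fit))                     # number of parameters

ss_error <- sum(resid(fit)^2)              # Error SS
ss_model <- sum(fitted(fit)^2)             # Model SS (uncorrected, as in PROC NLIN)
ss_total <- sum(y_obs^2)                   # Uncorrected Total SS
ss_corr  <- sum((y_obs - mean(y_obs))^2)   # Corrected Total SS

ms_error <- ss_error / (n - p)             # mean squared error
# an approximate model F test of the kind PROC NLIN reports can be formed from
# these pieces; check the SAS NLIN documentation for the exact construction
f_value <- ((ss_corr - ss_error) / (p - 1)) / ms_error
pf(f_value, p - 1, n - p, lower.tail = FALSE)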
I have following:
library(pls)
pcr(price ~ X, 6, data=cars, validation="CV")
It works, but because I have a small dataset, I cannot divide it into training and test sets. Therefore I want to perform cross-validation and then extract the predicted values to compute AUC and accuracy. But I could not find how to extract the predicted data. Which parameter is it?
When you fit a cross-validated principal component regression model with pcr() and the validation= argument, one of the components of the output list is called validation. This contains the results of the cross validation. This in turn is a list and it has a component called pred, which contains the cross-validated predictions.
An example adapted from example("pcr"):
library(pls)
sens.pcr <- pcr(sensory ~ chemical, data = oliveoil, validation = "CV")
sens.pcr$validation$pred
As an aside, it's generally a good idea to set your random seed immediately prior to performing cross validation to ensure reproducibility of your results.
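Note that pred is a three-dimensional array (observations x responses x number of components), so a small sketch for pulling out the cross-validated predictions for a single choice of components (four here, an arbitrary number) would be:
# cross-validated predictions using 4 components (an arbitrary choice)
cv_pred <- sens.pcr$validation$pred[, , 4]
head(cv_pred)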