MuMIn dredge gam error using default na.omit - r

I have a global model I'm trying to dredge, but I keep getting this error:
Error in dredge(myglobalmod, evaluate = TRUE, trace = 2) :
'global.model' uses 'na.action' = "na.omit"
I tried running the global model with na.action="na.omit" within the gam() call and leaving it out (since it's the default).
myglobalmod <- gam(response~ s(x1) + s(x2) + s(x3) + offset(x4), data=mydata, family="tw", na.action="na.omit")
options(na.action=na.omit)
mydredge <- dredge(myglobalmod, evaluate=TRUE, trace=2)
When I didn't include na.action="na.omit" within the gam, I got a similar error.
I then tried with a subset of the data that has all the NA rows removed, but same error.
I've gotten dredge to work before, so I'm not sure why it doesn't like na.omit now; I'm using the same code.

MuMIn insists that you use na.action = na.fail, in order to ensure that the same data set is used for every model (if NA values were left in the data set, different subsets could be used for different models depending on which variables were used). You can use na.omit(mydata) or mydata[complete.cases(mydata), ] to get rid of NA values before you start (assuming that the NA values in your data set occur only in variables you will be using for the full model).
> library(MuMIn)
> m1 <- lm(mpg ~ ., data = mtcars)
> d0 <- dredge(m1)
Error in dredge(m1) :
'global.model''s 'na.action' argument is not set and options('na.action') is "na.omit"
> m1 <- lm(mpg ~ ., data = mtcars, na.action = na.fail)
> d1 <- dredge(m1)
Fixed term is "(Intercept)"
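For the gam case in the question, a minimal sketch of the same fix, assuming the mgcv and MuMIn packages are installed. The data are hypothetical simulated values standing in for mydata, and the offset(x4) term is left out for brevity. Incomplete rows are dropped based only on the model's variables, and the global model is then fitted with na.action = na.fail:

```r
library(mgcv)
library(MuMIn)

# Hypothetical data standing in for the asker's `mydata`
set.seed(42)
n <- 200
mydata <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n),
                     unused = rnorm(n))
mydata$response <- rgamma(n, shape = 2,
                          rate = 2 / exp(1 + sin(2 * pi * mydata$x1)))
mydata$x2[c(3, 17, 50)] <- NA    # NAs in a model variable
mydata$unused[c(5, 9)] <- NA     # NAs in a column the model never uses

# Drop incomplete rows, judged only on the variables in the global model,
# so rows whose NAs sit in other columns are kept
model_vars <- c("response", "x1", "x2", "x3")
mydata_cc <- mydata[complete.cases(mydata[, model_vars]), ]

# na.fail turns any remaining NA into an error instead of a silent row drop
myglobalmod <- gam(response ~ s(x1) + s(x2) + s(x3), data = mydata_cc,
                   family = tw(), na.action = na.fail)
mydredge <- dredge(myglobalmod)
```

Because model.frame() only looks at the variables named in the formula, the NA values left in the unused column do not trigger na.fail here.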

Related

problem with anova refit argument: objects must inherit from classes "gls"

I am testing whether I should include a certain variance structure in my model, for which I'm using the likelihood ratio test (LRT) from the anova function. The problem is that when I set the argument refit to FALSE (in fact, it doesn't work with TRUE either), anova returns an error that I don't know how to handle.
> m1 <- lme(Compscore ~ Cond * Time, random = ~1|Participant, data = lyster.long4)
> m4 <- update(m1, weights = varIdent(form = ~ Time | Cond))
> an.res <- anova.lme(m1, m4, refit=FALSE)
Error in anova.lme(m1, m4, refit = FALSE) :
objects must inherit from classes "gls", "gnls", "lm", "lmList", "lme", "nlme", "nlsList", or
"nls"
Without that argument, the function works fine. I'm using the nlme package.
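As far as I can tell, refit is an argument of lme4's anova method for merMod fits; nlme's anova method for lme objects has no such argument, so refit = FALSE is collected by ... as if it were a third model, and that is what triggers the "objects must inherit" error. A minimal sketch of leaving refit out, using the Orthodont data shipped with nlme as a hypothetical stand-in for lyster.long4:

```r
library(nlme)

# Orthodont (shipped with nlme) stands in for lyster.long4
m1 <- lme(distance ~ age * Sex, random = ~ 1 | Subject, data = Orthodont)
# Allow a separate residual variance for each level of Sex
m4 <- update(m1, weights = varIdent(form = ~ 1 | Sex))

# No refit argument: both fits use REML with identical fixed effects,
# so their likelihoods can be compared to test the variance structure
an.res <- anova(m1, m4)
an.res
```

The comparison is only meaningful because the two models share the same fixed effects; if the fixed effects differed, the models would need to be fitted with method = "ML" instead.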

Loop mixed linear model longitudinal time data assessing groups effect on the continuous y variable

EDITED:
I'm trying to assess the effect of variables (e.g. presence of severe trauma) on a continuous variable (here energy expenditure (=REE) in calories) over time (Day). The dataframe is called my_data. Amongst the variables
I would then like to display the results of the mixed linear model for each assessed variable in one large file.
General concept:
REE ~ Time*predictor + (1 + Time | Case identifier)
(1) Creating the lmer model:
library(tidyverse)
library(ggpmisc)
library(sjPlot)
library(lme4)
mixed.modelloop <- function(x) {
lmer(REE ~ Day*(x) + (1 + Day | Studynumber),
data=my_data,
REML=FALSE,
na.action=na.omit,
control = lmerControl(check.nobs.vs.nRE = "ignore"))
}
(2) Then creating the predictors (x)
cols <- c(colnames(my_data))
(3) And then generating the overall purrr function:
output <- purrr::map(cols, ~ mixed.modelloop(.x) %>% tab_model)
(4) Generating the file, which should include all the separate univariate mixed model analyses:
pdf(file="mixed linear models.pdf" )
output
dev.off()
Unfortunately, after step (3) I currently get the following error message:
Error in model.frame.default(data = my_data, na.action = na.omit, drop.unused.levels = TRUE, :
variable lengths differ (found for 'x')
Any idea on how to adapt the function to resolve this issue?
Thanks!
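Formulas have special rules: you can't insert a string into them and expect them to work. Inside the formula, x is simply looked up as a variable, and its value is a single character string (the column name), so model.frame sees a variable of length 1 and reports that the variable lengths differ.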
This should work, although you haven't given a reproducible example to test with ...
mixed.modelloop <- function(x) {
form <- reformulate(c(sprintf("Day*%s", x), "(1 + Day | Studynumber)"),
response = "REE")
lmer(form,
data=my_data,
REML=FALSE,
na.action=na.omit,
control = lmerControl(check.nobs.vs.nRE = "ignore"))
}
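A sketch of the whole loop with this fix, assuming lme4 is installed and using a small hypothetical simulated my_data (patients Studynumber, days Day, and two made-up binary predictors Trauma and Sepsis). Two further changes are my own assumptions: the response, time and ID columns are excluded from cols (otherwise the loop would try models such as REE ~ Day*REE), and lapply() is used in place of purrr::map(), which would behave the same way here.

```r
library(lme4)

# Hypothetical stand-in for my_data: 30 patients measured on days 0-4
set.seed(1)
n_id <- 30
my_data <- expand.grid(Day = 0:4, Studynumber = factor(seq_len(n_id)))
id <- as.integer(my_data$Studynumber)
trauma <- rep(0:1, each = n_id / 2)       # made-up patient-level predictor
sepsis <- rep(0:1, length.out = n_id)     # made-up patient-level predictor
my_data$Trauma <- factor(trauma[id])
my_data$Sepsis <- factor(sepsis[id])
u0 <- rnorm(n_id, 0, 100)   # patient-specific intercepts
u1 <- rnorm(n_id, 0, 20)    # patient-specific slopes over Day
my_data$REE <- 1800 + u0[id] + (30 + u1[id]) * my_data$Day +
  150 * trauma[id] + rnorm(nrow(my_data), 0, 50)

mixed.modelloop <- function(x) {
  # Build the formula from text so `x` is used as a column name
  form <- reformulate(c(sprintf("Day*%s", x), "(1 + Day | Studynumber)"),
                      response = "REE")
  lmer(form, data = my_data, REML = FALSE, na.action = na.omit,
       control = lmerControl(check.nobs.vs.nRE = "ignore"))
}

# Loop only over predictors: skip the response, the time variable and the ID
cols <- setdiff(colnames(my_data), c("REE", "Day", "Studynumber"))
output <- lapply(cols, mixed.modelloop)
names(output) <- cols
```

Each element of output is an ordinary lmer fit, so tab_model (or summary) can be applied to them one at a time afterwards.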

Use glm with data.table and a parametric definition of the predictors and the response

I want to do VIF testing by running consecutive regressions within a dataset, each time using one variable as the response and the remaining ones as predictors.
To that end, I will put my code within a for loop that gives consecutive values to the index of the column used as the response and leaves the remaining columns as predictors.
I am going to use the data.table package and I will use the mtcars dataset found in base R to create a reproducible example:
data(mtcars)
setDT(mtcars)
# Let i-- the index of the response -- be 1 for demonstration purposes
i <- 1
variables <- names(mtcars)
response <- names(mtcars)[i]
predictors <- setdiff(variables, response)
model <- glm(mtcars[, get(response)] ~ mtcars[, predictors , with = FALSE], family = "gaussian")
However, this results in an error message:
Error in model.frame.default(formula = mtcars[, get(response)] ~
mtcars[, :
invalid type (list) for variable 'mtcars[, predictors, with = FALSE]'
Could you explain the error and help me correct the code?
Your advice will be appreciated.
=============================================================================
Edit:
When reproducing the suggested code, I got an error message:
> library(car)
> library(data.table)
>
> data(mtcars)
> setDT(mtcars)
> model <- glm(formula = mpg ~ .,data=mtcars , family = "gaussian")
> vif(model)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘vif’ for signature ‘"glm"’
Update:
The code ran without problems when I specified the package explicitly, i.e.:
car::vif(model)
Edit 2
I had to amend Fredrik's code as follows to get the coefficients of all the variables:
rhs <- paste(predictors, collapse ="+")
full_formula <- paste(response, "~", rhs)
full_formula <- as.formula(full_formula)
If you want to calculate the VIF of your predictors I would suggest looking at the vif function in package car. It will do the calculations for you and generalizes to predictors with multiple degrees of freedom such as factors.
To get all the VIFs you would just have:
library(car)
library(data.table)
data(mtcars)
setDT(mtcars)
model <- glm(formula = mpg ~ .,data=mtcars , family = "gaussian")
vif(model)
As for your error, I think you are mixing up glm, which takes a formula and a dataset, and glm.fit, which takes a design matrix and a response vector, in that order. You have concepts from both functions in your call.
To fit your model, I suggest using glm, since it gives you an object of class glm with extra features, such as the ability to call plot(model); glm.fit only returns a list of values related to the model.
In that case you would just have to create the formula, looking something like:
library(data.table)
data(mtcars)
setDT(mtcars)
# Let i-- the index of the response -- be 1 for demonstration purposes
i <- 1
variables <- names(mtcars)
response <- names(mtcars)[i]
predictors <- setdiff(variables, response)
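# collapse (not sep) joins the predictor names into a single "cyl + disp + ..." string
rhs <- paste(predictors, collapse = " + ")
full_formula <- as.formula(paste(response, "~", rhs))
model <- glm(formula = full_formula, data = mtcars, family = "gaussian")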
In contrast to:
model <- glm.fit(y=mtcars[, get(response)] ,
x=mtcars[, predictors , with = FALSE],
family=gaussian())
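Another solution is based on the use of glm.fit (note that glm.fit uses x as the complete design matrix, so no intercept is fitted unless you add a column of ones):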
model <- glm.fit(x=mtcars[, ..predictors], y=mtcars[[response]], family = gaussian())
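For the original goal, here is a minimal sketch of the loop itself, assuming the predictors are those of mpg ~ . (every column except mpg). Each predictor is regressed on the others with glm(), and VIF_j = 1 / (1 - R_j^2), with R_j^2 computed as 1 - deviance / null deviance, which holds for a Gaussian GLM. With single-degree-of-freedom predictors like these, the values should agree with car::vif(model).

```r
library(data.table)

data(mtcars)
setDT(mtcars)

# Treat every column except mpg as a predictor, as in mpg ~ .
vars <- setdiff(names(mtcars), "mpg")
vifs <- setNames(numeric(length(vars)), vars)

for (i in seq_along(vars)) {
  response <- vars[i]
  predictors <- setdiff(vars, response)
  # reformulate() builds e.g. cyl ~ disp + hp + ... from character vectors
  form <- reformulate(predictors, response = response)
  fit <- glm(form, data = mtcars, family = gaussian())
  # For a Gaussian GLM the deviance is the residual sum of squares
  r2 <- 1 - fit$deviance / fit$null.deviance
  vifs[i] <- 1 / (1 - r2)
}
round(vifs, 2)
```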

cv.glm variable lengths differ

I am trying to run cv.glm on a linear model; however, each time I do, I get the error
Error in model.frame.default(formula = lindata$Y ~ 0 + lindata$HomeAdv + :
variable lengths differ (found for 'air-force-falcons')
air-force-falcons is the first variable in the dataset lindata. When I run glm I get no errors. All the variables are in a single dataset and there are no missing values.
> linearmod5<- glm(lindata$Y ~ 0 + lindata$HomeAdv + ., data=lindata, na.action="na.exclude")
> set.seed(1)
> cv.err.lin=cv.glm(lindata,linearmod5,K=10)
Error in model.frame.default(formula = lindata$Y ~ 0 + lindata$HomeAdv + :
variable lengths differ (found for 'air-force-falcons')
I do not know what is driving this error or the solution. Any ideas? Thank you!
The error is caused by a mistake in the way you specify the formula.
This will produce the error:
library(boot)
mod <- glm(mtcars$cyl ~ mtcars$mpg + .,
data = mtcars, na.action = "na.exclude")
cv.glm(mtcars, mod, K=11) # nrow(mtcars) is 32; cv.glm may adjust K so that the folds are roughly equal in size
This does not:
mod <- glm(cyl ~ ., data = mtcars)
cv.glm(mtcars, mod, K=11)
Nor does this:
mod <- glm(cyl ~ mpg + disp, data = mtcars)
cv.glm(mtcars, mod, K=11)
What happens is that when you specify a variable as mtcars$cyl, it always has as many rows as the original dataset. cv.glm partitions the data frame into K parts and refits the model on the training subsets; when it does, variables written as data.frame$var are still evaluated with the original (unpartitioned) length, while the others (those covered by .) have the partitioned length.
So you have to refer to the variables in the formula by their bare names (without $).
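A minimal sketch of the difference, assuming the boot package is installed; K = 8 is an arbitrary choice that splits the 32 rows of mtcars evenly:

```r
library(boot)

set.seed(1)
# Bare column names: each refit looks every variable up in the subset
# of rows that cv.glm passes in as `data`
good <- glm(cyl ~ mpg + disp, data = mtcars)
cv_good <- cv.glm(mtcars, good, K = 8)
cv_good$delta  # raw and adjusted cross-validation estimates of prediction error

# mtcars$cyl always has 32 values, while mpg and disp come from the
# smaller training subset, so the first refit fails
bad <- glm(mtcars$cyl ~ mpg + disp, data = mtcars)
err <- tryCatch(cv.glm(mtcars, bad, K = 8),
                error = function(e) conditionMessage(e))
err
```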
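Other advice on the formula:
Avoid mixing explicitly named variables with the dot (.): you risk including the same variable twice (for example lindata$HomeAdv and HomeAdv). The dot stands for all columns in the data frame except those on the left of the tilde.
Why do you add a zero? If it is an attempt to remove the intercept, note that 0 + ... and - 1 do the same thing in an R formula. Either way, removing the intercept is bad practice in my opinion.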

R: variable lengths differ when building a linear model for residuals

I am working on a problem where I want to build a linear model using the residuals of two other linear models. I have used the UN3 data set to show my problem, since it's easier to post here than my actual data set.
Here is my R code:
head(UN3)
m1.lgFert.purban <- lm(log(Fertility) ~ Purban, data=UN3)
m2.lgPPgdp.purban <- lm(log(PPgdp) ~ Purban, data=UN3)
m3 <- lm(residuals(m1.lgFert.purban) ~ residuals(m2.lgPPgdp.purban))
Here is the error I am getting:
> m3 <- lm(residuals(m1.lgFert.purban) ~ residuals(m2.lgPPgdp.purban))
Error in model.frame.default(formula = residuals(m1.lgFert.purban) ~ residuals(m2.lgPPgdp.purban), :
variable lengths differ (found for 'residuals(m2.lgPPgdp.purban)')
I don't really understand why this error occurs. If it were a log-related issue, I should have gotten the error when building the first two models.
Your default na.action is most likely na.omit (check with options("na.action")). This means that NA values get removed silently, resulting in different lengths of the residuals vectors. You probably want to use na.action="na.exclude", which pads the residuals with NAs.
library(alr3)
options("na.action")
#$na.action
#[1] "na.omit"
m1.lgFert.purban <- lm(log(Fertility) ~ Purban, data=UN3,na.action="na.exclude")
m2.lgPPgdp.purban <- lm(log(PPgdp) ~ Purban, data=UN3,na.action="na.exclude")
m3 <- lm(residuals(m1.lgFert.purban) ~ residuals(m2.lgPPgdp.purban))
#Coefficients:
# (Intercept) residuals(m2.lgPPgdp.purban)
# -0.01245 -0.18127
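To see the mechanism without depending on alr3, here is a self-contained sketch with hypothetical simulated data (un_sim, loosely shaped like UN3, with a few NAs in each response). It compares residual lengths under na.omit and na.exclude:

```r
# Hypothetical data loosely shaped like UN3, with a few NAs in each response
set.seed(1)
n <- 50
un_sim <- data.frame(Purban = runif(n, 0, 100))
un_sim$Fertility <- exp(1.5 - 0.01 * un_sim$Purban + rnorm(n, 0, 0.2))
un_sim$PPgdp <- exp(6 + 0.03 * un_sim$Purban + rnorm(n, 0, 0.5))
un_sim$Fertility[c(2, 10)] <- NA
un_sim$PPgdp[c(5, 10, 30)] <- NA

# na.omit: residuals exist only for the rows actually used
omit1 <- lm(log(Fertility) ~ Purban, data = un_sim, na.action = na.omit)
omit2 <- lm(log(PPgdp) ~ Purban, data = un_sim, na.action = na.omit)
c(length(residuals(omit1)), length(residuals(omit2)))  # 48 47

# na.exclude: residuals are padded with NA back to one value per row
excl1 <- lm(log(Fertility) ~ Purban, data = un_sim, na.action = na.exclude)
excl2 <- lm(log(PPgdp) ~ Purban, data = un_sim, na.action = na.exclude)
c(length(residuals(excl1)), length(residuals(excl2)))  # 50 50

# With the usual default na.action (na.omit), rows where either
# residual is NA are dropped from this model
m3 <- lm(residuals(excl1) ~ residuals(excl2))
nobs(m3)  # 46
```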
