cv.glm variable lengths differ - r

I am trying to run cv.glm on a linear model; however, each time I do, I get the error:
Error in model.frame.default(formula = lindata$Y ~ 0 + lindata$HomeAdv + :
variable lengths differ (found for 'air-force-falcons')
air-force-falcons is the first variable in the dataset lindata. When I run glm I get no errors. All the variables are in a single dataset and there are no missing values.
> linearmod5<- glm(lindata$Y ~ 0 + lindata$HomeAdv + ., data=lindata, na.action="na.exclude")
> set.seed(1)
> cv.err.lin=cv.glm(lindata,linearmod5,K=10)
Error in model.frame.default(formula = lindata$Y ~ 0 + lindata$HomeAdv + :
variable lengths differ (found for 'air-force-falcons')
I do not know what is driving this error or the solution. Any ideas? Thank you!

The error is caused by a mistake in the way you specify the formula.
This will produce the error:
mod <- glm(mtcars$cyl ~ mtcars$mpg + .,
data = mtcars, na.action = "na.exclude")
cv.glm(mtcars, mod, K=11) # nrow(mtcars) is 32, so the 11 folds are not all the same size
This does not:
mod <- glm(cyl ~ ., data = mtcars)
cv.glm(mtcars, mod, K=11)
Neither does this:
mod <- glm(cyl ~ mpg + disp, data = mtcars)
cv.glm(mtcars, mod, K=11)
When you refer to a variable as mtcars$cyl, it always has as many rows as the original dataset. cv.glm partitions the data frame into K parts and refits the model on each resampled subset. In that refit, any variable written as data.frame$var is still evaluated at the original (unpartitioned) length, while the variables picked up by . come from the subset and have the partitioned length, so the lengths differ.
So you have to refer to variables by their bare names in the formula (without $).
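Other advice on the formula:
Avoid mixing explicitly specified variables with the dot: you can end up including a variable twice (as happens with data$var plus the dot). The dot stands for all variables in the data frame except those on the left of the tilde.
Why do you add a zero? If it is an attempt to remove the intercept, note that 0 + and - 1 are equivalent in an R formula. However, removing the intercept is bad practice in my opinion.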
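The length mismatch can be reproduced without cv.glm by refitting on a subset by hand, which is roughly what happens on each fold. This is a minimal base-R sketch; the 16-row subset and the object names are illustrative assumptions, not part of the original question.

```r
# Take half of mtcars to stand in for one cross-validation training fold.
subset_rows <- mtcars[1:16, ]

# data$var in the formula: the response keeps all 32 rows while the
# predictors picked up by the dot come from the 16-row subset.
bad <- tryCatch(
  glm(mtcars$cyl ~ ., data = subset_rows),
  error = function(e) conditionMessage(e)
)
print(bad)

# Bare names: every variable is looked up in the subset, so lengths agree.
good <- glm(cyl ~ ., data = subset_rows)
print(nobs(good))
```

The first call fails with a "variable lengths differ" message; the second fits on the 16 rows it was given.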

Related

Loop mixed linear model longitudinal time data assessing groups effect on the continuous y variable

EDITED:
I'm trying to assess the effect of variables (e.g. presence of severe trauma) on a continuous variable (here energy expenditure (=REE) in calories) over time (Day). The data frame is called my_data. Amongst the variables
Afterwards, I would like to display the results of the mixed linear model for each assessed variable in one large file.
General concept:
REE ~ Time*predictor + (1 + Time | Case identifier)
(1) Start by creating the lmer model:
library(tidyverse)
library(ggpmisc)
library(sjPlot)
library(lme4)
mixed.modelloop <- function(x) {
lmer(REE ~ Day*(x) + (1 + Day | Studynumber),
data=my_data,
REML=FALSE,
na.action=na.omit,
control = lmerControl(check.nobs.vs.nRE = "ignore"))
}
(2) Then create the predictors (x):
cols <- c(colnames(my_data))
(3) Then apply the function to every predictor with purrr:
output <- purrr::map(cols, ~ mixed.modelloop(.x) %>% tab_model)
(4) Generate the file, which should include all the separate univariate mixed-model analyses:
pdf(file="mixed linear models.pdf" )
output
dev.off()
Unfortunately, after step (3) I currently get the following error message:
Error in model.frame.default(data = my_data, na.action = na.omit, drop.unused.levels = TRUE, :
variable lengths differ (found for 'x')
Any idea on how to adapt the function to resolve this issue?
Thanks!
Formulas have special rules; you can't insert a string into one and expect it to work. Inside the formula, x is treated as a variable named x, not replaced by the column name it holds.
This should work, although you haven't given a reproducible example to test with:
mixed.modelloop <- function(x) {
  form <- reformulate(c(sprintf("Day*%s", x), "(1 + Day | Studynumber)"),
                      response = "REE")
  lmer(form,
       data = my_data,
       REML = FALSE,
       na.action = na.omit,
       control = lmerControl(check.nobs.vs.nRE = "ignore"))
}
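To see what reformulate builds before handing anything to lmer, the formulas can be constructed and printed on their own. This is a minimal base-R sketch: the toy data frame and the predictor columns Trauma and Sepsis are hypothetical stand-ins for the real my_data. It also drops REE, Day and Studynumber from the predictor list, since looping over every column name would otherwise use the response and grouping variables as predictors.

```r
# Hypothetical toy data; only REE, Day and Studynumber come from the question.
my_data <- data.frame(
  Studynumber = rep(1:3, each = 2),
  Day         = rep(1:2, times = 3),
  REE         = c(1800, 1850, 2000, 2100, 1700, 1750),
  Trauma      = rep(c(0, 1, 0), each = 2),  # assumed predictor
  Sepsis      = rep(c(1, 0, 0), each = 2)   # assumed predictor
)

# Candidate predictors: every column except response, time and grouping id.
cols <- setdiff(colnames(my_data), c("REE", "Day", "Studynumber"))

# One formula per predictor, built from strings as in the answer above.
forms <- lapply(cols, function(x) {
  reformulate(c(sprintf("Day*%s", x), "(1 + Day | Studynumber)"),
              response = "REE")
})
names(forms) <- cols

print(forms$Trauma)
```

Each element of forms is an ordinary formula object that can be passed as the first argument to lmer.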

MuMIn dredge gam error using default na.omit

I have a global model I'm trying to dredge, but I keep getting this error:
Error in dredge(myglobalmod, evaluate = TRUE, trace = 2) :
'global.model' uses 'na.action' = "na.omit"
I tried running the global model with na.action="na.omit" within the gam() call and leaving it out (since it's the default).
myglobalmod <- gam(response~ s(x1) + s(x2) + s(x3) + offset(x4), data=mydata, family="tw", na.action="na.omit")
options(na.action=na.omit)
mydredge <- dredge(myglobalmod, evaluate=TRUE, trace=2)
When I didn't include na.action="na.omit" within the gam, I got a similar error.
I then tried with a subset of the data that has all the NA rows removed, but same error.
I've gotten dredge to work before with the same code, so I'm not sure why it doesn't accept na.omit now.
MuMIn insists that you use na.action = na.fail, in order to ensure that the same data set is used for every model (if NA values were left in the data set, different subsets could be used for different models depending on which variables were used). You can use na.omit(mydata) or mydata[complete.cases(mydata), ] to get rid of NA values before you start (assuming that the NA values in your data set occur only in variables you will be using for the full model).
> library(MuMIn)
> m1 <- lm(mpg ~ ., data = mtcars)
> d0 <- dredge(m1)
Error in dredge(m1) :
'global.model''s 'na.action' argument is not set and options('na.action') is "na.omit"
> m1 <- lm(mpg ~ ., data = mtcars, na.action = na.fail)
> d1 <- dredge(m1)
Fixed term is "(Intercept)"
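The clean-first, then na.fail workflow described above can be checked in base R before MuMIn is involved. This is a minimal sketch; the injected NA values and the choice of mpg, hp and wt as model variables are assumptions made for illustration.

```r
# Copy mtcars and inject two missing values (illustrative assumption).
d <- mtcars
d$hp[c(3, 7)] <- NA

# With na.fail, fitting on data that still contains NA stops with an error.
failed <- tryCatch(
  lm(mpg ~ hp + wt, data = d, na.action = na.fail),
  error = function(e) conditionMessage(e)
)
print(failed)

# Drop incomplete rows for the variables used in the full model first,
# then fit with na.fail so every sub-model sees the same rows.
model_vars <- c("mpg", "hp", "wt")
d_complete <- d[complete.cases(d[, model_vars]), ]
fit <- lm(mpg ~ hp + wt, data = d_complete, na.action = na.fail)
print(nobs(fit))
```

A model fitted this way carries na.action = na.fail in its call, which is what dredge checks for.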

glm fit with iris invalid first argument, must be vector (list or atomic)

I have the following working code
glm.fit <- glm(Income ~ .,data=train,family=binomial)
summary(glm.fit)
However, I have some questions about it, and to be able to ask them I decided to reproduce the code using the iris data set.
I tried
cf<-iris
glm.fit(Petal.Width ~ ., cf, family = binomial)
but I get an error
Error in dim(data) <- dim : invalid first argument, must be vector (list or atomic)
[Update]
I see the data I expect using the following
library(dplyr)
cf<-iris
cf %>% head(10)
There are some issues with your code.
First, there's no need to create the variable cf. You can just use iris.
Second, glm.fit takes as its first 2 arguments x and y. From the documentation, accessible at ?glm.fit:
For glm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n.
Your first line of code uses glm to create a variable named glm.fit - this is not the same as the function of that name.
If you want to use glm, that function can take a formula and the name of a data frame as arguments. So this works:
glm(Petal.Width ~ ., data = iris)
But this gives an error:
glm(Petal.Width ~ ., data = iris, family = binomial)
Error in eval(family$initialize) : y values must be 0 <= y <= 1
That's because the response variable, Petal.Width, is continuous and takes values greater than 1. You use the binomial family when the response takes 2 values (yes/no, 0/1, true/false).
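Both points can be tried on iris directly. This is a minimal base-R sketch: the is_virginica column is a made-up binary response added only for illustration, and the single-predictor models are arbitrary choices.

```r
# A two-valued response is what the binomial family expects.
iris2 <- iris
iris2$is_virginica <- as.integer(iris2$Species == "virginica")  # assumed 0/1 outcome
fit_bin <- glm(is_virginica ~ Petal.Width, data = iris2, family = binomial)
print(coef(fit_bin))

# glm.fit() takes a design matrix x and a response vector y, not a formula.
X <- model.matrix(~ Sepal.Length, data = iris)
fit_raw <- glm.fit(x = X, y = iris$Petal.Width)

# The formula interface gives the same coefficients for the same model.
fit_formula <- glm(Petal.Width ~ Sepal.Length, data = iris)
print(all.equal(unname(fit_raw$coefficients), unname(coef(fit_formula))))
```

glm builds the design matrix from the formula and then does the fitting through glm.fit, which is why the two sets of coefficients agree.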

Issues with logit regression in r

I am trying to run a logit regression and I tried two approaches:
m.logit <- glm(p4 ~ scale(log(gdp,orthodox,swb)),
data = happiness,
family = binomial("logit"))
summary(m.logit)
Throws: Error in summary(m.logit) : object 'm.logit' not found
While
m1.logit <- glm(p4 ~ gdp + orthodox + swb, family = binomial(link = "logit"), data = happiness)
Throws: Error in eval(family$initialize) : y values must be 0 <= y <= 1
I kind of understood the errors (in the former case m.logit is not found, and in the latter, I need to transform the variables I think...) but don't know how to solve it...
Any help?

Force inclusion of observations with missing data in lmer

I want to fit a linear mixed-effects model using lme4::lmer without discarding observations with missing data. That is, I want lmer to go ahead and maximize the likelihood using all the data.
Am I correct in thinking that using na.pass produces this behavior? This unanswered question is making me wonder if this might be wrong.
lmer (like most model functions) can't deal with missing data. To illustrate that:
data(Orthodont,package="nlme")
Orthodont$nsex <- as.numeric(Orthodont$Sex=="Male")
Orthodont$nsexage <- with(Orthodont, nsex*age)
Orthodont[1, 2] <- NA
lmer(distance ~ age + (age|Subject) + (0+nsex|Subject) +
(0 + nsexage|Subject), data=Orthodont, na.action = na.pass)
#Error in lme4::lFormula(formula = distance ~ age + (age | Subject) + (0 + :
# NA in Z (random-effects model matrix): please use "na.action='na.omit'" or "na.action='na.exclude'"
If you don't want to discard observations with missing data, your only option is imputation. Check out packages like mice or Amelia.
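For reference, the two settings named in the error message both drop incomplete rows from the fit; they differ only in whether residuals and fitted values are padded back to the original length. This is a minimal base-R sketch using lm rather than lmer, with one NA injected into mtcars as an illustrative assumption.

```r
# Copy mtcars and make one predictor value missing (illustrative assumption).
d <- mtcars
d$wt[1] <- NA

fit_omit <- lm(mpg ~ wt, data = d, na.action = na.omit)
fit_excl <- lm(mpg ~ wt, data = d, na.action = na.exclude)

# Both fits use the same 31 complete rows.
print(c(nobs(fit_omit), nobs(fit_excl)))

# na.omit returns 31 residuals; na.exclude returns 32 with NA for the dropped row.
print(length(residuals(fit_omit)))
print(length(residuals(fit_excl)))
```

Neither setting uses the incomplete observation in the likelihood, which is why imputation is needed to keep it.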
