R not removing terms when doing MAM

I want to build a minimum adequate model (MAM), but I'm having difficulty removing some terms:
full.model <- glm(SSB_sq ~ Veg_height + Bare + Common + Birds_Foot +
                  Average_March + Average_April + Average_May +
                  Average_June15 + Average_June20 + Average_June25 +
                  Average_July15 + Average_July20 + Average_July25,
                  family = "poisson")
summary(full.model)
I believe I have to remove these terms to start the MAM like so:
model1 <- update(full.model, ~ . - Veg_height:Bare:Common:Birds_Foot:
                 Average_March:Average_April:Average_May:Average_June15:
                 Average_June20:Average_June25:Average_July15:
                 Average_July20:Average_July25, family = "poisson")
summary(model1)
anova(model1,full.model,test="Chi")
But I get this output:
anova(model1,full.model,test="Chi")
Analysis of Deviance Table
Model 1: SSB_sq ~ Veg_height + Bare + Common + Birds_Foot + Average_March +
Average_April + Average_May + Average_June15 + Average_June20 +
Average_June25 + Average_July15 + Average_July20 + Average_July25
Model 2: SSB_sq ~ Veg_height + Bare + Common + Birds_Foot + Average_March +
Average_April + Average_May + Average_June15 + Average_June20 +
Average_June25 + Average_July15 + Average_July20 + Average_July25
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1 213 237.87
2 213 237.87 0 0
I've tried putting plus signs in model1 instead of colons (I was clutching at straws while reading my notes), but the same thing happens.
Why are both my models the same? I've tried searching on Google but I'm not very good at the terminology so my searches aren't bringing up much.

If I read your intention correctly, you are trying to fit a null model with no terms in it? The reason both of your models are identical is that a colon in a formula specifies an interaction, and the thirteen-way interaction you asked update() to remove was never in full.model, so nothing was dropped. A simpler way to get the null model is just to use SSB_sq ~ 1 as the formula, meaning a model with only an intercept.
fit <- lm(sr ~ ., data = LifeCycleSavings)
fit0 <- lm(sr ~ 1, data = LifeCycleSavings)
## or via an update:
fit01 <- update(fit, . ~ 1)
Which gives, for example:
> anova(fit)
Analysis of Variance Table
Response: sr
Df Sum Sq Mean Sq F value Pr(>F)
pop15 1 204.12 204.118 14.1157 0.0004922 ***
pop75 1 53.34 53.343 3.6889 0.0611255 .
dpi 1 12.40 12.401 0.8576 0.3593551
ddpi 1 63.05 63.054 4.3605 0.0424711 *
Residuals 45 650.71 14.460
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> anova(fit, fit0)
Analysis of Variance Table
Model 1: sr ~ pop15 + pop75 + dpi + ddpi
Model 2: sr ~ 1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 45 650.71
2 49 983.63 -4 -332.92 5.7557 0.0007904 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
An explanation of the formulae I use:
The first model used the shortcut ., which means all remaining variables in the data argument (in my model, all variables in LifeCycleSavings on the RHS of the formula except sr, which is already on the LHS).
In the second model (fit0), we only include 1 on the RHS of the formula. In R, 1 means an intercept, so sr ~ 1 means fit an intercept-only model. By default, an intercept is assumed, hence we did not need 1 when specifying the first model fit.
If you want to suppress an intercept, add - 1 or + 0 to your formula.
For your data, the first model would be:
full.model <- glm(SSB_sq ~ ., data = FOO, family = "poisson")
where FOO is the data frame holding your variables - you are using a data frame, aren't you? The null, intercept-only model would be specified using one of:
null.model <- glm(SSB_sq ~ 1, data = FOO, family = "poisson")
or
null.model <- update(full.model, . ~ 1)
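In a MAM workflow you then remove terms one at a time with a minus sign in update() (a colon would name an interaction instead). A minimal sketch with an illustrative Poisson data set (the variable names here are stand-ins, not from the question):

```r
## Illustrative data for a small Poisson GLM
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
full <- glm(counts ~ outcome + treatment, family = poisson)

## Remove ONE term with `-` (a `:` would denote an interaction, not removal)
reduced <- update(full, . ~ . - treatment)

## Compare the nested models with a likelihood-ratio test
anova(reduced, full, test = "Chi")

## drop1() automates this, testing each term's removal in turn
drop1(full, test = "Chi")
```

Each step of model simplification is the same pattern: drop one term, compare with anova(), and keep the simpler model if the test is non-significant.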


How to run a linear hypothesis with interaction?

My regression is as follows:
model <- lm(y ~ a:b + a + b + c)
I want to test whether the coefficients of my interaction "a:b" and my variable "a" are both equal to 0, or whether at least one is different from 0.
I know that I need to use linearHypothesis.
But I only managed to test if at least one of the coefficients of my interaction is different from 0.
linearHypothesis(model,matchCoefs(model,":"))
Do you know how to also pass my variable "a" into linearHypothesis?
Thanks for your help.
You can pass the names of the variables you want to test as being jointly equal to 0.
Dummy data:
a = rnorm(100)
b = rnorm(100)
c = rnorm(100)
y = 4 + 1*a + 3*b + 0.5*c + 2*a*b + rnorm(100)
mod = lm(y ~ a:b + a + b + c)
>car::linearHypothesis(mod, c("a","a:b"))
Linear hypothesis test
Hypothesis:
a = 0
a:b = 0
Model 1: restricted model
Model 2: y ~ a:b + a + b + c
Res.Df RSS Df Sum of Sq F Pr(>F)
1 97 657.20
2 95 116.31 2 540.88 220.89 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If only one of a or a:b had a real effect, the null would still be rejected; it would hold only if neither of them were present. To see that, try setting y = 4 + 3*b + 0.5*c + rnorm(100) and y = 4 + 3*b + 0.5*c + a + rnorm(100). The test isn't all-powerful, though: if there is too much noise in y, we would fail to reject the null even when a and a:b do matter (try y = 4 + 1*a + 3*b + 0.5*c + 2*a*b + rnorm(100, sd=1000)).
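The joint test above is simply a nested-model comparison, so the same F statistic can be obtained with base anova() by refitting without the two terms (a sketch reusing the dummy data, no car dependency):

```r
## Joint null (a = 0 and a:b = 0) expressed as a nested-model F test
set.seed(1)
a <- rnorm(100); b <- rnorm(100); c <- rnorm(100)
y <- 4 + 1*a + 3*b + 0.5*c + 2*a*b + rnorm(100)

full       <- lm(y ~ a:b + a + b + c)
restricted <- lm(y ~ b + c)   # both tested coefficients forced to 0

anova(restricted, full)       # F and p match linearHypothesis(full, c("a", "a:b"))
```

This is why linearHypothesis reports a "restricted model" vs the fitted model: the hypothesis matrix it builds corresponds exactly to this refit.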

Passing predictors to a Cox model from a list

I'm running a pretty simple Cox model using the survminer package.
surv_object <- Surv(time, event)
model <- coxph(surv_object ~ female + age + ethnicity + imd, data = df)
I need to run multiple Cox models, and for each model, my predictors change. I have all my predictors stored in a separate data frame such as this (we'll call it pred_df):
> pred_df
# A tibble: 4 x 2
predictor endpoint
<chr> <chr>
1 female Mortality
2 age Mortality
3 ethnicity Mortality
4 imd Mortality
Is there an easy way to pass the items from the predictor column to coxph()? Something like this:
coxph(surv_object ~ predictors, data = df)
What I've tried already:
I've tried a rather clumsy hack along these lines:
pred_vars <- pred_df %>%
  pull(predictor) %>%           # extract column values as a vector
  paste(collapse = " + ") %>%   # combine values in a string
  parse(text = .)               # parse the string as an expression
model <- coxph(surv_object ~ eval(pred_vars), data = df)
R actually understands this and runs the model, but the output is uninterpretable: instead of reporting the individual predictors (female, age, ethnicity and imd), it reports a single term called eval(pred_vars).
Call:
coxph(formula = Surv(time, event) ~ eval(pred_vars), data = df)
n= 62976, number of events= 12882
(3287 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
eval(pred_vars) 3.336e-05 1.000e+00 5.339e-06 6.249 4.14e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
eval(pred_vars) 1 1 1 1
Concordance= 0.515 (se = 0.003 )
Rsquare= 0.001 (max possible= 0.989 )
Likelihood ratio test= 38.28 on 1 df, p=6e-10
Wald test = 39.04 on 1 df, p=4e-10
Score (logrank) test = 39.07 on 1 df, p=4e-10
There must be a simpler way of doing this?
Try reformulate, which builds a formula from character vectors. Note that pred_df[["predictor"]] extracts the whole column, so all predictors are included:
formula <- reformulate(
  termlabels = pred_df[["predictor"]],
  response = pred_df[[1, "endpoint"]]
)
coxph(formula = formula, data = df)
You can do this in base R with as.formula and paste(..., collapse = " + "), like...
foo <- as.formula(paste0("Surv(time, event) ~ ", paste(pred_df$predictor, collapse = " + ")))
Result of that line:
> foo
Surv(time, event) ~ female + age + ethnicity + imd
And then you just pass foo to your call to coxph.
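Since pred_df also carries an endpoint column, the same paste trick extends to one formula per endpoint. A sketch (splitting on endpoint is my assumption about how multiple models would be organized; column names as in the question):

```r
## Stand-in for the question's tibble of predictors and endpoints
pred_df <- data.frame(
  predictor = c("female", "age", "ethnicity", "imd"),
  endpoint  = "Mortality",
  stringsAsFactors = FALSE
)

## One formula per endpoint: collapse that endpoint's predictors with " + "
formulas <- lapply(split(pred_df, pred_df$endpoint), function(d) {
  as.formula(paste0("Surv(time, event) ~ ",
                    paste(d$predictor, collapse = " + ")))
})
formulas[["Mortality"]]
## Surv(time, event) ~ female + age + ethnicity + imd
```

Each element of formulas can then be passed straight to coxph(formula, data = df).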

How to build AIC model selection table using mlogit models

I'm trying to build an AIC table for my candidate set of models in R, which were run using mlogit. I've used glm and glmer in the past, and have always used the package AICcmodavg and aictab to extract values and create a model selection table. This package doesn't seem to work for mlogit, so I'm wondering if there are any other ways of creating an AIC table in R besides manual calculation using the log-likelihood value?
Example of mlogit model output:
Call:
mlogit(formula = Case ~ Dist_boulder + Mesohabitat + Depth +
Size + Size^2 | -1, data = reach.dc, method = "nr")
Frequencies of alternatives:
0 1 2 3
1 0 0 0
nr method
5 iterations, 0h:0m:0s
g'(-H)^-1g = 1.19E-05
successive function values within tolerance limits
Coefficients :
Estimate Std. Error z-value Pr(>|z|)
Dist_boulder -0.052165 0.162047 -0.3219 0.74752
Mesohabitatriffle -1.400752 0.612329 -2.2876 0.02216 *
Mesohabitatrun 0.302697 0.420181 0.7204 0.47128
Depth 0.137524 0.162521 0.8462 0.39745
Size 0.336949 0.145036 2.3232 0.02017 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log-Likelihood: -86.627
Example of models run (from my candidate set of 14):
predation.reach<-mlogit(Case ~ Dist_boulder + Mesohabitat + Depth + Size + Size^2 | -1, data=reach.dc)
velocity.reach<-mlogit(Case ~ Mid_vel | -1, data=reach.dc)
spaces.reach<-mlogit(Case~ Embedded + Class | -1, data=reach.dc)
substrate.reach<-mlogit(Case ~ Class | -1, data=reach.dc)
Defining the candidate set list:
cand.set.reach<-list(predation.reach, velocity.reach, spaces.reach, substrate.reach)
bbmle::AICtab() appears to work.
library("mlogit")
m1 <- mlogit(formula = mode ~ price + catch | income, data = Fish,
             alt.subset = c("charter", "pier", "beach"), method = "nr")
m2 <- update(m1, . ~ . - price)
bbmle::AICtab(m1,m2)
## dAIC df
## m1 0.0 6
## m2 412.1 5
By default bbmle::AICtab() gives only delta-AIC and the model degrees of freedom/number of parameters, but you can use optional arguments to get the absolute AIC, AIC weights, etc..
It also works with a list:
L <- list(m1,m2)
bbmle::AICtab(L)
In the tidyverse world,
library(broom)
L %>% purrr::map(augment) %>% bind_rows()
ought to work, but doesn't yet.
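Failing that, a small base-R fallback is easy to write, assuming only that each fit has logLik and AIC methods (mlogit fits do). The helper name here is mine:

```r
## Minimal model-selection table: df, AIC and delta-AIC, sorted best-first
aic_table <- function(mods) {
  aics <- sapply(mods, AIC)
  out <- data.frame(df   = sapply(mods, function(m) attr(logLik(m), "df")),
                    AIC  = aics,
                    dAIC = aics - min(aics))
  out[order(out$AIC), ]
}
```

aic_table(cand.set.reach) should then give a usable table; note this is plain AIC, so the small-sample AICc that AICcmodavg reports would still need to be computed separately from the log-likelihood and sample size.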

Using anova.cca() in the vegan package: Error in terms.formula(formula, "Condition", data = data): 'data' argument is of the wrong type

Thanks for looking at this question!
I am doing a redundancy analysis (RDA) in R with the vegan package, using two data sets (one of microbial species, the other of environment variables, e.g. pH, DOC, ...).
Code:
mod <- rda(species ~ ., data = environment)
summary(mod)
Call:
rda(formula = species ~ Treatment + SR + DOC + NH4 + NO3 + AG + BG + CB +
XYL + LAP + NAG + PHOS + PHOX + PEOX + pH + AP +MBC + MBN, data =
environment)
Partitioning of variance:
Inertia Proportion
Total 644.87 1.0000
Constrained 560.18 0.8687
Unconstrained 84.69 0.1313
Eigenvalues, and their contribution to the variance
Importance of components:
                          RDA1     RDA2    RDA3     RDA4     RDA5     RDA6    RDA7
Eigenvalue            237.4215 126.5881 66.9355 31.62478 21.82210 18.56123 9.04857
Proportion Explained    0.3682   0.1963  0.1038  0.04904  0.03384  0.02878 0.01403
Cumulative Proportion   0.3682   0.5645  0.6683  0.71731  0.75115  0.77993 0.79396 ...
Then I run the following code, which works normally:
anova.cca(mod)
Permutation test for rda under reduced model
Permutation: free
Number of permutations: 999
Model: rda(formula = species ~ Treatment + SR + DOC + NH4 + NO3 + AG + BG +
CB + XYL + LAP + NAG + PHOS + PHOX + PEOX + pH + AP + MBC + MBN, data =
environment)
Df Variance F Pr(>F)
Model 21 560.18 7.2444 0.001 ***
Residual 23 84.69
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
After that, I run the following to test the significance of each axis:
anova.cca(mod,by = "axis")
Error in terms.formula(formula, "Condition", data = data) :
'data' argument is of the wrong type
I don't know what happened; it confused me. I ran the package example:
data(dune)
data(dune.env)
dune.Manure <- rda(dune ~ ., dune.env)
anova.cca(dune.Manure,by='axis')
Permutation test for rda under reduced model
Forward tests for axes
Permutation: free
Number of permutations: 999
Model: rda(formula = dune ~ A1 + Moisture + Management + Use + Manure, data = dune.env)
Df Variance F Pr(>F)
RDA1 1 22.3955 7.4946 0.003 **
RDA2 1 16.2076 5.4239 0.066 .
RDA3 1 7.0389 2.3556 0.679
RDA4 1 4.0380 1.3513 0.998
....
Residual 7 20.9175
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This works. I compared the example data with mine and found no mistakes (I think). Why does the error occur in anova.cca(mod, by = "axis"), and how can I deal with it? Thanks a lot!
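No answer is recorded here, but one structural difference between the working example and the failing call is worth ruling out (this is an assumption, not a confirmed diagnosis): dune.env is a plain data.frame, whereas data imported via readr/readxl arrives as a tibble, which formula-processing functions such as terms.formula() can choke on. Converting first is harmless:

```r
## Hypothetical check: make sure the constraining data is a plain data.frame
## before fitting. `env` here is a tiny stand-in for the OP's environment table.
env <- data.frame(pH = c(6.5, 7.1), DOC = c(1.2, 0.8))
env <- as.data.frame(env)   # no-op for a data.frame; strips tibble classes otherwise
class(env)
## then refit: mod <- rda(species ~ ., data = env); anova.cca(mod, by = "axis")
```

If the error persists after this, the cause lies elsewhere and a reproducible subset of the data would be needed to diagnose it.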

Error comparing linear mixed effects models

I want to see whether the fixed effect Group2 in my model is significant. The model is:
Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject)
To check the significance I create a null model not containing the effect Group2:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
and the full model containing the effect Group2:
Resp.model = lmer(Response~Group1+Group2+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
Then I use anova() to compare the two, but I get an error:
anova(Resp.null, Resp.model)
## Error in anova.merMod(Resp.null, Resp.model) :
## models were not all fitted to the same size of dataset
I think that the problem is that Group1 contains NaN, but I thought that linear mixed models were robust to missing data.
How can I solve this problem and compare the two models?
Do I have to delete the rows corresponding to NaN and fit Resp.null without these rows?
The data can be downloaded here.
Please note that you should replace "<undefined>" with NA like this:
mydata = read.csv("mydata.csv")
mydata[mydata == "<undefined>"] <- NA
To avoid the "models were not all fitted to the same size of dataset" error in anova(), you must fit both models on the exact same subset of data.
There are two simple ways to do this. This reproducible example uses lm and update, but the same idea applies to lmer fits (for a merMod object the model frame lives in the @frame slot rather than in $model):
# 1st approach
# define a convenience wrapper
update_nested <- function(object, formula., ..., evaluate = TRUE) {
  update(object = object, formula. = formula., data = object$model,
         ..., evaluate = evaluate)
}
# prepare data with NAs
data(mtcars)
for(i in 1:ncol(mtcars)) mtcars[i,i] <- NA
xa <- lm(mpg~cyl+disp, mtcars)
xb <- update_nested(xa, .~.-cyl)
anova(xa, xb)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 2nd approach
xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
anova(xa, xc)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If, however, you're only interested in testing a single variable (e.g. Group2), then Anova() or linearHypothesis() from the car package may work just as well for this use case.
See also:
How to update `lm` or `glm` model on same subset of data?
R error which says "Models were not all fitted to the same size of dataset"
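Applied to the question's own models, the second approach can be sketched as follows (simulated stand-in data; the lmer calls are shown commented out because they need lme4 and the downloaded data, but the complete-case step is plain base R):

```r
## Simulated stand-in for mydata, with NAs in Group1 as in the question
set.seed(42)
mydata <- data.frame(Response = rnorm(20), Group1 = c(rnorm(18), NA, NA),
                     Group2 = rnorm(20), Gender = gl(2, 10),
                     Age = rnorm(20), BMI = rnorm(20), Subject = gl(5, 4))

## Keep only rows complete for every variable the LARGER model uses,
## so both fits see exactly the same rows
vars <- c("Response", "Group1", "Group2", "Gender", "Age", "BMI", "Subject")
cc <- na.omit(mydata[, vars])

## With lme4 installed, both models would then be fit to cc:
## Resp.null  <- lmer(Response ~ Group1 + Gender + Age + BMI + (1 | Subject),
##                    data = cc, REML = FALSE)
## Resp.model <- lmer(Response ~ Group1 + Group2 + Gender + Age + BMI +
##                    (1 | Subject), data = cc, REML = FALSE)
## anova(Resp.null, Resp.model)   # no size-mismatch error
```

Because cc is built from the larger model's variable set, dropping Group2 from the formula no longer changes which rows are used.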
Fit Resp.model first, then use Resp.model@frame as the data argument:
Resp.null = lmer(Response ~ Group1 + Gender + Age + BMI + (1 | Subject),
                 data = Resp.model@frame, REML = FALSE)
