Here's my model:
revisitsm0 <- glmmTMB(cbind(revisits_per_bout, tot_visits_bout - revisits_per_bout) ~ experiment_type * foraging_bout + (1 | colony/bee_id), data = table_training, family = binomial)
My model doesn't fit well because of overdispersion, so I square-rooted my variables revisits_per_bout and tot_visits_bout, which gives me non-integers.
Since quasibinomial is not available in glmmTMB, how can I fix this?
Thanks!
This really belongs on CrossValidated, because it is a question of "what to do" more than "how to do it".
family = "betabinomial" should be the simplest way to handle overdispersion. (I would not recommend transforming the response variable; a good rule of thumb is that if you have count-type (non-negative integer) responses, it's best to model them on the original scale.)
If overdispersion seems to be associated with particular predictors (e.g. dispersion is different for different experiment types, even after taking the mean-variance relationship of the beta-binomial into account; you can use a location-scale plot (sqrt(abs(pearson_resids)) vs. the predictor of interest) to assess this), you can use family = "betabinomial" plus a dispformula = argument; see the sketch after this list.
You could also use an observation-level random effect (also sketched below).
You can also do a post hoc conversion of a binomial fit to a quasi-binomial fit following the recipe in the GLMM FAQ 'fitting models with overdispersion' section (that link has more general advice/info on methods for dealing with overdispersion).
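As a rough illustration of the first two options, here is a minimal sketch, untested and reusing the data and variable names from the question:
library(glmmTMB)
## Option 1: beta-binomial fit, with a dispersion model that lets
## the overdispersion vary by experiment type
revisitsm1 <- glmmTMB(cbind(revisits_per_bout, tot_visits_bout - revisits_per_bout) ~
                        experiment_type * foraging_bout + (1 | colony/bee_id),
                      dispformula = ~ experiment_type,
                      family = betabinomial, data = table_training)
## Option 2: observation-level random effect, keeping the binomial family
table_training$obs <- factor(seq_len(nrow(table_training)))
revisitsm2 <- glmmTMB(cbind(revisits_per_bout, tot_visits_bout - revisits_per_bout) ~
                        experiment_type * foraging_bout + (1 | colony/bee_id) + (1 | obs),
                      family = binomial, data = table_training)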
I am new to GAMMs and GAMs, so this question may be a bit basic; I'd appreciate your help very much.
I am using the following code:
M.gamm <- gamm(bsigsi ~ s(summodpa, sed, k = 1, fx = TRUE, bs = "tp") + s(sumlightpa, sed, k = 1, fx = TRUE, bs = "tp"), random = list(school = ~1), method = "ML", na.action = na.omit, data = Pilot_fitbit2)
The code runs, but gives me this feedback:
Warning messages:
1: In smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
  basis dimension, k, increased to minimum possible
2: In smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
  basis dimension, k, increased to minimum possible
Questions:
My major question, however, is how I can get an AIC or BIC from this?
I've tried BIC(M.gamm$gam) and BIC(M.gamm$lme), since a gamm fit consists of both parts (lme and gam); for the latter one (lme) I do get a value, but for the first one I don't get a value. Does anyone know why, and how I can get one?
The issue is that I would like to compare this value to the BIC value of a gam model, and I am not sure which one (BIC(M.gamm$lme) or BIC(M.gamm$gam)) would be the correct one. It is possible for me to derive a BIC and AIC for the gam and lme models.
If I were able to get the AIC or BIC for the gamm model, how would I know I can trust the results? What do I need to be careful with so that I interpret the result correctly? Currently, I am using ML in all models and also use the same package (mgcv) to estimate the lme, gam, and gamm models to establish comparability.
Any help, advice, or ideas on this would be greatly appreciated!
Best wishes,
Noemi
These warnings come as a result of requesting a smooth basis of a single function for each of your two smooths; this doesn't make any sense, as both such bases would only contain the equivalent of constant functions, which are unidentifiable given that you have another constant term (the intercept) in your model. Once mgcv applies identifiability constraints, the two smooths would get dropped entirely from the model.
Hence the warnings: mgcv didn't do what you wanted; instead it set k to the smallest value possible. Set k to something larger, or you might as well leave it at the default and not specify it in s() if you want a low-rank smooth. Also, unless you really want an unpenalized spline fit, don't use fx = TRUE.
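For example, a sketch of the original model with penalized smooths and default basis dimensions (same data and variable names as in the question):
M.gamm <- gamm(bsigsi ~ s(summodpa, sed, bs = "tp") +
                 s(sumlightpa, sed, bs = "tp"),
               random = list(school = ~1), method = "ML",
               na.action = na.omit, data = Pilot_fitbit2)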
I'm not really familiar with any theory for BIC applied to GAM(M)s that corrects for smoothness selection. The AIC method for gam() models estimated using REML smoothness selection does have some theory behind it, including a recent paper by Simon Wood and colleagues.
The mgcv FAQ has the following two things to say:
How can I compare gamm models? In the identity-link, normal-errors case, AIC and hypothesis-testing-based methods are fine. Otherwise it is best to work out a strategy based on summary.gam. Alternatively, simple random effects can be fitted with gam, which makes comparison straightforward. Package gamm4 is an alternative, which allows AIC-type model selection for generalized models.
When using gamm or gamm4, the reported AIC is different for the gam object and the lme or lmer object. Why is this? There are several reasons for this. The most important is that the models being used are actually different in the two representations. When treating the GAM as a mixed model, you are implicitly assuming that if you gathered a replicate dataset, the smooths in your model would look completely different to the smooths from the original model, except for having the same degree of smoothness. Technically, you would expect the smooths to be drawn afresh from their distribution under the random effects model. When viewing the gam from the usual penalized regression perspective, you would expect smooths to look broadly similar under replication of the data, i.e. you are really using a Bayesian model for the smooths rather than a random effects model (it's just that the frequentist random effects and Bayesian computations happen to coincide for computing the estimates). As a result of the different assumptions about the data-generating process, AIC model comparisons can give rather different answers depending on the model adopted. Which you use should depend on which model you really think is appropriate. In addition, the computations of the AICs are different: the mixed model AIC uses the marginal likelihood and the corresponding number of model parameters, while the gam model uses the penalized likelihood and the effective degrees of freedom.
So I'd probably stick to AIC and not use BIC, and I'd think about which interpretation of the GAM(M) I was most interested in. I'd also likely fit the random effects you have here using gam() if they are this simple: an equivalent model would include + s(school, bs = 're') in the main formula and exclude the random part whilst using gam():
gam(bsigsi ~ s(summodpa, sed) + s(sumlightpa, sed) +
s(school, bs = 're'), data = Pilot_fitbit2,
method = 'REML')
Do be careful with 2D isotropic smooths; sed and summodpa (and likewise sed and sumlightpa) need to be in the same units and to have the same degree of wiggliness in each smooth. If they aren't in the same units or have different wigglinesses, use te() instead of s() for the 2D terms.
Also be careful with variables that appear in two or more smooths like this; mgcv will do its best to make the model identifiable, but you can easily run into computational problems even so. A better modelling approach would be to estimate the marginal effects of sed and the other terms plus their second-order interactions by decomposing the effects in the two 2D smooths as follows:
gam(bsigsi ~ s(sed) + s(summodpa) + s(sumlightpa) +
ti(summodpa, sed) + ti(sumlightpa, sed) +
s(school, bs = 're'), data = Pilot_fitbit2,
method = 'REML')
where the ti() smooths are tensor product interaction bases in which the main effects of the two marginal variables have been removed. Hence you can treat them as pure smooth interaction terms. In this way, the main effect of sed is contained in a single smooth term.
I know that there are dozens of similar questions/answers, and lots of papers. But please read till the end.
Non-statisticians tend to use stepwise regressions, which statisticians argue strongly against. This is something that I don't understand, but I just obey them: "OK, this is not a good way to do my modelling."
Here is (was) my model:
b <- lmer(metric1 ~ a + b + c + d + e + f + g + h + i + j + k + l + (1 | X/Y) + (1 | Z), data = dataset)
drop1(b, test = "Chisq")
(Just a small note: watch out for the random effects in my model; the random effects are Year, Month, and Sampling.location; one of my variables is 1/0; I already log-transformed my variables.)
I am trying to find an exploratory model (using drop1 to reach the final model) and to evaluate it with my biological knowledge to see whether the dependent variable ("metric" in this case) seems to be responding to the predictors. I will repeat this process with 100 metrics just to evaluate which metrics seem to be responding to the environmental variables.
I was searching for an acceptable alternative to stepwise selection, following the suggestions of the statistics gurus.
However, there are lots of alternatives. I have read a lot, but I still feel lost. Some say lasso, some say elastic net, some say ridge regression... Which one fits my purpose?
Any advice on a better alternative, an easy model, a help page for dummies, or examples would be much appreciated.
Thanks in advance.
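If you want to try the lasso route mentioned above, one concrete option is the glmmLasso package, which penalizes the fixed effects while keeping a random-effects structure. Below is a minimal, untested sketch reusing the names from the question; lambda = 10 is an arbitrary placeholder that you would tune (e.g. over a grid, picking the value with the lowest BIC), and the nested (1|X/Y) term is simplified here to independent intercepts for X and Z:
library(glmmLasso)
# Lasso-penalized fixed effects; grouping variables must be factors
b_lasso <- glmmLasso(fix = metric1 ~ a + b + c + d + e + f + g + h + i + j + k + l,
                     rnd = list(X = ~1, Z = ~1),
                     lambda = 10, data = dataset)
summary(b_lasso)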
I have started to work with GAMs in R and I've acquired Simon Wood's excellent book on the topic. Based on one of his examples, I am looking at the following:
library(mgcv)
data(trees)
ct1 <- gam(log(Volume) ~ Height + s(Girth), data = trees)
I have two general questions about this example:
How does one decide when a variable in the model should enter parametrically (such as Height) or as a smooth (such as Girth)? Does one hold an advantage over the other, and is there a way to determine what the optimal type for a variable is? If anybody knows of any literature on this topic, I'd be happy to hear of it.
Say I want to look closer at the coefficients of ct1: ct1$coefficients. Can I use them as the gam procedure outputs them, or do I have to transform them before analyzing them, given that I am fitting to log(Volume)? In the latter case, I guess I would have to use exp(ct1$coefficients).
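On the second question, a small sketch continuing the trees example: the individual smooth coefficients are basis-function weights with no direct interpretation (the parametric Height coefficient, by contrast, can be read as a multiplicative effect on Volume after exponentiation). In practice it is usually the fitted values that one back-transforms:
pred_log <- predict(ct1)   # fitted values on the log(Volume) scale
pred_vol <- exp(pred_log)  # back-transformed to the original Volume scale
head(cbind(observed = trees$Volume, fitted = pred_vol))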
What is the generic code to perform a repeated measures ANOVA?
I am currently using the code:
summary(aov(Chlo ~ Site, data = alldata))
There are three different sites (Site) and four experimental variables that I am testing individually (Chlo, SST, DAC and PAR). I am also assessing any differences in these variables across years (2003 to 2012):
summary(aov(Chlo ~ Year, data = year))
Any help would be appreciated!
In general, you should avoid performing multiple calls to aov and instead use a mixed-effects linear model.
You can find several examples in this post by Paul Gribble
I often use the nlme package, for example:
require(nlme)
model <- lme(dv ~ myfactor, random = ~1|subject/myfactor, data=mydata)
Depending on the data you may run into more complex situations; I would suggest having a look at the very clear book by Julian Faraway, "Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models".
Also, you may want to ask on CrossValidated if you have more specific statistical questions.
The trick with the aov function is that you just need to add an Error term. As one of the guides says, the Error term must reflect that we have "treatments nested within subjects".
So, in your case, if Site is a repeated measure, you should use:
summary(aov(Chlo ~ Site + Error(subject/Site), data = alldata))
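If you prefer the mixed-model route from the first answer, here is a hedged sketch of an equivalent nlme fit, assuming alldata contains a factor subject identifying the repeated-measurement unit:
library(nlme)
model_site <- lme(Chlo ~ Site, random = ~ 1 | subject, data = alldata)
anova(model_site)  # F-test for the Site effect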