A strange case of singular fit in lme4 glmer - simple random structure with variation among groups - r

Background
I am trying to test for differences in wind speed data among different groups. For the purpose of my question, I am looking only on side wind (wind direction that is 90 deg from the individual), and I only care about the strength of the wind. Thus, I use absolute values. The range of wind speeds is 0.0004-6.8 m/sec and because I use absolute values, Gamma distribution describes it much better than normal distribution.
My data contains 734 samples from 68 individuals, with each individual having between 1-30 repeats. However, even if I reduce my samples to only include individuals with at least 10 repeats (which leaves me with 26 individuals and a total of 466 samples), I still get the problematic error.
The model
The full model is Wind ~ a*b + c*d + (1|individual), but for the purpose of this question, the simple model of Wind ~ 1 + (1|individual) gives the same singularity error, so I do not think that the explanatory variables are the problem.
The complete code line is glmer(Wind ~ 1 + (1|individual), data = X, family = Gamma(log))
The problem and the strange part
When running the model, I get the boundary (singular) fit: see ?isSingular error, although, as you can see, I use a very simple model and random structure. The strange part is that I can solve this by adding 0.1 to the Wind variable (i.e. glmer(Wind+0.1 ~ 1 + (1|Tag), data = X, family = Gamma(log)) does not give any error). I honestly do not remember why I added 0.1 the first time I did it, but I was surprised to see that it solved the error.
The question
Is this a problem with lme4? Am I missing something? Any ideas what might cause this and why does me adding 0.1 to the variable solve this problem?
Edit following questions
I am not sure what's the best way to add data, so here is a link to a csv file in Google drive
using glmmTMB does not produce any warnings with the basic formula glmmTMB(Wind ~ 1 + (1|Tag), data = X, family = Gamma(log)), but gives convergence problems warnings ('non-positive-definite Hessian matrix') when using the full model (i.e., Wind ~ a*b + c*d + (1|individual)), which are then solved if I scale the continuous variables

Related

Gamma GLM: NaN production and divergence errors

Intro
I'm trying to construct a GLM that models the quantity (mass) of eggs the specimens of a fish population lays depending on its size and age.
Thus, the variables are:
eggW: the total mass of layed eggs, a continuous and positive variable ranging between 300 and 30000.
fishW: mass of the fish, continuous and positive, ranging between 3 and 55.
age: either 1 or 2 years.
No 0's, no NA's.
After checking and realising assuming a normal distribution was probably not appropriate, I decided to use a Gamma distribution. I chose Gamma basically because the variable was positive and continuous, with increasing variance with higher values and appeared to be skewed, as you can see in the image below.
Frequency distribution of eggW values:
fishW vs eggW:
The code
myglm <- glm(eggW ~ fishW * age, family=Gamma(link=identity),
start=c(mean(data$eggW),1,1,1),
maxit=100)
I added the maxit factor after seeing it suggested on a post of this page as a solution to glm.fit: algorithm did not converge error, and it worked.
I chose to work with link=identity because of the more obvious and straightforward interpretation of the results in biological terms rather than using an inverse or log link.
So, the code above results in the next message:
Warning messages: 1: In log(ifelse(y == 0, 1, y/mu)) : NaNs
produced 2: step size truncated due to divergence
Importantly, no error warnings are shown if the variable fishW is dropped and only age is kept. No errors are reported if a log link is used.
Questions
If the rationale behind the design of my model is acceptable, I would like to understand why these errors are reported and how to solve or avoid them. In any case, I would appreciate any criticism or suggestions.
You are looking to determine the weight of the eggs based upon age and weight of the fish correct? I think you need to use:
glm(eggW ~ fishW + age, family=Gamma(link=identity)
Instead of
glm(eggW ~ fishW * age, family=Gamma(link=identity)
Does your dataset have missing values?
Are your variables highly correlated?
Turn fishW * age into a seperate column and just pass that to the algo

Why is my glmer model in R taking so long to run?

I had previously been using simple stats in Statistica, but required R for my masters research. I am trying to run the following code to test for any significant interactions, and it is just running forever. If I simplify the model by taking month out, then it runs, but biologically it makes sense that month is significant so I would really like this to run including month as a factor. Once I run the model, the stop sign in R studio just stays present for hours, what could be the reason for this? Like I said I'm very new and it has been really difficult to learn this on my own. I am working with presence/absence data (as %) which I do cbind as my dependent variable. SO far this is what my coad looks like:
library(car)
library(languageR)
library(AICcmodavg)
library(lme4)
Scat <- read.csv("Scat2.csv", header=T)
attach(Scat)
names(Scat)
y <- cbind(Present,Absent)
ScatData <- glmer(y ~ Estate * Species * Month * Content * (1|Site) + Min + Max,family=binomial)
summary(ScatData)
Once I get to running the actual model, I don't even get to do the summary because R is not done computing the results of the actual model. I ran the model for approximately 4 hours, and when I clicked on the stop sign, I received this message:
Warning message:
In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf, :
failure to converge in 10000 evaluations
I would really appreciate some input on this matter.
You have a few problems with your model specification. Your model
y ~ Estate * Species * Month * Content * (1|Site) + Min + Max
is asking for all the main effects and interactions of estate, species, month, content, and site, which is incredibly complex.
Also, you have specified site as a random effect and asked for its interaction with fixed effects. I'm not sure whether that's possible, but it certainly seems wrong. You should decide whether you want site to be a fixed effect or a random effect.
If you post a minimal replicable example, I can give more specific advice.

Repeated effect LMM in R

Until recently I used SPSS for my statistics, but since I am not in University any more, I am changing to R. Things are going well, but I can't seem to replicate the results I obtained for repeated effect LMM in SPSS. I did find some treads here which seemed relevant, but those didn't solve my issues.
This is the SPSS script I am trying to replicate in R
MIXED TriDen_L BY Campaign Watering Heating
/CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1)
SINGULAR(0.000000000001) HCONVERGE(0,
ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
/FIXED=Campaign Watering Heating Campaign*Watering Campaign*Heating
Watering*Heating Campaign*Watering*Heating | SSTYPE(3)
/METHOD=REML
/PRINT=TESTCOV
/RANDOM=Genotype | SUBJECT(Plant_id) COVTYPE(AD1)
/REPEATED=Week | SUBJECT(Plant_id) COVTYPE(AD1)
/SAVE=PRED RESID
Using the lme4 package in R I have tried:
lmm <- lmer(lnTriNU ~ Campaign + Watering + Heating + Campaign*Watering
+ Campaign*Heating + Watering*Heating + Campaign*Watering*Heating
+ (1|Genotype) + (1|Week:Plant_id), pg)
But this -and the other options I have tried for the random part- keep producing an error:
Error: number of levels of each grouping factor must be < number of observations
Obviously in SPSS everything is fine. I am suspecting I am not correctly modelling the repeated effect? Also saving predicted and residual values is not yet straightforward for me...
I hope anyone can point me in the right direction.
You probably need to take out either Week or Plant_id, as I think you have as many values for either variable as you have cases. You can nest observations within a larger unit if you add a variable to model time. I am not familiar with SPSS, but if your time variable is Week (i.e., if week has a value of 1 for the first observation, 2 for the second etc.), then it should not be a grouping factor but a random effect in the model. Something like <snip> week + (1 + week|Plant_id).
k.
Is Plant_id nested within Genotype, and Week indicate different measure points? If so, I assume that following formula leads to the required result:
lmm <- lmer(lnTriNU ~ Campaign + Watering + Heating + Campaign*Watering
+ Campaign*Heating + Watering*Heating + Campaign*Watering*Heating
+ (1+Week|Genotype/Plant_id), pg)
Also saving predicted and residual values is not yet straightforward for me...
Do you mean "computing" by "saving"? In R, all relevant information are in the returned object, and accessible through functions like residuals() or predict() etc., called on the saved object (in your case, residuals(lmm)).
Note that, by default, lmer does not use AD1-covtype.

Nested model in R

I'm having a huge problem with a nested model I am trying to fit in R.
I have response time experiment with 2 conditions with 46 people each and 32 measures each. I would like measures to be nested within people and people nested within conditions, but I can't get it to work.
The code I thought should make sense was:
nestedmodel <- lmer(responsetime ~ 1 + condition +
(1|condition:person) + (1|person:measure), data=dat)
However, all I get is an error:
Error in checkNlevels(reTrms$flist, n = n, control) :
number of levels of each grouping factor must be < number of observations
Unfortunately, I do not even know where to start looking what the problem is here.
Any ideas? Please, please, please? =)
Cheers!
This might be more appropriate on CrossValidated, but: lme4 is trying to tell you that one or more of your random effects is confounded with the residual variance. As you've described your data, I don't quite see why: you should have 2*46*32=2944 total observations, 2*46=92 combinations of condition and person, and 46*32=1472 combinations of measure and person.
If you do
lf <- lFormula(responsetime ~ 1 + condition +
(1|condition:person) + (1|person:measure), data=dat)
and then
lapply(lf$reTrms$Ztlist,dim)
to look at the transposed random-effect design matrices for each term, what do you get? You should (based on your description of your data) see that these matrices are 1472 by 2944 and 92 by 2944, respectively.
As #MrFlick says, a reproducible example would be nice. Other things you could show us are:
fit the model anyway, using lmerControl(check.nobs.vs.nRE="ignore") to ignore the test, and show us the results (especially the random effects variances and the statement of the numbers of groups)
show us the results of with(dat,table(table(interaction(condition,person))) to give information on the number of replicates per combination (and similarly for measure)

Is there an implementation of loess in R with more than 3 parametric predictors or a trick to a similar effect?

Calling all experts on local regression and/or R!
I have run into a limitation of the standard loess function in R and hope you have some advice. The current implementation supports only 1-4 predictors. Let me set out our application scenario to show why this can easily become a problem as soon as we want to employ globally fit parametric covariables.
Essentially, we have a spatial distortion s(x,y) overlaid over a number of measurements z:
z_i = s(x_i,y_i) + v_{g_i}
These measurements z can be grouped by the same underlying undistorted measurement value v for each group g. The group membership g_i is known for each measurement, but the underlying undistorted measurement values v_g for the groups are not known and should be determined by (global, not local) regression.
We need to estimate the two-dimensional spatial trend s(x,y), which we then want to remove. In our application, say there are 20 groups of at least 35 measurements each, in the most simple scenario. The measurements are randomly placed. Taking the first group as reference, there are thus 19 unknown offsets.
The below code for toy data (with a spatial trend in one dimension x) works for two or three offset groups.
Unfortunately, the loess call fails for four or more offset groups with the error message
Error in simpleLoess(y, x, w, span, degree, parametric, drop.square,
normalize, :
only 1-4 predictors are allowed"
I tried overriding the restriction and got
k>d2MAX in ehg136. Need to recompile with increased dimensions.
How easy would that be to do? I cannot find a definition of d2MAX anywhere, and it seems this might be hardcoded -- the error is apparently triggered by line #1359 in loessf.f
if(k .gt. 15) call ehg182(105)
Alternatively, does anyone know of an implementation of local regression with global (parametric) offset groups that could be applied here?
Or is there a better way of dealing with this? I tried lme with correlation structures but that seems to be much, much slower.
Any comments would be greatly appreciated!
Many thanks,
David
###
#
# loess with parametric offsets - toy data demo
#
x<-seq(0,9,.1);
x.N<-length(x);
o<-c(0.4,-0.8,1.2#,-0.2 # works for three but not four
); # these are the (unknown) offsets
o.N<-length(o);
f<-sapply(seq(o.N),
function(n){
ifelse((seq(x.N)<= n *x.N/(o.N+1) &
seq(x.N)> (n-1)*x.N/(o.N+1)),
1,0);
});
f<-f[sample(NROW(f)),];
y<-sin(x)+rnorm(length(x),0,.1)+f%*%o;
s.fs<-sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')});
s<-paste(c('y~x',s.fs),collapse='+');
d<-data.frame(x,y,f)
names(d)<-c('x','y',s.fs);
l<-loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d,
span=0.4);
yp<-predict(l,newdata=d);
plot(x,y,pch='+',ylim=c(-3,3),col='red'); # input data
points(x,yp,pch='o',col='blue'); # fit of that
d0<-d; d0$f1<-d0$f2<-d0$f3<-0;
yp0<-predict(l,newdata=d0);
points(x,y-f%*%o); # spatial distortion
lines(x,yp0,pch='+'); # estimate of that
op<-sapply(seq(NCOL(f)),function(i){(yp-yp0)[!!f[,i]][1]});
cat("Demo offsets:",o,"\n");
cat("Estimated offsets:",format(op,digits=1),"\n");
Why don't you use an additive model for this? Package mgcv will handle this sort of model, if I understand your Question, just fine. I might have this wrong, but the code you show is relating x ~ y, but your Question mentions z ~ s(x, y) + g. What I show below for gam() is for response z modelled by a spatial smooth in x and y with g being estimated parametrically, with g stored as a factor in the data frame:
require(mgcv)
m <- gam(z ~ s(x,y) + g, data = foo)
Or have I misunderstood what you wanted? If you want to post a small snippet of data I can give a proper example using mgcv...?

Resources