I am trying to fit a GLMM to a data frame with 53 obs. of 17 variables. All variables are standardized and have no missing values, but they do not follow a normal distribution. The str() of the data frame looks something like this:
species : Factor w/ 19 levels "spp1","spp2",..: 5 18 12 15 19 4 6 14 16 5 ...
association : Factor w/ 4 levels "assoA","assoB",..: 1 1 2 2 2 3 3 4 4 1 ...
site : Factor w/ 2 levels "site1","site2": 1 1 1 1 1 1 1 1 1 2 ...
obs.no : int 1 1 1 1 1 1 1 1 1 1 ...
trait1: num 0.652 0.428 0.535 0.389 0.486 ...
trait2 : num 0.135 0.16 0.134 0.142 0.159
(clipped trait3 to 13)
I ran the following code to test for differences between sites and association classes for the given trait.
model1= glmer(trait1 ~ association+site+ (1 | species),data=df6,family=gaussian)
and received the message below.
In glmer(trait1 ~ association+site+ (1 | species),data=df6, :
calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
After this I want to estimate parameters with Gauss-Hermite quadrature. Any recommendation to fix this error and code to execute Gauss-Hermite quadrature is very much appreciated.
You actually posted the answer. Use lmer, not glmer:
model1 = lmer(trait1~association+site+(1|species), data=df6)
Clarifying: the reason that glmer(..., family = gaussian(link = "identity")) is not allowed (and that lme4 insists you use lmer(...) instead) is that there is no point using numerical (Gauss-Hermite) quadrature for a linear mixed model (which is exactly the special case of a GLMM with a Gaussian response and an identity link); in this case the integral can be expressed in closed form as a penalized least-squares problem (conditional on the random-effects variance/covariance parameters): see Bates et al. 2015.
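As a rough illustration of the distinction, here is a minimal sketch using the `sleepstudy` and `cbpp` example data sets that ship with lme4 (not the poster's `df6`). It assumes lme4 is installed. The `nAGQ` argument of `glmer()` sets the number of adaptive Gauss-Hermite quadrature points, and it only applies to non-Gaussian models like the binomial one here.

```r
library(lme4)

# Gaussian response, identity link: a linear mixed model, fitted with lmer()
lmm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Non-Gaussian response: glmer() accepts nAGQ, the number of adaptive
# Gauss-Hermite quadrature points (the default, nAGQ = 1, is the Laplace
# approximation). Values above 1 need a single scalar random effect.
glmm_agq <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                  data = cbpp, family = binomial, nAGQ = 10)

fixef(lmm)       # fixed-effect estimates from the linear mixed model
fixef(glmm_agq)  # fixed-effect estimates from the quadrature-based GLMM fit
```

For the model in the question, the `lmer()` form is the relevant one; the `glmer()` call is only there to show where quadrature would come in for a different response type.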
I have recently conducted a study where I built a multinomial logistic regression model to investigate whether keystroke logging analytics (e.g., pause time in writing, general typing rate, revision behavior) predict argument elements (i.e., final claim, primary claim, data) in adult persuasive essay writing. In other words, I want to investigate whether adult writers exhibited different patterns of writing behaviors (manifested in their keystroke activities while writing on a keyboard) when they were producing different argument elements in their written argumentation.
Below is the structure of the data I used for the study:
'data.frame': 244 obs. of 11 variables:
$ ID : int 1 1 1 2 2 3 3 3 4 4 ...
$ Prompt : Factor w/ 2 levels "Appearance","Competition": 2 2 2 2 2 2 2 2 2 2 ...
$ element : Factor w/ 3 levels "Data","FinalClaim",..: 1 2 3 1 2 1 2 3 1 2 ...
$ product_process_ratio : num 0.885 0.864 0.992 0.797 0.827 ...
$ chars_process_per_min_incl_space : num 46.3 20.2 12 56 51.8 ...
$ mean_process_time_in_p_burst_pt200 : num 2.04 5.29 9.49 2.75 2.94 ...
$ mean_typed_chars_in_p_burst_pt200 : num 1.57 1.78 1.89 2.57 2.54 ...
$ mean_pause_time_in_seconds_pt200 : num 0.786 1.08 0.643 0.41 0.395 ...
$ proportion_of_pause_time_pt200 : num 0.385 0.2031 0.0656 0.1477 0.1327 ...
$ mean_pause_time_sec_within_words_pt200 : num 0.357 0.538 0.354 0.321 0.375 ...
$ mean_pause_time_sec_between_words_pt200: num 1.28 1.611 1.777 0.712 0.45 ...
Initially, I built a multinomial logistic regression model using the nnet R package, with the three-class categorical variable "element" as the dependent variable. I then included the eight keystroke logging measures (from product_process_ratio to mean_pause_time_sec_between_words_pt200) and the categorical variable "Prompt" as the independent variables. The model worked well and I got some interesting results from the analyses.
But then I realized that the observations were not independent of each other. In this case, each participant produced 1-3 argument elements of different categories although each of them just wrote one essay. I am wondering if I should account for the random effects of the individuals.
To sum up, here are my two questions:
Given that the observations in my data are dependent, since each individual produced at least one argument element from the three categories (Final Claim, Primary Claim, Data), should I build a multinomial mixed logistic regression model to investigate whether keystroke analytics (e.g., proportion of pause time, mean P-burst length, product process ratio) predict different argument elements? Or is it OK to stick with a simpler multinomial logistic regression model? In other words, is the dependence among observations a concern in a traditional multinomial logistic regression model?
If the dependence among observations should be accounted for, what model should I use to analyze the data? Is there a good R package available for this purpose? (P.S. I've tried the mlogit package to build a multinomial mixed logit model, but it seems that this package cannot handle the type of data I have here.)
Thanks!
I am working on a two-way mixed ANOVA using the data below, with one dependent variable, one between-subjects variable and one within-subjects variable. When I tested the normality of the residuals of the dependent variable, I found that they are not normally distributed, but I was still able to run the two-way ANOVA. However, when I apply a log10 transformation and run the script again with the log-transformed variable, I get the error "contrasts can be applied only to factors with 2 or more levels".
> str(m_runjumpFREQ)
'data.frame': 564 obs. of 8 variables:
$ ID1 : int 1 2 3 4 5 6 7 8 9 10 ...
$ ID : chr "ID1" "ID2" "ID3" "ID4" ...
$ Group : Factor w/ 2 levels "II","Non-II": 1 1 1 1 1 1 1 1 1 1 ...
$ Pos : Factor w/ 3 levels "center","forward",..: 2 1 2 3 2 2 1 3 2 2 ...
$ Match_outcome : Factor w/ 2 levels "W","L": 2 2 2 2 2 2 2 2 2 1 ...
$ time : Factor w/ 8 levels "runjump_nADJmin_q1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ runjump : num 0.0561 0.0858 0.0663 0.0425 0.0513 ...
$ log_runjumpFREQ: num -1.25 -1.07 -1.18 -1.37 -1.29 ...
Some answers on StackOverflow to this error have mentioned that one or more factors in the data set used for the ANOVA have fewer than two levels. But as seen above, they do not.
Another explanation I have read is that missing values (NAs) may be the issue. There are some:
m1_nasum <- sum(is.na(m_runjumpFREQ$log_runjumpFREQ))
> m1_nasum
[1] 88
However, I get the same error even after removing the rows including NA's as follows.
> m_runjumpFREQ <- na.omit(m_runjumpFREQ)
> m1_nasum <- sum(is.na(m_runjumpFREQ$log_runjumpFREQ))
> m1_nasum
[1] 0
I can run the same script without the log transformation and it works, but with it I get the same error. The factors are the same either way and removing the missing values does not make a difference. Either I am making a crucial mistake or the issue is in the log transformation lines below.
log_runjumpFREQ <- log10(m_runjumpFREQ$runjump)
m_runjumpFREQ <- cbind(m_runjumpFREQ, log_runjumpFREQ)
I appreciate the help.
It is not enough that the factors have 2 or more levels; those levels must also actually be present in the data. For example, below f has 2 levels but only 1 is actually present.
y <- (1:6)^2
x <- 1:6
f <- factor(rep(1, 6), levels = 1:2)
nlevels(f) # f has 2 levels
## [1] 2
lm(y ~ x + f)
## Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
## contrasts can be applied only to factors with 2 or more levels
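One way to check for this, sketched with made-up data (the data frame `d` and its columns are hypothetical): after `na.omit()`, tabulate each factor to see which levels still have rows. `na.omit()` drops rows but keeps the original set of levels, so `nlevels()` and `str()` keep reporting the full number of levels.

```r
# Hypothetical data: every row of group "b" has a missing response
d <- data.frame(
  y = c(1.2, 0.8, 1.5, NA, NA, NA),
  g = factor(c("a", "a", "a", "b", "b", "b"))
)

d2 <- na.omit(d)   # drops the rows with NA, but keeps the level "b"
nlevels(d2$g)      # still reports 2 levels
table(d2$g)        # counts per level; "b" has no rows left

# Number of levels actually present in each factor column
present <- sapply(Filter(is.factor, d2), function(f) nlevels(droplevels(f)))
present
```

Any factor with fewer than 2 levels present would trigger the contrasts error when used in a model formula.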
I have a data.frame, df:
> str(df_ss)
'data.frame': 571 obs. of 4 variables:
$ final_grade : num 0.733 0.187 0.502 0.194 0.293 ...
$ time_spent : num -0.2 -0.326 -0.709 -0.168 -0.254 ...
$ gender_female: num 1 0 1 0 0 0 1 1 1 1 ...
$ course_ID : Factor w/ 26 levels "1","2","3","4",..: 14 18 13 21 24 15 3 24 9 13 ...
I am trying to see how time_spent moderates the relationship between gender_female and final_grade. I'm specifying a random effect for course_ID.
The models I specified are as follows:
med.fit <- lme4::lmer(time_spent ~ gender_female + (1|course_ID), data = df)
out.fit <- lme4::lmer(final_grade ~ time_spent + gender_female + (1|course_ID), data = df_ss)
Those seemed to work fine.
Following an example using the lme4 package in a vignette for the mediation package, I specified this mediation model:
library(mediation)
med.out <- mediate(med.fit, out.fit, treat = "gender_female", mediator = "time_spent", dropobs = TRUE)
This led to this error output: Error in mediate(med.fit, out.fit, treat = "gender_female", mediator = "time_spent",: mediator model is not yet implemented.
Per this mailing list question (and answer), I checked that:
inherits(mediatorModel, "merMod") returned TRUE and
getCall(mediatorModel)[[1]] returned lme4::lmer
Instead of lme4::lmer, you might try loading lme4 using library(lme4), then just call lmer. Looking at the mediate code shows that the error-handling checks are looking for an exact match for lmer i.e. getCall(model.m)[[1]] == "lmer".
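To see why the namespace prefix matters, here is a small base-R sketch. It builds the two calls with `quote()` instead of fitting models, so the formula and `d` are placeholders, and the comparison mirrors the check described above without reproducing the mediation source code.

```r
# Unevaluated calls standing in for what getCall() would return
call_ns    <- quote(lme4::lmer(y ~ x + (1 | g), data = d))
call_plain <- quote(lmer(y ~ x + (1 | g), data = d))

# First element of each call: the function as it was written
fn_ns    <- call_ns[[1]]     # the call `lme4::lmer`, not a bare symbol
fn_plain <- call_plain[[1]]  # the symbol `lmer`

# Language objects are deparsed to strings before comparison
fn_ns == "lmer"     # FALSE: deparses to "lme4::lmer"
fn_plain == "lmer"  # TRUE
```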
I am using the effects package in R to plot the effects of categorical and numerical predictors in a binomial logistic regression estimated using the lme4 package. My dependent variable is the presence or absence of a virus in an individual animal, and my predictors are various individual traits (e.g., sex, age, month/year captured, presence of parasites, scaled mass index (SMI)), with site as a random effect.
When I use the allEffects function on my regression, I get the plots below. When compared to the model summary output below, you can see that the slope of each line appears to be zero, regardless of the estimated coefficients, and there is something strange going on with the scale of the y-axes where the ticks and tick labels appear to be overwritten on the same point.
Here is my code for the model and the summary output:
library(lme4)
library(effects)
virus1.mod <- glmer(virus1 ~ age + sex + month.yr + parasite + SMI + (1 | site), data = virus1data, family = binomial)
virus1.effects<-allEffects(virus1.mod)
plot(virus1.effects, ylab="Probability(infected)", rug=FALSE)
> summary(virus1.mod)
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
Family: binomial ( logit )
Formula: virus1 ~ age + sex + month.yr + parasite + SMI + (1 | site)
Data: virus1data
AIC BIC logLik deviance
189.5721 248.1130 -76.7860 153.5721
Random effects:
Groups Name Variance Std.Dev.
site (Intercept) 4.729e-10 2.175e-05
Number of obs: 191, groups: site, 6
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.340e+00 2.572e+00 2.076 0.03789 *
ageJ 1.126e+00 8.316e-01 1.354 0.17583
sexM -3.943e-02 4.562e-01 -0.086 0.93113
month.yrFeb-08 -2.259e+01 6.405e+04 0.000 0.99972
month.yrFeb-09 -2.201e+01 2.741e+04 -0.001 0.99936
month.yrJan-08 -2.516e+00 8.175e-01 -3.078 0.00208 **
month.yrJan-09 -2.607e+00 8.066e-01 -3.232 0.00123 **
month.yrJul-08 -1.428e+00 8.571e-01 -1.666 0.09563 .
month.yrJul-09 -2.795e+00 1.170e+00 -2.389 0.01691 *
month.yrJun-08 -2.259e+01 3.300e+04 -0.001 0.99945
month.yrMar-09 -5.451e-01 6.705e-01 -0.813 0.41622
month.yrMar-08 -1.863e+00 7.921e-01 -2.352 0.01869 *
month.yrMay-09 -6.319e-01 8.956e-01 -0.706 0.48047
month.yrMay-08 3.818e-01 1.015e+00 0.376 0.70691
month.yrSep-08 2.563e+01 5.806e+05 0.000 0.99996
parasiteTRUE -6.329e-03 4.834e-01 -0.013 0.98955
SMI -3.438e-01 1.616e-01 -2.127 0.03342 *
And str of my data frame:
> str(virus1data)
'data.frame': 191 obs. of 8 variables:
$ virus1 : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 1 1 ...
$ age : Factor w/ 2 levels "A","J": 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "F","M": 2 2 2 2 1 1 2 1 2 2 ...
$ site : Factor w/ 6 levels "site1","site2","site3",..: 1 1 1 1 2 2 2 3 2 3 ...
$ rep : Factor w/ 7 levels "NRF","L","NR",..: 3 7 3 7 1 1 3 1 7 7 ...
$ month.yr : Factor w/ 17 levels "Feb-08","Feb-09",..: 4 5 5 5 13 7 14 9 9 9 ...
$ parasite : Factor w/ 2 levels "FALSE","TRUE": 1 1 2 1 1 2 2 1 2 1 ...
$ SMI : num 14.1 14.8 14.5 13.1 15.3 ...
- attr(*, "na.action")=Class 'omit' Named int [1:73] 6 12 13 21 22 23 24 25 26 27 ...
.. ..- attr(*, "names")= chr [1:73] "1048" "1657" "1866" "2961" ...
Without making my actual data available, does anyone have an idea of what might be causing this? I have used this function with a different dataset (same independent variables but a different virus as the response variable, and different records) without problems.
This is the first time I have posted on CV, so I hope that the question is appropriate and that I have provided enough (and the right) information.
I'm re-running Kaplan-Meier Survival Curves from previously published data, using the exact data set used in the publication (Charpentier et al. 2008 - Inbreeding depression in ring-tailed lemurs (Lemur catta): genetic diversity predicts parasitism, immunocompetence, and survivorship). This publication ran the curves in SAS Version 9, using LIFETEST, to analyze the age at death structured by genetic heterozygosity and sex of the animal (n=64). She reports a Chi square value of 6.31 and a p value of 0.012; however, when I run the curves in R, I get a Chi square value of 0.9 and a p value of 0.821. Can anyone explain this??
R code used: age is the time to death, mort1 is the censoring indicator, sex is the stratum, and ho2 is the factor delineating the two groups to be compared.
> survdiff(Surv(age, mort1)~ho2+sex,data=mariekmsurv1)
Call:
survdiff(formula = Surv(age, mort1) ~ ho2 + sex, data = mariekmsurv1)
N Observed Expected (O-E)^2/E (O-E)^2/V
ho2=1, sex=F 18 3 3.23 0.0166 0.0215
ho2=1, sex=M 12 3 2.35 0.1776 0.2140
ho2=2, sex=F 17 5 3.92 0.3004 0.4189
ho2=2, sex=M 17 4 5.50 0.4088 0.6621
Chisq= 0.9 on 3 degrees of freedom, p= 0.821
> str(mariekmsurv1)
'data.frame': 64 obs. of 6 variables:
$ id : Factor w/ 65 levels "","aeschylus",..: 14 31 33 30 47 57 51 39 36 3 ...
$ sex : Factor w/ 3 levels "","F","M": 3 2 3 2 2 2 2 2 2 2 ...
$ mort1: int 0 0 0 0 0 0 0 0 0 0 ...
$ age : num 0.12 0.192 0.2 0.23 1.024 ...
$ sex.1: Factor w/ 3 levels "","F","M": 3 2 3 2 2 2 2 2 2 2 ...
$ ho2 : int 1 1 1 2 1 1 1 1 1 2 ...
- attr(*, "na.action")=Class 'omit' Named int [1:141] 65 66 67 68 69 70 71 72 73 74 ...
.. ..- attr(*, "names")= chr [1:141] "65" "66" "67" "68" ...
Some ideas:
Try running it in SAS -- see if you get the same results as the author. Maybe they didn't send you the exact same dataset they used.
Look into the default values of the relevant SAS PROC and compare to the defaults of the R function you are using.
Given the HUGE difference between the chi-squared values (6.31 and 0.9) and p-values (0.012 and 0.821) from the SAS and R survival analyses, I suspect that the wrong variables were used in one of the two procedures.
Procedural and data-handling differences between SAS and R can cause only very small differences.
This is not a software error; it is very likely a human error.
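One thing worth checking in the R call: the output in the question has 3 degrees of freedom, which means `ho2 + sex` was treated as four groups to compare jointly, not as `ho2` compared within strata of sex. A minimal sketch with simulated data (the data frame `sim` and its values are made up, not the lemur data) shows the two formulations with the survival package. Whether the stratified form matches what the SAS analysis did is an assumption to verify against the paper.

```r
library(survival)

set.seed(1)
sim <- data.frame(
  age   = rexp(64, rate = 0.1),               # time to death or censoring
  mort1 = rbinom(64, size = 1, prob = 0.3),   # 1 = died, 0 = censored
  ho2   = sample(1:2, 64, replace = TRUE),    # two heterozygosity groups
  sex   = factor(sample(c("F", "M"), 64, replace = TRUE))
)

# ho2 and sex crossed: a joint test across the four ho2-by-sex groups
fit_groups <- survdiff(Surv(age, mort1) ~ ho2 + sex, data = sim)

# ho2 compared within strata of sex: a test across the two ho2 groups
fit_strat <- survdiff(Surv(age, mort1) ~ ho2 + strata(sex), data = sim)

length(fit_groups$n)  # number of groups compared in the crossed test
length(fit_strat$n)   # number of groups compared in the stratified test
```

The degrees of freedom of each test are one less than the number of groups compared, so the two calls answer different questions even on the same data.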