How do I extract lmer fixed effects by observation? - r

I have a lme object, constructed from some repeated measures nutrient intake data (two 24-hour intake periods per RespondentID):
Male.lme2 <- lmer(BoxCoxXY ~ -1 + AgeFactor + IntakeDay + (1|RespondentID),
data = Male.Data,
weights = SampleWeight)
and I can successfully retrieve the random effects by RespondentID using ranef(Male.lme1). I would also like to collect the result of the fixed effects by RespondentID. coef(Male.lme1) does not provide exactly what I need, as I show below.
> summary(Male.lme1)
Linear mixed model fit by REML
Formula: BoxCoxXY ~ AgeFactor + IntakeDay + (1 | RespondentID)
Data: Male.Data
AIC BIC logLik deviance REMLdev
9994 10039 -4990 9952 9980
Random effects:
Groups Name Variance Std.Dev.
RespondentID (Intercept) 0.19408 0.44055
Residual 0.37491 0.61230
Number of obs: 4498, groups: RespondentID, 2249
Fixed effects:
Estimate Std. Error t value
(Intercept) 13.98016 0.03405 410.6
AgeFactor4to8 0.50572 0.04084 12.4
AgeFactor9to13 0.94329 0.04159 22.7
AgeFactor14to18 1.30654 0.04312 30.3
IntakeDayDay2Intake -0.13871 0.01809 -7.7
Correlation of Fixed Effects:
(Intr) AgFc48 AgF913 AF1418
AgeFactr4t8 -0.775
AgeFctr9t13 -0.761 0.634
AgFctr14t18 -0.734 0.612 0.601
IntkDyDy2In -0.266 0.000 0.000 0.000
I have appended the fitted results to my data, head(Male.Data) shows
NutrientID RespondentID Gender Age SampleWeight IntakeDay IntakeAmt AgeFactor BoxCoxXY lmefits
2 267 100020 1 12 0.4952835 Day1Intake 12145.852 9to13 15.61196 15.22633
7 267 100419 1 14 0.3632839 Day1Intake 9591.953 14to18 15.01444 15.31373
8 267 100459 1 11 0.4952835 Day1Intake 7838.713 9to13 14.51458 15.00062
12 267 101138 1 15 1.3258785 Day1Intake 11113.266 14to18 15.38541 15.75337
14 267 101214 1 6 2.1198688 Day1Intake 7150.133 4to8 14.29022 14.32658
18 267 101389 1 5 2.1198688 Day1Intake 5091.528 4to8 13.47928 14.58117
The first couple of lines from coef(Male.lme1) are:
$RespondentID
(Intercept) AgeFactor4to8 AgeFactor9to13 AgeFactor14to18 IntakeDayDay2Intake
100020 14.28304 0.5057221 0.9432941 1.306542 -0.1387098
100419 14.00719 0.5057221 0.9432941 1.306542 -0.1387098
100459 14.05732 0.5057221 0.9432941 1.306542 -0.1387098
101138 14.44682 0.5057221 0.9432941 1.306542 -0.1387098
101214 13.82086 0.5057221 0.9432941 1.306542 -0.1387098
101389 14.07545 0.5057221 0.9432941 1.306542 -0.1387098
To demonstrate how the coef results relate to the fitted estimates in Male.Data (which were grabbed using Male.Data$lmefits <- fitted(Male.lme1)), for the first RespondentID, who has the AgeFactor level 9to13:
- the fitted value is 15.22633, which equals, from the coefficients, (Intercept) + (AgeFactor9to13) = 14.28304 + 0.9432941
Is there a clever command that will do what I want automatically, which is to extract the fixed effect estimate for each subject? Or am I faced with a series of if statements that apply the correct AgeFactor level to each subject to get the correct fixed effect estimate, after deducting the random effect contribution from the Intercept?
Update, apologies, was trying to cut down on the output I was providing and forgot about str(). Output is:
>str(Male.Data)
'data.frame': 4498 obs. of 11 variables:
$ NutrientID : int 267 267 267 267 267 267 267 267 267 267 ...
$ RespondentID: Factor w/ 2249 levels "100020","100419",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Gender : int 1 1 1 1 1 1 1 1 1 1 ...
$ Age : int 12 14 11 15 6 5 10 2 2 9 ...
$ BodyWeight : num 51.6 46.3 46.1 63.2 28.4 18 38.2 14.4 14.6 32.1 ...
$ SampleWeight: num 0.495 0.363 0.495 1.326 2.12 ...
$ IntakeDay : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
$ IntakeAmt : num 12146 9592 7839 11113 7150 ...
$ AgeFactor : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
$ BoxCoxXY : num 15.6 15 14.5 15.4 14.3 ...
$ lmefits : num 15.2 15.3 15 15.8 14.3 ...
The BodyWeight and Gender aren't being used (this is the males data, so all the Gender values are the same) and the NutrientID is similarly fixed for the data.
I have been doing horrible ifelse statements since I posted, so will try out your suggestion immediately. :)
Update2: this works perfectly with my current data and should be future-proof for new data, thanks to DWin for the extra help in the comment for this. :)
AgeLevels <- length(unique(Male.Data$AgeFactor))
Temp <- as.data.frame(fixef(Male.lme1)['(Intercept)'] +
c(0,fixef(Male.lme1)[2:AgeLevels])[
match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18", "19to30","31to50","51to70","71Plus") )] +
c(0,fixef(Male.lme1)[(AgeLevels+1)])[
match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )])
names(Temp) <- c("FxdEffct")

Below is how I've always found it easiest to extract the individuals' fixed effects and random effects components in the lme4-package. It actually extracts the corresponding fit to each observation. Assuming we have a mixed-effects model of form:
y = Xb + Zu + e
where Xb are the fixed effects and Zu are the random effects, we can extract the components (using lme4's sleepstudy as an example):
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
# Xb
fix <- getME(fm1,'X') %*% fixef(fm1)
# Zu (getME's 'b' is ordered to match the columns of Z; unlist(ranef()) is not guaranteed to be)
ran <- as.matrix(getME(fm1,'Z') %*% getME(fm1,'b'))
# Xb + Zu
fixran <- fix + ran
I know that this works as a generalized approach to extracting components from linear mixed-effects models. For non-linear models, the model matrix X contains repeats and you may have to tailor the above code a bit. Here's some validation output as well as a visualization using lattice:
> head(cbind(fix, ran, fixran, fitted(fm1)))
[,1] [,2] [,3] [,4]
[1,] 251.4051 2.257187 253.6623 253.6623
[2,] 261.8724 11.456439 273.3288 273.3288
[3,] 272.3397 20.655691 292.9954 292.9954
[4,] 282.8070 29.854944 312.6619 312.6619
[5,] 293.2742 39.054196 332.3284 332.3284
[6,] 303.7415 48.253449 351.9950 351.9950
# Xb + Zu
> all(round((fixran),6) == round(fitted(fm1),6))
[1] TRUE
# e = y - (Xb + Zu)
> all(round(resid(fm1),6) == round(sleepstudy[,"Reaction"]-(fixran),6))
[1] TRUE
nobs <- 10 # 10 observations per subject
legend = list(text=list(c("y", "Xb + Zu", "Xb")), lines = list(col=c("blue", "red", "black"), pch=c(1,1,1), lwd=c(1,1,1), type=c("b","b","b")))
require(lattice)
xyplot(
Reaction ~ Days | Subject, data = sleepstudy,
panel = function(x, y, ...){
panel.points(x, y, type='b', col='blue')
panel.points(x, fix[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='black')
panel.points(x, fixran[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='red')
},
key = legend
)
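The decomposition y = Xb + Zu + e used above can also be checked by hand without fitting anything. Below is a minimal base-R sketch on a made-up design: the data frame d, the fixed effects b and the random intercepts u are all invented for illustration and are not taken from sleepstudy.

```r
# Hypothetical toy design: 3 subjects, 2 observations each (all values made up)
d <- data.frame(Subject = factor(rep(c("s1", "s2", "s3"), each = 2)),
                Days    = rep(c(0, 1), times = 3))

X <- model.matrix(~ Days, data = d)         # fixed-effects design: intercept + Days
Z <- model.matrix(~ 0 + Subject, data = d)  # random-intercept indicator matrix

b <- c(250, 10)     # assumed fixed effects (intercept, slope)
u <- c(5, -3, -2)   # assumed random intercepts, one per subject

fix    <- as.vector(X %*% b)  # Xb: population-level part
ran    <- as.vector(Z %*% u)  # Zu: subject-specific part
fixran <- fix + ran           # Xb + Zu: conditional fitted value

cbind(d, fix, ran, fixran)
```

Each row of Z picks out exactly one element of u, which is why the order of the random-effects vector has to match the column order of Z.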

It is going to be something like this (although you really should have given us the results of str(Male.Data) because model output does not tell us the factor levels for the baseline values:)
#First look at the coefficients
fixef(Male.lme2)
#Then do the calculations
fixef(Male.lme2)["(Intercept)"] +
c(0,fixef(Male.lme2)[2:4])[
match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18") )] +
c(0,fixef(Male.lme2)[5])[
match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )]
You are basically running the original data through a match function to pick the correct coefficient(s) to add to the intercept ... which will be 0 if the data is the factor's base level (whose spelling I am guessing at.)
EDIT: I just noticed that you put a "-1" in the formula, so perhaps all of your AgeFactor terms are listed in the output, and you can take out the 0 in the coefficient vector and the invented AgeFactor level in the match table vector.
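The same lookup can be written without spelling out the factor levels by letting model.matrix() build the dummy columns. This is only a sketch under assumptions: the toy data frame below is invented to mirror the question's factor levels (it is not Male.Data), and the coefficients are copied from the summary output in the question.

```r
# Toy rows mirroring the question's factor levels (not the real Male.Data)
toy <- data.frame(
  AgeFactor = factor(c("9to13", "14to18", "4to8", "1to3"),
                     levels = c("1to3", "4to8", "9to13", "14to18")),
  IntakeDay = factor(c("Day1Intake", "Day1Intake", "Day2Intake", "Day2Intake"),
                     levels = c("Day1Intake", "Day2Intake"))
)

# Fixed-effect estimates copied from the summary output in the question
beta <- c("(Intercept)"       = 13.98016,
          AgeFactor4to8       = 0.50572,
          AgeFactor9to13      = 0.94329,
          AgeFactor14to18     = 1.30654,
          IntakeDayDay2Intake = -0.13871)

# Design matrix with the default treatment contrasts
X <- model.matrix(~ AgeFactor + IntakeDay, data = toy)
stopifnot(identical(colnames(X), names(beta)))  # guard against misaligned columns

# One fixed-effect value per observation
toy$FxdEffct <- as.vector(X %*% beta)
toy
```

With a fitted lme4 model, model.matrix(fit) %*% fixef(fit) or predict(fit, re.form = NA) should return the same per-observation values without any manual matching; treat that as a pointer to try rather than something verified against the data in the question.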

Related

Error in MEEM(object, conLin, control$niterEM) in lme function

I'm trying to apply the lme function to my data, but the model gives the following message:
mod.1 = lme(lon ~ sex + month2 + bat + sex*month2, random=~1|id, method="ML", data = AA_patch_GLM, na.action=na.exclude)
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
dput for data, copy from https://pastebin.com/tv3NvChR (too large to include here)
str(AA_patch_GLM)
'data.frame': 2005 obs. of 12 variables:
$ lon : num -25.3 -25.4 -25.4 -25.4 -25.4 ...
$ lat : num -51.9 -51.9 -52 -52 -52 ...
$ id : Factor w/ 12 levels "24641.05","24642.03",..: 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
$ bat : int -3442 -3364 -3462 -3216 -3216 -2643 -2812 -2307 -2131 -2131 ...
$ year : chr "2005" "2005" "2005" "2005" ...
$ month : chr "12" "12" "12" "12" ...
$ patch_id: Factor w/ 45 levels "111870.17_1",..: 34 34 34 34 34 34 34 34 34 34 ...
$ YMD : Date, format: "2005-12-30" "2005-12-31" "2005-12-31" ...
$ month2 : Ord.factor w/ 7 levels "January"<"February"<..: 7 7 7 7 7 1 1 1 1 1 ...
$ lonsc : num [1:2005, 1] -0.209 -0.213 -0.215 -0.219 -0.222 ...
$ batsc : num [1:2005, 1] 0.131 0.179 0.118 0.271 0.271 ...
What's the problem?
I saw a solution that uses the lme4::lmer function, but is there another option that lets me keep using the lme function?
The problem is that you have collinear combinations of predictors. In particular, here are some diagnostics:
## construct the fixed-effect model matrix for your problem
X <- model.matrix(~ sex + month2 + bat + sex*month2, data = AA_patch_GLM)
lc <- caret::findLinearCombos(X)
colnames(X)[lc$linearCombos[[1]]]
## [1] "sexM:month2^6" "(Intercept)" "sexM" "month2.L"
## [5] "month2.C" "month2^4" "month2^5" "month2^6"
## [9] "sexM:month2.L" "sexM:month2.C" "sexM:month2^4" "sexM:month2^5"
This is in a weird order, but it suggests that the sex × month interaction is causing problems. Indeed:
with(AA_patch_GLM, table(sex, month2))
## sex January February March April May June December
## F 367 276 317 204 43 0 6
## M 131 93 90 120 124 75 159
shows that you're missing data for one sex/month combination (i.e., no females were sampled in June).
You can:
- Construct the sex/month interaction yourself (data$SM <- with(data, interaction(sex, month2, drop = TRUE))) and use ~ SM + bat, but then you'll have to sort out main effects and interactions yourself (ugh).
- Construct the model matrix by hand (as above), drop the redundant column(s), then include all the resulting columns in the model:
d2 <- with(AA_patch_GLM,
data.frame(lon,
as.data.frame(X),
id))
## drop linearly dependent column
## note data.frame() has "sanitized" variable names (:, ^ both converted to .)
d2 <- d2[names(d2) != "sexM.month2.6"]
lme(reformulate(colnames(d2)[2:15], response = "lon"),
random=~1|id, method="ML", data = d2)
Again, the results will be uglier than the simpler version of the model.
- Use a patched version of nlme (I submitted a patch here but it hasn't been considered):
remotes::install_github("bbolker/nlme")
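If caret is not available, the same kind of rank deficiency can be spotted with base R alone. The sketch below uses made-up data with one empty sex/month cell; the data frame d and its levels are invented for illustration and are not the AA_patch_GLM data.

```r
# Made-up data with one empty cell (no "F" in "Jun"), mimicking the situation above
d <- expand.grid(sex = c("F", "M"), month = c("Jan", "Feb", "Jun"), rep = 1:3,
                 stringsAsFactors = TRUE)
d <- d[!(d$sex == "F" & d$month == "Jun"), ]

# The cross-tabulation shows the empty cell directly
table(d$sex, d$month)

# Fixed-effect model matrix and its numerical rank
X <- model.matrix(~ sex * month, data = d)
qrX <- qr(X)
n_redundant <- ncol(X) - qrX$rank   # > 0 means collinear columns

# Base R's default QR moves dependent columns to the end of the pivot vector
aliased <- colnames(X)[qrX$pivot[-seq_len(qrX$rank)]]
aliased
```

A non-zero n_redundant says the design cannot identify every coefficient; the names in aliased are candidates to drop before handing the remaining columns to lme.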

Inflection point for binomial mixed GLM model

I'd like to explore some possibilities and compare approaches for calculating the inflection point of a binomial mixed GLM. I found the inflection package, which uses the Extremum Surface Estimator (ESE) and the Extremum Distance Estimator (EDE). Here is what I did:
library(inflection)
library(dplyr)
library(glmmTMB)
library(DHARMa)
library(ggplot2)
library(ggeffects)
# My binomial data set
binom.ds <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/mort_binon.csv")
str(binom.ds)
# 'data.frame': 400 obs. of 4 variables:
# $ temp : num 0 0 0 0 0 0 0 0 0 0 ...
# $ days : int 5 5 5 5 5 5 5 5 5 5 ...
# $ rep : chr "r1" "r2" "r3" "r4" ...
# $ mortality: int 0 1 1 1 1 1 1 1 0 1 ...
# Fit a binomial mixed GLM model
m_F <- glmmTMB(mortality ~ temp + days +
(1 | days ), data = binom.ds,
family = "binomial")
# Check the fitted model using DHARMa
plot(s1 <- simulateResiduals(m_F))
# Everything looks OK
# Find an inflection point
# for temp
ds_F <- cbind(x=binom.ds$temp,y=exp(predict(m_F)))
ds_F<-as.data.frame(ds_F)
bb=bede(ds_F$x,ds_F$y,0);bb
bb$iplast
# [1] 12.5
# $iters
# n a b EDE
# 1 400 0 25 12.5
# Visualize the inflection point for temp
ggpredict(m_F, terms = "temp [all]") %>% plot(add.data = TRUE) + geom_vline(xintercept = bb$iplast, colour="red", linetype = "longdash")
#for days
ds_F <- cbind(x=binom.ds$days,y=exp(predict(m_F)))
ds_F<-as.data.frame(ds_F)
bb2=bede(ds_F$x,ds_F$y,0);bb2
bb2$iplast
# [1] 22.5
# $iters
# n a b EDE
# 1 400 5 30 17.5
# 2 221 5 30 17.5
# 3 181 15 5 10.0
# 4 61 15 30 22.5
# Visualize the inflection point for days
ggpredict(m_F, terms = "days [all]") %>% plot(add.data = TRUE) + geom_vline(xintercept = bb2$iplast, colour="red", linetype = "longdash")
My question: are there other approaches or packages for this calculation?
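One analytic alternative, offered as a sketch rather than as a replacement for the ESE/EDE estimators: with a logit link, the fitted probability is a logistic function of the linear predictor, and that curve has its inflection where the linear predictor is 0 (probability 0.5). The coefficients below are hypothetical placeholders, not estimates from m_F, and the random effect is ignored.

```r
# Hypothetical logit-scale coefficients (NOT estimates from the question's model)
b0     <- -4     # intercept
b_temp <- 0.3    # slope for temp
b_days <- 0.05   # slope for days
days_fixed <- 10 # value at which days is held constant

# plogis(eta) has its inflection at eta = 0, so solve b0 + b_temp*temp + b_days*days = 0
temp_infl <- -(b0 + b_days * days_fixed) / b_temp

# Numerical check: the increase in probability per grid step peaks at the inflection
temp_grid <- seq(0, 25, by = 0.01)
p <- plogis(b0 + b_temp * temp_grid + b_days * days_fixed)
temp_numeric <- temp_grid[which.max(diff(p))]

c(analytic = temp_infl, numeric = temp_numeric)
```

With a real fit, the coefficients would come from the model's fixed effects, and the answer depends on the value chosen for the other covariate.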

clmm model summary does not show p-values

I run the following model in R:
clmm_br<-clmm(Grado_amenaza~Life_Form + size_max_cm +
leaf_length_mean + petals_length_mean +
silicua_length_mean + bloom_length + categ_color+ (1|Genero) ,
data=brasic1)
I didn't get any warnings or errors but when I run summary(clmm_br) I can't get the p-values:
summary(clmm_br)
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: Grado_amenaza ~ Life_Form + size_max_cm + leaf_length_mean +
petals_length_mean + silicua_length_mean + bloom_length +
categ_color + (1 | Genero)
data: brasic1
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 76 -64.18 160.36 1807(1458) 1.50e-03 NaN
Random effects:
Groups Name Variance Std.Dev.
Genero (Intercept) 0.000000008505 0.00009222
Number of groups: Genero 39
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Life_Form[T.G] 2.233338 NA NA NA
Life_Form[T.Hem] 0.577112 NA NA NA
Life_Form[T.Hyd] -22.632916 NA NA NA
Life_Form[T.Th] -1.227512 NA NA NA
size_max_cm 0.006442 NA NA NA
leaf_length_mean 0.008491 NA NA NA
petals_length_mean 0.091623 NA NA NA
silicua_length_mean -0.036001 NA NA NA
bloom_length -0.844697 NA NA NA
categ_color[T.2] -2.420793 NA NA NA
categ_color[T.3] 1.268585 NA NA NA
categ_color[T.4] 1.049953 NA NA NA
Threshold coefficients:
Estimate Std. Error z value
1|3 -1.171 NA NA
3|4 1.266 NA NA
4|5 1.800 NA NA
(4 observations deleted due to missingness)
I tried with no random effects and excluding the rows with NAs but it's the same.
The structure of my data:
str(brasic1)
tibble[,13] [80 x 13] (S3: tbl_df/tbl/data.frame)
$ ID : num [1:80] 135 137 142 145 287 295 585 593 646 656 ...
$ Genero : chr [1:80] "Alyssum" "Alyssum" "Alyssum" "Alyssum" ...
$ Cons.stat : chr [1:80] "LC" "VU" "VU" "LC" ...
$ Amenazada : num [1:80] 0 1 1 0 1 0 0 1 0 0 ...
$ Grado_amenaza : Factor w/ 5 levels "1","3","4","5",..: 1 2 2 1 4 1 1 2 1 1 ...
$ Life_Form : chr [1:80] "Th" "Hem" "Hem" "Th" ...
$ size_max_cm : num [1:80] 12 6 7 15 20 27 60 62 50 60 ...
$ leaf_length_mean : num [1:80] 7.5 7 11 14.5 31.5 45 90 65 65 39 ...
$ petals_length_mean : num [1:80] 2.2 3.5 5.5 2.55 6 8 10.5 9.5 9.5 2.9 ...
$ silicua_length_mean: num [1:80] 3.5 4 3.5 4 25 47.5 37.5 41.5 17.5 2.9 ...
$ X2n : num [1:80] 32 NA 16 16 NA NA 20 20 18 14 ...
$ bloom_length : num [1:80] 2 1 2 2 2 2 2 2 11 2 ...
$ categ_color : chr [1:80] "1" "4" "4" "4" ...
For a full answer we really need a reproducible example, but I can point to a few things that raise suspicions.
The fact that you can get estimates, but not standard errors, implies that there is something wrong with the Hessian (the estimate of the curvature of the log-likelihood surface at the maximum likelihood estimate), but there are several distinct (possibly overlapping) possibilities:
- Any time you have a "large" parameter estimate (say, absolute value > 10), as in your example (Life_Form[T.Hyd] = -22.632916), it suggests complete separation, i.e. the presence/absence of that parameter perfectly predicts the response. (You can search for that term, e.g. on CrossValidated.) However, complete separation usually leads to absurdly large standard errors (along with the large parameter estimates) rather than to NAs.
- You may have perfect multicollinearity, i.e. combinations of your predictor variables that are perfectly (jointly) correlated with other such combinations. Some R estimation procedures can detect and deal with this case (typically by dropping one or more predictors), but clmm might not be able to. (You should be able to construct your model matrix (X <- model.matrix( your_formula, your_data), excluding the random effect from the formula) and then use caret::findLinearCombos(X) to explore this issue.)
More generally, if you want to do reliable inference you may need to cut down the size of your model (not by stepwise or other forms of model selection); a rule of thumb is that you need 10-20 observations per parameter estimated. You're trying to estimate 12 fixed effect parameters plus a few more (ordinal-threshold parameters and random effect variance) from 80 observations ...
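To put a number on that rule of thumb, here is the arithmetic using counts read off the summary(clmm_br) output in the question (76 observations were actually used after rows with missing values were dropped). The variable names are just labels for this sketch.

```r
# Counts read off the summary(clmm_br) output in the question
n_obs        <- 76   # nobs after 4 rows were deleted due to missingness
n_fixed      <- 12   # rows in the Coefficients table
n_thresholds <- 3    # threshold coefficients 1|3, 3|4, 4|5
n_ranef_var  <- 1    # variance of the Genero random intercept

n_params      <- n_fixed + n_thresholds + n_ranef_var
obs_per_param <- n_obs / n_params
obs_per_param   # 4.75, well below the 10-20 rule of thumb
```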
In addition to dropping random effects, it may help the diagnosis to fit a regular linear model with lm() (which should tell you something about collinearity, by dropping parameters) or a binomial model based on some threshold grade values (which might help with identifying complete separation).
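A quick way to look for complete separation before fitting anything is to cross-tabulate each categorical predictor against a thresholded response and look for empty cells. The sketch below uses invented data; the data frame d and the column threatened are hypothetical and only mimic the shape of the problem.

```r
# Made-up data: level "Hyd" only ever occurs with response 0
d <- data.frame(
  Life_Form  = factor(c("Ch", "Ch", "Ch", "G", "G", "G",
                        "Hyd", "Hyd", "Th", "Th", "Th")),
  threatened = c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1)
)

# Rows = predictor levels, columns = response categories
tab <- table(d$Life_Form, d$threatened)
tab

# Levels with a zero count in some response category are separation candidates
suspect <- rownames(tab)[apply(tab == 0, 1, any)]
suspect
```

A level flagged here would be expected to get a very large positive or negative estimate on the logit scale, which is the pattern described above.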

Interpreting categorical variable importance in logistic regression

I'm using the caret package in R to build a logistic regression model for binary classification and one of my predictors is a categorical variable with 4 levels. Below is my code.
> mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
> mydata$admit <- factor(mydata$admit)
> mydata$rank <- factor(mydata$rank)
> str(mydata)
'data.frame': 400 obs. of 4 variables:
$ admit: Factor w/ 2 levels "0","1": 1 2 2 2 1 2 2 1 2 1 ...
$ gre : int 380 660 800 640 520 760 560 400 540 700 ...
$ gpa : num 3.61 3.67 4 3.19 2.93 3 2.98 3.08 3.39 3.92 ...
$ rank : Factor w/ 4 levels "1","2","3","4": 3 3 1 4 4 2 1 2 3 2 ...
> mymod <- train(admit ~ gre + gpa + rank, data=mydata, method="glm", family="binomial")
> summary(mymod)$coeff
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.989979073 1.139950936 -3.500132 0.0004650273
gre 0.002264426 0.001093998 2.069864 0.0384651284
gpa 0.804037549 0.331819298 2.423119 0.0153878974
rank2 -0.675442928 0.316489661 -2.134171 0.0328288188
rank3 -1.340203916 0.345306418 -3.881202 0.0001039415
rank4 -1.551463677 0.417831633 -3.713131 0.0002047107
> varImp(mymod)
glm variable importance
Overall
rank3 100.00
rank4 90.72
gpa 19.50
rank2 3.55
gre 0.00
My question is, how do I interpret varImp for the model, especially with respect to rank? Since R has assumed rank1 to be the baseline class, does varImp being highest for rank3 mean that admit is most different for the observations when rank is 3 in comparison with when rank is 1? If this is the case, it doesn't seem to tell the same story as the coefficients of the model, because rank4 has a steeper slope than rank3 even though it is of lower importance according to varImp.
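One way to see where these numbers come from: the varImp values shown above are reproduced by taking the absolute z values from the coefficient table and rescaling them to 0-100. The sketch below does that arithmetic with the z values copied from the question. It only checks that the output is consistent with this reading and makes no call into caret.

```r
# z values copied from summary(mymod)$coeff in the question (intercept omitted)
z <- c(gre = 2.069864, gpa = 2.423119, rank2 = -2.134171,
       rank3 = -3.881202, rank4 = -3.713131)

abs_z <- abs(z)

# Rescale so the smallest |z| maps to 0 and the largest to 100
imp <- 100 * (abs_z - min(abs_z)) / (max(abs_z) - min(abs_z))
round(sort(imp, decreasing = TRUE), 2)
```

On this reading, rank3 outranks rank4 because its estimate is larger relative to its standard error, not because its coefficient is larger in magnitude.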

Visualising logistic regression using the effects package in R

I am using the effects package in R to plot the effects of categorical and numerical predictors in a binomial logistic regression estimated using the lme4 package. My dependent variable is the presence or absence of a virus in an individual animal and my predictive factors are various individual traits (eg. sex, age, month/year captured, presence of parasites, scaled mass index (SMI), with site as a random variable).
When I use the allEffects function on my regression, I get the plots below. When compared to the model summary output below, you can see that the slope of each line appears to be zero, regardless of the estimated coefficients, and there is something strange going on with the scale of the y-axes where the ticks and tick labels appear to be overwritten on the same point.
Here is my code for the model and the summary output:
library(lme4)
library(effects)
virus1.mod<-glmer(virus1~ age + sex + month.yr + parasite + SMI + (1|site) , data=virus1data, family=binomial)
virus1.effects<-allEffects(virus1.mod)
plot(virus1.effects, ylab="Probability(infected)", rug=FALSE)
> summary(virus1.mod)
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
Family: binomial ( logit )
Formula: virus1 ~ age + sex + month.yr + parasite + SMI + (1 | site)
Data: virus1data
AIC BIC logLik deviance
189.5721 248.1130 -76.7860 153.5721
Random effects:
Groups Name Variance Std.Dev.
site (Intercept) 4.729e-10 2.175e-05
Number of obs: 191, groups: site, 6
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.340e+00 2.572e+00 2.076 0.03789 *
ageJ 1.126e+00 8.316e-01 1.354 0.17583
sexM -3.943e-02 4.562e-01 -0.086 0.93113
month.yrFeb-08 -2.259e+01 6.405e+04 0.000 0.99972
month.yrFeb-09 -2.201e+01 2.741e+04 -0.001 0.99936
month.yrJan-08 -2.516e+00 8.175e-01 -3.078 0.00208 **
month.yrJan-09 -2.607e+00 8.066e-01 -3.232 0.00123 **
month.yrJul-08 -1.428e+00 8.571e-01 -1.666 0.09563 .
month.yrJul-09 -2.795e+00 1.170e+00 -2.389 0.01691 *
month.yrJun-08 -2.259e+01 3.300e+04 -0.001 0.99945
month.yrMar-09 -5.451e-01 6.705e-01 -0.813 0.41622
month.yrMar-08 -1.863e+00 7.921e-01 -2.352 0.01869 *
month.yrMay-09 -6.319e-01 8.956e-01 -0.706 0.48047
month.yrMay-08 3.818e-01 1.015e+00 0.376 0.70691
month.yrSep-08 2.563e+01 5.806e+05 0.000 0.99996
parasiteTRUE -6.329e-03 4.834e-01 -0.013 0.98955
SMI -3.438e-01 1.616e-01 -2.127 0.03342 *
And str of my data frame:
> str(virus1data)
'data.frame': 191 obs. of 8 variables:
$ virus1 : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 1 1 ...
$ age : Factor w/ 2 levels "A","J": 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "F","M": 2 2 2 2 1 1 2 1 2 2 ...
$ site : Factor w/ 6 levels "site1","site2","site3",..: 1 1 1 1 2 2 2 3 2 3 ...
$ rep : Factor w/ 7 levels "NRF","L","NR",..: 3 7 3 7 1 1 3 1 7 7 ...
$ month.yr : Factor w/ 17 levels "Feb-08","Feb-09",..: 4 5 5 5 13 7 14 9 9 9 ...
$ parasite : Factor w/ 2 levels "FALSE","TRUE": 1 1 2 1 1 2 2 1 2 1 ...
$ SMI : num 14.1 14.8 14.5 13.1 15.3 ...
- attr(*, "na.action")=Class 'omit' Named int [1:73] 6 12 13 21 22 23 24 25 26 27 ...
.. ..- attr(*, "names")= chr [1:73] "1048" "1657" "1866" "2961" ...
Without making my actual data available, does anyone have an idea of what might be causing this? I have used this function with a different dataset (same independent variables but a different virus as the response variable, and different records) without problems.
This is the first time I have posted on CV, so I hope that the question is appropriate and that I have provided enough (and the right) information.
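As a reading aid for the summary output above, logit-scale estimates can be turned into probabilities with plogis(). The sketch below uses estimates copied from the summary, holds the other factors at their reference levels, and plugs in SMI = 14.5 as an assumed typical value (roughly what the str output shows). It does not diagnose the plotting problem by itself, but it shows how extreme some month.yr levels are on the probability scale.

```r
# Estimates copied from summary(virus1.mod); SMI = 14.5 is an assumed typical value
b0    <- 5.340
b_smi <- -0.3438
smi   <- 14.5

# Selected month.yr effects on the logit scale (reference level has effect 0)
month_eff <- c(reference = 0, "Feb-08" = -22.59, "Jan-09" = -2.607, "Sep-08" = 25.63)

# Linear predictor with age, sex and parasite at their reference levels
eta <- b0 + b_smi * smi + month_eff

# Inverse logit: predicted probability of infection
p <- plogis(eta)
signif(p, 3)
```

Estimates near -22 or +25 with standard errors in the tens of thousands put the predicted probability essentially at 0 or 1, which is what one would expect if every animal captured in those months had the same outcome.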
