split block in R lmer - r

A factorial combination of 16 treatments (4*2*2) was replicated three times and laid out in a strip-split block design. The eight site preparations (4*2) were applied as whole-plot treatments, and two levels of weeding (weeding/no-weeding) were applied randomly to subplots. The analysis was run in Genstat, giving the following results:
Variate: result
Source of variation                   d.f.     s.s.     m.s.   v.r.  F pr.
Rep stratum                              2   35.735   17.868
Rep.Burning stratum
  Burning                                1    0.003    0.003   0.00  0.972
  Residual                               2    3.933    1.966   1.53
Rep.Site_prep stratum
  Site_prep                              3    7.981    2.660   0.45  0.727
  Residual                               6   35.477    5.913   4.61
Rep.Burning.Site_prep stratum
  Burning.Site_prep                      3    2.395    0.798   0.62  0.626
  Residual                               6    7.691    1.282   0.60
Rep.Burning.Site_prep.*Units* stratum
  Weeding                                1   13.113   13.113   6.13  0.025
  Burning.Weeding                        1    0.486    0.486   0.23  0.640
  Site_prep.Weeding                      3   17.703    5.901   2.76  0.076
  Burning.Site_prep.Weeding              3    3.425    1.142   0.53  0.666
  Residual                              16   34.248    2.141
Total                                   47  162.190
I want to reproduce these results in R. I used both the stats::aov function and the lmerTest::lmer function. I managed to get the correct results with aov using the formula
result ~ Burning * Weeding * Site.prep + Error(Rep/Burning*Site.prep). With lmer I used the formula
result ~ Burning*Site.prep*Weeding+(1|Rep/(Burning:Site.prep)), which gave me only partially correct results. The SS values and the F-values for Burning, Site.prep and Burning:Site.prep deviated (although not by much) from the Genstat results, but Weeding and the Weeding interactions gave the same SS and F-values as the Genstat output.
I would like to know how I should specify the lmer model to reproduce the Genstat and aov results.
Data and code below:
x <- structure(list(
Rep = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1", "2", "3"
), class = "factor"),Burning = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("Burn",
"No-burn"), class = "factor"), Site.prep = structure(c(4L, 4L,4L, 4L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L),
.Label = c("Chop_Pit", "Chop_Rip", "Pit", "Rip"), class = "factor"), Weeding = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L),
.Label = c("Weedfree", "Weedy"), class = "factor"),
Dbh14 = c(27.4, 28.4083333333333, 27.7066666666667, 27.3461538461538, 28.6, 28.3333333333333, 27.0909090909091,
27.8076923076923, 27.1833333333333, 27.5461538461538, 24.3076923076923,
29.3461538461538, 27.4, 25.1, 26.61, 28.0461538461538, 27.71,
25.2533333333333, 25.3833333333333, 24.2307692307692, 24.2533333333333,
24.95, 24.34375, 26.9909090909091, 24.775, 25.9076923076923,
25.1666666666667, 25.9933333333333, 27.0466666666667, 30.5625,
27.36, 25.2636363636364, 29.6846153846154, 27.7, 28.3071428571429,
29.4857142857143, 27.025, 30.1, 31.2454545454545, 24.2888888888889,
28.4875, 29.23, 30, 28.5, 29.3615384615385, 27.45, 28.8153846153846,
29.1866666666667)), .Names = c("Rep", "Burning", "Site.prep",
"Weeding", "result"), class = "data.frame", row.names = c(NA, -48L))
library(lmerTest)  # provides lmer() with F-tests and p-values in anova()
model1 <- aov(result ~ Burning * Weeding * Site.prep + Error(Rep/Burning*Site.prep), data = x)
summary(model1)
model2 <- lmer(result ~ Burning * Site.prep * Weeding + (1 | Rep/(Burning:Site.prep)), data = x)
anova(model2)

Applying the three-way split-plot-factorial ANOVA example from the site mentioned by @cuttlefish44 leads to:
library(lme4)
library(nlme)
m1 <- aov(result ~ Weeding*Burning*Site.prep + Error(Rep/Burning*Site.prep), data=x)
m2 <- lmer(result ~ Weeding*Burning*Site.prep + (1|Rep) + (1|Burning:Rep) +
(1|Site.prep:Rep), data=x)
m3 <- anova(lme(result ~ Weeding*Burning*Site.prep,
random=list(Rep=pdBlocked(list(~1, pdIdent(~Burning-1), pdIdent(~Site.prep-1)))),
method="ML", data=x))
summary(m1)
anova(m2)
m3
Except for Site.prep, the results match. Moreover, the results between lmer() and lme() are pretty similar (also for Site.prep). I'm not sure whether this is the result of differences in modelling approaches: the multi-level approach takes both within and between effects into account.
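One way to see why the original lmer() call only partly reproduces the Genstat table is to list the error strata that each specification implies. The sketch below uses base R's terms() to expand the formula placed inside Error() in the aov call. The comparison with lme4 relies on its expansion of (1 | a/b) into (1 | a) + (1 | a:b). The lmer() formula in the final comment is an untested suggestion under that assumption, not a confirmed solution.

```r
# Expand the right-hand side used inside Error() in the aov call.
# '/' and '*' have equal precedence, so this parses as (Rep/Burning)*Site.prep.
aov_strata <- attr(terms(~ Rep/Burning*Site.prep), "term.labels")
print(aov_strata)

# lme4 expands (1 | Rep/(Burning:Site.prep)) to
#   (1 | Rep) + (1 | Rep:(Burning:Site.prep)),
# i.e. only two grouping factors.
lmer_strata <- c("Rep", "Rep:Burning:Site.prep")

# Terms that aov uses as strata but the original lmer call leaves out.
missing_strata <- setdiff(aov_strata, lmer_strata)
print(missing_strata)

# Assumption (not verified against Genstat): a specification that mirrors the
# Genstat strata would be
#   result ~ Burning*Site.prep*Weeding +
#     (1 | Rep) + (1 | Rep:Burning) + (1 | Rep:Site.prep) +
#     (1 | Rep:Burning:Site.prep)
```

Even with matching strata, lmer() constrains variance components to be non-negative, so F-values can still differ from the ANOVA table when a stratum's residual mean square is smaller than that of the stratum below it (as for the Rep.Burning.Site_prep residual here, with v.r. 0.60).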

Related

calculating adjusted R-Squared in R

I have the following dataset and I would like to calculate the adjusted r-squared based on this dataset.
I have the formula for adjusted R-squared: Adjusted R2 = 1 - [(1 - R2)*(n - 1)/(n - k - 1)]
where:
R2: the R-squared
n: the number of observations, in this case the number of "DV.obs" values
k: the number of predictor variables, in this case "nParam" (which is either 0, 1, 2 or 3)
The R code to calculate it is below. It is grouped by "ITER" (iteration); there are 4 iterations, so the idea is to calculate one adjusted R-squared per iteration.
For iteration 1, nParam should only be 0; for iteration 2, nParam should only be 1, and so on. Since nParam is the same for every row within an iteration, it should be used once per iteration rather than once per row.
The output should have only 4 rows (one per iteration, as it is grouped by ITER) and 2 columns (R2 and adjusted R-squared), not one row for every row in the data.
I hope I have explained myself well.
library(dplyr)
ff <- df %>%
group_by(ITER) %>%
summarise(
Rsq = cor(x= DV.obs, y = DV.sim)^2,
adjRsq = 1 - ((1-Rsq)*(length(DV.obs)-1)/(length(DV.obs)- nParam - 1 ))
)
ff
However, this code evaluates the formula for every nParam value in the data, so it returns one row per observation instead of one row per iteration. The data:
df<-structure(list(CASE = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), ITER = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L), nParam = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
), DV.obs = c(0.101483807, 0.069196694, 0.053869542, 0.043831971,
0.030330271, 0.023612088, 0.01978679, 0.014310351, 0.01164389,
0.007267871, 0.004536453, 0.002873573, 0.002408037, 0.001417053,
0.001136154, 0.101483807, 0.069196694, 0.053869542, 0.043831971,
0.030330271, 0.023612088, 0.01978679, 0.014310351, 0.01164389,
0.007267871, 0.004536453, 0.002873573, 0.002408037, 0.001417053,
0.001136154, 0.101483807, 0.069196694, 0.053869542, 0.043831971,
0.030330271, 0.023612088, 0.01978679, 0.014310351, 0.01164389,
0.007267871, 0.004536453, 0.002873573, 0.002408037, 0.001417053,
0.001136154, 0.101483807, 0.069196694, 0.053869542, 0.043831971,
0.030330271, 0.023612088, 0.01978679, 0.014310351, 0.01164389,
0.007267871, 0.004536453, 0.002873573, 0.002408037, 0.001417053,
0.001136154, 0.000116054, 0.003829787, 0.01206963, 0.02088975,
0.027388781, 0.03423598, 0.037833661, 0.037369438, 0.035164408,
0.034584139, 0.02947776, 0.023210831, 0.014622821, 0.009632495,
0.006731141, 0.0027853, 0.000116054, 0.003829787, 0.01206963,
0.02088975, 0.027388781, 0.03423598, 0.037833661, 0.037369438,
0.035164408, 0.034584139, 0.02947776, 0.023210831, 0.014622821,
0.009632495, 0.006731141, 0.0027853, 0.000116054, 0.003829787,
0.01206963, 0.02088975, 0.027388781, 0.03423598, 0.037833661,
0.037369438, 0.035164408, 0.034584139, 0.02947776, 0.023210831,
0.014622821, 0.009632495, 0.006731141, 0.0027853, 0.000116054,
0.003829787, 0.01206963, 0.02088975, 0.027388781, 0.03423598,
0.037833661, 0.037369438, 0.035164408, 0.034584139, 0.02947776,
0.023210831, 0.014622821, 0.009632495, 0.006731141, 0.0027853
), DV.sim = c(0, 0.0889808909410658, 0.0947484349571132, 0.0798169790285827,
0.0574006922793388, 0.0505799935506284, 0.0468774569150804, 0.0417447990739346,
0.0375742405164242, 0.0306761993989349, 0.0251120797996223, 0.0205737193532288,
0.0168649279846251, 0.0138327510148287, 0.0113531698574871, 0,
0.0829660195227578, 0.0876380159497916, 0.0723450386112931, 0.0464863987773657,
0.0380595525625348, 0.0343245102453232, 0.0307144539731741, 0.0283392784461379,
0.0245820489723981, 0.0214487023548782, 0.0187365858632326, 0.0163729577744008,
0.0143107050991059, 0.0125108672587574, 0, 0.0762191578459362,
0.0737615750578683, 0.0549565160764756, 0.0280085518714786, 0.0206076781625301,
0.0172540310333669, 0.0134899928846955, 0.0108952926749736, 0.00728254194885496,
0.00491441482789815, 0.00332488210681827, 0.00225250494349749,
0.00152820673925803, 0.00103880306820386, 0, 0.0329456788891303,
0.0365534415712808, 0.03318406650424, 0.0278133129626513, 0.0238151342895627,
0.0205330317793787, 0.0155563822799921, 0.0119589968463779, 0.0072024345056713,
0.00437676923945547, 0.00266755578568207, 0.00162810577310623,
0.000994532813206324, 0.000607859854716811, 0, 0.00238890872602278,
0.02000716184065, 0.0509446502289174, 0.0907202677155637, 0.173563302880525,
0.223891823887825, 0.2226231635499, 0.19175603264451, 0.168494781267643,
0.150974664176703, 0.136206244819164, 0.111464575245381, 0.0913691590994598,
0.0749306779146197, 0.0504548476848009, 0, 0.00141190656836649,
0.0124264488774641, 0.0328390336436031, 0.0603613019163447, 0.123470497330427,
0.172404586815834, 0.178024356626272, 0.151606226187945, 0.130227694458962,
0.117105708281994, 0.107832603356838, 0.0935153502613309, 0.081651206263304,
0.0713645335614684, 0.0545446672743561, 0, 0.00122455342249632,
0.00957195676775054, 0.0233009280455857, 0.0398901057214595,
0.069490838356018, 0.0753487069702148, 0.0619427798080445, 0.0388082119899989,
0.0282194718351961, 0.0223033058814705, 0.0181158699408174, 0.012206885059923,
0.00828045272134247, 0.00562572468560191, 0.00260434861259537,
0, 0.00337575118759914, 0.0123247819279197, 0.0212808990854769,
0.0292664165479362, 0.0407316533482074, 0.0457373328155279, 0.0440263413557409,
0.0350818961969019, 0.0268987657874823, 0.0206920115460456, 0.0160182394650579,
0.00970028643496338, 0.00590740063816313, 0.00360522091817113,
0.00134665597468616)), row.names = c(NA, 124L), class = "data.frame")
You could add distinct(ITER, .keep_all = TRUE) after summarise():
library(tidyverse)
df %>%
group_by(ITER) %>%
summarise(
Rsq = cor(x = DV.obs, y = DV.sim)^2,
adjRsq = 1 - ((1 - Rsq) * (length(DV.obs) - 1) / (length(DV.obs) - nParam - 1))
) %>%
distinct(ITER, .keep_all = TRUE)
#> `summarise()` has grouped output by 'ITER'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups: ITER [4]
#> ITER Rsq adjRsq
#> <int> <dbl> <dbl>
#> 1 1 0.113 0.113
#> 2 2 0.116 0.0858
#> 3 3 0.334 0.286
#> 4 4 0.268 0.187
The issue is that you get a value per row because you are using the nParam column to compute the adjusted R^2 without any aggregating operation. This could be fixed by using unique(nParam) to "aggregate" nParam to just one value per group:
library(dplyr)
df %>%
group_by(ITER) %>%
summarise(
Rsq = cor(x = DV.obs, y = DV.sim)^2,
adjRsq = 1 - ((1 - Rsq) * (n() - 1) / (n() - unique(nParam) - 1))
)
#> # A tibble: 4 × 3
#> ITER Rsq adjRsq
#> <int> <dbl> <dbl>
#> 1 1 0.113 0.113
#> 2 2 0.116 0.0858
#> 3 3 0.334 0.286
#> 4 4 0.268 0.187
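To make the arithmetic explicit, the formula from the question can be wrapped in a small helper. This is a minimal sketch: the function name adj_r2 and the example inputs are hypothetical and only illustrate the calculation. The n = 31 used below matches the number of rows per iteration in the posted data.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors k.
adj_r2 <- function(r2, n, k) {
  stopifnot(n - k - 1 > 0)             # the denominator must be positive
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# With k = 0 the adjustment vanishes, as in the ITER 1 row above.
adj_r2(0.5, n = 31, k = 0)   # 0.5

# With predictors, the same R-squared is penalised.
adj_r2(0.5, n = 31, k = 2)   # 1 - 0.5 * 30 / 28
```

Inside summarise(), n plays the role of n() and k must be a single value per group, which is why unique(nParam) is needed.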

Why am I getting the same result from Anova and aov_car in R, and why is it different from SPSS?

I am conducting a reanalysis of some data. The DV is continuous (beta value, i.e. neural activity) and the IV is categorical (position) with three levels (1, 2, 3). Position is set as a factor. It is a repeated-measures design, and there are 126 observations. The original analysis was done in SPSS, and I am trying to replicate those results with R.
I don't understand how to make an MRE of this, so my data from dput is at the bottom.
My ANOVA results are different from those reported in the original paper (the data is identical). Specifically, they reported F(2,82) = 18.262, p = 0.00, but my table (below) is totally different. I used the Anova function, and now get the impression that I should be using aov_car, but the output is the same between the two.
> Anova(lm(Beta ~ Position, data = stack_ex))
Anova Table (Type II tests)
Response: Beta
Sum Sq Df F value Pr(>F)
Position 60.57 2 1.5213 0.2225
Residuals 2448.70 123
> aov_car(Beta ~ Position + Error(Beta), data = stack_ex)
Contrasts set to contr.sum for the following variables: Position
Anova Table (Type 3 tests)
Response: Beta
Effect df MSE F ges p.value
1 Position 2, 123 19.91 1.52 .024 .222
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
I didn't know what to use as the error term, so I put Beta. Is this the issue?
Here is the data, apologies for not being able to reduce it down.
> dput(stack_ex)
structure(list(Beta = c(9.97627322506813, 4.51007015616003, 12.5899137493145,
5.16107528902195, 0.69934803628816, 3.05576441003722, 9.73415586595716,
3.48253752326239, 8.72271400892749, 6.58254223482513, 9.64252595049282,
6.2575247824253, 5.55086088416984, 3.26575073266956, -0.189486641765607,
-6.34627220217585, 3.03699535774724, 5.38452950644857, 7.2247809046584,
1.05684383099248, 0.745997758871227, 13.4708766693015, 6.22313273382721,
7.60691743953363, 7.95869706610072, 0.0733745510036445, 5.74455260852637,
9.10243217750976, 3.83463985621549, 6.51540068169028, 6.74657874951813,
9.06748922888841, 4.18661204617864, 8.13865720827057, 4.97289378228525,
4.79399790512039, -12.5433736154914, 3.22520674616528, 4.83924807559523,
6.89780284608954, 2.01175994751707, 1.58936731656692, 8.65646845487533,
2.03332866864119, 6.59573013233866, 4.35624613417537, 3.22584501764675,
3.01812749198894, 8.67739700219412, 5.14273744714805, 7.54959191256081,
7.83244934217214, 8.67126128885367, 3.99955715822518, 2.95804569815409,
2.25327292671231, 0.258342636171449, -6.87648408967595, 1.9848049507549,
2.45033479610578, 7.41525416520838, 1.11896377050173, 0.0698315480648937,
9.90975895502056, 5.03717210651178, 4.67127493715398, 7.90306051043896,
3.0618932143297, 5.43781266582611, 8.9383987897543, 4.7982992164727,
6.90576740201611, 4.43862196057089, 9.06484925843098, 3.35645527138813,
5.42103905597134, 2.32859166774007, 3.65962841104834, -11.716124636774,
7.15256990819002, 4.02640955184303, 7.10747478179406, 2.81026958853589,
1.21494403713035, 9.06256308202033, 2.40170878761068, 6.45729748790901,
4.88232212084591, 1.55722661655526, 3.09556060018938, 6.6629967466337,
4.38848062553557, 4.38871083406173, 6.40367918458127, 6.361735558817,
4.21279189431753, 2.08838813524482, 2.21632202746396, -0.491401226521853,
-7.3685373528786, 2.12839354041543, 4.22958686769682, 4.25606944426722,
0.330400668298046, 1.02776552933976, 10.6734745608271, 3.01238218831987,
4.03318609054561, 6.45849154079659, 0.45593329021199, 5.76390726591623,
7.21202360734704, 4.62140561321984, 3.72714943200746, 5.49911004676976,
9.15658405382221, 3.25231083403689, 3.67627240704932, 3.48390458422993,
2.98674297337782, -19.5189775914798, 2.59812967326379, 2.78334604762499,
3.70635047793331, -0.223282095324164, 2.17552096286021), Position = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1",
"2", "3"), class = "factor")), row.names = c(NA, -126L), class = c("tbl_df",
"tbl", "data.frame"))
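The reported F(2, 82) has the denominator degrees of freedom of a within-subject test: (3 - 1) * (42 - 1) = 82, which corresponds to 42 subjects each measured at 3 positions (126 observations). Both calls above treat the 126 rows as independent, which gives 123 residual degrees of freedom. A repeated-measures analysis needs a subject identifier in the error term, not the response. The posted data has no such column, so the sketch below uses simulated data with a hypothetical ID column purely to show the model structure; with the real data the ID would have to come from the original data set. It uses base aov(); with afex, the analogous call would be aov_car(Beta ~ Position + Error(ID/Position), data = ...).

```r
set.seed(1)                        # simulated data, for illustration only
n_subj <- 42
sim <- data.frame(
  ID       = factor(rep(seq_len(n_subj), times = 3)),  # hypothetical subject ID
  Position = factor(rep(1:3, each = n_subj))
)
# Subject-specific baseline plus noise, so observations within a subject correlate.
subj_effect <- rnorm(n_subj, sd = 3)
sim$Beta <- 5 + subj_effect[as.integer(sim$ID)] + rnorm(nrow(sim))

# Between-subjects analysis (what lm()/Anova() did): 123 residual df.
fit_between <- aov(Beta ~ Position, data = sim)
print(summary(fit_between))

# Repeated-measures analysis: Position is tested within subjects, 82 residual df.
fit_within <- aov(Beta ~ Position + Error(ID/Position), data = sim)
print(summary(fit_within))
```

The within-subject F-test removes between-subject variability from the error term, which is why it can be far larger than the between-subjects F for the same data.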

Multiple fixed effect levels missing in lmer from lme4

I am running linear mixed models using lmer from lme4. We are testing the effect of family, strain and temperature on several growth factors in brook trout. I have 4 families (variable FAMILLE) from which we sampled our individuals: 2 are from the selected strain and 2 are from the control strain (variable Lignee). Within each strain, the 2 families were marked as either resistant (Res) or sensitive (Sens). So my fixed-effect variable FAMILLE is nested in my variable Lignee. The experiment was conducted at 3 different temperatures.
Here is what my data frame looks like:
structure(list(BASSIN = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1",
"2", "3", "4"), class = "factor"), t.visee = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("15", "17", "19"), class = "factor"), FAMILLE = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L), .Label = c("RES", "SENS"), class = "factor"), Lignee = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("CTRL", "SEL"), class = "factor"), taux.croiss.sp.poids = c(0.8,
1.14285714285714, 1.42857142857143, 0.457142857142857, -0.228571428571429,
0.628571428571429, 0.971428571428571, 0.742857142857143, 1.08571428571429,
0.8, 0.571428571428571, 1.02857142857143, 0.8, 0.285714285714286,
0.285714285714286, 0.571428571428571, 0.742857142857143, 1.14285714285714,
0.628571428571429, 0.742857142857143, 1.02857142857143, 0.285714285714286,
0.628571428571429, 0.628571428571429, 0.857142857142857, 0.8,
1.08571428571429, 1.37142857142857, 0.742857142857143, 1.08571428571429,
0.0571428571428571, 0.571428571428571, 0.171428571428571, 0.8,
0.685714285714286, 0.285714285714286, 0.285714285714286, 0.8,
0.457142857142857, 1.02857142857143, 0.342857142857143, 0.742857142857143,
0.857142857142857, 0.457142857142857, 0.742857142857143, 1.25714285714286,
0.971428571428571, 0.857142857142857, 0.742857142857143, 0.514285714285714
)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"
))
Lignee has 2 levels (Sel and Ctrl)
FAMILLE has 2 levels (Sens and Res)
So I have 4 distinct levels :
Lignee Sel and FAMILLE Sens
Lignee Sel and FAMILLE Res
Lignee Ctrl and FAMILLE Sens
Lignee Ctrl and FAMILLE Res
when I run for example this line to test the effect of the variables on the rate of weight gain:
model6 <- lmer((taux.croiss.sp.poids) ~ t.visee + Lignee/FAMILLE + (1 |BASSIN), data = mydata1, REML = FALSE)
and then
summary(model6)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: (taux.croiss.sp.poids) ~ t.visee + Lignee/FAMILLE + (1 | BASSIN)
Data: mydata1
AIC BIC logLik deviance df.resid
115.2 139.5 -50.6 101.2 228
Scaled residuals:
Min 1Q Median 3Q Max
-3.11527 -0.59489 0.05557 0.69775 2.79920
Random effects:
Groups Name Variance Std.Dev.
BASSIN (Intercept) 0.01184 0.1088
Residual 0.08677 0.2946
Number of obs: 235, groups: BASSIN, 4
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.770942 0.209508 194.702337 3.680 0.000302 ***
t.visee -0.019077 0.011682 231.005933 -1.633 0.103809
LigneeSEL 0.214062 0.054471 231.007713 3.930 0.000112 ***
LigneeCTRL:FAMILLESENS -0.008695 0.054487 231.038877 -0.160 0.873358
LigneeSEL:FAMILLESENS -0.205001 0.054242 231.016973 -3.779 0.000200 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) t.vise LgnSEL LCTRL:
t.visee -0.948
LigneeSEL -0.131 0.000
LCTRL:FAMIL -0.124 -0.007 0.504
LSEL:FAMILL 0.000 0.000 -0.498 0.000
From what I understand, the model chooses one family as the reference group, which does not appear in the output. But the problem here is that two groups are missing:
LigneeCTRL:FAMILLERES
and
LigneeSEL:FAMILLERES
Does anybody know why my output is missing not one but two of the groups?
I'm French Canadian, so don't hesitate to ask if something is not clear; I will try to explain it in other words!
Also, this is my first message on Stack. I tried to include everything needed, but let me know if I need to add anything else!
Thanks in advance
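The output is not missing any group; it reflects how R's default treatment contrasts code a nested term. The sketch below builds the fixed-effects design for Lignee/FAMILLE on a toy data frame that holds only the four Lignee x FAMILLE combinations (toy data, not the trout measurements) and prints the resulting coefficient names.

```r
# One row per Lignee x FAMILLE combination (toy data).
cells <- expand.grid(Lignee = c("CTRL", "SEL"), FAMILLE = c("RES", "SENS"))

X <- model.matrix(~ Lignee/FAMILLE, data = cells)
print(colnames(X))

# Each cell mean is the sum of the coefficients flagged with 1 in its row:
#   CTRL/RES  = (Intercept)
#   SEL/RES   = (Intercept) + LigneeSEL
#   CTRL/SENS = (Intercept) + LigneeCTRL:FAMILLESENS
#   SEL/SENS  = (Intercept) + LigneeSEL + LigneeSEL:FAMILLESENS
print(cbind(cells, X))
```

RES is the reference level of FAMILLE, so it is absorbed into the intercept within CTRL and into LigneeSEL within SEL. The two :FAMILLESENS terms are the SENS minus RES differences within each strain. Four cell means are described by four coefficients, so nothing is left out.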

Error in Anova.III.lm(mod, error, singular.ok = singular.ok, ...) : there are aliased coefficients in the model

For my experiment, I have 3 independent variables: trial type, sex and gaming experience (all of which are categorical).
I have one dependent variable: proportion of correct trials (which is continuous).
When I tried running a 3-way ANOVA, the assumptions were not met, and so I used an aligned-rank transformation ANOVA.
m1 <- art(Proportioncorrect ~ Videogamefrequency + Biologicalsex + Trialtype + Videogamefrequency:Biologicalsex + Videogamefrequency:Trialtype + Biologicalsex:Trialtype + Biologicalsex:Trialtype:Videogamefrequency, data = Gaming)
The model gave me the error:
Error in Anova.III.lm(mod, error, singular.ok = singular.ok, ...) :
there are aliased coefficients in the model
Could anyone give me a helping hand?
My data is here:
structure(list(ID = c("P_200214123342", "P_200224092247", "P_200219163622",
"P_200220130332", "P_200219091823", "P_200225184226", "P_200219123120",
"P_200219175102", "P_200214103155", "P_200219111605", "P_200217101213",
"P_200219102411", "P_200221101028", "P_200220145557", "P_200225171612",
"P_200224092247", "P_200219163622", "P_200220130332", "P_200214123342",
"P_200219091823", "P_200225184226", "P_200219123120", "P_200219175102",
"P_200214103155", "P_200219111605", "P_200217101213", "P_200219102411",
"P_200221101028", "P_200220145557", "P_200225171612"), Trialtype = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Beaconed",
"Probe"), class = "factor"), Proportioncorrect = c(0.729727660699102,
1.33933990048532, 0.729727660699102, 1.075862200454, 0.578378233982015,
1.16808048521424, 1.33933990048532, 1.13531397797248, 1.28700221758657,
1.13531397797248, 1.28700221758657, 1.13531397797248, 1.28700221758657,
1.28700221758657, 1.20358829695229, 0.297711691252463, 0.160690652951911,
0.147197653346961, 0.0667161517509908, 0.080085580033659, 0.160690652951911,
0.133731586046578, 0.214985569478799, 0.160690652951911, 0.269932799291976,
0.339836905918588, 0.242365851038963, 0.214985569478799, 0.677268408841807,
1.20358829695229), Videogamefrequency = structure(c(2L, 1L, 1L,
1L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), .Label = c("Monthly",
"Never", "Weekly", "Yearly"), class = "factor"), Biologicalsex = structure(c(1L,
1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L), .Label = c("Female",
"Male"), class = "factor")), row.names = c(NA, -30L), class = "data.frame")
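Aliased coefficients usually mean that some combination of factor levels has no observations. Cross-tabulating Videogamefrequency against Biologicalsex for the posted data shows such an empty cell. The sketch below rebuilds just those two factors from the integer codes and labels in the dput() output above, so it runs without the full data frame.

```r
# Integer codes copied from the dput() output (30 rows).
freq_codes <- c(2, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4,
                1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4)
sex_codes  <- c(1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 1, 2, 2, 2,
                1, 1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 2, 2)

Videogamefrequency <- factor(freq_codes, levels = 1:4,
                             labels = c("Monthly", "Never", "Weekly", "Yearly"))
Biologicalsex <- factor(sex_codes, levels = 1:2, labels = c("Female", "Male"))

# Count observations in every frequency-by-sex cell.
cell_counts <- table(Videogamefrequency, Biologicalsex)
print(cell_counts)
```

There are no Never/Male observations, so the Videogamefrequency:Biologicalsex interaction and the three-way interaction cannot be fully estimated, which is what the Type III test reports. Possible ways forward include dropping the interactions that involve the empty cell or merging sparse frequency levels; which is appropriate depends on the research question.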

How to extract random effects and variance components from lme4 wrapped in dlply

This post How can I extract elements from lists of lists in R? answers some of my questions but that still doesn't quite work for me and what I need to do goes beyond my R knowledge.
I have data from field trials in 2 environments (=trials), 2 years and 5 traits of interest (defined by trait_id). GID is the unique line identifier. My model in lme4 is:
library(plyr)   # dlply()
library(lme4)   # lmer()
mods <- dlply(data, .(trial, trait_id),
  function(d)
    lmer(phenotype_value ~ (1|GID) + (1|year) + (1|year:GID) + (1|year:rep),
         na.action = na.omit, data = d))
Running this returns a large list of 10 elements and I would like to store the random effects for GID for all traits per trial in a data frame. I tried several things:
blup=lapply(mods,ranef, drop = FALSE)
blup1=blup[[1]]
blup2=blup1$GID
This gives me a data frame with the random effects for one trait per trial, but I was hoping for something more streamlined that preserves some of the information, like $irrigation.GRYLD, in the column names.
Here is a reproducible example with only two traits (GRYLD, PTHT), 2 years (11OBR, 12OBR), and two reps:
structure(list(GID = structure(c(1L, 2L, 3L, 4L, 5L, 5L, 1L,
2L, 4L, 3L, 1L, 2L, 3L, 4L, 5L, 5L, 1L, 2L, 4L, 3L, 1L, 2L, 3L,
4L, 5L, 5L, 2L, 1L, 4L, 3L, 1L, 2L, 3L, 4L, 5L, 5L, 2L, 1L, 4L,
3L, 1L, 2L, 3L, 4L, 5L, 5L, 1L, 2L, 4L, 3L, 1L, 2L, 3L, 4L, 5L,
5L, 1L, 2L, 4L, 3L, 1L, 2L, 3L, 4L, 5L, 5L, 2L, 1L, 4L, 3L, 1L,
2L, 3L, 4L, 5L, 5L, 2L, 1L, 4L, 3L), .Label = c("A", "B", "C",
"D", "E"), class = "factor"), year = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("11OBR",
"12OBR"), class = "factor"), trial = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("heat",
"irrigation"), class = "factor"), rep = c(1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), trait_id = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("GRYLD",
"PTHT"), class = "factor"), phenotype_value = c(3.93, 3.38, 1.65,
4.33, 2.45, 2.48, 3.98, 3.3, 4.96, 1.53, 87.5, 69.5, 65.5, 84.5,
77, 81, 94.5, 84.5, 89, 81, 6.56, 4.3, 5.76, 7.3, 5.73, 4.14,
5.93, 6.96, 8.43, 5.81, 114.5, 100, 104.5, 110, 110, 106, 99,
97.5, 105, 100, 0.119, 0.131, 0.681, 0.963, 0.738, 1.144, 0.194,
0.731, 0.895, 0.648, 35, 50, 45, 50, 45, 50, 55, 45, 50, 55,
2.79, 3.73, 3.96, 4.64, 5.03, 2.94, 3.78, 4.14, 3.89, 3.21, 90,
95, 105, 100, 105, 85, 95, 100, 100, 95)), .Names = c("GID",
"year", "trial", "rep", "trait_id", "phenotype_value"), class = "data.frame", row.names = c(NA,
-80L))
I'm not quite sure what you want as an output format, but how about:
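library(plyr)
# Stack the random effects of every grouping factor in one model into a data frame.
all_ranef <- function(object) {
  rr <- ranef(object)
  ldply(rr, function(x) data.frame(group = rownames(x), x, check.names = FALSE))
}
# Apply to every fitted model; the trial and trait_id columns identify the model.
ldply(mods, all_ranef)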
## trial trait_id .id group (Intercept)
## 1 heat GRYLD year:GID 11OBR:A 7.935352e-01
## 2 heat GRYLD year:GID 11OBR:B 1.960487e-01
## 3 heat GRYLD year:GID 11OBR:C -1.504116e+00
## ...
## 82 irrigation PTHT year:rep 12OBR:2 -1.595022e+00
## 83 irrigation PTHT year 11OBR 2.915033e+00
## 84 irrigation PTHT year 12OBR -2.915033e+00
This works reasonably well because all of your random effects are intercept-only. If you had some random-slopes terms in the models, you might either want to reshape2::melt() the individual random effects, or use rbind.fill() to combine data frames with different random-effects columns. The combined result can then be plotted:
library("ggplot2"); theme_set(theme_bw())
vals <- ldply(mods, all_ranef)  # store the combined random effects for plotting
ggplot(vals, aes(y = group, x = `(Intercept)`)) +
  geom_point(aes(colour = interaction(trial, trait_id))) +
  facet_wrap(~.id, scales = "free")
By the way, it's usually inadvisable to use a factor with only 2 levels (year) as a grouping variable, since a variance cannot be estimated reliably from two levels.
