Random effects model in R - error - r

I am estimating an econometric model on panel data in R with the plm package. The pooled model and the fixed effects model work fine, but I get an error when I try to fit a random effects model, and I don't know how to fix it.
Here is my whole dataset and code:
auto <- structure(list(Country = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L,
6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 11L, 11L, 11L, 11L), .Label = c("Bahrain", "Cuba",
"China", "Kuwait", "Lao PDR", "Qatar", "Saudi Arabia", "Swaziland",
"Syria", "United Arab Emirates", "Vietnam"), class = "factor"),
Year = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L), .Label = c("1971", "1981", "1991", "2001"
), class = "factor"), AVG_GR_. = c(2.44, -2.93, 1.77, -1.04,
3.17, 3.5, -1.59, 5.13, 4.29, 7.51, 9.42, 9.83, -7.39, -5.52,
10.72, -0.14, 1.77, 3.38, 3.68, 5.33, -1.55, -5.72, 4.64,
1.5, 6.06, -5.25, 0.54, 2.28, 6.99, 2.82, 0.82, 1.12, 6.72,
-2, 3.09, 2.15, -1.06, -4.88, 0.2, -6.04, 1.61, 3.21, 5.88,
6.24), GDP_PC = c(17444.65, 19550.76, 15970.05, 18212.71,
2067.93, 3127.98, 3221.25, 3081.73, 153.5, 231.14, 491.26,
1207.52, 70184.35, 23911.92, 9559.35, 27681.03, 162.06, 212.46,
261.98, 386.38, 72617.74, 55370.39, 31970, 51090.02, 13752.55,
21124.79, 12891.51, 12446.49, 881.75, 1595.82, 1995.8, 2191.36,
738.63, 1349.2, 1057.84, 1380.2, 88377.72, 75348.77, 43306.13,
45038.43, 164.15, 194.45, 267.17, 481.92), POP_. = c(5.39,
3.26, 3.03, 6.49, 1.22, 0.75, 0.5, 0.13, 1.91, 1.71, 0.95,
0.6, 6.22, 4.16, -0.66, 4.61, 1.93, 2.7, 2.42, 1.73, 7.44,
7.9, 2.23, 11.57, 5.43, 5.12, 2.2, 3.08, 3.07, 3.64, 2.12,
1.16, 3.45, 3.35, 2.77, 2.78, 15.96, 5.94, 5.3, 10.95, 2.29,
2.3, 1.62, 0.97), CONSUMP_. = c(64.21, 52.81, 51.47, 40.51,
54.58, 54.96, 62.74, 54.02, 51.72, 51.01, 45.63, 39, 27.44,
48.61, 49.76, 35.74, 90.19, 90.65, 89.15, 70.38, 21.33, 26.27,
26.84, 16.81, 22.96, 46.85, 44.2, 31.61, 54.77, 74.9, 80.42,
79.36, 67.09, 69.71, 69.92, 61.26, 15.28, 33.07, 46.79, 59.97,
90, 89.89, 73.9, 65.33), GOV_CON_. = c(11.1, 19.55, 19.21,
14.27, 31.67, 31.66, 29.47, 34.91, 12.99, 14.11, 14.53, 14.1,
12.04, 23.7, 48.98, 18.45, 8.05, 8.29, 7.21, 8.96, 20.47,
36.49, 31.09, 14.5, 16.02, 30.12, 26.94, 22.53, 19.07, 17.11,
17.65, 14.76, 19.93, 19.6, 12.75, 12.67, 10.87, 19.27, 16.99,
7.66, 6.73, 6.85, 7.46, 6.19), CAP_FORM_. = c(34.15, 32.51,
24.24, 26.56, 25.94, 25.49, 10.76, 10.7, 34.57, 35.19, 37.79,
42.21, 13.55, 18.68, 17.9, 17.28, 7.57, 10.24, 16.68, 30.28,
22.49, 18.37, 26.13, 36.58, 22.59, 22.7, 20.49, 23.68, 30.77,
21.42, 17.65, 14.55, 25.34, 20.68, 22.53, 23.48, 29.93, 26.28,
27.29, 22.63, 14.45, 14.46, 25.22, 36.44), NAT_RES_. = c(27.42,
20.18, 17.52, 23.34, 1.81, 1.87, 2.5, 3.42, 41.09, 38.83,
40.09, 17.91, 66.53, 41.25, 35.94, 48.41, 5.28, 4.2, 3.01,
10.15, 63.5, 40.84, 39.7, 54.17, 57.89, 31.24, 32.74, 42.77,
6.47, 3.64, 2.25, 1.32, 9.55, 9.14, 14.19, 22.92, 51.04,
37.08, 27.99, 31.36, 3.95, 4.17, 8.39, 13.57), TRADE = c(1.69,
1.48, 1.37, 1.34, 0.77, 0.76, 0.33, 0.34, 0.11, 0.21, 0.35,
0.58, 1.03, 0.99, 1.09, 0.9, 0.15, 0.23, 0.63, 0.57, 0.95,
0.82, 0.85, 0.91, 0.89, 0.76, 0.66, 0.8, 1.47, 1.54, 1.42,
1.62, 0.51, 0.44, 0.66, 0.71, 1.1, 0.97, 1.37, 1.23, 0.62,
0.62, 0.86, 1.43), INFL_. = c(13.26, 3.24, 1.64, 5.65, 5.22,
0.11, 5.49, 2.44, 1.17, 5.72, 6.85, 4.2, 31.52, -0.47, 3.25,
7.29, 43.86, 56.9, 32.37, 7.95, 20.84, -1.59, 3.18, 8.65,
26.67, -1.16, 2.4, 5.73, 10.71, 11.36, 10.97, 8.04, 11.62,
17.43, 6.74, 6.78, 28.31, 1.25, 2.03, 6.94, 7.05, 156.6,
18.99, 9.45), LIFE_EXP = c(67.39, 71.47, 73.66, 75.55, 72.28,
74.46, 75.6, 77.81, 65.7, 68.43, 70.64, 73.99, 68.17, 71.25,
72.92, 73.79, 47.79, 51.39, 58.38, 64.68, 71.16, 74.31, 76.18,
77.53, 58.65, 66.77, 71.16, 74.03, 51.33, 57.45, 54.96, 46.81,
63.01, 68.42, 72.03, 74.56, 65.49, 70.19, 73.24, 75.66, 62.69,
69.09, 72.28, 74.66), EDU_T = c(0.68, 1.59, 2.63, 3.14, 0.75,
1.46, 2.81, 3.84, 0.37, 0.62, 1.08, 1.71, 1.41, 2.71, 3.53,
3.54, 0.16, 0.35, 0.65, 1, 1.61, 2.11, 2.5, 3.06, 1.06, 1.44,
2.13, 2.66, 0.35, 0.74, 1.07, 0.91, 0.34, 0.74, 1.27, 1.3,
1.14, 1.65, 2.61, 3.85, 0.67, 1.21, 0.67, 1.54)), .Names = c("Country",
"Year", "AVG_GR_.", "GDP_PC", "POP_.", "CONSUMP_.", "GOV_CON_.",
"CAP_FORM_.", "NAT_RES_.", "TRADE", "INFL_.", "LIFE_EXP", "EDU_T"
), row.names = c(1L, 2L, 3L, 4L, 9L, 10L, 11L, 12L, 5L, 6L, 7L,
8L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L), class = c("plm.dim", "data.frame"
))
Y <- cbind(auto$AVG_GR_.)
X <- cbind(auto$GDP_PC, auto$POP_., auto$CONSUMP_., auto$GOV_CON_.,
auto$CAP_FORM_., auto$NAT_RES_., auto$TRADE, auto$INFL_.,
auto$LIFE_EXP, auto$EDU_T)
pdata <- plm.data(auto, c("Country", "Year"))
random <- plm(Y~X, data=pdata, model="random")
Everything is OK until the last line. I get this error:
Error in if (sigma2$id < 0) stop(paste("the estimated variance of the", :
missing value where TRUE/FALSE needed
Thanks for your help :)

I came here looking for help myself, but I solved your problem. The first column of pdata holds row names that were filled in automatically. You need to delete that first column.
This worked:
> pdata <- pdata[,2:13];
> random <- plm(Y~X, data=pdata, model="random")
Just replace the last line of your code with the two lines above.

Related

How can I get all model estimates automatically in R?

I know I can calculate models' estimates by hand, but I'm sure there's a way to get all model estimates for all categorical levels automatically. Since I'm dealing with lmers, maybe this should be suitable. Note: I don't want to predict new data, I just wanna get all estimates automatically. (just edited the post to make it easier to understand)
an example:
> model <- lmer(Score ~ Proficiency_c * testType + (1|ID), data = myData, REML = F)
> summary(model)
Fixed effects:
                            Estimate Std. Error       df t value Pr(>|t|)
(Intercept)                   4.8376     0.2803 156.9206  17.259  < 2e-16 ***
Proficiency_c                -1.3381     0.4405 156.9206  -3.038  0.00279 **
testTypeTestB                 0.2088     0.3269 126.0000   0.639  0.52421
testTypeTestC                 0.4638     0.3269 126.0000   1.418  0.15853
Proficiency_c:testTypeTestB   0.5008     0.5138 126.0000   0.975  0.33157
Proficiency_c:testTypeTestC   0.2357     0.5138 126.0000   0.459  0.64727
---
> contrasts(myData$testType)
      TestB TestC
TestA     0     0
TestB     1     0
TestC     0     1
'by hand', I would:
## estimate for Test A:
y = b0 + b1x1 + b2x2 + b3x3 + b4(x1 * x2) + b5(x1 * x3)
y = b0 + b1 * 1 + 0 + 0 + 0
y = b0 + b1
y = 3.49
## estimate for Test B:
y = b0 + b1x1 + b2x2 + b3x3 + b4(x1 * x2) + b5(x1 * x3)
y = b0 + b1 * 1 + b2 * 1 + 0 + b4(1 * 1) + 0
y = b0 + b2 + (b1 + b4)x1
y = 4.20
## estimate for Test C:
y = b0 + b1x1 + b2x2 + b3x3 + b4(x1 * x2) + b5(x1 * x3)
y = b0 + b1 * 1 + b2 * 0 + b3 * 1 + 0 + b5 (1 * 1)
y = b0 + b3 + (b1 + b5)x1
y = 4.19
edited question
I usually deal with people who don't know how to come up with the model's estimates by themselves, so I usually have to calculate them all 'by hand'. I just wish there was a way to get all 'ys' estimates concerning each categorical level (as I did 'by hand' above) without doing that manually? Again, for now, I don't want to predict new values. Thanks in advance!
data:
dput(myData)
structure(list(ID = c("p1", "p1", "p1", "p2", "p2", "p2", "p3",
"p3", "p3", "p4", "p4", "p4", "p5", "p5", "p5", "p6", "p6", "p6",
"p7", "p7", "p7", "p8", "p8", "p8", "p9", "p9", "p9", "p10",
"p10", "p10", "p11", "p11", "p11", "p12", "p12", "p12", "p13",
"p13", "p13", "p14", "p14", "p14", "p15", "p15", "p15", "p16",
"p16", "p16", "p17", "p17", "p17", "p18", "p18", "p18", "p19",
"p19", "p19", "p20", "p20", "p20", "p21", "p21", "p21", "p22",
"p22", "p22", "p23", "p23", "p23", "p24", "p24", "p24", "p25",
"p25", "p25", "p26", "p26", "p26", "p27", "p27", "p27", "p28",
"p28", "p28", "p29", "p29", "p29", "p30", "p30", "p30", "p31",
"p31", "p31", "p32", "p32", "p32", "p33", "p33", "p33", "p34",
"p34", "p34", "p35", "p35", "p35", "p36", "p36", "p36", "p37",
"p37", "p37", "p38", "p38", "p38", "p39", "p39", "p39", "p40",
"p40", "p40", "p41", "p41", "p41", "p42", "p42", "p42", "p43",
"p43", "p43", "p44", "p44", "p44", "p45", "p45", "p45", "p46",
"p46", "p46", "p47", "p47", "p47", "p48", "p48", "p48", "p49",
"p49", "p49", "p50", "p50", "p50", "p51", "p51", "p51", "p52",
"p52", "p52", "p53", "p53", "p53", "p54", "p54", "p54", "p55",
"p55", "p55", "p56", "p56", "p56", "p57", "p57", "p57", "p58",
"p58", "p58", "p59", "p59", "p59", "p60", "p60", "p60", "p61",
"p61", "p61", "p62", "p62", "p62", "p63", "p63", "p63"), Score = c(5.33,
5.05, 5.15, 5.82, 2.29, 7.54, 4.46, 2.43, 1.53, 8.97, 7.69, 7.21,
6.76, 8.41, 3.77, 3.33, 11.57, 7.69, 2.15, 3.84, 3.29, 3.36,
6.66, 5.6, 4.23, 4.41, 3.07, 2.29, 4.9, 4.46, 3.22, 1.72, 2.08,
4.47, 2.4, 2.54, 2.73, 6.57, 7.31, 4.46, 9.27, 4.31, 4.54, 6.32,
8.97, 3.44, 4.68, 9.7, 2.15, 5.68, 5.26, 9.3, 5.68, 8.97, 4.65,
4.13, 4.57, 11.22, 11.39, 7.52, 3.94, 4.47, 3.52, 5, 8, 5.81,
2.96, 4.05, 2.22, 4.41, 5.64, 4.79, 2.43, 2.5, 4.16, 7.57, 9.21,
2.59, 3.12, 3.84, 7.76, 8.77, 5.08, 7.81, 4.49, 2.17, 7.4, 5.81,
4.9, 3.19, 3.2, 2.72, 3.67, 4.42, 3.57, 1.02, 4.42, 2.45, 5.88,
7.84, 4.93, 9.61, 3.75, 1.8, 3.47, 0.65, 1.39, 2.9, 6.36, 2.77,
2.67, 6.89, 6.74, 6.81, 1.94, 3.22, 3.12, 4.08, 5.31, 11.23,
4.1, 4.28, 3.89, 2.98, 3.52, 3.64, 3.63, 5.08, 4.9, 6.66, 7.56,
3.14, 5.26, 1.03, 4.58, 2.9, 2.5, 3.57, 4, 7.54, 3.5, 5.19, 2.56,
2.38, 1.4, 3.97, 2, 8.69, 5.33, 6.42, 3.62, 2.59, 4.63, 4.85,
6.87, 5.55, 3.14, 2.29, 4.68, 7.76, 3.53, 8.88, 3.44, 8, 5.15,
6.77, 12.28, 6.25, 4.91, 7.01, 7.4, 5.21, 3, 4.87, 7.5, 5.47,
8.97, 7.89, 7.54, 9.25, 7.24, 5.37, 6.41, 2.94, 5.47, 7.14, 5.4,
5.06, 6.32), Proficiency_c = c(0.44, 0.44, 0.44, 0.69, 0.69,
0.69, 1.24, 1.24, 1.24, -0.16, -0.16, -0.16, 1.14, 1.14, 1.14,
0.69, 0.69, 0.69, -0.26, -0.26, -0.26, 0.94, 0.94, 0.94, -0.26,
-0.26, -0.26, 1.04, 1.04, 1.04, 0.39, 0.39, 0.39, -0.06, -0.06,
-0.06, -0.41, -0.41, -0.41, 0.54, 0.54, 0.54, -0.51, -0.51, -0.51,
-0.81, -0.81, -0.81, 0.14, 0.14, 0.14, -0.31, -0.31, -0.31, 0.44,
0.44, 0.44, -0.11, -0.11, -0.11, -0.21, -0.21, -0.21, -0.51,
-0.51, -0.51, 0.24, 0.24, 0.24, 0.59, 0.59, 0.59, -0.21, -0.21,
-0.21, -0.66, -0.66, -0.66, -0.06, -0.06, -0.06, -1.01, -1.01,
-1.01, -0.26, -0.26, -0.26, 0.19, 0.19, 0.19, 0.84, 0.84, 0.84,
-0.11, -0.11, -0.11, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.79,
0.79, 0.79, 1.09, 1.09, 1.09, -0.76, -0.76, -0.76, 0.14, 0.14,
0.14, 0.64, 0.64, 0.64, 0.49, 0.49, 0.49, -0.71, -0.71, -0.71,
-0.31, -0.31, -0.31, -0.11, -0.11, -0.11, -0.61, -0.61, -0.61,
0.19, 0.19, 0.19, -0.36, -0.36, -0.36, -0.31, -0.31, -0.31, -1.01,
-1.01, -1.01, 1.19, 1.19, 1.19, -0.96, -0.96, -0.96, 0.99, 0.99,
0.99, 0.74, 0.74, 0.74, 0.24, 0.24, 0.24, -0.06, -0.06, -0.06,
-0.31, -0.31, -0.31, -0.66, -0.66, -0.66, -0.96, -0.96, -0.96,
0.89, 0.89, 0.89, -0.96, -0.96, -0.96, -1.01, -1.01, -1.01, -0.66,
-0.66, -0.66, -0.71, -0.71, -0.71, -0.36, -0.36, -0.36), testType = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("TestA",
"TestB", "TestC"), class = "factor")), row.names = c(NA, -189L
), class = c("tbl_df", "tbl", "data.frame"))
I'm not sure why you're calculating predictions at a reference proficiency of 1 (0 would be the default), but maybe you're looking for emmeans?
library(emmeans)
emmeans(model, ~testType, at = list(Proficiency_c=1))
The at = argument is the way to specify in emmeans that we want to calculate marginal means with the non-focal parameters (Proficiency_c in this case) set to a value other than the default [typically the mean of a numeric covariate]. See vignette("basics", package = "emmeans") (emmeans has many high-quality vignettes). It's specified as a list because we may have multiple non-focal parameters to set.
Results:
NOTE: Results may be misleading due to involvement in interactions
 testType emmean    SE  df lower.CL upper.CL
 TestA      3.50 0.529 162     2.45     4.54
 TestB      4.21 0.529 162     3.16     5.25
 TestC      4.20 0.529 162     3.15     5.24
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
If you're looking for the estimated slope within each test type, use emtrends:
emtrends(model, ~testType, "Proficiency_c")
 testType Proficiency_c.trend    SE  df lower.CL upper.CL
 TestA                 -1.338 0.448 162    -2.22  -0.4541
 TestB                 -0.837 0.448 162    -1.72   0.0467
 TestC                 -1.102 0.448 162    -1.99  -0.2185
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
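As a rough cross-check, the emmeans and emtrends tables above can be reproduced from the fixed-effect coefficients printed in the question's summary(model) output. The sketch below uses base R only. The coefficient values are copied from that summary (rounded to four decimals), and the names b, prof, X_mean and X_slope are just labels chosen for this illustration.

```r
# Fixed-effect estimates copied from the summary(model) output in the question
b <- c(
  "(Intercept)"                 =  4.8376,
  "Proficiency_c"               = -1.3381,
  "testTypeTestB"               =  0.2088,
  "testTypeTestC"               =  0.4638,
  "Proficiency_c:testTypeTestB" =  0.5008,
  "Proficiency_c:testTypeTestC" =  0.2357
)

prof <- 1  # reference value, as in at = list(Proficiency_c = 1)

# One design row per test type, following the treatment contrasts of testType
X_mean <- rbind(
  TestA = c(1, prof, 0, 0, 0,    0),
  TestB = c(1, prof, 1, 0, prof, 0),
  TestC = c(1, prof, 0, 1, 0,    prof)
)
est_means <- drop(X_mean %*% b)

# Slope of Proficiency_c within each test type: b1, b1 + b4, b1 + b5
X_slope <- rbind(
  TestA = c(0, 1, 0, 0, 0, 0),
  TestB = c(0, 1, 0, 0, 1, 0),
  TestC = c(0, 1, 0, 0, 0, 1)
)
est_slopes <- drop(X_slope %*% b)

print(round(est_means, 2))
print(round(est_slopes, 3))
```

Up to rounding, these should agree with the emmean and Proficiency_c.trend columns above. Setting prof to 0 gives the estimates at Proficiency_c = 0 instead.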

Creating a plot for each column of a dataframe and create a list of plots

I want to create a Q-Q plot for each column in a data frame and return them as a list of plots.
A small section of my df:
structure(list(LAB.ID = c(4L, 3L, 8L, 7L, 4L, 5L, 2L, 2L, 3L,
10L, 5L, 12L, 7L, 12L, 7L, 10L, 2L, 8L, 5L, 12L, 4L, 8L, 10L,
3L, 4L, 5L, 10L, 3L, 7L, 5L, 8L, 3L, 12L, 4L, 2L, 2L, 10L, 3L,
4L, 8L, 2L, 5L, 10L, 12L, 7L, 7L, 8L, 12L), Fe = c(56.39, 56.83,
56.382, 56.48, 56.43, 56.32, 55.94, 55.7, 56.54, 56.3, 56.29,
56.11, 56.4, 56.46, 56.54, 56.5, 56.59, 56.258, 56.31, 56.1,
56.53, 56.442, 56.2, 56.18, 56.31, 56.32, 56.5, 56.5, 56.43,
56.39, 56.258, 56.51, 56.35, 56.47, 56.5, 55.98, 56.7, 56.34,
56.35, 56.532, 55.93, 56.32, 56.5, 56.36, 56.73, 56.62, 56.264,
56.37), SiO2 = c(7.67, 7.84, 7.936, 7.77, 7.74, 7.91, 7.63, 7.65,
7.69, 7.872684992, 7.84, 7.64, 7.83, 7.71, 7.76, 7.851291827,
7.73, 7.685, 7.96, 7.71, 7.62, 7.863, 7.872684992, 7.59, 7.81,
7.87, 7.722932832, 7.77, 7.78, 7.84, 7.838, 7.74, 7.65, 7.66,
7.67, 7.67, 7.680146501, 7.64, 7.8, 7.828, 7.67, 7.92, 7.615967003,
7.82, 7.65, 7.74, 7.767, 7.68), Al2O3 = c(2, 2.01, 2.053, 1.88,
2.03, 2.02, 2.01, 2.02, 2.01, 2.002830415, 2.02, 2.09, 1.9, 2.05,
1.89, 2.021725042, 2.03, 2.044, 2.05, 1.96, 1.99, 2.041, 2.021725042,
2, 2.01, 2.03, 1.983935789, 2.02, 1.88, 2.02, 2.038, 2.02, 2.09,
2.01, 2.01, 2.02, 2.002830415, 2.03, 2.01, 2.008, 2, 2.03, 2.021725042,
2.06, 1.88, 1.87, 2.02, 2.02)), row.names = c(NA, -48L), class = "data.frame")
I have the following code
library(purrr)
library(ggplot2)
qqplots <- imap(df[-1], ~{
ggplot(df[-1], aes(sample = .y)) + # Create QQplot with ggplot2 package
ggtitle(paste0(.y, " Q-Q Plot")) +
theme(plot.title = element_text(hjust = 0.5)) +
ylab('Grade %')+
stat_qq() +
stat_qq_line(col = "red", lwd = 0.5)
})
which produces many plots like below
but what I am expecting is something like this
what am I doing wrong?
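In your code, .y is the column name (a character string), so aes(sample = .y) maps that constant string instead of the column's values. You can use the .data pronoun to look the column up by name: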
library(ggplot2)
qqplots <- purrr::imap(df[-1], ~{
ggplot(df, aes(sample = .data[[.y]])) + # Create QQplot with ggplot2 package
ggtitle(paste0(.y, " Q-Q Plot")) +
theme(plot.title = element_text(hjust = 0.5)) +
ylab('Grade %')+
stat_qq() +
stat_qq_line(col = "red", lwd = 0.5)
})
Or with lapply:
qqplots <- lapply(names(df)[-1], function(x) {
ggplot(df, aes(sample = .data[[x]])) +
ggtitle(paste0(x, " Q-Q Plot")) +
theme(plot.title = element_text(hjust = 0.5)) +
ylab('Grade %')+
stat_qq() +
stat_qq_line(col = "red", lwd = 0.5)
})
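One difference between the two versions: imap keeps the column names as the names of the returned list, whereas lapply over names(df)[-1] returns an unnamed list. Below is a minimal sketch of naming the lapply result and writing every plot to one multi-page PDF. The simulated data frame, the helper make_qq and the temporary output file are assumptions made for illustration, not part of the original post.

```r
library(ggplot2)

# Simulated stand-in for the question's data: one ID column plus numeric columns
set.seed(1)
df <- data.frame(
  LAB.ID = rep(1:6, each = 8),
  Fe     = rnorm(48, mean = 56.4, sd = 0.2),
  SiO2   = rnorm(48, mean = 7.75, sd = 0.1),
  Al2O3  = rnorm(48, mean = 2.00, sd = 0.05)
)

# Hypothetical helper: Q-Q plot for one column, looked up by name via .data
make_qq <- function(col) {
  ggplot(df, aes(sample = .data[[col]])) +
    stat_qq() +
    stat_qq_line(colour = "red") +
    ggtitle(paste0(col, " Q-Q Plot")) +
    ylab("Grade %") +
    theme(plot.title = element_text(hjust = 0.5))
}

cols <- names(df)[-1]
qqplots <- setNames(lapply(cols, make_qq), cols)  # named list, like imap()

# Write one plot per page to a single PDF (a temporary file here)
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 5)
for (p in qqplots) print(p)
dev.off()
```

A single plot can then be pulled out by name, for example qqplots[["Fe"]].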

How to show the results of a Tukey test with boxplots showing CLD letters

I have collected data on 216 individuals. I measured the concentration of the same 7 Substances in each individual, represented by Sub1:Sub7. The concentration of these Substances may differ between individuals from different Locations. I am interested in the level of refinement at which these individuals can be classified into groups based on their concentrations of these Substances. I am also interested in seeing how these Substances may be correlated with each other, as the concentration of some may affect the concentration of others.
Each individual in my data set is represented by a unique ID number. Three "nested" grouping variables (Location, State, and Region) can be used to separate these individuals. Multiple Locations are in each State, and multiple States are part of larger Regions. For instance, the individuals in the Locations APNG, BLEA, and NEAR are all in FL, while the individuals in the Locations CACT, OYLE, and PIY are all in GA. The States FL and GA are both in Region A.
I used this code to conduct an ANOVA:
library(tidyverse)
library(multcomp)
library(multcompView)
tests <- list()
Groups <- c(1:3)
Variables <- 6:12
for(i in Groups){
Group <- as.factor(data[[i]])
for(j in Variables)
{
test_name <- paste0(names(data)[j], "_by_", names(data[i]))
Response <- data[[j]]
sublist <- list()
sublist$aov <- aov(Response ~ Group)
sublist$tukey <- TukeyHSD(sublist$aov)
sublist$multcomp <- multcompLetters(extract_p(sublist$tukey$Group))
tests[[test_name]] <- sublist
}
}
#i can access the results like this:
lapply(tests, function(x) summary(x$aov))
#and access the compact letter display results like this:
lapply(tests, function(x) x$multcomp)
Using the object tests, how can I tell R to create boxplots of the TukeyHSD results, show the CLD letters, and write the plots to a PDF?
This website: r-graph-gallery.com/84-tukey-test.html explains how to do this, but I cannot get it to work with the object tests.
Here is my data:
> dput(data)
structure(list(Region = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"),
State = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 10L, 10L, 10L,
10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L), .Label = c("DE", "FL", "GA", "MA",
"MD", "ME", "NC", "NH", "NY", "SC", "VA", "VT"), class = "factor"),
Location = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 14L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 20L, 20L, 20L, 20L, 20L, 20L, 22L,
22L, 22L, 22L, 22L, 22L, 22L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L,
19L, 19L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 21L
), .Label = c("APNG", "BATO", "BLEA", "CACT", "CHAG", "CHOG",
"COTR", "DTU", "HAB", "LOP", "MASV", "NEAR", "NGUP", "OYLE",
"PIRT", "PIY", "PKE", "PONO", "PPP", "ROG", "VONG", "YENQ"
), class = "factor"), Sex = structure(c(1L, 1L, 1L, 2L, 1L,
1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L,
1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L,
2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
2L), .Label = c("F", "M"), class = "factor"), ID = 1:216,
Sub1 = c(0.03, 0.03, 0.03, 0.04, 0.04, 0.03, 0.03, 0.03,
0.03, 0.03, 0.04, 0.03, 0.04, 0.03, 0.03, 0.03, 0.02, 0.04,
0.03, 0.03, 0.03, 0.02, 0.04, 0.04, 0.02, 0.03, 0.02, 0.03,
0.05, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03,
0.03, 0.03, 0.04, 0.03, 0.04, 0.06, 0.03, 0.03, 0.03, 0.03,
0.02, 0.03, 0.03, 0.03, 0.04, 0.03, 0.02, 0.02, 0.04, 0.03,
0.04, 0.03, 0.03, 0.03, 0.05, 0.03, 0.03, 0.04, 0.03, 0.02,
0.04, 0.02, 0.03, 0.02, 0.02, 0.04, 0.03, 0.02, 0.03, 0.03,
0.05, 0.04, 0.03, 0.02, 0.03, 0.05, 0.02, 0.04, 0.03, 0.05,
0.03, 0.04, 0.02, 0.03, 0.02, 0.03, 0.03, 0.03, 0.02, 0.05,
0.03, 0.03, 0.04, 0.02, 0.02, 0.04, 0.05, 0.03, 0.03, 0.02,
2.03, 2.03, 2.03, 2.04, 2.04, 2.03, 2.03, 2.03, 2.03, 2.03,
2.04, 2.03, 2.04, 2.03, 2.03, 2.03, 2.02, 2.04, 2.03, 2.03,
2.03, 2.02, 2.04, 2.04, 2.02, 2.03, 2.02, 2.03, 2.05, 2.03,
2.03, 2.03, 2.03, 2.03, 2.03, 2.03, 2.03, 2.03, 2.03, 2.03,
2.04, 2.03, 2.04, 2.06, 2.03, 2.03, 2.03, 2.03, 2.02, 2.03,
2.03, 2.03, 2.04, 2.03, 2.02, 2.02, 2.04, 2.03, 2.04, 2.03,
2.03, 2.03, 2.05, 2.03, 2.03, 2.04, 2.03, 2.02, 2.04, 2.02,
2.03, 2.02, 2.02, 2.04, 2.03, 2.02, 2.03, 2.03, 2.05, 2.04,
2.03, 2.02, 2.03, 2.05, 2.02, 2.04, 2.03, 2.05, 2.03, 2.04,
2.02, 2.03, 2.02, 2.03, 2.03, 2.03, 2.02, 2.05, 2.03, 2.03,
2.04, 2.02, 2.02, 2.04, 2.05, 2.03, 2.03, 2.02), Sub2 = c(0.69,
1.28, 1.27, 2.25, 1.05, 1.76, 1.57, 1.09, 0.68, 1.35, 0.85,
1.55, 0.12, 0, 0.58, 1.13, 0.1, 1.9, 0.54, 1.48, 0.8, 0.52,
1.76, 1.77, 1.24, 0.63, 0.63, 0.57, 0.63, 0.53, 1.32, 1.79,
1.16, 1.11, 1.1, 1.92, 1.06, 1.18, 0.43, 0.67, 0.75, 2.37,
3.93, 0.3, 2.8, 1.25, 0.9, 1.32, 0.5, 0.4, 0.72, 0.34, 0.12,
0.89, 0.69, 1.13, 1.22, 0.88, 4.13, 1.27, 0.62, 2.9, 2.42,
0.9, 0.4, 1.29, 1.61, 0.3, 1.47, 0.36, 1.27, 0.84, 1.81,
0.18, 0.47, 1.01, 0.85, 0.59, 1.73, 0.72, 0.5, 0.83, 0.9,
0.81, 0.59, 2.84, 2.24, 2.68, 1.18, 1.36, 0.84, 1.79, 1.01,
0.34, 0.41, 2.22, 0.51, 0.42, 1.26, 2.26, 1.79, 1.43, 1.3,
1.8, 2.21, 1.65, 2.39, 0.31, 2.69, 3.28, 3.27, 4.25, 3.05,
3.76, 3.57, 3.09, 2.68, 3.35, 2.85, 3.55, 2.12, 2, 2.58,
3.13, 2.1, 3.9, 2.54, 3.48, 2.8, 2.52, 3.76, 3.77, 3.24,
2.63, 2.63, 2.57, 2.63, 2.53, 3.32, 3.79, 3.16, 3.11, 3.1,
3.92, 3.06, 3.18, 2.43, 2.67, 2.75, 4.37, 5.93, 2.3, 4.8,
3.25, 2.9, 3.32, 2.5, 2.4, 2.72, 2.34, 2.12, 2.89, 2.69,
3.13, 3.22, 2.88, 6.13, 3.27, 2.62, 4.9, 4.42, 2.9, 2.4,
3.29, 3.61, 2.3, 3.47, 2.36, 3.27, 2.84, 3.81, 2.18, 2.47,
3.01, 2.85, 2.59, 3.73, 2.72, 2.5, 2.83, 2.9, 2.81, 2.59,
4.84, 4.24, 4.68, 3.18, 3.36, 2.84, 3.79, 3.01, 2.34, 2.41,
4.22, 2.51, 2.42, 3.26, 4.26, 3.79, 3.43, 3.3, 3.8, 4.21,
3.65, 4.39, 2.31), Sub3 = c(1.32, 0.19, 0.27, 0.73, 0.41,
0.37, 0.89, 1.35, 0.49, 1.32, 0.69, 0, 0.57, 0.24, 0.23,
0.71, 0, 0, 0, 0.58, 0.32, 1.1, 0.45, 0.61, 0.38, 0.3, 0.01,
0.06, 0.48, 0.62, 0.64, 1.96, 0.61, 0.43, 0.25, 0.34, 0.17,
0.57, 0.1, 0.6, 1.07, 0.44, 0.12, 0.55, 0.08, 0.56, 0.59,
0.66, 0.44, 0.58, 0.75, 0.99, 0.77, 0.57, 0.35, 0.18, 0.16,
0.31, 0.04, 0.17, 0.46, 0.19, 0.8, 0.61, 1.14, 0.3, 0.08,
0.25, 0.78, 1.07, 0.38, 0.17, 0.42, 0.48, 0.55, 0.74, 2.98,
1.96, 0.51, 0.63, 0, 0.52, 0.32, 0.23, 0.31, 0.09, 0.06,
0.26, 0.23, 0.58, 1.49, 0.46, 0.33, 0.37, 1.16, 0.91, 0.41,
0.72, 0.2, 0.84, 0.71, 0.56, 0.34, 0.68, 0.81, 0.52, 0.78,
0.19, 3.32, 2.19, 2.27, 2.73, 2.41, 2.37, 2.89, 3.35, 2.49,
3.32, 2.69, 2, 2.57, 2.24, 2.23, 2.71, 2, 2, 2, 2.58, 2.32,
3.1, 2.45, 2.61, 2.38, 2.3, 2.01, 2.06, 2.48, 2.62, 2.64,
3.96, 2.61, 2.43, 2.25, 2.34, 2.17, 2.57, 2.1, 2.6, 3.07,
2.44, 2.12, 2.55, 2.08, 2.56, 2.59, 2.66, 2.44, 2.58, 2.75,
2.99, 2.77, 2.57, 2.35, 2.18, 2.16, 2.31, 2.04, 2.17, 2.46,
2.19, 2.8, 2.61, 3.14, 2.3, 2.08, 2.25, 2.78, 3.07, 2.38,
2.17, 2.42, 2.48, 2.55, 2.74, 4.98, 3.96, 2.51, 2.63, 2,
2.52, 2.32, 2.23, 2.31, 2.09, 2.06, 2.26, 2.23, 2.58, 3.49,
2.46, 2.33, 2.37, 3.16, 2.91, 2.41, 2.72, 2.2, 2.84, 2.71,
2.56, 2.34, 2.68, 2.81, 2.52, 2.78, 2.19), Sub4 = c(0.63,
0.05, 0.2, 0.41, 0.43, 0.54, 0.26, 0.78, 0.13, 0.8, 0.47,
0.65, 0, 0.22, 0.45, 0.85, 0.47, 0, 0.62, 0.59, 0.14, 0.8,
0.9, 0.88, 0.56, 0.56, 0.47, 0.24, 0.62, 1.77, 0.56, 0.99,
0.21, 0.9, 0.62, 0.58, 0.41, 0.97, 0.2, 0.9, 0.68, 0.52,
0.14, 1.27, 0.63, 0.51, 0.12, 0.61, 0.31, 0.43, 0.62, 1.18,
0.95, 0.59, 0.39, 0.26, 0.53, 0.77, 0.4, 0.39, 0, 0.19, 0.82,
1.1, 0.46, 0.25, 0.29, 0.2, 2.01, 0.36, 0.62, 0.54, 0.48,
0.87, 0.66, 1.46, 2.59, 1.37, 1.28, 0.99, 0.71, 0.32, 0.64,
0.66, 0.47, 0.48, 0.38, 0.67, 0.18, 1.02, 0.54, 0.53, 0.25,
0.43, 1.02, 0.58, 0.58, 0.48, 0.2, 0.7, 0.38, 0.28, 0.65,
1.21, 1.03, 0.38, 0.6, 0.44, 2.63, 2.05, 2.2, 2.41, 2.43,
2.54, 2.26, 2.78, 2.13, 2.8, 2.47, 2.65, 2, 2.22, 2.45, 2.85,
2.47, 2, 2.62, 2.59, 2.14, 2.8, 2.9, 2.88, 2.56, 2.56, 2.47,
2.24, 2.62, 3.77, 2.56, 2.99, 2.21, 2.9, 2.62, 2.58, 2.41,
2.97, 2.2, 2.9, 2.68, 2.52, 2.14, 3.27, 2.63, 2.51, 2.12,
2.61, 2.31, 2.43, 2.62, 3.18, 2.95, 2.59, 2.39, 2.26, 2.53,
2.77, 2.4, 2.39, 2, 2.19, 2.82, 3.1, 2.46, 2.25, 2.29, 2.2,
4.01, 2.36, 2.62, 2.54, 2.48, 2.87, 2.66, 3.46, 4.59, 3.37,
3.28, 2.99, 2.71, 2.32, 2.64, 2.66, 2.47, 2.48, 2.38, 2.67,
2.18, 3.02, 2.54, 2.53, 2.25, 2.43, 3.02, 2.58, 2.58, 2.48,
2.2, 2.7, 2.38, 2.28, 2.65, 3.21, 3.03, 2.38, 2.6, 2.44),
Sub5 = c(1.14, 1.38, 1.5, 1.43, 1.65, 1.34, 1.29, 1.72, 1.32,
1.17, 1.19, 1.35, 1.34, 1.06, 1.24, 1.33, 1.2, 1.31, 1.29,
1.37, 1.42, 1.08, 1.77, 1.32, 1.2, 1.14, 1.48, 0.98, 1.33,
1.65, 1.24, 1.43, 1.41, 1.2, 1.42, 1.09, 1.04, 1.57, 0.78,
1.37, 0.99, 1.4, 1.13, 1.34, 1.35, 1.23, 0.93, 0.94, 1.02,
1.16, 1.08, 0.96, 1.33, 1.19, 1.25, 1.44, 1.62, 1.27, 1.4,
1.4, 1.29, 1.53, 1.43, 1.33, 1.25, 1.82, 1.45, 1.36, 1.38,
1.34, 1.29, 1.86, 1.15, 1.31, 1.21, 1.23, 1.42, 1.57, 1.23,
0.99, 1.33, 1.74, 1.03, 1.33, 1.41, 1.01, 0.97, 1.46, 1.55,
1.04, 1.22, 1.19, 1.74, 1.64, 1.35, 1.34, 1.21, 1.55, 1.31,
1.5, 1.45, 1.21, 0.83, 1.17, 1.25, 1.54, 1.5, 1.11, 3.14,
3.38, 3.5, 3.43, 3.65, 3.34, 3.29, 3.72, 3.32, 3.17, 3.19,
3.35, 3.34, 3.06, 3.24, 3.33, 3.2, 3.31, 3.29, 3.37, 3.42,
3.08, 3.77, 3.32, 3.2, 3.14, 3.48, 2.98, 3.33, 3.65, 3.24,
3.43, 3.41, 3.2, 3.42, 3.09, 3.04, 3.57, 2.78, 3.37, 2.99,
3.4, 3.13, 3.34, 3.35, 3.23, 2.93, 2.94, 3.02, 3.16, 3.08,
2.96, 3.33, 3.19, 3.25, 3.44, 3.62, 3.27, 3.4, 3.4, 3.29,
3.53, 3.43, 3.33, 3.25, 3.82, 3.45, 3.36, 3.38, 3.34, 3.29,
3.86, 3.15, 3.31, 3.21, 3.23, 3.42, 3.57, 3.23, 2.99, 3.33,
3.74, 3.03, 3.33, 3.41, 3.01, 2.97, 3.46, 3.55, 3.04, 3.22,
3.19, 3.74, 3.64, 3.35, 3.34, 3.21, 3.55, 3.31, 3.5, 3.45,
3.21, 2.83, 3.17, 3.25, 3.54, 3.5, 3.11), Sub6 = c(0.2, 0.15,
0.16, 0.14, 0.19, 0.12, 0.14, 0.35, 0.29, 0.25, 0.06, 0.16,
0.18, 0.65, 0.18, 0.12, 0.42, 0.09, 0.13, 0.12, 0.22, 0.49,
0.18, 0.11, 0.29, 0.16, 0.18, 0.15, 0.46, 0.19, 0.15, 0.19,
0.1, 0.09, 0.11, 0.14, 0.1, 0.31, 0.53, 0.32, 0.23, 0.18,
0.14, 0.38, 0.19, 0.1, 0.14, 0.08, 0.21, 0.13, 0.08, 0.08,
0.26, 0.14, 0.17, 0.09, 0.09, 0.22, 0.26, 0.09, 0.3, 0.16,
0.17, 0.09, 0.12, 0.17, 0.14, 0.34, 0.12, 0.21, 0.1, 0.27,
0.11, 0.13, 0.15, 0.17, 0.21, 0.16, 0.12, 0.36, 0.16, 0.17,
0.27, 0.32, 0.15, 0.13, 0.14, 0.15, 0.1, 0.26, 0.25, 0.08,
0.25, 0.19, 0.38, 0.08, 0.64, 0.71, 0.1, 0.18, 0.12, 0.13,
0.1, 1.17, 0.14, 0.19, 0.14, 0.24, 2.2, 2.15, 2.16, 2.14,
2.19, 2.12, 2.14, 2.35, 2.29, 2.25, 2.06, 2.16, 2.18, 2.65,
2.18, 2.12, 2.42, 2.09, 2.13, 2.12, 2.22, 2.49, 2.18, 2.11,
2.29, 2.16, 2.18, 2.15, 2.46, 2.19, 2.15, 2.19, 2.1, 2.09,
2.11, 2.14, 2.1, 2.31, 2.53, 2.32, 2.23, 2.18, 2.14, 2.38,
2.19, 2.1, 2.14, 2.08, 2.21, 2.13, 2.08, 2.08, 2.26, 2.14,
2.17, 2.09, 2.09, 2.22, 2.26, 2.09, 2.3, 2.16, 2.17, 2.09,
2.12, 2.17, 2.14, 2.34, 2.12, 2.21, 2.1, 2.27, 2.11, 2.13,
2.15, 2.17, 2.21, 2.16, 2.12, 2.36, 2.16, 2.17, 2.27, 2.32,
2.15, 2.13, 2.14, 2.15, 2.1, 2.26, 2.25, 2.08, 2.25, 2.19,
2.38, 2.08, 2.64, 2.71, 2.1, 2.18, 2.12, 2.13, 2.1, 3.17,
2.14, 2.19, 2.14, 2.24), Sub7 = c(0.01, 0, 0, 0.01, 0, 0,
0.01, 0.01, 0.02, 0.03, 0.01, 0, 0.03, 0, 0.02, 0, 0, 0,
0.01, 0.03, 0.03, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01, 0,
0, 0.05, 0.02, 0.04, 0.02, 0, 0.02, 0.02, 0.02, 0.04, 0.01,
0.02, 0.04, 0.02, 0.01, 0.01, 0.01, 0.01, 0.03, 0.02, 0,
0.02, 0.05, 0.14, 0, 0.01, 0, 0.01, 0.01, 0, 0.01, 0.02,
0.01, 0.02, 0.01, 0.03, 0.05, 0.06, 0.03, 0.02, 0.11, 0.05,
0.02, 0.02, 0, 0.01, 0, 0.01, 0.06, 0.04, 0.02, 0.02, 0,
0.02, 0.01, 0.02, 0.01, 0, 0.01, 0.01, 0.02, 0.01, 0.02,
0.01, 0, 0.01, 0.06, 0.01, 0.02, 0.01, 0.01, 0.03, 0.02,
0.03, 0.03, 0.02, 0.09, 0, 0.19, 0.02, 2.01, 2, 2, 2.01,
2, 2, 2.01, 2.01, 2.02, 2.03, 2.01, 2, 2.03, 2, 2.02, 2,
2, 2, 2.01, 2.03, 2.03, 2.02, 2.02, 2.02, 2.01, 2.01, 2.01,
2, 2, 2.05, 2.02, 2.04, 2.02, 2, 2.02, 2.02, 2.02, 2.04,
2.01, 2.02, 2.04, 2.02, 2.01, 2.01, 2.01, 2.01, 2.03, 2.02,
2, 2.02, 2.05, 2.14, 2, 2.01, 2, 2.01, 2.01, 2, 2.01, 2.02,
2.01, 2.02, 2.01, 2.03, 2.05, 2.06, 2.03, 2.02, 2.11, 2.05,
2.02, 2.02, 2, 2.01, 2, 2.01, 2.06, 2.04, 2.02, 2.02, 2,
2.02, 2.01, 2.02, 2.01, 2, 2.01, 2.01, 2.02, 2.01, 2.02,
2.01, 2, 2.01, 2.06, 2.01, 2.02, 2.01, 2.01, 2.03, 2.02,
2.03, 2.03, 2.02, 2.09, 2, 2.19, 2.02)), class = "data.frame", row.names = c(NA,
-216L))
I think the issue with your tests object is that it holds too much information to make it obvious how to plot it.
Here, I focused only on the Region column, but you can apply the same workflow to the other categorical columns of your dataset.
1) We need the labels (letters) associated with each region for each substance. Recycling your loop, I did this:
library(multcomp)
library(multcompView)
Labels_box = NULL
Group <- as.factor(data[,"Region"])
for(j in 6:12)
{
Response <- data[, j]
TUKEY <- TukeyHSD(aov(lm(Response ~ Group)))
MultComp <- multcompLetters(extract_p(TUKEY$Group))
Region <- names(MultComp$Letters)
Labels <- MultComp$Letters
df <- data.frame(Region, Labels)
df$Substance <- colnames(data)[j]
Labels_box <- rbind(Labels_box, df)  # rbind(NULL, df) returns df on the first pass
}
Now, the dataset Labels_box should look like:
head(Labels_box)
   Region Labels Substance
B       B      a      Sub1
C       C      b      Sub1
D       D      b      Sub1
E       E      b      Sub1
A       A      a      Sub1
B1      B      a      Sub2
2) Next, in order to add them on top of each boxplot, we have to define the y position of each label. So we calculate the max value of each region for each substance (plus a small offset of 0.2) using dplyr and tidyr:
library(tidyverse)
Max_Val <- data %>% pivot_longer(., cols = starts_with("Sub"), names_to = "Substance", values_to = "Value") %>%
group_by(Region, Substance) %>% summarise(MAX = max(Value)+0.2)
# A tibble: 6 x 3
# Groups: Region [1]
Region Substance MAX
<fct> <chr> <dbl>
1 A Sub1 0.26
2 A Sub2 4.13
3 A Sub3 1.55
4 A Sub4 2.21
5 A Sub5 2.06
6 A Sub6 0.85
And we combine both Labels_box and Max_Val datasets using left_join:
Labels_box <- left_join(Labels_box, Max_Val, by = c("Region" = "Region", "Substance" = "Substance"))
Region Labels Substance MAX
1 B a Sub1 0.25
2 C b Sub1 2.25
3 D b Sub1 2.26
4 E b Sub1 2.25
5 A a Sub1 0.26
6 B a Sub2 4.33
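The two operations in step 2 (a per-group maximum plus an offset, then a join onto the labels) can also be sketched in base R. The toy data frames below are hypothetical, and aggregate() and merge() are used here as stand-ins for summarise() and left_join(); this is not the code that produced the output above.

```r
# Hypothetical long-format measurements: two regions, two substances.
toy <- data.frame(
  Region    = c("A", "A", "B", "B", "A", "A", "B", "B"),
  Substance = c("Sub1", "Sub1", "Sub1", "Sub1", "Sub2", "Sub2", "Sub2", "Sub2"),
  Value     = c(0.03, 0.06, 0.01, 0.05, 0.69, 3.93, 1.20, 4.13)
)

# Hypothetical Tukey letters, one per Region/Substance combination.
labels <- data.frame(
  Region    = c("A", "B", "A", "B"),
  Substance = c("Sub1", "Sub1", "Sub2", "Sub2"),
  Labels    = c("a", "a", "a", "b")
)

# Maximum per group, shifted up by 0.2 so the letter sits above the box.
max_val <- aggregate(Value ~ Region + Substance, data = toy,
                     FUN = function(v) max(v) + 0.2)
names(max_val)[names(max_val) == "Value"] <- "MAX"

# all.x = TRUE keeps every label row, like a left join.
labels_pos <- merge(labels, max_val, by = c("Region", "Substance"), all.x = TRUE)
labels_pos
```

Each label row ends up with exactly one y position, which is what geom_text() needs later on.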
3) Finally, we need to reshape the values of all substances in your data into a long format, which is the layout ggplot expects. For that, we can re-use the pivot_longer function seen in 2):
library(tidyverse)
data_box <- data %>% pivot_longer(., cols = starts_with("Sub"), names_to = "Substance", values_to = "Value")
# A tibble: 6 x 7
Region State Location Sex ID Substance Value
<fct> <fct> <fct> <fct> <int> <chr> <dbl>
1 A FL APNG F 1 Sub1 0.03
2 A FL APNG F 1 Sub2 0.69
3 A FL APNG F 1 Sub3 1.32
4 A FL APNG F 1 Sub4 0.63
5 A FL APNG F 1 Sub5 1.14
6 A FL APNG F 1 Sub6 0.2
We are almost ready, but in order to set the fill color according to the group identified by the Tukey test, we need to add the labels to data_box.
For that, we can do a left_join:
data_box <- left_join(data_box,Labels_box, by = c("Region" = "Region", "Substance" = "Substance"))
# A tibble: 6 x 9
Region State Location Sex ID Substance Value Labels MAX
<fct> <fct> <fct> <fct> <int> <chr> <dbl> <fct> <dbl>
1 A FL APNG F 1 Sub1 0.03 a 0.26
2 A FL APNG F 1 Sub2 0.69 a 4.13
3 A FL APNG F 1 Sub3 1.32 a 1.55
4 A FL APNG F 1 Sub4 0.63 a 2.21
5 A FL APNG F 1 Sub5 1.14 a 2.06
6 A FL APNG F 1 Sub6 0.2 a 0.85
4) Now, we are ready to plot everything:
library(ggplot2)
ggplot(data_box, aes(x = Region, y = Value, fill = Labels))+
geom_boxplot()+
geom_text(data = Labels_box,aes( x = Region, y = MAX, label = Labels))+
facet_grid(.~Substance, scales = "free")
And you get this:
Does it look satisfactory to you?

How to determine a value in a column immediately before a value in another column in R?

Plot
Following is a plot of speeds of two vehicles over time. The subject vehicle (blue) is following the lead vehicle (red) in the same lane. So, the speed profile of subject vehicle is very similar to lead vehicle's.
I have manually labelled the points where a vehicle changes its speed by acceleration/deceleration. Now, I want to determine these points from the data. Following are the sample data:
Data
> dput(veh)
structure(list(Time = c(287, 288, 289, 290, 291, 292, 293, 294,
295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307,
308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320,
321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331), fit_p = c(NA,
NA, NA, 8.86, 8.5, 8.15, 7.79, 7.44, 7.08, 6.73, 6.38, 6.1, 6.48,
6.86, 7.24, 7.63, 8.01, 8.38, 8.58, 8.68, 8.7, 8.53, 8.33, 8.12,
7.92, 7.71, 7.74, 8.1, 8.45, 8.8, 9.15, 9.29, 9.22, 9.16, 9.09,
9.13, 9.25, 9.37, 9.49, 9.51, 9.34, 9.17, NA, NA, NA), psi_p2 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 298, NA, NA, NA, NA,
NA, 304, 305, NA, 307, NA, NA, NA, NA, NA, 313, NA, NA, NA, 317,
NA, NA, NA, 321, NA, NA, NA, NA, 326, NA, NA, NA, NA, NA), slo_p = c(-0.35,
-0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35,
-0.35, 0.38, 0.38, 0.38, 0.38, 0.38, 0.38, 0.2, 0.02, 0.02, -0.2,
-0.2, -0.2, -0.2, -0.2, -0.2, 0.35, 0.35, 0.35, 0.35, -0.06,
-0.06, -0.06, -0.06, 0.12, 0.12, 0.12, 0.12, 0.12, -0.17, -0.17,
-0.17, -0.17, -0.17, -0.17), fit_v = c(NA, NA, NA, 9.16, 8.57,
7.99, 7.4, 7.23, 7.13, 7.04, 6.94, 6.85, 6.75, 6.66, 7.07, 7.57,
8.06, 8.56, 9.04, 9.15, 9.26, 9.37, 9.15, 8.92, 8.68, 8.45, 8.22,
7.99, 8.03, 8.24, 8.55, 8.87, 9.02, 8.96, 8.89, 8.82, 8.75, 8.99,
9.28, 9.47, 9.42, 9.37, NA, NA, NA), psi_v2 = c(NA, NA, NA, NA,
NA, NA, 293, NA, NA, NA, NA, NA, NA, 300, NA, NA, NA, NA, 305,
NA, NA, 308, NA, NA, NA, NA, NA, 314, 315, 316, NA, NA, 319,
NA, NA, NA, 323, NA, NA, 326, NA, NA, NA, NA, NA), slo_v = c(-0.59,
-0.59, -0.59, -0.59, -0.59, -0.59, -0.1, -0.1, -0.1, -0.1, -0.1,
-0.1, -0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.11, 0.11, 0.11, -0.23,
-0.23, -0.23, -0.23, -0.23, -0.23, 0.04, 0.16, 0.32, 0.32, 0.32,
-0.07, -0.07, -0.07, -0.07, 0.29, 0.29, 0.29, -0.05, -0.05, -0.05,
-0.05, -0.05, -0.05)), .Names = c("Time", "fit_p", "psi_p2",
"slo_p", "fit_v", "psi_v2", "slo_v"), row.names = c(NA, -45L), class = "data.frame")
In the column psi_v2, I have the time where subject vehicle changed the speed. These are all the S points. The points where the lead vehicle changed the speed are in the column psi_p2. But, I only want to determine the location of those points in psi_p2 which happened immediately before point S. These points are all the L points on the plot. For instance, S1 happened at psi_v2=300, therefore, L1 is 298 in psi_p2.
Question
I guess that I need to use which() to determine the relevant points from psi_p2. But I don't know how to code the part where only the "immediately before" point is picked.
Once the points are identified, I want to check whether the subject vehicle accelerated in response to the lead vehicle's acceleration. The acceleration of the subject vehicle is in slo_v and that of the lead vehicle is in slo_p. Example: for S1, slo_v = 0.5, and for L1, slo_p = 0.38. Since the subject vehicle accelerated due to the acceleration of the lead vehicle, we call it "opening" (or "closing" in the opposite case).
So, my desired output is:
structure(list(Time = 287:331, fit_p = c(NA, NA, NA, 8.86, 8.5,
8.15, 7.79, 7.44, 7.08, 6.73, 6.38, 6.1, 6.48, 6.86, 7.24, 7.63,
8.01, 8.38, 8.58, 8.68, 8.7, 8.53, 8.33, 8.12, 7.92, 7.71, 7.74,
8.1, 8.45, 8.8, 9.15, 9.29, 9.22, 9.16, 9.09, 9.13, 9.25, 9.37,
9.49, 9.51, 9.34, 9.17, NA, NA, NA), psi_p2 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 298L, NA, NA, NA, NA, NA, 304L, 305L,
NA, 307L, NA, NA, NA, NA, NA, 313L, NA, NA, NA, 317L, NA, NA,
NA, 321L, NA, NA, NA, NA, 326L, NA, NA, NA, NA, NA), slo_p = c(-0.35,
-0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35,
-0.35, 0.38, 0.38, 0.38, 0.38, 0.38, 0.38, 0.2, 0.02, 0.02, -0.2,
-0.2, -0.2, -0.2, -0.2, -0.2, 0.35, 0.35, 0.35, 0.35, -0.06,
-0.06, -0.06, -0.06, 0.12, 0.12, 0.12, 0.12, 0.12, -0.17, -0.17,
-0.17, -0.17, -0.17, -0.17), fit_v = c(NA, NA, NA, 9.16, 8.57,
7.99, 7.4, 7.23, 7.13, 7.04, 6.94, 6.85, 6.75, 6.66, 7.07, 7.57,
8.06, 8.56, 9.04, 9.15, 9.26, 9.37, 9.15, 8.92, 8.68, 8.45, 8.22,
7.99, 8.03, 8.24, 8.55, 8.87, 9.02, 8.96, 8.89, 8.82, 8.75, 8.99,
9.28, 9.47, 9.42, 9.37, NA, NA, NA), psi_v2 = c(NA, NA, NA, NA,
NA, NA, 293L, NA, NA, NA, NA, NA, NA, 300L, NA, NA, NA, NA, 305L,
NA, NA, 308L, NA, NA, NA, NA, NA, 314L, 315L, 316L, NA, NA, 319L,
NA, NA, NA, 323L, NA, NA, 326L, NA, NA, NA, NA, NA), slo_v = c(-0.59,
-0.59, -0.59, -0.59, -0.59, -0.59, -0.1, -0.1, -0.1, -0.1, -0.1,
-0.1, -0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.11, 0.11, 0.11, -0.23,
-0.23, -0.23, -0.23, -0.23, -0.23, 0.04, 0.16, 0.32, 0.32, 0.32,
-0.07, -0.07, -0.07, -0.07, 0.29, 0.29, 0.29, -0.05, -0.05, -0.05,
-0.05, -0.05, -0.05), label = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 9L, 1L, 1L, 1L, 3L, 10L, 1L,
4L, 11L, 1L, 1L, 1L, 1L, 5L, 1L, 1L, 12L, 6L, 1L, 13L, 1L, 7L,
1L, 14L, 1L, 1L, 8L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "L1",
"L2", "L3", "L4", "L5", "L6", "L7&S7", "S1", "S2", "S3", "S4",
"S5", "S6"), class = "factor"), condition = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L,
3L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 2L, 1L,
1L, 1L, 3L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "closing",
"opening"), class = "factor")), .Names = c("Time", "fit_p", "psi_p2",
"slo_p", "fit_v", "psi_v2", "slo_v", "label", "condition"), class = "data.frame", row.names = c(NA,
-45L))
Which function should I use to identify these points? I would prefer dplyr because I have multiple pairs like this example, so an operation written for one data frame can then be applied to all the others using group_by().
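The "immediately before" lookup described above could be sketched as follows, in base R rather than dplyr. The data frame veh_demo is a condensed reconstruction of the relevant veh columns built with rep(). Two choices here are assumptions, not something the question settles: "before" is read as strictly earlier, and "closing" is read as both slopes being negative.

```r
# Condensed reconstruction of the question's columns (veh_demo is a hypothetical name).
Time    <- 287:331
l_times <- c(298, 304, 305, 307, 313, 317, 321, 326)            # non-NA psi_p2
s_times <- c(293, 300, 305, 308, 314, 315, 316, 319, 323, 326)  # non-NA psi_v2
veh_demo <- data.frame(
  Time   = Time,
  psi_p2 = ifelse(Time %in% l_times, Time, NA),
  psi_v2 = ifelse(Time %in% s_times, Time, NA),
  slo_p  = rep(c(-0.35, 0.38, 0.2, 0.02, -0.2, 0.35, -0.06, 0.12, -0.17),
               times = c(11, 6, 1, 2, 6, 4, 4, 5, 6)),
  slo_v  = rep(c(-0.59, -0.1, 0.5, 0.11, -0.23, 0.04, 0.16, 0.32, -0.07, 0.29, -0.05),
               times = c(6, 7, 5, 3, 6, 1, 1, 3, 4, 3, 6))
)

s <- veh_demo$psi_v2[!is.na(veh_demo$psi_v2)]
l <- veh_demo$psi_p2[!is.na(veh_demo$psi_p2)]

# left.open = TRUE uses intervals (l[i], l[i+1]], so idx counts the lead
# points strictly earlier than each S point.
idx <- findInterval(s, l, left.open = TRUE)
idx[idx == 0] <- NA   # no lead point precedes this S point

pairs <- data.frame(S = s, L = l[idx])
pairs$slo_v <- veh_demo$slo_v[match(pairs$S, veh_demo$Time)]
pairs$slo_p <- veh_demo$slo_p[match(pairs$L, veh_demo$Time)]

# Assumed rule: same-sign slopes give "opening" (both > 0) or "closing" (both < 0).
pairs$condition <- ifelse(pairs$slo_v > 0 & pairs$slo_p > 0, "opening",
                   ifelse(pairs$slo_v < 0 & pairs$slo_p < 0, "closing", NA))
pairs
```

With this rule S = 300 is paired with L = 298 and labelled "opening", matching the S1/L1 example. Consecutive S points (314, 315, 316) all map to the same lead point, and a simultaneous change (326) is paired with the earlier lead point 321, so reproducing the desired output exactly would need an extra rule for those cases.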

Using mob() trees (partykit package) with nls() model

I'm trying to use model-based recursive partitioning (MOB) with the mob() function (from the partykit package) to separate several curves that were derived using the nls() function. I had to define my model and determine the starting values. I've been trying to see if this could be used with the mob() function to no avail.
I tried following this example on page 7:
https://cran.r-project.org/web/packages/partykit/vignettes/mob.pdf
I created a fit function that estimates the starting values and would return the estimates etc. of the nls(). But I can't seem to get anything going after that. I'd like to know if it is at all possible to use a custom model, with coefficients and both dependent and independent variables and to include them in mob() and get it to work. I tried the lmtree() function but of course this will only give a straight line.
My code is below. Basically, I use a segmented linear regression to get the starting values of the double exponential curve that I am using. This is as far as I got. The parameter estimates give an error, and even if you get past that it just won't run. I just need to know if it is at all possible for the mob() function to run nls().
I loaded sample data, but if it is possible to use the nls()
photo.try <- function(y, x,start = NULL, weights = NULL, offset = NULL, estfun = FALSE, object = TRUE)
{
lin.mod1 <- lm(y ~ x)
segmented.mod.2 <- segmented(lin.mod1, seg.Z = ~x, psi=1)
segmented.mod1 <- segmented(lin.mod1, seg.Z = ~x, psi = segmented.mod.2$psi[1,2])
nls(y ~ (a*exp(-b * x) - c* exp(-d* x)), start = list(a = -1*(intercept(segmented.mod1)[[1]][1,1]) , b = slope(segmented.mod1)[[1]][1,1],
c = -1*(intercept(segmented.mod1)[[1]][2,1]),
d = -1*slope(segmented.mod1)[[1]][2,1]))
}
photo_form <- Pn ~ (a*exp(-b * PAR) - c* exp(-d* PAR))| Species
photo_tree <- mob(photo_form, data = eco, fit = (photo.try))
Here is my sample data:
eco <- structure(list(Species = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("Bogum",
"Clethra", "Eugene", "Guarria", "Melo", "Santa", "Sapium"), class = "factor"),
PAR = c(0, 58.6, 101.4, 228.6, 462.4, 904.7, 1565.8, 1992.1,
2395.9, 0, 72.8, 125.9, 232.8, 411, 841.1, 1669.6, 2394.5,
2394.9, 0, 53.5, 122.1, 231.6, 451, 808.5, 1575, 2394.6,
2395.1, 0, 70.9, 104.8, 251.1, 474.6, 858.3, 1612.3, 2393.3,
2395.1, 0, 63.1, 124.6, 277.1, 417.7, 824.4, 1649.6, 2377.7,
2381.9, 0, 31, 46.5, 115.7, 228.1, 424.3, 822.5, 1644.2,
2380.7, 2381.2, 0, 50.1, 118.1, 203.3, 413.2, 804.5, 1587.3,
2385.3, 0, 28.8, 36.9, 101.2, 211.7, 423.1, 793, 0, 43.6,
106.7, 200.8, 468.6, 808.4, 1567, 2367.1, 2376.5, 0.1, 40.4,
104.1, 202.2, 447.3, 794.7, 1546, 2391.8, 2393.3, 0.1, 44.1,
107.5, 227.4, 429.6, 802.5, 1668.4, 2391, 0, 42.2, 125.3,
126.2, 127.3, 240.3, 433.4, 791, 1600, 2396.8, 2397, 2399.3,
0, 72.7, 118.1, 236.9, 425, 828.4, 1613.3, 1615.4, 2396.1,
2396.5, 2397.2, 2397.5, 0, 62, 116.2, 235.5, 401.7, 879,
879.8, 1552.2, 1553.9, 2394.3, 2394.4, 2394.7, 2396.6, 0,
84.8, 135, 209.8, 425.3, 859.1, 1597.6, 2377.3, 2379.5, 2385.1,
0.1, 62, 106.3, 226.2, 442.9, 822.5, 1462.3, 2389.8, 2392.1,
0.1, 0.1, 73.9, 126, 249.8, 428.5, 846.5, 1555.3, 2390.1,
2390.7, 2390.8, 0, 68.7, 121.5, 209.7, 426.2, 803, 1525.9,
2389.8, 0, 52.8, 96.9, 211.1, 441.3, 787.9, 1566.5, 2415.2,
2415.3, 2415.5, 2417.5, 2417.7, 2418.5, 0.1, 46.5, 108.4,
233.5, 461.7, 792.3, 1635.7, 2415.1, 2415.6, 2415.6, 2416.5,
2416.6, 2417.8, 0.1, 68.3, 110, 239.5, 531.7, 847.2, 1591.4,
2387.3, 2387.6, 2389.7, 0, 49.7, 114.6, 230.6, 397.7, 398.2,
817.7, 1596.4, 2376.2, 2376.4, 2380.9, 0, 62.9, 65.5, 117,
209, 431.2, 854.5, 1611.3, 2387.3, 2388.5, 2390.3, 0, 49.1,
108.9, 200.3, 408.8, 842.2, 1630.2, 2386.5, 2386.8, 2388.2,
0, 64.8, 122.9, 226, 422.9, 801.6, 1635.7, 2383.6, 2383.6,
2384.3, 2386.1, 0, 36.7, 143.2, 213.7, 444.9, 814.9, 816.2,
1496.5, 2384.7, 2386.5, 2388.6, 0.1, 45.6, 105.2, 206.7,
494.8, 901.2, 1610.9, 2388, 2388.1, 2388.3, 2388.6, 0, 0.1,
45.9, 48.5, 100.2, 209.4, 432.4, 778, 1600.3, 2408.8, 2408.8,
0, 71.8, 121.6, 216.4, 404.3, 815.2, 1622, 2414.9, 2415.1,
2416.1, 2416.1, 0, 36.2, 97.5, 186.7, 417.9, 840.4, 1597.5,
2390.7, 2390.9, 2391.2, 2391.2, 2391.5, 2392.1, 2392.5, 0,
53.8, 138.2, 227, 403.6, 800.8, 1642.3, 2396.9, 2397.1, 0,
57.9, 95.1, 246.6, 466.8, 796.2, 1574.2, 2395.5, 2397.3,
0, 54.9, 94.9, 201.7, 408.1, 822.6, 1596, 2384.1, 0, 55.6,
131, 202.5, 419.8, 798.5, 1614, 2387.4, 2387.8, 0, 39.1,
109.6, 197.1, 403.3, 835.4, 836.9, 1725.9, 1727.4, 1729.3,
1730.6, 54.5, 58.6, 125.4, 226.9, 409, 806.8, 1578.8, 2377.2,
2380.1, 2388.3, 0, 68, 127.4, 206.9, 510.5, 814.9, 1561,
2404.1, 2404.8, 0, 58.4, 95.3, 229.6, 457.2, 781.5, 1634.4,
2399.8, 2401, 2403, 0.1, 56.5, 101.9, 221.8, 394.3, 815.1,
1655.4, 2411.8, 2411.9, 0, 50.2, 107.3, 220.5, 434.4, 819.8,
1630.6, 2412.4, 2412.6, 0, 48.4, 117.7, 195.3, 403.2, 801,
1632.7, 2388.9, 2389.3, 2390.7, 0, 50.4, 120.3, 234.7, 460.3,
829.1, 1581.7, 2398.5, 2402.3, 0, 60.8, 105.8, 215.8, 466.6,
826, 828.3, 1570.8, 2405.6, 2406.1, 2408.8, 0, 52.6, 106.9,
206.5, 414.3, 868.4, 1629.9, 1655.1, 2409.1, 2413, 0, 49.5,
100.6, 232.9, 389.4, 808.2, 1588.2, 2412.4, 2413.3, 2415.9,
0.1, 70.9, 110.5, 208.4, 409, 807.5, 1579.9, 2382.2, 2382.5,
2383.6, 2383.8, 0, 61.5, 106.5, 213.9, 473.8, 814.2, 1561.9,
2390.7, 2391.9, 2393.1, 0, 59.9, 64, 112, 216, 397.6, 807.4,
1625, 2392.3, 2395.1, 0, 74, 108.8, 109.7, 236.1, 433.6,
794.7, 1590.3, 2381.9, 2382.5, 0.1, 56.3, 114.5, 254.1, 487.7,
864.3, 1593.5, 2369.3, 2369.3, 2372.3, 2373.9, 0.2, 57.1,
110, 201.4, 402.7, 807.2, 1572.9, 2392.8, 2393.5, 0.1, 56.4,
122.5, 224.5, 420.2, 853.7, 1502.1, 2390.3, 2392.9, 0, 50.5,
53.7, 118.2, 230, 462.8, 794.3, 1513.4, 2391.4, 2392.3, 2393.4,
2393.4, 2394.1, 0.1, 49.7, 98.3, 208.3, 383.2, 850.7, 1653.5,
2395.3, 2396, 2397.1, 0, 48.4, 121.2, 228.8, 423.9, 817,
1708.5, 2389.9, 2389.9, 0, 66.4, 129.7, 209.4, 431.5, 794.1,
1673.7, 2383.7, 2384.2, 0, 57, 122.6, 215, 434.1, 838.5,
1657.5, 2386.4, 0.1, 22.6, 127.8, 220.4, 404.3, 810.9, 1592.3,
2386.7, 2388.7, 0, 49.8, 119.7, 200.5, 463.8, 828.7, 1560.7,
2384.5, 2385.7, 2391.2, 0, 73.1, 138.2, 226.6, 408.5, 815.3,
1627.3, 2390.2, 2395.4, 0, 61.2, 108.8, 233.8, 417.7, 824.5,
1502.7, 2395, 2396.2, 0, 56, 101.4, 226.3, 282.1, 412.9,
873.8, 1672.6, 2380.4, 2380.9, 2381.5, 0.1, 70.7, 138, 246,
444.4, 817.1, 1643.2, 2391.5, 2391.8, 2392), Pn = c(-0.95,
0.75, 0.94, 1.27, 1.5, 1.9, 2.14, 2.35, 2.38, 1.48, 3.51,
3.7, 3.99, 4.4, 4.32, 4.52, 4.73, 4.72, 1.97, 3.24, 4.23,
4.35, 4.41, 4.66, 4.57, 4.68, 4.88, 1.16, 3.64, 4.05, 4.75,
5.42, 5.57, 5.55, 5.89, 5.8, 1.48, 3.89, 4.7, 5.34, 5.47,
5.62, 5.71, 5.7, 6.08, 1.26, 0.59, 2.96, 4.34, 5, 4.82, 5.22,
5.2, 5.33, 5.51, 1.2, 2.95, 3.67, 3.9, 4.06, 4.59, 4.6, 4.62,
2.01, 1.92, 2.41, 2.19, 2.22, 2.41, 2.21, 1.6, 3.29, 3.97,
4.39, 4.89, 5.12, 4.93, 5.12, 5.1, 2.39, 3.84, 4.45, 4.63,
4.43, 4.93, 4.78, 4.73, 5.04, 3.09, 3.74, 4.03, 3.89, 4.52,
4.43, 4.24, 4.26, 1.5, 2.73, 2.83, 3.14, 2.89, 3.39, 2.89,
2.84, 3.34, 3.11, 3.16, 3.31, 0.1, 1.17, 1.72, 1.61, 1.64,
2.06, 2.17, 1.99, 2.31, 2.14, 2.27, 2.08, 0.17, 1.17, 1.32,
1.33, 1.4, 1.8, 1.48, 2, 1.81, 1.95, 2.09, 1.73, 1.85, 2.95,
4.33, 4.82, 4.98, 4.97, 5.03, 5.08, 5.22, 5.32, 4.88, 2.17,
3.08, 3.32, 3.42, 3.45, 3.67, 3.64, 3.71, 3.71, 2.85, 2.33,
3.15, 2.81, 3.22, 2.99, 3.16, 3.33, 3.56, 3.61, 3.63, 2.52,
3.55, 4.07, 4.1, 4.17, 4.41, 4.53, 4.56, 2.06, 2.57, 2.91,
2.61, 3.08, 3.29, 3.99, 6.49, 5.23, 6.08, 5.74, 4.41, 6.5,
1.59, 3.22, 3.59, 3.75, 3.84, 4.5, 4.93, 6.87, 6.75, 6.97,
6.53, 6.04, 6.82, 1.28, 3.56, 4.39, 5.27, 5.51, 6.38, 7.05,
7.46, 7.16, 7.24, 0.87, 2.45, 3.86, 4.32, 4.57, 4.43, 4.68,
4.71, 4.86, 4.36, 4.68, 1.06, 2.79, 4.05, 4.86, 5.48, 5.9,
6.38, 6.79, 7.46, 7.12, 7.03, 2.76, 3.92, 3.96, 4.07, 4.2,
4.5, 4.91, 5.52, 5.49, 5.33, 2.84, 4.78, 4.83, 4.76, 4.74,
4.84, 5.19, 5.59, 5.74, 5.7, 5.65, 3.02, 3.61, 4.14, 4.23,
4.45, 4.37, 4.5, 4.6, 4.78, 4.79, 4.85, 2.71, 4.26, 5.42,
6.24, 6.58, 6.63, 6.55, 7.29, 7.43, 7.24, 7, 3.36, 2.19,
2.86, 2.87, 2.37, 3.16, 2.68, 3, 3.4, 3.6, 4.35, 1.28, 2.62,
2.92, 3.3, 3.35, 3.58, 3.73, 4.02, 4, 3.7, 3.75, 1.61, 2.26,
2.5, 2.52, 2.71, 2.61, 2.75, 3.19, 2.92, 3.99, 4.36, 3.67,
4.14, 4.37, -0.28, 1.91, 2.78, 2.84, 2.96, 3.04, 3.24, 3.44,
3.58, 1.78, 4.12, 4.58, 4.33, 4.8, 4.7, 5.02, 5.09, 5.22,
2.79, 4.71, 4.89, 4.93, 4.87, 4.92, 4.83, 4.81, 1.66, 3,
4.04, 4.35, 4.56, 4.75, 4.75, 4.66, 4.89, 1.56, 2.77, 3.86,
3.58, 3.7, 3.76, 3.58, 4.55, 4.63, 4.05, 3.73, 1.76, 2.71,
2.98, 3.01, 3.06, 3.22, 2.99, 3.15, 3.32, 3.34, 1.58, 3.76,
4.97, 5.21, 5.29, 5.5, 5.59, 5.71, 5.74, 1.89, 2.67, 3.01,
3.14, 3.39, 3.57, 3.45, 3.91, 4.11, 3.94, 1.15, 2.88, 3.63,
4.32, 4.09, 4.43, 4.58, 4.61, 4.63, 1.23, 2.26, 3.15, 3.33,
3.3, 3.61, 3.46, 3.65, 3.67, 0.19, 2.23, 3.43, 4.1, 4.85,
5.21, 5.8, 6.27, 6.34, 6.08, 1.94, 3.72, 4.88, 5.51, 6.71,
6.51, 6.96, 7.01, 7.4, 0.48, 2.29, 2.5, 2.87, 3.18, 3.51,
3.13, 3.86, 4.13, 4.34, 4.03, 1.63, 3.64, 5.15, 5.95, 6.43,
6.57, 6.61, 6.51, 6.65, 6.56, 1.93, 3.95, 4.63, 5.66, 6.03,
6.28, 6.67, 6.69, 6.95, 6.75, 0.93, 3.14, 3.46, 3.9, 4.19,
4.27, 4.77, 5.39, 5.36, 5.24, 5.02, 1.71, 3.31, 3.86, 4.02,
4.02, 4.29, 4.36, 4.73, 4.88, 4.59, 1.63, 2.65, 2.63, 2.48,
2.93, 3.45, 4.01, 4.67, 5.02, 5.08, 1.93, 3.54, 3.8, 3.81,
4.04, 4.17, 4.38, 4.55, 4.99, 4.99, 1.29, 2.73, 3.32, 3.66,
3.77, 3.79, 4.14, 4.37, 4.22, 4.1, 4.14, 1.06, 2.89, 3.65,
4.01, 4.11, 4.19, 4.66, 5.03, 5.12, 0.97, 2.45, 2.99, 3.32,
3.34, 3.35, 3.47, 3.12, 3.38, 2.29, 1.72, 4.33, 5.49, 6.44,
6.96, 7.91, 7.49, 8.45, 8.21, 8.17, 8.71, 8.35, 0.29, 2.99,
3.93, 4.52, 5.69, 6.23, 6.23, 6.81, 6.96, 6.68, 0.99, 3.67,
4.62, 5.52, 5.86, 6.23, 5.91, 6.64, 6.29, -0.08, 3.34, 4.89,
6.02, 6.37, 6.59, 6.99, 6.95, 7.2, 0.99, 2.28, 2.72, 2.67,
2.99, 3.18, 3.55, 3.58, 1.31, 2.18, 5.55, 7.37, 8.42, 9.14,
9.44, 9.26, 9.5, 1.23, 3.11, 5.01, 6.21, 7.14, 7.44, 7.79,
7.73, 8.1, 7.96, 1.35, 3.33, 5.67, 6.58, 7.05, 7.36, 7.73,
7.75, 7.99, 0.4, 2.25, 2.83, 3.31, 3.55, 3.66, 3.96, 3.54,
3.77, 1.46, 2.91, 3.51, 3.64, 4.5, 3.83, 3.96, 4.17, 4.66,
4.09, 4.44, 2.41, 4.77, 5.49, 6.05, 6.15, 6.28, 6.6, 6.76,
6.75, 6.78)), .Names = c("Species", "PAR", "Pn"), class = "data.frame", row.names = c(NA,
-628L))
Yes, we can! ;-)
In principle, you were attempting to do the right thing but a few aspects were not quite correct. The main issue is how you pass around the data and the formula: As mob() does not know anything about the way nls() specifies its formulas, a plain formula Pn ~ PAR | Species needs to be used and then the fit function needs to know what to do with the data. The pre-processing offered by mob() can either set up a model matrix (with intercept, dummy/contrast codings, etc.) or a model frame (where factors are still factors etc.). In this case it is easiest to use the default model matrix and then to omit the intercept in the fitting function.
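To make the model-matrix point concrete, here is a small base R sketch of the kind of object the fit function receives as x under the default pre-processing. It is built by hand with model.matrix() from the first four rows of eco, so it only illustrates the shape and is not mob()'s internal code.

```r
# First four observations of the sample data (Pn and PAR only).
toy <- data.frame(Pn  = c(-0.95, 0.75, 0.94, 1.27),
                  PAR = c(0, 58.6, 101.4, 228.6))

# A model matrix for Pn ~ PAR carries an intercept column of ones first.
x <- model.matrix(Pn ~ PAR, data = toy)
colnames(x)   # "(Intercept)" "PAR"

# Dropping the intercept leaves the single real regressor for nls().
x[, 2]
```

This is why the fitting function below starts with x <- x[, 2]: the nls() formula needs the raw PAR values, not a design matrix.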
The second problem with your code was that you used the extended specification of the fit function (with estfun and object arguments) but only supplied the fitted model object. With that specification mob() expects that the fit function sets up a suitable list with coefficients and objfun etc.
In combination, this means that your fit function should look like this:
photofit <- function(y, x = NULL, start = NULL, weights = NULL, offset = NULL, ...,
estfun = FALSE, object = FALSE)
{
## only use first real regressor (without intercept)
x <- x[, 2]
## obtain starting values if necessary
if(is.null(start)) {
aux_lm <- lm(y ~ x)
aux_seg_2 <- segmented::segmented(aux_lm, seg.Z = ~ x, psi = 1)
aux_seg_1 <- segmented::segmented(aux_lm, seg.Z = ~ x, psi = aux_seg_2$psi[1, 2])
start <- list(
a = -1 * (segmented::intercept(aux_seg_1)[[1]][1, 1]),
b = segmented::slope(aux_seg_1)[[1]][1, 1],
c = -1 * (segmented::intercept(aux_seg_1)[[1]][2, 1]),
d = -1 * segmented::slope(aux_seg_1)[[1]][2, 1]
)
} else {
start <- as.list(start)
}
## estimate NLS model
rval <- nls(y ~ (a * exp(-b * x) - c * exp(-d * x)), start = start)
## return processed information for mob()
list(
coefficients = coef(rval),
objfun = deviance(rval),
estfun = if(estfun) sandwich::estfun(rval) else NULL,
object = if(object) rval else NULL
)
}
And then you can grow the MOB tree. Specifying the verbose = TRUE control option will give you a little bit of progress information while you wait:
photomob <- mob(Pn ~ PAR | Species, data = eco, fit = photofit,
control = mob_control(verbose = TRUE))
coef(photomob)
## a b c d
## 4 2.967680 -3.216708e-05 1.519680 1.076879e+01
## 5 -1.811596 1.967366e-02 -3.573079 -4.877852e-05
## 6 -2.772783 1.438087e-02 -4.177953 -7.821814e-05
## 8 -2.427253 1.757744e-02 -4.449105 -1.328930e-04
## 9 -4.579248 1.020021e-02 -5.714575 -7.502393e-05
You can then also visualize the tree. By default a numeric summary is shown in each node but you can also easily display the fitted curves:
plot(photomob)
plot(photomob, terminal_panel = node_bivplot, tnex = 2)
As you can see, the tree selected five terminal nodes with different parameters. I would recommend that you do some more diagnostics on the model fits in the different nodes because I'm not sure how well all parameters are identified. I'm not very familiar with NLS and might be completely wrong, but it seems that not all parameters can always be reliably determined.
As one illustration I do the following: I extract all nine fitted nls objects from the tree. For the model from the root node (node 1) I compute the gradient by summing over all observation-wise gradient contributions (as computed by the estfun() method):
photonls <- refit.modelparty(photomob)
library("sandwich")
colSums(estfun(photonls[[1]]))
## a b c d
## 2.010552e-05 5.753230e-02 -1.166331e-04 6.771585e+00
The gradients for parameters a-c are reasonably close to zero but the one for d is not. This may also affect the inference in mob(), which is based on the observation-wise gradient contributions (aka model scores or estimating functions).
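For comparison, this is roughly what the same check looks like on a well-identified NLS fit. The sketch uses simulated data and a simpler saturating curve (both assumptions for illustration), and it computes the scores by hand as residual times gradient instead of calling sandwich::estfun(), so sign and scaling conventions may differ from that method.

```r
set.seed(1)
x <- seq(0.5, 10, length.out = 50)
y <- 5 * (1 - exp(-0.5 * x)) + rnorm(50, sd = 0.1)   # simulated, not the eco data

fit <- nls(y ~ a * (1 - exp(-b * x)), start = list(a = 4, b = 0.3))

# Observation-wise score contributions: residual times gradient of the fitted mean.
scores <- as.vector(fit$m$resid()) * fit$m$gradient()
colnames(scores) <- names(coef(fit))

# At a properly converged least-squares fit every column sum is close to zero.
colSums(scores)
```

When every parameter is well determined, all column sums are tiny relative to the data scale, which is the pattern that parameter d fails to show above.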
In short: what you want to do can be done! But I would recommend considering a simpler model. If you do, you just need to modify the photofit() function accordingly and run it through mob() again.

Resources