MANOVA with variables from different datasets - r

This question was already asked on stats.stackexchange, but no one answered. Since I'm not sure which forum is the appropriate one, I'm posting it here again with some data.
I have done experiments on various characteristics of tree bark and now want to compare the extent to which the five examined tree species differ with respect to the assessed parameters. It was suggested that I use a MANOVA to analyse my data, and that seems reasonable to me. My analysis is conducted in R.
However, unlike most examples I've found on how to do a MANOVA (e.g. here, here, here), my data stem from different measurements and from different individuals. I've only found this thread discussing unequal sample sizes, but it addresses only unequal sample sizes within the explanatory factor.
To illustrate a bit further, imagine I have per tree species...
9 measurements of the bark roughness,
4 measurements of the bark thickness,
3 pH measurements,
5 measurements of the water-holding capacity,
5 measurements of the water retention.
Of course, I could run separate ANOVAs for each of these variables (and I already did), but I think there should be some advantages to a MANOVA, right?
My Question:
Would a MANOVA be appropriate for this kind of data? Can I just ignore the different numbers of measurements per variable? Is there an alternative way to do this, or a more suitable statistical test? Does my small sample size matter?
My results so far:
In R, I just put all the variables into one data.frame and filled the values missing due to the unequal sample sizes with NAs (that's why there is the nums column in my data.frame below). Then I ran a MANOVA of the form pH + water content + thickness + roughness ~ tree species with the manova function.
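For reference, a minimal sketch of that padding step, assuming each variable starts out as a plain vector per species (the values below are the bark-thickness readings of the first species); assigning a longer length to a vector fills the tail with NAs:
# pad a 5-value vector out to 9 entries with NAs
bark_thickness <- c(9.59, 4.17, 17.23, 8.49, 3.58)
length(bark_thickness) <- 9 # now ends in NA, NA, NA, NA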
Example Data:
manova_df = structure(list(abbr = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L), .Label = c("AS", "BU", "CL", "MB", "PR"
), class = "factor"), nums = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L), comb_rugosity = c(3.44, 2.29, 5.21, 1.45,
2.84, 4.25, 1.54, 2.97, 1.38, 2.45, 9.44, 0, 0.58, 7.71, 5.53,
0.84, 1.22, 1, 10.83, 15.77, 5.5, 8.49, 10.46, 9.16, 5.52, 6.55,
1.77, 10.68, 13.43, 20.8, 8.82, 18.09, 15.1, 15.41, 16.3, 13.2,
2.67, 0.95, 1.49, 2.7, 0, 0.92, 0.83, 0, 1.89), bark_mm = c(9.59,
4.17, 17.23, 8.49, 3.58, NA, NA, NA, NA, 8.06, 13.53, 6.33, 10.96,
12.14, NA, NA, NA, NA, 17.94, 7.33, 10.54, 14.68, 16.66, NA,
NA, NA, NA, 8.52, 8.72, 7.57, 11.89, 6.41, NA, NA, NA, NA, 2.59,
9, 3.26, 5.81, NA, NA, NA, NA, NA), pH = c(6.5, 7.33, 8.17, NA,
NA, NA, NA, NA, NA, 7.84, 3.71, 12.47, 4.39, NA, NA, NA, NA,
NA, 11.04, 6.22, 5.41, 4.29, NA, NA, NA, NA, NA, 9.26, 11.18,
6.3, NA, NA, NA, NA, NA, NA, 8.42, 7.75, 4.33, NA, NA, NA, NA,
NA, NA), whc = c(192, 251, 166, 170, 466, NA, NA, NA, NA, 308,
187, 595, 324, 364, NA, NA, NA, NA, 171, 406, 790, 292, 579,
NA, NA, NA, NA, 672, 251, 700, 245, 260, 485, 383, NA, NA, 325,
481, 338, 476, 968, NA, NA, NA, NA), ret = c(83, 90, 286, 309,
374, NA, NA, NA, NA, 109, 159, 98, 164, 636, NA, NA, NA, NA,
144, 234, 383, 178, 446, NA, NA, NA, NA, 275, 56, 178, 107, 125,
367, 137, NA, NA, 132, 120, 142, 147, 330, NA, NA, NA, NA)), row.names = c(NA,
-45L), class = c("tbl_df", "tbl", "data.frame"))
Which looks like this (where abbr is the tree species, nums is the number of the measurement per tree species and the rest are the tree parameters):
> manova_df
# A tibble: 45 x 7
abbr nums comb_rugosity bark_mm pH whc ret
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AS 1 3.44 9.59 6.5 192 83
2 AS 2 2.29 4.17 7.33 251 90
3 AS 3 5.21 17.2 8.17 166 286
4 AS 4 1.45 8.49 NA 170 309
5 AS 5 2.84 3.58 NA 466 374
6 AS 6 4.25 NA NA NA NA
7 AS 7 1.54 NA NA NA NA
8 AS 8 2.97 NA NA NA NA
9 AS 9 1.38 NA NA NA NA
10 BU 1 2.45 8.06 7.84 308 109
# ... with 35 more rows
My analysis is pretty straightforward:
mano_mod = manova(cbind(pH, bark_mm, comb_rugosity, whc, ret) ~ abbr, data = manova_df)
> summary(mano_mod)
Df Pillai approx F num Df den Df Pr(>F)
abbr 4 1.5708 1.4226 20 44 0.1628
Residuals 12
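Presumably manova(), like lm(), drops every row that contains an NA anywhere (the default na.action), which would explain the Residuals Df of 12: 17 complete cases minus 5 groups. A quick check of how many rows actually enter the fit (a sketch using base R only):
# rows that are complete across all five response variables
resp <- manova_df[, c("pH", "bark_mm", "comb_rugosity", "whc", "ret")]
sum(complete.cases(resp)) # 17 rows survive na.omit
table(manova_df$abbr[complete.cases(resp)]) # complete cases per species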
I did not include my real data here, but they follow the same structure. The given data are far from significant, whereas my actual data are! My question is solely about the many NAs in my data and whether the test is still valid.
(If anything is unclear, please ask.)

Related

R - How to save meta::forest output as tiff file (image cropped)

I'm struggling to save my forest plot output using the forest() function of the meta package as a tiff file with specific dimensions. The plot is cropped at the border.
I need to save a tiff file - no larger than 10MB, dpi=300, max height 225mm, max width 170 mm.
I am wondering whether:
there is a way to specify the size of the image in the forest() function (doesn't seem like there is)
there is a way to scale the image when saving
I'd be grateful for some help!
Below is the data for the plot:
library(meta)
mydata = structure(list(instrument = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 4L, 4L), levels = c("FI continuous", "FI categorical",
"FP continuous", "FP categorical"), class = "factor"), author = structure(c(1L,
5L, 4L, 2L, 3L, 10L, 7L, 6L, 1L, 9L, 8L, 3L, 7L, 6L, 5L, 4L,
2L, 3L, 10L, 7L, 1L, 6L, 9L, 8L, 3L, 7L, 1L, 6L), levels = c("Chao, 2018",
"Ding, 2017", "Li, 2015", "Romero-Ortuno and Soraghan, 2014 (F)",
"Romero-Ortuno and Soraghan, 2014 (M)", "Thompson, 2019", "Widagdo, 2015",
"Woo, 2012 (F)", "Woo, 2012 (M)", "Zucchelli, 2019"), class = "factor"),
size = c(7668, 3056, 4002, 4638, 3985, 3363, 2087, 909, 7668,
2000, 2000, 3985, 2087, 909, 3056, 4002, 4638, 3985, 3363,
2087, 1633, 909, 2000, 2000, 3985, 2087, 1633, 909), deaths = c(4140,
239, 132, 278, 107, 477, 346, 86, 4140, 350, 137, 107, 346,
86, 239, 132, 278, 107, 477, 205, 754, 86, 350, 137, 107,
346, 754, 86), followup = c(7, 5, 5, 2, 3, 3, 3, 4, 7, 4,
4, 3, 3, 4, 5, 5, 2, 3, 3, 3, 7, 4, 4, 4, 3, 3, 7, 4), AUC = c(0.731,
0.7, 0.65, 0.75, 0.8, 0.84, 0.66, 0.73, 0.721, 0.632, 0.612,
0.8, 0.6, 0.73, 0.69, 0.65, 0.75, 0.79, 0.8, 0.63, 0.766,
0.73, 0.641, 0.602, 0.79, 0.57, 0.754, 0.71), lowerCI = c(0.724,
0.66, 0.61, 0.721, 0.76, 0.82, 0.63, 0.67, 0.713, 0.6, 0.564,
0.75, 0.57, 0.68, 0.65, 0.61, 0.721, 0.74, 0.78, 0.59, 0.749,
0.68, 0.608, 0.55, 0.74, 0.53, 0.736, 0.66), upperCI = c(0.739,
0.74, 0.7, 0.779, 0.85, 0.86, 0.69, 0.79, 0.728, 0.664, 0.66,
0.84, 0.63, 0.79, 0.73, 0.7, 0.779, 0.84, 0.82, 0.67, 0.782,
0.79, 0.673, 0.653, 0.83, 0.61, 0.771, 0.78), rating = structure(c(2L,
2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L), levels = c("-",
"+"), class = "factor"), quality = structure(c(2L, 1L, 1L,
3L, 1L, 2L, 3L, 1L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 3L, 1L,
2L, 3L, 2L, 1L, 2L, 2L, 1L, 3L, 2L, 1L), levels = c("++",
"+", "0"), class = "factor"), AUC_95CI = c("0.731 (0.724-0.739)",
"0.7 (0.66-0.74)", "0.65 (0.61-0.7)", "0.75 (NA-NA)", "0.8 (0.76-0.85)",
"0.84 (0.82-0.86)", "0.66 (0.63-0.69)", "0.73 (0.67‐0.79)",
"0.721 (0.713-0.728)", "0.632 (0.6-0.664)", "0.612 (0.564-0.66)",
"0.8 (0.75-0.84)", "0.60 (0.57-0.63)", "0.73 (0.68‐0.79)",
"0.69 (0.65-0.73)", "0.65 (0.61-0.7)", "0.75 (NA-NA)", "0.79 (0.74-0.84)",
"0.79 (0.77-0.8)", "0.63 (0.59-0.67)", "0.766 (0.749-0.782)",
"0.73 (0.68‐0.79)", "0.641 (0.608-0.673)", "0.602 (0.55-0.653)",
"0.79 (0.74-0.83)", "0.57 (0.53-0.61)", "0.754 (0.736-0.771)",
"0.71 (0.66-0.78)"), adjustment = c("age and/or sex + further adjustment",
"undetermined", "undetermined", "age and/or sex", "age and/or sex + further adjustment",
"none or not specified", "none or not specified", "age and/or sex + further adjustment",
"age and/or sex + further adjustment", "none or not specified",
"none or not specified", "age and/or sex + further adjustment",
"none or not specified", "age and/or sex + further adjustment",
"undetermined", "undetermined", "age and/or sex", "age and/or sex + further adjustment",
"none or not specified", "none or not specified", "age and/or sex + further adjustment",
"age and/or sex + further adjustment", "none or not specified",
"none or not specified", "age and/or sex + further adjustment",
"none or not specified", "age and/or sex + further adjustment",
"age and/or sex + further adjustment"), pubyear = c(2018,
2014, 2014, 2017, 2015, 2019, 2015, 2019, 2018, 2012, 2012,
2015, 2015, 2019, 2014, 2014, 2017, 2015, 2019, 2015, 2018,
2019, 2012, 2012, 2015, 2015, 2018, 2019), area = structure(c(7L,
4L, 4L, 3L, 5L, 6L, 1L, 1L, 7L, 2L, 2L, 5L, 1L, 1L, 4L, 4L,
3L, 5L, 6L, 1L, 7L, 1L, 2L, 2L, 5L, 1L, 7L, 1L), levels = c("Australia",
"China", "England", "Europe", "International", "Sweden",
"US"), class = "factor"), meanage = c(NA, 80.4, 81.1, 74,
69.4, 74.7, 78.3, 74.4, NA, NA, NA, 69.4, 78.3, 74.4, 80.4,
81.1, 74, 69.4, 74.7, 78.3, NA, 74.4, NA, NA, 69.4, 78.3,
NA, 74.4), male_pct = c(NA, 100, 0, 44.6, 0, 35.1, 51, 45,
NA, 100, 0, 0, 51, 45, 100, 0, 44.6, 0, 35.1, 51, NA, 45,
100, 0, 0, 51, NA, 45), frail_cutoff = c(NA, NA, NA, NA,
NA, NA, NA, NA, 0.2, 0.25, 0.25, 0.35, 0.25, 0.21, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), FI_item_no = c(24,
70, 70, 29, 34, 45, 39, 34, 24, 47, 47, 34, 39, 34, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), dim_energy = c(1,
1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), dim_physicalactivity = c(0,
1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), dim_weightbmi = c(0,
0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), dim_strength = c(0,
1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), dim_gait = c(0, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), dim_other_count = c(8, 12,
12, 6, 7, 10, 8, 8, 8, 8, 8, 7, 8, 8, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), area2 = structure(c(5L,
3L, 3L, 3L, 4L, 3L, 1L, 1L, 5L, 2L, 2L, 4L, 1L, 1L, 3L, 3L,
3L, 4L, 3L, 1L, 5L, 1L, 2L, 2L, 4L, 1L, 5L, 1L), levels = c("Australia",
"China", "Europe", "International", "US"), class = "factor"),
dimscount = c(9, 16, 16, 9, 12, 11, 10, 13, 9, 12, 12, 12,
10, 13, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), dimscount_fp = c(1, 4, 4, 3, 5, 1, 2, 5, 1, 4, 4, 5,
2, 5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), instrument1 = c("FI (Continuous scores)", "FI (Continuous scores)",
"FI (Continuous scores)", "FI (Continuous scores)", "FI (Continuous scores)",
"FI (Continuous scores)", "FI (Continuous scores)", "FI (Continuous scores)",
"FI (Categorical scores)", "FI (Categorical scores)", "FI (Categorical scores)",
"FI (Categorical scores)", "FI (Categorical scores)", "FI (Categorical scores)",
"FP (Continuous scores)", "FP (Continuous scores)", "FP (Continuous scores)",
"FP (Continuous scores)", "FP (Continuous scores)", "FP (Continuous scores)",
"FP (Continuous scores)", "FP (Continuous scores)", "FP (Categorical scores)",
"FP (Categorical scores)", "FP (Categorical scores)", "FP (Categorical scores)",
"FP (Categorical scores)", "FP (Categorical scores)")), row.names = c(NA,
-28L), class = c("tbl_df", "tbl", "data.frame"))
This is my attempt:
mymeta = meta::metagen(TE=AUC,
seTE=(upperCI-lowerCI)/3.92, upper=upperCI, lower=lowerCI,
data=mydata,
method.tau="PM",
studlab=author,
label.left="\u2190 Poor discrimination",
label.right="Acceptable discrimination \u2192")
mymeta1 = update.meta(mymeta, byvar=instrument1, bylab="")
tiff("forest.tiff", width = 170, height = 225, units="mm", res=300)
forest = meta::forest(mymeta1,
ref=0.7,
xlab.pos=0.7,
xlim=c(0.5,1.0),
sortvar = seTE,
comb.fixed = FALSE,
comb.random = FALSE,
xlab="AUC (95% CI)",
leftcols=c("studlab"),
leftlabs=c("Study"),
rightcols=c("TE", "ci"),
rightlabs=c("AUC\nall-cause", "[95% CI]\nmortality"),
col.by="black",
random = FALSE,
common = FALSE,
test.overall=FALSE,
test.overall.common=FALSE,
test.overall.random=FALSE,
test.subgroup=FALSE,
test.subgroup.common=FALSE,
test.subgroup.random=FALSE,
print.subgroup.labels=TRUE,
test.effect.subgroup = FALSE,
test.effect.subgroup.common = FALSE,
test.effect.subgroup.random = FALSE,
hetstat=FALSE)
dev.off()
This results in an image that is cropped at the border and is too large at 15 MB (it needs to be no more than 10 MB).
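One thing I suspect would at least help with the file size is turning on compression in the tiff() device, since tiff() writes uncompressed output by default; shrinking the text via forest()'s fontsize and spacing arguments (if my meta version supports them) might also reduce the cropping. A sketch of just the device change, with the forest() call kept exactly as above:
# same dimensions, but LZW-compressed; this usually lands well under 10 MB
tiff("forest.tiff", width = 170, height = 225, units = "mm", res = 300, compression = "lzw")
meta::forest(mymeta1, xlim = c(0.5, 1.0), xlab = "AUC (95% CI)") # plus the other arguments as before
dev.off()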
Thanks for your help in advance.

for loop that lets you pipe in different column names to dplyr pipe

I want to run four linear regressions, one for each of four metrics measured at two sites.
Site DCNSUB is the response variable and DCSSUB is the predictor in the regression.
I only want to regress where I have a complete pair of data for an event.
I do this for one metric at a time using the following dplyr pipe:
mV<-Tile%>% # model V for Volume
select(Event, Site, Volume)%>%
group_by(Event)%>%
filter(!any(is.na(Volume)))%>% # group by event and remove pairs that are missing volume
ungroup()%>%
mutate(Volume = log(Volume))%>% # take log
pivot_wider(names_from = Site,values_from = Volume) %>% # get responsive and predictor variable data into columns
as.data.frame(.)%>%
lm(DCNSUB~DCSSUB, data = .)
How can I incorporate this into a for loop, where each iteration puts a different metric where 'Volume' is in the pipe? Here is my attempt:
for (i in names(Tile[-c(1,2)])){
mX<-Tile%>%
select(Event, Site, i)%>%
group_by(Event)%>%
filter(!any(is.na(all_of(i))))%>% # remove pairs that are missing i, note group by event helps removes the pairs
ungroup()%>%
mutate(i = log(i))%>% # take log
pivot_wider(names_from = Site,values_from = i) %>%# get responsive and predictor variable data into columns
as.data.frame(.)%>%
lm(DCNSUB~DCSSUB, data = .)
}
There have been other posts that use column indexing to call columns, but this doesn't work when trying to mix it with the column I want to remain constant in each loop. Also, those solutions are for much less complicated pipes. Any help is appreciated, thanks.
data:
Tile<-structure(list(Event = c("10/17/2019", "10/17/2019", "10/23/2019",
"10/23/2019", "10/27/2019", "10/27/2019", "10/31/2019", "10/31/2019",
"11/24/2019", "11/24/2019", "11/28/2019", "11/28/2019", "12/10/2019",
"12/10/2019", "12/15/2019", "12/15/2019", "12/28/2019", "12/28/2019",
"12/30/2019", "12/30/2019", "1/3/2020", "1/3/2020", "1/12/2020",
"1/12/2020", "1/26/2020", "1/26/2020", "3/3/2020", "3/3/2020",
"3/8/2020", "3/8/2020", "3/13/2020", "3/13/2020", "5/12/2020",
"5/12/2020", "8/5/2020", "8/5/2020", "9/30/2020", "9/30/2020",
"12/1/2020", "12/1/2020", "12/25/2020", "12/25/2020", "1/17/2021",
"1/17/2021", "3/11/2021", "3/11/2021", "4/16/2021", "4/16/2021",
"4/22/2021", "4/22/2021", "4/30/2021", "4/30/2021", "5/6/2021",
"5/6/2021", "7/2/2021", "7/2/2021", "7/3/2021", "7/3/2021", "7/9/2021",
"7/9/2021", "7/13/2021", "7/13/2021", "7/14/2021", "7/14/2021",
"7/18/2021", "7/18/2021", "7/19/2021", "7/19/2021", "7/21/2021",
"7/21/2021", "7/31/2021", "7/31/2021", "8/2/2021", "8/2/2021",
"8/20/2021", "8/20/2021", "9/9/2021", "9/9/2021", "9/24/2021",
"9/24/2021", "10/17/2021", "10/17/2021", "10/22/2021", "10/22/2021",
"10/25/2021", "10/25/2021", "10/27/2021", "10/27/2021", "11/1/2021",
"11/1/2021"), Site = structure(c(3L, 1L, 3L, 1L, 3L, 1L, 3L,
1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L,
1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L,
1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L,
1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L,
1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L,
1L, 3L, 1L), .Label = c("DCNSUB", "DCNSUR", "DCSSUB", "DCSSUR"
), class = "factor"), TP.conc = c(1550, 2770, NA, NA, 1650, NA,
1810, NA, 666, 468, 1190, 1120, 574, 538, 487, 580, NA, 238,
610, 378, 398, 306, 744, 766, 447, 468, 504, 413, 384, 377, 714,
542, 927.2, 1000.1, 265.4, 285.1, 1527, 1764.5, NA, 460.9, NA,
469.8, NA, 172.8, 454, 432.8, 524.4, 476.4, 300, 303.6, 588.7,
598.1, 852.5, 797.6, 144.4, 122.1, 170.6, 110.1, 301.8, 328.8,
363.3, 498.5, 283.7, 104.9, 314.1, 327.6, 436.6, 262.1, 398.1,
358, 312, 251, 598, 831, 348, 456, 345, 240, 648, 949, 852, 1260,
643, 549, 712, 999, 982, 1100, 1240, 1555), TP.load = c(180.4,
NA, NA, NA, 67.6, NA, 201.5, NA, NA, NA, 53.3, 131.7, 12.1, 38.1,
7, 21.6, NA, NA, 21.2, 44.4, 6.1, 10.9, 79.7, 189.9, 10.5, 27.9,
84.2, 178.7, 13.6, 46.3, 14.4, 26.2, 11.4, 35, 4.2, 10.1, 4.6,
18.9, NA, 58.3, NA, 140.9, NA, 7.6, 181.8, 238.2, 72.5, 97.7,
18.5, 30.6, 121.4, 177.7, 114.1, 166.3, 22.8, 21.9, 15.1, 9.2,
25.8, 29.4, 7.6, 12.3, 3, 1.7, 58.5, 71, 23.3, 15.7, 32.7, 39.7,
3.1, 3.4, 67.2, 126, 49.1, 79.1, 8.6, 5.8, 8.7, 15.3, 38.62755,
62.40857143, NA, NA, NA, NA, NA, NA, NA, NA), SRP.conc = c(NA,
NA, NA, NA, 403, NA, NA, NA, NA, NA, NA, NA, 245, 234, 238, 197,
NA, 118, NA, NA, NA, NA, NA, 270, 121, 135, NA, NA, NA, NA, NA,
NA, 596.7, 635.6, 48, 85.9, 514.8, 572.7, NA, 161.5, NA, 163.3,
NA, 46.4, 96.9, 127, 83.1, 92.3, 53.5, 60.7, 111.7, 133.7, 132.2,
164.1, 50.1, 49.1, 54, 42.5, 122.5, 131.9, 104.2, 194.5, 84.6,
34.8, 90.2, 106.6, NA, NA, 129.9, 118.2, 62.2, 84.7, 105, 152,
92.6, 66, 45.9, 50.5, 66.2, 167, 264, 412, 203, 175, 352, 560,
503, 625, 621, 836), SRP.load = c(NA, NA, NA, NA, 16.5, NA, NA,
NA, NA, NA, NA, NA, 5.2, 16.6, 3.4, 7.3, NA, NA, NA, NA, NA,
NA, NA, 66.9, 2.8, 8, NA, NA, NA, NA, NA, NA, 7.3, 22.2, 0.8,
3.1, 1.6, 6.1, NA, 20.4, NA, 49, NA, 2, 38.8, 69.9, 11.5, 18.9,
3.3, 6.1, 23, 39.7, 17.7, 34.2, 7.9, 8.8, 4.8, 3.6, 10.5, 11.8,
2.2, 4.8, 0.9, 0.6, 16.8, 23.1, NA, NA, 10.7, 13.1, 0.6, 1.2,
11.8, 23, 13.1, 11.4, 1.1, 1.2, 0.9, 2.7, 12, 20.4, NA, NA, NA,
NA, NA, NA, NA, NA), Volume = c(11.64, NA, 1.87, 4.5, 4.1, 9.69,
11.13, 34, NA, NA, 4.48, 11.76, 2.1, 7.08, 1.45, 3.73, NA, NA,
3.47, 11.74, 1.52, 3.56, 10.71, 24.79, 2.34, 5.96, 16.71, 43.28,
3.54, 12.29, 2.02, 4.84, 1.22, 3.5, 1.59, 3.56, 0.3, 1.07, NA,
12.66, NA, 29.99, NA, 4.37, 40.04, 55.03, 13.82, 20.51, 6.18,
10.07, 20.62, 29.72, 13.38, 20.85, 15.76, 17.96, 8.82, 8.36,
8.56, 8.94, 2.1, 2.46, 1.07, 1.58, 18.64, 21.67, 5.33, 6, 8.22,
11.09, 0.99, 1.36, 11.23, 15.16, 14.11, 17.34, 2.48, 2.4, 1.34,
1.61, 4.53, 4.95, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(1L,
4L, 5L, 8L, 9L, 12L, 13L, 16L, 17L, 20L, 21L, 24L, 25L, 28L,
29L, 32L, 33L, 36L, 37L, 40L, 41L, 44L, 45L, 48L, 49L, 52L, 53L,
56L, 57L, 60L, 61L, 64L, 65L, 68L, 69L, 72L, 73L, 76L, 77L, 80L,
81L, 84L, 85L, 88L, 89L, 92L, 93L, 96L, 97L, 100L, 101L, 104L,
105L, 108L, 109L, 112L, 113L, 116L, 117L, 120L, 121L, 124L, 125L,
128L, 129L, 132L, 133L, 136L, 137L, 140L, 141L, 144L, 145L, 148L,
149L, 152L, 153L, 156L, 157L, 160L, 161L, 164L, 165L, 168L, 169L,
172L, 173L, 176L, 177L, 180L), class = "data.frame")
We may use map()/lapply() to loop: with a for loop you would need to create a list to store the output yourself, whereas map()/lapply() returns a list directly.
library(purrr)
library(dplyr)
library(tidyr)
# // loop over the names
out <- map(names(Tile)[-(1:2)], ~ Tile %>%
    # // select the columns of interest along with looped column names
    select(Event, Site, all_of(.x)) %>%
    # // grouped by Event and remove groups based on the NA in the looped column
    group_by(Event) %>%
    filter(!any(is.na(.data[[.x]]))) %>%
    ungroup() %>%
    # // convert the column looped to its `log`
    mutate(!! .x := log(.data[[.x]])) %>%
    # // reshape from long to wide
    pivot_wider(names_from = Site, values_from = all_of(.x)) %>%
    # // build the linear model
    lm(DCNSUB ~ DCSSUB, data = .)
)
Output
> out
[[1]]
Call:
lm(formula = DCNSUB ~ DCSSUB, data = .)
Coefficients:
(Intercept) DCSSUB
-1.475 1.231
[[2]]
Call:
lm(formula = DCNSUB ~ DCSSUB, data = .)
Coefficients:
(Intercept) DCSSUB
0.5418 0.9812
[[3]]
Call:
lm(formula = DCNSUB ~ DCSSUB, data = .)
Coefficients:
(Intercept) DCSSUB
0.09282 1.00866
[[4]]
Call:
lm(formula = DCNSUB ~ DCSSUB, data = .)
Coefficients:
(Intercept) DCSSUB
0.7099 0.9064
[[5]]
Call:
lm(formula = DCNSUB ~ DCSSUB, data = .)
Coefficients:
(Intercept) DCSSUB
0.8000 0.8535
From the list output, either use tidy() (from broom) to convert to a tibble, or extract the components separately by looping (it could also be done inside the same map loop as earlier):
map_dfr(out, ~ {
    v1 <- summary(.x)
    tibble(pval = v1$coefficients[, 4][2], MSE = v1$sigma^2)
})
# A tibble: 5 × 2
pval MSE
<dbl> <dbl>
1 7.23e-17 0.0773
2 5.81e-13 0.267
3 3.49e-12 0.120
4 3.77e-10 0.238
5 2.10e-15 0.156
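For the broom route mentioned above, a possible sketch; naming the list elements first (set_names() and the .id column are additions, not part of the answer) keeps track of which metric each model belongs to:
library(broom)
# name each model after the metric it was fitted on, then stack the tidied coefficients
out <- set_names(out, names(Tile)[-(1:2)])
map_dfr(out, tidy, .id = "metric")
# -> one row per coefficient, with estimate, std.error, statistic and p.value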

R, extrapolate average scores from graph

I have a graph like this:
With data that created it like this:
test<-structure(list(study_id = c(1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 13,
13, 13, 13, 13, 34, 34, 34, 34, 34, 40, 40, 40, 40, 40, 44, 44,
44, 44, 44, 47, 47, 47, 47, 47, 49, 49, 49, 49, 49, 51, 51, 51,
51, 51, 61, 61, 61, 61, 61, 66, 66, 66, 66, 66, 67, 67, 67, 67,
67, 68, 68, 68, 68, 68, 72, 72, 72, 72, 72, 75, 75, 75, 75, 75,
80, 80, 80, 80, 80, 84, 84, 84, 84, 84, 86, 86, 86, 86, 86, 94,
94, 94, 94, 94, 95, 95, 95, 95, 95, 101, 101, 101, 101, 101,
105, 105, 105, 105, 105, 111, 111, 111, 111, 111, 117, 117, 117,
117, 117, 123, 123, 123, 123, 123, 124, 124, 124, 124, 124, 125,
125, 125, 125, 125, 126, 126, 126, 126, 126, 131, 131, 131, 131,
131, 145, 145, 145, 145, 145, 153, 153, 153, 153, 153, 154, 154,
154, 154, 154, 155, 155, 155, 155, 155, 156, 156, 156, 156, 156,
161, 161, 161, 161, 161, 162, 162, 162, 162, 162, 166, 166, 166,
166, 166, 167, 167, 167, 167, 167, 169, 169, 169, 169, 169, 172,
172, 172, 172, 172, 175, 175, 175, 175, 175, 179, 179, 179, 179,
179, 180, 180, 180, 180, 180, 184, 184, 184, 184, 184, 185, 185,
185, 185, 185, 188, 188, 188, 188, 188, 190, 190, 190, 190, 190,
192, 192, 192, 192, 192, 194, 194, 194, 194, 194, 195, 195, 195,
195, 195, 197, 197, 197, 197, 197, 199, 199, 199, 199, 199, 203,
203, 203, 203, 203, 207, 207, 207, 207, 207, 210, 210, 210, 210,
210, 211, 211, 211, 211, 211, 212, 212, 212, 212, 212, 217, 217,
217, 217, 217, 221, 221, 221, 221, 221, 223, 223, 223, 223, 223,
227, 227, 227, 227, 227, 228, 228, 228, 228, 228, 229, 229, 229,
229, 229, 239, 239, 239, 239, 239, 244, 244, 244, 244, 244, 253,
253, 253, 253, 253, 256, 256, 256, 256, 256, 257, 257, 257, 257,
257, 259, 259, 259, 259, 259, 266, 266, 266, 266, 266, 272, 272,
272, 272, 272, 275, 275, 275, 275, 275, 277, 277, 277, 277, 277,
278, 278, 278, 278, 278, 284, 284, 284, 284, 284, 288, 288, 288,
288, 288, 290, 290, 290, 290, 290, 291, 291, 291, 291, 291, 292,
292, 292, 292, 292, 294, 294, 294, 294, 294, 295, 295, 295, 295,
295, 296, 296, 296, 296, 296, 299, 299, 299, 299, 299, 300, 300,
300, 300, 300, 301, 301, 301, 301, 301, 303, 303, 303, 303, 303,
305, 305, 305, 305, 305, 306, 306, 306, 306, 306, 307, 307, 307,
307, 307, 309, 309, 309, 309, 309, 313, 313, 313, 313, 313, 315,
315, 315, 315, 315, 316, 316, 316, 316, 316, 320, 320, 320, 320,
320, 324, 324, 324, 324, 324, 331, 331, 331, 331, 331, 336, 336,
336, 336, 336, 337, 337, 337, 337, 337, 348, 348, 348, 348, 348,
349, 349, 349, 349, 349, 352, 352, 352, 352, 352, 353, 353, 353,
353, 353, 367, 367, 367, 367, 367, 373, 373, 373, 373, 373, 382,
382, 382, 382, 382, 387, 387, 387, 387, 387, 388, 388, 388, 388,
388, 389, 389, 389, 389, 389, 392, 392, 392, 392, 392, 398, 398,
398, 398, 398, 401, 401, 401, 401, 401, 402, 402, 402, 402, 402,
404, 404, 404, 404, 404, 405, 405, 405, 405, 405, 410, 410, 410,
410, 410, 411, 411, 411, 411, 411, 412, 412, 412, 412, 412, 413,
413, 413, 413, 413, 414, 414, 414, 414, 414, 415, 415, 415, 415,
415, 420, 420, 420, 420, 420, 428, 428, 428, 428, 428, 431, 431,
431, 431, 431, 433, 433, 433, 433, 433, 434, 434, 434, 434, 434,
436, 436, 436, 436, 436), Time = structure(c(1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
Score = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, 0, 0,
NA, NA, NA, NA, NA, NA, NA, NA, 4, 7, NA, NA, NA, NA, NA,
NA, NA, NA, 4, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, NA,
NA, NA, NA, 0, 0, 7, 8, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 5, 7, NA, NA, NA, 0, NA, NA,
NA, NA, 0, 5, 8, NA, NA, 7, 8, NA, NA, NA, 0, 0, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0, NA, NA, NA, NA, 4, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 2, 8, 8, NA, NA, 3, NA, NA, NA, NA, 1, NA, NA, NA, NA,
0, 9, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA, NA, NA, NA,
2, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 1, 5, 5, NA,
NA, NA, NA, NA, 3, 4, 4, NA, NA, 0, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA,
NA, NA, NA, 0, 0, 0, 1, 1, 9, 9, NA, NA, NA, NA, NA, NA,
NA, NA, 0, 2, 5, 5, NA, NA, NA, NA, NA, NA, 0, 0, 0, 0, 0,
0, NA, NA, NA, NA, NA, NA, NA, NA, NA, 6, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA,
0, NA, NA, NA, NA, 7, NA, NA, NA, NA, 5, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 7, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0, 4, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 1, 1, 1, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 8, 8, NA, NA, NA, 0,
NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, 3, NA, NA, NA, 6, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 5, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 7, NA,
NA, NA, NA, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 3, 8, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, 5, 5,
5, NA, NA, 0, NA, NA, NA, NA, 2, 7, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, 3, NA, NA,
NA, 0, NA, NA, NA, NA, 7, 7, 8, NA, NA, 0, NA, 0, NA, NA,
2, 4, 4, NA, NA), TimeBetweenScans = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
316, NA, NA, NA, NA, 113, 139, NA, NA, NA, NA, NA, NA, NA,
NA, 335, 660, NA, NA, NA, NA, NA, NA, NA, NA, 104, NA, NA,
NA, NA, 7, NA, NA, NA, NA, 42, NA, NA, NA, NA, 30, 84, 467,
826, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 643, 1794, NA, NA, NA, 404, NA, NA, NA, NA, 40,
221, 394, NA, NA, 171, 320, NA, NA, NA, 51, 227, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 449, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 56, NA, NA, NA, NA, 104, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 79, 989, 1097, NA, NA, 116, NA, NA, NA, NA, 65,
NA, NA, NA, NA, 39, 411, NA, NA, NA, 1193, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 142, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 106, 216, 266, 497, 575, NA, NA, NA, NA, NA, 221, 474,
796, NA, NA, 18, NA, NA, NA, NA, 87, 1565, NA, NA, NA, NA,
NA, NA, NA, NA, 36, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 207,
529, NA, NA, NA, NA, NA, NA, NA, NA, 125, NA, NA, NA, NA,
137, 372, 941, 1102, 1225, 927, 1006, NA, NA, NA, NA, NA,
NA, NA, NA, 63, 429, 533, 567, NA, NA, NA, NA, NA, NA, 156,
447, 470, 1204, 1266, 32, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 411, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 201, NA, NA, NA, NA, 160, NA, NA, NA, NA, 166, NA,
NA, NA, NA, 459, NA, NA, NA, NA, NA, NA, NA, NA, NA, 212,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 50,
313, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 312, 530, 783, 1574, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 1627, 1706, NA, NA, NA, 354,
NA, NA, NA, NA, 33, NA, NA, NA, NA, 62, 130, NA, NA, NA,
1416, NA, NA, NA, NA, 121, NA, NA, NA, NA, 842, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 24, 64, 82, 122, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 250, NA, NA, NA, NA, 174, 300, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 216, 264, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 17,
NA, NA, NA, NA, 214, 268, 388, NA, NA, 24, NA, NA, NA, NA,
149, 382, NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA,
NA, 91, 188, NA, NA, NA, NA, NA, NA, NA, NA, 72, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 9, 38, NA,
NA, NA, NA, NA, NA, NA, NA, 13, 138, NA, NA, NA, 42, NA,
NA, NA, NA, 771, 1200, 1512, NA, NA, 113, 166, 180, NA, NA,
122, 475, 640, NA, NA), Groups = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Two",
NA, NA, NA, NA, "Zero", "Zero", NA, NA, NA, NA, NA, NA, NA,
NA, "Two", "Two", NA, NA, NA, NA, NA, NA, NA, NA, "Two",
NA, NA, NA, NA, "Zero", NA, NA, NA, NA, "Zero", NA, NA, NA,
NA, "Two", "Two", "Two", "Two", NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, "Two", "Two", NA, NA,
NA, "One", NA, NA, NA, NA, "Two", "Two", "Two", NA, NA, "Two",
"Two", NA, NA, NA, "Zero", "Zero", NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Two", NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Zero",
NA, NA, NA, NA, "Two", NA, NA, NA, NA, NA, NA, NA, NA, NA,
"Two", "Two", "Two", NA, NA, "Two", NA, NA, NA, NA, "Two",
NA, NA, NA, NA, "Two", "Two", NA, NA, NA, "One", NA, NA,
NA, NA, NA, NA, NA, NA, NA, "Two", NA, NA, NA, NA, NA, NA,
NA, NA, NA, "Two", "Two", "Two", "Two", "Two", NA, NA, NA,
NA, NA, "Two", "Two", "Two", NA, NA, "Zero", NA, NA, NA,
NA, "Zero", "Zero", NA, NA, NA, NA, NA, NA, NA, NA, "Zero",
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, "Zero", "Zero", NA, NA,
NA, NA, NA, NA, NA, NA, "Zero", NA, NA, NA, NA, "One", "One",
"One", "One", "One", "Two", "Two", NA, NA, NA, NA, NA, NA,
NA, NA, "Two", "Two", "Two", "Two", NA, NA, NA, NA, NA, NA,
"One", "One", "One", "One", "One", "Zero", NA, NA, NA, NA,
NA, NA, NA, NA, NA, "Two", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, "Zero", NA, NA, NA, NA, "Zero", NA,
NA, NA, NA, "Two", NA, NA, NA, NA, "Two", NA, NA, NA, NA,
NA, NA, NA, NA, NA, "Two", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, "Two", "Two", NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "One",
"One", "One", "One", NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, "Two", "Two", NA, NA, NA, "One", NA, NA, NA, NA,
"Zero", NA, NA, NA, NA, "Two", "Two", NA, NA, NA, "Two",
NA, NA, NA, NA, "Zero", NA, NA, NA, NA, "Two", NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, "Two", NA, NA, NA, NA, "Zero", "One", NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, "Two", "Two", NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, "Zero", NA, NA, NA, NA, "Two", "Two", "Two", NA, NA,
"Zero", NA, NA, NA, NA, "Two", "Two", NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, "Zero", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, "Zero", "Zero", NA, NA, NA, NA, NA, NA, NA,
NA, "Zero", "Two", NA, NA, NA, "Zero", NA, NA, NA, NA, "Two",
"Two", "Two", NA, NA, "One", "One", "One", NA, NA, "Two",
"Two", "Two", NA, NA)), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -630L), spec = structure(list(
cols = list(study_id = structure(list(), class = c("collector_double",
"collector")), Time = structure(list(), class = c("collector_double",
"collector")), Score = structure(list(), class = c("collector_double",
"collector")), TimeBetweenScans = structure(list(), class = c("collector_double",
"collector")), Groups = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
And the code that created the graph is below. I grouped by study_id so that a dotted line is drawn connecting all the scores from each individual patient, i.e. each line is one person.
test %>%
    ggplot(aes(x = TimeBetweenScans, y = Score, group = study_id, color = Time, shape = Groups)) +
    geom_point(size = 3) +
    geom_line(color = "Black", linetype = "dotted") +
    labs(title = "Oulu Score vs Time", y = "Oulu Score",
         x = "Time from Post-Op Scan to Follow Up Scan", color = "Follow-up Scan")
I was asked to get the "average" score at different timeframes, i.e. the average score at the 1-year follow-up (TimeBetweenScans = "365"), 2 years, 3 years, and 4 years.
So for instance, eyeballing it, you'd take all the dotted lines that cross this red line I drew at the 1-year mark, figure out where they were on the Y axis when they crossed that line, and average their "score".
If I had rows that contained '365' in the "TimeBetweenScans" column, I'd write something like:
test%>%filter(TimeBetweenScans=="365")%>%summarise(MeanScore=mean(Score))
That code would select only the data right at the year mark and average the y axis score for me. But since 365 isn't actually ever in a row, and it only exists when those dotted lines cross it, I need to extrapolate what it WOULD be for that person at '365'.
Does that make sense?
If so, how can I do it?
Here is an idea.
I filtered, for each study_id, the nearest days below and above the desired time (year_in_days). Then I fitted a regression line between these points and predicted the Score at year_in_days. In a last step I calculated the mean over all predictions.
You might get a lot of warnings while filtering, because many study_id groups won't have any value, just NA.
Code
# Time you are looking for
year_in_days = 100
test %>%
    group_by(study_id) %>%
    group_modify(~{
        .x %>%
            # filter inside each group the nearest time to year_in_days (lower and upper)
            filter((TimeBetweenScans %in% min(TimeBetweenScans[TimeBetweenScans > year_in_days], na.rm = T)) |
                   (TimeBetweenScans %in% max(TimeBetweenScans[TimeBetweenScans < year_in_days], na.rm = T))) %>%
            # filter groups with two measurements and values for Score
            filter(n() == 2 &
                   !is.na(Score))
    }) %>%
    ungroup() %>%
    group_by(study_id) %>%
    group_modify(~{
        # for each group predict the value at year "year_in_days"
        broom::tidy(predict(lm(Score ~ TimeBetweenScans, .x), data.frame(TimeBetweenScans = c(year_in_days))))
    }) %>%
    ungroup() %>%
    # calculate mean score over all predictions
    summarise(mean(x))
Output
# A tibble: 1 x 1
`mean(x)`
<dbl>
1 1.14
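An alternative sketch of the same idea, assuming plain linear interpolation between the two scans that bracket the target day is acceptable: base approx() does the interpolation per patient instead of a per-group regression (column names as in the question, everything else is an assumption):
library(dplyr)
year_in_days <- 365
test %>%
    filter(!is.na(Score), !is.na(TimeBetweenScans)) %>%
    group_by(study_id) %>%
    # interpolation needs at least two scans, and the target day must lie between them
    filter(n() >= 2, min(TimeBetweenScans) <= year_in_days, max(TimeBetweenScans) >= year_in_days) %>%
    summarise(pred = approx(TimeBetweenScans, Score, xout = year_in_days)$y) %>%
    summarise(mean_score = mean(pred, na.rm = TRUE))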

Create data frame from matrix elements

I have a matrix 'Feats' which has 3 columns of data with feature titles:
Feats <- structure(list(6.25, 3.125, 21.875, NA, NA, NA, 0.0625, 2L, 1L,
7L, 22L, NA, -2.22250786972976, 1.29105036381309, 0.291962041644914,
-0.236742861516547, NA, NA, 206.568058210094, 26.5635498091798,
236.313419096143, NA, -177.80062957838, 206.568058210094,
6.6734180947409, -1.72176626557489, NA, 0.03, 0.1225, 0.37,
NA, 0.0281666666666667, 0.0338, 0.338, 0.338, 0.124, 19,
7.8291135544129, 6.09286912790999, 1.84893765231614, 0.567403653479842,
NA, NA, structure(2L, .Label = c("", "Resting"), class = "factor"),
12.5, 9.375, 25, NA, NA, NA, 0.13125, 4L, 3L, 8L, 17L, NA,
-4.61233109314598, 2.80969059774635, 0.310781140139641, 2.01618235362392,
NA, NA, 247.38710328687, 30.960434438506, 270.000001621512,
NA, -184.493243725839, 149.850165213139, 6.21562280279281,
18.9758339164604, NA, 0.06, 0.16390625, 0.34, NA, 0.0316923076923077,
0.0312857142857143, 0.412, 0.438, 0.062, 24, 10.757380314083,
4.99655985128538, 1.32689481272002, 0.497119334728677, NA,
NA, structure(2L, .Label = c("", "Resting"), class = "factor"),
28.125, 12.5, 18.75, NA, NA, NA, 0.1375, 9L, 4L, 6L, 13L,
NA, -8.28559386168768, 4.93024930500612, 0.000801170003772261,
3.3100091226664, NA, NA, 285.259587496729, 26.5665579000233,
246.377341685035, NA, -147.299446430003, 197.209972200245,
0.021364533433927, 40.7385738174326, NA, 0.06, 0.248125,
0.62, NA, 0.0587272727272727, 0.033375, 0.646, 0.534, 0.124,
18, 13.5914203301306, 4.06478678366543, 1.41204036306107,
0.497119356632411, NA, NA, structure(2L, .Label = c("", "Resting"
), class = "factor")), .Dim = c(44L, 3L), .Dimnames = list(
c("movPercentLeft", "movPercentRight", "movPercentForward",
"movPercentUTurn", "movPercentNonMov", "changeRateMovNonMov",
"changeRateBetweenAnyMov", "headingAccumLeft", "headingAccumRight",
"accumForward", "accumNonMoving", "accumUTurn", "rateOfChangeLeft",
"rateOfChangeRight", "rateOfChangeForward", "rateOfChangeNonMoving",
"rateOfChangeUTurn", "maxHeadingChangeLeft", "maxHeadingChangeRight",
"maxHeadingChangeForward", "maxHeadingChangeNonMoving", "maxHeadingChangeUTurn",
"meanHeadingChangePerLeft", "meanHeadingChangePerRight",
"meanHeadingChangePerForward", "meanHeadingChangePerNonMoving",
"meanHeadingChangePerUTurn", "minSpeed", "meanSpeed", "maxSpeed",
"minAccel", "meanAccelPos", "meanAccelNeg", "accumAccelPos",
"accumAccelNeg", "maxAccel", "changesPosNeg", "accumDistanceMov",
"accumDistanceNonMov", "maxDistanceMov", "maxDistanceNonMov",
"accumTimeMov", "accumTimeNonMov", "Class"), c("1", "2",
"3")))
I would like to create a single data frame 'Features' such that:
Features <- data.frame(Feats[, 1], Feats[, 2], Feats[, 3])
This example has 3 Feats columns, but 'Features' could eventually have many. I am thinking of something along the lines of a for loop to achieve this, but is there something more elegant?
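A sketch that just generalizes the data.frame(Feats[, 1], Feats[, 2], ...) call to any number of columns, without an explicit for loop (the lapply()/do.call() combination is an assumption, not something from the question):
# assemble one data.frame() call whose arguments are all the columns of Feats
Features <- do.call(data.frame, lapply(seq_len(ncol(Feats)), function(j) Feats[, j]))
do.call() simply builds the same call the question writes out by hand, so it scales to however many columns Feats ends up with.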

R ggplot2 grid labeling

I've been asked to make a bar plot from pollution data. Example data can be found here. Data structure is as follows
str(datos)
'data.frame': 55 obs. of 10 variables:
$ PROVINCIA : int 46 46 46 46 46 46 46 46 46 46 ...
$ ESTACION : Factor w/ 55 levels "Alacant-El_Pla",..: 5 1 2 3 8 23 24 21 31 22 ...
$ MAXIMO_HORARIO : num 99.5 88.5 88.5 90 97.5 87.3 96 92.5 88 20 ...
$ PROMEDIO_DIARIO : num NA NA NA NA NA NA NA NA NA NA ...
$ MAXIMO_OCTOHORARIO : num 103.9 83.1 80.9 75.7 95.1 ...
$ VARIACION_MAX_HOR : num -25.2 -6.5 -6.7 -1.2 -13.2 -15.4 -12.7 -29.5 -16.3 NA ...
$ VARIACION_PRM_DIA : num NA NA NA NA NA NA NA NA NA NA ...
$ OSCILACION_DIARIO : num 16.5 63.7 53.3 62 26.8 31.3 29.2 15 52 20 ...
$ ESTACIONALIDAD_MAX : num -38.2 -39.6 -36.8 -38.8 -37.6 -51.8 -35.6 -40.3 -42.9 -86.5 ...
$ ESTACIONALIDAD_MAX-1: num NA NA NA NA NA NA NA NA NA NA ...
I've tried to use ggplot2's geom_bar geometry and faceting with the following code
datos=read.csv("data.csv",header=T,sep=",", na.strings="-99.9")
ggplot(datos, aes(ESTACION,MAXIMO_HORARIO, fill = factor(MAXIMO_HORARIO))) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle=90, size=10)) +
facet_grid(PROVINCIA ~ .)
obtaining this output
This is on the right track, but I would like every facet (group) to show only its own values, without the empty space corresponding to data from another facet, and with the right labels in each panel. I could split the data into three parts and produce three different plots, but I'd like to build just a single file with the three plots in it.
Desired output would look like
EDIT: Output of dput(datos)
> dput(datos)
structure(list(PROVINCIA = c(46L, 46L, 46L, 46L, 46L, 46L, 46L,
46L, 46L, 46L, 46L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), ESTACION = structure(c(5L, 1L, 2L, 3L,
8L, 23L, 24L, 21L, 31L, 22L, 41L, 27L, 12L, 13L, 14L, 15L, 16L,
18L, 28L, 29L, 19L, 37L, 39L, 26L, 49L, 52L, 53L, 54L, 55L, 4L,
7L, 6L, 9L, 10L, 11L, 17L, 20L, 33L, 25L, 30L, 32L, 36L, 35L,
34L, 38L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 50L, 51L, 40L), .Label = c("Alacant-El_Pla",
"Alacant-Florida_Babel", "Alacant-Rabassa", "Albalat_dels_Tarongers",
"Alcoi-Verge_dels_Lliris", "Algar_de_Pal", "Alzira", "Benidorm",
"Benig", "Bull", "Burjassot-Facultats", "Burriana", "Castell1",
"Castell2", "Castell3", "Castell4", "Caudete_de_las_Fuentes",
"Cirat", "Coratxar", "Cortes_de_Pall", "Elda-Lacy", "El_Pin",
"Elx-Agroalimentari", "Elx-Parc_de_Bombers", "Gandia", "La_Vall_d",
"Lluce", "Morella", "Onda", "Ontinyent", "Orihuela", "Paterna-CEAM",
"Quart_de_Poblet", "Sagunt-CEA", "Sagunt-Nord", "Sagunt-Port",
"Sant_Jordi", "Torrebaja", "Torre_Endom", "Torrent-El_Vedat",
"Torrevieja", "Val1", "Val2", "Val3", "Val4", "Val5", "Val6",
"Val7", "Vilafranca", "Vilamarxant", "Villar_del_Arzobispo",
"Vinaros", "VinarosP", "Viver", "Zorita"), class = "factor"),
MAXIMO_HORARIO = c(99.5, 88.5, 88.5, 90, 97.5, 87.3, 96,
92.5, 88, 20, 20, 81.5, 99, 91.7, 93.5, 81.5, 90.5, 84.5,
100.3, 96.3, 41.7, 91.5, 57.3, NA, 93, 111.5, 86.8, NA, 100.3,
21.9, 80.5, 111, 98.7, 87.3, 89.7, 87.5, 41.7, 81.7, NA,
20, 84.8, 92, 88.7, NA, 74, NA, 95, 20.5, 85.7, 80, 82.3,
76, 20, 90.8, NA), PROMEDIO_DIARIO = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 21.9, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), MAXIMO_OCTOHORARIO = c(103.9, 83.1,
80.9, 75.7, 95.1, 82.9, 90.2, 83.5, 85, NA, NA, 77.1, 76.7,
91.4, 73.1, 65.1, 96.6, 81.1, 110.5, 91.1, NA, 87.8, 54.8,
NA, 95.1, 116.8, 79.9, NA, 107.2, 73.9, 70.5, 102.8, 100.5,
77.5, 80.9, 86.9, NA, 70.5, NA, NA, 73.5, 86.9, 86, NA, 83.5,
NA, 84.5, 20.5, 90.8, 71.5, 67.5, 64.5, NA, 91.4, NA), VARIACION_MAX_HOR = c(-25.2,
-6.5, -6.7, -1.2, -13.2, -15.4, -12.7, -29.5, -16.3, NA,
NA, -32.5, -11.5, -22.3, -19.5, -22.3, -25.3, -24.7, -14.7,
-18, NA, -12.8, -36, NA, -27.3, -11.4, -15.7, NA, -21.4,
-103.6, -26, -24.5, -33.1, -30, -31, -17.8, NA, -15.1, NA,
NA, -23.5, -32.5, -16.1, NA, -32.3, NA, -28.2, 0.3, -30.5,
-17.3, -18.4, -19.7, NA, -31.2, NA), VARIACION_PRM_DIA = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), OSCILACION_DIARIO = c(16.5,
63.7, 53.3, 62, 26.8, 31.3, 29.2, 15, 52, 20, 20, 51.8, 85.7,
27.5, 80, 74.8, 45, 48.3, 12.5, 21.6, 41.7, 41.8, 35.3, NA,
26.5, 27.1, 64.2, NA, 58.6, 3.9, 39.2, 39.3, 32.9, 22.6,
43.4, 17.3, 41.7, 46.9, NA, 20, 50.8, 58.2, 64.5, NA, 2.7,
NA, 40.2, 1.5, 25.9, 30.5, 58.6, 31, 20, 15.8, NA), ESTACIONALIDAD_MAX = c(-38.2,
-39.6, -36.8, -38.8, -37.6, -51.8, -35.6, -40.3, -42.9, -86.5,
-83.6, -50.6, -35, -46.8, -45, -57.1, -31.4, -49.7, -35.5,
-45.7, -75.2, -44.1, -62.6, NA, -48.4, -10.8, -39.3, NA,
-38.1, -86.4, -53.7, -16.5, -42.3, -42.2, -38.1, -48.7, -68.2,
-45.4, NA, -87.6, -43.8, -44.2, -43.1, NA, -55.5, NA, -33.1,
-86.1, -38.3, -44.4, -41.6, -38.2, -85.5, -50.1, NA), ESTACIONALIDAD_MAX.1 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, -71.11,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("PROVINCIA",
"ESTACION", "MAXIMO_HORARIO", "PROMEDIO_DIARIO", "MAXIMO_OCTOHORARIO",
"VARIACION_MAX_HOR", "VARIACION_PRM_DIA", "OSCILACION_DIARIO",
"ESTACIONALIDAD_MAX", "ESTACIONALIDAD_MAX.1"), class = "data.frame", row.names = c(NA,
-55L))
Sounds like you want facet_wrap rather than facet_grid. Try
ggplot(datos, aes(ESTACION,MAXIMO_HORARIO, fill = factor(MAXIMO_HORARIO))) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle=90, size=10)) +
facet_wrap(~PROVINCIA , scales="free", ncol=1)
to get each PROVINCIA in its own panel showing only its own stations.
facet_grid() is not designed for what you want. Making the three plots separately is the right approach. But with the gridExtra package it is easy to combine these plot elements (the gridExtra package calls them "grobs") into a single plot or single file.
require(ggplot2)
require(gridExtra)
#toy data
dat <- data.frame(x=1:20, y=sample(1:20, size=20, replace=T), group=sample(1:3, size=20, replace=T))
#making each "grob"
p1 <- ggplot(subset(dat, group==1), aes(factor(x), y)) +
geom_bar(stat='identity')
p2 <- ggplot(subset(dat, group==2), aes(factor(x), y)) +
geom_bar(stat='identity')
p3 <- ggplot(subset(dat, group==3), aes(factor(x), y)) +
geom_bar(stat='identity')
#combine them into a single stack of plots
pAll <- grid.arrange(p1, p2, p3, ncol=1)
pAll
Note for this approach to work, your x-variable in the parent data.frame will have to be a string or a numeric, not a factor. (For numerics, you have to make it a factor after subsetting: that's the only way ggplot2 will know that you don't want to show the gaps where each subset has no data. For strings, this won't be a problem and the x-axis doesn't need to be a factor at any point.)
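If the combined stack also needs to end up in a single file, one option is to wrap grid.arrange() in a graphics device; the file name and dimensions below are just placeholders:
# write the three stacked panels to one PNG file
png("all_plots.png", width = 800, height = 1200)
grid.arrange(p1, p2, p3, ncol = 1)
dev.off()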
