Here is an example of my dataset:
df <- data.frame(
id = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65),
collection_point = c(rep(c("Baseline", "Immediate", "3M"), each=28)),
intervention = c(rep(c("B", "A", "C", "B", "C", "A", "A", "B", "A", "C", "B", "C",
"A", "A", "B", "A", "C", "B", "C", "A", "A"), each = 4)),
scale_A = c(6.5, 7.0, 6.25, 6.0, NA, 7.5, 7.5,
8.0, 7.5, 6.75, 7.5, 6.75, 6.75, 6.5,
5.75, 6.75, 7.75, 7.5, 7.75, 7.25, 7.75,
7.25, 7.25, 5.75, 6.75, NA, 6.75, 7.5,
6.75, 7.0, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.25, 7.25, 7.25, 7.5, 6.5, 6.25,
6.25, 7.25, 7.5, 6.75, 7.25, 7.25, 7.5,
7.25, 7.5, 7.25, NA, 7.0, 7.5, 7.5,
6.75, 7.25, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.5, 7.5, 7.5, 7.5, 6.5, 5.75,
6.25, 6.75, 7.5, 7.25, 7.25, 7.5, 7.75,
7.75, 7.75, 7.5, NA, NA, NA, NA))
where,
id = participant
collection_point = times data was collected from participant (repeated measure)
intervention = group each participant was randomized to (fixed effect)
scale_A = questionnaire score that each participant completed at each data collection point (outcome)
Participants were randomized to one of three interventions and completed the same scales (scales A-C) at three different time points to determine any improvements over time.
I have used the code
mixed.lmer.A1<-lmer(scale_A~intervention*collection_point+(collection_point|id), control =
lmerControl(check.nobs.vs.nRE = "ignore"), data = df)
I can use plot_model(mixed.lmer.A1) with the terms argument to select only the interaction effects (e.g. terms = c("interventionB:collection_point3M")) to create a forest plot. However, I think it would look much neater to have only the interventions on the y axis, with multiple bands representing each collection_point. Desired output like this:
Any idea how I can do this? Thanks!
Here is one solution:
library(ggplot2)
library(lme4)
#> Loading required package: Matrix
library(sjPlot)
#> Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!
df <- data.frame(
id = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65),
collection_point = c(rep(c("Baseline", "Immediate", "3M"), each=28)),
intervention = c(rep(c("B", "A", "C", "B", "C", "A", "A", "B", "A", "C", "B", "C",
"A", "A", "B", "A", "C", "B", "C", "A", "A"), each = 4)),
scale_A = c(6.5, 7.0, 6.25, 6.0, NA, 7.5, 7.5,
8.0, 7.5, 6.75, 7.5, 6.75, 6.75, 6.5,
5.75, 6.75, 7.75, 7.5, 7.75, 7.25, 7.75,
7.25, 7.25, 5.75, 6.75, NA, 6.75, 7.5,
6.75, 7.0, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.25, 7.25, 7.25, 7.5, 6.5, 6.25,
6.25, 7.25, 7.5, 6.75, 7.25, 7.25, 7.5,
7.25, 7.5, 7.25, NA, 7.0, 7.5, 7.5,
6.75, 7.25, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.5, 7.5, 7.5, 7.5, 6.5, 5.75,
6.25, 6.75, 7.5, 7.25, 7.25, 7.5, 7.75,
7.75, 7.75, 7.5, NA, NA, NA, NA))
mixed.lmer.A1 <- lmer(scale_A~intervention*collection_point+(collection_point|id), control =
lmerControl(check.nobs.vs.nRE = "ignore"), data = df)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> unable to evaluate scaled gradient
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge: degenerate Hessian with 1 negative eigenvalues
plot_model(mixed.lmer.A1, type = "int") +
coord_flip()
Created on 2021-12-13 by the reprex package (v2.0.1)
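If you want finer control than plot_model(type = "int") gives (interventions on the y axis, one band per collection_point), you can build the forest plot by hand from the fixed effects. This is only a sketch: it assumes the model mixed.lmer.A1 fitted above, and the term-name parsing relies on lme4's default coefficient naming ("interventionB:collection_point3M" etc.).

```r
library(lme4)
library(ggplot2)

# fixed-effect estimates and Wald confidence intervals
est <- fixef(mixed.lmer.A1)
ci  <- confint(mixed.lmer.A1, parm = "beta_", method = "Wald")

keep <- grepl(":", names(est))          # keep only the interaction terms
fe <- data.frame(term = names(est)[keep],
                 est  = est[keep],
                 lwr  = ci[keep, 1],
                 upr  = ci[keep, 2])
# split e.g. "interventionB:collection_point3M" into its two components
fe$intervention     <- sub(":.*", "", fe$term)
fe$collection_point <- sub(".*collection_point", "", fe$term)

ggplot(fe, aes(x = est, y = intervention, colour = collection_point)) +
  geom_pointrange(aes(xmin = lwr, xmax = upr),
                  position = position_dodge(width = 0.5)) +
  geom_vline(xintercept = 0, linetype = "dashed")
```

This puts the three interventions on the y axis with one dodged point-and-interval per collection_point, which is closer to the layout described in the question.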
I am trying to use the function summaryPlot() from the openair package in R, but it fails every time I use it with the following data matrix (run this code to recreate the data):
structure(list(Fecha = structure(c(1577840400, 1577844000, 1577847600,
1577851200, 1577854800, 1577858400, 1577862000, 1577865600, 1577869200,
1577872800, 1577876400, 1577880000, 1577883600, 1577887200, 1577890800,
1577894400, 1577898000, 1577901600, 1577905200, 1577908800, 1577912400,
1577916000, 1577919600, 1577923200, 1577926800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), PM10_CDAR = c(11.4, 8.3, 13.3, 16,
39.5, 35.4, 31, 48.7, 41, 34, 23.3, 16.5, 21.8, 15.7, 17.8, 12.7,
12.8, 16, 11.3, 7.9, 8.1, 10, 10.4, 7.7, 6.1), PM10_KEN = c(49.7,
72.4, 34.5, 50.3, 65.2, 59, 25.5, 19.6, 17.4, 14.3, 48.2, 34.8,
25.3, 56.7, 26, 45.6, 29, 30.5, 24.1, 22, 26.9, 22.2, 17.3, 19.1,
15.5), PM10_LAF = c(28.8, 69, 72.3, 35.1, 82, 44, 69, 73, 46,
43, 29.9, 25.1, 21.4, 15.8, 11.7, 16, 15, 12, 9, 10.8, 10.1,
11.9, 12.9, 12.4, 11.8), PM10_TUN = c(45, 57, 93, 69, 73, 60,
45, 69, 61, 46, 28, 20, 33, 54, 44, 27, 39, 37, 36, 41, 30, 29,
18, 4, 7), PM2.5_CDAR = c(9, 8, 10, 16, 34, 30, 33, 42, 33, 34,
6, 10, 9, 9, 15, 10, 9, 7, 9, 5, 5, 10, 6, 4, 2), PM2.5_KEN = c(49,
81, 110, 83, 63, 59, 79, 68, 84, 76, 48, 19, 22, 34, 36, 33,
29, 19, 13, 22, 3, 16, 16, 6, 9), PM2.5_LAF = c(35, 65, 53, 30,
60, 62, 64, 67, 36, 43, 21, 16, 11, 11, 10, 15, 15, 12, 9, 6,
6, 10, 10, 9, 10), PM2.5_TUN = c(39, 42, 66, 54, 52, 39, 33,
40, 42, 33, 21, 11, 13, 27, 22, 17, 21, 15, 17, 15, 13, 10, 6,
4, 2)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-25L))
the following error appears:
> summaryPlot(date.zoo_2, pollutant = "Kennedy_PM10")
Error in seq.int(0, to0 - from, by) : 'to' must be a finite number
In addition: Warning messages:
1: In min.default(numeric(0), na.rm = TRUE) :
no non-missing arguments to min; returning Inf
2: In max.default(numeric(0), na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
I tried everything, including converting the date column with as.POSIXct: idx <- as.POSIXct(datos_meterologicos$Fecha); datos_meterologicos$Fecha <- read.zoo(datos_meterologicos, FUN = as.POSIXct, format = "%Y/%m/%d %H:%M", tz = "UTC"). Frankly, I don't know what else to do, because the same error keeps appearing.
The whole code is as follows:
date.matrix_2 <- as.data.frame(datos_meterologicos[,-1])
idx_2 <- as.POSIXct(datos_meterologicos$Fecha)
date.xts_2 <- as.xts(date.matrix_2,order.by=idx_2)
date.zoo_2 <- as.zoo(date.xts_2)
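A note on the error (this diagnosis is my assumption, not from the original post): the Inf/-Inf warnings mean the selected pollutant column was empty, and "Kennedy_PM10" is not a column in the data shown (the columns are PM10_KEN, PM10_CDAR, and so on). openair also works directly on a plain data frame with a column named date, so the zoo/xts conversion should not be needed. A minimal sketch:

```r
library(openair)

datos <- datos_meterologicos                      # the tibble shown above
names(datos)[names(datos) == "Fecha"] <- "date"   # openair expects a 'date' column
summaryPlot(datos, pollutant = "PM10_KEN")        # use a column name that exists
```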
Here is an example of my dataset:
df <- data.frame(
id = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65),
collection_point = c(rep(c("Baseline", "Immediate", "3M"), each=28)),
intervention = c(rep(c("B", "A", "C", "B", "C", "A", "A", "B", "A", "C", "B", "C",
"A", "A", "B", "A", "C", "B", "C", "A", "A"), each = 4)),
scale_A = c(6.5, 7.0, 6.25, 6.0, NA, 7.5, 7.5,
8.0, 7.5, 6.75, 7.5, 6.75, 6.75, 6.5,
5.75, 6.75, 7.75, 7.5, 7.75, 7.25, 7.75,
7.25, 7.25, 5.75, 6.75, NA, 6.75, 7.5,
6.75, 7.0, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.25, 7.25, 7.25, 7.5, 6.5, 6.25,
6.25, 7.25, 7.5, 6.75, 7.25, 7.25, 7.5,
7.25, 7.5, 7.25, NA, 7.0, 7.5, 7.5,
6.75, 7.25, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.5, 7.5, 7.5, 7.5, 6.5, 5.75,
6.25, 6.75, 7.5, 7.25, 7.25, 7.5, 7.75,
7.75, 7.75, 7.5, NA, NA, NA, NA),
scale_B = c(5.0, 6.5, 6.25, 7.0, NA, 5.5, 6.5,
6.0, 7.5, 5.75, 6.5, 5.75, 7.75, 6.5,
6.75, 7.75, 7.75, 7.5, 7.75, 5.25, 7.75,
6.25, 6.25, 6.75, 5.75, NA, 6.75, 6.5,
7.75, 6.0, 7.5, 6.0, 7.5, 7.5, 6.5,
6.75, 6.25, 6.25, 6.25, 6.5, 6.5, 7.25,
7.25, 6.25, 6.5, 7.75, 6.25, 7.25, 6.5,
6.25, 6.5, 6.25, NA, 7.0, 6.5, 7.5,
7.75, 6.25, 7.5, 6.0, 7.5, 6.5, 6.5,
6.75, 6.5, 6.5, 6.5, 7.5, 7.5, 6.75,
7.25, 7.75, 6.5, 6.25, 7.25, 6.5, 6.75,
6.75, 6.75, 6.5, 5.5, NA, NA, 6.5),
scale_C = c(5.5, 5.0, 7.25, 7.0, 8.0, 5.5, 5.5,
8.0, 5.5, 7.75, 5.5, 7.75, 7.75, 7.5,
7.75, 7.75, 5.75, 5.5, 5.75, 5.25, 5.75,
5.25, 6.25, 7.75, 7.75, NA, 7.75, 5.5,
6.75, 6.0, 7.5, 5.0, 5.5, 5.5, 7.5,
5.75, 6.25, 5.25, 5.25, 5.5, 7.5, 7.25,
7.25, 6.25, 5.5, 7.75, 5.25, 5.25, 7.5,
5.25, 6.5, 5.25, 5.0, 5.0, 5.5, 5.5,
7.75, 6.25, 7.5, 5.0, 5.5, 5.5, 7.5,
5.75, 6.5, 5.5, 5.5, 5.5, 7.5, 7.75,
7.25, 7.75, 5.5, 5.25, 5.25, 5.5, 6.75,
5.75, 5.75, 5.5, 6.75, NA, 5.75, NA))
where,
id = participant
collection_point = times data was collected from participant (repeated measure)
intervention = group each participant was randomized to (fixed effect)
scale_A, scale_B, scale_C = questionnaire scores that each participant completed at each data collection point (outcomes)
Participants were randomized to one of three interventions and completed the same scales (scales A-C) at three different time points to determine any improvements over time.
I have used the code
mixed.lmer.A1<-lmer(scale_A~intervention+(collection_point|id), control =
lmerControl(check.nobs.vs.nRE = "ignore"), data = df)
but I would like to run MANOVA as all scales measure different aspects of a cohesive theme. However, I can't run
mixed.lmer.comb<-lmer(cbind(scale_A, scale_B, scale_C)~intervention+
(collection_point|id), control = lmerControl(check.nobs.vs.nRE = "ignore"),
data = df)
like I originally thought. It does work if I run it with lm, but that wouldn't be very meaningful, as I need to account for collection_point as a repeated measure.
Is there a way I can run multiple dependent variables using lmer?
You can do this by converting the data to long format. There are a lot of ways to do this, e.g. reshape in base R or reshape2::melt, but I find tidyr::pivot_longer the easiest:
df_long <- tidyr::pivot_longer(df, cols = starts_with("scale_"),
names_to = "scales",
values_to = "value")
The fixed effects are 0 + scales + scales:intervention: we don't want an overall intercept; we want a scale-specific intercept, plus intervention effects for each scale.
The random effects are collection_point|scales/id: this allows the effect of collection point to vary across scales and across id (as in the original model).
mm <- lmer(value ~ 0 + scales + scales:intervention + (collection_point|scales/id),
data = df_long,
control = lmerControl(check.nobs.vs.nRE = "ignore"))
This model is technically correct, but gives a singular fit (which is not surprising, since we are trying to estimate a variance across only three levels of scales); see ?isSingular, or the GLMM FAQ, for advice on how to handle this.
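To confirm and locate the singularity, a quick check (assuming the model mm fitted above):

```r
isSingular(mm)  # TRUE, per the singular-fit note above
VarCorr(mm)     # shows which variance component has collapsed towards zero
```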
This is not the only model we could set up; a maximal model would include more terms.
Some further comments:
One principle is that, since the elements of multivariate data generally have different units, we should not have any terms in the model that apply across scales (such as an overall intercept, or an overall effect of intervention); this might not apply in your particular case, I don't know.
it is unusual (and often, although not always, wrong) to have a term varying in a random effect (collection_point in this case) that does not have a corresponding fixed effect. By doing so you are assuming that the average (population-level) effect of collection point is exactly zero, which is surprising unless (1) there's something special about the experimental design (e.g. observations were somehow randomized across collection points), (2) you have pre-processed the data in some way (e.g. explicitly transformed the data to have zero variance across collection points), or (3) you are setting up a null model for comparison.
I'm a little concerned about your need to override the check that you have fewer random effects estimated than observations; I haven't looked into this in detail, but that usually means your model is overfitting in some way. (Maybe it's because we're looking at a subset of the data, and this doesn't come up in the full data set?)
More here.
I have a data frame like below,
dat<- structure(list(Octagonplot = c(11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44, 51, 52, 53, 54, 61, 62, 63, 64, 71, 72, 73, 74, 81, 82, 83, 84, 91, 92, 93, 94, 101, 102, 103, 104, 111, 112, 113, 114, 121, 122, 123, 124, 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44, 51, 52, 53, 54, 61, 62, 63, 64, 71, 72, 73, 74, 81, 82, 83, 84, 91, 92, 93, 94, 101, 102, 103, 104, 111, 112, 113, 114, 121, 122, 123, 124, 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44, 51, 52, 53, 54, 61, 62, 63, 64, 71, 72, 73, 74, 81, 82, 83, 84, 91, 92, 93, 94, 101, 102, 103, 104, 111, 112, 113, 114, 121, 122, 123, 124), d13C = c(-27.822, -27.93, -27.927, -27.764, -28.081, -28.091, -28.553, -28.633, -27.996, -27.972, -27.664, -28.037, -28.211, -28.348, -28.5, -27.875, -28.331, -28.873, -28.609, -28.262, -27.569, -27.583, -27.305, -27.494, -27.484, -27.585, -27.368, -27.313, -27.894, -28.405, -27.296, -27.67, -27.175, -27.431, -27.382, -27.479, -28.059, -28.329, -28.285, -27.976, -27.564, -27.387, -27.958, -27.638, -28.087, -28.208, -28.513, -28.002, -27.977, -27.952, -27.647, -27.882, -29.181, -28.635, -29.131, -28.931, -28.42, -28.413, -27.993, -28.503, -29.54, -29.009, -29.197, -29.609, -29.346, -29.969, -29.798, -29.037, -27.854, -27.923, -27.976, -27.712, -27.769, -27.827, -27.735, -27.82, -28.345, -29.476, -28.387, -28.019, -27.307, -27.567, -27.429, -27.771, -28.044, -28.683, -28.786, -28.664, -27.653, -28.064, -28.036, -27.757, -28.323, -29.195, -28.828, -28.937, -27.9078297690006, -27.9386973277244, -27.7756066004902, -27.8411714524657, -28.7963592918008, -28.4522330354614, -28.8597856593141, -28.8113167816976, -28.1924531764532, NA, -27.824879800081, -28.2347160358722, -28.7498706023163, -28.7313297359698, -28.7365680482049, -28.7272468735994, -28.803582483867, -29.4363702389094, NA, -28.7054768643306, -27.7481689930581, -27.7535107262537, -27.6555760218728, NA, -27.6272638860929, -27.7069166950818, -27.5782961448598, 
-27.5234468773432, -28.1124586856048, -29.0179480728715, -27.8824806843723, -27.8693344400536, -27.2278831040908, -27.5051927317272, -27.4090338924322,-27.6583036975383,-28.0521215748621, -28.5076100126232, -28.5288005348874, -28.3745035897644, -27.6136332691194, NA, -28.0041637896659, -27.6963035708696, -28.1989533738283, -28.8230228029304, -28.7207578899079, -28.4489097046946), midpoint = c(2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7,8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7,8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7,8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2)), row.names = c(NA, -144L), class = c("tbl_df", "tbl","data.frame"))
I want to find the linear relationship within each Octagonplot, then use that relationship to calculate the d13C value where midpoint == 5, creating a new data frame with one output value per Octagonplot. Here is my code for Octagonplot '124', but I want to know how to loop over all of them, from 11, 12, 13, 14, 21, 22, ... to 123, 124.
target1 <- c("124")
C124<- filter(dat, Octagonplot%in% target1)
mod124<- lm(C124$d13C~C124$midpoint)
summary(mod124)
a <- coef(summary(mod124))[2]
b <- coef(summary(mod124))[1]
y124<- a*5+b
y124
Hope someone could help.
This might be what you're looking for. You'll lose d13C and midpoint...5, which were in your example output, but it's unclear to me how you're selecting those. Also, midpoint...5 never equals 5, so I assumed that was a filter you had already applied and used all midpoint values.
dat <- structure(list(Octagonplot = c(11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44, 51, 52, 53, 54, 61, 62, 63, 64, 71, 72, 73, 74, 81, 82, 83, 84, 91, 92, 93, 94, 101, 102, 103, 104, 111, 112, 113, 114, 121, 122, 123, 124, 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44, 51, 52, 53, 54, 61, 62, 63, 64, 71, 72, 73, 74, 81, 82, 83, 84, 91, 92, 93, 94, 101, 102, 103, 104, 111, 112, 113, 114, 121, 122, 123, 124, 11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44, 51, 52, 53, 54, 61, 62, 63, 64, 71, 72, 73, 74, 81, 82, 83, 84, 91, 92, 93, 94, 101, 102, 103, 104, 111, 112, 113, 114, 121, 122, 123, 124), d13C = c(-27.822, -27.93, -27.927, -27.764, -28.081, -28.091, -28.553, -28.633, -27.996, -27.972, -27.664, -28.037, -28.211, -28.348, -28.5, -27.875, -28.331, -28.873, -28.609, -28.262, -27.569, -27.583, -27.305, -27.494, -27.484, -27.585, -27.368, -27.313, -27.894, -28.405, -27.296, -27.67, -27.175, -27.431, -27.382, -27.479, -28.059, -28.329, -28.285, -27.976, -27.564, -27.387, -27.958, -27.638, -28.087, -28.208, -28.513, -28.002, -27.977, -27.952, -27.647, -27.882, -29.181, -28.635, -29.131, -28.931, -28.42, -28.413, -27.993, -28.503, -29.54, -29.009, -29.197, -29.609, -29.346, -29.969, -29.798, -29.037, -27.854, -27.923, -27.976, -27.712, -27.769, -27.827, -27.735, -27.82, -28.345, -29.476, -28.387, -28.019, -27.307, -27.567, -27.429, -27.771, -28.044, -28.683, -28.786, -28.664, -27.653, -28.064, -28.036, -27.757, -28.323, -29.195, -28.828, -28.937, -27.9078297690006, -27.9386973277244, -27.7756066004902, -27.8411714524657, -28.7963592918008, -28.4522330354614, -28.8597856593141, -28.8113167816976, -28.1924531764532, NA, -27.824879800081, -28.2347160358722, -28.7498706023163, -28.7313297359698, -28.7365680482049, -28.7272468735994, -28.803582483867, -29.4363702389094, NA, -28.7054768643306, -27.7481689930581, -27.7535107262537, -27.6555760218728, NA, -27.6272638860929, -27.7069166950818, -27.5782961448598, 
-27.5234468773432, -28.1124586856048, -29.0179480728715, -27.8824806843723, -27.8693344400536, -27.2278831040908, -27.5051927317272, -27.4090338924322,-27.6583036975383,-28.0521215748621, -28.5076100126232, -28.5288005348874, -28.3745035897644, -27.6136332691194, NA, -28.0041637896659, -27.6963035708696, -28.1989533738283, -28.8230228029304, -28.7207578899079, -28.4489097046946), midpoint...5 = c(2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7,8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7,8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7,8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 8.7, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2, 6.2)), row.names = c(NA, -144L), class = c("tbl_df", "tbl","data.frame"))
get_y <- function(df) {
  # note: the midpoint column is named midpoint...5 in `dat` above
  mod <- lm(d13C ~ midpoint...5, data = df)
  a <- coef(mod)[2]  # slope
  b <- coef(mod)[1]  # intercept
  data.frame(Octagonplot = unique(df$Octagonplot), y = a * 5 + b)
}
result <- lapply(unique(dat$Octagonplot), function(Oct)
get_y(dat[dat$Octagonplot == Oct, ]))
result <- do.call(rbind.data.frame, result)
head(result)
# Octagonplot y
# 1 11 -27.88239
# 2 12 -27.93747
# 3 13 -27.81907
# 4 14 -27.81372
# 5 21 -28.54322
# 6 22 -28.32195
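For what it's worth, the slope/intercept arithmetic can also be delegated to predict(), which handles the bookkeeping for you. A sketch using the same dat as above (again note the column is named midpoint...5 in the reproduced data):

```r
newpt <- data.frame(midpoint...5 = 5)   # point at which to evaluate each fit
result2 <- do.call(rbind, lapply(split(dat, dat$Octagonplot), function(s) {
  mod <- lm(d13C ~ midpoint...5, data = s)
  data.frame(Octagonplot = s$Octagonplot[1],
             y = predict(mod, newdata = newpt))
}))
head(result2)
```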
I am calling NbClust() on my df, which contains four columns of numeric, scaled, non-NA data. My code looks as follows:
nc = NbClust(scale(df),distance="euclidean", min.nc=2, max.nc=7,method="complete")
The description in ?NbClust says that it computes 30 different indices for the data, but in my case it only computes 26, leaving out the following four:
Gap-Index
Gamma-Index
Gplus-Index
Tau-Index
Does anyone know why it does not compute them? I am happy for any advice or hints!
Data looks like this
df = structure(list(Birthrate = c(18.2, 8.5, 54.1, 1.4, 2.1, 83.6,
17, 1, 0.8, 61.7, 4.9, 7.9, 2, 14.2, 48.2, 17.1, 10.4, 37.5,
1.6, 49.5, 10.8, 6.2, 7.1, 7.8, 3, 3.7, 4.2, 8.7), GDP = c(1.22,
0.06, 0, 0.54, 2.34, 0.74, 1.03, 1.21, 0, 0.2, 1.41, 0.79, 2.75,
0.03, 11.13, 0.05, 2.99, 0.71, 0, 0.9, 1.15, 0, 1.15, 1.44, 0,
0.71, 1.21, 1.45), Income = c(11.56, 146.75, 167.23, 7, 7, 7,
10.07, 7, 7, 7, 47.43, 20.42, 7.52, 7, 7, 15.98, 15.15, 20.42,
7, 22.6, 7, 7, 18.55, 7, 7.7, 7, 7, 7), Population = c(54, 94,
37, 95, 98, 31, 78, 97, 95, 74, 74, 81, 95, 16, 44, 63, 95, 20,
95, 83, 98, 98, 84, 62, 98, 98, 97, 98)), .Names = c("Birthrate",
"GDP", "Income", "Population"), class = "data.frame", row.names = c(NA,
-28L))
You want:
nc = NbClust(scale(df), distance = "euclidean", min.nc = 2, max.nc = 7,
             method = "complete", index = "alllong")
The default, index = "all", computes every index except the four most computationally expensive ones (Gap, Gamma, Gplus and Tau); index = "alllong" includes them.
The Problem
I'm trying to find a solution to overcome a deficient experimental design in establishing sampling points. The aim is to subset the original dataset, forcing sampling points stratification based on 2 factors with several levels.
I need a general formulation of the problem that may allow me to redefine the set of criteria levels.
Note
I've found examples of subsetting tables based on criteria; the most relevant is a post from Brian Diggs, but I cannot find a general way to apply that solution to my particular case.
The Dataset
My data.frame has 3 columns: sample id and two factors (f1 and f2).
Criteria are based on interval of values for f1 and f2.
dat <- structure(list(id = 1:203, f1 = c(22, 20.8, 20.7, 22, 12.1, 8,
20.6, 22, 22, 21.6, 0, 22, 21.4, 15.9, 21.2, 19.1, 12.5, 16.6,
14, 21.2, 14.7, 20.7, 20.5, 5.4, 19.1, 18.9, 22, 22, 22, 0, 0,
22, 1.3, 1, 0, 9.4, 7.9, 14.5, 0, 1.5, 0, 20.3, 18, 17.3, 1,
22, 0, 15, 17.9, 4.3, 19.5, 21.2, 21.2, 14.6, 2.3, 0, 6.7, 17.9,
9.5, 19, 21.6, 16.6, 11.7, 13.7, 1.5, 1, 7.6, 3.7, 18.5, 13.5,
20.9, 18.2, 11.5, 7.3, 6.5, 21.1, 22, 20.5, 20.5, 20, 16.2, 18.6,
22, 15.1, 14.4, 10.8, 17.1, 5.7, 15.1, 12.8, 14.5, 8.8, 16.8,
18.7, 1, 6.3, 1.8, 14.6, 22, 16.2, 12.9, 9.1, 2, 7.6, 7, 11.7,
1, 1, 9.6, 11, 2, 2, 14, 14.9, 7.8, 11.4, 8.3, 7.6, 9.1, 4.5,
18, 11.4, 3.1, 4.3, 9.3, 8.1, 1.4, 5.2, 14.7, 3.6, 5, 2.7, 10.3,
11.3, 17.9, 5.2, 1, 1.5, 13.2, 0, 1, 7.4, 1.7, 11.5, 20.2, 0,
14.7, 17, 15.2, 22, 22, 22, 17.2, 15.3, 10.9, 18.7, 11.2, 18.5,
20.3, 21, 20.8, 15, 21, 16.9, 18.5, 18.5, 10.3, 12.6, 15, 19.8,
21, 17.2, 16.3, 18.3, 10.3, 17.8, 11.2, 1.5, 1, 0, 1, 14, 19.1,
6.1, 19.2, 17.1, 14.5, 18.4, 22, 20.3, 6, 13, 18.3, 8.5, 15.3,
10.6, 7.2, 6.2, 1, 7.9, 2, 20, 16.3), f2 = c(100, 100, 92.9,
38.5, 100, 90.9, 100, 100, 100, 91.7, 0, 100, 71.4, 100, 100,
53.8, 28.6, 91.7, 100, 100, 64.3, 100, 92.9, 78.6, 100, 100,
27.3, 83.3, 14.3, 0, 0, 9.1, 23.1, 12.5, 0, 100, 81.8, 100, 0,
15.4, 0, 83.3, 100, 75, 7.1, 81.8, 0, 21.4, 84.6, 25, 80, 90.9,
100, 71.4, 50, 0, 46.2, 90.9, 14.3, 66.7, 90.9, 84.6, 46.2, 91.7,
33.3, 7.7, 71.4, 27.3, 46.2, 100, 100, 100, 60, 54.5, 46.2, 53.8,
91.7, 100, 100, 66.7, 45.5, 57.1, 15.4, 75, 75, 76.9, 53.8, 25,
90.9, 84.6, 91.7, 90.9, 100, 54.5, 23.1, 63.6, 30.8, 90.9, 92.9,
100, 92.3, 90.9, 12.5, 38.5, 15.4, 84.6, 27.3, 7.1, 75, 21.4,
7.7, 15.4, 84.6, 100, 69.2, 63.6, 64.3, 53.8, 92.3, 33.3, 11.1,
61.5, 66.7, 23.1, 85.7, 81.8, 41.7, 69.2, 76.9, 38.5, 9.1, 23.1,
85.7, 90, 100, 100, 14.3, 36.4, 84.6, 0, 7.7, 61.5, 25, 50, 100,
0, 63.6, 36.4, 76.9, 100, 100, 100, 100, 90.9, 100, 100, 100,
100, 100, 83.3, 100, 100, 100, 100, 50, 54.5, 71.4, 100, 85.7,
100, 75, 100, 76.9, 83.3, 100, 92.3, 33.3, 76.9, 33.3, 0, 40,
91.7, 100, 53.8, 100, 100, 100, 100, 100, 92.3, 76.9, 23.1, 84.6,
33.3, 100, 92.3, 46.2, 100, 9.1, 53.8, 7.7, 20, 42.9)), .Names = c("id",
"f1", "f2"), class = "data.frame", row.names = c(NA, -203L))
The expected output
Sampling points should ideally be grouped following a crossed design (it is not a complete factorial design).
For Factor f1: 0, 1-15, 30-60, 80-95, 100
For Factor f2: 0, 5-10, 15-20
I need to find the points for all combinations of the f1 and f2 intervals, something along these lines:
gr <- expand.grid(f1=c('0', '1-15', '30-60', '80-95', '100'),
f2=c('0', '5-10', '15-20'))
> gr
f1 f2
1 0 0
2 1-15 0
3 30-60 0
4 80-95 0
5 100 0
6 0 5-10
7 1-15 5-10
8 30-60 5-10
9 80-95 5-10
10 100 5-10
11 0 15-20
12 1-15 15-20
13 30-60 15-20
14 80-95 15-20
15 100 15-20
The solution should split dat based on lines of gr.
This is not a complete factorial design, since not all interval combinations will be matched by sampling points, but it is important to identify the empty strata (NAs) as well.
Any help will be appreciated. Please let me know if I'm providing sufficient information.
Use cut to split f1 and f2 into factors based on your breakpoints, paste the factors together, and then split based on the combined factor.
dat$f1.group <- cut(dat$f1, c(0, 1, 15, 30, 60, 80, 90, 95, 100))
dat$f2.group <- cut(dat$f2, c(0, 5, 10, 15, 20))
gr<-expand.grid(levels(dat$f1.group),levels(dat$f2.group))
names(gr)<-c('f1.group','f2.group')
gr$combined = paste(gr$f1.group,gr$f2.group)
dat<-merge(gr,dat)[c('id','f1','f2','combined')]
split(dat,dat$combined)
That will get you a list of data.frames, with one element for each combo defined in gr. You can then easily sample within these strata.
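For the sampling step itself, a minimal sketch (the per-stratum sample size of 2 is just an illustrative assumption):

```r
strata <- split(dat, dat$combined)
# draw up to 2 points per stratum; strata with fewer points contribute what they have
sampled <- lapply(strata, function(s) s[sample(nrow(s), min(2, nrow(s))), ])
```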