Looping/sapply through nlme function - r

I am trying to execute a loop with mixed-model effects with response variable changing. I came from here and here. I should say that I have tried sthg creating a function and then sapply or lapply (wihtout success)
I provide a small dataset (really small) just to represent my original database (much larger and similar to those of longitudinal studies)
data<- structure(list(paciente = structure(c(6134, 6099, 6457, 6164,
6470, 6323, 6550, 6082, 6476, 6044, 6509, 6539, 6234, 6555, 6383,
6127, 6507, 6513, 6486, 6080, 6101, 6007, 6023, 6516, 6001, 6198,
6510, 6530, 6351, 6181), label = "Paciente", format.spss = "F6.0"),
edad_s1 = structure(c(70, 63, 61, 71, 67, 59, 63, 69, 67,
67, 67, 72, 65, 72, 63, 65, 60, 64, 56, 63, 57, 62, 72, 60,
72, 63, 72, 68, 66, 71), label = "Edad", format.spss = "F3.0"),
sexo_s1 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 2L), .Label = c("Hombre", "Mujer"), label = "Sexo", class = "factor"),
time = c(2, 1, 2, 1, 0, 0, 1, 0, 2, 1, 1, 0, 1, 2, 1, 2,
1, 2, 0, 1, 1, 0, 2, 1, 0, 2, 1, 2, 2, 0), grupo_int_v00 = structure(c(1L,
1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), .Label = c("A",
"B"), label = "Grupo de intervención", class = "factor"),
peso1 = c(108, 80.4, 95, 75, 92.6, 90, 82.2, 94.4, 78, 71.3,
75.1, 83.5, 87.1, 63, 73, 98.5, 90.2, 81.3, 93.4, 79.8, 114.3,
110.9, 81.5, 88.5, 82.4, 88.3, 90, 73, 79, 94.8), cintura1 = c(127,
100.5, 103.5, 108, 115, 114.5, 95.5, 115, 101, 98, 99, 108.5,
105, 99, 104, 126, 114.2, 99, 110, 104.5, 120, 126, 111.5,
102, 117, 110, 125, 100, 104, 123), tasis2_e = c(156, 129,
131, 138, 167, 138, 115, 146, 119, 148, 130, 144, 115, 122,
135, 139, 128, 119, 138, 115, 138, 151, 151, NA, 137, 147,
124, 168, 152, 156), tadias2_e = c(70, 63, 80, 67, 76, 81,
57, 68, 69, 69, 68, 78, 61, 71, 54, 77, 63, 63, 92, 73, 80,
88, 84, NA, 79, 76, 62, 90, 87, 89), p17_total = c(10, 10,
5, 9, 9, 7, 15, 11, 6, 12, 11, 4, 9, 14, 9, 9, 11, 14, 6,
5, 10, 10, 9, 13, 12, 7, 11, 12, 7, 4), geaf_tot = c(1986.01,
1286.71, 1230.77, 1510.49, 839.16, 2144.52, 5361.31, 1678.32,
4055.94, 2601.4, 3363.64, 3076.92, 5342.66, 2769.23, 2601.4,
1693.24, 4055.94, 3146.85, 3916.08, 6405.59, 2442.89, 671.33,
867.13, 1585.08, 3153.85, 3188.81, 7986.01, 839.16, 7552.45,
2937.06), glucosa = c(127, 97, 95, 102, 119, 113, 109, 105,
93, 167, 85, 108, 122, 112, 113, 120, 100, 108, 100, 86,
129, 136, 98, 97, 130, 125, 109, 102, NA, 181), albumi = c(4.47,
4.82, 4.78, 4.22, 4.59, 4.5, 4.33, 4.87, 4.83, 4.98, 4.23,
4.77, 4.76, 4.98, 4.18, 4.51, 4.72, 4.87, 4.77, 4.61, 4.55,
4.77, 4.6, 4.59, 4.25, 4.71, 4.47, 4.54, NA, 4.63), coltot = c(157,
191, 276, 248, 248, 217, 187, 301, 173, 230, 258, 238, 231,
181, 183, 243, 223, 195, 237, 245, 164, 145, 199, 234, 178,
192, 201, 198, NA, 159), hdl = c(39, 50, 57, 59, 49, 44,
60, 98, 52, 73, 58, 44, 58, 60, 48, 46, 73, 58, 39, 47, 38,
45, 59, 56, 72, 34, 78, 62, NA, 54), ldl_calc = c(91, 124,
204, 133, 155, 140, 105, 162, 91, 141, 182, 173, 155, 107,
83, 150, 132, 124, NA, 167, 101, 88, 121, 160, 84, 130, 112,
120, NA, NA), trigli = c(137, 87, 74, 282, 219, 165, 112,
203, 149, 78, 89, 105, 91, 71, 259, 236, 92, 63, 447, 157,
123, 58, 94, 90, 112, 139, 53, 80, NA, 429), hba1c = c(6.57,
5.82, 5.68, 5.96, 6.11, 5.73, 5.48, 5.8, 5.6, 7.8, 5.21,
5.73, 6.1, 5.86, 6.37, 6.27, 5.22, 5.59, 5.47, 5.95, 6.96,
NA, 5.47, 4.99, NA, 6.25, 5.79, 5.79, NA, 6.54), i_hucpeptide = c(NA,
NA, 466.64, 838.61, 847.89, 1481.03, 819.65, NA, 1298.6,
NA, 564.59, 544.2, 755.73, 1057.83, 957.43, NA, 957.33, 1002.34,
1104, NA, NA, NA, NA, 594.6, NA, 815.82, 922.08, 628.54,
NA, 1591.01), i_hughrelin = c(NA, NA, 410.97, 553.65, 453,
352.44, 527.01, NA, 328.27, NA, 1668.41, 460.06, 1072.27,
260.24, 749.03, NA, 1327.91, 363.79, 524.53, NA, NA, NA,
NA, 1051.1, NA, 143.32, 1076.49, 1565.85, NA, 607.31), i_hugip = c(NA,
NA, 2.67, 2.67, 2.67, 2.67, 2.67, NA, 2.67, NA, 2.67, 2.67,
690.74, 1165.16, 2.67, NA, 2.67, 2.67, 2.67, NA, NA, NA,
NA, 2.67, NA, 2.67, 2.67, 2.67, NA, 2.67), i_huglp1 = c(NA,
NA, 127.66, 284.34, 200.13, 59.3, 234.84, NA, 503.42, NA,
103.9, 14.14, 71.6, 56.41, 75.13, NA, 161.36, 124.19, 220.52,
NA, NA, NA, NA, 14.14, NA, 112.57, 100.52, 237.55, NA, 470.91
), i_huglucagon = c(NA, NA, 333.79, 649.94, 726.99, 395.38,
610.5, NA, 434.42, NA, 502.4, 127.62, 268.23, 10.48, 428.15,
NA, 716.02, 238.95, 320.32, NA, NA, NA, NA, 10.48, NA, 238,
487.42, 297.6, NA, 495.16), i_huinsulin = c(NA, NA, 129.24,
270.98, 299.75, 730.82, 267.54, NA, 616.91, NA, 121.26, 85.34,
224.96, 247.48, 220.75, NA, 181.85, 341.25, 551.46, NA, NA,
NA, NA, 133.42, NA, 263.87, 279.45, 94.78, NA, 573.14), i_huleptin = c(NA,
NA, 3992.49, 17806.43, 8409.76, 11511.43, 2965.92, NA, 3223.08,
NA, 9018.79, 1039.45, 2613.33, 2128.98, 7307.89, NA, 13492.13,
2883.77, 4775.98, NA, NA, NA, NA, 2602.96, NA, 2829.59, 8511.92,
3528.77, NA, 11487.15), i_hupai1 = c(NA, NA, 997.29, 2499.25,
3085.25, 1909.44, 1730.55, NA, 3333.37, NA, 1424.3, 1857.71,
2578.46, 2268.52, 2222.97, NA, 2722.92, 1300.69, 2732.11,
NA, NA, NA, NA, 1204.36, NA, 2483.08, 2289.67, 1791.79, NA,
6595.54), i_huresistin = c(NA, NA, 3044.48, 5774.77, 3221.72,
4925.57, 5170.95, NA, 3683.64, NA, 4041.32, 6771.31, 5119.11,
9521.7, 3328.41, NA, 5061.65, 3773.39, 3039.39, NA, NA, NA,
NA, 4405.17, NA, 2577.84, 3433.82, 6802.94, NA, 6461.67),
i_huvisfatin = c(NA, NA, 302.3, 2083.46, 2989.72, 1118.7,
8.64, NA, 96.03, NA, 2209.51, 8.64, 1944.37, 1415.55, 678.33,
NA, 4349.56, 8.64, 410.1, NA, NA, NA, NA, 117, NA, 8.64,
2308.8, 228.53, NA, 1766.64), col_rema = c(27, 17, 15, 56,
44, 33, 22, 41, 30, 16, 18, 21, 18, 14, 52, 47, 18, 13, NA,
31, 25, 12, 19, 18, 22, 28, 11, 16, NA, NA), homa = c(NA,
NA, 5.053, 11.374, 14.679, 33.985, 12.001, NA, 23.61, NA,
4.242, 3.793, 11.294, 11.406, 10.265, NA, 7.484, 15.167,
22.694, NA, NA, NA, NA, 5.326, NA, 13.574, 12.535, 3.978,
NA, 42.691), i_pcr = c(NA, NA, 0.41, 0.82, NA, 2.08, 0.08,
NA, 0.1, NA, 0.38, 0.05, 0.04, 0.35, 0.2, NA, 0.98, 0.02,
NA, NA, NA, NA, NA, 0.2, NA, 0.1, 0.16, 0.16, NA, 2.93)), row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
Afterwards I am defining my iteration and my variables database
ex<- subset(data[, 6:30])
for (i in 1:length(ex)) {
var_1 <- ex[,i]
var_1 <- unlist(var_1)
lme_1 <- lme(var_1 ~ sexo_s1*peso1 + edad_s1 + p17_total + poly(time, 2)*grupo_int_v00,
random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
data = dat_longer, subset = !is.na(var_1))
Error in model.frame.default(formula = ~time + var_1 + sexo_s1 + peso1 + :
invalid type (list) for variable 'var_1'
I have tried unlisting/as.data.frame in before running the loop
for (i in 1:length(data)) {
var_1 <- data[,i]
var_1 <- unlist(var_1) #or as.data.frame(var_1)
lme_1 <- lme(var_1 ~ sexo_s1*peso1 + edad_s1 + p17_total + poly(time, 2)*grupo_int_v00,
random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
data = dat_longer, subset = !is.na(var_1))
}
Error in model.frame.default(formula = ~time + var_1 + sexo_s1 + peso1 + :
variable lengths differ (found for 'var_1')
I have also tried to develop a new function to iterate over
lme_z <- function(z){
out <- lme(z ~ sexo_s1*peso1 + edad_s1 + p17_total + poly(time, 2)*grupo_int_v00,
random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
data = dat_longer, subset = !is.na(z))
}
Error
If there is some contribution to iterate in the response variable (I know Ben Bolker is an expert)
Thanks in advance

If data is a data frame containing all of the variables that you use in your formula, including all of the responses that you want to consider, then you can do:
f <- function(resp) {
fixed <- . ~ sexo_s1 * peso1 + edad_s1 + p17_total + poly(time, 2) * grupo_int_v00
fixed[[2L]] <- as.name(resp)
lme(fixed = fixed,
random = ~poly(time, 2) | paciente,
data = data,
subset = !is.na(data[[resp]]),
control = lmeControl(opt = "optim"))
}
list_of_lme_objects <- lapply(names_of_response_variables, f)
An important piece is:
fixed <- . ~ sexo_s1 * peso1 + edad_s1 + p17_total + poly(time, 2) * grupo_int_v00
fixed[[2L]] <- as.name(resp)
The second statement injects the response named resp into the left hand side of the formula template. A more transparent example:
fixed <- . ~ world
fixed[[2L]] <- as.name("hello")
fixed
## hello ~ world
Another important piece is:
subset = !is.na(data[[resp]])
Here, the right hand side actually evaluates to a logical vector of length equal to the number of rows of data. You might consider passing na.action = na.omit instead of subset, though that will also omit rows where the independent variables have missing values, so the semantics are slightly different.
The variable grupo_int_v00 is missing from your data frame. You'll have to fix that on your end in order to test the code...

I was going to suggest:
formvars <- c("sexo_s1*peso1",
"edad_s1",
"p17_total",
"poly(time, 2)")
## excluded *grupo_int_v00 since not in example data frame
respvars <- names(df)[7:30]
result <- list()
for (r in respvars) {
result[[r]] <- lme(reformulate(formvars, response = r),
random = ~ poly(time, 2)|paciente,
control=lmeControl(opt="optim"),
data = df,
na.action = na.exclude)
}
Many of #MikaelJagan's points are well taken. In particular:
grupo_int_v00 excluded since it wasn't in your example data set
this code doesn't work for your example since there are only two complete cases (i.e., observations with no missing predictors/responses) in the data set, so we can't fit a quadratic polynomial ("degree must be less than the number of unique points")
I used na.exclude, which obviates your subset argument; it excludes NA values when fitting but will re-introduce them e.g. in calculating predictions or residuals

Related

report_text from report not working with lists in R

I want to report a couple of t-tests using report_text() from report. My tests are stored in a list. I'm able to report the tests outside the lists, but nothing that I do seems to work to unlist my tests and report them
Works just fine:
a <- t.test(data$ARG_L1, data$ARG_L2, data = data, paired = T)
report_text(a)
### output:
Effect sizes were labelled following Cohen's (1988) recommendations.
The Paired t-test testing the difference between data$ARG_L1 and data$ARG_L2 (mean of the
differences = 6.35) suggests that the effect is positive, statistically significant, and large (difference = 6.35, 95% CI [4.42, 8.27], t(44) = 6.65, p < .001; Cohen's d = 0.99, 95% CI [0.63,1.35])
Cannot report:
### set list outsite loop:
tests <- list()
### run loop:
for (zz in seq(from = 1, to = 4, by = 2)) {
PairedVar1 <- data[zz+1] #1 Variables
PairednVar1 <- names(PairedVar1)
data$PairedVar1Unlist <- unlist(PairedVar1)
PairedVar2 <- data[zz+2] #2 Variables
PairednVar2 <- names(PairedVar2)
data$PairedVar2Unlist <- unlist(PairedVar2)
### run the test:
tests[[zz]] <- t.test(data$PairedVar1Unlist, data$PairedVar2Unlist,
paired = T, data = data, exact = F)
tests[[zz]]$data.name <- str_glue("{PairednVar1} and {PairednVar2}") ### write the names
}
report_text(tests)
Error: Oops, objects of class [list] are not supported (yet) by report_text() :(
Want to help? Check out https://easystats.github.io/report/articles/new_models.html
Attempts (no one worked):
report_text(unlist(tests))
report_text(tests[[1]])
report_text(bind_rows(tests))
tests <- tests %>% discard(is.null)
report_text(tests)
Question: : How can I unlist the tests and report them with report ? I'm sure there's a way. Thanks in advance.
data:
> dput(data)
structure(list(ID = structure(c("PART_1", "PART_2", "PART_3",
"PART_4", "PART_5", "PART_6", "PART_7", "PART_8", "PART_9", "PART_10",
"PART_11", "PART_12", "PART_13", "PART_14", "PART_15", "PART_16",
"PART_17", "PART_18", "PART_19", "PART_20", "PART_21", "PART_22",
"PART_23", "PART_24", "PART_25", "PART_26", "PART_27", "PART_28",
"PART_29", "PART_30", "PART_31", "PART_32", "PART_33", "PART_34",
"PART_35", "PART_36", "PART_37", "PART_38", "PART_39", "PART_40",
"PART_41", "PART_42", "PART_43", "PART_44", "PART_45", "PART_46",
"PART_47", "PART_48", "PART_49", "PART_50", "PART_51", "PART_52",
"PART_53", "PART_54", "PART_55", "PART_56", "PART_57", "PART_58",
"PART_59", "PART_60", "PART_61", "PART_62", "PART_63", "PART_64",
"PART_65", "PART_66", "PART_67", "PART_68", "PART_69", "PART_70",
"PART_71"), class = c("glue", "character")), ARG_L1 = c(70.18,
67.65, 71.89, 70.42, NaN, 72.38, 69.67, 75.63, 76.7, 76.21, 66.5,
70.57, 76.72, 66.4, 74.75, 79.17, 70.84, NA, 67.82, 70, 71.88,
74.55, 69.33, 69.5, 65.25, 75.05, 75.44, 64.56, 74.88, 74.29,
72.4, 71.93, NA, 69.12, 71.43, 77.53, NA, 71.93, 70.4, 60.25,
NA, NA, 64.8, 69, NA, 71.19, 71.12, 75.04, 68.89, 68.26, 75.81,
NA, NA, NA, 75.89, 68.82, 77.35, 68.38, 76.71, 79.12, 78.89,
73.5, NA, 69.7, 69.82, 70.91, NaN, 72, 71.17, 71.85, 69.7), ARG_L2 = c(65.7,
65.8, 74.45, 68, NA, NA, 53.75, 73.94, 67.24, 58.22, NA, NaN,
71.07, 68.07, NaN, 69.88, 71.32, 62.18, 58.65, 76.45, 71.13,
67.25, NaN, 51.76, 69.33, 68.17, 58, 54.27, 68.05, NaN, NA, 61,
61.67, NA, 67.79, 65.93, NA, NA, 59.27, 69.67, 71.38, 70, NaN,
64.88, 68.19, 62.06, 61, 55.48, 65.67, 67.72, 68.47, 64, 65.11,
66, 67.5, 66.33, NA, 69.61, 69.33, 75.67, 68.17, 63, NA, 58.81,
NA, NA, NA, 66.5, 62.33, 65, NA), NARR_L1 = c(74.26, NA, NA,
70.94, NaN, 75, 66.14, 74.48, 77.07, 73.47, 76, 60.44, 73.92,
77.19, 71.4, 77.59, 72, NA, 70.38, 65.47, 70.54, NA, 68.09, 64.61,
66.5, 72.52, 62.59, 69.25, 71.48, 71.88, 74.4, 70.1, NA, 70,
69.6, 78.04, 62.3, 68.79, 73.44, 72.25, NA, NA, 67, 68.25, NA,
NA, 65.94, 75.71, 72.43, 69.68, 76, 68.6, 65.65, NA, 70.43, 74,
71.76, 71.17, 74.63, 74.22, NA, 69.47, NA, 68.72, 67, 62.82,
NaN, 77.33, 69.76, 75.42, 67.62), NARR_L2 = c(65.08, 61, NA,
71.18, 68.46, NA, 62.75, 66.32, 73.42, 59.83, NA, 51.8, 64.77,
67.88, NaN, 72.27, 64.25, NaN, 62.6, 54.75, 64.74, NA, NaN, 51.58,
67.05, 62.38, 64.57, NA, 65.56, NaN, NA, 70.71, NA, NA, 68.1,
NA, 58.43, NA, 55, 65.29, NA, 58.86, NaN, 64.18, NA, 70.33, 58.5,
64.84, 65.19, 63.14, 59.12, NaN, 62.75, NA, NaN, 68.82, 65.04,
66.78, 64.86, 69.06, 69.94, 59.31, 65.15, 55.83, 67.71, NA, NA,
69, 58.83, 60.65, NA), PairedVar1Unlist = c(74.26, NA, NA, 70.94,
NaN, 75, 66.14, 74.48, 77.07, 73.47, 76, 60.44, 73.92, 77.19,
71.4, 77.59, 72, NA, 70.38, 65.47, 70.54, NA, 68.09, 64.61, 66.5,
72.52, 62.59, 69.25, 71.48, 71.88, 74.4, 70.1, NA, 70, 69.6,
78.04, 62.3, 68.79, 73.44, 72.25, NA, NA, 67, 68.25, NA, NA,
65.94, 75.71, 72.43, 69.68, 76, 68.6, 65.65, NA, 70.43, 74, 71.76,
71.17, 74.63, 74.22, NA, 69.47, NA, 68.72, 67, 62.82, NaN, 77.33,
69.76, 75.42, 67.62), PairedVar2Unlist = c(65.08, 61, NA, 71.18,
68.46, NA, 62.75, 66.32, 73.42, 59.83, NA, 51.8, 64.77, 67.88,
NaN, 72.27, 64.25, NaN, 62.6, 54.75, 64.74, NA, NaN, 51.58, 67.05,
62.38, 64.57, NA, 65.56, NaN, NA, 70.71, NA, NA, 68.1, NA, 58.43,
NA, 55, 65.29, NA, 58.86, NaN, 64.18, NA, 70.33, 58.5, 64.84,
65.19, 63.14, 59.12, NaN, 62.75, NA, NaN, 68.82, 65.04, 66.78,
64.86, 69.06, 69.94, 59.31, 65.15, 55.83, 67.71, NA, NA, 69,
58.83, 60.65, NA)), row.names = c(NA, -71L), class = "data.frame")
Your for loop creates an element of tests which is NULL:
[[2]]
NULL
Converting from for loop to lapply solves the problem [I also changed the name of your data frame from data to df to avoid potential clashes with the data() function]:
tests <- lapply(
seq(from = 1, to = 4, by = 2),
function(zz) {
PairedVar1 <- df[zz+1] #1 Variables
PairednVar1 <- names(PairedVar1)
df$PairedVar1Unlist <- unlist(PairedVar1)
PairedVar2 <- df[zz+2] #2 Variables
PairednVar2 <- names(PairedVar2)
df$PairedVar2Unlist <- unlist(PairedVar2)
### run the test:
rv <- t.test(df$PairedVar1Unlist, df$PairedVar2Unlist,
paired = T, data = df, exact = F)
rv$data.name <- str_glue("{PairednVar1} and {PairednVar2}") ### write the names
rv
}
)
tests
[[1]]
Paired t-test
data: ARG_L1 and ARG_L2
t = 6.6524, df = 44, p-value = 3.703e-08
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
4.423911 8.269422
sample estimates:
mean difference
6.346667
[[2]]
Paired t-test
data: NARR_L1 and NARR_L2
t = 9.895, df = 41, p-value = 1.999e-12
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
5.732389 8.672373
sample estimates:
mean difference
7.202381
Then you should be able to lapply as I mentioned in my comment.
lapply(tests, report_text)
I don't have the report package installed, so I cannot confirm.

Remove legend in ggplot

I am working with ggeffects package
I have the following syntax
data_example <- structure(list(paciente = structure(c(6171, 6488, 6300, 6446,
6489, 6445, 6473, 6351, 6212, 6387), label = "Paciente", format.spss = "F6.0"),
edad_s1 = structure(c(69, 62, 60, 71, 67, 59, 63, 66, 67,
70), label = "Edad", format.spss = "F3.0"), sexo_s1 = structure(c(1L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L), .Label = c("Hombre",
"Mujer"), label = "Sexo", class = "factor"), grupo_int_v00 = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L), .Label = c("A", "B"), label = "Grupo de intervención", class = "factor"),
time = c(0, 0, 0, 2, 2, 2, 1, 2, 1, 1), peso1 = c(89.9, 62,
91.5, 75.2, 68.2, 88.4, 93.6, 79, 88.3, 84.4), cintura1 = c(113,
90, 112, NA, 87.5, 116, 98.5, 104, 112.5, 108.5), tasis2_e = c(132,
132, 149, NA, 145, 137, 129, 152, 146, 129), tadias2_e = c(81,
58, 79, NA, 80, 60, 79, 87, 79, 68), p17_total = c(7, 9,
10, 10, 10, 10, 10, 7, 10, 11), geaf_tot = c(3412.59, 3524.48,
559.44, 5454.55, 4293.71, 839.16, 3146.85, 7552.45, 4335.66,
566.9), glucosa = c(102, 97, 89, NA, 88, 168, 104, NA, 114,
121), albumi = c(4.94, 4.68, 4.75, NA, 4.34, 5.06, 4.56,
NA, 5.06, 3.96), coltot = c(232, 253, 215, NA, 202, 287,
255, NA, 217, 147), hdl = c(59, 64, 68, NA, 71, 46, 61, NA,
40, 42), ldl_calc = c(143, 150, 127, NA, 114, NA, 170, NA,
143, 86), trigli = c(152, 195, 99, NA, 85, 378, 121, NA,
170, 93), hba1c = c(5.61, 5.66, 5.43, NA, 5.38, 8.14, 5.81,
NA, 6, 6.38), i_hucpeptide = c(988.91, 673.5, 1036.03, NA,
734.29, 1266.3, 610.9, NA, 1144.8, 672.08), i_hughrelin = c(1133.35,
1230.06, 1109.98, NA, 1064.79, 725.35, 1437.85, NA, 866.07,
822.83), i_hugip = c(2.67, 2.67, 2.67, NA, 2.67, 2.67, 2.67,
NA, 2.67, 2.67), i_huglp1 = c(145.43, 138.32, 194.14, NA,
99.37, 166.27, 218.33, NA, 184.04, 222.84), i_huglucagon = c(513.89,
357.35, 624.73, NA, 464.85, 448.49, 304.29, NA, 310.61, 426.52
), i_huinsulin = c(234.23, 229.06, 358.86, NA, 175.38, 466,
99.02, NA, 367.95, 77.33), i_huleptin = c(7898.28, 5211.27,
14670.25, NA, 7161.39, 3218.49, 2659.8, NA, 3766.01, 1207.58
), i_hupai1 = c(3468.4, 1977.9, 4101.1, NA, 1613.4, 2847.27,
2442.49, NA, 1953.26, 1752.88), i_huresistin = c(4783.28,
2676.05, 3064.57, NA, 2165.52, 3878.48, 8343.46, NA, 2822.68,
6496.73), i_huvisfatin = c(831.6, 649.45, 2270.65, NA, 1578.88,
9.63, 185.09, NA, 162.8, 8.64), col_rema = c(30, 39, 20,
NA, 17, NA, 24, NA, 34, 19), homa = c(1061.843, 987.503,
1419.491, NA, 685.931, 3479.467, 457.692, NA, 1864.28, 415.864
), i_pcr = c(0.05, NA, 0.27, NA, 0.03, 0.23, 0.04, NA, 0.09,
0.09), d_homa = c(NA, NA, NA, NA, -2.629, 33.042, -181.211,
NA, -929.683, -89.108), d_hughrelin = c(NA, NA, NA, NA, -213.59,
48.43, 95.27, NA, -228.62, -146.8), d_huinsulin = c(NA, NA,
NA, NA, 3.24, -68.79, -43.31, NA, -147.33, -7.46), d_hucpeptide = c(NA,
NA, NA, NA, 192.39, -263.54, -71.56, NA, -437.38, -215.44
), d_huglucagon = c(NA, NA, NA, NA, 38.99, -112.45, -10.75,
NA, -133.55, -259.73), d_huleptin = c(NA, NA, NA, NA, 409.76,
-1081.5, -1778.69, NA, -353.91, -679.7), d_huresistin = c(NA,
NA, NA, NA, 391.02, -155.41, -436.47, NA, -1137.79, -922.75
), d_huvisfatin = c(NA, NA, NA, NA, 457.54, -260.79, -341.02,
NA, -426.89, 0), d_glucosa = c(NA, NA, NA, NA, -2, 23, 3,
NA, -8, -13), d_coltot = c(NA, NA, NA, NA, -52, 36, -11,
NA, 15, -12), d_hdl = c(NA, NA, NA, NA, 1, 3, -1, NA, 1,
4), d_ldl_calc = c(NA, NA, NA, NA, -50, NA, -10, NA, 12,
-15), d_col_rema = c(NA, NA, NA, NA, -3, NA, 0, NA, 2, -1
), d_trigli = c(NA, NA, NA, NA, -14, 132, -1, NA, 8, -5),
d_hba1c = c(NA, NA, NA, NA, -0.11, -0.04, -0.18, NA, -1.76,
-0.67), d_tasis2_e = c(NA, NA, NA, NA, 0, 6, -1, 7, -21,
-9), d_tadias2_e = c(NA, NA, NA, NA, 0, 2, -8, 8, -10, -17
), d_peso1 = c(NA, NA, NA, -6, -2.3, 0.2, -11.4, 0.8, -4.1,
-9.3), d_cintura1 = c(NA, NA, NA, NA, -2.5, -4, -12.5, 6,
-3.5, -4.5), d_geaf_tot = c(NA, NA, NA, 699.31, 2055.95,
-2181.82, 1748.25, 3776.23, 867.13, -6593.94), d_p17_total = c(NA,
NA, NA, 1, 4, 5, 4, -5, 5, 2), d_hupai1 = c(NA, NA, NA, NA,
-185.03, 204.77, 202.01, NA, -1551.91, 57.2), d_hugip = c(NA,
NA, NA, NA, 0, 0, 0, NA, 0, 0), d_huglp1 = c(NA, NA, NA,
NA, -42.07, -163.02, 107.28, NA, -95.82, -87.5), d_pcr = c(NA,
NA, NA, NA, NA, NA, NA, NA, -0.18, -0.22), ln_trigli = c(5.024,
5.273, 4.595, NA, 4.443, 5.935, 4.796, NA, 5.136, 4.533),
ln_homa = c(6.968, 6.895, 7.258, NA, 6.531, 8.155, 6.126,
NA, 7.531, 6.03), ln_hba1c = c(1.725, 1.733, 1.692, NA, 1.683,
2.097, 1.76, NA, 1.792, 1.853), ln_geaf_tot = c(8.135, 8.167,
6.327, 8.604, 8.365, 6.732, 8.054, 8.93, 8.375, 6.34), i_ratiolg = c(6.969,
4.237, 13.217, NA, 6.726, 4.437, 1.85, NA, 4.348, 1.468)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
The mixed model I have created following the syntax
lme_peso <- lme(peso1 ~ sexo_s1 + edad_s1 + poly(time, 2)*grupo_int_v00 + p17_total,
random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
data = dat_longer, subset = !is.na(peso1), na.action = na.omit)
And then to plot it
ggpredict(lme_peso, c("time [all]", "grupo_int_v00"), type="fixed") %>%
ggplot(aes(x = x, y = predicted, colour = group)) +
geom_point() +
geom_line() +
stat_smooth(method = "loess",se = T) +
labs(x = "time (months)", y = "Weight (kg)") +
scale_color_manual(labels = c("Control", "Intervention"), values = c("orange", "green")) +
geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = F),alpha = 1/5) +
scale_x_continuous(breaks = 0:2, labels = c(0, 6, 12))
When I supress the arguments of fill in geom_ribbon the fill stays black. But I don't know how to manage to keep just one legend with 2 groups (Control and Intervention). I have the extra-added legend (with F in this case)
Thanks in advance
I couldn't run your code, but I rebuilt it with iris.
Like Matt suggested, one thing would be, remove fill=F:
ggplot(data=iris, aes(x = SepalLength , y = PetalLength, group=Name)) +
geom_point() +
geom_line() +
stat_smooth(method = "loess",se = T, aes(color=Name)) +
geom_ribbon(aes(ymin = 1, ymax = 3),alpha = 1/5) +
scale_x_continuous(breaks = 0:2, labels = c(0, 6, 12))
Or if you need it for some reason, use guides(fill="none"):
ggplot(data=iris, aes(x = SepalLength , y = PetalLength, group=Name)) +
geom_point() +
geom_line() +
stat_smooth(method = "loess",se = T, aes(color=Name)) +
geom_ribbon(aes(ymin = 1, ymax = 3, fill=FALSE),alpha = 1/5) +
scale_x_continuous(breaks = 0:2, labels = c(0, 6, 12)) +
guides(fill="none")
Output:

Heatmap with the data point categorized by their class label

I have a dataframe with columns for different attributes and a column for the class label. I am trying to create a Heatmap/matrix plot of all the attributes with the data points categorized by their class label.
If I turn my dataframe into a numeric matrix, I can use the heatmap function to create a heatmap:
q3 <- read.arff("diabetes.arff")
q3_m <- as.matrix(q3[,1:8])
heatmap(q3_m, Colv=NA, Rowv=NA)
However, I can't figure out how to order these by the class variable, as I had to remove it from the matrix because it isn't numeric.
If I transform the data into the long format, I can also make the following heatmap using ggplot:
q3_long <- pivot_longer(q3, preg:age, names_to = "Attribute",
values_to = "Value")
ggplot(data = q3_long, mapping = aes(x = Attribute, y=class, fill = Value)) +
geom_raster() +
xlab(label = "Attribute")
However, this averages the values of every case in a given class rather than showing every case as a separate row with its own fill.
How can I combine these approaches to get a heatmap that clusters the cases by class?
(Apologies in advance - I attempted to include images here ,but I just joined stackoverflow and therefore don't have the 10 reputation points needed to include images).
Thanks for your help.
Edit: here is a sample of the data. It is also publicly available - the diabetes.arff dataset is automatically downloaded with Weka installation (https://waikato.github.io/weka-wiki/downloading_weka/).
structure(list(preg = c(6, 1, 8, 1, 0, 5, 3, 10, 2, 8, 4, 10,
10, 1, 5, 7, 0, 7, 1, 1), plas = c(148, 85, 183, 89, 137, 116,
78, 115, 197, 125, 110, 168, 139, 189, 166, 100, 118, 107, 103,
115), pres = c(72, 66, 64, 66, 40, 74, 50, 0, 70, 96, 92, 74,
80, 60, 72, 0, 84, 74, 30, 70), skin = c(35, 29, 0, 23, 35, 0,
32, 0, 45, 0, 0, 0, 0, 23, 19, 0, 47, 0, 38, 30), insu = c(0,
0, 0, 94, 168, 0, 88, 0, 543, 0, 0, 0, 0, 846, 175, 0, 230, 0,
83, 96), mass = c(33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31, 35.3,
30.5, 0, 37.6, 38, 27.1, 30.1, 25.8, 30, 45.8, 29.6, 43.3, 34.6
), pedi = c(0.627, 0.351, 0.672, 0.167, 2.288, 0.201, 0.248,
0.134, 0.158, 0.232, 0.191, 0.537, 1.441, 0.398, 0.587, 0.484,
0.551, 0.254, 0.183, 0.529), age = c(50, 31, 32, 21, 33, 30,
26, 29, 53, 54, 30, 34, 57, 59, 51, 32, 31, 31, 33, 32), class = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L,
2L, 1L, 2L), .Label = c("tested_negative", "tested_positive"), class = "factor")), row.names = c(NA,
20L), class = "data.frame")
Maybe this is what you are looking for. To get a heatmap by cases you could add an id variable to your dataset which you could map on x and make use of faceting to cluster the cases by class:
library(tidyr)
library(ggplot2)
library(dplyr)
q3_long <- q3 %>%
mutate(id = row_number(), id = factor(id)) %>%
pivot_longer(-c(class, id), names_to = "Attribute", values_to = "Value")
ggplot(data = q3_long, mapping = aes(x = Attribute, y = id, fill = Value)) +
geom_raster() +
xlab(label = "Attribute") +
facet_wrap(~class, scales = "free_y")

Robust Independent T-test

This is my first time asking a question, so I apologize for any formatting issues or anything that makes this difficult to answer. Please let me know what I need to add to be able to the answer question.
I'm attempting to compare differences between 2 unequal group sizes (one ~ 97 the other ~ 714). The reason for the large discrepancy is I am looking at a program done by one class to see if it is significantly different than what has occurred in previous classes. I've been reading about robust stats recently and decided to use a yuen bootstrap in R-Studio from the WRS2 package for a more valid comparison, especially with the difference in sample size.
My formula is
yuenbt(DataExample$PT500 ~ DataExample3$ClassPT500, tr = 0.2, nboot = 599, side = TRUE)
and it returns
Call:
yuenbt(formula = DataExample$PT500 ~ DataExample$ClassPT500,
tr = 0.2, nboot = 599, side = TRUE)
Test statistic: NA (df = NA), p-value = 0
Trimmed mean difference: -65
95 percent confidence interval:
NA NA
The NA's return on other variables that I've tried out as well, or in some cases the confidence interval will state INF. Any ideas why this is happening (such a big difference in sample size?) and suggestions on what the next best step would be are greatly appreciated.
Here is a sample of data:
structure(list(PrePT500 = c(74, 105, 121, 128), PostPT500 = c(191,
264, 327, 314), PT500 = c(117, 159, 206, 186), PrePullups = c(0,
NA, NA, 2), PostPullups = c(3, NA, NA, 3), Pullups = c(3, NA,
NA, 1), PreSitups = c(46, 40, 25, 33), PostSitups = c(41, 61,
39, 49), Situps = c(-5, 21, 14, 16), PreMC = c(8, 16, 29, 19),
PostMC = c(41, 45, 60, 60), MC = c(33, 29, 31, 41), PrePushups = c(20,
16, 28, 30), PostPushups = c(40, 47, 50, 50), Pushups = c(20,
31, 22, 20), Pre1.5 = c(1048, 917, 902, 905), Post1.5 = c(846,
748, 696, 760), X1.5 = c(-202, -169, -206, -145), Pre220 = c(43,
50, 41, 45), Post220 = c(39, 40, 32, 34), X220 = c(-4, -10,
-9, -11), PreAgility = c(20.96, NA, 21.1, 19.88), PostAgility = c(19.69,
NA, 18.8, 20.79), Agility = c(-1.27, NA, -2.3, 0.91), PreBD = c(6.17,
7.82, 5.08, 7), PostBD = c(5, 4.87, 4.68, 6.2), BD = c(-1.17,
-2.95, -0.4, -0.8), PreCL = c(7.05, 13.6, 14.4, 8.8), PostCL = c(8.1,
8.9, 8.27, 7.6), CL = c(1.05, -4.7, -6.13, -1.2), PreSW = c(10.2,
NA, 20.34, 8), PostSW = c(11.4, NA, 9.3, 7.4), SW = c(1.2,
NA, -11.04, -0.6), Pre500 = c(115, 128, 107, 114), Post500 = c(105,
112, 93, 99), X500 = c(-10, -16, -14, -15), PreTotal = c(446,
91, 255, NA), PostTotal = c(493, 439, 503, NA), Total = c(47,
348, 248, NA), ClassPrePT500 = c(338, 213, 215, 243), ClassPostPT500 = c(430,
396, 333, 314), ClassPT500 = c(92, 183, 118, 71), ClassPrePullups = c(6,
5, 2, 0), ClassPostPullups = c(13, 7, 15, 0), ClassPullups = c(7,
2, 13, 0), ClassPreSitups = c(59, 42, 45, 53), ClassPostSitups = c(75,
70, 51, 53), ClassSitups = c(16, 28, 6, 0), ClassPreMC = c(60,
43, 31, 48), ClassPostMC = c(60, 60, 31, 60), ClassMC = c(0,
17, 0, 12), ClassPrePushups = c(50, 37, 26, 30), ClassPostPushups = c(50,
50, 47, 34), ClassPushups = c(0, 13, 21, 4), ClassPre1.5 = c(803,
810, 803, 741), ClassPost1.5 = c(700, 690, 664, 661), Class1.5 = c(-103,
-120, -139, -80), ClassPre220 = c(32, 41, 31, 40), ClassPost220 = c(31,
33, 30, 37), Class220 = c(-1, -8, -1, -3), ClassPreAgility = c(19,
23, 18, 22.1), ClassPostAgility = c(16.4, 18, 16.5, 20.3),
ClassAgility = c(-2.6, -5, -1.5, -1.8), ClassPreBD = c(6.4,
8.5, 5.8, 11.2), ClassPostBD = c(5.3, 5.8, 5.5, 7.5), ClassBD = c(-1.1,
-2.7, -0.3, -3.7), ClassPreCL = c(7.8, 9.3, 7.3, 9.6), ClassPostCL = c(7.6,
7.4, 7.4, 9.2), ClassCL = c(-0.2, -1.9, 0.100000000000001,
-0.4), ClassPreSW = c(8.5, 8.4, 7.7, NA), ClassPostSW = c(7.8,
8.1, 7.6, 8), ClassSW = c(-0.7, -0.300000000000001, -0.100000000000001,
NA), ClassPre500 = c(102, 104, 100, 108), ClassPost500 = c(94,
88, 98, 101), Class500 = c(-8, -16, -2, -7), ClassPreTotal = c(495,
418, 528, 264), ClassPostTotal = c(561, 539, 562, 482), ClassTotal = c(66,
121, 34, 218)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
Thank you in advance for any help.
The R function
yuenbt(x, y, tr=0.2, alpha=0.05, nboot=599, side=F) computes a 1 − α confidence interval for μt 1 − μt 2 using the bootstrap-t method, where the default amount of trimming (tr) is 0.2, the default value for α is 0.05, and the default value
for nboot (B) is 599. So far, simulations suggest that in terms of probability coverage, there is little or no advantage to using B > 599 when α = 0.05. However, there is no recommended choice for B when α < 0.05 simply because little is known about how the bootstrap-t performs for this special case. Finally, the default value for side is FALSE, indicating that the equal-tailed two-sided confidence interval is to be used. Using side=TRUE results in the symmetric two-sided confidence interval.
Try:
yuenbt(DataExample$PT500, DataExample3$ClassPT500, tr = 0.2, nboot = 599, side = TRUE)

Rollsum with week numbers

This question comes from a previous one I posted a while ago:
rollsum with fixed dates
I can not make the given solution to work. I have a large data set, the interesting columns are:
id = c(145658, 145658, 145658, 145658, 145658, 145658, 145658, 145658, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
week_number = c(24, 35, 44, 71, 82, 117, 127, 142, 4, 15, 20, 24, 30, 36, 42, 46, 59, 67, 68, 71, 75, 78, 79, 86, 93, 96)
amount = c(51.9, 51.9, 51.9, 51.9, 51.9, 103.8, 51.9, 51.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 101.0, 168.9, 101.0, 101.0, 135.8, 168.9, 168.9, 67.9)
df = data.frame(id = id, week_number = week_number, amount = amount)
In reality, I have thousands of id's, and each has different week number. I want to calculate the rollsum on the "amount" column for n past weeks (including the present week) for each id.
An extreme example would be with the past 100 weeks. The results would look like:
past_100wk = c(NA, NA, NA, NA, NA, 363.3, 363.3, 363.8, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
Again, this is an extreme case, but it shows the the results should give NA (or -1) when the row value is not included in the week_number window (100 weeks, in this case).
Thank you!

Resources