Subsetting from a dataframe in R

I have sampled 'n' rows from a dataframe called nodes:
nodes <- structure(list(node_number = 1:50,
x = c(2L, 80L, 36L, 57L, 33L, 76L, 77L, 94L,
89L, 59L, 39L, 87L, 44L, 2L, 19L, 5L,
58L, 14L, 43L, 87L, 11L, 31L, 51L, 55L,
84L, 12L, 53L, 53L, 33L, 69L, 43L, 10L,
8L, 3L, 96L, 6L, 59L, 66L, 22L, 75L, 4L,
41L, 92L, 12L, 60L, 35L, 38L, 9L, 54L, 1L),
y = c(62L, 25L, 88L, 23L, 17L, 43L, 85L, 6L, 11L,
72L, 82L, 24L, 76L, 83L, 43L, 27L, 72L, 50L,
18L, 7L, 56L, 16L, 94L, 13L, 57L, 2L, 33L, 10L,
32L, 67L, 5L, 75L, 26L, 1L, 22L, 48L, 22L, 69L,
50L, 21L, 81L, 97L, 34L, 64L, 84L, 100L, 2L, 9L, 59L, 58L),
node_demand = c(3L, 14L, 1L, 14L, 19L, 2L, 14L, 6L,
7L, 6L, 10L, 18L, 3L, 6L, 20L, 4L,
14L, 11L, 19L, 15L, 15L, 4L, 13L,
13L, 5L, 16L, 3L, 7L, 14L, 17L,
3L, 3L, 12L, 14L, 20L, 13L, 10L,
9L, 6L, 18L, 7L, 20L, 9L, 1L, 8L,
5L, 1L, 7L, 9L, 2L)),
.Names = c("node_number", "x", "y", "node_demand"),
class = "data.frame", row.names = c(NA, -50L))
To sample I use this code:
hubs <- nodes[sample(1:total_nodes, hubs_required, replace = FALSE),]
Which returns :
node_number x y node_demand
33 33 8 26 12
14 14 2 83 6
42 42 41 97 20
13 13 44 76 3
10 10 59 72 6
I would like to return all the rows that haven't been selected so that I can perform a series of calculations on them.
I thought that using something like data[-sample,] would work, but I get the following error:
Error in xj[i] : invalid subscript type 'list'.
Does anybody know how I could get these rows?

It would be easier to keep the vector of indexes that were selected. Something like
hubs <- nodes[keep <- sample(1:total_nodes, hubs_required, replace = FALSE),]
other_hubs <- nodes[-keep, ]
Otherwise, if your data has some sort of key/ID, you can keep the rows whose ID is not among the sampled ones, something like
other_hubs <- nodes[!(nodes$node_number %in% hubs$node_number), ]
or with dplyr, this can be an anti-join
nodes %>% anti_join(hubs, by="node_number")
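As a quick check, the index-based and key-based approaches above can be compared on a small made-up data frame. This is only a sketch: the toy `nodes` table, the seed, and the names `other_by_index` and `other_by_key` are illustrative assumptions, not taken from the question.

```r
# Toy stand-in for the question's data (values are made up).
nodes <- data.frame(node_number = 1:10,
                    node_demand = c(3, 14, 1, 14, 19, 2, 14, 6, 7, 6))

set.seed(1)                      # only to make the sample reproducible
keep <- sample(nrow(nodes), 3)   # row indices of the sampled hubs
hubs <- nodes[keep, ]

# Complement by position: a negative integer index drops those rows.
other_by_index <- nodes[-keep, ]

# Complement by key: keep rows whose ID is NOT among the sampled IDs.
other_by_key <- nodes[!(nodes$node_number %in% hubs$node_number), ]

# Both routes should select the same rows.
stopifnot(identical(other_by_index$node_number, other_by_key$node_number))
```

The error in the question is consistent with negating a data frame (which is a list) instead of an integer index vector, although the exact call that produced it is not shown.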

Related

Is there a way to produce multiple x-y scatterplots at once based on grouping value, ordered by a third variable?

I have multi-level data. The group level is individual persons, which are designated by id. The variable index indicates different time points. Is there a way to make a separate scatterplot (x vs. y) for each individual, all displayed in the same output, and ordered based on a third variable (z)? If so, can color then be added to indicate degree of third variable (z)? Data below, Thanks.
> dput(dat1.1)
structure(list(id = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L), index = c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L), x = c(7.443917, 7.520429, 7.446833,
8.07893, 8.534033, 8.263931, 7.598647, 6.902987, 7.672617, 7.739256,
7.591341, 8.101125, 7.811751, 6.596834, 6.637652, 8.467165, 7.835399,
6.500149, 7.083198, 7.531798, 6.110208, 6.368534, 5.26318, 6.735778,
5.580152, 5.460161, 5.844303, 6.258181, 7.191627, 5.105033, 6.760193,
5.857215, 5.866264, 6.769086, 6.547294, 5.623804, 4.675815, 6.153901,
6.040519, 6.236045, 8.216397, 6.097841, 5.491311, 5.831432, 6.297337,
6.655688, 5.553445, 6.37449, 6.271961, 6.959645, 7.080341, 6.46092,
6.476955, 7.221111, 6.219023, NA, NA, NA, NA, NA, 8.21752, 7.589581,
8.363739, 8.849697, 7.78645, 7.494006, 7.827766, 9.11352, 7.80884,
6.701855, 6.259061, 5.523358, 6.186617, 6.548538, 6.6937, 7.213297,
5.243428, 7.510827, 7.054297, 7.603241), y = c(106L, 114L, 50L,
50L, 56L, 46L, 50L, 52L, 114L, 50L, 56L, 26L, 48L, 52L, 48L,
54L, 54L, 56L, 52L, 50L, 84L, 86L, 88L, 86L, 82L, 84L, 88L, 84L,
86L, 84L, 86L, 86L, 84L, 84L, 88L, 88L, 88L, 84L, 86L, 120L,
106L, 168L, 116L, 56L, 108L, 68L, 68L, 70L, 74L, 76L, 76L, 76L,
72L, 70L, 118L, NA, NA, NA, NA, NA, 60L, 62L, 52L, 90L, 50L,
50L, 54L, 56L, 52L, 30L, 78L, 30L, 52L, 54L, 52L, 80L, 86L, 46L,
54L, 84L), z = c(33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L,
33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 54L, 54L,
54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L,
54L, 54L, 54L, 54L, 54L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L,
56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 50L,
50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L,
50L, 50L, 50L, 50L, 50L, 50L)), class = "data.frame", row.names = c(NA,
-80L))
Does this come close to giving you what you want?
library(tidyverse)
d <- dat1.1  # the data frame from the question
d %>%
  group_by(id) %>%
  mutate(z=as.factor(z)) %>%
  group_map(
    function(.x, .y) {
      .x %>%
        ggplot() +
        geom_point(aes(x=x, y=y, colour=z)) +
        facet_wrap(vars(z)) +
        scale_colour_manual(drop=FALSE, values=d %>% distinct(z) %>% pull(z)) +
        labs(title=.x$id[1])
    },
    .keep=TRUE
  )
Points to note:
group_map applies a function to each group of a grouped data frame. .x refers to the data in the current group, .y is a one row tibble defining the group. .keep requests that the grouping variables are kept in .x.
drop=FALSE in the call to scale_colour_manual() ensures that unused factor levels are retained in the legend (and hence different levels of z are distinguishable between plots).
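The roles of .x, .y and .keep described above can be seen without any plotting. The following is a minimal sketch on a made-up data frame; `toy` and the result names are hypothetical and exist only for illustration.

```r
library(dplyr)

toy <- data.frame(id = c(1, 1, 2, 2, 2), x = 1:5)

# .x is the data for one group; by default the grouping column is dropped.
cols_default <- toy %>%
  group_by(id) %>%
  group_map(function(.x, .y) names(.x))

# With .keep = TRUE the grouping column stays available in .x.
cols_keep <- toy %>%
  group_by(id) %>%
  group_map(function(.x, .y) names(.x), .keep = TRUE)

# .y is a one-row tibble holding the key of the current group.
keys <- toy %>%
  group_by(id) %>%
  group_map(function(.x, .y) .y$id)
```

This is why the answer passes .keep=TRUE: the plot title uses .x$id, which would otherwise be missing from .x.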

Creating scatter plot class or group wise

I'm using ggstatsplot's ggscatterstats function to calculate the correlation between various clinical parameters and then plot them. For example, here my variables are age and WBC.
This uses all the data points irrespective of the class they belong to. I would like to do the same for each FAB classification that is present in my data.
dat <- merge_clinical_class_TMB %>% select(FAB,AGE,Wbc,Platelet,HB,PB_Blasts,BM_Blasts,TMB_NONSYNONYMOUS)
df2 <- dat
library(ggstatsplot)
ggscatterstats(
  df2,
  x = AGE,
  y = Wbc,
  type = "np" # try the "robust" correlation too! It might be even better here
  #, marginal.type = "boxplot"
)
My dataframe looks like this
head(df2)
FAB AGE Wbc Platelet HB PB_Blasts BM_Blasts TMB_NONSYNONYMOUS
1 M4 50 17 231 10 88 52 0.3000000
2 M3 61 1 90 10 44 0 0.4333333
3 M3 30 6 114 11 82 6 0.2333333
4 M0 77 92 105 9 67 56 0.4000000
5 M1 46 29 90 9 90 81 0.5666667
6 M1 68 3 63 8 91 55 0.9000000
My data
dput(df2)
structure(list(FAB = structure(c(5L, 4L, 4L, 1L, 2L, 2L, 3L,
3L, 3L, 5L, 3L, 5L, 1L, 5L, 5L, 3L, 3L, 3L, 1L, 2L, 1L, 4L, 6L,
6L, 5L, 3L, 5L, 7L, 5L, 1L, 6L, 5L, 5L, 6L, 5L, 6L, 3L, 3L, 4L,
4L, 5L, 7L, 3L, 3L, 5L, 2L, 5L, 1L, 3L, 6L, 2L, 5L, 2L, 5L, 7L,
3L, 3L, 8L, 6L, 4L, 2L, 2L, 2L, 2L, 3L, 8L, 3L, 2L, 2L, 4L, 6L,
3L, 3L, 3L, 2L, 3L, 2L, 2L, 2L, 3L, 6L, 2L, 1L, 3L, 2L, 5L, 5L,
1L, 2L, 5L, 6L, 6L, 2L, 6L, 4L, 2L, 5L, 2L, 2L, 2L, 1L, 4L, 4L,
1L, 3L, 9L, 6L, 5L, 5L, 1L, 3L, 3L, 5L, 1L, 2L, 2L, 3L, 5L, 1L,
5L, 5L, 6L, 2L, 2L, 2L, 1L, 3L, 3L, 6L, 5L, 2L, 5L, 1L, 2L, 8L,
2L, 3L, 9L, 5L, 2L, 1L, 5L, 3L, 5L, 5L, 1L, 3L, 2L, 5L, 3L, 6L,
5L, 1L, 2L, 2L, 5L, 3L, 5L, 5L, 6L, 5L, 5L, 3L, 5L, 6L, 3L, 2L,
3L, 3L, 2L, 4L, 6L, 4L, 1L, 2L, 6L, 3L, 6L, 2L, 3L, 2L, 4L, 2L,
2L, 4L, 3L, 3L, 4L, 4L, 4L, 3L, 4L, 3L, 6L, 2L, 4L, 2L, 5L, 2L,
4L), .Label = c("M0", "M1", "M2", "M3", "M4", "M5", "M6", "M7",
"nc"), class = "factor"), AGE = c(50L, 61L, 30L, 77L, 46L, 68L,
23L, 64L, 76L, 81L, 25L, 78L, 39L, 49L, 57L, 63L, 62L, 52L, 76L,
64L, 65L, 61L, 44L, 31L, 64L, 33L, 55L, 50L, 64L, 59L, 59L, 77L,
33L, 48L, 35L, 66L, 67L, 51L, 74L, 51L, 64L, 77L, 63L, 37L, 57L,
53L, 62L, 39L, 72L, 66L, 51L, 51L, 18L, 63L, 54L, 75L, 40L, 60L,
76L, 33L, 63L, 53L, 75L, 67L, 66L, 77L, 64L, 76L, 51L, 42L, 51L,
59L, 43L, 45L, 60L, 47L, 68L, 24L, 48L, 73L, 60L, 44L, 71L, 25L,
60L, 57L, 55L, 69L, 42L, 42L, 45L, 50L, 41L, 21L, 50L, 69L, 76L,
70L, 27L, 76L, 65L, 48L, 59L, 69L, 81L, 22L, 61L, 51L, 63L, 61L,
22L, 73L, 49L, 41L, 47L, 54L, 44L, 55L, 83L, 78L, 59L, 57L, 57L,
88L, 43L, 71L, 62L, 75L, 62L, 58L, 65L, 66L, 60L, 35L, 76L, 72L,
35L, 73L, 67L, 70L, 48L, 65L, 41L, 52L, 67L, 58L, 34L, 60L, 55L,
56L, 61L, 31L, 71L, 56L, 57L, 60L, 57L, 58L, 79L, 55L, 34L, 76L,
82L, 67L, 67L, 54L, 53L, 71L, 61L, 30L, 50L, 35L, 29L, 45L, 38L,
81L, 31L, 75L, 67L, 29L, 51L, 40L, 32L, 57L, 25L, 63L, 75L, 25L,
68L, 62L, 25L, 31L, 68L, 45L, 61L, 35L, 22L, 23L, 21L, 53L),
Wbc = c(17L, 1L, 6L, 92L, 29L, 3L, 32L, 117L, 62L, 91L, 34L,
10L, 2L, 57L, 88L, 77L, 75L, 4L, 15L, 1L, 3L, 86L, 9L, 137L,
132L, 3L, 22L, 6L, 3L, 1L, 12L, 40L, 26L, 116L, 53L, 112L,
2L, 42L, 32L, 4L, 2L, 3L, 17L, 19L, 14L, 3L, 119L, 5L, 3L,
79L, 104L, 3L, 35L, 77L, 2L, 8L, 8L, 1L, 4L, 1L, 46L, 2L,
6L, 31L, 3L, 2L, 3L, 34L, 2L, 2L, 15L, 12L, 4L, 29L, 12L,
12L, 60L, 224L, 33L, 2L, 7L, 14L, 5L, 11L, 47L, 5L, 31L,
6L, 11L, 38L, 5L, 7L, 134L, 93L, 3L, 10L, 3L, 48L, 90L, 297L,
1L, 1L, 1L, 2L, 2L, 115L, 35L, 50L, 18L, 62L, 52L, 15L, 12L,
48L, 81L, 13L, 35L, 28L, 78L, 17L, 30L, 99L, 20L, 3L, 172L,
6L, 28L, 98L, 59L, 101L, 68L, 2L, 2L, 43L, 4L, 38L, 34L,
59L, 37L, 1L, 111L, 49L, 43L, 298L, 26L, 47L, 14L, 16L, 114L,
203L, 8L, 133L, 1L, 31L, 3L, 68L, 3L, 20L, 19L, 73L, 20L,
5L, 1L, 15L, 45L, 68L, 88L, 36L, 10L, 23L, 1L, 72L, 1L, 2L,
40L, 12L, 13L, 7L, 46L, 2L, 64L, NA, 5L, 103L, 8L, 1L, 3L,
16L, 29L, 1L, 99L, 2L, 6L, 2L, 3L, 2L, 115L, 27L, 8L, 1L),
Platelet = c(231L, 90L, 114L, 105L, 90L, 63L, 38L, 100L,
32L, 32L, 23L, 98L, 215L, 14L, 56L, 19L, 110L, 22L, 85L,
42L, 16L, 22L, 50L, 42L, 15L, 61L, 65L, 50L, 134L, 102L,
57L, 29L, 111L, 50L, 44L, 34L, 28L, 232L, 42L, 58L, 27L,
86L, 23L, 38L, 76L, 108L, 52L, 175L, 52L, 132L, 23L, 143L,
30L, 41L, 9L, 21L, 95L, 59L, 79L, 38L, 11L, 68L, 22L, 141L,
168L, 70L, 41L, 21L, 25L, 35L, 14L, 20L, 67L, 116L, 45L,
57L, 8L, 34L, 32L, 60L, 93L, 145L, 48L, 33L, 50L, 129L, 9L,
61L, 176L, 12L, 53L, 136L, 40L, 73L, 27L, 12L, 166L, 30L,
87L, 40L, 94L, 52L, 23L, 127L, 39L, 57L, 35L, 21L, 148L,
25L, 149L, 64L, 351L, 71L, 53L, 22L, 35L, 31L, 46L, 85L,
18L, 80L, 62L, 156L, 32L, 50L, 69L, 31L, 20L, 57L, 142L,
37L, 79L, 66L, 21L, 31L, 88L, 11L, 15L, 82L, 53L, 76L, 51L,
68L, 64L, 55L, 40L, 90L, 37L, 45L, 36L, 52L, 86L, 88L, 35L,
174L, 28L, 121L, 131L, 17L, 152L, 52L, 30L, 79L, 79L, 87L,
30L, 44L, 140L, 59L, 58L, 19L, 29L, 156L, 19L, 61L, 36L,
11L, 71L, 13L, 45L, 34L, 39L, 82L, 18L, 43L, 118L, 32L, 73L,
15L, 60L, 208L, 96L, 257L, 61L, 12L, 32L, 23L, 52L, 46L),
HB = c(10L, 10L, 11L, 9L, 9L, 8L, 7L, 10L, 10L, 11L, 11L,
10L, 10L, 8L, 10L, 13L, 11L, 9L, 9L, 8L, 9L, 12L, 8L, 6L,
10L, 7L, 8L, 9L, 11L, 12L, 11L, 10L, 10L, 9L, 8L, 10L, 9L,
13L, 9L, 8L, 12L, 9L, 12L, 9L, 9L, 9L, 11L, 10L, 11L, 12L,
12L, 11L, 9L, 10L, 9L, 9L, 10L, 9L, 10L, 9L, 8L, 9L, 9L,
10L, 12L, 10L, 10L, 8L, 10L, 9L, 11L, 11L, 11L, 8L, 9L, 9L,
9L, 6L, 10L, 10L, 9L, 9L, 8L, 9L, 9L, 7L, 9L, 11L, 12L, 10L,
9L, 10L, 12L, NA, 10L, 7L, 11L, 10L, 9L, 11L, 10L, 9L, 8L,
8L, 10L, 9L, 12L, 11L, 8L, 13L, 11L, 9L, 9L, 12L, 10L, 9L,
10L, 8L, 9L, 9L, 9L, 10L, 9L, 10L, 10L, 9L, 10L, 8L, 7L,
9L, 9L, 8L, 9L, 9L, 8L, 10L, 8L, 9L, 9L, 8L, 9L, 9L, 9L,
9L, 9L, 10L, 9L, 8L, 9L, 10L, 7L, 11L, 11L, 10L, 6L, 8L,
9L, 9L, 10L, 8L, 11L, 10L, 11L, 8L, 9L, 8L, 9L, 8L, 10L,
10L, 10L, 9L, 9L, 12L, 9L, 9L, 11L, 9L, 13L, 9L, 10L, 8L,
9L, 10L, 10L, 11L, 9L, 9L, 10L, 9L, 9L, 11L, 7L, 13L, 14L,
12L, 8L, 12L, 8L, 9L), PB_Blasts = c(88L, 44L, 82L, 67L,
90L, 91L, 59L, 60L, 48L, 98L, 53L, 40L, 75L, 81L, 90L, 57L,
46L, 67L, 74L, 61L, 99L, 73L, 74L, 83L, 72L, 33L, 35L, 70L,
85L, 61L, 95L, 80L, 71L, 83L, 90L, 90L, 50L, 64L, 51L, 93L,
95L, 75L, 80L, 52L, 61L, 72L, 65L, 83L, 45L, 32L, 85L, 73L,
86L, 82L, 30L, 48L, 47L, 58L, 78L, 100L, 81L, 82L, 40L, 89L,
70L, 47L, 80L, 73L, 62L, 88L, 57L, 70L, 40L, 56L, 86L, 37L,
90L, 77L, 75L, 37L, 94L, 86L, 97L, 72L, 87L, 40L, 52L, 60L,
68L, 40L, 95L, 81L, 92L, 90L, 90L, 42L, 37L, 84L, 77L, 99L,
83L, 65L, 79L, 82L, 46L, 94L, 71L, 39L, 62L, 95L, 55L, 11L,
51L, 42L, 77L, 72L, 39L, 69L, 75L, 70L, 75L, 52L, 91L, 33L,
87L, 55L, 72L, 76L, 85L, 79L, 79L, 81L, 50L, 81L, 33L, 88L,
34L, 90L, 69L, 32L, 92L, 90L, 47L, 75L, 30L, 59L, 57L, 62L,
54L, 60L, 89L, 82L, 90L, 90L, 64L, 89L, 43L, 58L, 58L, 97L,
71L, 91L, 53L, 75L, 85L, 67L, 86L, 70L, 43L, 86L, 74L, 87L,
0L, 0L, 86L, 53L, 63L, 41L, 76L, 45L, 85L, 0L, 94L, 6L, 91L,
0L, 2L, 93L, 85L, 82L, 56L, 40L, 48L, 0L, 14L, 90L, 71L,
51L, 91L, 42L), BM_Blasts = c(52L, 0L, 6L, 56L, 81L, 55L,
0L, 0L, 88L, 37L, 87L, 6L, 4L, 48L, 84L, 70L, 53L, 18L, 82L,
5L, 34L, 68L, 5L, 6L, 90L, 0L, 67L, 0L, 22L, 12L, 0L, 2L,
14L, 3L, 18L, 7L, 17L, 79L, 0L, 40L, 0L, 8L, 71L, 33L, 17L,
41L, 65L, 53L, 0L, 11L, 85L, 2L, 90L, 39L, 0L, 54L, 23L,
0L, 0L, 0L, 97L, 42L, 48L, 61L, 6L, 0L, 46L, 55L, 10L, 2L,
0L, 48L, 39L, 37L, 43L, 0L, 91L, 76L, 41L, 16L, 30L, 17L,
54L, 50L, 65L, 0L, 59L, 22L, 51L, 16L, 6L, 10L, 90L, 72L,
0L, 32L, 0L, 49L, 88L, 98L, 0L, 0L, 15L, 0L, 0L, 94L, 55L,
39L, 9L, 86L, 70L, 11L, 5L, 74L, 79L, 90L, 83L, 57L, 74L,
28L, 17L, 4L, 91L, 0L, 91L, 50L, 49L, 80L, 22L, 64L, 84L,
12L, 14L, 86L, 6L, 18L, 40L, 0L, 61L, 6L, 87L, 0L, 62L, 51L,
6L, 72L, 59L, 29L, 24L, 96L, 0L, 53L, 13L, 45L, 61L, 56L,
35L, 10L, 0L, 8L, 58L, 16L, 25L, 10L, 3L, 71L, 52L, 67L,
32L, 88L, 10L, 8L, 0L, 0L, 97L, 7L, 45L, 0L, 49L, 9L, 85L,
0L, 70L, 91L, 7L, 0L, 2L, 0L, 32L, 11L, 71L, 0L, 48L, 0L,
14L, 7L, 90L, 63L, 83L, 29L), TMB_NONSYNONYMOUS = c(0.3,
0.433333333333, 0.233333333333, 0.4, 0.566666666667, 0.9,
0.3, 0.133333333333, 0.4, 0.3, 0.233333333333, 0.5, 0.266666666667,
0, 0.2, 0.4, 0.266666666667, 0.333333333333, 0.4, 0.4, 0.566666666667,
0.0333333333333, 0.166666666667, 0.1, 0.166666666667, 0.266666666667,
0.3, 0.3, 0.466666666667, 0.0666666666667, 0.266666666667,
0.266666666667, 0.0333333333333, 0.1, 0.133333333333, 0.0333333333333,
0.5, 0.6, 0.0333333333333, 0.1, 0.0333333333333, 0.333333333333,
0.433333333333, 0.2, 0.466666666667, 0.2, 0.0333333333333,
0.733333333333, 0.2, 0.233333333333, 0.233333333333, 0.3,
0.133333333333, 0, 0.3, 0.333333333333, 0.333333333333, 0.266666666667,
0.533333333333, 0.2, 0.533333333333, 0.466666666667, 0.533333333333,
0.0333333333333, 0.3, 0.5, 0.333333333333, 0.266666666667,
0.5, 0.333333333333, 0.0666666666667, 0.466666666667, 0.333333333333,
0.266666666667, 0.7, 0.433333333333, 0.166666666667, 0.0666666666667,
0.233333333333, 0.5, 0.0333333333333, 0.2, 0.433333333333,
0.433333333333, 0.4, 0.233333333333, 0.0666666666667, 0.233333333333,
0.466666666667, 0.0666666666667, 0, 0.1, 0.4, 0.1, 0.2, 0.4,
0.433333333333, 0.566666666667, 0.2, 0.0333333333333, 0.533333333333,
0.566666666667, 0.3, 0.466666666667, 0.566666666667, 0.0333333333333,
0.4, 0.0666666666667, 0.633333333333, 0.4, 0.466666666667,
0.466666666667, 0.3, 0.5, 0.0333333333333, 0.333333333333,
0.333333333333, 0.266666666667, 0.366666666667, 0.666666666667,
0.333333333333, 0.533333333333, 0.466666666667, 0.6, 0.333333333333,
0.4, 0.266666666667, 0.366666666667, 0.2, 0.0333333333333,
0.266666666667, 0.3, 0.166666666667, 0.4, 0.566666666667,
0.4, 0.1, 0.1, 0.0666666666667, 0.366666666667, 0, 0.4, 0.0333333333333,
0.1, 0.0666666666667, 0.5, 0.3, 0.466666666667, 0.0333333333333,
0.4, 0.1, 0.0666666666667, 0.766666666667, 0.5, 0.466666666667,
0.333333333333, 0.4, 0.333333333333, 0.4, 0.266666666667,
0.2, 0.3, 0.7, 0.166666666667, 0.2, 0, 0.5, 0.166666666667,
0.533333333333, 0.233333333333, 0.166666666667, 0.133333333333,
0.0666666666667, 0.4, 0.333333333333, 0.133333333333, 0.4,
0.233333333333, 0.466666666667, 0.366666666667, 0.266666666667,
0.266666666667, 0.266666666667, 0.4, 0.2, 0.166666666667,
0.4, 0.333333333333, 0.166666666667, 0.266666666667, 0.1,
0.333333333333, 0.733333333333, 0.466666666667, 0.466666666667,
0.2, 0.1, 1.13333333333, 0.2, 0.3)), class = "data.frame", row.names = c(NA,
-200L))
Objective: I would like to do the same for each FAB class. I have FAB labels from M0 to M7, and I would like to ignore nc.
So for each FAB label I would like to see the correlation. For example, for the M0 class I would like to see the Age vs. Wbc correlation, and similarly for the other FAB classes. Is it possible to do this in ggstatsplot? I don't see any such functionality for correlation there.
The simple way is to subset the data for M0, M1, M2, etc. and repeat the call, but that is a long process. Can I split on the FAB column and pass it to the library?
I would like to know other ways to do the above and plot the same.
Any help or suggestion would be appreciated.
Update: We could also use the built-in function grouped_ggscatterstats (see comments).
Many thanks to @Indrajeet Patil: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggscatterstats.html#grouped-analysis-with-grouped_ggscatterstats
To subset FAB we use filter (with the factor levels in the question, as.integer(FAB) < 5 keeps the first four levels, M0 to M3):
library(dplyr)        # for %>% and filter()
library(ggstatsplot)
## for reproducibility
set.seed(123)
## plot
grouped_ggscatterstats(
  ## arguments relevant for ggscatterstats
  data = df2 %>% filter(as.integer(FAB) < 5),
  x = AGE,
  y = Wbc,
  grouping.var = FAB,
  type = "r",
  # ggtheme = ggthemes::theme_tufte(),
  ## arguments relevant for combine_plots
  annotation.args = list(
    title = "Relationship between Wbc and Age",
    caption = "Source: stackoverflow"
  ),
  plotgrid.args = list(nrow = 2, ncol = 2)
)
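The filter on as.integer(FAB) relies on the integer codes behind the factor levels. A minimal sketch of what that does, and of a hedged alternative that drops only nc, is given here; the short `fab` vector is made up, but its level order matches the dput in the question.

```r
# Made-up values; the level order matches the question's FAB factor.
fab <- factor(c("M0", "M3", "M4", "M7", "nc"),
              levels = c("M0", "M1", "M2", "M3", "M4", "M5", "M6", "M7", "nc"))

# as.integer() on a factor returns the position of each value's level.
codes <- as.integer(fab)

# Codes below 5 correspond to M0..M3 only.
first_four <- fab[codes < 5]

# To exclude only "nc", compare against the label and drop the unused level.
all_but_nc <- droplevels(fab[fab != "nc"])
```

If the aim is every class from M0 to M7, filtering on FAB != "nc" (followed by droplevels) may be closer to the question than the < 5 cut-off, at the cost of a larger plot grid.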
First answer:
We could do something like this: write a function that takes the data frame and the FAB value to keep:
library(dplyr)        # for %>% and filter()
library(ggstatsplot)
my_function <- function(df, x){
  ggscatterstats(
    df %>% filter(FAB == x),
    x = AGE,
    y = Wbc,
    type = "np" # try the "robust" correlation too! It might be even better here
    #, marginal.type = "boxplot"
  )
}
M0 <- my_function(df2, "M0")
M1 <- my_function(df2, "M1")
M2 <- my_function(df2, "M2")
M3 <- my_function(df2, "M3")
.
.
.
library(patchwork)
(M0 / M1 | M2 / M3)
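To avoid writing one call per class, the data can be split by FAB and the same computation applied to each piece. The sketch below swaps in base R's cor() with method = "spearman" in place of ggstatsplot, so it only produces the per-class coefficients, not the plots; the `toy` data frame is made up for illustration.

```r
# Made-up rows in the shape of the question's df2.
toy <- data.frame(
  FAB = factor(c("M0", "M0", "M0", "M1", "M1", "M1", "nc", "nc", "nc")),
  AGE = c(50, 61, 30, 77, 46, 68, 23, 64, 76),
  Wbc = c(17, 1, 6, 92, 29, 3, 32, 117, 62)
)

# Ignore the "nc" class and drop its now-empty level.
toy <- droplevels(toy[toy$FAB != "nc", ])

# One data frame per FAB class.
by_class <- split(toy, toy$FAB)

# Spearman correlation of AGE vs. Wbc within each class.
rho <- sapply(by_class, function(g) cor(g$AGE, g$Wbc, method = "spearman"))
```

The same pattern, lapply() over names(by_class), could presumably be used to call my_function once per class and collect the resulting plots in a list.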

Adding a year to a time series with only months and days

I have a df like this
month day x
1 1 1 84
2 1 2 43
3 1 3 49
4 1 4 67
5 1 5 59
......
366 12 31 97
The year should be 2019 from Oct-Dec and 2020 from Jan-Sep
I tried to use
df$year<-as.date(df,origin='2019-01-01')
But I am not sure how to specify the arguments.
I want the year column so that I can build a date column, so I then tried
df$date<-as.date(with(paste("???",month,day,sep="-"), %Y-%m-%d,origin ="2019-01-01")
but again I don't know how to supply the year.
Any help would save me a lot of time, because doing it manually seems impossible.
We could use an ifelse statement with the make_date function from lubridate:
library(dplyr)
library(lubridate)
df %>%
  mutate(year = ifelse(month %in% c(10,11,12), 2019, 2020),
         date = make_date(year, month, day))
output:
month day x year date
1 1 1 84 2020 2020-01-01
2 1 2 43 2020 2020-01-02
3 11 3 49 2019 2019-11-03
4 1 4 67 2020 2020-01-04
5 1 5 59 2020 2020-01-05
366 12 31 97 2019 2019-12-31
You could use something like the code below. If you need a variable instead of the fixed years 2019/2020, you can use var - 1 when the month is Oct-Dec and var when it is Jan-Sep.
library(dplyr)
library(lubridate)
df1 %>%
  mutate(date = if_else(month %in% c(10:12),
                        ymd(paste(2019, month, day, sep = "-")),
                        ymd(paste(2020, month, day, sep = "-"))))
data:
df1 <- data.frame(month = c(1:12), day = 1, x = 5)
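The "var - 1 for Oct-Dec, var for Jan-Sep" idea can be wrapped in a small helper. This is a base R sketch under stated assumptions: the function name `add_date` and its `end_year` parameter are hypothetical, and the data frame is assumed to have integer-like `month` and `day` columns.

```r
# Hypothetical helper: end_year is the year used for Jan-Sep,
# and end_year - 1 is used for Oct-Dec.
add_date <- function(df, end_year) {
  year <- ifelse(df$month %in% 10:12, end_year - 1, end_year)
  # ISOdate() is vectorised over year, month and day.
  df$date <- as.Date(ISOdate(year, df$month, df$day))
  df
}

df1 <- data.frame(month = 1:12, day = 1, x = 5)
out <- add_date(df1, end_year = 2020)
```

Passing a different end_year shifts both halves of the series together, which avoids hard-coding 2019 and 2020 in two places.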
Using base R, you could locate the 30th of September (the last row that belongs to 2020), then build the dates with ISOdate.
w <- which(d$month == 9 & d$day == 30)
d$year <- c(rep(2020, w), rep(2019, 365 - w))
d$date <- do.call(\(year, month, day, ...) as.Date(ISOdate(year, month, day)), d)
Result
head(d, 3)
# month day x year date
# 1 1 1 58 2020 2020-01-01
# 2 1 2 74 2020 2020-01-02
# 3 1 3 43 2020 2020-01-03
tail(d, 3)
# month day x year date
# 363 12 29 46 2019 2019-12-29
# 364 12 30 82 2019 2019-12-30
# 365 12 31 63 2019 2019-12-31
Note:
R.version.string
# [1] "R version 4.1.1 (2021-08-10)"
Data:
d <- structure(list(month = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L), day = c(1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L,
18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L,
31L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L,
27L, 28L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L,
26L, 27L, 28L, 29L, 30L, 31L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L,
28L, 29L, 30L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L,
25L, 26L, 27L, 28L, 29L, 30L, 31L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L,
30L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L,
27L, 28L, 29L, 30L, 31L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L,
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L),
x = c(72L, 95L, 95L, 76L, 84L, 64L, 85L, 84L, 70L, 95L, 75L,
64L, 72L, 48L, 68L, 68L, 44L, 53L, 46L, 49L, 62L, 53L, 74L,
86L, 58L, 63L, 85L, 85L, 81L, 44L, 66L, 82L, 86L, 90L, 75L,
54L, 53L, 52L, 47L, 48L, 61L, 95L, 96L, 73L, 59L, 57L, 94L,
70L, 81L, 68L, 83L, 83L, 95L, 55L, 73L, 51L, 50L, 83L, 58L,
45L, 74L, 64L, 54L, 60L, 77L, 94L, 90L, 47L, 44L, 50L, 70L,
69L, 76L, 69L, 62L, 63L, 62L, 55L, 47L, 43L, 71L, 47L, 66L,
69L, 74L, 53L, 85L, 62L, 53L, 57L, 52L, 65L, 85L, 68L, 62L,
43L, 72L, 69L, 79L, 71L, 95L, 45L, 96L, 70L, 96L, 51L, 48L,
67L, 52L, 48L, 72L, 54L, 64L, 79L, 49L, 55L, 90L, 57L, 51L,
63L, 79L, 69L, 48L, 52L, 89L, 70L, 95L, 64L, 75L, 95L, 70L,
94L, 95L, 43L, 87L, 56L, 46L, 53L, 60L, 91L, 61L, 88L, 83L,
89L, 45L, 87L, 69L, 83L, 71L, 44L, 93L, 96L, 80L, 46L, 80L,
66L, 80L, 59L, 86L, 51L, 48L, 80L, 81L, 79L, 65L, 80L, 72L,
84L, 61L, 55L, 49L, 54L, 60L, 44L, 44L, 84L, 49L, 94L, 45L,
80L, 79L, 51L, 70L, 48L, 66L, 89L, 60L, 57L, 76L, 86L, 88L,
71L, 79L, 94L, 74L, 93L, 80L, 75L, 90L, 91L, 77L, 95L, 48L,
90L, 77L, 50L, 49L, 56L, 71L, 73L, 62L, 85L, 90L, 76L, 67L,
44L, 96L, 52L, 73L, 85L, 44L, 44L, 79L, 89L, 93L, 58L, 57L,
75L, 48L, 58L, 59L, 51L, 64L, 89L, 82L, 76L, 51L, 56L, 46L,
82L, 48L, 76L, 93L, 60L, 52L, 75L, 77L, 53L, 52L, 56L, 50L,
66L, 70L, 67L, 87L, 90L, 50L, 80L, 54L, 81L, 54L, 73L, 88L,
64L, 52L, 64L, 73L, 79L, 68L, 53L, 86L, 94L, 56L, 62L, 65L,
85L, 61L, 54L, 93L, 60L, 69L, 82L, 83L, 56L, 51L, 82L, 71L,
76L, 77L, 60L, 79L, 61L, 83L, 87L, 43L, 74L, 76L, 63L, 59L,
54L, 93L, 82L, 65L, 89L, 68L, 62L, 61L, 91L, 89L, 79L, 59L,
52L, 80L, 71L, 96L, 46L, 84L, 47L, 92L, 80L, 86L, 64L, 88L,
56L, 93L, 94L, 66L, 46L, 87L, 63L, 89L, 92L, 88L, 65L, 90L,
71L, 53L, 91L, 61L, 91L, 62L, 62L, 48L, 80L, 73L, 62L, 75L,
59L, 72L, 61L, 90L, 51L, 66L, 74L, 58L, 73L, 89L, 50L, 79L,
90L, 94L, 59L, 47L, 88L, 83L)), row.names = c(NA, -365L), class = "data.frame")
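The do.call() step in the base R answer works because a data frame is a list of columns, so each column is passed to the function as a named argument and anything the function does not name is absorbed by `...`. A minimal sketch on two made-up rows follows; `d_small` and `to_date` are illustrative names, and the year is assigned with ifelse() here rather than with the answer's rep() approach.

```r
# Two made-up rows around the September/October boundary.
d_small <- data.frame(month = c(9L, 10L), day = c(30L, 1L), x = c(58L, 74L))

# Oct-Dec belong to 2019, Jan-Sep to 2020.
d_small$year <- ifelse(d_small$month >= 10, 2019, 2020)

# The column x is not a named parameter, so it falls into `...` and is ignored.
to_date <- function(year, month, day, ...) as.Date(ISOdate(year, month, day))

# Each column of d_small becomes one named argument of to_date().
d_small$date <- do.call(to_date, d_small)
```

The answer writes the function with the `\(...)` shorthand, which needs R 4.1 or later; the `function` keyword used here behaves the same way on older versions.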
Base R option -
transform(df, date = as.Date(paste(ifelse(month %in% 10:12, 2019, 2020), month, day, sep = '-')))
# month day x date
#1 1 1 84 2020-01-01
#2 1 2 43 2020-01-02
#3 11 3 49 2019-11-03
#4 1 4 67 2020-01-04
#5 1 5 59 2020-01-05

Subtract Values based on Multiple Grouping Factors

I have a dataset with phosphorus concentrations for 17 separate days (concentrations are cumulative, so they increase from Day1 to Day102 in all cases). There are 22 different treatments (column = Trmt). Each Trmt has 3 Levels (Level = X, Y, Z), with 2 measurements per Level, for a total of 6 per Trmt.
My goal is to plot a 3-line graph of Days (x-axis; numeric) by Concentration (y-axis) using ggplot2. Data should be grouped by Trmt, Level and day for a total of 51 measurements (3 lines x 17 days).
My data looks as follows:
structure(list(Trmt = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 5L, 5L, 5L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 7L, 7L, 7L, 7L, 7L, 10L, 10L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L, 9L, 9L, 12L, 12L, 12L, 12L, 12L, 12L, 11L, 11L, 11L, 11L, 11L, 11L, 14L, 14L, 14L, 14L, 14L, 14L, 13L, 13L, 13L, 13L, 13L, 13L, 16L, 16L, 16L, 16L, 16L, 16L, 15L, 15L, 15L, 15L, 15L, 15L, 18L, 18L, 18L, 18L, 18L, 18L, 17L, 17L, 17L, 17L, 17L, 17L, 20L, 20L, 20L, 20L, 20L, 20L, 19L, 19L, 19L, 19L, 19L, 19L, 22L, 22L, 22L, 22L, 22L, 22L, 21L, 21L, 21L, 21L, 21L, 21L), .Label = c("A01nF", "A01yT", "A02nF", "A02yT", "A03nF", "A03yT", "A04nF", "A04yT", "A05nF", "A05yT", "A06nF", "A06yT", "A07nF", "A07yT", "A08nF", "A08yT", "A10nF", "A10yT", "A11nF", "A11yT", "A13nF", "A13yT"), class = "factor"), Level = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("X", "Y", "Z"), class = "factor"), Day1 = c(3L, 1L, 4L, 2L, 4L, 2L, 5L, 4L, 1L, 2L, 5L, 1L, 5L, 2L, 5L, 5L, 3L, 5L, 3L, 3L, 1L, 4L, 1L, 1L, 5L, 4L, 1L, 5L, 4L, 5L, 3L, 5L, 3L, 5L, 3L, 4L, 2L, 4L, 2L, 4L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 5L, 2L, 4L, 4L, 3L, 1L, 4L, 4L, 1L, 4L, 1L, 2L, 5L, 1L, 5L, 1L, 2L, 4L, 4L, 4L, 4L, 2L, 4L, 5L, 5L, 4L, 1L, 3L, 2L, 3L, 5L, 4L, 3L, 2L, 3L, 5L, 4L, 1L, 3L, 4L, 3L, 3L, 5L, 3L, 1L, 1L, 4L, 4L, 5L, 1L, 4L, 4L, 4L, 1L, 4L, 5L, 5L, 1L, 5L, 3L, 1L, 4L, 1L, 4L, 5L, 5L, 3L, 3L, 2L, 4L, 5L, 3L, 2L, 1L, 5L, 5L, 2L, 2L, 3L, 4L, 3L, 4L, 2L, 2L, 
4L), Day2 = c(10L, 9L, 7L, 7L, 6L, 7L, 10L, 9L, 10L, 6L, 10L, 7L, 8L, 9L, 8L, 9L, 7L, 10L, 7L, 10L, 6L, 8L, 6L, 8L, 8L, 8L, 10L, 6L, 8L, 8L, 6L, 10L, 7L, 10L, 7L, 10L, 6L, 6L, 7L, 9L, 8L, 10L, 8L, 7L, 9L, 8L, 6L, 9L, 7L, 9L, 8L, 6L, 6L, 8L, 10L, 7L, 8L, 6L, 8L, 8L, 6L, 9L, 10L, 6L, 8L, 7L, 9L, 7L, 8L, 10L, 10L, 6L, 7L, 10L, 9L, 9L, 8L, 9L, 6L, 8L, 6L, 8L, 6L, 9L, 10L, 7L, 7L, 7L, 8L, 7L, 8L, 10L, 7L, 8L, 9L, 6L, 8L, 9L, 8L, 9L, 6L, 7L, 10L, 9L, 10L, 7L, 6L, 9L, 9L, 9L, 6L, 10L, 9L, 8L, 9L, 7L, 10L, 7L, 10L, 9L, 6L, 8L, 9L, 8L, 9L, 6L, 6L, 10L, 9L, 8L, 8L, 7L), Day4 = c(11L, 12L, 14L, 11L, 15L, 15L, 12L, 11L, 15L, 12L, 15L, 12L, 12L, 11L, 15L, 15L, 13L, 11L, 13L, 14L, 12L, 11L, 13L, 12L, 15L, 15L, 14L, 11L, 15L, 11L, 12L, 11L, 13L, 11L, 12L, 13L, 13L, 14L, 13L, 15L, 14L, 15L, 12L, 14L, 11L, 13L, 15L, 11L, 12L, 13L, 11L, 15L, 11L, 13L, 11L, 11L, 14L, 12L, 14L, 15L, 11L, 12L, 15L, 12L, 13L, 12L, 14L, 12L, 11L, 13L, 12L, 12L, 11L, 15L, 13L, 12L, 11L, 12L, 13L, 14L, 14L, 14L, 13L, 12L, 15L, 12L, 15L, 15L, 12L, 13L, 12L, 12L, 12L, 14L, 13L, 13L, 14L, 11L, 12L, 11L, 15L, 11L, 11L, 11L, 14L, 11L, 12L, 15L, 15L, 11L, 12L, 14L, 15L, 14L, 14L, 12L, 14L, 13L, 15L, 15L, 14L, 13L, 12L, 15L, 15L, 11L, 13L, 12L, 11L, 13L, 12L, 14L), Day7 = c(19L, 17L, 17L, 20L, 17L, 19L, 18L, 19L, 17L, 20L, 16L, 20L, 19L, 18L, 20L, 19L, 17L, 16L, 18L, 18L, 17L, 18L, 19L, 18L, 17L, 19L, 17L, 20L, 19L, 20L, 19L, 20L, 17L, 18L, 20L, 19L, 20L, 18L, 18L, 20L, 18L, 20L, 17L, 19L, 17L, 19L, 17L, 17L, 20L, 18L, 18L, 17L, 16L, 18L, 20L, 16L, 17L, 19L, 16L, 19L, 16L, 17L, 16L, 20L, 16L, 19L, 19L, 17L, 17L, 17L, 20L, 19L, 18L, 16L, 20L, 17L, 19L, 16L, 18L, 19L, 16L, 19L, 20L, 20L, 16L, 16L, 18L, 17L, 16L, 18L, 16L, 17L, 16L, 18L, 20L, 16L, 16L, 20L, 20L, 16L, 20L, 18L, 17L, 19L, 18L, 18L, 19L, 19L, 16L, 18L, 19L, 19L, 17L, 17L, 18L, 18L, 20L, 18L, 20L, 20L, 18L, 19L, 19L, 16L, 16L, 17L, 20L, 16L, 17L, 18L, 16L, 20L), Day10 = c(24L, 23L, 23L, 21L, 21L, 23L, 21L, 21L, 22L, 25L, 21L, 23L, 21L, 25L, 25L, 25L, 
24L, 22L, 25L, 24L, 21L, 23L, 24L, 23L, 23L, 22L, 23L, 22L, 22L, 25L, 25L, 22L, 21L, 24L, 25L, 23L, 23L, 23L, 24L, 23L, 25L, 23L, 21L, 23L, 22L, 24L, 22L, 23L, 24L, 22L, 25L, 23L, 23L, 21L, 25L, 24L, 24L, 25L, 25L, 25L, 22L, 23L, 21L, 22L, 24L, 22L, 23L, 22L, 24L, 22L, 21L, 22L, 23L, 21L, 25L, 25L, 22L, 21L, 25L, 24L, 22L, 21L, 25L, 24L, 21L, 24L, 25L, 22L, 23L, 22L, 24L, 23L, 25L, 25L, 23L, 25L, 22L, 23L, 23L, 23L, 22L, 25L, 22L, 23L, 24L, 25L, 22L, 21L, 21L, 22L, 23L, 24L, 21L, 24L, 23L, 23L, 25L, 24L, 25L, 23L, 22L, 25L, 25L, 25L, 21L, 22L, 23L, 21L, 24L, 24L, 25L, 21L), Day13 = c(29L, 29L, 26L, 27L, 30L, 30L, 30L, 26L, 30L, 29L, 30L, 27L, 26L, 29L, 28L, 26L, 30L, 28L, 29L, 27L, 28L, 26L, 29L, 28L, 30L, 26L, 27L, 30L, 26L, 29L, 26L, 28L, 29L, 28L, 29L, 28L, 27L, 27L, 28L, 26L, 26L, 27L, 27L, 29L, 27L, 29L, 27L, 30L, 26L, 27L, 30L, 26L, 29L, 29L, 27L, 29L, 26L, 29L, 28L, 28L, 29L, 30L, 28L, 30L, 30L, 30L, 28L, 29L, 28L, 27L, 28L, 27L, 27L, 28L, 27L, 30L, 27L, 30L, 27L, 28L, 29L, 27L, 30L, 29L, 30L, 30L, 26L, 30L, 29L, 30L, 27L, 26L, 27L, 27L, 28L, 26L, 30L, 28L, 30L, 30L, 30L, 30L, 26L, 28L, 27L, 26L, 29L, 26L, 29L, 26L, 30L, 29L, 30L, 26L, 27L, 30L, 29L, 30L, 27L, 30L, 28L, 26L, 30L, 27L, 30L, 26L, 28L, 29L, 26L, 28L, 28L, 26L), Day18 = c(32L, 31L, 32L, 31L, 31L, 34L, 32L, 34L, 32L, 33L, 31L, 34L, 35L, 34L, 34L, 32L, 33L, 35L, 32L, 35L, 31L, 31L, 33L, 33L, 32L, 31L, 32L, 31L, 32L, 34L, 33L, 33L, 34L, 31L, 35L, 35L, 31L, 34L, 32L, 32L, 34L, 33L, 34L, 33L, 33L, 35L, 35L, 31L, 35L, 31L, 33L, 34L, 31L, 33L, 34L, 32L, 32L, 33L, 31L, 32L, 35L, 34L, 31L, 32L, 34L, 35L, 34L, 31L, 34L, 33L, 35L, 35L, 31L, 32L, 35L, 34L, 31L, 32L, 32L, 33L, 32L, 35L, 32L, 32L, 35L, 33L, 34L, 32L, 34L, 35L, 34L, 33L, 33L, 31L, 31L, 31L, 35L, 34L, 33L, 32L, 33L, 33L, 33L, 35L, 34L, 33L, 31L, 34L, 34L, 34L, 34L, 33L, 33L, 31L, 31L, 31L, 33L, 33L, 35L, 32L, 32L, 31L, 31L, 32L, 33L, 32L, 34L, 34L, 31L, 35L, 31L, 35L), Day23 = c(39L, 40L, 38L, 37L, 37L, 38L, 37L, 36L, 37L, 36L, 36L, 38L, 40L, 
38L, 37L, 36L, 36L, 40L, 40L, 40L, 40L, 39L, 40L, 36L, 38L, 36L, 36L, 37L, 38L, 37L, 36L, 37L, 39L, 39L, 38L, 38L, 37L, 40L, 36L, 38L, 37L, 40L, 36L, 37L, 39L, 38L, 38L, 38L, 40L, 38L, 37L, 36L, 38L, 36L, 36L, 36L, 39L, 40L, 39L, 37L, 39L, 39L, 37L, 36L, 37L, 39L, 39L, 37L, 36L, 37L, 40L, 36L, 39L, 40L, 39L, 40L, 39L, 38L, 39L, 40L, 37L, 40L, 38L, 38L, 38L, 40L, 40L, 36L, 39L, 39L, 39L, 39L, 38L, 37L, 37L, 36L, 37L, 39L, 37L, 40L, 40L, 40L, 38L, 38L, 39L, 38L, 36L, 37L, 36L, 36L, 40L, 39L, 39L, 39L, 36L, 39L, 38L, 40L, 36L, 37L, 38L, 38L, 36L, 37L, 39L, 36L, 40L, 40L, 39L, 38L, 37L, 38L), Day28 = c(42L, 43L, 43L, 44L, 44L, 44L, 42L, 42L, 43L, 42L, 45L, 43L, 43L, 43L, 42L, 44L, 42L, 44L, 45L, 44L, 44L, 45L, 44L, 41L, 41L, 42L, 44L, 44L, 44L, 45L, 43L, 42L, 43L, 42L, 41L, 44L, 43L, 43L, 42L, 42L, 44L, 42L, 42L, 42L, 45L, 44L, 45L, 42L, 43L, 45L, 45L, 44L, 41L, 42L, 42L, 41L, 44L, 44L, 44L, 44L, 42L, 45L, 41L, 42L, 45L, 43L, 44L, 45L, 44L, 42L, 41L, 43L, 41L, 44L, 43L, 41L, 45L, 42L, 45L, 41L, 45L, 41L, 45L, 42L, 45L, 42L, 45L, 45L, 41L, 41L, 43L, 41L, 41L, 42L, 43L, 41L, 42L, 44L, 43L, 45L, 41L, 41L, 44L, 41L, 44L, 43L, 43L, 45L, 44L, 41L, 44L, 43L, 42L, 45L, 45L, 41L, 45L, 42L, 41L, 44L, 41L, 41L, 41L, 43L, 41L, 41L, 45L, 41L, 42L, 45L, 41L, 44L), Day35 = c(50L, 50L, 50L, 50L, 48L, 46L, 50L, 46L, 48L, 50L, 50L, 50L, 46L, 49L, 46L, 47L, 49L, 49L, 48L, 49L, 46L, 47L, 49L, 46L, 49L, 50L, 49L, 46L, 49L, 50L, 46L, 48L, 50L, 46L, 50L, 48L, 46L, 48L, 50L, 50L, 47L, 47L, 47L, 47L, 47L, 49L, 48L, 46L, 46L, 48L, 50L, 46L, 49L, 48L, 46L, 49L, 50L, 49L, 48L, 48L, 48L, 50L, 49L, 47L, 48L, 50L, 50L, 46L, 47L, 46L, 48L, 48L, 48L, 47L, 49L, 48L, 49L, 46L, 47L, 50L, 47L, 50L, 47L, 47L, 46L, 46L, 47L, 50L, 49L, 49L, 48L, 47L, 46L, 50L, 46L, 50L, 50L, 46L, 47L, 47L, 49L, 50L, 50L, 46L, 47L, 50L, 47L, 48L, 46L, 50L, 49L, 46L, 46L, 50L, 50L, 49L, 46L, 49L, 46L, 46L, 46L, 48L, 47L, 47L, 50L, 47L, 46L, 48L, 50L, 48L, 46L, 46L), Day42 = c(52L, 51L, 53L, 53L, 54L, 55L, 55L, 54L, 52L, 51L, 
55L, 51L, 54L, 53L, 53L, 55L, 54L, 55L, 51L, 51L, 55L, 54L, 54L, 53L, 55L, 53L, 52L, 53L, 53L, 51L, 54L, 54L, 55L, 53L, 54L, 55L, 51L, 51L, 54L, 52L, 51L, 51L, 55L, 54L, 54L, 52L, 52L, 55L, 55L, 51L, 55L, 52L, 55L, 51L, 53L, 52L, 53L, 54L, 51L, 54L, 54L, 55L, 52L, 54L, 52L, 52L, 51L, 52L, 55L, 52L, 54L, 51L, 52L, 55L, 51L, 52L, 55L, 54L, 52L, 53L, 53L, 52L, 55L, 51L, 51L, 55L, 52L, 55L, 55L, 55L, 53L, 52L, 53L, 54L, 52L, 52L, 52L, 52L, 53L, 51L, 54L, 54L, 51L, 53L, 55L, 51L, 54L, 54L, 54L, 53L, 53L, 54L, 54L, 55L, 52L, 52L, 54L, 51L, 52L, 51L, 51L, 55L, 52L, 51L, 51L, 53L, 54L, 51L, 51L, 54L, 55L, 52L), Day52 = c(59L, 57L, 56L, 58L, 59L, 59L, 57L, 59L, 57L, 56L, 58L, 58L, 60L, 59L, 56L, 56L, 60L, 57L, 60L, 57L, 59L, 56L, 60L, 59L, 59L, 56L, 60L, 58L, 60L, 57L, 57L, 60L, 56L, 57L, 59L, 60L, 56L, 58L, 57L, 57L, 58L, 58L, 59L, 56L, 58L, 56L, 57L, 60L, 58L, 59L, 58L, 56L, 56L, 57L, 60L, 59L, 60L, 58L, 59L, 60L, 57L, 60L, 59L, 57L, 60L, 56L, 57L, 56L, 58L, 60L, 56L, 58L, 56L, 60L, 57L, 57L, 57L, 60L, 58L, 59L, 58L, 60L, 59L, 58L, 56L, 56L, 58L, 57L, 60L, 56L, 58L, 56L, 57L, 58L, 58L, 60L, 59L, 60L, 59L, 59L, 59L, 57L, 57L, 60L, 59L, 57L, 57L, 58L, 59L, 57L, 59L, 58L, 60L, 59L, 56L, 57L, 57L, 56L, 57L, 60L, 58L, 57L, 56L, 59L, 59L, 59L, 57L, 57L, 58L, 56L, 58L, 60L), Day62 = c(67L, 65L, 68L, 65L, 69L, 70L, 69L, 66L, 65L, 70L, 70L, 65L, 67L, 68L, 65L, 67L, 65L, 66L, 66L, 68L, 68L, 66L, 65L, 67L, 66L, 69L, 69L, 69L, 68L, 67L, 66L, 69L, 65L, 65L, 69L, 66L, 69L, 68L, 69L, 67L, 65L, 69L, 69L, 69L, 70L, 67L, 65L, 65L, 65L, 66L, 66L, 69L, 68L, 66L, 67L, 66L, 70L, 70L, 70L, 69L, 70L, 70L, 67L, 66L, 65L, 69L, 67L, 66L, 70L, 70L, 70L, 65L, 66L, 67L, 66L, 66L, 67L, 68L, 70L, 67L, 69L, 66L, 67L, 65L, 70L, 65L, 70L, 66L, 66L, 69L, 68L, 65L, 65L, 67L, 68L, 67L, 69L, 68L, 69L, 66L, 68L, 70L, 69L, 68L, 70L, 66L, 69L, 66L, 66L, 67L, 65L, 69L, 69L, 67L, 70L, 65L, 70L, 69L, 66L, 68L, 67L, 68L, 66L, 65L, 67L, 70L, 66L, 67L, 66L, 67L, 67L, 70L), Day72 = c(74L, 74L, 71L, 75L, 74L, 71L, 75L, 
71L, 75L, 71L, 72L, 72L, 75L, 73L, 75L, 74L, 74L, 74L, 71L, 74L, 72L, 71L, 71L, 74L, 74L, 73L, 72L, 73L, 71L, 71L, 75L, 72L, 73L, 74L, 75L, 73L, 71L, 71L, 74L, 71L, 73L, 75L, 75L, 74L, 71L, 75L, 74L, 72L, 72L, 71L, 72L, 75L, 73L, 74L, 71L, 75L, 75L, 73L, 72L, 73L, 73L, 72L, 75L, 72L, 71L, 72L, 73L, 72L, 72L, 74L, 72L, 72L, 73L, 75L, 74L, 75L, 73L, 74L, 75L, 72L, 75L, 73L, 71L, 71L, 72L, 74L, 72L, 75L, 71L, 71L, 71L, 73L, 72L, 71L, 75L, 75L, 74L, 73L, 71L, 71L, 72L, 71L, 71L, 74L, 72L, 73L, 71L, 75L, 74L, 75L, 74L, 73L, 73L, 73L, 72L, 75L, 73L, 71L, 71L, 72L, 72L, 71L, 71L, 71L, 72L, 73L, 75L, 75L, 72L, 73L, 75L, 75L), Day82 = c(76L, 78L, 78L, 78L, 79L, 77L, 78L, 77L, 80L, 79L, 80L, 76L, 76L, 80L, 80L, 80L, 78L, 78L, 78L, 78L, 80L, 78L, 76L, 79L, 76L, 77L, 76L, 79L, 78L, 76L, 76L, 79L, 79L, 77L, 77L, 77L, 78L, 78L, 80L, 77L, 77L, 76L, 77L, 79L, 78L, 78L, 78L, 80L, 79L, 76L, 79L, 77L, 76L, 80L, 78L, 77L, 79L, 80L, 77L, 80L, 78L, 79L, 78L, 76L, 76L, 79L, 77L, 77L, 78L, 78L, 79L, 78L, 78L, 78L, 80L, 79L, 78L, 77L, 78L, 78L, 78L, 79L, 80L, 77L, 77L, 80L, 77L, 80L, 77L, 76L, 77L, 76L, 77L, 77L, 80L, 79L, 77L, 78L, 80L, 80L, 79L, 80L, 79L, 79L, 78L, 76L, 76L, 79L, 79L, 80L, 79L, 78L, 76L, 79L, 77L, 77L, 76L, 76L, 78L, 78L, 79L, 78L, 76L, 78L, 79L, 76L, 77L, 78L, 76L, 79L, 78L, 77L), Day92 = c(85L, 84L, 85L, 85L, 83L, 82L, 83L, 82L, 85L, 85L, 82L, 85L, 85L, 85L, 81L, 81L, 84L, 81L, 85L, 82L, 85L, 84L, 81L, 82L, 83L, 82L, 84L, 84L, 81L, 85L, 83L, 85L, 82L, 81L, 83L, 83L, 85L, 83L, 81L, 83L, 82L, 84L, 83L, 83L, 82L, 85L, 85L, 82L, 82L, 82L, 85L, 81L, 81L, 82L, 82L, 84L, 81L, 85L, 81L, 82L, 81L, 81L, 85L, 83L, 81L, 83L, 83L, 84L, 83L, 85L, 85L, 83L, 81L, 85L, 81L, 84L, 83L, 83L, 85L, 83L, 82L, 82L, 82L, 83L, 82L, 83L, 81L, 84L, 83L, 84L, 82L, 83L, 81L, 83L, 81L, 82L, 82L, 82L, 85L, 85L, 84L, 81L, 81L, 81L, 84L, 81L, 84L, 81L, 81L, 84L, 84L, 83L, 83L, 82L, 82L, 81L, 85L, 85L, 82L, 83L, 81L, 83L, 82L, 84L, 83L, 82L, 84L, 81L, 83L, 82L, 84L, 85L), Day102 = c(89L, 88L, 88L, 90L, 
88L, 90L, 87L, 88L, 89L, 87L, 90L, 86L, 86L, 89L, 86L, 89L, 90L, 88L, 87L, 88L, 88L, 87L, 90L, 86L, 90L, 87L, 88L, 89L, 88L, 90L, 88L, 87L, 89L, 90L, 88L, 87L, 89L, 88L, 87L, 86L, 90L, 86L, 89L, 89L, 90L, 88L, 90L, 86L, 88L, 88L, 90L, 89L, 88L, 88L, 90L, 87L, 88L, 88L, 87L, 90L, 89L, 87L, 90L, 90L, 86L, 87L, 86L, 90L, 88L, 87L, 86L, 88L, 90L, 86L, 89L, 90L, 87L, 87L, 88L, 86L, 86L, 89L, 89L, 86L, 87L, 86L, 86L, 88L, 88L, 88L, 89L, 90L, 88L, 86L, 88L, 88L, 87L, 88L, 90L, 89L, 89L, 86L, 90L, 89L, 89L, 88L, 90L, 88L, 86L, 90L, 90L, 87L, 89L, 90L, 90L, 88L, 88L, 89L, 90L, 88L, 90L, 90L, 87L, 89L, 90L, 90L, 90L, 89L, 86L, 88L, 89L, 88L)), class = "data.frame", row.names = c(NA, -132L))
Required libraries:
tidyr, plyr, dplyr, ggplot2
The steps that I have taken so far are to:
Convert the data to long format (df = name of dataset):
df <- gather(df, day, phosphorus, Day1:Day102, factor_key=TRUE)
Change the factor day to numeric
df$day2 <- revalue(df$day, c("Day1"="1", "Day2"="2", "Day4"="4", "Day7"="7", "Day10"="10", "Day13"="13", "Day18"="18", "Day23" = "23", "Day28" = "28", "Day35" = "35", "Day42" = "42", "Day52" = "52", "Day62" = "62", "Day72" = "72", "Day82" = "82", "Day92" = "92", "Day102" = "102"))
and
df$day3 <- as.numeric(as.character(df$day2))
Group by Trmt, Level and day3
GroupedDF <- df %>% group_by(Trmt, Level, day3)
GroupedCO2M <- GroupedDF %>% summarise(disp = mean(phosphorus))
I would now like to subtract values while accounting for Trmt and Level, thus reducing the number of rows from 102 to 51. I would like to subtract the 'yT' Trmt cases from the respective 'nF' cases, separately for each Level (X, Y and Z). For example, subtract A01yT_X from A01nF_X, A01yT_Y from A01nF_Y, A01yT_Z from A01nF_Z, etc. This should give a total of 51 points, 17 for each Level.
Here is a figure of what I have in mind:
Many thanks for any advice.
Thanks for sharing the data. What you posted is quite long, so it may not copy and paste cleanly; a shortened example dataset is included at the end of this answer.
Your data is in wide format, and you need the average of each measurement within each group (defined by Day, Level and Trmt), so we can work on this in wide format:
tmp <- Data %>% group_by(Trmt,Level) %>% summarise_all(mean)
> head(tmp)
# A tibble: 6 x 19
# Groups: Trmt [2]
Trmt Level Day1 Day2 Day4 Day7 Day10 Day13 Day18 Day23 Day28 Day35 Day42
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A01nF X 3.5 8 12 19 23 29.5 32.5 36.5 42 50 53
2 A01nF Y 4.5 9.5 13 17.5 21 28 32.5 36 43.5 48 54.5
3 A01nF Z 1 8.5 13.5 18.5 22.5 28.5 33 37.5 43 49 51.5
4 A01yT X 2.5 8.5 11 19.5 22.5 28 31.5 38 43 50 52.5
5 A01yT Y 2.5 7.5 13.5 17 22 29.5 31 38.5 43.5 49 52.5
6 A01yT Z 3 7 14.5 18 23 28 33 38 43.5 48 54
This gives you the average for each combination of Trmt and Level, with each column (Day) averaged separately. The next step is to pair up the two subgroups under Trmt (nF and yT for A01, A02, ...). For this we introduce a grouping variable called "site", which is Trmt without the nF/yT suffix. Once you group the data.frame by "site" and Level, the first row in each group is always nF and the second row yT, so applying diff to all the Day columns within this grouping gives the difference between the two treatments. We do it like this:
# need to ungroup Trmt to remove it later
tmp <- tmp%>% ungroup(Trmt) %>%
mutate(site = sub("[yn][TF]","",Trmt)) %>%
select(-Trmt) %>%
group_by(site,Level) %>%
summarize_all(diff)
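Now you have the difference for each site, each level and each day. Note that diff() returns the second row minus the first, so these values are yT - nF; multiply by -1 if you need nF - yT.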
> head(tmp)
# A tibble: 6 x 19
# Groups: site [2]
site Level Day1 Day2 Day4 Day7 Day10 Day13 Day18 Day23 Day28 Day35 Day42
<chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A01 X -1 0.5 -1 0.5 -0.5 -1.5 -1 1.5 1 0 -0.5
2 A01 Y -2 -2 0.5 -0.5 1 1.5 -1.5 2.5 0 1 -2
3 A01 Z 2 -1.5 1 -0.5 0.5 -0.5 0 0.5 0.5 -1 2.5
4 A02 X 1.5 1 1.5 1 -1 -1.5 2 -1.5 -1.5 -1 2
5 A02 Y 0.5 0 -1.5 -1 0.5 1.5 -0.5 -3 -1.5 0 1
6 A02 Z 4 2 1 0.5 1.5 0 2.5 0.5 0.5 1.5 0
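As a quick sanity check of the two ideas above (recovering the site label with sub() and differencing within a group), here is a minimal base R sketch on made-up numbers. The vectors trmt and day1 are illustrative only and stand in for one site and one Level. Keep in mind that diff() returns the second element minus the first, so with nF ordered before yT the result is yT - nF.

```r
# Toy means for one site and one Level: row 1 is nF, row 2 is yT
# (hypothetical values, same ordering as the factor levels of Trmt).
trmt <- c("A01nF", "A01yT")
day1 <- c(3.5, 2.5)

# Strip the treatment suffix to recover the site label.
site <- sub("[yn][TF]", "", trmt)
print(unique(site))

# diff() computes second minus first, i.e. yT - nF here.
yT_minus_nF <- diff(day1)
# Flip the sign to get nF - yT.
nF_minus_yT <- -yT_minus_nF
print(c(yT_minus_nF = yT_minus_nF, nF_minus_yT = nF_minus_yT))
```

If the quantity wanted is nF - yT, the same sign flip can presumably be applied to the Day columns of the summarised table.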
The last part is the plot. We convert the data to long format and also create "Day", a numeric version of day.
plotdf <- gather(tmp, day, Diff, Day1:Day102, factor_key=TRUE) %>%
mutate(Day=as.numeric(sub("Day","",day)))
# and plot
ggplot(plotdf,aes(x=Day,y=Diff,col=Level,shape=Level)) + geom_line() + geom_point() + facet_wrap(~site) + scale_color_manual(values=c("grey10","grey40","grey80"))
The first plot shows the difference for each site. For the difference averaged across all sites:
meandf <- plotdf %>% group_by(Level,Day) %>% summarize(Diff=mean(Diff))
ggplot(meandf,aes(x=Day,y=Diff,col=Level,shape=Level)) + geom_line() + geom_point() + scale_color_manual(values=c("grey10","grey40","grey80"))
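The conversion of the day labels to numbers used when building plotdf can be checked in isolation. The sketch below uses a small made-up factor (day_labels is an illustrative name) and shows why the "Day" prefix is stripped before calling as.numeric(): applied directly to a factor, as.numeric() returns the internal level codes rather than the day numbers.

```r
# A small factor of day labels, mimicking the output of gather(..., factor_key = TRUE).
day_labels <- factor(c("Day1", "Day2", "Day4", "Day102"),
                     levels = c("Day1", "Day2", "Day4", "Day102"))

# as.numeric() on a factor gives the level codes, not the days.
codes <- as.numeric(day_labels)

# Removing the prefix first gives the actual day numbers.
days <- as.numeric(sub("Day", "", as.character(day_labels)))

print(codes)
print(days)
```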
Example dataset, subset to Day1, Day2 and Day4 (with this subset, use Day1:Day4 instead of Day1:Day102 in the gather() call):
Data <- structure(list(Trmt = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L,
3L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 5L, 5L, 5L, 8L, 8L, 8L,
8L, 8L, 8L, 7L, 7L, 7L, 7L, 7L, 7L, 10L, 10L, 10L, 10L, 10L,
10L, 9L, 9L, 9L, 9L, 9L, 9L, 12L, 12L, 12L, 12L, 12L, 12L, 11L,
11L, 11L, 11L, 11L, 11L, 14L, 14L, 14L, 14L, 14L, 14L, 13L, 13L,
13L, 13L, 13L, 13L, 16L, 16L, 16L, 16L, 16L, 16L, 15L, 15L, 15L,
15L, 15L, 15L, 18L, 18L, 18L, 18L, 18L, 18L, 17L, 17L, 17L, 17L,
17L, 17L, 20L, 20L, 20L, 20L, 20L, 20L, 19L, 19L, 19L, 19L, 19L,
19L, 22L, 22L, 22L, 22L, 22L, 22L, 21L, 21L, 21L, 21L, 21L, 21L
), .Label = c("A01nF", "A01yT", "A02nF", "A02yT", "A03nF", "A03yT",
"A04nF", "A04yT", "A05nF", "A05yT", "A06nF", "A06yT", "A07nF",
"A07yT", "A08nF", "A08yT", "A10nF", "A10yT", "A11nF", "A11yT",
"A13nF", "A13yT"), class = "factor"), Level = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), .Label = c("X", "Y", "Z"), class = "factor"), Day1 = c(3L,
1L, 4L, 2L, 4L, 2L, 5L, 4L, 1L, 2L, 5L, 1L, 5L, 2L, 5L, 5L, 3L,
5L, 3L, 3L, 1L, 4L, 1L, 1L, 5L, 4L, 1L, 5L, 4L, 5L, 3L, 5L, 3L,
5L, 3L, 4L, 2L, 4L, 2L, 4L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 5L, 2L,
4L, 4L, 3L, 1L, 4L, 4L, 1L, 4L, 1L, 2L, 5L, 1L, 5L, 1L, 2L, 4L,
4L, 4L, 4L, 2L, 4L, 5L, 5L, 4L, 1L, 3L, 2L, 3L, 5L, 4L, 3L, 2L,
3L, 5L, 4L, 1L, 3L, 4L, 3L, 3L, 5L, 3L, 1L, 1L, 4L, 4L, 5L, 1L,
4L, 4L, 4L, 1L, 4L, 5L, 5L, 1L, 5L, 3L, 1L, 4L, 1L, 4L, 5L, 5L,
3L, 3L, 2L, 4L, 5L, 3L, 2L, 1L, 5L, 5L, 2L, 2L, 3L, 4L, 3L, 4L,
2L, 2L, 4L), Day2 = c(10L, 9L, 7L, 7L, 6L, 7L, 10L, 9L, 10L,
6L, 10L, 7L, 8L, 9L, 8L, 9L, 7L, 10L, 7L, 10L, 6L, 8L, 6L, 8L,
8L, 8L, 10L, 6L, 8L, 8L, 6L, 10L, 7L, 10L, 7L, 10L, 6L, 6L, 7L,
9L, 8L, 10L, 8L, 7L, 9L, 8L, 6L, 9L, 7L, 9L, 8L, 6L, 6L, 8L,
10L, 7L, 8L, 6L, 8L, 8L, 6L, 9L, 10L, 6L, 8L, 7L, 9L, 7L, 8L,
10L, 10L, 6L, 7L, 10L, 9L, 9L, 8L, 9L, 6L, 8L, 6L, 8L, 6L, 9L,
10L, 7L, 7L, 7L, 8L, 7L, 8L, 10L, 7L, 8L, 9L, 6L, 8L, 9L, 8L,
9L, 6L, 7L, 10L, 9L, 10L, 7L, 6L, 9L, 9L, 9L, 6L, 10L, 9L, 8L,
9L, 7L, 10L, 7L, 10L, 9L, 6L, 8L, 9L, 8L, 9L, 6L, 6L, 10L, 9L,
8L, 8L, 7L), Day4 = c(11L, 12L, 14L, 11L, 15L, 15L, 12L, 11L,
15L, 12L, 15L, 12L, 12L, 11L, 15L, 15L, 13L, 11L, 13L, 14L, 12L,
11L, 13L, 12L, 15L, 15L, 14L, 11L, 15L, 11L, 12L, 11L, 13L, 11L,
12L, 13L, 13L, 14L, 13L, 15L, 14L, 15L, 12L, 14L, 11L, 13L, 15L,
11L, 12L, 13L, 11L, 15L, 11L, 13L, 11L, 11L, 14L, 12L, 14L, 15L,
11L, 12L, 15L, 12L, 13L, 12L, 14L, 12L, 11L, 13L, 12L, 12L, 11L,
15L, 13L, 12L, 11L, 12L, 13L, 14L, 14L, 14L, 13L, 12L, 15L, 12L,
15L, 15L, 12L, 13L, 12L, 12L, 12L, 14L, 13L, 13L, 14L, 11L, 12L,
11L, 15L, 11L, 11L, 11L, 14L, 11L, 12L, 15L, 15L, 11L, 12L, 14L,
15L, 14L, 14L, 12L, 14L, 13L, 15L, 15L, 14L, 13L, 12L, 15L, 15L,
11L, 13L, 12L, 11L, 13L, 12L, 14L)), class = "data.frame", row.names = c(NA,
-132L))

Tabu search in R

Good evening,
As part of a data analysis course we have been thrown into the realm of metaheuristics, and I am really struggling to understand how to implement a Tabu search in R, since my background in programming is rather limited.
I haven't found any R or Python example on Google or YouTube either, so I'm hoping to find something here.
The problem I have is similar to the "location problem" in optimisation. I need to find the best combination of hubs that minimises the total distance between hubs and nodes.
I need to find 5 hubs, and the total capacity of each one is 120.
nodes <- structure(list(node_number = 1:50,
x = c(2L, 80L, 36L, 57L, 33L, 76L, 77L, 94L,
89L, 59L, 39L, 87L, 44L, 2L, 19L, 5L,
58L, 14L, 43L, 87L, 11L, 31L, 51L, 55L,
84L, 12L, 53L, 53L, 33L, 69L, 43L, 10L,
8L, 3L, 96L, 6L, 59L, 66L, 22L, 75L, 4L,
41L, 92L, 12L, 60L, 35L, 38L, 9L, 54L, 1L),
y = c(62L, 25L, 88L, 23L, 17L, 43L, 85L, 6L, 11L,
72L, 82L, 24L, 76L, 83L, 43L, 27L, 72L, 50L,
18L, 7L, 56L, 16L, 94L, 13L, 57L, 2L, 33L, 10L,
32L, 67L, 5L, 75L, 26L, 1L, 22L, 48L, 22L, 69L,
50L, 21L, 81L, 97L, 34L, 64L, 84L, 100L, 2L, 9L, 59L, 58L),
node_demand = c(3L, 14L, 1L, 14L, 19L, 2L, 14L, 6L,
7L, 6L, 10L, 18L, 3L, 6L, 20L, 4L,
14L, 11L, 19L, 15L, 15L, 4L, 13L,
13L, 5L, 16L, 3L, 7L, 14L, 17L,
3L, 3L, 12L, 14L, 20L, 13L, 10L,
9L, 6L, 18L, 7L, 20L, 9L, 1L, 8L,
5L, 1L, 7L, 9L, 2L)),
.Names = c("node_number", "x", "y", "node_demand"),
class = "data.frame", row.names = c(NA, -50L))
hubs_required = 5
total_capacity = 120
My strategy was to create a distance matrix, then create another 50 x 50 matrix to represent whether a node becomes a hub or not, and finally multiply both and add up all the distances to get the total distance.
I created the dataframe:
# keep only the coordinate and demand columns (nodes also contains node_number)
nodes_df <- nodes[, c("x", "y", "node_demand")]
rownames(nodes_df) <- paste('Node',1:50)
I created the distance matrix
# distances are computed from the coordinates only, not from node_demand
distance_df <- as.data.frame(as.matrix(round(dist(nodes_df[, c("x", "y")],method = "euclidean",diag = TRUE,upper = TRUE))))
colnames(distance_df) <- paste("Node",1:50)
I created the node demand matrix:
demand <- as.vector(rep(c(nodes_df[,'node_demand']),50))
demand_matrix <- matrix(demand,nrow=50,ncol=50,byrow = TRUE)
diag(demand_matrix) <- 0
demand_matrix <- as.data.frame(demand_matrix)
I created an empty matrix to show whether a node becomes a hub "1" or not "0"
hubs_matrix <- matrix(0,nrow = 50,ncol = 50,byrow = TRUE)
colnames(hubs_matrix) <- paste("Hub",1:50)
rownames(hubs_matrix) <- paste("Node",1:50)
Then to create the initial solution I randomly assign Hubs and calculate the distance and demand.
set.seed(37)
hubs_matrix <- do.call("cbind", lapply(1:50, function(x) sample(c(1, rep(0, 49)), 50)))
sum_distances <- (hubs_matrix * distance_df)
sum(rowSums(sum_distances))
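For reference, the "multiply an assignment matrix by the distance matrix and sum" idea can be sketched on a tiny made-up example. The coordinates, the choice of hubs and the nearest-hub assignment rule are assumptions for illustration only, and capacity is ignored here.

```r
# Four hypothetical nodes on the corners of a 3 x 4 rectangle.
xy <- cbind(x = c(0, 0, 3, 3), y = c(0, 4, 0, 4))
D <- as.matrix(dist(xy))          # 4 x 4 Euclidean distance matrix

hubs <- c(1, 3)                   # assume nodes 1 and 3 are the hubs
D_hubs <- D[, hubs, drop = FALSE] # distances from every node to each hub

# Assignment matrix: rows are nodes, columns are hubs,
# a 1 marks the nearest hub of each node.
nearest <- apply(D_hubs, 1, which.min)
A <- matrix(0, nrow = nrow(D), ncol = length(hubs))
A[cbind(seq_len(nrow(D)), nearest)] <- 1

# Element-wise product, then sum: total node-to-hub distance.
total <- sum(A * D_hubs)
print(total)
```

Restricting the assignment matrix to one column per selected hub keeps exactly one 1 per row, which makes the "each node is served by one hub" rule explicit.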
The idea is to try different combinations of '1' and '0' so as to minimise the total distance, but I am having the following issues:
I have no idea how to do the local search and generate the permutations from the initial solution.
I have no idea how to prevent R from using the best solution for a certain period of time, i.e. the Tabu list.
I have no idea how to deal with the supply restriction for each node (total demand from each node < 120). I could do it with a loop, but since in this case I'm multiplying matrices I'm pretty lost.
Could anybody give me a hand?
Many thanks!
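The three pieces listed above (a neighbourhood for the local search, a tabu list, and the capacity restriction) could be put together roughly as follows. This is only a sketch under stated assumptions, not a reference implementation: the toy data, the function names evaluate and tabu_search, the swap neighbourhood, the greedy nearest-hub assignment, the penalty of 1e6 for unassigned nodes and the tenure value are all choices made for illustration.

```r
# Toy instance (hypothetical data): 12 nodes, 3 hubs, capacity 40 per hub.
set.seed(1)
n <- 12
k <- 3
capacity <- 40
toy <- data.frame(x = sample(0:100, n), y = sample(0:100, n),
                  node_demand = sample(1:10, n, replace = TRUE))
D <- as.matrix(dist(toy[, c("x", "y")]))

# Cost of a hub set: nodes are assigned greedily (largest demand first)
# to the nearest hub with spare capacity; a node that fits nowhere
# adds a large penalty. This is a heuristic, not an optimal assignment.
evaluate <- function(hubs) {
  load <- numeric(length(hubs))
  total <- 0
  for (i in order(toy$node_demand, decreasing = TRUE)) {
    placed <- FALSE
    for (j in order(D[i, hubs])) {            # nearest hub first
      if (load[j] + toy$node_demand[i] <= capacity) {
        load[j] <- load[j] + toy$node_demand[i]
        total <- total + D[i, hubs[j]]
        placed <- TRUE
        break
      }
    }
    if (!placed) total <- total + 1e6
  }
  total
}

tabu_search <- function(iters = 50, tenure = 5) {
  current <- sample(n, k)                     # random initial hub set
  best <- current
  best_cost <- evaluate(best)
  tabu_until <- integer(n)                    # node i may not re-enter before this iteration
  for (it in seq_len(iters)) {
    move <- NULL
    move_cost <- Inf
    # Neighbourhood: swap one current hub with one non-hub node.
    for (out in current) {
      for (inn in setdiff(seq_len(n), current)) {
        cand <- c(setdiff(current, out), inn)
        cost <- evaluate(cand)
        is_tabu <- tabu_until[inn] >= it
        # Aspiration: a tabu move is allowed if it beats the best so far.
        if ((!is_tabu || cost < best_cost) && cost < move_cost) {
          move <- c(out, inn)
          move_cost <- cost
        }
      }
    }
    if (is.null(move)) next
    # Take the best admissible move even if it is worse than the current solution.
    current <- c(setdiff(current, move[1]), move[2])
    tabu_until[move[1]] <- it + tenure        # the removed hub is tabu for a while
    if (move_cost < best_cost) {
      best <- current
      best_cost <- move_cost
    }
  }
  list(hubs = best, cost = best_cost)
}

res <- tabu_search()
print(res)
```

For the data in the question, the same structure should carry over with n = 50, k = hubs_required and capacity = total_capacity, although the full swap neighbourhood becomes more expensive to evaluate at that size.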

Resources