R script to average values every X days using different starting points
I have a long data set measuring the height of trees once a week for 8 months. Also recorded are the pot number ('pot'), the date of measurement ('date'), weeks since the start of the experiment ('no.week'), the germination date ('germination'), and weeks since germination ('after.germ').
I want to average tree height over 3-week periods, starting at the week of germination.
For example, the experiment started on 3/25. Pot 3 germinated on 4/15 (no.week = 2). Pot 4 germinated on 4/29 (no.week = 4). I want to average the height of pot 3 starting on 4/15 and pot 4 starting on 4/29, and continue to average every 3 weeks for the duration of the experiment.
The key is starting the average at different points for each pot.
Any advice and tips would be great!
Subset:
pot table germination week no.week after.germ date height stem
61 3 2 4/15/2022 w1 1 NA 3/25/2022 NA NA
62 3 2 4/15/2022 w2 2 NA 4/15/2022 NA NA
63 3 2 4/15/2022 w3 3 1 4/22/2022 4.6 NA
64 3 2 4/15/2022 w4 4 2 4/29/2022 18.5 NA
65 3 2 4/15/2022 w5 5 3 5/6/2022 18.1 1
66 3 2 4/15/2022 w6 6 4 5/13/2022 18.1 1
67 3 2 4/15/2022 w7 7 5 5/20/2022 17.8 1
68 3 2 4/15/2022 w8 8 6 5/26/2022 19.4 1
69 3 2 4/15/2022 w9 9 7 6/3/2022 18.8 1
70 3 2 4/15/2022 w10 10 8 6/10/2022 19.3 1
71 3 2 4/15/2022 w11 11 9 6/17/2022 18.3 1
72 3 2 4/15/2022 w12 12 10 6/24/2022 18.6 1
73 3 2 4/15/2022 w13 13 11 7/1/2022 19.2 1
74 3 2 4/15/2022 w14 14 12 7/8/2022 19.2 1
75 3 2 4/15/2022 w15 15 13 7/15/2022 18.9 1
76 3 2 4/15/2022 w16 16 14 7/22/2022 15.3 1
77 3 2 4/15/2022 w17 17 15 7/29/2022 19.1 1
78 3 2 4/15/2022 w18 18 16 8/5/2022 19.0 1
79 3 2 4/15/2022 w19 19 17 8/12/2022 19.0 1
80 3 2 4/15/2022 w20 20 18 8/19/2022 19.8 1
81 3 2 4/15/2022 w21 21 19 8/26/2022 18.2 1
82 3 2 4/15/2022 w22 22 20 9/2/2022 19.2 1
83 3 2 4/15/2022 w24 24 21 9/16/2022 18.1 1
84 3 2 4/15/2022 w23 23 22 9/22/2022 19.2 1
85 3 2 4/15/2022 w25 25 23 9/30/2022 15.4 1
86 3 2 4/15/2022 w26 26 24 10/7/2022 18.4 1
87 3 2 4/15/2022 w27 27 25 10/14/2022 19.2 1
88 3 2 4/15/2022 w28 28 26 10/21/2022 19.0 1
89 3 2 4/15/2022 w29 29 27 10/29/2022 18.7 1
90 3 2 4/15/2022 w30 30 28 11/4/2022 19.3 1
91 6 4 4/29/2022 w1 1 NA 3/25/2022 NA NA
92 6 4 4/29/2022 w2 2 NA 4/15/2022 NA NA
93 6 4 4/29/2022 w3 3 NA 4/22/2022 NA NA
94 6 4 4/29/2022 w4 4 1 4/29/2022 16.7 NA
95 6 4 4/29/2022 w5 5 2 5/6/2022 17.5 1
96 6 4 4/29/2022 w6 6 3 5/13/2022 18.8 NA
97 6 4 4/29/2022 w7 7 4 5/20/2022 18.0 NA
98 6 4 4/29/2022 w8 8 5 5/26/2022 17.2 NA
99 6 4 4/29/2022 w9 9 6 6/3/2022 17.7 NA
100 6 4 4/29/2022 w10 10 7 6/10/2022 17.9 NA
101 6 4 4/29/2022 w11 11 8 6/17/2022 18.7 NA
102 6 4 4/29/2022 w12 12 9 6/24/2022 18.1 NA
103 6 4 4/29/2022 w13 13 10 7/1/2022 17.3 NA
104 6 4 4/29/2022 w14 14 11 7/8/2022 13.8 NA
105 6 4 4/29/2022 w15 15 12 7/15/2022 18.4 1
106 6 4 4/29/2022 w16 16 13 7/22/2022 19.0 1
107 6 4 4/29/2022 w17 17 14 7/29/2022 18.8 1
108 6 4 4/29/2022 w18 18 15 8/5/2022 NA 1
109 6 4 4/29/2022 w19 19 16 8/12/2022 19.0 1
110 6 4 4/29/2022 w20 20 17 8/19/2022 19.3 1
111 6 4 4/29/2022 w21 21 18 8/26/2022 18.6 1
112 6 4 4/29/2022 w22 22 19 9/2/2022 18.2 1
113 6 4 4/29/2022 w24 24 20 9/16/2022 18.0 1
114 6 4 4/29/2022 w23 23 21 9/22/2022 18.8 1
115 6 4 4/29/2022 w25 25 22 9/30/2022 19.7 1
116 6 4 4/29/2022 w26 26 23 10/7/2022 17.4 1
117 6 4 4/29/2022 w27 27 24 10/14/2022 18.8 1
118 6 4 4/29/2022 w28 28 25 10/21/2022 19.9 1
119 6 4 4/29/2022 w29 29 26 10/29/2022 17.9 1
120 6 4 4/29/2022 w30 30 27 11/4/2022 19.5 1
211 10 2 4/29/2022 w1 1 NA 3/25/2022 NA NA
212 10 2 4/29/2022 w2 2 NA 4/15/2022 NA NA
213 10 2 4/29/2022 w3 3 NA 4/22/2022 NA NA
214 10 2 4/29/2022 w4 4 NA 4/29/2022 NA NA
215 10 2 4/29/2022 w5 5 1 5/6/2022 9.5 1
216 10 2 4/29/2022 w6 6 2 5/13/2022 15.4 NA
217 10 2 4/29/2022 w7 7 3 5/20/2022 14.3 NA
218 10 2 4/29/2022 w8 8 4 5/26/2022 15.8 NA
219 10 2 4/29/2022 w9 9 5 6/3/2022 16.1 NA
220 10 2 4/29/2022 w10 10 6 6/10/2022 16.1 NA
221 10 2 4/29/2022 w11 11 7 6/17/2022 15.9 NA
222 10 2 4/29/2022 w12 12 8 6/24/2022 16.3 NA
223 10 2 4/29/2022 w13 13 9 7/1/2022 16.2 NA
224 10 2 4/29/2022 w14 14 10 7/8/2022 16.4 NA
225 10 2 4/29/2022 w15 15 11 7/15/2022 15.7 1
226 10 2 4/29/2022 w16 16 12 7/22/2022 15.5 1
227 10 2 4/29/2022 w17 17 13 7/29/2022 15.7 1
228 10 2 4/29/2022 w18 18 14 8/5/2022 15.5 1
229 10 2 4/29/2022 w19 19 15 8/12/2022 16.0 1
230 10 2 4/29/2022 w20 20 16 8/19/2022 15.9 1
231 10 2 4/29/2022 w21 21 17 8/26/2022 15.7 1
232 10 2 4/29/2022 w22 22 18 9/2/2022 15.5 1
233 10 2 4/29/2022 w24 24 19 9/16/2022 15.1 1
234 10 2 4/29/2022 w23 23 20 9/22/2022 15.8 1
235 10 2 4/29/2022 w25 25 21 9/30/2022 15.8 1
236 10 2 4/29/2022 w26 26 22 10/7/2022 15.1 1
237 10 2 4/29/2022 w27 27 23 10/14/2022 15.9 1
238 10 2 4/29/2022 w28 28 24 10/21/2022 16.5 1
239 10 2 4/29/2022 w29 29 25 10/29/2022 15.7 1
240 10 2 4/29/2022 w30 30 26 11/4/2022 16.2 1
271 14 2 4/15/2022 w1 1 NA 3/25/2022 NA NA
272 14 2 4/15/2022 w2 2 NA 4/15/2022 NA NA
273 14 2 4/15/2022 w3 3 1 4/22/2022 5.8 NA
274 14 2 4/15/2022 w4 4 2 4/29/2022 19.7 NA
275 14 2 4/15/2022 w5 5 3 5/6/2022 20.1 1
276 14 2 4/15/2022 w6 6 4 5/13/2022 19.4 1
277 14 2 4/15/2022 w7 7 5 5/20/2022 20.0 1
278 14 2 4/15/2022 w8 8 6 5/26/2022 19.6 1
279 14 2 4/15/2022 w9 9 7 6/3/2022 19.6 1
280 14 2 4/15/2022 w10 10 8 6/10/2022 20.2 1
281 14 2 4/15/2022 w11 11 9 6/17/2022 21.1 1
282 14 2 4/15/2022 w12 12 10 6/24/2022 21.3 1
283 14 2 4/15/2022 w13 13 11 7/1/2022 19.4 NA
284 14 2 4/15/2022 w14 14 12 7/8/2022 20.3 NA
285 14 2 4/15/2022 w15 15 13 7/15/2022 19.5 1
286 14 2 4/15/2022 w16 16 14 7/22/2022 19.3 1
287 14 2 4/15/2022 w17 17 15 7/29/2022 22.4 1
288 14 2 4/15/2022 w18 18 16 8/5/2022 20.0 1
289 14 2 4/15/2022 w19 19 17 8/12/2022 20.0 1
290 14 2 4/15/2022 w20 20 18 8/19/2022 20.4 1
291 14 2 4/15/2022 w21 21 19 8/26/2022 19.6 1
I calculated a rolling average, but a rolling average isn't quite what I am looking for, since I need averages over distinct periods that start at a different point for each pot.
library(zoo)
library(dplyr)   # for %>%, group_by, mutate

cg22_avg <- cg22_long %>%
  group_by(pot) %>%
  mutate(`3wkavg` = rollmean(height, 3, align = "right", fill = NA))
Perhaps this use of the slider package will help:
quux %>%
  mutate(across(c(germination, date), ~ as.Date(., format = "%m/%d/%Y"))) %>%
  dplyr::filter(date >= germination) %>%
  group_by(pot) %>%
  mutate(avg3 = slider::slide_period_dbl(.x = height, .i = date, .period = "week",
                                         .f = mean, .before = 3)) %>%
  ungroup()
# # A tibble: 103 × 10
# pot table germination week no.week after.germ date height stem avg3
# <int> <int> <date> <chr> <int> <int> <date> <dbl> <int> <dbl>
# 1 3 2 2022-04-15 w2 2 NA 2022-04-15 NA NA NA
# 2 3 2 2022-04-15 w3 3 1 2022-04-22 4.6 NA NA
# 3 3 2 2022-04-15 w4 4 2 2022-04-29 18.5 NA NA
# 4 3 2 2022-04-15 w5 5 3 2022-05-06 18.1 1 NA
# 5 3 2 2022-04-15 w6 6 4 2022-05-13 18.1 1 14.8
# 6 3 2 2022-04-15 w7 7 5 2022-05-20 17.8 1 18.1
# 7 3 2 2022-04-15 w8 8 6 2022-05-26 19.4 1 18.4
# 8 3 2 2022-04-15 w9 9 7 2022-06-03 18.8 1 18.5
# 9 3 2 2022-04-15 w10 10 8 2022-06-10 19.3 1 18.8
# 10 3 2 2022-04-15 w11 11 9 2022-06-17 18.3 1 19.0
# # … with 93 more rows
# # ℹ Use `print(n = ...)` to see more rows
I chose .period = "week" with .before = 3, which includes every observation from within the last three weeks, including the point exactly three weeks back. An alternative is .period = "day" with .before = 20 if you want a three-week window that excludes the observation from exactly three weeks ago. There should be plenty of room to experiment here.
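If instead you want averages over distinct (non-overlapping) three-week blocks rather than a sliding window, here is a minimal sketch of that variant (not from the original answer); it leans on the existing after.germ counter and assumes one row per week, so weeks 1-3 after germination form block 1, weeks 4-6 form block 2, and so on. Each pot's blocks then start at its own germination week:

library(dplyr)

quux %>%
  filter(!is.na(after.germ)) %>%                   # keep post-germination rows only
  mutate(block = (after.germ - 1) %/% 3 + 1) %>%   # weeks 1-3 -> 1, 4-6 -> 2, ...
  group_by(pot, block) %>%
  summarise(avg3 = mean(height, na.rm = TRUE), .groups = "drop")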
Data
quux <- structure(list(pot = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L), table = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), germination = c("4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/29/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022", "4/15/2022"), week = c("w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9", "w10", "w11", "w12", "w13", "w14", "w15", "w16", "w17", "w18", "w19", "w20", "w21", "w22", "w24", "w23", "w25", "w26", "w27", "w28", "w29", "w30", "w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9", "w10", "w11", "w12", "w13", "w14", "w15", "w16", "w17", "w18", "w19", "w20", "w21", "w22", "w24", "w23", "w25", "w26", "w27", "w28", "w29", "w30", "w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9", "w10", "w11", "w12", "w13", "w14", "w15", "w16", "w17", "w18", "w19", "w20", "w21", "w22", "w24", "w23", "w25", "w26", "w27", "w28", "w29", "w30", "w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9", "w10", "w11", "w12", "w13", "w14", "w15", "w16", "w17", "w18", "w19", "w20", "w21"), no.week = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 24L, 23L, 25L, 26L, 27L, 28L, 29L, 30L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 24L, 23L, 25L, 26L, 27L, 28L, 29L, 30L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 18L, 19L, 20L, 21L, 22L, 24L, 23L, 25L, 26L, 27L, 28L, 29L, 30L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), after.germ = c(NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, NA, NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, NA, NA, NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L), date = c("3/25/2022", "4/15/2022", "4/22/2022", "4/29/2022", "5/6/2022", "5/13/2022", "5/20/2022", "5/26/2022", "6/3/2022", "6/10/2022", "6/17/2022", "6/24/2022", "7/1/2022", "7/8/2022", "7/15/2022", "7/22/2022", "7/29/2022", "8/5/2022", "8/12/2022", "8/19/2022", "8/26/2022", "9/2/2022", "9/16/2022", "9/22/2022", "9/30/2022", "10/7/2022", "10/14/2022", "10/21/2022", "10/29/2022", "11/4/2022", "3/25/2022", "4/15/2022", "4/22/2022", "4/29/2022", "5/6/2022", "5/13/2022", "5/20/2022", "5/26/2022", "6/3/2022", "6/10/2022", "6/17/2022", "6/24/2022", "7/1/2022", "7/8/2022", "7/15/2022", "7/22/2022", "7/29/2022", "8/5/2022", "8/12/2022", "8/19/2022", "8/26/2022", "9/2/2022", "9/16/2022", "9/22/2022", "9/30/2022", "10/7/2022", "10/14/2022", "10/21/2022", "10/29/2022", "11/4/2022", "3/25/2022", "4/15/2022", "4/22/2022", "4/29/2022", "5/6/2022", "5/13/2022", "5/20/2022", "5/26/2022", "6/3/2022", "6/10/2022", "6/17/2022", "6/24/2022", "7/1/2022", "7/8/2022", "7/15/2022", "7/22/2022", "7/29/2022", "8/5/2022", "8/12/2022", "8/19/2022", "8/26/2022", "9/2/2022", "9/16/2022", "9/22/2022", "9/30/2022", "10/7/2022", "10/14/2022", "10/21/2022", "10/29/2022", "11/4/2022", "3/25/2022", "4/15/2022", "4/22/2022", "4/29/2022", "5/6/2022", "5/13/2022", "5/20/2022", "5/26/2022", "6/3/2022", "6/10/2022", "6/17/2022", "6/24/2022", "7/1/2022", "7/8/2022", "7/15/2022", "7/22/2022", "7/29/2022", "8/5/2022", "8/12/2022", "8/19/2022", "8/26/2022"), height = c(NA, NA, 4.6, 18.5, 18.1, 18.1, 17.8, 19.4, 18.8, 19.3, 18.3, 18.6, 19.2, 19.2, 18.9, 15.3, 19.1, 19, 19, 19.8, 18.2, 19.2, 18.1, 19.2, 15.4, 18.4, 19.2, 19, 18.7, 19.3, NA, NA, NA, 16.7, 17.5, 18.8, 18, 17.2, 17.7, 17.9, 18.7, 18.1, 17.3, 13.8, 18.4, 19, 18.8, NA, 19, 19.3, 18.6, 18.2, 18, 18.8, 19.7, 17.4, 18.8, 19.9, 17.9, 19.5, NA, NA, NA, NA, 9.5, 15.4, 14.3, 15.8, 16.1, 16.1, 15.9, 16.3, 16.2, 16.4, 15.7, 15.5, 15.7, 15.5, 16, 15.9, 15.7, 15.5, 15.1, 15.8, 15.8, 15.1, 15.9, 16.5, 15.7, 16.2, NA, NA, 5.8, 19.7, 20.1, 19.4, 20, 19.6, 19.6, 20.2, 21.1, 21.3, 19.4, 20.3, 19.5, 19.3, 22.4, 20, 20, 20.4, 19.6), stem = c(NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c("61", "62", "63", "64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90", "91", "92", "93", "94", "95", "96", "97", "98", "99", "100", "101", "102", "103", "104", "105", "106", "107", "108", 
"109", "110", "111", "112", "113", "114", "115", "116", "117", "118", "119", "120", "211", "212", "213", "214", "215", "216", "217", "218", "219", "220", "221", "222", "223", "224", "225", "226", "227", "228", "229", "230", "231", "232", "233", "234", "235", "236", "237", "238", "239", "240", "271", "272", "273", "274", "275", "276", "277", "278", "279", "280", "281", "282", "283", "284", "285", "286", "287", "288", "289", "290", "291"))
Related
R: ranking variable per trial according to time column
My data looks like this:

Subject Trial Task Time Fixation
..
 1  1  2    1 0.335
 1  1  2  456    NA
 1  1  2  765 0.165
 1  1  2  967 0.445
..
 2  3  1    1 0.665
 2  3  1  300 0.556
 2  3  1  570    NA
 2  3  1  900    NA
..
15  5  3    1 0.766
15  5  3  567 0.254
15  5  3  765 0.167
15  5  3 1465    NA
..

I want to create a column FixationID where I rank every Fixation per Trial according to the Time column (1, 2, 3, 4, ...). The Time column shows the time course in milliseconds for every trial, and every Trial starts with 1. Trials have different lengths. I want my data to look like this:

Subject Trial Task Time Fixation FixationID
..
 1  1  2    1 0.335   1
 1  1  2  456    NA  NA
 1  1  2  765 0.165   2
 1  1  2  967 0.445   3
..
 2  3  1    1 0.665   1
 2  3  1  300 0.556   2
 2  3  1  570    NA  NA
 2  3  1  900    NA  NA
..
15  5  3    1 0.766   1
15  5  3  567 0.254   2
15  5  3  765 0.167   3
15  5  3 1465    NA  NA
..

I tried

library(data.table)
setDT(mydata)[!is.na(Fixation), FixID := seq_len(.N)[order(Time)], by = Trial]

but what I get is the ranking 1, 16, 31, 45, 57, ... for my Subject 1 Trial 1. I want 1, 2, 3, 4, 5, ... Can anyone help me with this?

Excerpt from my data:

structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Trial = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Task = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Time = c(1L, 385L, 571L, 638L, 951L, 1020L, 1349L, 1401L, 1661L, 1706L, 2042L, 2067L, 2322L, 2375L, 2540L, 2660L, 2686L, 3108L, 3172L, 3423L, 3462L, 3845L, 3870L, 3969L, 4099L, 4132L, 1L, 471L, 513L, 697L), Fixation = c(0.383, 0.185, NA, 0.312, NA, 0.328, NA, 0.259, NA, 0.335, NA, 0.254, NA, 0.164, 0.119, NA, 0.421, NA, 0.25, NA, 0.382, NA, 0.0979999999999999, 0.129, NA, 0.335, 0.469, NA, 0.183, NA)), .Names = c("Subject", "Trial", "Task", "Time", "Fixation"), row.names = c(NA, 30L), class = "data.frame")
What about this:

library(data.table)
setDT(mydata)
mydata[!is.na(Fixation), FixID := frank(Time), by = Trial]

head(mydata, 10)
    Subject Trial Task Time Fixation FixID
 1:       1     1    2    1    0.383     1
 2:       1     1    2  385    0.185     2
 3:       1     1    2  571       NA    NA
 4:       1     1    2  638    0.312     3
 5:       1     1    2  951       NA    NA
 6:       1     1    2 1020    0.328     4
 7:       1     1    2 1349       NA    NA
 8:       1     1    2 1401    0.259     5
 9:       1     1    2 1661       NA    NA
10:       1     1    2 1706    0.335     6

tail(mydata, 10)
    Subject Trial Task Time Fixation FixID
 1:       1     1    2 3462    0.382    12
 2:       1     1    2 3845       NA    NA
 3:       1     1    2 3870    0.098    13
 4:       1     1    2 3969    0.129    14
 5:       1     1    2 4099       NA    NA
 6:       1     1    2 4132    0.335    15
 7:       1     2    2    1    0.469     1
 8:       1     2    2  471       NA    NA
 9:       1     2    2  513    0.183     2
10:       1     2    2  697       NA    NA
Using ave on as.logical(Fixation) and @josliber's NA-ignoring cumsum code:

mydata$FixationID <- with(mydata, ave(as.logical(Fixation), Subject, Trial,
                                      FUN = function(x) cumsum(ifelse(is.na(x), 0, x)) + x*0))

Result

mydata
#    Subject Trial task Time Fixation FixationID
# 1        1     1    1    1    0.596          1
# 10       1     1    1  500    0.016          2
# 19       1     1    1  512       NA         NA
# 28       1     1    1  524       NA         NA
# 4        1     2    2    1    0.688          1
# 13       1     2    2  501       NA         NA
# 22       1     2    2  513       NA         NA
# 31       1     2    2  525       NA         NA
# 7        1     3    3    1    0.582          1
# 16       1     3    3  502       NA         NA
# 25       1     3    3  514    0.369          2
# 34       1     3    3  526    0.847          3
# 2        2     1    1    1       NA         NA
# 11       2     1    1  503    0.779          1
# 20       2     1    1  515    0.950          2
# 29       2     1    1  527    0.304          3
# 5        2     2    2    1    0.158          1
# 14       2     2    2  504    0.281          2
# 23       2     2    2  516    0.360          3
# 32       2     2    2  528    0.535          4
# 8        2     3    3    1       NA         NA
# 17       2     3    3  505    0.717          1
# 26       2     3    3  517       NA         NA
# 35       2     3    3  529    0.959          2
# 3        3     1    1    1    0.174          1
# 12       3     1    1  506    0.278          2
# 21       3     1    1  518    0.784          3
# 30       3     1    1  530       NA         NA
# 6        3     2    2    1    0.439          1
# 15       3     2    2  507    0.857          2
# 24       3     2    2  519       NA         NA
# 33       3     2    2  531    0.019          3
# 9        3     3    3    1    0.175          1
# 18       3     3    3  508    0.314          2
# 27       3     3    3  520       NA         NA
# 36       3     3    3  532    0.845          3

Data

mydata <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), Trial = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), task = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3), Time = c(1, 500, 512, 524, 1, 501, 513, 525, 1, 502, 514, 526, 1, 503, 515, 527, 1, 504, 516, 528, 1, 505, 517, 529, 1, 506, 518, 530, 1, 507, 519, 531, 1, 508, 520, 532), Fixation = c(0.596, 0.016, NA, NA, 0.688, NA, NA, NA, 0.582, NA, 0.369, 0.847, NA, 0.779, 0.95, 0.304, 0.158, 0.281, 0.36, 0.535, NA, 0.717, NA, 0.959, 0.174, 0.278, 0.784, NA, 0.439, 0.857, NA, 0.019, 0.175, 0.314, NA, 0.845)), row.names = c(1L, 10L, 19L, 28L, 4L, 13L, 22L, 31L, 7L, 16L, 25L, 34L, 2L, 11L, 20L, 29L, 5L, 14L, 23L, 32L, 8L, 17L, 26L, 35L, 3L, 12L, 21L, 30L, 6L, 15L, 24L, 33L, 9L, 18L, 27L, 36L), class = "data.frame")
Here is another option which should be fast:

setDT(mydata)[!is.na(Fixation), FixID := .SD[order(Trial, Time), rowid(Trial)]]
mydata
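For comparison, a minimal dplyr sketch of the same per-trial ranking (not from the original answers); it assumes rows are already ordered by Time within each Trial, as in the posted data:

library(dplyr)
mydata %>%
  group_by(Subject, Trial) %>%
  mutate(FixationID = ifelse(is.na(Fixation), NA,
                             cumsum(!is.na(Fixation)))) %>%  # running count of non-NA rows
  ungroup()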
delete observations by days in R
My dataset has the following structure:

df = structure(list(Data = structure(c(12L, 13L, 14L, 15L, 16L, 17L, 18L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("01.01.2018", "02.01.2018", "03.01.2018", "04.01.2018", "05.01.2018", "06.01.2018", "07.01.2018", "12.02.2018", "13.02.2018", "14.02.2018", "15.02.2018", "25.12.2017", "26.12.2017", "27.12.2017", "28.12.2017", "29.12.2017", "30.12.2017", "31.12.2017"), class = "factor"), sku = 1:18, metric = c(100L, 210L, 320L, 430L, 540L, 650L, 760L, 870L, 980L, 1090L, 1200L, 1310L, 1420L, 1530L, 1640L, 1750L, 1860L, 1970L), action = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("Data", "sku", "metric", "action"), class = "data.frame", row.names = c(NA, -18L))

I need to delete observations that fall on certain dates. But in this dataset there is an action variable, which has only the two values 0 and 1, and observations on these dates should be deleted only for the zero category of action. The dates are presented in a separate dataset:

datedata = structure(list(Data = structure(c(18L, 19L, 20L, 21L, 22L, 5L, 7L, 9L, 11L, 13L, 15L, 17L, 23L, 1L, 2L, 3L, 4L, 6L, 8L, 10L, 12L, 14L, 16L), .Label = c("01.05.2018", "02.05.2018", "03.05.2018", "04.05.2018", "05.03.2018", "05.05.2018", "06.03.2018", "06.05.2018", "07.03.2018", "07.05.2018", "08.03.2018", "08.05.2018", "09.03.2018", "09.05.2018", "10.03.2018", "10.05.2018", "11.03.2018", "21.02.2018", "22.02.2018", "23.02.2018", "24.02.2018", "25.02.2018", "30.04.2018"), class = "factor")), .Names = "Data", class = "data.frame", row.names = c(NA, -23L))

How can I do it?
A solution is to use dplyr::filter as:

library(dplyr)
library(lubridate)

df %>%
  mutate(Data = dmy(Data)) %>%
  filter(action == 1 | (action == 0 & !(Data %in% dmy(datedata$Data))))
#          Data sku metric action
# 1  2017-12-25   1    100      0
# 2  2017-12-26   2    210      0
# 3  2017-12-27   3    320      0
# 4  2017-12-28   4    430      0
# 5  2017-12-29   5    540      0
# 6  2017-12-30   6    650      0
# 7  2017-12-31   7    760      0
# 8  2018-01-01   8    870      0
# 9  2018-01-02   9    980      1
# 10 2018-01-03  10   1090      1
# 11 2018-01-04  11   1200      1
# 12 2018-01-05  12   1310      1
# 13 2018-01-06  13   1420      1
# 14 2018-01-07  14   1530      1
# 15 2018-02-12  15   1640      1
# 16 2018-02-13  16   1750      1
# 17 2018-02-14  17   1860      1
# 18 2018-02-15  18   1970      1
I guess this will work. First use match to see whether the date in df has a match in datedata, then filter, keeping rows that either have no match or belong to action == 1:

library(dplyr)

df <- df %>%
  mutate(Data.flag = match(Data, datedata$Data)) %>%
  filter(is.na(Data.flag) | action == 1)
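Another way to express "drop these dates, but only where action is 0" is an anti-join on both columns; a minimal sketch (not from the original answers), assuming df and datedata as above:

library(dplyr)
library(lubridate)

# rows to drop: the listed dates, tagged with action = 0
to_drop <- datedata %>% mutate(Data = dmy(Data), action = 0L)

df %>%
  mutate(Data = dmy(Data)) %>%
  anti_join(to_drop, by = c("Data", "action"))  # removes only action == 0 rows on those dates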
Merge data frames within a list
I have a list which looks like:

lapply(sample_list, head, 3)
$`2016-04-24 00:00:00.tcp`
   ports freq
8    443  296
12    80  170
5     23   92

$`2016-04-24 00:00:00.udp`
  ports freq
4   161  138
7    53   45
1   123   28

$`2016-04-24 01:00:00.tcp`
   ports freq
13   443  342
20    80  215
10    25   60

$`2016-04-24 01:00:00.udp`
   ports freq
4    161   85
8     53   42
12   902   27

I want to merge the data frames that come from the same protocol (i.e. the tcp together and the udp together), so the final result would be a new list with two data frames, one for tcp and one for udp, such that:

lapply(final_list, head, 3)
$tcp
  ports freq.00:00:00 freq.01:00:00
1   443           296           342
2    80           170           215
3    23            92            51

$udp
  ports freq.00:00:00 freq.01:00:00
1   161           138            85
2    53            45            42
3   123            28            19

DATA

dput(sample_list)
structure(list(`2016-04-24 00:00:00.tcp` = structure(list(ports = c("443", "80", "23", "21", "22", "25", "445", "110", "389", "135", "465", "514", "91", "995", "84", "902"), freq = structure(c(296L, 170L, 92L, 18L, 16L, 15L, 14L, 4L, 3L, 2L, 2L, 2L, 2L, 2L, 1L, 1L), .Dim = 16L)), .Names = c("ports", "freq"), row.names = c(8L, 12L, 5L, 3L, 4L, 6L, 9L, 1L, 7L, 2L, 10L, 11L, 15L, 16L, 13L, 14L), class = "data.frame"), `2016-04-24 00:00:00.udp` = structure(list(ports = c("161", "53", "123", "902", "137", "514", "138", "623", "69", "88", "500"), freq = structure(c(138L, 45L, 28L, 26L, 24L, 24L, 6L, 6L, 5L, 4L, 1L), .Dim = 11L)), .Names = c("ports", "freq"), row.names = c(4L, 7L, 1L, 11L, 2L, 6L, 3L, 8L, 9L, 10L, 5L), class = "data.frame"), `2016-04-24 01:00:00.tcp` = structure(list(ports = c("443", "80", "25", "23", "88", "21", "161", "22", "445", "135", "389", "993", "548", "110", "143", "502", "514", "81", "995", "102", "111", "311", "444", "789", "902", "91"), freq = structure(c(342L, 215L, 60L, 51L, 42L, 32L, 31L, 18L, 18L, 6L, 5L, 4L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Dim = 26L)), .Names = c("ports", "freq"), row.names = c(13L, 20L, 10L, 9L, 22L, 7L, 6L, 8L, 15L, 4L, 12L, 25L, 18L, 2L, 5L, 16L, 17L, 21L, 26L, 1L, 3L, 11L, 14L, 19L, 23L, 24L), class = "data.frame"), `2016-04-24 01:00:00.udp` = structure(list(ports = c("161", "53", "902", "514", "123", "137", "69", "138", "389", "443", "88", "623"), freq = structure(c(85L, 42L, 27L, 24L, 19L, 15L, 15L, 4L, 2L, 2L, 2L, 1L), .Dim = 12L)), .Names = c("ports", "freq"), row.names = c(4L, 8L, 12L, 7L, 1L, 2L, 10L, 3L, 5L, 6L, 11L, 9L), class = "data.frame")), .Names = c("2016-04-24 00:00:00.tcp", "2016-04-24 00:00:00.udp", "2016-04-24 01:00:00.tcp", "2016-04-24 01:00:00.udp"))

Bonus question: What is the structure of freq? I have never seen int [1:16(1d)] before.

str(sample_list$`2016-04-24 00:00:00.tcp`)
'data.frame': 16 obs. of 2 variables:
 $ ports: chr "443" "80" "23" "21" ...
 $ freq : int [1:16(1d)] 296 170 92 18 16 15 14 4 3 2 ...

The code I used to create the list (in this case called try1):

protocol_list <- lapply(per_hour1, function(i) split(i, i$protocol))
Analytic_Protocol_List <- lapply(protocol_list, function(i) lapply(i, dest.ports))
try1 <- lapply(unlist(Analytic_Protocol_List, recursive = FALSE), `[[`, 1)

Note that solutions from similar questions do not work for this case, maybe because of the structure.
Another alternative:

library(dplyr)
library(tidyr)

data.table::melt(sample_list) %>%
  separate(L1, into = c("time", "protocol"), sep = "\\.") %>%
  unite(f, variable, time) %>%
  spread(f, value) %>%
  split(.$protocol)

Which, using your data, gives:

$tcp
   ports protocol freq_2016-04-24 00:00:00 freq_2016-04-24 01:00:00
1    102      tcp                       NA                        1
2    110      tcp                        4                        2
3    111      tcp                       NA                        1
5    135      tcp                        2                        6
8    143      tcp                       NA                        2
9    161      tcp                       NA                       31
11    21      tcp                       18                       32
12    22      tcp                       16                       18
13    23      tcp                       92                       51
14    25      tcp                       15                       60
15   311      tcp                       NA                        1
16   389      tcp                        3                        5
18   443      tcp                      296                      342
20   444      tcp                       NA                        1
21   445      tcp                       14                       18
22   465      tcp                        2                       NA
24   502      tcp                       NA                        2
25   514      tcp                        2                        2
28   548      tcp                       NA                        3
31   789      tcp                       NA                        1
32    80      tcp                      170                      215
33    81      tcp                       NA                        2
34    84      tcp                        1                       NA
35    88      tcp                       NA                       42
37   902      tcp                        1                        1
39    91      tcp                        2                        1
40   993      tcp                       NA                        4
41   995      tcp                        2                        2

$udp
   ports protocol freq_2016-04-24 00:00:00 freq_2016-04-24 01:00:00
4    123      udp                       28                       19
6    137      udp                       24                       15
7    138      udp                        6                        4
10   161      udp                      138                       85
17   389      udp                       NA                        2
19   443      udp                       NA                        2
23   500      udp                        1                       NA
26   514      udp                       24                       24
27    53      udp                       45                       42
29   623      udp                        6                        1
30    69      udp                        5                       15
36    88      udp                        4                        2
38   902      udp                       26                       27

Update: If you want to sort by freq, you could do:

data.table::melt(sample_list) %>%
  separate(L1, into = c("time", "protocol"), sep = "\\.") %>%
  unite(f, variable, time) %>%
  spread(f, value) %>%
  arrange(protocol, desc(`freq_2016-04-24 00:00:00`)) %>%
  split(.$protocol)
For the rbinding you can try the following:

do.call(rbind, sample_list[grep("tcp", names(sample_list))])

and:

do.call(rbind, sample_list[grep("udp", names(sample_list))])

And, as refined by Marat below:

d <- do.call(rbind, sample_list)
d2 <- data.frame(d, do.call(rbind, strsplit(rownames(d), '[.]')))
lapply(split(d2, d2$X2), dcast, ports ~ X1, value.var = 'freq')
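If you prefer to end up with one wide data frame per protocol directly, a minimal Reduce/merge sketch along these lines should also work (not from the original answers; it assumes sample_list as in the question, and merge will suffix the freq columns as freq.x/freq.y rather than by hour):

merge_protocol <- function(proto) {
  dfs <- sample_list[grep(proto, names(sample_list))]       # all data frames for this protocol
  Reduce(function(x, y) merge(x, y, by = "ports", all = TRUE), dfs)  # full outer join on ports
}
final_list <- setNames(lapply(c("tcp", "udp"), merge_protocol), c("tcp", "udp"))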
You can also merge by an ID. Create an ID for each row of each data frame in the list, e.g.

sample_list[[1]]$ID <- seq_len(nrow(sample_list[[1]]))

and do the same for the second, third, ..., Nth element, then merge them:

newx <- merge(sample_list[[1]], sample_list[[2]], by = "ID")

Since the merge is by ID, no overlapping will occur; just treat each element of the list as a data frame in itself.
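As for the bonus question: int [1:16(1d)] is a one-dimensional array, i.e. an integer vector carrying a dim attribute of length 1, which is what functions like table() and tapply() return. Stripping the dim gives a plain vector:

x <- sample_list$`2016-04-24 00:00:00.tcp`$freq
str(x)            # int [1:16(1d)] 296 170 92 ...
str(as.vector(x)) # int [1:16] 296 170 92 ...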
I want several data series on one continuous Y-axis
I have the following data set:

means_long <- rbindlist(means, use.names = FALSE, fill = FALSE, idcol = "ID")
    ID divisions divs_20 mean20
 1:  1         1      20   19
 2:  1         2      20   19
 3:  1         3      20   19.3
 4:  1         4      20   20.2
 5:  1         5      20   19.2
 6:  1         6      20   18.5
 7:  1         7      20   19.1
 8:  1         8      20   17.8
 9:  1         9      20   19.6
10:  1        10      20   19.9
11:  1        11      20   20.7
12:  1        12      20   21.4
13:  1        13      20   21.4
14:  1        14      20   20.6
15:  1        15      20   22.2
16:  1        16      20   23.1
17:  1        17      20   22.5
18:  1        18      20   23.3
19:  1        19      20   24.4
20:  1        20      20   24.4
21:  2         1      15   14.9
22:  2         2      15   14.8
23:  2         3      15   14.2
24:  2         4      15   12.9
25:  2         5      15   12.2
26:  2         6      15   12.9
27:  2         7      15   13.3
28:  2         8      15   13.6
29:  2         9      15   12.7
30:  2        10      15   12.9
31:  2        11      15   12
32:  2        12      15   12.7
33:  2        13      15   12.9
34:  2        14      15   14.7
35:  2        15      15   15
36:  2        16      15   15
37:  2        17      15   16.7
38:  2        18      15   17.1
39:  2        19      15   18.9
40:  2        20      15   18.6
41:  3         1      10    8.5
42:  3         2      10    8.4
43:  3         3      10    9.3
44:  3         4      10    8.4
45:  3         5      10    7.8
46:  3         6      10    7.9
47:  3         7      10    7.8
48:  3         8      10    7.8
49:  3         9      10    7.5
50:  3        10      10    6.7
51:  3        11      10    6.1
52:  3        12      10    6.2
53:  3        13      10    6.4
54:  3        14      10    5.8
55:  3        15      10    5.5
56:  3        16      10    5.1
57:  3        17      10    5.4
58:  3        18      10    5.5
59:  3        19      10    5.8
60:  3        20      10    6.3
61:  4         1       5    4.9
62:  4         2       5    5.3
63:  4         3       5    5.5
64:  4         4       5    5.2
65:  4         5       5    5.2

I'm trying to create a graph that shows the mean after each division. I have 5 sets of data (each of 20 observations) and I want the graph to have 5 differently coloured lines for comparison. I've been using:

ggplot(means_long, aes(x = divisions, y = mean20, colour = ID, group = ID)) +
  geom_line() +
  geom_point() +
  ggtitle("Average number of kinetochore subunits after 20 cell divisions") +
  xlab("Number of divisions") +
  ylab("Mean number of kinetochore subunits") +
  scale_colour_continuous(low = "#132B43", high = "#56B1F7", space = "Lab",
                          na.value = "grey50", guide = "legend")

This creates a graph where each set of data is plotted individually on the y-axis. All data points match up on the x-axis, but, for example, rather than the y-axis spanning 0-12, each data set spans 1-12, and so 1-12 is repeated 5 times up the y-axis. I think it has something to do with the way the data frame is laid out, but I can't work out how to change it.
Output for dput(means_long): structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), divisions = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), divs_20 = c(20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), mean20 = structure(c(3L, 3L, 6L, 9L, 5L, 2L, 4L, 1L, 7L, 8L, 11L, 12L, 12L, 10L, 13L, 15L, 14L, 16L, 17L, 17L, 27L, 26L, 24L, 21L, 19L, 21L, 22L, 23L, 20L, 21L, 18L, 20L, 21L, 25L, 28L, 28L, 29L, 30L, 32L, 31L, 46L, 45L, 47L, 45L, 43L, 44L, 43L, 43L, 42L, 41L, 37L, 38L, 40L, 36L, 35L, 33L, 34L, 35L, 36L, 39L, 48L, 51L, 35L, 50L, 50L, 34L, 49L, 33L, 35L, 53L, 52L, 54L, 50L, 33L, 53L, 36L, 36L, 53L, 52L, 52L, 63L, 62L, 63L, 64L, 65L, 65L, 61L, 60L, 60L, 61L, 61L, 60L, 59L, 58L, 57L, 56L, 56L, 56L, 56L, 55L ), .Label = c("17.8", "18.5", "19", "19.1", "19.2", "19.3", "19.6", "19.9", "20.2", "20.6", "20.7", "21.4", "22.2", "22.5", "23.1", "23.3", "24.4", "12", "12.2", "12.7", "12.9", "13.3", "13.6", "14.2", "14.7", "14.8", "14.9", "15", "16.7", "17.1", "18.6", "18.9", "5.1", "5.4", "5.5", "5.8", "6.1", "6.2", "6.3", "6.4", "6.7", "7.5", "7.8", "7.9", "8.4", "8.5", "9.3", "4.9", "5", "5.2", "5.3", "5.6", "5.7", "5.9", "0.1", "0.2", "0.3", "0.4", "0.7", "0.8", "1", "1.4", "1.6", "1.7", "1.9"), class = "factor")), .Names = c("ID", "divisions", "divs_20", "mean20"), row.names = c(NA, -100L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000000220788>)
My y-axis was a factor rather than a numerical value, so the code below changed it to numeric:

means_long$mean20 <- as.numeric(as.character(means_long$mean20))

Now my graph has one y-axis scale for all 5 data sets.
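The underlying pitfall is worth spelling out: as.numeric() applied directly to a factor returns the internal level codes, not the labels, which is why the as.character() step is needed. A small illustration:

f <- factor(c("19", "19.3", "12"))
as.numeric(f)                # 2 3 1   -- level codes, not the values
as.numeric(as.character(f))  # 19.0 19.3 12.0   -- the actual values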
Sorting and ranking a dataframe by date and time in R
I have a dataframe as below. Originally it had just two columns/variables, "Timestamp" (which contains date and time) and "Actor". I broke the "Timestamp" variable down into "date" and "time", and then broke "time" further down into "hours" and "mins". This gives the following structure:

dataf <- structure(list(hours = structure(c(3L, 4L, 4L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 1L, 1L, 2L, 2L), .Label = c("9", "12", "14", "15", "16", "17"), class = "factor"), mins = structure(c(17L, 1L, 2L, 14L, 15L, 16L, 3L, 4L, 6L, 6L, 7L, 9L, 9L, 13L, 13L, 10L, 11L, 12L, 2L, 5L, 8L, 8L), .Label = c("00", "04", "08", "09", "10", "12", "13", "18", "19", "20", "21", "22", "27", "39", "51", "52", "59"), class = "factor"), date = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 1L, 1L, 1L, 1L), .Label = c("4/28/2014", "5/18/2014", "5/2/2014", "5/6/2014"), class = "factor"), time = structure(c(7L, 8L, 9L, 4L, 5L, 6L, 13L, 14L, 15L, 15L, 16L, 2L, 2L, 3L, 3L, 10L, 11L, 12L, 17L, 18L, 1L, 1L), .Label = c("12:18", "12:19", "12:27", "14:39", "14:51", "14:52", "14:59", "15:00", "15:04", "16:20", "16:21", "16:22", "17:08", "17:09", "17:12", "17:13", "9:04", "9:10"), class = "factor"), Timestamp = structure(c(13L, 14L, 15L, 10L, 11L, 12L, 6L, 7L, 8L, 8L, 9L, 2L, 2L, 3L, 3L, 16L, 17L, 18L, 4L, 5L, 1L, 1L), .Label = c("4/28/2014 12:18", "4/28/2014 12:19", "4/28/2014 12:27", "4/28/2014 9:04", "4/28/2014 9:10", "5/18/2014 17:08", "5/18/2014 17:09", "5/18/2014 17:12", "5/18/2014 17:13", "5/2/2014 14:39", "5/2/2014 14:51", "5/2/2014 14:52", "5/2/2014 14:59", "5/2/2014 15:00", "5/2/2014 15:04", "5/6/2014 16:20", "5/6/2014 16:21", "5/6/2014 16:22"), class = "factor"), Actor = c(7L, 7L, 7L, 7L, 7L, 7L, 5L, 5L, 2L, 12L, 2L, 7L, 7L, 7L, 7L, 10L, 10L, 10L, 7L, 10L, 7L, 7L)), .Names = c("hours", "mins", "date", "time", "Timestamp", "Actor"), row.names = c(NA, -22L), class = "data.frame")

The reason for breaking the timestamp and time variables down into separate variables was that in my real data I have had a lot of problems sorting by date and/or time; breaking these variables down into smaller chunks has made it much easier to sort. What I would like to do now is create a new variable called "Rank", which would return a '1' for the earliest event in the dataframe (the observation at 9.04am on 28 April 2014), then a '2' for the next observation in date/time order, and so on.

Sorting the dataframe appears to be relatively trivial:

dataf <- dataf[order(as.Date(dataf$date, format = "%m/%d/%Y"), dataf$hours, dataf$mins),]

This does the job. But what I am struggling with now is how to assign the ranks. I tried the following, because I have used 'ave' in combination with FUN = rank to rank integers before, but what it produces is laughably wrong:

dataf$rank <- ave((dataf[order(as.Date(dataf$date, format = "%m/%d/%Y"), dataf$hours, dataf$mins),]), FUN = rank)

Any help appreciated.
I do not share your aversion to datetime objects, which makes this all much simpler:

dataf$ts <- strptime(as.character(dataf$Timestamp), '%m/%d/%Y %H:%M')
dataf <- dataf[order(dataf$ts),]
dataf$ts_rank <- rank(dataf$ts, ties.method = "min")
dataf
##    hours mins      date  time       Timestamp Actor                  ts ts_rank
## 19     9   04 4/28/2014  9:04  4/28/2014 9:04     7 2014-04-28 09:04:00       1
## 20     9   10 4/28/2014  9:10  4/28/2014 9:10    10 2014-04-28 09:10:00       2
## 21    12   18 4/28/2014 12:18 4/28/2014 12:18     7 2014-04-28 12:18:00       3
## 22    12   18 4/28/2014 12:18 4/28/2014 12:18     7 2014-04-28 12:18:00       3
## 12    12   19 4/28/2014 12:19 4/28/2014 12:19     7 2014-04-28 12:19:00       5
## 13    12   19 4/28/2014 12:19 4/28/2014 12:19     7 2014-04-28 12:19:00       5
## 14    12   27 4/28/2014 12:27 4/28/2014 12:27     7 2014-04-28 12:27:00       7
## 15    12   27 4/28/2014 12:27 4/28/2014 12:27     7 2014-04-28 12:27:00       7
## 4     14   39  5/2/2014 14:39  5/2/2014 14:39     7 2014-05-02 14:39:00       9
## 5     14   51  5/2/2014 14:51  5/2/2014 14:51     7 2014-05-02 14:51:00      10
## 6     14   52  5/2/2014 14:52  5/2/2014 14:52     7 2014-05-02 14:52:00      11
## 1     14   59  5/2/2014 14:59  5/2/2014 14:59     7 2014-05-02 14:59:00      12
## 2     15   00  5/2/2014 15:00  5/2/2014 15:00     7 2014-05-02 15:00:00      13
## 3     15   04  5/2/2014 15:04  5/2/2014 15:04     7 2014-05-02 15:04:00      14
## 16    16   20  5/6/2014 16:20  5/6/2014 16:20    10 2014-05-06 16:20:00      15
## 17    16   21  5/6/2014 16:21  5/6/2014 16:21    10 2014-05-06 16:21:00      16
## 18    16   22  5/6/2014 16:22  5/6/2014 16:22    10 2014-05-06 16:22:00      17
## 7     17   08 5/18/2014 17:08 5/18/2014 17:08     5 2014-05-18 17:08:00      18
## 8     17   09 5/18/2014 17:09 5/18/2014 17:09     5 2014-05-18 17:09:00      19
## 9     17   12 5/18/2014 17:12 5/18/2014 17:12     2 2014-05-18 17:12:00      20
## 10    17   12 5/18/2014 17:12 5/18/2014 17:12    12 2014-05-18 17:12:00      20
## 11    17   13 5/18/2014 17:13 5/18/2014 17:13     2 2014-05-18 17:13:00      22
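If you would rather have consecutive ranks, where tied timestamps share a value and no numbers are skipped (1, 2, 3, ... instead of 3 jumping to 5), a dense-rank variant is a small change; a sketch assuming dataf$ts from above:

ts <- as.POSIXct(dataf$ts)                     # convert POSIXlt to POSIXct for easy matching
dataf$ts_dense <- match(ts, sort(unique(ts)))  # position among the distinct timestamps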