I have this data frame, (df1):
Month index
1 2015-09-01 1.21418847
2 2015-08-01 -4.37919039
3 2015-07-01 -1.16004624
4 2015-06-01 -1.09754890
5 2015-05-01 -4.37919039
6 2015-04-01 -4.37919039
7 2015-03-01 4.37919039
8 2015-02-01 4.37919039
9 2015-01-01 -0.11285150
10 2014-12-01 0.45712044
11 2014-11-01 0.97597018
12 2014-10-01 0.87560496
13 2014-09-01 0.66278156
14 2014-08-01 4.37919039
15 2014-07-01 1.15440685
16 2014-06-01 1.38021497
17 2014-05-01 1.67663242
18 2014-04-01 2.08358406
19 2014-03-01 2.50222843
20 2014-02-01 2.71665822
21 2014-01-01 3.13692051
22 2013-12-01 2.91702023
23 2013-11-01 3.02603774
24 2013-10-01 2.55812363
25 2013-09-01 3.12586325
26 2013-08-01 3.26063617
27 2013-07-01 2.91702023
28 2013-06-01 3.15504505
29 2013-05-01 2.53958494
30 2013-04-01 2.61528861
31 2013-03-01 2.84742861
32 2013-02-01 2.82097624
33 2013-01-01 2.53196473
34 2012-12-01 2.35786991
35 2012-11-01 2.40611260
36 2012-10-01 2.42408844
37 2012-09-01 2.91702023
38 2012-08-01 2.33372249
39 2012-07-01 2.00140636
40 2012-06-01 2.24721387
41 2012-05-01 1.89189602
42 2012-04-01 1.98807663
43 2012-03-01 1.89563925
44 2012-02-01 1.19541625
45 2012-01-01 2.91702023
46 2011-12-01 0.29072412
47 2011-11-01 -2.91702023
48 2011-10-01 -2.91702023
49 2011-09-01 -0.36402331
50 2011-08-01 -0.55409805
51 2011-07-01 -0.05902839
52 2011-06-01 -0.03946940
53 2011-05-01 0.30898661
54 2011-04-01 2.91702023
55 2011-03-01 0.80556310
56 2011-02-01 1.07001901
57 2011-01-01 2.91702023
58 2010-12-01 1.34682208
59 2010-11-01 1.30446466
60 2010-10-01 0.97753435
61 2010-09-01 0.90434619
62 2010-08-01 0.80415571
63 2010-07-01 1.41129808
64 2010-06-01 2.03576435
65 2010-05-01 2.85757135
66 2010-04-01 2.91702023
67 2010-03-01 3.96563441
68 2010-02-01 4.37919039
69 2010-01-01 4.57358010
70 2009-12-01 4.63589893
71 2009-11-01 4.40042885
72 2009-10-01 4.21359930
73 2009-09-01 4.10739350
74 2009-08-01 2.91702023
75 2009-07-01 3.85460338
76 2009-06-01 3.07796824
77 2009-05-01 2.91702023
78 2009-04-01 1.90359672
79 2009-03-01 0.68355248
80 2009-02-01 0.36218125
81 2009-01-01 -0.50814101
82 2008-12-01 0.49310633
83 2008-11-01 2.98877210
84 2008-10-01 2.28716199
85 2008-09-01 0.61433048
86 2008-08-01 0.51258623
87 2008-07-01 1.74079440
88 2008-06-01 2.91702023
89 2008-05-01 1.60899848
90 2008-04-01 2.01574569
91 2008-03-01 1.81341196
92 2008-02-01 1.48482933
93 2008-01-01 1.89122725
94 2007-12-01 1.84400308
95 2007-11-01 1.23545695
96 2007-10-01 0.44341718
97 2007-09-01 0.55630846
98 2007-08-01 0.42806839
99 2007-07-01 -0.75234218
100 2007-06-01 -1.44397151
101 2007-05-01 -2.10673018
102 2007-04-01 -1.40817350
103 2007-03-01 -0.73608848
104 2007-02-01 -0.69200513
105 2007-01-01 -0.51056142
106 2006-12-01 -0.40504212
107 2006-11-01 -0.04161989
108 2006-10-01 -0.10478629
109 2006-09-01 0.07423530
110 2006-08-01 0.13076121
111 2006-07-01 2.91702023
112 2006-06-01 1.02865488
113 2006-05-01 -0.08979180
114 2006-04-01 -1.52792341
115 2006-03-01 -2.52839603
116 2006-02-01 -3.39026284
117 2006-01-01 -3.04045769
I want to calculate the quarterly mean for each year. This should result in a data frame with 39 rows.
I used this code to compute the quarterly mean:
final<-df1[, mean(index), by = quarterly(Month)]
The error message is:
Error in `[.data.frame`(df1, , mean(index), :
unused argument (by = month(Month))
Information:
class(df1$index)
"numeric"
class(df1$Month)
"factor"
What did I do wrong?
Thanks
It seems you are trying to use data.table syntax on a data frame. So first do
library(data.table)
setDT(df1)
to load the data.table package and convert df1 to a data.table. Then you can do
final <- df1[, mean(index), keyby = .(year(Month), quarter(Month))]
str(final)
# Classes ‘data.table’ and 'data.frame': 39 obs. of 3 variables:
# $ year : int 2006 2006 2006 2006 2007 2007 2007 2007 2008 2008 ...
# $ quarter: int 1 2 3 4 1 2 3 4 1 2 ...
# $ V1 : num -2.986 -0.196 1.041 -0.184 -0.646 ...
# - attr(*, "sorted")= chr "year" "quarter"
# - attr(*, ".internal.selfref")=<externalptr>
This shows the result has 39 rows, as you desire. A few notes: the function is named quarter(), not quarterly(); the column is Month with a capital M; and you need to group by both year and quarter.
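One caveat worth flagging: class(df1$Month) is "factor", and year()/quarter() expect a Date. A minimal sketch of the conversion step, on a small made-up subset of the data:

```r
library(data.table)

df1 <- data.table(
  Month = factor(c("2015-09-01", "2015-06-01", "2015-03-01", "2014-12-01")),
  index = c(1.21, -1.10, 4.38, 0.46)
)
# year()/quarter() need a Date, not a factor: convert via character first
df1[, Month := as.IDate(as.character(Month))]
final <- df1[, .(mean_index = mean(index)),
             keyby = .(year = year(Month), quarter = quarter(Month))]
```

Naming the aggregate inside .() also gives the result column a descriptive name instead of the default V1.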
Related
I have time series data:
mytime <- seq.Date(as.Date("2015-01-01"), as.Date("2022-01-01"), "day")
value <- rnorm(n = length(mytime))
df <- cbind.data.frame(mytime, value)
I want to create a new column that starts at 1 and increases by 1 for every fixed number of rows. For example, the first 100 rows get the value 1, the next 100 rows get 2, and so on.
This should work:
library(tidyverse)
mytime <- seq.Date(as.Date("2015-01-01"), as.Date("2022-01-01"), "day")
value <- rnorm(n = length(mytime))
df <- cbind.data.frame(mytime, value)
df %>%
mutate(grouping = rep(c(1:100), each = 100)[1:length(value)])
#> mytime value grouping
#> 1 2015-01-01 0.8188726406 1
#> 2 2015-01-02 -0.7051437552 1
#> 3 2015-01-03 -0.4052449198 1
#> 4 2015-01-04 0.1069765445 1
#> 5 2015-01-05 -1.7624077376 1
#> 6 2015-01-06 -1.3541046081 1
#> 7 2015-01-07 -1.5836060959 1
#> 8 2015-01-08 1.6975299938 1
#> 9 2015-01-09 0.1365021842 1
#> 10 2015-01-10 1.3583782190 1
#> 11 2015-01-11 0.4389094423 1
#> 12 2015-01-12 -1.2695437977 1
#> 13 2015-01-13 -0.0185099335 1
#> 14 2015-01-14 -1.2504712388 1
#> 15 2015-01-15 0.4999558250 1
#> 16 2015-01-16 0.7138838410 1
#> 17 2015-01-17 -1.7709035964 1
#> 18 2015-01-18 -0.6555022760 1
#> 19 2015-01-19 0.2771728385 1
#> 20 2015-01-20 -0.2647472076 1
#> 21 2015-01-21 -0.1846148670 1
#> 22 2015-01-22 -1.7964703879 1
#> 23 2015-01-23 1.0119717818 1
#> 24 2015-01-24 -1.4243738227 1
#> 25 2015-01-25 -1.2956929904 1
#> 26 2015-01-26 -0.6236723296 1
#> 27 2015-01-27 -0.4082983500 1
#> 28 2015-01-28 2.3180442246 1
#> 29 2015-01-29 -0.3830073587 1
#> 30 2015-01-30 1.7191377387 1
#> 31 2015-01-31 1.8575794944 1
#> 32 2015-02-01 -2.1582163873 1
#> 33 2015-02-02 1.7012159685 1
#> 34 2015-02-03 -1.1991523660 1
#> 35 2015-02-04 0.9475081079 1
#> 36 2015-02-05 0.1152464165 1
#> 37 2015-02-06 -2.6407434397 1
#> 38 2015-02-07 1.3821892807 1
#> 39 2015-02-08 0.8054893231 1
#> 40 2015-02-09 0.9223800849 1
#> 41 2015-02-10 -0.3546256458 1
#> 42 2015-02-11 0.6478162596 1
#> 43 2015-02-12 -0.7505130565 1
#> 44 2015-02-13 -2.0366870996 1
#> 45 2015-02-14 1.0649788649 1
#> 46 2015-02-15 -0.9217779191 1
#> 47 2015-02-16 -0.3045693249 1
#> 48 2015-02-17 0.0518172566 1
#> 49 2015-02-18 0.0532814044 1
#> 50 2015-02-19 0.1876367083 1
#> 51 2015-02-20 0.3327661457 1
#> 52 2015-02-21 -0.2952679556 1
#> 53 2015-02-22 0.3293960050 1
#> 54 2015-02-23 1.2409077698 1
#> 55 2015-02-24 0.3580355273 1
#> 56 2015-02-25 -1.4924835886 1
#> 57 2015-02-26 0.7058099312 1
#> 58 2015-02-27 0.2104966444 1
#> 59 2015-02-28 -0.3057447517 1
#> 60 2015-03-01 1.5756875721 1
#> 61 2015-03-02 -0.1917941771 1
#> 62 2015-03-03 0.5913340531 1
#> 63 2015-03-04 -0.5700276892 1
#> 64 2015-03-05 1.0740621827 1
#> 65 2015-03-06 -1.2117093430 1
#> 66 2015-03-07 1.1110831399 1
#> 67 2015-03-08 -0.4552585955 1
#> 68 2015-03-09 -0.8588412294 1
#> 69 2015-03-10 1.9932422428 1
#> 70 2015-03-11 -1.7018407616 1
#> 71 2015-03-12 -0.0308941351 1
#> 72 2015-03-13 0.5055698207 1
#> 73 2015-03-14 0.4188607070 1
#> 74 2015-03-15 0.7982967262 1
#> 75 2015-03-16 -2.2995915989 1
#> 76 2015-03-17 -0.5689886197 1
#> 77 2015-03-18 -0.7878760699 1
#> 78 2015-03-19 1.9519211037 1
#> 79 2015-03-20 0.9026785904 1
#> 80 2015-03-21 1.3952120899 1
#> 81 2015-03-22 -0.6474826181 1
#> 82 2015-03-23 0.8958113474 1
#> 83 2015-03-24 -1.2238473311 1
#> 84 2015-03-25 0.4058042441 1
#> 85 2015-03-26 -0.5709496280 1
#> 86 2015-03-27 -0.4189819537 1
#> 87 2015-03-28 -0.3253399775 1
#> 88 2015-03-29 -0.2504487158 1
#> 89 2015-03-30 -0.5048374234 1
#> 90 2015-03-31 0.2755789912 1
#> 91 2015-04-01 0.8922287071 1
#> 92 2015-04-02 1.1172195419 1
#> 93 2015-04-03 -1.6022222969 1
#> 94 2015-04-04 -0.2639181444 1
#> 95 2015-04-05 -1.0666771455 1
#> 96 2015-04-06 0.5772296824 1
#> 97 2015-04-07 0.3058956784 1
#> 98 2015-04-08 0.3958394775 1
#> 99 2015-04-09 0.4086626441 1
#> 100 2015-04-10 0.5702892942 1
#> 101 2015-04-11 -0.3305672962 2
#> 102 2015-04-12 1.1116674141 2
#> 103 2015-04-13 -0.3013942049 2
#> 104 2015-04-14 -0.5810105837 2
#> 105 2015-04-15 0.5907661366 2
#> 106 2015-04-16 -0.0033270315 2
#> 107 2015-04-17 -0.5018928764 2
#> 108 2015-04-18 -0.7306115691 2
#> 109 2015-04-19 0.8195806083 2
#> 110 2015-04-20 0.1874579340 2
#> 111 2015-04-21 0.2209803536 2
#> 112 2015-04-22 -1.3126491196 2
#> 113 2015-04-23 0.3083904082 2
#> 114 2015-04-24 0.7918114136 2
#> 115 2015-04-25 -0.6279447046 2
#> 116 2015-04-26 0.1982677395 2
#> 117 2015-04-27 0.1015474687 2
#> 118 2015-04-28 -0.9389828948 2
#> 119 2015-04-29 -0.3207971613 2
#> 120 2015-04-30 1.0007723074 2
#> 121 2015-05-01 0.3239017453 2
#> 122 2015-05-02 -1.2809204214 2
#> 123 2015-05-03 -0.5277261595 2
#> 124 2015-05-04 0.7688500927 2
#> 125 2015-05-05 1.0318535818 2
#> 126 2015-05-06 1.4980233862 2
#> 127 2015-05-07 -0.7881506596 2
#> 128 2015-05-08 -0.0463271920 2
#> 129 2015-05-09 2.0093012251 2
#> 130 2015-05-10 -0.9134628460 2
#> 131 2015-05-11 -2.3400047323 2
#> 132 2015-05-12 -1.1726139699 2
#> 133 2015-05-13 0.7844263225 2
#> 134 2015-05-14 1.3506619127 2
#> 135 2015-05-15 -2.5540631175 2
#> 136 2015-05-16 0.6912525372 2
#> 137 2015-05-17 0.7538481654 2
....
Created on 2022-09-05 with reprex v2.0.2
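One caution about the rep() approach: rep(c(1:100), each = 100) only produces 10,000 values, which happens to cover this data but would recycle incorrectly for longer frames. A sketch that scales to any length (shown here on a hypothetical 250-row frame with a group size of 100):

```r
library(dplyr)

df <- data.frame(
  mytime = seq.Date(as.Date("2015-01-01"), by = "day", length.out = 250),
  value  = rnorm(250)
)
n_per_group <- 100
# generate exactly enough group labels for n() rows, then trim to length
df <- df %>%
  mutate(grouping = rep(seq_len(ceiling(n() / n_per_group)),
                        each = n_per_group)[seq_len(n())])
```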
We could also use row_number():
library(dplyr)
df |>
mutate(grouping = ceiling(row_number()/100))
Output:
mytime value grouping
1 2015-01-01 0.143792098 1
2 2015-01-02 -1.401455624 1
3 2015-01-03 -1.858456039 1
4 2015-01-04 1.187416005 1
5 2015-01-05 1.379141495 1
6 2015-01-06 -0.675080153 1
7 2015-01-07 0.320909205 1
8 2015-01-08 1.852616919 1
9 2015-01-09 -0.846052547 1
10 2015-01-10 -0.658311621 1
11 2015-01-11 0.222296116 1
12 2015-01-12 -0.543392482 1
13 2015-01-13 -0.755015488 1
14 2015-01-14 -0.178678382 1
15 2015-01-15 1.110967146 1
16 2015-01-16 -1.275580679 1
17 2015-01-17 -0.010064079 1
18 2015-01-18 -2.170296324 1
19 2015-01-19 1.250837273 1
20 2015-01-20 -1.209153067 1
21 2015-01-21 -0.550676735 1
22 2015-01-22 0.952916907 1
23 2015-01-23 0.277654831 1
24 2015-01-24 0.042829946 1
25 2015-01-25 -0.240098180 1
26 2015-01-26 -0.746263380 1
27 2015-01-27 -0.284752154 1
28 2015-01-28 0.346689091 1
29 2015-01-29 -0.666216586 1
30 2015-01-30 -0.640442501 1
31 2015-01-31 -0.244509760 1
32 2015-02-01 -2.075441987 1
33 2015-02-02 0.147406620 1
34 2015-02-03 0.363748658 1
35 2015-02-04 0.134561515 1
36 2015-02-05 -0.391123031 1
37 2015-02-06 -0.170565332 1
38 2015-02-07 0.183892659 1
39 2015-02-08 -0.854721228 1
40 2015-02-09 1.278300433 1
41 2015-02-10 -1.421003730 1
42 2015-02-11 0.913688901 1
43 2015-02-12 -0.877178883 1
44 2015-02-13 0.467617692 1
45 2015-02-14 -1.903758723 1
46 2015-02-15 -0.525691357 1
47 2015-02-16 -0.324291219 1
48 2015-02-17 -0.001652138 1
49 2015-02-18 -1.451039958 1
50 2015-02-19 -0.143701884 1
51 2015-02-20 0.921537907 1
52 2015-02-21 0.307838066 1
53 2015-02-22 1.251906011 1
54 2015-02-23 -1.824026442 1
55 2015-02-24 -1.883911514 1
56 2015-02-25 0.465843894 1
57 2015-02-26 0.087336821 1
58 2015-02-27 -0.257907284 1
59 2015-02-28 -1.215340438 1
60 2015-03-01 -0.737590344 1
61 2015-03-02 -1.152280630 1
62 2015-03-03 0.445959871 1
63 2015-03-04 0.412874111 1
64 2015-03-05 0.912774140 1
65 2015-03-06 -0.753539221 1
66 2015-03-07 -0.247727125 1
67 2015-03-08 1.248229876 1
68 2015-03-09 -0.857405365 1
69 2015-03-10 -2.062565968 1
70 2015-03-11 0.906372397 1
71 2015-03-12 1.770847797 1
72 2015-03-13 -1.194959910 1
73 2015-03-14 0.705680544 1
74 2015-03-15 0.608626405 1
75 2015-03-16 0.483917761 1
76 2015-03-17 0.486972548 1
77 2015-03-18 0.167493580 1
78 2015-03-19 1.007013432 1
79 2015-03-20 1.540288238 1
80 2015-03-21 -0.082749960 1
81 2015-03-22 0.267562341 1
82 2015-03-23 0.334862334 1
83 2015-03-24 0.678018653 1
84 2015-03-25 0.816515100 1
85 2015-03-26 1.059476108 1
86 2015-03-27 0.622612181 1
87 2015-03-28 0.851457454 1
88 2015-03-29 1.044443068 1
89 2015-03-30 -0.601267237 1
90 2015-03-31 0.569441548 1
91 2015-04-01 1.592983829 1
92 2015-04-02 1.283704270 1
93 2015-04-03 0.200713538 1
94 2015-04-04 0.902635425 1
95 2015-04-05 0.542227464 1
96 2015-04-06 -0.329488879 1
97 2015-04-07 0.040194473 1
98 2015-04-08 -0.863276688 1
99 2015-04-09 -0.830596568 1
100 2015-04-10 -0.666276306 1
101 2015-04-11 0.738113129 2
102 2015-04-12 -1.152088593 2
103 2015-04-13 0.309580066 2
104 2015-04-14 0.639723004 2
105 2015-04-15 0.926298625 2
106 2015-04-16 -1.044929798 2
107 2015-04-17 -1.088962011 2
108 2015-04-18 0.137856131 2
109 2015-04-19 0.846136781 2
110 2015-04-20 0.372345665 2
111 2015-04-21 3.400435187 2
112 2015-04-22 -2.026547096 2
113 2015-04-23 -0.106970853 2
114 2015-04-24 -1.226614624 2
115 2015-04-25 0.918546253 2
116 2015-04-26 0.027024114 2
117 2015-04-27 -2.127191506 2
118 2015-04-28 -1.600815099 2
119 2015-04-29 0.749681304 2
120 2015-04-30 0.721914459 2
121 2015-05-01 -0.338230147 2
122 2015-05-02 0.913592837 2
123 2015-05-03 0.587794938 2
124 2015-05-04 -0.851625256 2
125 2015-05-05 -0.345100249 2
126 2015-05-06 1.195675453 2
127 2015-05-07 -1.163156366 2
128 2015-05-08 0.006734588 2
129 2015-05-09 1.410087674 2
130 2015-05-10 1.322741860 2
131 2015-05-11 -0.297038999 2
132 2015-05-12 -0.197173515 2
133 2015-05-13 0.224360972 2
134 2015-05-14 0.516641666 2
135 2015-05-15 -0.779288529 2
136 2015-05-16 0.579790369 2
137 2015-05-17 -1.455354422 2
138 2015-05-18 0.080913482 2
139 2015-05-19 -0.144821155 2
140 2015-05-20 -0.114079060 2
141 2015-05-21 -0.763828057 2
142 2015-05-22 0.707339053 2
143 2015-05-23 0.647765433 2
144 2015-05-24 -1.490961303 2
145 2015-05-25 0.620563653 2
146 2015-05-26 -0.543335407 2
147 2015-05-27 0.104817520 2
148 2015-05-28 -0.003077069 2
149 2015-05-29 0.703242269 2
150 2015-05-30 -0.432612310 2
151 2015-05-31 0.765172967 2
152 2015-06-01 0.662351120 2
153 2015-06-02 0.320601441 2
154 2015-06-03 -1.542552690 2
155 2015-06-04 -0.841613323 2
156 2015-06-05 0.244023691 2
157 2015-06-06 -0.363205416 2
158 2015-06-07 0.425083853 2
159 2015-06-08 0.480960952 2
160 2015-06-09 1.171789654 2
161 2015-06-10 0.689310253 2
162 2015-06-11 0.069911244 2
163 2015-06-12 1.211315304 2
164 2015-06-13 -2.992856256 2
165 2015-06-14 -1.725439305 2
166 2015-06-15 -0.427232751 2
167 2015-06-16 -0.320677428 2
168 2015-06-17 -0.625616224 2
169 2015-06-18 0.436684268 2
170 2015-06-19 -0.051345979 2
171 2015-06-20 -0.005905043 2
172 2015-06-21 -0.650648380 2
173 2015-06-22 0.104280158 2
174 2015-06-23 0.692602024 2
175 2015-06-24 -0.284524585 2
176 2015-06-25 0.114234704 2
177 2015-06-26 -0.307465039 2
178 2015-06-27 -0.868424089 2
179 2015-06-28 -0.008077344 2
180 2015-06-29 -0.216263894 2
181 2015-06-30 0.716286098 2
182 2015-07-01 -0.246694377 2
183 2015-07-02 -0.514709162 2
184 2015-07-03 0.571000411 2
185 2015-07-04 0.951861313 2
186 2015-07-05 -0.657196354 2
187 2015-07-06 0.702772460 2
188 2015-07-07 -1.889945487 2
189 2015-07-08 -1.556305726 2
190 2015-07-09 -1.333879020 2
191 2015-07-10 -0.148308307 2
192 2015-07-11 0.862758957 2
193 2015-07-12 0.015712677 2
194 2015-07-13 -0.518988630 2
195 2015-07-14 0.381518862 2
196 2015-07-15 -0.920415442 2
197 2015-07-16 -0.291423016 2
198 2015-07-17 0.051580366 2
199 2015-07-18 -0.653667887 2
200 2015-07-19 -1.159563927 2
201 2015-07-20 -0.524343555 3
202 2015-07-21 -0.499934439 3
203 2015-07-22 0.890589850 3
204 2015-07-23 -0.583243838 3
205 2015-07-24 0.464586806 3
206 2015-07-25 -1.072116565 3
207 2015-07-26 -1.995098501 3
208 2015-07-27 -1.398424995 3
209 2015-07-28 -0.047756678 3
210 2015-07-29 0.993838354 3
211 2015-07-30 0.274223295 3
212 2015-07-31 -1.274376302 3
213 2015-08-01 -1.586586701 3
214 2015-08-02 0.230695873 3
215 2015-08-03 0.151248025 3
216 2015-08-04 1.631408895 3
217 2015-08-05 -0.878848837 3
218 2015-08-06 0.451727327 3
219 2015-08-07 0.392156218 3
220 2015-08-08 0.544240403 3
221 2015-08-09 -0.211142978 3
222 2015-08-10 1.364874158 3
223 2015-08-11 -0.541504849 3
224 2015-08-12 -0.089349427 3
225 2015-08-13 -0.815008782 3
226 2015-08-14 -0.121764644 3
227 2015-08-15 -1.741367522 3
228 2015-08-16 2.043085589 3
229 2015-08-17 1.051024717 3
230 2015-08-18 0.071467837 3
231 2015-08-19 0.346026920 3
232 2015-08-20 0.190915132 3
233 2015-08-21 -1.104888803 3
234 2015-08-22 -0.193678833 3
235 2015-08-23 0.453708267 3
236 2015-08-24 -0.114886984 3
237 2015-08-25 0.279705350 3
238 2015-08-26 -0.291677485 3
239 2015-08-27 -1.046920131 3
240 2015-08-28 0.546206788 3
241 2015-08-29 0.417895255 3
242 2015-08-30 0.607427357 3
243 2015-08-31 0.386263173 3
244 2015-09-01 1.693325483 3
245 2015-09-02 -0.269513707 3
246 2015-09-03 0.972799720 3
247 2015-09-04 -0.136891511 3
248 2015-09-05 0.036534446 3
249 2015-09-06 -0.818723816 3
250 2015-09-07 -0.270747970 3
251 2015-09-08 -0.099214990 3
252 2015-09-09 -0.441796094 3
253 2015-09-10 -0.785450099 3
254 2015-09-11 -0.266662717 3
255 2015-09-12 -0.185548366 3
256 2015-09-13 -0.587839058 3
257 2015-09-14 0.570935157 3
258 2015-09-15 0.339546529 3
259 2015-09-16 0.436241922 3
260 2015-09-17 -1.345637228 3
261 2015-09-18 0.265399285 3
262 2015-09-19 -0.490105412 3
263 2015-09-20 0.497014587 3
264 2015-09-21 -0.073881747 3
265 2015-09-22 -1.339337587 3
266 2015-09-23 -1.575732554 3
267 2015-09-24 1.590806011 3
268 2015-09-25 0.283380826 3
269 2015-09-26 -0.437666267 3
270 2015-09-27 0.086035992 3
271 2015-09-28 -0.205143330 3
272 2015-09-29 -0.368002399 3
273 2015-09-30 0.277060950 3
274 2015-10-01 1.184281033 3
275 2015-10-02 -0.042580777 3
276 2015-10-03 -0.034058572 3
277 2015-10-04 1.822264836 3
278 2015-10-05 1.418461585 3
279 2015-10-06 -1.663314285 3
280 2015-10-07 0.306396419 3
281 2015-10-08 -0.133955098 3
282 2015-10-09 1.785256820 3
283 2015-10-10 2.144114886 3
284 2015-10-11 1.946704788 3
285 2015-10-12 1.081968268 3
286 2015-10-13 0.466607356 3
287 2015-10-14 0.708654794 3
288 2015-10-15 0.250716353 3
289 2015-10-16 0.280578762 3
290 2015-10-17 -0.102182693 3
291 2015-10-18 1.519061748 3
292 2015-10-19 -0.240985742 3
293 2015-10-20 -1.392785238 3
294 2015-10-21 -1.213613515 3
295 2015-10-22 0.241381597 3
296 2015-10-23 -0.234988013 3
297 2015-10-24 1.620456160 3
298 2015-10-25 0.548044651 3
299 2015-10-26 1.520948096 3
300 2015-10-27 -1.069683544 3
301 2015-10-28 -2.149756515 4
302 2015-10-29 -0.371598782 4
303 2015-10-30 -0.017200805 4
304 2015-10-31 1.421065588 4
305 2015-11-01 -0.719738038 4
306 2015-11-02 -0.539361753 4
307 2015-11-03 0.127735424 4
308 2015-11-04 0.521494673 4
309 2015-11-05 -1.071468633 4
310 2015-11-06 0.311667225 4
311 2015-11-07 0.593034587 4
312 2015-11-08 -0.281065031 4
313 2015-11-09 -0.454378772 4
314 2015-11-10 -0.612201420 4
315 2015-11-11 1.261906072 4
316 2015-11-12 -0.832989599 4
317 2015-11-13 1.042128138 4
318 2015-11-14 0.101058897 4
319 2015-11-15 1.481095345 4
320 2015-11-16 0.550768802 4
321 2015-11-17 0.709517939 4
322 2015-11-18 1.403988053 4
323 2015-11-19 0.050966805 4
324 2015-11-20 -0.663606215 4
325 2015-11-21 -0.120978945 4
326 2015-11-22 0.830822407 4
327 2015-11-23 -0.846003819 4
328 2015-11-24 1.460456262 4
329 2015-11-25 0.758233907 4
330 2015-11-26 0.241672077 4
331 2015-11-27 0.461815643 4
332 2015-11-28 0.086404903 4
333 2015-11-29 -1.345535596 4
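The same integer-division idea works in base R without dplyr, assuming the same data shape and a group size of 100:

```r
# Base R equivalent of ceiling(row_number() / 100): zero-based row index,
# integer-divide by the group size, shift back to start at 1.
df <- data.frame(
  mytime = seq.Date(as.Date("2015-01-01"), by = "day", length.out = 250),
  value  = rnorm(250)
)
df$grouping <- (seq_len(nrow(df)) - 1) %/% 100 + 1
```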
I have two data tables. The first table is a matrix with coordinates and precipitation. It consists of four columns: latitude, longitude, precipitation, and day of monitoring. An example of the table:
latitude_1 longitude_1 precipitation day_mon
54.17 62.15 5 34
69.61 48.65 3 62
73.48 90.16 7 96
66.92 90.27 7 19
56.19 96.46 9 25
72.23 74.18 5 81
88.00 95.20 7 97
92.44 44.41 6 18
95.83 52.91 9 88
99.68 96.23 8 6
81.91 48.32 8 96
54.66 52.70 0 62
95.31 91.82 2 84
60.32 96.25 9 71
97.39 47.91 7 76
65.21 44.63 9 3
The second table consists of five columns: station number, longitude, latitude, the day monitoring began, and the day monitoring ended. It looks like:
station latitude_2 longitude_2 day_begin day_end
15 50.00 93.22 34 46
11 86.58 85.29 15 47
14 93.17 63.17 31 97
10 88.56 61.28 15 78
13 45.29 77.10 24 79
6 69.73 99.52 13 73
4 45.60 77.36 28 95
13 92.88 62.38 9 51
1 65.10 64.13 7 69
10 60.57 86.77 34 64
3 53.62 60.76 23 96
16 87.82 59.41 38 47
1 47.83 95.89 21 52
11 75.42 46.20 38 87
3 55.71 55.26 2 73
16 71.65 96.15 36 93
I want to sum the precipitation from the first table, subject to two conditions:
day_begin < day_mon < day_end: the day of monitoring (day_mon from the first table) should be after the day monitoring began and before the day it ended (from the second table).
Precipitation is summed at the station closest to the monitoring point: the distance between the monitoring point (coordinates longitude_1 and latitude_1) and the station (coordinates longitude_2 and latitude_2) should be minimal. The distance is calculated by the formula:
R = 6400*arccos(sin(latitude_1)*sin(latitude_2) + cos(latitude_1)*cos(latitude_2)*cos(longitude_1 - longitude_2))
Finally, I want to get the results as a table:
station latitude_2 longitude_2 day_begin day_end Sum
15 50 93.22 34 46 188
11 86.58 85.29 15 47 100
14 93.17 63.17 31 97 116
10 88.56 61.28 15 78 182
13 45.29 77.1 24 79 136
6 69.73 99.52 13 73 126
4 45.6 77.36 28 95 108
13 92.88 62.38 9 51 192
1 65.1 64.13 7 69 125
10 60.57 86.77 34 64 172
3 53.62 60.76 23 96 193
16 87.82 59.41 38 47 183
1 47.83 95.89 21 52 104
11 75.42 46.2 38 87 151
3 55.71 55.26 2 73 111
16 71.65 96.15 36 93 146
I know how to calculate it in C++. What function should I use in R?
Thank you for your help!
I'm not sure if I solved your problem correctly, but here goes. I used a data.table approach.
library( tidyverse )
library( data.table )
#step 1. join days as periods
#create dummy variables that give each point a zero-length period in dt1
dt1[, point_id := .I]
dt1[, day_begin := day_mon]
dt1[, day_end := day_mon]
setkey(dt2, day_begin, day_end)
#overlap join finding all stations for each point that overlap periods
dt <- foverlaps( dt1, dt2, type = "within" )
#step 2. calculate the distance to each station for each point, using the formula provided in the question
dt[, distance := 6400 * acos( sin( latitude_1 ) * sin( latitude_2 ) + cos( latitude_1 ) * cos( latitude_2 ) ) * cos( longitude_1 - longitude_2 ) ]
#step 3. filter (absolute) minimal distance based on point_id
dt[ , .SD[which.min( abs( distance ) )], by = point_id ]
# point_id station latitude_2 longitude_2 day_begin day_end latitude_1 longitude_1 precipitation day_mon i.day_begin i.day_end distance
# 1: 1 1 47.83 95.89 21 52 54.17 62.15 5 34 34 34 -248.72398
# 2: 2 6 69.73 99.52 13 73 69.61 48.65 3 62 62 62 631.89228
# 3: 3 14 93.17 63.17 31 97 73.48 90.16 7 96 96 96 -1519.84886
# 4: 4 11 86.58 85.29 15 47 66.92 90.27 7 19 19 19 1371.54757
# 5: 5 11 86.58 85.29 15 47 56.19 96.46 9 25 25 25 1139.46849
# 6: 6 14 93.17 63.17 31 97 72.23 74.18 5 81 81 81 192.99264
# 7: 7 14 93.17 63.17 31 97 88.00 95.20 7 97 97 97 5822.81529
# 8: 8 3 55.71 55.26 2 73 92.44 44.41 6 18 18 18 -899.71206
# 9: 9 3 53.62 60.76 23 96 95.83 52.91 9 88 88 88 45.16237
# 10: 10 3 55.71 55.26 2 73 99.68 96.23 8 6 6 6 -78.04484
# 11: 11 14 93.17 63.17 31 97 81.91 48.32 8 96 96 96 -5467.77459
# 12: 12 3 53.62 60.76 23 96 54.66 52.70 0 62 62 62 -1361.57863
# 13: 13 11 75.42 46.20 38 87 95.31 91.82 2 84 84 84 -445.18765
# 14: 14 14 93.17 63.17 31 97 60.32 96.25 9 71 71 71 -854.86321
# 15: 15 3 53.62 60.76 23 96 97.39 47.91 7 76 76 76 1304.41634
# 16: 16 3 55.71 55.26 2 73 65.21 44.63 9 3 3 3 -7015.57516
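To finish with the table the question asks for (total precipitation per station), one could aggregate the matched result. A hypothetical continuation, where `matched` stands in for the nearest-station result above:

```r
library(data.table)

# 'matched' is a stand-in for the output of step 3: one row per point,
# with the station it was assigned to and its precipitation
matched <- data.table(station       = c(15, 15, 11),
                      precipitation = c(5, 3, 7))
# sum precipitation per station
sums <- matched[, .(Sum = sum(precipitation)), by = station]
```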
Sample data
dt1 <- read.table( text = "latitude_1 longitude_1 precipitation day_mon
54.17 62.15 5 34
69.61 48.65 3 62
73.48 90.16 7 96
66.92 90.27 7 19
56.19 96.46 9 25
72.23 74.18 5 81
88.00 95.20 7 97
92.44 44.41 6 18
95.83 52.91 9 88
99.68 96.23 8 6
81.91 48.32 8 96
54.66 52.70 0 62
95.31 91.82 2 84
60.32 96.25 9 71
97.39 47.91 7 76
65.21 44.63 9 3", header = TRUE ) %>%
setDT()
dt2 <- read.table( text = "station latitude_2 longitude_2 day_begin day_end
15 50.00 93.22 34 46
11 86.58 85.29 15 47
14 93.17 63.17 31 97
10 88.56 61.28 15 78
13 45.29 77.10 24 79
6 69.73 99.52 13 73
4 45.60 77.36 28 95
13 92.88 62.38 9 51
1 65.10 64.13 7 69
10 60.57 86.77 34 64
3 53.62 60.76 23 96
16 87.82 59.41 38 47
1 47.83 95.89 21 52
11 75.42 46.20 38 87
3 55.71 55.26 2 73
16 71.65 96.15 36 93", header = TRUE ) %>%
setDT()
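One caveat about the distance step: the code above mirrors the question's formula to reproduce its output, which is why some distances come out negative (the cos(longitude difference) factor sits outside acos()) and why degrees are fed to sin()/cos() directly. A corrected spherical law of cosines helper (a hypothetical function, not part of the answer above) might look like:

```r
# convert degrees to radians before any trig
deg2rad <- function(d) d * pi / 180

# great-circle distance in km on a sphere of radius 6400 km,
# with cos(longitude difference) inside acos()
gc_dist <- function(lat1, lon1, lat2, lon2, R = 6400) {
  lat1 <- deg2rad(lat1); lat2 <- deg2rad(lat2)
  dlon <- deg2rad(lon1 - lon2)
  # clamp to [-1, 1] to guard against floating-point drift outside acos()'s domain
  R * acos(pmin(1, pmax(-1, sin(lat1) * sin(lat2) +
                              cos(lat1) * cos(lat2) * cos(dlon))))
}

gc_dist(54.17, 62.15, 50.00, 93.22)  # point 1 -> station 15, in km
```

With this helper, the distance column in step 2 would become `gc_dist(latitude_1, longitude_1, latitude_2, longitude_2)`, and which.min(distance) no longer needs abs().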
I have a dataset that looks like this:
USER.ID ISO_DATE
1 3 2014-05-02
2 3 2014-05-05
3 3 2014-05-06
4 3 2014-05-20
5 3 2014-05-21
6 3 2014-05-24
7 3 2014-06-09
8 3 2014-06-14
9 3 2014-06-18
10 3 2014-06-26
11 3 2014-07-11
12 3 2014-07-21
13 3 2014-07-22
14 3 2014-07-25
15 3 2014-07-27
16 3 2014-08-03
17 3 2014-08-07
18 3 2014-08-12
19 3 2014-08-13
20 3 2014-08-16
21 3 2014-08-17
22 3 2014-08-20
23 3 2014-08-22
24 3 2014-08-31
25 3 2014-10-22
26 3 2014-11-19
27 3 2014-11-20
28 3 2014-11-23
29 3 2014-11-25
30 3 2014-12-06
31 3 2014-12-09
32 3 2014-12-10
33 3 2014-12-12
34 3 2014-12-14
35 3 2014-12-14
36 3 2014-12-14
37 3 2014-12-15
38 3 2014-12-16
39 3 2014-12-17
40 3 2014-12-18
41 3 2014-12-20
42 3 2015-01-08
43 3 2015-01-09
44 3 2015-01-11
45 3 2015-01-12
46 3 2015-01-14
47 3 2015-01-15
48 3 2015-01-18
49 3 2015-01-18
50 3 2015-01-19
51 3 2015-01-21
52 3 2015-01-22
53 3 2015-01-22
54 3 2015-01-23
55 3 2015-01-26
56 3 2015-01-27
57 3 2015-01-28
58 3 2015-01-29
59 3 2015-01-30
60 3 2015-01-30
61 3 2015-02-01
62 3 2015-02-02
63 3 2015-02-03
64 3 2015-02-04
65 3 2015-02-08
66 3 2015-02-09
67 3 2015-02-10
68 3 2015-02-13
69 3 2015-02-15
70 3 2015-02-16
71 3 2015-02-19
72 3 2015-02-20
73 3 2015-02-21
74 3 2015-02-23
75 3 2015-02-26
76 3 2015-02-28
77 3 2015-03-01
78 3 2015-03-11
79 3 2015-03-18
80 3 2015-03-22
81 3 2015-03-28
82 3 2015-04-03
83 3 2015-04-07
84 3 2015-04-08
85 3 2015-04-08
86 3 2015-04-15
87 3 2015-04-19
88 3 2015-04-21
89 3 2015-04-22
90 3 2015-04-24
91 3 2015-04-28
92 3 2015-05-03
93 3 2015-05-03
94 3 2015-05-04
95 3 2015-05-06
96 3 2015-05-08
97 3 2015-05-15
98 3 2015-05-16
99 3 2015-05-16
100 3 2015-05-19
101 3 2015-05-21
102 3 2015-05-21
103 3 2015-05-22
104 5 2015-02-05
105 7 2015-01-02
106 7 2015-01-03
107 7 2015-01-25
108 7 2015-02-21
109 7 2015-02-28
110 7 2015-03-02
111 7 2015-03-02
112 7 2015-03-07
113 7 2015-03-14
114 7 2015-05-01
115 9 2014-03-12
116 9 2014-03-12
117 9 2014-03-19
118 9 2014-04-10
119 9 2014-04-10
120 9 2014-04-10
121 9 2014-04-11
122 9 2014-05-30
123 9 2014-05-30
124 9 2014-06-06
125 9 2014-06-07
126 9 2014-06-14
127 9 2014-10-17
128 9 2014-10-17
129 9 2014-10-17
130 9 2014-10-17
131 9 2014-10-17
132 9 2014-10-17
133 9 2014-10-17
134 9 2014-10-19
135 9 2014-10-20
136 9 2014-10-20
137 9 2014-12-20
138 13 2014-07-08
139 13 2014-07-08
140 13 2014-07-08
141 13 2014-07-11
142 13 2014-07-11
143 13 2014-07-18
144 13 2014-07-19
145 13 2014-07-23
146 13 2014-07-23
147 13 2014-07-27
148 13 2014-07-29
149 13 2014-07-31
150 13 2014-08-02
151 13 2014-08-03
152 13 2014-08-06
153 13 2014-08-14
154 13 2014-08-14
155 13 2014-08-18
156 13 2014-08-19
157 13 2014-08-26
158 13 2014-08-30
159 13 2014-09-02
160 13 2014-09-10
161 13 2014-09-12
162 13 2014-09-13
163 13 2014-09-18
164 13 2014-09-20
165 13 2014-09-21
166 13 2014-09-24
167 13 2014-09-28
168 13 2014-09-30
169 13 2014-10-04
170 13 2014-10-09
171 13 2014-10-15
172 13 2014-10-20
173 13 2014-10-20
174 13 2014-10-20
175 13 2014-10-20
176 13 2014-10-25
177 13 2014-10-26
178 13 2014-10-29
179 13 2014-11-10
180 13 2014-11-28
181 13 2014-11-28
182 13 2014-11-28
183 13 2014-11-28
184 13 2014-11-29
185 13 2014-12-03
186 13 2014-12-05
187 13 2014-12-05
188 13 2014-12-10
189 13 2015-01-03
190 13 2015-03-08
191 13 2015-03-22
192 13 2015-04-06
193 13 2015-04-16
194 13 2015-04-21
195 13 2015-04-22
196 13 2015-04-26
197 13 2015-05-05
198 13 2015-05-07
199 13 2015-05-15
200 13 2015-05-21
201 16 2014-03-11
202 16 2014-03-13
203 16 2014-03-15
204 16 2014-04-12
205 16 2014-04-14
206 16 2014-04-23
207 16 2014-05-26
208 16 2014-05-30
209 16 2014-05-31
210 16 2014-06-10
211 16 2014-06-26
212 16 2014-08-18
213 16 2014-08-21
214 16 2014-08-24
215 16 2014-08-29
216 16 2014-09-01
217 16 2014-09-07
218 16 2014-09-15
219 16 2014-09-17
220 16 2014-09-24
221 16 2014-09-29
222 16 2014-10-06
223 16 2014-10-07
224 16 2014-10-08
225 16 2014-10-20
226 16 2014-10-20
227 16 2014-10-20
228 16 2014-11-12
229 16 2014-11-12
I want to create two new columns that store the 3rd and 6th values of ISO_DATE for each USER.ID separately.
I tried this:
users <- users %>%
arrange(USER.ID) %>%
group_by(USER.ID) %>%
mutate(third_date = head(ISO_DATE, 3)) %>%
mutate(fifth_date = head(ISO_DATE, 6))
but it does not work. Is there a way to do this in R?
You can convert the 'ISO_DATE' column to the 'Date' class (if it is not already), group by 'USER.ID', arrange by 'ISO_DATE', and create new columns with the 3rd and 6th observations of 'ISO_DATE':
library(dplyr)
users1 <- users %>%
mutate(ISO_DATE = as.Date(ISO_DATE)) %>%
group_by(USER.ID) %>%
arrange(ISO_DATE) %>%
mutate(third_date = ISO_DATE[3L], sixth_date=ISO_DATE[6L])
Or using data.table
library(data.table)
setDT(users)[, ISO_DATE:= as.Date(ISO_DATE)
][order(ISO_DATE),
c('third_date', 'sixth_date') := list(ISO_DATE[3L], ISO_DATE[6L]) ,
by= USER.ID]
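If you only need one row per user rather than repeating the dates on every row, a summarise-based sketch works too (shown on a tiny made-up subset; nth() returns NA when a user has fewer rows than the requested position, as user 7 does here with only 4 dates):

```r
library(dplyr)

users <- data.frame(
  USER.ID  = rep(c(3, 7), c(7, 4)),
  ISO_DATE = as.Date("2014-05-01") + c(1, 4, 5, 19, 20, 23, 39, 2, 3, 25, 40)
)
res <- users %>%
  group_by(USER.ID) %>%
  arrange(ISO_DATE, .by_group = TRUE) %>%
  summarise(third_date = nth(ISO_DATE, 3),
            sixth_date = nth(ISO_DATE, 6))
```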
I have a dataset which contains user.id and purchase date. I need to calculate the duration between successive purchases for each user in R.
Here is what my sample data looks like:
row.names USER.ID ISO_DATE
1 1067 3 2014-05-05
2 1079 3 2014-05-06
3 1571 3 2014-05-20
4 1625 3 2014-05-21
5 1759 3 2014-05-24
6 2387 3 2014-06-09
7 2683 3 2014-06-14
8 2902 3 2014-06-18
9 3301 3 2014-06-26
10 4169 3 2014-07-11
11 5361 3 2014-07-21
12 5419 3 2014-07-22
13 5921 3 2014-07-25
14 6314 3 2014-07-27
15 7361 3 2014-08-03
16 8146 3 2014-08-07
17 10091 3 2014-08-12
18 10961 3 2014-08-13
19 13296 3 2014-08-16
20 13688 3 2014-08-17
21 15672 3 2014-08-20
22 18586 3 2014-08-22
23 24304 3 2014-08-31
24 38123 3 2014-10-22
25 50124 3 2014-11-19
26 50489 3 2014-11-20
27 52201 3 2014-11-23
28 52900 3 2014-11-25
29 61564 3 2014-12-06
30 64351 3 2014-12-09
31 65465 3 2014-12-10
32 67880 3 2014-12-12
33 69363 3 2014-12-14
34 69982 3 2014-12-14
35 70040 3 2014-12-14
36 70351 3 2014-12-15
37 72393 3 2014-12-16
38 73220 3 2014-12-17
39 75110 3 2014-12-18
40 78827 3 2014-12-20
41 112447 3 2015-01-08
42 113903 3 2015-01-09
43 114723 3 2015-01-11
44 114760 3 2015-01-12
45 115464 3 2015-01-14
46 116095 3 2015-01-15
47 118406 3 2015-01-18
48 118842 3 2015-01-18
49 119527 3 2015-01-19
50 120774 3 2015-01-21
51 120853 3 2015-01-22
52 121284 3 2015-01-22
53 121976 3 2015-01-23
54 126256 3 2015-01-26
55 126498 3 2015-01-27
56 127776 3 2015-01-28
57 128537 3 2015-01-29
58 128817 3 2015-01-30
59 129374 3 2015-01-30
60 131604 3 2015-02-01
61 132150 3 2015-02-02
62 132557 3 2015-02-03
63 132953 3 2015-02-04
64 135514 3 2015-02-08
65 136058 3 2015-02-09
66 136965 3 2015-02-10
67 140787 3 2015-02-13
68 143113 3 2015-02-15
69 143793 3 2015-02-16
70 146344 3 2015-02-19
71 147669 3 2015-02-20
72 148397 3 2015-02-21
73 151196 3 2015-02-23
74 156014 3 2015-02-26
75 161235 3 2015-02-28
76 162521 3 2015-03-01
77 177878 3 2015-03-11
78 190178 3 2015-03-18
79 199679 3 2015-03-22
80 212460 3 2015-03-28
81 221153 3 2015-04-03
82 228935 3 2015-04-07
83 230358 3 2015-04-08
84 230696 3 2015-04-08
85 250294 3 2015-04-15
86 267469 3 2015-04-19
87 270947 3 2015-04-21
88 274882 3 2015-04-22
89 282252 3 2015-04-24
90 299949 3 2015-04-28
91 323336 3 2015-05-03
92 324847 3 2015-05-03
93 326284 3 2015-05-04
94 337381 3 2015-05-06
95 346498 3 2015-05-08
96 372764 3 2015-05-15
97 376366 3 2015-05-16
98 379325 3 2015-05-16
99 386458 3 2015-05-19
100 392200 3 2015-05-21
101 393039 3 2015-05-21
102 399126 3 2015-05-22
103 106789 7 2015-01-03
104 124929 7 2015-01-25
105 148711 7 2015-02-21
106 161337 7 2015-02-28
107 163738 7 2015-03-02
108 164070 7 2015-03-02
109 170121 7 2015-03-07
110 184856 7 2015-03-14
111 314891 7 2015-05-01
112 182 9 2014-03-12
113 290 9 2014-03-19
114 549 9 2014-04-10
115 553 9 2014-04-10
116 559 9 2014-04-10
117 564 9 2014-04-11
118 1973 9 2014-05-30
119 1985 9 2014-05-30
120 2243 9 2014-06-06
121 2298 9 2014-06-07
122 2713 9 2014-06-14
123 35352 9 2014-10-17
124 35436 9 2014-10-17
125 35509 9 2014-10-17
126 35641 9 2014-10-17
127 35642 9 2014-10-17
128 35679 9 2014-10-17
129 35750 9 2014-10-17
130 36849 9 2014-10-19
131 37247 9 2014-10-20
132 37268 9 2014-10-20
133 79630 9 2014-12-20
134 3900 13 2014-07-08
135 3907 13 2014-07-08
136 4125 13 2014-07-11
137 4142 13 2014-07-11
138 5049 13 2014-07-18
139 5157 13 2014-07-19
140 5648 13 2014-07-23
141 5659 13 2014-07-23
142 6336 13 2014-07-27
143 6621 13 2014-07-29
144 6971 13 2014-07-31
145 7221 13 2014-08-02
146 7310 13 2014-08-03
147 8036 13 2014-08-06
148 11437 13 2014-08-14
149 11500 13 2014-08-14
150 14627 13 2014-08-18
151 15260 13 2014-08-19
152 22417 13 2014-08-26
153 23837 13 2014-08-30
154 24668 13 2014-09-02
155 26481 13 2014-09-10
156 26788 13 2014-09-12
157 27116 13 2014-09-13
158 27959 13 2014-09-18
159 28304 13 2014-09-20
160 28552 13 2014-09-21
161 29069 13 2014-09-24
162 30041 13 2014-09-28
163 30349 13 2014-09-30
164 31352 13 2014-10-04
165 32189 13 2014-10-09
166 34163 13 2014-10-15
167 36946 13 2014-10-20
168 36977 13 2014-10-20
169 37042 13 2014-10-20
170 37266 13 2014-10-20
171 40117 13 2014-10-25
172 40765 13 2014-10-26
173 43418 13 2014-10-29
174 47691 13 2014-11-10
175 54971 13 2014-11-28
176 55275 13 2014-11-28
177 55297 13 2014-11-28
178 55458 13 2014-11-28
179 55908 13 2014-11-29
180 59925 13 2014-12-03
181 60722 13 2014-12-05
182 61178 13 2014-12-05
183 65547 13 2014-12-10
184 107202 13 2015-01-03
185 173010 13 2015-03-08
186 199791 13 2015-03-22
187 227003 13 2015-04-06
188 252548 13 2015-04-16
189 271845 13 2015-04-21
190 274804 13 2015-04-22
191 294579 13 2015-04-26
192 332205 13 2015-05-05
193 339695 13 2015-05-07
194 373554 13 2015-05-15
195 390934 13 2015-05-21
196 203 16 2014-03-13
197 228 16 2014-03-15
198 616 16 2014-04-12
199 664 16 2014-04-14
200 851 16 2014-04-23
201 1826 16 2014-05-26
202 1969 16 2014-05-30
203 2026 16 2014-05-31
204 2419 16 2014-06-10
205 3295 16 2014-06-26
206 14030 16 2014-08-18
207 16368 16 2014-08-21
208 21239 16 2014-08-24
209 23651 16 2014-08-29
210 24533 16 2014-09-01
211 25868 16 2014-09-07
212 27408 16 2014-09-15
213 27721 16 2014-09-17
214 29076 16 2014-09-24
215 30122 16 2014-09-29
216 31622 16 2014-10-06
217 31981 16 2014-10-07
I want to add one more column that gives the difference between successive purchase dates for each user. I am using the ddply function, but it throws an error.
Here is what I tried:
users_frequency <- ddply(users_ordered, "USER.ID", summarize,
orderfrequency = as.numeric(diff(ISO_DATE)))
If you're comfortable with dplyr instead of plyr (note: use the bare column name ISO_DATE inside mutate, not df$ISO_DATE, which bypasses the grouping):
df %>%
  mutate(ISO_DATE = as.Date(ISO_DATE, "%Y-%m-%d")) %>%
  group_by(USER.ID) %>%
  arrange(ISO_DATE) %>%
  mutate(lag = lag(ISO_DATE), difference = ISO_DATE - lag)
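If you want to stay with plyr, the original call fails because summarize expects one value (or an equal-length vector) per group, while diff(ISO_DATE) returns one fewer element than there are rows. A minimal sketch of a fix using mutate instead, assuming users_ordered is already sorted by ISO_DATE within each USER.ID and that ISO_DATE is a Date column:

library(plyr)

# mutate keeps every row; pad diff() with NA for each user's first purchase
users_frequency <- ddply(users_ordered, "USER.ID", mutate,
                         orderfrequency = c(NA, diff(as.numeric(ISO_DATE))))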
I have a data set (x) that looks like this:
DATE WEEKDAY A B C D
2011-02-04 Friday 113 67 109 72
2011-02-05 Saturday 1 0 0 1
2011-02-06 Sunday 9 5 0 0
2011-02-07 Monday 154 48 85 60
str(x):
'data.frame': 4 obs. of 6 variables:
$ DATE : Date, format: "2011-02-04" "2011-02-05" "2011-02-06" "2011-02-07"
$ WEEKDAY: Factor w/ 7 levels "Friday","Monday",..: 1 3 4 2
$ A : num 113 1 9 154
$ B : num 67 0 5 48
$ C : num 109 0 0 85
$ D : num 72 1 0 60
Tuesday through Saturday values don't change, but I want Sunday to be the sum of Saturday and Sunday, and Monday to be the sum of Saturday, Sunday, and Monday.
I tried shifting Saturday's and Sunday's dates to date + 2 and date + 1 respectively, then aggregating by date, but I lose the weekend records.
For my example, the correct results would be the following:
DATE WEEKDAY A B C D
2011-02-04 Friday 113 67 109 72
2011-02-05 Saturday 1 0 0 1
2011-02-06 Sunday 10 5 0 1
2011-02-07 Monday 164 53 85 61
How can I roll up weekend values into the next day?
Three weeks' worth of data:
DATE WEEKDAY A B C D
1 2011-01-02 Sunday 2 1 0 0
2 2011-01-03 Monday 153 51 7 1
3 2011-01-04 Tuesday 182 103 13 5
4 2011-01-05 Wednesday 192 102 14 12
5 2011-01-06 Thursday 160 67 50 20
6 2011-01-07 Friday 154 96 50 39
7 2011-01-09 Sunday 0 0 0 1
8 2011-01-10 Monday 195 94 48 39
9 2011-01-11 Tuesday 206 72 71 38
10 2011-01-12 Wednesday 232 94 96 52
11 2011-01-13 Thursday 178 113 93 52
12 2011-01-14 Friday 173 97 68 56
13 2011-01-15 Saturday 2 0 1 0
14 2011-01-17 Monday 170 91 66 52
15 2011-01-18 Tuesday 176 76 70 78
16 2011-01-19 Wednesday 164 159 117 37
17 2011-01-20 Thursday 198 87 95 111
18 2011-01-21 Friday 213 86 89 90
19 2011-01-24 Monday 195 73 102 52
20 2011-01-25 Tuesday 193 108 116 70
21 2011-01-26 Wednesday 193 102 118 63
Since you've provided only a small sample, I haven't been able to test this on bigger data, but the idea is as follows. I'll use data.table, as I find it can be very efficient here.
The code:
require(data.table)
my_days <- c("Saturday", "Sunday", "Monday")
dt <- data.table(df)
dt[, `:=`(DATE = as.Date(DATE))]
setkey(dt, "DATE")
dt[WEEKDAY %in% my_days, `:=`(A = cumsum(A), B = cumsum(B),
C = cumsum(C), D = cumsum(D)), by = format(DATE-1, "%W")]
The idea:
First, convert the DATE column to an actual Date type using as.Date (line 4).
Second, ensure the rows are sorted by DATE by setting it as the key of dt (line 5).
Now, the last line (line 6) is where all the magic happens and is the trickiest:
The first part of the expression, WEEKDAY %in% my_days, subsets dt to rows whose WEEKDAY is Saturday, Sunday, or Monday.
The last part of the same line, by = format(DATE-1, "%W"), groups those rows by the week they belong to. Since Monday falls in the next calendar week, subtracting 1 from the date before taking the week number shifts it back, so that Saturday, Sunday, and the following Monday land in the same group.
The middle expression, `:=`(A = ..., D = ...), computes the cumulative sums and replaces just those values within each group, by reference.
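The same update can be written without naming each value column individually, by letting .SDcols select them. A sketch, assuming the value columns are exactly A through D:

cols <- c("A", "B", "C", "D")
dt[WEEKDAY %in% my_days, (cols) := lapply(.SD, cumsum),
   by = format(DATE - 1, "%W"), .SDcols = cols]

This scales to any number of columns without repeating the cumsum call.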
For the new data you've posted, I get the following result. Let me know if it's not what you're after.
# DATE WEEKDAY A B C D
# 1: 2011-01-02 Sunday 2 1 0 0
# 2: 2011-01-03 Monday 155 52 7 1
# 3: 2011-01-04 Tuesday 182 103 13 5
# 4: 2011-01-05 Wednesday 192 102 14 12
# 5: 2011-01-06 Thursday 160 67 50 20
# 6: 2011-01-07 Friday 154 96 50 39
# 7: 2011-01-09 Sunday 0 0 0 1
# 8: 2011-01-10 Monday 195 94 48 40
# 9: 2011-01-11 Tuesday 206 72 71 38
# 10: 2011-01-12 Wednesday 232 94 96 52
# 11: 2011-01-13 Thursday 178 113 93 52
# 12: 2011-01-14 Friday 173 97 68 56
# 13: 2011-01-15 Saturday 2 0 1 0
# 14: 2011-01-17 Monday 172 91 67 52
# 15: 2011-01-18 Tuesday 176 76 70 78
# 16: 2011-01-19 Wednesday 164 159 117 37
# 17: 2011-01-20 Thursday 198 87 95 111
# 18: 2011-01-21 Friday 213 86 89 90
# 19: 2011-01-24 Monday 195 73 102 52
# 20: 2011-01-25 Tuesday 193 108 116 70
# 21: 2011-01-26 Wednesday 193 102 118 63
# DATE WEEKDAY A B C D