Aggregate on a daily basis in R

I'm borrowing the reproducible example given here:
Aggregate daily level data to weekly level in R
since it's close to what I want to do.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
In his question, he asks how to aggregate on weekly intervals; what I'd like to do is aggregate on a day-of-the-week basis.
So I'd like a table similar to that one, summing the values for each day of the week:
Day of the week value
1 "Sunday" 60000
2 "Monday" 50000
3 "Tuesday" 60000
4 "Wednesday" 50000
5 "Thursday" 60000
6 "Friday" 50000
7 "Saturday" 60000

You can try:
aggregate(d$value, list(weekdays(as.Date(d$Interval))), sum)
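aggregate returns the groups in alphabetical order. If you want them in calendar order instead, a hedged variant (assuming English weekday names in your locale) is to wrap weekdays() in a factor with explicit levels:
lv <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
aggregate(d$value, list(Day = factor(weekdays(as.Date(d$Interval)), levels = lv)), sum)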

We can group by day of the week using weekdays():
library(dplyr)
df %>%
  group_by(Day_Of_The_Week = weekdays(as.Date(Interval))) %>%
  summarise(value = sum(value))
# Day_Of_The_Week value
# <chr> <int>
#1 Friday 16903
#2 Monday 26368
#3 Saturday 4738
#4 Sunday 2975
#5 Thursday 17858
#6 Tuesday 23772
#7 Wednesday 13560
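A hedged variant of the same pipeline uses lubridate::wday() with label = TRUE, so the grouping column is an ordered factor running Sunday through Saturday instead of alphabetically sorted names:
library(dplyr)
library(lubridate)
df %>%
  group_by(Day_Of_The_Week = wday(as.Date(Interval), label = TRUE, abbr = FALSE)) %>%
  summarise(value = sum(value)) %>%
  arrange(Day_Of_The_Week)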

We can do this with data.table
library(data.table)
setDT(df1)[, .(value = sum(value)), .(Dayofweek = weekdays(as.Date(Interval)))]
# Dayofweek value
#1: Sunday 2975
#2: Monday 26368
#3: Tuesday 23772
#4: Wednesday 13560
#5: Thursday 17858
#6: Friday 16903
#7: Saturday 4738

Using lubridate (https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html):
library(lubridate)
library(data.table)
df1$Weekday <- wday(as.Date(df1$Interval), label = TRUE)
df1 <- data.table(df1)
df1[, sum(value), Weekday]

Related

Is there a way of converting four-digit numbers to time values in r?

When I try using as.POSIXlt or strptime I keep getting a single value of 'NA' as a result.
What I need to do is transform 3- and 4-digit numbers, e.g. 2300 or 115, to 23:00 or 01:15 respectively, but I simply cannot get any code to work.
Basically, this data frame of outputs:
Time
1 2345
2 2300
3 2130
4 2400
5 115
6 2330
7 100
8 2300
9 1530
10 130
11 100
12 215
13 2245
14 145
15 2330
16 2400
17 2300
18 2230
19 2130
20 30
should look like this:
Time
1 23:45
2 23:00
3 21:30
4 24:00
5 01:15
6 23:30
7 01:00
8 23:00
9 15:30
10 01:30
11 01:00
12 02:15
13 22:45
14 01:45
15 23:30
16 24:00
17 23:00
18 22:30
19 21:30
20 00:30
I think you can use the following solution. Note, however, that it produces a character vector rather than a date-time class (values such as 24:00 cannot be represented by strptime or as.POSIXlt):
gsub("(\\d{2})(\\d{2})", "\\1:\\2", sprintf("%04d", df$Time)) |>
  as.data.frame() |>
  setNames("Time") |>
  head()
Time
1 23:45
2 23:00
3 21:30
4 24:00
5 01:15
6 23:30
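If a numeric representation is more convenient than a string, a small hedged follow-up: integer division and modulo convert the same column to minutes past midnight, which also copes with values such as 2400 that strptime rejects:
mins <- (df$Time %/% 100) * 60 + df$Time %% 100
head(mins)
# [1] 1425 1380 1290 1440   75 1410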

Subtract successive rows in a dataframe grouped by id

I have the following data frame:
id day total_amount
1 2015-07-09 1000
1 2015-10-22 100
1 2015-11-12 200
1 2015-11-27 2392
1 2015-12-16 123
6 2015-07-09 200
7 2015-07-09 1000
7 2015-08-27 100018
7 2015-11-25 1000
8 2015-08-27 1000
8 2015-12-07 10000
8 2016-01-18 796
8 2016-03-31 10000
15 2015-09-10 1500
15 2015-09-30 1000
I need to subtract each pair of successive dates in the day column when they have the same id, continuing until the last row for that id, and then start again for the next id. I expect output similar to the following:
7 2015-07-09 1000 2015-08-27 - 2015-07-09
7 2015-08-27 100018 2015-07-09 - 2015-08-27
7 2015-07-09 1000 0
8 2015-08-27 1000 2015-12-07 - 2015-08-27
8 2015-12-07 10000 2016-01-18 - 2015-12-07
8 2016-01-18 796 2016-03-31 - 2016-01-18
8 2016-03-31 10000 0
15 2015-09-10 1000 2015-09-30 - 2015-09-10
15 2015-09-30 1000 2015-10-01 - 2015-09-30
15 2015-10-01 1000
To get the difference as a number of days you could try:
library(dplyr)
group_by(df, id) %>% mutate(new = as.Date(lead(day)) - as.Date(day))
Source: local data frame [15 x 4]
Groups: id [5]
id day total_amount new
(int) (fctr) (int) (dfft)
1 1 2015-07-09 1000 105 days
2 1 2015-10-22 100 21 days
3 1 2015-11-12 200 15 days
4 1 2015-11-27 2392 19 days
5 1 2015-12-16 123 NA days
6 6 2015-07-09 200 NA days
7 7 2015-07-09 1000 49 days
8 7 2015-08-27 100018 90 days
9 7 2015-11-25 1000 NA days
10 8 2015-08-27 1000 102 days
11 8 2015-12-07 10000 42 days
12 8 2016-01-18 796 73 days
13 8 2016-03-31 10000 NA days
14 15 2015-09-10 1500 20 days
15 15 2015-09-30 1000 NA days
EDITED
To fill the NA rows (the last date of each id) with the difference between that date and the current date, you can use:
# First save the above result as `df1`:
df1$new[is.na(df1$new)] <- as.Date(df1$day[is.na(df1$new)]) - Sys.Date()
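For comparison, a hedged data.table sketch of the same per-id gap calculation (assuming the original frame is called df and that the day column parses with as.Date()):
library(data.table)
setDT(df)[, new := shift(as.Date(day), type = "lead") - as.Date(day), by = id][]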

insert new rows to the time series data, with date added automatically

I have a time-series data frame looks like:
TS.1
2015-09-01 361656.7
2015-09-02 370086.4
2015-09-03 346571.2
2015-09-04 316616.9
2015-09-05 342271.8
2015-09-06 361548.2
2015-09-07 342609.2
2015-09-08 281868.8
2015-09-09 297011.1
2015-09-10 295160.5
2015-09-11 287926.9
2015-09-12 323365.8
Now, what I want to do is add some new data points (rows) to the existing data frame, say,
320123.5
323521.7
How can I add the corresponding date to each row? The dates should just continue sequentially from the last row.
Is there any package that can do this automatically, so that the only thing I do is insert the new data points?
Here's some play data:
df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"), x = seq(31))
new.x <- c(32, 33)
This adds the extra observations along with the proper sequence of dates:
new.df <- data.frame(date=seq(max(df$date) + 1, max(df$date) + length(new.x), "days"), x=new.x)
Then just rbind them to get your expanded data frame:
rbind(df, new.df)
date x
1 2015-01-01 1
2 2015-01-02 2
3 2015-01-03 3
4 2015-01-04 4
5 2015-01-05 5
6 2015-01-06 6
7 2015-01-07 7
8 2015-01-08 8
9 2015-01-09 9
10 2015-01-10 10
11 2015-01-11 11
12 2015-01-12 12
13 2015-01-13 13
14 2015-01-14 14
15 2015-01-15 15
16 2015-01-16 16
17 2015-01-17 17
18 2015-01-18 18
19 2015-01-19 19
20 2015-01-20 20
21 2015-01-21 21
22 2015-01-22 22
23 2015-01-23 23
24 2015-01-24 24
25 2015-01-25 25
26 2015-01-26 26
27 2015-01-27 27
28 2015-01-28 28
29 2015-01-29 29
30 2015-01-30 30
31 2015-01-31 31
32 2015-02-01 32
33 2015-02-02 33
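If you need to do this repeatedly, the two steps above can be wrapped in a small helper; a hedged sketch (the name append_obs and the columns date/x are just the ones from the toy data):
append_obs <- function(df, new.x) {
  new.dates <- seq(max(df$date) + 1, by = "days", length.out = length(new.x))
  rbind(df, data.frame(date = new.dates, x = new.x))
}
append_obs(df, c(32, 33))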

Aggregate daily level data to weekly level in R

I have a huge dataset similar to the following reproducible sample data.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
I want to aggregate this data to weekly level to get the output similar to the following:
Interval value
1 Week 2, June 2012 *aggregate value for day 10 to day 14 of June 2012*
2 Week 3, June 2012 *aggregate value for day 15 to day 21 of June 2012*
3 Week 4, June 2012 *aggregate value for day 22 to day 28 of June 2012*
4 Week 5, June 2012 *aggregate value for day 29 to day 30 of June 2012*
5 Week 1, July 2012 *aggregate value for day 1 to day 7 of July 2012*
6 Week 2, July 2012 *aggregate value for day 8 to day 10 of July 2012*
How do I achieve this easily without writing a long code?
If you mean the sum of 'value' by week, I think the easiest way is to convert the data into an xts object, as GSee suggested:
library(xts)
data <- as.xts(data$value, order.by = as.Date(data$Interval))
weekly <- apply.weekly(data, sum)
[,1]
2012-06-10 552
2012-06-17 23629
2012-06-24 23872
2012-07-01 23667
2012-07-08 23552
2012-07-10 10902
I leave the formatting of the output as an exercise for you :-)
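As a hedged sketch of one such formatting step, the xts result can be turned back into a plain data frame keyed by the week-ending date (index() and coredata() are the standard zoo/xts accessors):
data.frame(week_ending = index(weekly), value = coredata(weekly)[, 1])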
If you were to use week from lubridate, you would only get five week-of-the-year groups to pass to by(). Assuming dat is your data:
> library(lubridate)
> do.call(rbind, by(dat$value, week(dat$Interval), summary))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 24 552 4146 4188 3759 4529 4850
# 25 490 2498 4256 3396 4438 5156
# 26 564 2578 4206 3355 4346 4866
# 27 698 993 4868 3366 5122 5770
# 28 671 1086 3200 3200 5314 5726
This shows a summary for the 24th through 28th weeks of the year. Similarly, we can get the means with aggregate:
> aggregate(value~week(Interval), data = dat, mean)
# week(Interval) value
# 1 24 3758.667
# 2 25 3396.286
# 3 26 3355.000
# 4 27 3366.429
# 5 28 3199.500
I just came across this old question because it was used as a dupe target.
Unfortunately, all the upvoted answers (except the one by konvas and a now deleted one) present solutions for aggregating the data by week of the year while the OP has requested to aggregate by week of the month.
The definition of week of the year and week of the month is ambiguous as discussed here, here, and here.
However, the OP has indicated that he wants to count days 1 to 7 of each month as week 1 of the month, days 8 to 14 as week 2 of the month, etc. Note that week 5 is a stub of only 2 or 3 days for most months (and does not exist in February at all, except for day 29 in leap years).
Having prepared the ground, here is a data.table solution for this kind of aggregation:
library(data.table)
DT[, .(value = sum(value)),
   by = .(Interval = sprintf("Week %i, %s",
                             (mday(Interval) - 1L) %/% 7L + 1L,
                             format(Interval, "%b %Y")))]
Interval value
1: Week 2, Jun 2012 18366
2: Week 3, Jun 2012 24104
3: Week 4, Jun 2012 23348
4: Week 5, Jun 2012 5204
5: Week 1, Jul 2012 23579
6: Week 2, Jul 2012 11573
We can verify that we have picked the correct intervals by
DT[, .(value = sum(value),
       date_range = toString(range(Interval))),
   by = .(Week = sprintf("Week %i, %s",
                         (mday(Interval) - 1L) %/% 7L + 1L,
                         format(Interval, "%b %Y")))]
Week value date_range
1: Week 2, Jun 2012 18366 2012-06-10, 2012-06-14
2: Week 3, Jun 2012 24104 2012-06-15, 2012-06-21
3: Week 4, Jun 2012 23348 2012-06-22, 2012-06-28
4: Week 5, Jun 2012 5204 2012-06-29, 2012-06-30
5: Week 1, Jul 2012 23579 2012-07-01, 2012-07-07
6: Week 2, Jul 2012 11573 2012-07-08, 2012-07-10
which is in line with OP's specification.
Data
library(data.table)
DT <- fread(
"rn Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176", drop = 1L)
DT[, Interval := as.Date(Interval)]
If you are using a data frame, you can easily do this with the tidyquant package. Use the tq_transmute function, which applies a mutation and returns a new data frame. Select the "value" column and apply the xts function apply.weekly. The additional argument FUN = sum will get the aggregate by week.
library(tidyquant)
df
#> # A tibble: 31 x 2
#> Interval value
#> <date> <int>
#> 1 2012-06-10 552
#> 2 2012-06-11 4850
#> 3 2012-06-12 4642
#> 4 2012-06-13 4132
#> 5 2012-06-14 4190
#> 6 2012-06-15 4186
#> 7 2012-06-16 1139
#> 8 2012-06-17 490
#> 9 2012-06-18 5156
#> 10 2012-06-19 4430
#> # ... with 21 more rows
df %>%
  tq_transmute(select = value,
               mutate_fun = apply.weekly,
               FUN = sum)
#> # A tibble: 6 x 2
#> Interval value
#> <date> <int>
#> 1 2012-06-10 552
#> 2 2012-06-17 23629
#> 3 2012-06-24 23872
#> 4 2012-07-01 23667
#> 5 2012-07-08 23552
#> 6 2012-07-10 10902
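A hedged dplyr + lubridate alternative groups by the start of each week via floor_date(), so the buckets are labelled by the week's first day rather than the week-ending date used by apply.weekly:
library(dplyr)
library(lubridate)
df %>%
  group_by(week_start = floor_date(Interval, unit = "week")) %>%
  summarise(value = sum(value))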
When you say "aggregate" the values, you mean take their sum? Let's say your data frame is d and assuming d$Interval is of class Date, you can try
# if d$Interval is not of class Date: d$Interval <- as.Date(d$Interval)
formatdate <- function(date)
  paste0("Week ", as.numeric(format(date, "%d")) %/% 7 + 1,
         ", ", format(date, "%b %Y"))
# using mean here; change "mean" to sum or whatever aggregation you need
aggregate(d$value, by = list(formatdate(d$Interval)), mean)
# Group.1 x
# 1 Week 1, Jul 2012 3725.667
# 2 Week 2, Jul 2012 3199.500
# 3 Week 2, Jun 2012 3544.000
# 4 Week 3, Jun 2012 3434.000
# 5 Week 4, Jun 2012 3333.143
# 6 Week 5, Jun 2012 3158.667

How to insert rows at variable positions in a dataframe

My original data frame shows changes on one variable (act, measured in seconds) over approx a 2-week period for several individuals (identified by Ring). My problem is that this variable stretches over the change of date (i.e. at midnight) and I wanted to split it in two: from time[i] till just at midnight, and from midnight until time[i+1]. I have added a few more variables that I need for computing these two operations:
1. modify the ith row (only when the date changes) so I can get the portion of act[i] before midnight;
2. insert one extra row (only when the date changes) and assign it the other portion of act[i].
For example:
ith row: 01-01-2000 23:55:00 act= 360 seconds
i+1th row: 02-01-2000 00:01:00 act= 30 seconds
i+2th row: 02-01-2000 00:01:30 act= 50 seconds
.
.
.
My goal is to get:
ith row: 01-01-2000 23:55:00 act= 300 seconds # modified row
i+1th row: 02-01-2000 00:00:00 act= 60 seconds # inserted row
i+2th row: 02-01-2000 00:01:00 act= 30 seconds # previously row i+1th
i+3th row: 02-01-2000 00:01:30 act= 30 seconds #previously row i+2th
.
.
.
Data associated with each individual (Ring) stretch over a different period of time, resulting in date changes between individuals that shouldn't be taken into account.
Below is a selection of my ~90,000-row data frame (xact) showing date changes within and between individuals (Ring), followed by my code:
Ring time act wd date clock timepos timemn actmn jul
156 6106933 09/06/11 21:37:45 267 dry 09/06/11 21:37:45 2011-06-09 21:37:45 2011-06-10 8535 15134
157 6106933 09/06/11 21:42:12 3417 wet 09/06/11 21:42:12 2011-06-09 21:42:12 2011-06-10 8268 15134
158 6106933 09/06/11 22:39:09 51 dry 09/06/11 22:39:09 2011-06-09 22:39:09 2011-06-10 4851 15134
159 6106933 09/06/11 22:40:00 7317 wet 09/06/11 22:40:00 2011-06-09 22:40:00 2011-06-10 4800 15134
160 6106933 10/06/11 00:41:57 24 dry 10/06/11 00:41:57 2011-06-10 00:41:57 2011-06-11 83883 15135
529 6106933 11/06/11 22:41:57 3177 wet 11/06/11 22:41:57 2011-06-11 22:41:57 2011-06-12 4683 15136
530 6106933 11/06/11 23:34:54 6 dry 11/06/11 23:34:54 2011-06-11 23:34:54 2011-06-12 1506 15136
531 6106933 11/06/11 23:35:00 1779 wet 11/06/11 23:35:00 2011-06-11 23:35:00 2011-06-12 1500 15136
532 6106933 12/06/11 00:04:39 594 dry 12/06/11 00:04:39 2011-06-12 00:04:39 2011-06-13 86121 15137
533 6106933 12/06/11 00:14:33 18840 wet 12/06/11 00:14:33 2011-06-12 00:14:33 2011-06-13 85527 15137
7024 6134701 24/07/11 15:24:14 6 dry 24/07/11 15:24:14 2011-07-24 15:24:14 2011-07-25 30946 15179
7025 6134701 24/07/11 15:24:20 6 wet 24/07/11 15:24:20 2011-07-24 15:24:20 2011-07-25 30940 15179
7026 6134701 24/07/11 15:24:26 810 dry 24/07/11 15:24:26 2011-07-24 15:24:26 2011-07-25 30934 15179
R <- unique(xact$Ring)
for (m in R) {
  for (i in 1:nrow(xact)) {
    if (xact$jul[i] < xact$jul[i+1]) {
      # modify row i (jul = Julian date)
      xact[i] <- c(xact$Ring[i], xact$time[i], xact$actmn[i], xact$wd[i], xact$date[i], xact$clock[i], xact$timepos[i], xact$timemn[i], xact$actmn[i], xact$jul[i])
      # add new row between row i and row i+1
      r <- i
      newrow <- c(xact$Ring[i], xact$timemn[i], as.numeric(xact$timepos[i+1] - xact$timemn[i]), xact$wd[i], xact$date[i+1], xact$clock[i+1], xact$timemn[i], xact$timemn[i], xact$actmn[i], xact$jul[i+1])
      insertRow <- function(xact, newrow, r) {
        xact[seq(r+1, nrow(xact) + 1), ] <- xact[seq(r, nrow(xact)), ]
        xact[r, ] <- newrow
        xact
      }
    }
  }
}
I tried to adapt existing code from Add new row to dataframe, at specific row-index, not appended? but it fails with an error.
I would appreciate any help.
Santi
Here is an example with made-up data:
#create data
DF <- data.frame(time = seq(from = strptime("2013-01-01 01:00", "%Y-%m-%d %H:%M"),
                            to = strptime("2013-01-03 01:00", "%Y-%m-%d %H:%M"),
                            by = 3500))
DF$ring <- 1:2
DF <- DF[order(DF$ring), ]
#apply per ring
library(plyr)
DF <- ddply(DF, .(ring), function(df) {
  # index of date changes
  ind <- c(FALSE, diff(as.POSIXlt(df$time)$yday) == 1)
  add <- df[ind, ]
  add$time <- round(add$time, "days")
  # you can simply rbind and order, no need for inserting
  df <- rbind(df, add)
  df <- df[order(df$time), ]
  # it's easier to calculate act here
  df$act <- c(diff(as.numeric(df$time)), NA)
  df
})
time ring act
1 2013-01-01 01:00:00 1 7000
2 2013-01-01 02:56:40 1 7000
3 2013-01-01 04:53:20 1 7000
4 2013-01-01 06:50:00 1 7000
5 2013-01-01 08:46:40 1 7000
6 2013-01-01 10:43:20 1 7000
7 2013-01-01 12:40:00 1 7000
8 2013-01-01 14:36:40 1 7000
9 2013-01-01 16:33:20 1 7000
10 2013-01-01 18:30:00 1 7000
11 2013-01-01 20:26:40 1 7000
12 2013-01-01 22:23:20 1 5800
13 2013-01-02 00:00:00 1 1200
14 2013-01-02 00:20:00 1 7000
15 2013-01-02 02:16:40 1 7000
16 2013-01-02 04:13:20 1 7000
17 2013-01-02 06:10:00 1 7000
18 2013-01-02 08:06:40 1 7000
19 2013-01-02 10:03:20 1 7000
20 2013-01-02 12:00:00 1 7000
21 2013-01-02 13:56:40 1 7000
22 2013-01-02 15:53:20 1 7000
23 2013-01-02 17:50:00 1 7000
24 2013-01-02 19:46:40 1 7000
25 2013-01-02 21:43:20 1 7000
26 2013-01-02 23:40:00 1 NA
27 2013-01-01 01:58:20 2 7000
28 2013-01-01 03:55:00 2 7000
29 2013-01-01 05:51:40 2 7000
30 2013-01-01 07:48:20 2 7000
31 2013-01-01 09:45:00 2 7000
32 2013-01-01 11:41:40 2 7000
33 2013-01-01 13:38:20 2 7000
34 2013-01-01 15:35:00 2 7000
35 2013-01-01 17:31:40 2 7000
36 2013-01-01 19:28:20 2 7000
37 2013-01-01 21:25:00 2 7000
38 2013-01-01 23:21:40 2 2300
39 2013-01-02 00:00:00 2 4700
40 2013-01-02 01:18:20 2 7000
41 2013-01-02 03:15:00 2 7000
42 2013-01-02 05:11:40 2 7000
43 2013-01-02 07:08:20 2 7000
44 2013-01-02 09:05:00 2 7000
45 2013-01-02 11:01:40 2 7000
46 2013-01-02 12:58:20 2 7000
47 2013-01-02 14:55:00 2 7000
48 2013-01-02 16:51:40 2 7000
49 2013-01-02 18:48:20 2 7000
50 2013-01-02 20:45:00 2 7000
51 2013-01-02 22:41:40 2 4700
52 2013-01-03 00:00:00 2 2300
53 2013-01-03 00:38:20 2 NA
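For anyone who prefers dplyr over plyr, here is a hedged sketch of the same per-ring split using group_modify() in place of ddply() (it assumes DF as created at the top of this answer, before the ddply() call):
library(dplyr)
split_at_midnight <- function(df, ...) {
  # rows whose calendar date differs from the previous row's
  ind <- c(FALSE, diff(as.POSIXlt(df$time)$yday) == 1)
  add <- df[ind, , drop = FALSE]
  # the inserted rows start exactly at midnight of the new day
  add$time <- as.POSIXct(trunc(add$time, "days"))
  out <- rbind(df, add)
  out <- out[order(out$time), ]
  out$act <- c(diff(as.numeric(out$time)), NA)
  out
}
DF %>%
  group_by(ring) %>%
  group_modify(split_at_midnight) %>%
  ungroup()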
