Aggregate data by user defined time interval


I have the following data frame:
df<-data.frame(timecol=as.POSIXct(c("2016-05-31 22:12:27 PDT","2016-05-31 22:25:03 PDT","2016-05-31 23:08:43 PDT","2016-05-31 23:24:10 PDT","2016-06-01 02:00:56 PDT","2016-06-01 03:00:56 PDT","2016-06-01 05:00:56 PDT","2016-06-01 22:12:27 PDT","2016-06-01 22:25:03 PDT","2016-06-01 23:08:43 PDT","2016-06-01 23:24:10 PDT","2016-06-02 02:00:56 PDT","2016-06-02 03:00:56 PDT","2016-06-02 05:00:56 PDT")),value=sample(1:100,14))
> df
timecol value
1 2016-05-31 22:12:27 100
2 2016-05-31 22:25:03 86
3 2016-05-31 23:08:43 39
4 2016-05-31 23:24:10 91
5 2016-06-01 02:00:56 32
6 2016-06-01 03:00:56 93
7 2016-06-01 05:00:56 53
8 2016-06-01 22:12:27 54
9 2016-06-01 22:25:03 76
10 2016-06-01 23:08:43 19
11 2016-06-01 23:24:10 56
12 2016-06-02 02:00:56 20
13 2016-06-02 03:00:56 3
14 2016-06-02 05:00:56 66
I need to aggregate the value column over a predefined time interval: from 19:00 (7 pm) on one day to 07:00 on the next. I was thinking of something like this:
tm <- seq(as.POSIXct("2016-05-31 19:00:00 PDT"),as.POSIXct("2016-06-02 07:00:00 PDT"), by = "12 hours")
aggregate(df$value, list(day = cut(tm, "days")), sum)
but I can't figure out what's wrong.
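One likely reason the attempt above fails: cut(tm, "days") bins the break points in tm rather than the 14 timestamps in df$timecol, so the grouping vector and df$value have different lengths. Below is a minimal sketch of one alternative, assuming the goal is one sum per 19:00 to 07:00 window. The sample values, the UTC time zone, and the column name window are assumptions made for illustration, not part of the original data.

```r
# Hypothetical sample data (values chosen by hand so the sums are easy to check)
df <- data.frame(
  timecol = as.POSIXct(c("2016-05-31 22:12:27", "2016-05-31 23:08:43",
                         "2016-06-01 02:00:56", "2016-06-01 05:00:56",
                         "2016-06-01 22:12:27", "2016-06-02 03:00:56"),
                       tz = "UTC"),
  value = c(10, 20, 30, 40, 50, 60)
)

# Break points every 12 hours, starting at 19:00 on the first evening
breaks <- seq(as.POSIXct("2016-05-31 19:00:00", tz = "UTC"),
              as.POSIXct("2016-06-02 19:00:00", tz = "UTC"),
              by = "12 hours")

# Bin the timestamps themselves (not the break points); each bin is [start, end)
df$window <- cut(df$timecol, breaks = breaks)

# Sum per window; bins that contain no observations are dropped
res <- aggregate(value ~ window, data = df, FUN = sum)
print(res)
```

With these breaks the 07:00 to 19:00 daytime bins exist as well; if the real data contains daytime rows, they would need to be filtered out before or after aggregating.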

Related

Aggregate on a daily basis in R

I'm borrowing the reproducible example given here:
Aggregate daily level data to weekly level in R
since it's pretty close to what I want to do.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
In that question, the author asks how to aggregate on weekly intervals. What I'd like to do is aggregate on a "day of the week" basis.
So I'd like to have a table similar to that one, adding the values of all the same day of the week:
Day of the week value
1 "Sunday" 60000
2 "Monday" 50000
3 "Tuesday" 60000
4 "Wednesday" 50000
5 "Thursday" 60000
6 "Friday" 50000
7 "Saturday" 60000

You can try:
aggregate(d$value, list(weekdays(as.Date(d$Interval))), sum)

We can group them by day of the week using weekdays:
library(dplyr)
df %>%
group_by(Day_Of_The_Week = weekdays(as.Date(Interval))) %>%
summarise(value = sum(value))
# Day_Of_The_Week value
# <chr> <int>
#1 Friday 16903
#2 Monday 26368
#3 Saturday 4738
#4 Sunday 2975
#5 Thursday 17858
#6 Tuesday 23772
#7 Wednesday 13560
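Note that weekdays() returns locale-dependent names and that the groups above come out in alphabetical order. The following is a minimal base R sketch that keeps Sunday-to-Saturday order by grouping on the numeric weekday from format(x, "%w"). The two-week sample data and the column names wd and day are assumptions made for illustration.

```r
# Hypothetical sample: 14 consecutive days starting on Sunday 2012-06-10
d <- data.frame(
  Interval = as.Date("2012-06-10") + 0:13,
  value    = 1:14
)

# "%w" gives the weekday as a number: 0 = Sunday, ..., 6 = Saturday
d$wd <- as.integer(format(d$Interval, "%w"))

# Sum per weekday; rows come back sorted by the numeric weekday
res <- aggregate(value ~ wd, data = d, FUN = sum)

# Attach fixed English labels that do not depend on the locale
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
res$day <- day_names[res$wd + 1]
print(res)
```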

We can do this with data.table
library(data.table)
setDT(df1)[, .(value = sum(value)), .(Dayofweek = weekdays(as.Date(Interval)))]
# Dayofweek value
#1: Sunday 2975
#2: Monday 26368
#3: Tuesday 23772
#4: Wednesday 13560
#5: Thursday 17858
#6: Friday 16903
#7: Saturday 4738

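Using lubridate (https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html):
library(lubridate)
library(data.table)
# wday(label = TRUE) returns the weekday as an ordered factor
df1$Weekday <- wday(as.Date(df1$Interval), label = TRUE)
df1 <- data.table(df1)
df1[, sum(value), by = Weekday]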

plotting time series data in ggplot2 with facet_wrap

I'm trying to plot time series data split by year for comparison.
The problem is that I can't figure out how to force ggplot to skip the missing dates in each plot.
My data structure looks like this:
> head(pmWaw)
date stacja pm25 pm10 season year month
1 2014-01-01 00:00:00 MzWarNiepodKom 100 122 winter (DJF) 2014 1
2 2014-01-01 00:00:00 MzWarszUrsynow 63 68 winter (DJF) 2014 1
3 2014-01-01 00:00:00 MzWarTarKondra NA 79 winter (DJF) 2014 1
4 2014-01-01 01:00:00 MzWarNiepodKom 69 77 winter (DJF) 2014 1
5 2014-01-01 01:00:00 MzWarszUrsynow 36 38 winter (DJF) 2014 1
6 2014-01-01 01:00:00 MzWarTarKondra NA 90 winter (DJF) 2014 1
> tail(pmWaw)
date stacja pm25 pm10 season year month
52483 2015-12-30 22:00:00 MzWarAlNiepo 36 47 winter (DJF) 2015 12
52484 2015-12-30 22:00:00 MzWarKondrat 26 29 winter (DJF) 2015 12
52485 2015-12-30 22:00:00 MzWarWokalna 36 44 winter (DJF) 2015 12
52486 2015-12-30 23:00:00 MzWarAlNiepo 39 59 winter (DJF) 2015 12
52487 2015-12-30 23:00:00 MzWarKondrat 36 39 winter (DJF) 2015 12
52488 2015-12-30 23:00:00 MzWarWokalna 40 49 winter (DJF) 2015 12
The ggplot2 code I came up with is:
pmWaw %>%
ggplot(aes(x=date, y=pm25)) +
geom_jitter(alpha=0.5) +
geom_smooth() +
facet_wrap( ~ year)
The resulting plot has gaps in each year that I'd like to remove, but I can't figure out how.

Try scales = "free_x" in facet_wrap, like this:
pmWaw %>%
ggplot(aes(x=date, y=pm25)) +
geom_jitter(alpha=0.5) +
geom_smooth() +
facet_wrap( ~ year, scales = "free_x")

R - aggregate daily to weekly with start date as Saturday

I have daily data that I want to convert to weekly data, with the week starting on Saturday.
date value
1 11/5/2016 30
2 11/6/2016 20
3 11/7/2016 12
4 11/8/2016 22
5 11/9/2016 48
6 11/10/2016 50
7 11/11/2016 47
8 11/12/2016 12
9 11/13/2016 19
10 11/14/2016 31
11 11/15/2016 43
12 11/16/2016 26
13 11/17/2016 33
14 11/18/2016 36
15 11/19/2016 14
16 11/20/2016 15
17 11/21/2016 36
18 11/22/2016 38
19 11/23/2016 28
20 11/24/2016 21
21 11/25/2016 13
I tried the following, but it assumes the week starts on Monday:
data = as.xts(df$value,order.by=as.Date(df$date))
weekly = apply.weekly(data,sum)
I want the output to be aggregated by Saturday as Start Of Week.

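The order.by argument in the xts call does not parse the dates correctly; as.Date needs an explicit format for month/day/year strings:
library(xts)
data <- xts(df$value, order.by = as.Date(df$date, '%m/%d/%Y'))
# format(..., '%w') is "6" on Saturdays, so cumsum starts a new group at each Saturday
tapply(data[,1], cumsum(format(index(data), '%w')==6), sum)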
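As a cross-check, here is a minimal base R sketch that labels each row with the date of the most recent Saturday and aggregates on that label, so it does not rely on every Saturday being present in the data. It uses the first 14 rows of the question's data; the column name week_start is an assumption made for illustration.

```r
# First 14 rows of the question's data (11/5/2016 is a Saturday)
df <- data.frame(
  date  = paste0("11/", 5:18, "/2016"),
  value = c(30, 20, 12, 22, 48, 50, 47, 12, 19, 31, 43, 26, 33, 36)
)

# Parse month/day/year strings explicitly
d <- as.Date(df$date, format = "%m/%d/%Y")

# "%w" is 0 for Sunday ... 6 for Saturday, so (w + 1) %% 7 is the
# number of days elapsed since the most recent Saturday
offset <- (as.integer(format(d, "%w")) + 1) %% 7
df$week_start <- d - offset

# One row per Saturday-based week
weekly <- aggregate(value ~ week_start, data = df, FUN = sum)
print(weekly)
```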

Generate entries in time series data

I want to generate a row (with zero ammount) for each missing month, up to the current month, in the following data frame. Could you please give me a hand with this? Thanks!
trans_date ammount
1 2004-12-01 2968.91
2 2005-04-01 500.62
3 2005-05-01 434.30
4 2005-06-01 549.15
5 2005-07-01 276.77
6 2005-09-01 548.64
7 2005-10-01 761.69
8 2005-11-01 636.77
9 2005-12-01 1517.58
10 2006-03-01 719.09
11 2006-04-01 1231.88
12 2006-05-01 580.46
13 2006-07-01 1468.43
14 2006-10-01 692.22
15 2006-11-01 505.81
16 2006-12-01 1589.70
17 2007-03-01 1559.82
18 2007-06-01 764.98
19 2007-07-01 964.77
20 2007-09-01 405.18
21 2007-11-01 112.42
22 2007-12-01 1134.08
23 2008-02-01 269.72
24 2008-03-01 208.96
25 2008-04-01 353.58
26 2008-05-01 756.00
27 2008-06-01 747.85
28 2008-07-01 781.62
29 2008-09-01 195.36
30 2008-10-01 424.24
31 2008-12-01 166.23
32 2009-02-01 237.11
33 2009-04-01 110.94
34 2009-07-01 191.29
35 2009-11-01 153.42
36 2009-12-01 222.87
37 2010-09-01 1259.97
38 2010-11-01 375.61
39 2010-12-01 496.48
40 2011-02-01 360.07
41 2011-03-01 324.95
42 2011-04-01 566.93
43 2011-06-01 281.19
44 2011-08-01 428.04
'data.frame': 44 obs. of 2 variables:
$ trans_date : Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
$ ammount: num 2969 501 434 549 277 ...

You can use seq.Date and merge:
> str(df)
'data.frame': 44 obs. of 2 variables:
$ trans_date: Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
$ ammount : num 2969 501 434 549 277 ...
> mns <- data.frame(trans_date = seq.Date(min(df$trans_date), max(df$trans_date), by = "month"))
> df2 <- merge(mns, df, all = TRUE)
> df2$ammount <- ifelse(is.na(df2$ammount), 0, df2$ammount)
> head(df2)
trans_date ammount
1 2004-12-01 2968.91
2 2005-01-01 0.00
3 2005-02-01 0.00
4 2005-03-01 0.00
5 2005-04-01 500.62
6 2005-05-01 434.30
If you need the months up to the current date, use this instead:
mns <- data.frame(trans_date = seq.Date(min(df$trans_date), Sys.Date(), by = "month"))
Note that it is sufficient to call seq instead of seq.Date when the arguments are of class Date.

If you're using xts objects, you can use timeBasedSeq and merge.xts. Assuming your original data is in an object Data:
library(xts)
# create xts object:
# no comma on the first subset (Data['ammount']) keeps column name;
# as.Date needs a vector, so use comma (Data[,'trans_date'])
x <- xts(Data['ammount'],as.Date(Data[,'trans_date']))
# create a time-based vector from 2004-12-01 to 2011-08-01. The "m" denotes
# monthly time-steps. By default this returns a yearmon class. Use
# retclass="Date" to return a Date vector.
d <- timeBasedSeq(paste(start(x),end(x),"m",sep="/"), retclass="Date")
# merge x with an "empty" xts object, xts(,d), filling with zeros
y <- merge(x,xts(,d),fill=0)

Merging aggregate data in R

Following up my previous question about aggregating hourly data into daily data, I want to continue with (a) monthly aggregate and (b) merging the monthly aggregate into the original dataframe.
My original dataframe looks like this:
Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
The daily aggregates have been answered in my previous question, and then I can find my way to produce the monthly aggregates from there, to something like this:
Lines <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
Where the OutdoorAVE is the monthly average of the daily minimum and maximum outdoor temperature. What I want to have in the end is something like this:
Lines <- "Date,Outdoor,Indoor,Month,OutdoorAVE
01/01/2000 01:00,30,25,Jan,31.33
01/01/2000 02:00,31,26,Jan,31.33
01/01/2000 03:00,33,24,Jan,31.33
02/01/2000 01:00,29,25,Feb,31.67
02/01/2000 02:00,27,26,Feb,31.67
02/01/2000 03:00,39,24,Feb,31.67
12/01/2000 02:00,27,26,Dec,31.33
12/01/2000 03:00,39,24,Dec,31.33
12/31/2000 23:00,28,25,Dec,31.33"
I do not know enough R to do that. Any help is greatly appreciated.

Try ave and, for example, POSIXlt to extract the month:
zz <- textConnection(Lines)
Data <- read.table(zz,header=T,sep=",",stringsAsFactors=F)
close(zz)
Data$Month <- strftime(
as.POSIXlt(Data$Date,format="%m/%d/%Y %H:%M"),
format='%b')
Data$outdoor_ave <- ave(Data$Outdoor,Data$Month,FUN=mean)
This gives:
> Data
Date Outdoor Indoor Month outdoor_ave
1 01/01/2000 01:00 30 25 Jan 31.33333
2 01/01/2000 02:00 31 26 Jan 31.33333
3 01/01/2000 03:00 33 24 Jan 31.33333
4 02/01/2000 01:00 29 25 Feb 31.66667
5 02/01/2000 02:00 27 26 Feb 31.66667
6 02/01/2000 03:00 39 24 Feb 31.66667
7 12/01/2000 02:00 27 26 Dec 31.33333
8 12/01/2000 03:00 39 24 Dec 31.33333
9 12/31/2000 23:00 28 25 Dec 31.33333
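The reason ave avoids a separate merge step is that it returns one value per input row (the group statistic repeated for every member of the group), whereas aggregate returns one row per group. A minimal sketch of the difference, using the January and February readings from the question; the variable names are assumptions made for illustration.

```r
# Outdoor readings for January and February from the question
outdoor <- c(30, 31, 33, 29, 27, 39)
month   <- c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb")

# ave(): same length as the input, group mean repeated within each group
per_row <- ave(outdoor, month, FUN = mean)

# aggregate(): one row per group, which would then need a merge
per_group <- aggregate(outdoor, by = list(Month = month), FUN = mean)

print(per_row)
print(per_group)
```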
Edit: Then just calculate Month in Data as shown above and use merge:
zz <- textConnection(Lines2) # Lines2 is the aggregated data
Data2 <- read.table(zz,header=T,sep=",",stringsAsFactors=F)
close(zz)
> merge(Data,Data2[-1],all=T)
Month Date Outdoor Indoor OutdoorAVE
1 Dec 12/01/2000 02:00 27 26 31.33
2 Dec 12/01/2000 03:00 39 24 31.33
3 Dec 12/31/2000 23:00 28 25 31.33
4 Feb 02/01/2000 01:00 29 25 31.67
5 Feb 02/01/2000 02:00 27 26 31.67
6 Feb 02/01/2000 03:00 39 24 31.67
7 Jan 01/01/2000 01:00 30 25 31.33
8 Jan 01/01/2000 02:00 31 26 31.33
9 Jan 01/01/2000 03:00 33 24 31.33

This is tangential to your question, but you may want to use RSQLite and separate tables for the various aggregate values instead, and join the tables with simple SQL commands. If you use many kinds of aggregations, your data frame can easily get large and ugly.

Here's a zoo/xts solution. Note that Month is numeric here because you can't mix types in zoo/xts objects.
require(xts) # loads zoo too
Lines1 <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
con <- textConnection(Lines1)
z <- read.zoo(con, header=TRUE, sep=",",
format="%m/%d/%Y %H:%M", FUN=as.POSIXct)
close(con)
zz <- merge(z, Month=.indexmon(z),
OutdoorAVE=ave(z[,1], .indexmon(z), FUN=mean))
zz
# Outdoor Indoor Month OutdoorAVE
# 2000-01-01 01:00:00 30 25 0 31.33333
# 2000-01-01 02:00:00 31 26 0 31.33333
# 2000-01-01 03:00:00 33 24 0 31.33333
# 2000-02-01 01:00:00 29 25 1 31.66667
# 2000-02-01 02:00:00 27 26 1 31.66667
# 2000-02-01 03:00:00 39 24 1 31.66667
# 2000-12-01 02:00:00 27 26 11 31.33333
# 2000-12-01 03:00:00 39 24 11 31.33333
# 2000-12-31 23:00:00 28 25 11 31.33333
Update: How to get the above result using two different data sets:
Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
con <- textConnection(Lines2)
z2 <- read.zoo(con, header=TRUE, sep=",", format="%m/%d/%Y",
FUN=as.POSIXct, colClasses=c("character","NULL","numeric"))
close(con)
# z is the zoo object read from Lines1 above
zz2 <- na.locf(merge(z, Month=.indexmon(z), OutdoorAVE=z2))[index(z)]
# same output as zz (above)
