Merging aggregate data in R

Following up on my previous question about aggregating hourly data into daily data, I now want to (a) compute monthly aggregates and (b) merge the monthly aggregates back into the original data frame.
My original dataframe looks like this:
Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
My previous question covered the daily aggregates, and from there I can work out the monthly aggregates, which look something like this:
Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
where OutdoorAVE is the monthly average of the daily minimum and maximum outdoor temperature. What I want in the end is something like this:
Lines <- "Date,Outdoor,Indoor,Month,OutdoorAVE
01/01/2000 01:00,30,25,Jan,31.33
01/01/2000 02:00,31,26,Jan,31.33
01/01/2000 03:00,33,24,Jan,31.33
02/01/2000 01:00,29,25,Feb,31.67
02/01/2000 02:00,27,26,Feb,31.67
02/01/2000 03:00,39,24,Feb,31.67
12/01/2000 02:00,27,26,Dec,31.33
12/01/2000 03:00,39,24,Dec,31.33
12/31/2000 23:00,28,25,Dec,31.33"
I don't know enough R to do that. Any help is greatly appreciated.

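Try ave, and use e.g. POSIXlt to extract the month:
zz <- textConnection(Lines)
Data <- read.table(zz, header=TRUE, sep=",", stringsAsFactors=FALSE)
close(zz)
# abbreviated month name such as "Jan" (the %b output depends on the locale)
Data$Month <- strftime(
as.POSIXlt(Data$Date, format="%m/%d/%Y %H:%M"),
format='%b')
# ave() returns the group mean repeated on every row of that group
Data$outdoor_ave <- ave(Data$Outdoor, Data$Month, FUN=mean)
Gives: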
> Data
Date Outdoor Indoor Month outdoor_ave
1 01/01/2000 01:00 30 25 Jan 31.33333
2 01/01/2000 02:00 31 26 Jan 31.33333
3 01/01/2000 03:00 33 24 Jan 31.33333
4 02/01/2000 01:00 29 25 Feb 31.66667
5 02/01/2000 02:00 27 26 Feb 31.66667
6 02/01/2000 03:00 39 24 Feb 31.66667
7 12/01/2000 02:00 27 26 Dec 31.33333
8 12/01/2000 03:00 39 24 Dec 31.33333
9 12/31/2000 23:00 28 25 Dec 31.33333
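One caveat with keying on %b alone: if the data span more than one year, every January lands in the same group, and the %b text also depends on the locale. A minimal sketch using a year-month key instead; the 01/01/2001 row is made-up data added only to show two Januaries being kept apart:

```r
# Hypothetical data: the 01/01/2001 row is invented to show the multi-year case
Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2001 01:00,40,25"
Data <- read.csv(text = Lines, stringsAsFactors = FALSE)

# Parse in UTC so the result does not depend on the session time zone
stamp <- as.POSIXct(Data$Date, format = "%m/%d/%Y %H:%M", tz = "UTC")

# A key such as "2000-01" keeps January 2000 and January 2001 apart
Data$YearMonth <- format(stamp, "%Y-%m")

# ave() repeats each group's mean on every row of that group
Data$OutdoorAVE <- ave(Data$Outdoor, Data$YearMonth, FUN = mean)
Data
```

A %Y-%m key also sorts chronologically as plain text, which helps when the result is later merged or ordered.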
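Edit: If the monthly aggregates are in a separate data frame, just calculate Month in Data as shown above and use merge:
zz <- textConnection(Lines2) # Lines2 is the aggregated data
Data2 <- read.table(zz, header=TRUE, sep=",", stringsAsFactors=FALSE)
close(zz)
> merge(Data, Data2[-1], all=TRUE) # Data2[-1] drops Date, so the join is on Month only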
Month Date Outdoor Indoor OutdoorAVE
1 Dec 12/01/2000 02:00 27 26 31.33
2 Dec 12/01/2000 03:00 39 24 31.33
3 Dec 12/31/2000 23:00 28 25 31.33
4 Feb 02/01/2000 01:00 29 25 31.67
5 Feb 02/01/2000 02:00 27 26 31.67
6 Feb 02/01/2000 03:00 39 24 31.67
7 Jan 01/01/2000 01:00 30 25 31.33
8 Jan 01/01/2000 02:00 31 26 31.33
9 Jan 01/01/2000 03:00 33 24 31.33
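The OutdoorAVE values in the question (31.33, 31.67, 31.33) are plain means of the hourly readings, not the monthly average of the daily minimum and maximum that the question describes. If the latter is what is wanted, here is a base-R sketch, assuming "average of the daily minimum and maximum" means (min + max) / 2 per day, averaged over the days of each month. Note also that merge() sorts its result by the join key, which is why the months above come out as Dec, Feb, Jan; the %Y-%m key used here sorts chronologically.

```r
Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
Data <- read.csv(text = Lines, stringsAsFactors = FALSE)
stamp <- as.POSIXct(Data$Date, format = "%m/%d/%Y %H:%M", tz = "UTC")
Data$Day <- as.Date(stamp, tz = "UTC")
Data$YearMonth <- format(stamp, "%Y-%m")

# Step 1: one row per day, holding the mid-point of that day's min and max
daily <- aggregate(Outdoor ~ Day + YearMonth, data = Data,
                   FUN = function(v) (min(v) + max(v)) / 2)

# Step 2: average the daily mid-points within each month
monthly <- aggregate(Outdoor ~ YearMonth, data = daily, FUN = mean)
names(monthly)[names(monthly) == "Outdoor"] <- "OutdoorAVE"

# Step 3: attach the monthly value to every hourly row of that month
result <- merge(Data, monthly, by = "YearMonth")
result
```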

This is tangential to your question, but you may want to use RSQLite with separate tables for the various aggregate values, and join the tables with simple SQL commands. If you compute many kinds of aggregations, your data frame can easily get large and ugly.

Here's a zoo/xts solution. Note that Month is numeric here because you can't mix types in zoo/xts objects, and that .indexmon() is zero-based (0 = January, 11 = December).
require(xts) # loads zoo too
Lines1 <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
con <- textConnection(Lines1)
z <- read.zoo(con, header=TRUE, sep=",",
format="%m/%d/%Y %H:%M", FUN=as.POSIXct)
close(con)
zz <- merge(z, Month=.indexmon(z),
OutdoorAVE=ave(z[,1], .indexmon(z), FUN=mean))
zz
# Outdoor Indoor Month OutdoorAVE
# 2000-01-01 01:00:00 30 25 0 31.33333
# 2000-01-01 02:00:00 31 26 0 31.33333
# 2000-01-01 03:00:00 33 24 0 31.33333
# 2000-02-01 01:00:00 29 25 1 31.66667
# 2000-02-01 02:00:00 27 26 1 31.66667
# 2000-02-01 03:00:00 39 24 1 31.66667
# 2000-12-01 02:00:00 27 26 11 31.33333
# 2000-12-01 03:00:00 39 24 11 31.33333
# 2000-12-31 23:00:00 28 25 11 31.33333
Update: how to get the above result using two separate data sets:
Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
con <- textConnection(Lines2)
z2 <- read.zoo(con, header=TRUE, sep=",", format="%m/%d/%Y",
FUN=as.POSIXct, colClasses=c("character","NULL","numeric"))
close(con)
zz2 <- na.locf(merge(z, Month=.indexmon(z), OutdoorAVE=z2))[index(z)]
# same output as zz (above)
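Because .indexmon() (like the mon field of POSIXlt) is zero-based, adding 1 and indexing base R's month.abb constant turns the numbers back into names once the data are out of the zoo/xts object. This also avoids the locale dependence of format(x, "%b"). A base-R sketch using POSIXlt directly, with three example timestamps:

```r
stamp <- as.POSIXct(c("2000-01-01 01:00", "2000-02-01 01:00", "2000-12-31 23:00"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")
# POSIXlt$mon counts months from 0 (January) to 11 (December), like .indexmon()
mon0 <- as.POSIXlt(stamp)$mon
# month.abb is base R's fixed English vector c("Jan", ..., "Dec"),
# so this does not depend on the locale the way format(x, "%b") does
month_names <- month.abb[mon0 + 1]
month_names
```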

Related

Create a function to filter two columns in R

I have to replicate this code for 4 different places and 4 different years:
df1 <- df %>% filter(Place == "Al" & year==2016)
rollingMean(df1, pollutant = "O", hours=8, new.name = "mean", data.thresh=75)
Sample of data:
Place O date_time year
Al 23 2016-01-01 01:00:00 2016
Al 15 2016-01-01 02:00:00 2016
Al 18 2016-01-01 03:00:00 2016
Al 18 2016-01-01 04:00:00 2016
Al 20 2016-01-01 05:00:00 2016
Al 21 2016-01-01 06:00:00 2016
Ar 23 2016-01-01 01:00:00 2016
Ar 15 2016-01-01 02:00:00 2016
Ar 18 2016-01-01 03:00:00 2016
Ar 18 2016-01-01 04:00:00 2016
Ar 20 2016-01-01 05:00:00 2016
Ar 21 2016-01-01 06:00:00 2016
Ma 23 2016-01-01 01:00:00 2016
Ma 15 2016-01-01 02:00:00 2016
Ma 18 2016-01-01 03:00:00 2016
Ma 18 2016-01-01 04:00:00 2016
Ma 20 2016-01-01 05:00:00 2016
Ma 21 2016-01-01 06:00:00 2016
Ss 23 2016-01-01 01:00:00 2016
Ss 15 2016-01-01 02:00:00 2016
Ss 18 2016-01-01 03:00:00 2016
Ss 18 2016-01-01 04:00:00 2016
Ss 20 2016-01-01 05:00:00 2016
Ss 21 2016-01-01 06:00:00 2016
How can I optimize my code? I think I need a loop or map, but this is my first time doing this.

You can split the dataset by each unique combination of Place and year, use map to run rollingMean on each group, and combine the results into one data frame. (The column is year, lowercase, in the sample data.)
library(dplyr)
library(purrr)
library(openair) # provides rollingMean()
result <- df %>%
group_split(Place, year) %>%
map_df(~rollingMean(.x, pollutant = "O", hours=8,
new.name = "mean", data.thresh=75))
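If you would rather avoid the extra packages, the same split/apply/combine pattern works in base R with split(), lapply() and do.call(rbind, ...). In this sketch, add_trailing_mean() is a hypothetical stand-in for openair's rollingMean (a plain trailing mean over the last k rows via stats::filter, with no data-capture threshold) and the sample data are made up; swap in the real rollingMean call for your data.

```r
# Small made-up sample with two places and one year
df <- data.frame(
  Place = rep(c("Al", "Ar"), each = 3),
  year  = 2016,
  O     = c(23, 15, 18, 20, 21, 23)
)

# Hypothetical stand-in for openair::rollingMean(): mean of the current and
# previous k - 1 rows; the first k - 1 rows of each group are NA
add_trailing_mean <- function(d, k = 2) {
  d$mean <- as.numeric(stats::filter(d$O, rep(1 / k, k), sides = 1))
  d
}

# One data frame per Place/year combination that actually occurs
groups <- split(df, list(df$Place, df$year), drop = TRUE)

# Apply the function to each group and stack the results
result <- do.call(rbind, lapply(groups, add_trailing_mean))
rownames(result) <- NULL
result
```

Splitting first guarantees that the rolling window never runs across the boundary between two places or two years.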

how to convert 12 hour to 24 hour in r

I successfully split the timestamp 2018-12-31 11:45:00 AM into 2018-12-31 and 11:45:00 AM.
However, I am having difficulty converting "11:45:00 AM" to 24-hour time.
I know there are several ways to do that; the most popular is to use strptime with format="%I:%M:%S %p". I did that several times and double-checked again and again, but I still get NA in my column. Here crimeData is my dataset and SplitHrs contains the time, e.g. "11:45:00 AM":
crimeData$toSplitHrs = strptime(crimeData$SplitHrs, format="%I:%M:%S %p")
Police.Beats SplitMs SplitHrs year month days hours mins sec toSplitHrs
1 28 2018-12-31 11:45:00 2018 12 31 11 45 00 <NA>
2 177 2018-12-31 11:42:00 2018 12 31 11 42 00 <NA>
3 233 2018-12-31 11:30:00 2018 12 31 11 30 00 <NA>
4 91 2018-12-31 11:30:00 2018 12 31 11 30 00 <NA>
5 73 2018-12-31 11:30:00 2018 12 31 11 30 00 <NA>
6 232 2018-12-31 11:27:00 2018 12 31 11 27 00 <NA>
but I still get NA.
This dataset contains over 10k observations, so I really cannot fix them one by one. Any suggestions are appreciated!

You can try the format %r for the time, which takes the AM/PM indicator into account (see ?strptime):
strptime("2018-12-31 11:45:00 am", format="%F %r")
#[1] "2018-12-31 11:45:00 CET"
strptime("2018-12-31 11:45:00 pm", format="%F %r")
#[1] "2018-12-31 23:45:00 CET"
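A likely cause of the NA values: in the printed table, SplitHrs shows 11:45:00 with no AM/PM marker. If the column really lacks it, a format ending in %p has nothing to match and strptime() returns NA; in that case, parse the original full timestamp (which still carries the marker) instead of the split column. Matching of %p also depends on the locale (see ?strptime). A small check, assuming an English or C locale:

```r
# With the marker present, %I + %p converts to 24-hour time
with_marker <- strptime(c("11:45:00 AM", "11:45:00 PM"),
                        format = "%I:%M:%S %p", tz = "UTC")
hhmmss <- format(with_marker, "%H:%M:%S")

# Without the marker, the same format cannot match, giving NA
without_marker <- strptime("11:45:00", format = "%I:%M:%S %p", tz = "UTC")

hhmmss
is.na(without_marker)

# Parsing the full original timestamp keeps the AM/PM information
full <- strptime("2018-12-31 11:45:00 PM",
                 format = "%Y-%m-%d %I:%M:%S %p", tz = "UTC")
format(full, "%Y-%m-%d %H:%M:%S")
```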

Aggregate data by user defined time interval

I have the following data frame:
df<-data.frame(timecol=as.POSIXct(c("2016-05-31 22:12:27 PDT","2016-05-31 22:25:03 PDT","2016-05-31 23:08:43 PDT","2016-05-31 23:24:10 PDT","2016-06-01 02:00:56 PDT","2016-06-01 03:00:56 PDT","2016-06-01 05:00:56 PDT","2016-06-01 22:12:27 PDT","2016-06-01 22:25:03 PDT","2016-06-01 23:08:43 PDT","2016-06-01 23:24:10 PDT","2016-06-02 02:00:56 PDT","2016-06-02 03:00:56 PDT","2016-06-02 05:00:56 PDT")),value=sample(1:100,14))
> df
timecol value
1 2016-05-31 22:12:27 100
2 2016-05-31 22:25:03 86
3 2016-05-31 23:08:43 39
4 2016-05-31 23:24:10 91
5 2016-06-01 02:00:56 32
6 2016-06-01 03:00:56 93
7 2016-06-01 05:00:56 53
8 2016-06-01 22:12:27 54
9 2016-06-01 22:25:03 76
10 2016-06-01 23:08:43 19
11 2016-06-01 23:24:10 56
12 2016-06-02 02:00:56 20
13 2016-06-02 03:00:56 3
14 2016-06-02 05:00:56 66
I need to aggregate the value column over a predefined time interval: from 7pm (19:00) one day to 7am the next day. I was thinking of something like this:
tm <- seq(as.POSIXct("2016-05-31 19:00:00 PDT"),as.POSIXct("2016-06-02 07:00:00 PDT"), by = "12 hours")
aggregate(df$value, list(day = cut(tm, "days")), sum)
but I can't figure out what's wrong.
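The attempt above groups by cut(tm, "days"), but tm has only 4 elements while df$value has 14, so aggregate() cannot pair them up; the grouping has to be computed from df$timecol itself. One way to do that is to shift each timestamp back 19 hours, so every 19:00 to 07:00 window falls on the first half of a single calendar date. A minimal sketch on made-up values, in UTC; with real PDT data, subtracting 19 * 3600 seconds is only 19 clock hours when no DST change falls in between:

```r
# Toy data (assumed), in UTC so the example does not depend on DST
df <- data.frame(
  timecol = as.POSIXct(c("2016-05-31 22:12:27", "2016-05-31 23:24:10",
                         "2016-06-01 02:00:56", "2016-06-01 05:00:56",
                         "2016-06-01 12:00:00",  # daytime: in no window
                         "2016-06-01 22:12:27", "2016-06-02 03:00:56"),
                       tz = "UTC"),
  value = c(1, 2, 3, 4, 100, 5, 6)
)

# Shift every timestamp back 19 hours: the window 19:00 on day D to 07:00 on
# day D+1 then falls between 00:00 and 12:00 of day D
shifted <- df$timecol - 19 * 3600
df$night <- as.Date(shifted, tz = "UTC")

# Keep only rows whose shifted clock time is before 12:00 (i.e. 19:00-07:00)
in_window <- as.integer(format(shifted, "%H", tz = "UTC")) < 12

res <- aggregate(value ~ night, data = df[in_window, ], FUN = sum)
res
```

Each night is labelled by the date on which it starts, and the window is half-open: a reading at exactly 07:00:00 belongs to no window.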

insert new rows to the time series data, with date added automatically

I have a time-series data frame that looks like this:
TS.1
2015-09-01 361656.7
2015-09-02 370086.4
2015-09-03 346571.2
2015-09-04 316616.9
2015-09-05 342271.8
2015-09-06 361548.2
2015-09-07 342609.2
2015-09-08 281868.8
2015-09-09 297011.1
2015-09-10 295160.5
2015-09-11 287926.9
2015-09-12 323365.8
Now, what I want to do is add some new data points (rows) to the existing data frame, say,
320123.5
323521.7
How can I add the corresponding date to each row? The dates simply continue sequentially from the last row.
Is there a package that can do this automatically, so that all I have to do is insert the new data points?

Here's some play data:
df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"), x = seq(31))
new.x <- c(32, 33)
This adds the extra observations along with the proper sequence of dates:
new.df <- data.frame(date=seq(max(df$date) + 1, max(df$date) + length(new.x), "days"), x=new.x)
Then just rbind them to get your expanded data frame:
rbind(df, new.df)
date x
1 2015-01-01 1
2 2015-01-02 2
3 2015-01-03 3
4 2015-01-04 4
5 2015-01-05 5
6 2015-01-06 6
7 2015-01-07 7
8 2015-01-08 8
9 2015-01-09 9
10 2015-01-10 10
11 2015-01-11 11
12 2015-01-12 12
13 2015-01-13 13
14 2015-01-14 14
15 2015-01-15 15
16 2015-01-16 16
17 2015-01-17 17
18 2015-01-18 18
19 2015-01-19 19
20 2015-01-20 20
21 2015-01-21 21
22 2015-01-22 22
23 2015-01-23 23
24 2015-01-24 24
25 2015-01-25 25
26 2015-01-26 26
27 2015-01-27 27
28 2015-01-28 28
29 2015-01-29 29
30 2015-01-30 30
31 2015-01-31 31
32 2015-02-01 32
33 2015-02-02 33
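To make this reusable, the same idea can be wrapped in a small helper. append_daily() is a hypothetical name, and the sketch assumes a data frame with a Date column called date and one value column called x, as in the play data:

```r
# Hypothetical helper: append new values, continuing the daily date sequence
append_daily <- function(df, new_x) {
  last <- max(df$date)
  # Date + integer adds whole days; seq_along() gives 1, 2, ..., length(new_x)
  new_rows <- data.frame(date = last + seq_along(new_x), x = new_x)
  rbind(df, new_rows)
}

df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"),
                 x = seq(31))
out <- append_daily(df, c(32, 33))
tail(out, 2)
```

Using last + seq_along(new_x) also means an empty new_x simply adds no rows.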

Generate entries in time series data

I want to generate a row (with ammount = 0) for each missing month, up to the current month, in the following data frame. Can you give me a hand with this? Thanks!
trans_date ammount
1 2004-12-01 2968.91
2 2005-04-01 500.62
3 2005-05-01 434.30
4 2005-06-01 549.15
5 2005-07-01 276.77
6 2005-09-01 548.64
7 2005-10-01 761.69
8 2005-11-01 636.77
9 2005-12-01 1517.58
10 2006-03-01 719.09
11 2006-04-01 1231.88
12 2006-05-01 580.46
13 2006-07-01 1468.43
14 2006-10-01 692.22
15 2006-11-01 505.81
16 2006-12-01 1589.70
17 2007-03-01 1559.82
18 2007-06-01 764.98
19 2007-07-01 964.77
20 2007-09-01 405.18
21 2007-11-01 112.42
22 2007-12-01 1134.08
23 2008-02-01 269.72
24 2008-03-01 208.96
25 2008-04-01 353.58
26 2008-05-01 756.00
27 2008-06-01 747.85
28 2008-07-01 781.62
29 2008-09-01 195.36
30 2008-10-01 424.24
31 2008-12-01 166.23
32 2009-02-01 237.11
33 2009-04-01 110.94
34 2009-07-01 191.29
35 2009-11-01 153.42
36 2009-12-01 222.87
37 2010-09-01 1259.97
38 2010-11-01 375.61
39 2010-12-01 496.48
40 2011-02-01 360.07
41 2011-03-01 324.95
42 2011-04-01 566.93
43 2011-06-01 281.19
44 2011-08-01 428.04
'data.frame': 44 obs. of 2 variables:
$ trans_date : Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
$ ammount: num 2969 501 434 549 277 ...

You can use seq.Date and merge:
> str(df)
'data.frame': 44 obs. of 2 variables:
$ trans_date: Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
$ ammount : num 2969 501 434 549 277 ...
> mns <- data.frame(trans_date = seq.Date(min(df$trans_date), max(df$trans_date), by = "month"))
> df2 <- merge(mns, df, all = TRUE)
> df2$ammount <- ifelse(is.na(df2$ammount), 0, df2$ammount)
> head(df2)
trans_date ammount
1 2004-12-01 2968.91
2 2005-01-01 0.00
3 2005-02-01 0.00
4 2005-03-01 0.00
5 2005-04-01 500.62
6 2005-05-01 434.30
And if you need months up to the current date, use this:
mns <- data.frame(trans_date = seq.Date(min(df$trans_date), Sys.Date(), by = "month"))
Note that calling plain seq instead of seq.Date is sufficient when the arguments are of class Date.
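The same steps can be collected into a self-contained sketch. fill_missing_months() is a hypothetical helper name; it assumes the dates are already first-of-month Date values (as in the question), ends at the last observed month by default, and takes end = Sys.Date() for the "up to the current month" case. Replacing the NAs by logical indexing is equivalent to the ifelse() call above.

```r
# Hypothetical helper: one row per month, 0 where no transaction was recorded
fill_missing_months <- function(df, end = max(df$trans_date)) {
  all_months <- data.frame(
    trans_date = seq(min(df$trans_date), end, by = "month")
  )
  # all = TRUE keeps months that have no matching row; their ammount is NA
  out <- merge(all_months, df, by = "trans_date", all = TRUE)
  out$ammount[is.na(out$ammount)] <- 0
  out
}

# First three rows of the question's data
df <- data.frame(
  trans_date = as.Date(c("2004-12-01", "2005-04-01", "2005-05-01")),
  ammount = c(2968.91, 500.62, 434.30)
)
filled <- fill_missing_months(df)
filled

# Up to the current month instead: fill_missing_months(df, end = Sys.Date())
```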

If you're using xts objects, you can use timeBasedSeq and merge.xts. Assuming your original data is in an object Data:
library(xts) # also loads zoo
# create xts object:
# no comma on the first subset (Data['ammount']) keeps column name;
# as.Date needs a vector, so use comma (Data[,'trans_date'])
x <- xts(Data['ammount'],as.Date(Data[,'trans_date']))
# create a time-based vector from 2004-12-01 to 2011-08-01. The "m" denotes
# monthly time-steps. By default this returns a yearmon class. Use
# retclass="Date" to return a Date vector.
d <- timeBasedSeq(paste(start(x),end(x),"m",sep="/"), retclass="Date")
# merge x with an "empty" xts object, xts(,d), filling with zeros
y <- merge(x,xts(,d),fill=0)
