I have data measured over a 7 day period. Part of the data looks as follows:
start wk end wk X1
2/1/2004 2/7/2004 89
2/8/2004 2/14/2004 65
2/15/2004 2/21/2004 64
2/22/2004 2/28/2004 95
2/29/2004 3/6/2004 79
3/7/2004 3/13/2004 79
I want to convert this weekly (7 day) data into monthly data using weighted averages of X1. Note that some weeks straddle two months (for example, X1 = 79 covers the period 2/29 to 3/6 of 2004).
Specifically, I would obtain the February 2004 monthly value (say, Y1) the following way:
(7*89 + 7*65 + 7*64 + 7*95 + 1*79)/29 = 78.28
Does R have a function that will properly do this? (to.monthly in the xts library DOES NOT do what I need.) If not, what is the best way to do this in R?
Convert the data to daily data and then aggregate:
Lines <- "start end X1
2/1/2004 2/7/2004 89
2/8/2004 2/14/2004 65
2/15/2004 2/21/2004 64
2/22/2004 2/28/2004 95
2/29/2004 3/6/2004 79
3/7/2004 3/13/2004 79
"
library(zoo)
# read data into data frame DF
DF <- read.table(text = Lines, header = TRUE)
# convert date columns to "Date" class
fmt <- "%m/%d/%Y"
DF <- transform(DF, start = as.Date(start, fmt), end = as.Date(end, fmt))
# convert to daily zoo series
to.day <- function(i) with(DF, zoo(X1[i], seq(start[i], end[i], "day")))
z.day <- do.call(c, lapply(1:nrow(DF), to.day))
# aggregate by month
aggregate(z.day, as.yearmon, mean)
The last line gives:
Feb 2004 Mar 2004
78.27586 79.00000
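If you would rather not depend on zoo, the same expand-to-days idea can be sketched in base R. This is only an illustration, assuming each weekly value applies equally to every day of its week; the names weeks, days and monthly are made up for the example.

```r
# Weekly data from the question; each week runs from start to start + 6 days
weeks <- data.frame(
  start = as.Date(c("2004-02-01", "2004-02-08", "2004-02-15",
                    "2004-02-22", "2004-02-29", "2004-03-07")),
  X1 = c(89, 65, 64, 95, 79, 79)
)
weeks$end <- weeks$start + 6

# Repeat each weekly value once for every day the week covers
days <- do.call(rbind, lapply(seq_len(nrow(weeks)), function(i) {
  data.frame(day = seq(weeks$start[i], weeks$end[i], by = "day"),
             X1 = weeks$X1[i])
}))

# Average the daily values within each calendar month
monthly <- tapply(days$X1, format(days$day, "%Y-%m"), mean)
print(monthly)
```

The March value here only reflects the days up to 3/13, because that is where the sample data ends.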
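If you are willing to drop the "end week" column from your DF, apply.monthly from xts is a simpler option (this assumes the start dates are stored as Date class in a column named start_wk). Note that it counts each week entirely in the month of its start date, so weeks that straddle two months are not split, and "sum" gives monthly totals; use "mean" if you want an average.
library(xts)
DF.xts <- xts(DF$X1, order.by=DF$start_wk)
DF.xts.monthly <- apply.monthly(DF.xts, "sum")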
Then you can always recreate end dates if you absolutely need them by adding 30.
I have daily precipitation data in R. My dates are in the format 2008-01-01 and span 10 years. I am trying to aggregate from 2008-10-01 to 2009-09-30, but I am not sure how. Is there a way in aggregate to set a start date for the aggregation and group by it?
My current code is
data<- aggregate(data$total_snow_cm, by=list(data$year), FUN = 'sum')
but this gives me the total snowfall for each calendar year (January to December), whereas I want the total to run from October 2008 to September 2009.
Assuming your data are in long format, I'd do something like this:
library(tidyverse)
library(lubridate)  # provides ymd(), month(), year(); older tidyverse versions do not attach it
# make sure R knows your dates are dates - you mention they're 'yyyy-mm-dd', so
yourdataframe <- yourdataframe %>%
  mutate(yourcolumnforprecipdate = ymd(yourcolumnforprecipdate))
# in this script or another, define a water year function:
# October to December count towards the following year
water_year <- function(date) {
  ifelse(month(date) < 10, year(date), year(date) + 1)
}
# new wateryear column for your data, using your new function
yourdataframe <- yourdataframe %>%
  mutate(wateryear = water_year(yourcolumnforprecipdate))
# now group by water year (and location if there's more than one)
# and sum and create new data.frame
wy_sums <- yourdataframe %>%
  group_by(locationcolumn, wateryear) %>%
  summarize(wy_totalprecip = sum(dailyprecip))
For more info, read up on lubridate, the tidyverse package that provides the ymd() function. It has related parsers such as ymd_hms(). mutate() is from the tidyverse's dplyr package. Both packages are extremely useful!
I'd like to answer the question as it was asked, which was about doing this with aggregate().
You can wrap aggregate() in with() and subset the data there. Inside the subset you can define date intervals with comparison operators, just as you can with numbers. (In the example data below, the column named year holds the daily dates.)
df1.agg <- with(df1[as.Date("2008-10-01") <= df1$year & df1$year <= as.Date("2009-09-30"), ],
aggregate(total_snow_cm, by=list(year), FUN=sum))
Another way is to use aggregate()'s formula interface, where data and, hence, also the interval can be specified inside the aggregate() call.
df1.agg <- aggregate(total_snow_cm ~ year,
data=df1[as.Date("2008-10-01") <= df1$year &
df1$year <= as.Date("2009-09-30"), ], FUN=sum)
Result
head(df1.agg)
# year total_snow_cm
# 1 2008-10-01 171
# 2 2008-10-02 226
# 3 2008-10-03 182
# 4 2008-10-04 129
# 5 2008-10-05 135
# 6 2008-10-06 222
Data
set.seed(42)
df1 <- data.frame(total_snow_cm=sample(120:240, 4018, replace=TRUE),
year=seq(as.Date("2000-01-01"),as.Date("2010-12-31"), by="day"))
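The calls above restrict the data to one October-to-September window and, because the year column holds daily dates, still return one row per date. To get one total per October-to-September year with aggregate(), one option is to derive a grouping column first. A minimal sketch, assuming each such "water year" is labelled by the calendar year in which it ends; the names snow, date and water_year are hypothetical.

```r
# Hypothetical daily snowfall data covering three calendar years
set.seed(42)
snow <- data.frame(date = seq(as.Date("2008-01-01"), as.Date("2010-12-31"),
                              by = "day"))
snow$total_snow_cm <- sample(0:10, nrow(snow), replace = TRUE)

# October to December are assigned to the following year's water year
m <- as.integer(format(snow$date, "%m"))
y <- as.integer(format(snow$date, "%Y"))
snow$water_year <- ifelse(m >= 10, y + 1L, y)

# One total per water year
wy_totals <- aggregate(total_snow_cm ~ water_year, data = snow, FUN = sum)
print(wy_totals)
```

The first and last groups are partial here, since the sample data starts in January and ends in December.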
I have the following data:
Date,Rain
1979_8_9_0,8.775
1979_8_9_6,8.775
1979_8_9_12,8.775
1979_8_9_18,8.775
1979_8_10_0,0
1979_8_10_6,0
1979_8_10_12,0
1979_8_10_18,0
1979_8_11_0,8.025
1979_8_12_12,0
1979_8_12_18,0
1979_8_13_0,8.025
The data is six-hourly, but some dates have incomplete six-hourly data. For example, August 11 1979 has only one value, at 00H. I would like to get the daily accumulated rainfall from this kind of data using R. Any suggestion on how to do this easily in R?
I'll appreciate any help.
You can transform your data to dates very easily with:
dat$Date <- as.Date(strptime(dat$Date, '%Y_%m_%d_%H'))
After that you should aggregate with:
aggregate(Rain ~ Date, dat, sum)
The result:
Date Rain
1 1979-08-09 35.100
2 1979-08-10 0.000
3 1979-08-11 8.025
4 1979-08-12 0.000
5 1979-08-13 8.025
Based on the comment of Henrik, you can also transform to dates with:
dat$Date <- as.Date(dat$Date, '%Y_%m_%d')
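# split the "Date" variable into its year, month, day and hour parts
splitDate <- stringr::str_split_fixed(string = df$Date, pattern = "_", n = 4)
# rebuild a year-month-day key, so the same day number in different months is not merged
df$Day <- paste(splitDate[,1], splitDate[,2], splitDate[,3], sep = "-")
# split the rain values by day and sum each group
unlist(lapply(split(df$Rain, df$Day), sum))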
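Since some days have fewer than four readings, it may help to keep the number of observations next to each daily total. The following is a sketch only, assuming a complete day means exactly four six-hourly values; the column names Day, n_obs and complete are made up for the example.

```r
# Sample data from the question
dat <- read.csv(text = "Date,Rain
1979_8_9_0,8.775
1979_8_9_6,8.775
1979_8_9_12,8.775
1979_8_9_18,8.775
1979_8_10_0,0
1979_8_10_6,0
1979_8_10_12,0
1979_8_10_18,0
1979_8_11_0,8.025
1979_8_12_12,0
1979_8_12_18,0
1979_8_13_0,8.025", stringsAsFactors = FALSE)

# Parse year, month and day; the trailing hour part is ignored
dat$Day <- as.Date(dat$Date, "%Y_%m_%d")

# Daily totals plus the number of readings behind each total
daily <- aggregate(Rain ~ Day, data = dat, FUN = sum)
daily$n_obs <- aggregate(Rain ~ Day, data = dat, FUN = length)$Rain
daily$complete <- daily$n_obs == 4
print(daily)
```

Days flagged as incomplete can then be dropped or treated separately, depending on how you want to handle missing readings.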
I've been scouring the net but haven't found a solution to this quite possibly simple problem.
This is the half-hourly data using the library 'xts',
library(xts)
data.xts <- as.xts(1:nrow(data), as.POSIXct("2007-08-24 17:30:00") +
1800 * (1:nrow(data)))
data.xts <-as.data.frame(data.xts)
I changed it to a data.frame because the original data is in data.frame format. In fact, the original data frame has a time_stamp column, and I would prefer to use that column directly instead of the 'xts' format.
How can I average the data by hour of day within each month, so that I can plot a 24-hour time series for each of the different months?
For example,
2007-08-24 17:30:00 1
2007-08-25 17:00:00 47
2007-08-25 17:30:00 48
2007-08-26 17:00:00 95
would be averaged for the month of August 2007, etc.
Goal is to plot averaged 24-hourly time series for each month.
Thanks!
Try
library(dplyr)
res <- dat %>%
group_by(month=format(datetime, '%m'),
#year=format(datetime, '%Y'), #if you need year also
# as grouping variable
hour=format(as.POSIXct(cut(datetime, breaks='hour')), '%H')) %>%
summarise(Meanval=mean(val, na.rm=TRUE))
head(res,3)
# month hour Meanval
#1 01 00 -0.02780036
#2 01 01 -0.06589948
#3 01 02 -0.02166218
Update
If your datetime column is POSIXlt, you need to convert it to POSIXct first. For example, after converting the column to POSIXlt with
dat$datetime <- as.POSIXlt(dat$datetime)
running the code above gives the error
# Error: column 'datetime' has unsupported type
You can use mutate to convert datetime to POSIXct class with as.POSIXct:
res1 <- dat %>%
mutate(datetime= as.POSIXct(datetime)) %>%
group_by(month=format(datetime, '%m'),
#year=format(datetime, '%Y'), #if you need year also
# as grouping variable
hour=format(as.POSIXct(cut(datetime, breaks='hour')), '%H')) %>%
summarise(Meanval=mean(val, na.rm=TRUE))
data
set.seed(24)
dat <- data.frame(datetime=seq(Sys.time(), by='1 hour', length.out=2000),
val=rnorm(2000))
If I understand you correctly, you want to average all the values for a given hour, for all the days in a given month, and do this for all months. So average all the values between midnight and 00:59:59 for all the days in a given month, etc.
I see that you want to avoid xts but aggregate.zoo(...) was designed for this, and avoids dplyr and cut.
library(xts)
# creates sample dataset...
set.seed(1)
data <- rnorm(1000)
data.xts <- as.xts(data, as.POSIXct("2007-08-24 17:30:00") +
1800 * (1:length(data)))
# using aggregate.zoo(...)
as.hourly <- function(x) format(x,"%Y-%m %H")
result <- aggregate(data.xts,by=as.hourly,mean)
result <- data.frame(result)
head(result)
# result
# 2007-08 00 0.12236024
# 2007-08 01 0.41593567
# 2007-08 02 0.22670817
# 2007-08 03 0.23402842
# 2007-08 04 0.22175078
# 2007-08 05 0.05081899
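Because the question prefers to work from the time_stamp column of a plain data frame, here is a base R sketch of the same hour-within-month average. It is an illustration under assumptions: the data frame df and its columns time_stamp and val are made up to mimic the half-hourly series, and UTC is chosen arbitrarily as the time zone.

```r
# Hypothetical half-hourly data in a plain data frame
set.seed(1)
df <- data.frame(
  time_stamp = as.POSIXct("2007-08-24 17:30:00", tz = "UTC") + 1800 * (1:1000)
)
df$val <- rnorm(nrow(df))

# Grouping keys: calendar month and hour of day
df$month <- format(df$time_stamp, "%Y-%m")
df$hour <- as.integer(format(df$time_stamp, "%H"))

# Matrix with one row per hour of day and one column per month
hourly_means <- tapply(df$val, list(hour = df$hour, month = df$month), mean)
print(round(hourly_means, 3))
```

With this layout, something like matplot(0:23, hourly_means, type = "l") should draw one 24-hour line per month, provided every hour of the day occurs in the data.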
I'm trying to load time series in R with the 'zoo' library.
The observations have varying precision. Some have day/month/year, others only month and year, and others only the year:
02/10/1915
1917
07/1917
07/1918
30/08/2018
Subsequently, I need to aggregate the rows by year, and by year and month.
The base R as.Date function doesn't handle that.
How can I model this data with zoo?
Thanks,
Mulone
We use the test data formed from the index data in the question followed by a number:
# test data
Lines <- "02/10/1915 1
1917 2
07/1917 3
07/1918 4
30/08/2018 5"
yearly aggregation
library(zoo)
to.year <- function(x) as.numeric(sub(".*/", "", as.character(x)))
read.zoo(text = Lines, FUN = to.year, aggregate = mean)
The last line returns:
1915 1917 1918 2018
1.0 2.5 4.0 5.0
year/month aggregation
Since year/month aggregation of data with no month makes no sense, we first drop the year-only records and aggregate the rest:
DF <- read.table(text = Lines, as.is = TRUE)
# remove year-only records. DF.ym has at least year and month.
yr <- suppressWarnings(as.numeric(DF[[1]]))
DF.ym <- DF[is.na(yr), ]
# remove day, if present, and convert to yearmon.
to.yearmon <- function(x) as.yearmon( sub("\\d{1,2}/(\\d{1,2}/)", "\\1", x), "%m/%Y" )
read.zoo(DF.ym, FUN = to.yearmon, aggregate = mean)
The last line gives:
Oct 1915 Jul 1917 Jul 1918 Aug 2018
1 3 4 5
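To see what each record actually contains before choosing an aggregation level, the mixed-precision strings can also be split by hand in base R. This is a rough sketch, assuming the fields are always ordered day/month/year with the year last; the names parts, precision and parsed are hypothetical.

```r
x <- c("02/10/1915", "1917", "07/1917", "07/1918", "30/08/2018")

# Split on "/" and read the fields from the right: year, then month
parts <- strsplit(x, "/", fixed = TRUE)
year <- as.integer(vapply(parts, function(p) p[length(p)], character(1)))
month <- as.integer(vapply(parts, function(p) {
  if (length(p) >= 2) p[length(p) - 1] else NA_character_
}, character(1)))

# One field means year only, two mean month and year, three a full date
precision <- c("year", "month", "day")[lengths(parts)]

parsed <- data.frame(x, year, month, precision)
print(parsed)
```

Rows with a missing month can then be left out of any year/month aggregation while still counting towards the yearly one.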