Calculate average value over multiple years for each hour and day - r

I am trying to calculate an average over multiple years for hourly data. I want to retain the days and hours and average over the years. I feel like this should be simple but I have looked around for an answer and not found one.
I am using R version 3.0.3.
start <- ISOdatetime(1970, 1, 1, hour=0, min=0, sec=0, tz="GMT")
end <- ISOdatetime(1971, 12, 31, hour=18, min=0, sec=0, tz="GMT")
set.seed(1)
z <- zooreg(rnorm(2920), start = start , end = end, frequency = 4, deltat = 21600)
#attempt to aggregate ... doesn't work
z.daily.agg <- aggregate(z, as.POSIXct(cut(time(z), "6 hours", include=T)), mean)
What I would like for the output is the following:
01-01 00:00 average of all January 1st zero hours from 1970-1971
01-01 06:00 average of all January 1st 06:00 hours from 1970-1971
Thanks for your assistance with this!

I believe this will work - using the hour function from the lubridate package.
require(lubridate)
aggregate(z, hour(index(z)), mean)
Edit in response to your comments - sorry, I didn't realise exactly what you wanted. You can average across each hour by day by month across the two years (which I think is what you want) like so:
aggregate(z ~ month(index(z)) + day(index(z)) + hour(index(z)), FUN = 'mean')
Hope that helps

A little crude but you could
#1) Use the substr function to extract the parts of the date string you want:
date = substr(time(z), 6,16)
#2) Then bind this to the data:
temp = data.frame(z, date)
#3) Make sure the date is a factor:
temp$date = as.factor(temp$date)
#4) And now aggregate:
aggregate(temp$z~temp$date, FUN=mean)
Does this give you the results you were after?
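Both answers group on the month, day and hour of the index. Here is a minimal sketch of the same idea. It assumes a plain zoo series built with seq() rather than the zooreg() call from the question, and it builds the key with format() so that the result does not depend on how the timestamps happen to print:

```r
library(zoo)

# Two non-leap years of 6-hourly data (2 * 365 * 4 = 2920 points).
# The seq() construction is an assumption, not the question's zooreg() call.
idx <- seq(ISOdatetime(1970, 1, 1, 0, 0, 0, tz = "GMT"),
           by = "6 hours", length.out = 2920)
set.seed(1)
z <- zoo(rnorm(2920), idx)

# Key each observation by month-day and hour, dropping the year.
key <- format(time(z), "%m-%d %H:%M", tz = "GMT")

# Average all observations that share a key (one value per year here).
clim <- tapply(coredata(z), key, mean)
head(clim)
```

With two years of data each key collects exactly two values, so clim has 365 * 4 = 1460 entries.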

Related

Date Formatting in Time Series Codes

I have a .csv file that looks like this:
Date
Time
Demand
01-Jan-05
6:30
6
01-Jan-05
6:45
3
...
23-Jan-05
21:45
0
23-Jan-05
22:00
1
The days are broken into 15 minute increments from 6:30 - 22:00.
Now, I am trying to do a time series on this, but I am a little lost on the notation of this.
I have the following so far:
library(tidyverse)
library(forecast)
library(zoo)
tp <- read.csv(".csv")
tp.ts <- ts(tp$Demand, start = c(), end = c(), frequency = 63)
The frequency I am after is an entire day, which I believe makes the number 63.***
However, I am unsure as to how to notate the dates in c().
***Edit
If the frequency is meant to be observations per a unit of time, and I am trying to observe just (Demand) by the 15 minute time slots (Time) in each day (Date), maybe my Frequency is 1?
***Edit 2
So I think I am struggling with doing the time series because I have a Date column (which is characters) and a Time column.
Since I need the data for Demand at the given hours on the dates, maybe I need to convert the dates to be used in ts() and combine the Date and Time data into a new column?
If I do this, I am assuming this should give me the times I need (6:30 to 22:00) but with the addition of having the date?
However, the data is to be used to predict the Demand for the rest of the month. So maybe the Date is an important variable if the day of the week impacts Demand?
We assume you are starting with tp shown reproducibly in the Note at the end. A complete cycle of 24 * 4 = 96 points should be represented by one unit of time internally. The chron class does that so read it in as a zoo series z with chron time index and then convert that to ts giving ts_ser or possibly leave it as a zoo series depending on what you are going to do next.
library(zoo)
library(chron)
to_chron <- function(date, time) as.chron(paste(date, time), "%d-%b-%y %H:%M")
z <- read.zoo(tp, index = 1:2, FUN = to_chron, frequency = 4 * 24)
ts_ser <- as.ts(z)
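The figure of 96 in the answer above comes from the spacing of the timestamps, while the 63 in the question counts only the slots actually observed each day. Here is a quick check of both numbers in base R. It assumes the Date and Time columns are pasted together and parsed as UTC; the %b month abbreviation is locale-dependent, so this sketch also assumes an English locale:

```r
# Same two rows as the Note (rebuilt here so the sketch is self-contained).
tp <- data.frame(Date = c("01-Jan-05", "01-Jan-05"),
                 Time = c("6:30", "6:45"),
                 Demand = c(6L, 3L))

# Combine date and time into one timestamp (UTC is an assumption).
stamp <- as.POSIXct(paste(tp$Date, tp$Time),
                    format = "%d-%b-%y %H:%M", tz = "UTC")

# Spacing between observations, expressed as a fraction of a day.
step_days <- as.numeric(difftime(stamp[2], stamp[1], units = "days"))
slots_per_day <- round(1 / step_days)      # 15-minute slots in 24 hours

# Slots between 6:30 and 22:00 inclusive, in minutes since midnight.
observed_slots <- length(seq(6 * 60 + 30, 22 * 60, by = 15))

c(slots_per_day = slots_per_day, observed_slots = observed_slots)
```

With frequency = 96, the unobserved night-time slots would show up as gaps in a regular ts object, which is one reason to consider keeping the zoo series instead.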
Note
tp <- structure(list(Date = c("01-Jan-05", "01-Jan-05"), Time = c("6:30",
"6:45"), Demand = c(6L, 3L)), row.names = 1:2, class = "data.frame")

R How to use a complex function at seasonal period under hydroTSM and xts packages?

I want to calculate the seasonal mean of my parameter values (when x > 0.002). To do this, I use xts::period.apply() to separate the values seasonally. I use the "quarter" period in endpoints(), but the "quarter" period divides the year into four seasons as follows:
"January+February+March",
"April+May+June",
"July+August+September",
"October+November+December"
For example:
library(xts)
library(PerformanceAnalytics)
data(edhec)
head(edhec)
edhec_4yr <- edhec["1997/2001"]
ep <- endpoints(edhec_4yr, "quarter")
# mean
period.apply(edhec_4yr, INDEX = ep,
function(x) apply(x,2, function(y) mean(y[y>0.002])))
But for my study, I want my seasonal periods divided as follows:
"December+January+February",
"March+April+May",
"June+July+August",
"September+October+November"
Can you help me change which months make up each "quarter" period?
I can use the simple function (mean, max, min) under the hydroTSM package with the following function:
dm2seasonal(edhec_4yr, FUN=mean, season="DJF")
Where:
DJF : December, January, February
MAM : March, April, May
JJA : June, July, August
SON : September, October, November
But I cannot apply a more complex function (a mean with a condition) in the same way:
dm2seasonal(edhec_4yr, season="DJF",
function(x) apply(x,2, function(y) mean(y[y>0.002])))
Can you help me improve this function so that it calculates the mean value (when x > 0.002) for DJF, for example?
The xts::endpoints() function always returns the last observation in a "standard" period, starting from the origin (midnight, 1970-01-01). So it can't easily do what you want.
You can calculate your own period end points by finding the observation on the last day of the last month in each 3-month window. Here's one way to do that with monthly data:
# .indexmon() returns a zero-based month
ep <- which((.indexmon(edhec_4yr) + 1) %in% c(2, 5, 8, 11))
aggfn <- function(x, bound = 0.002, ...) {
apply(x,2, function(y) mean(y[y > bound], ...))
}
period.apply(edhec_4yr, ep, aggfn)
If you have daily data, you need to find the last day of each month your periods end in. You can do that by using .indexmon() to find all months that end each season, then construct an xts object with the locations of all those observations in the original daily data object. Then you can use apply.monthly() and last() to extract the location of the last day of each season-ending month. The resulting object contains the end points you need to pass to period.apply().
data(prices)
prices <- as.xts(prices) # 'prices' is zoo; convert to xts
season_months <- (.indexmon(prices)+1) %in% c(2, 5, 8, 11)
ep_months <- xts(which(season_months), index(prices)[season_months])
ep_seasons <- as.numeric(apply.monthly(ep_months, last))
period.apply(prices, ep_seasons, aggfn)
And I should note that I'm thinking about how to specify end points in a more flexible manner, and I'll make sure to include a way to specify seasons.
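The grouping behind DJF-style seasons can also be written out by hand. Here is a rough base R sketch, assuming monthly data and made-up values. Counting December with the following year's winter (season_year) is an assumption of this sketch, not something stated in the question:

```r
# Sixty made-up monthly values, 1997-01 to 2001-12 (illustration only).
dates <- seq(as.Date("1997-01-01"), by = "month", length.out = 60)
x <- rep(c(0.001, 0.004, 0.010), 20)

mon <- as.integer(format(dates, "%m"))
yr  <- as.integer(format(dates, "%Y"))

# Month number -> season label.
season <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
            "JJA", "JJA", "SON", "SON", "SON", "DJF")[mon]

# Assumption: December is counted with the following year's winter.
season_year <- yr + (mon == 12)

# Mean of the values above the bound, as in the question.
cond_mean <- function(y, bound = 0.002) mean(y[y > bound])

# Rows are season years, columns are seasons; empty cells are NA.
res <- tapply(x, list(season_year, season), cond_mean)
res
```

The first and last winters are incomplete here (January and February 1997, then December 2001 alone), so whether to keep them depends on the study.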

Adding quarters to R date

I have R time series data where I am calculating the mean of all values up to a particular date and storing this mean at that date + 4 quarters. The dates are all month ends. To achieve this, I am looking to add 4 quarters to a date. My question is how I can add 4 quarters to an R date data type. An illustration:
a <- as.Date("2006-01-01")
b <- as.Date("2011-01-01")
date_range <- quarter(seq.Date(a, b, by = "quarter"), with_year = TRUE)
> date_range[1] + 1
[1] 2007.1
> date_range[1] + quarter(1)
[1] 2007.1
> date_range[1] + 0.25
[1] 2006.35
One possible way I am thinking is to get year-quarter dates, and then adding 4 to it. But wasn't sure what is the best way to do this?
The problem is that quarters have different lengths. Q1 is shortest because it includes February (though it ties with Q2 in leap years). Things like this make "adding a quarter to a date" poorly defined. Even adding months to a date can be tricky at the ends of months: what is 1 month after January 31?
Beginnings of months are more straightforward, and I would recommend you use the 1st day of quarters rather than the last (if you must use a specific date). lubridate provides functions like floor_date() and ceiling_date() to which you can pass unit = "quarter" and they will return the first day of the current or subsequent quarter, respectively. You can also always add months(3) to a day at the beginning of a month, though of course if your intention is to add 4 quarters you may as well just add 1 year.
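For a date on the first of a month, base R can also step by calendar months without any extra package. Here is a small sketch using seq(), with the start date taken from the question:

```r
a <- as.Date("2006-01-01")

# Five quarter starts: the original date plus the next four quarters.
steps <- seq(a, by = "3 months", length.out = 5)
steps

# Four quarters on is the last element, the same as stepping one year.
four_quarters_on <- steps[5]
one_year_on <- seq(a, by = "year", length.out = 2)[2]
four_quarters_on == one_year_on
```

This only behaves predictably for days that exist in every month, which is the reason for preferring the 1st.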
Just add 12 months or a year instead?
Or if it must be quarters, define yourself a function, like so:
quarters <- function(x) {
months(3*x)
}
and then use it to add to the date sequence:
date_range <- seq.Date(a, b, by = "quarter")
date_range + quarters(4)
Lubridate has a function for quarters already included. This is a much better solution than creating your own function.
https://www.rdocumentation.org/packages/lubridate/versions/1.7.4/topics/quarter
Old answer but to those arriving here, lubridate has a function %m+% that adds months and preserves month ends.
a <- as.Date("2006-01-01")
Add future months worth of dates:
The original poster wanted 4 quarters in future so that will be 12 months.
future_date <- a %m+% months(12)
future_date
[1] "2007-01-01"
You could also do years as the period:
future_date <- a %m+% years(1)
Remove months from date:
Subtract dates with %m-%
If you wanted a date 3 months ago from 1/1/2006:
past_date <- a %m-% months(3)
past_date
[1] "2005-10-01"
Example with dates not at end of months:
mplus will preserve days in month:
as.Date("2022-10-10") %m-% months(3)
[1] "2022-07-10"
For more, see documentation on "Add and subtract months to a date without exceeding the last day of the new month"
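The month-end case is where %m+% differs from ordinary period addition. A short sketch, using a 31 January date chosen here for illustration:

```r
library(lubridate)

jan31 <- as.Date("2006-01-31")

# Plain period addition lands on 31 February, which does not exist,
# so lubridate returns NA.
plain <- jan31 + months(1)

# %m+% rolls back to the last valid day of the target month instead.
rolled <- jan31 %m+% months(1)

plain
rolled
```

Since the question's dates are all month ends, the rolled-back behaviour is probably the one that matters there.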
Note that other answers that use Date class will give irregularly spaced series and so are unsuitable for time series analysis.
To do this in such a way that time series analyses can be performed and noting the zoo tag on the question, the yearmon class represents year/month as year + fraction where fraction is 0 for Jan, 1/12 for Feb, 2/12 for Mar, ..., 11/12 for Dec. Thus adding 4 quarters is just a matter of adding 1. (Adding x quarters is done by adding x/4.)
library(zoo)
ym <- yearmon(2006) + 0:11/12 # months in 2006
ym + 1 # one year later
The first line below converts yearmon objects to end-of-month Dates, and the second converts Dates back to yearmon. Using frac = 0 or omitting frac in the first line would convert to beginning-of-month dates.
d <- as.Date(ym, frac = 1) # d is Date vector of end-of-months
as.yearmon(d) # convert Date vector to yearmon
If your input dates represent quarters then there is also the yearqtr class which represents a year/quarter as year + fraction where fraction is 0, 1/4, 2/4, 3/4 for the 4 quarters of a year. Adding 4 quarters is done by adding 1 (or to add x quarters add x/4).
yq <- as.yearqtr(2006) + 0:3/4 # all quarters in 2006
yq + 1 # one year later
Conversions work similarly to yearmon:
d <- as.Date(yq, frac = 1) # d is Date vector of end-of-quarters
as.yearqtr(d) # convert Date vector to yearqtr

calculating seasonal range in r for a number of years

I have a data frame of daily temperature measurements spanning 20 years. I would like to calculate the annual range in the data series for each year (i.e. end up with 20 values, representing the range for each year). Example data:
begin_date = as.POSIXlt("1990-01-01", tz = "GMT")
dat = data.frame(dt = begin_date + (0:(20*365)) * (86400))
dat = within(dat, {speed = runif(length(dt), 1, 10)})
I was thinking of writing a loop which goes through each year and then calculate the range, but was hoping there was another solution.
I think the best way forward would be to have the maximum and minimum values for each year and then calculate the range from that. Can anyone suggest a method to do this without writing a loop to go through each year individually?
Try
library(dplyr)
library(lubridate) # provides year()
dat %>%
group_by(year=year(dt)) %>%
summarise(Range=diff(range(speed)))
Or
library(data.table)
setDT(dat)[, list(Range=diff(range(speed))), year(dt)]
Or
aggregate(speed~cbind(year=year(dt)), dat, function(x) diff(range(x)))
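The year() calls above come from lubridate or data.table. If neither is available, the same grouping can be sketched in base R with format() and tapply(). The example data is rebuilt here with POSIXct and a fixed seed, which are assumptions added for reproducibility:

```r
# Daily timestamps from 1990-01-01, as in the question.
begin_date <- as.POSIXct("1990-01-01", tz = "GMT")
dat <- data.frame(dt = begin_date + (0:(20 * 365)) * 86400)
set.seed(1)
dat$speed <- runif(nrow(dat), 1, 10)

# Calendar year of each observation as a grouping key.
yr <- format(dat$dt, "%Y")

# Range (max - min) within each year.
rng <- tapply(dat$speed, yr, function(x) diff(range(x)))
rng
```

Because 20 * 365 days ignores the five leap days in this span, the series stops a few days before the end of 2009, but it still covers 20 calendar years.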

How do I subset every day except the last five days of zoo data?

I am trying to extract all dates except for the last five days from a zoo dataset into a single object.
This question is somewhat related to How do I subset the last week for every month of a zoo object in R?
You can reproduce the dataset with this code:
set.seed(123)
price <- rnorm(365)
data <- cbind(seq(as.Date("2013-01-01"), by = "day", length.out = 365), price)
zoodata <- zoo(data[,2], as.Date(data[,1]))
For my output, I'm hoping to get a combined dataset of everything except the last five days of each month. For example, if there are 20 days in the first month's data and 19 days in the second month's, I only want to subset the first 15 and 14 days of data respectively.
I tried using the head() function and the first() function to extract the first three weeks, but since each month will have a different amount of days according to month or leap year months, it's not ideal.
Thank you.
Here are a few approaches:
1) as.Date Let tt be the dates. Then we compute a Date vector the same length as tt which has the corresponding last date of the month. We then pick out those dates which are at least 5 days away from that:
tt <- time(zoodata)
last.date.of.month <- as.Date(as.yearmon(tt), frac = 1)
zoodata[ last.date.of.month - tt >= 5 ]
2) tapply/head For each month tapply head(x, -5) to the data and then concatenate the reduced months back together:
do.call("c", tapply(zoodata, as.yearmon(time(zoodata)), head, -5))
3) ave Define revseq which given a vector or zoo object returns sequence numbers in reverse order so that the last element corresponds to 1. Then use ave to create a vector ix the same length as zoodata which assigns such reverse sequence numbers to the days of each month. Thus the ix value for the last day of the month will be 1, for the second last day 2, etc. Finally subset zoodata to those elements corresponding to sequence numbers greater than 5:
revseq <- function(x) rev(seq_along(x))
ix <- ave(seq_along(zoodata), as.yearmon(time(zoodata)), FUN = revseq)
z <- zoodata[ ix > 5 ]
ADDED Solutions (1) and (2).
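As a sanity check on the expected size of the result, the reverse-sequence idea from (3) can be sketched on bare dates in base R. Here the months are keyed with format() instead of as.yearmon(), which is a substitution made for this sketch:

```r
# All days of 2013, matching the question's index.
dates <- seq(as.Date("2013-01-01"), by = "day", length.out = 365)

# Month key for grouping.
ym <- format(dates, "%Y-%m")

# Within each month, number the days from the end: last day = 1.
rev_rank <- ave(seq_along(dates), ym,
                FUN = function(i) rev(seq_along(i)))

# Keep everything except the last five days of each month.
keep <- dates[rev_rank > 5]
length(keep)
```

Twelve months each lose five days, so 365 - 60 = 305 dates should remain, and the last date kept in January is the 26th.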
Exactly the same way as in the answer to your other question:
Split dataset by month, remove last 5 days, just add a "-":
library(xts)
xts.data <- as.xts(zoodata)
lapply(split(xts.data, "months"), last, "-5 days")
And the same way, if you want it on one single object:
do.call(rbind, lapply(split(xts.data, "months"), last, "-5 days"))
