hydrological year time series - r

I am currently analysing river discharge data. I have the daily discharge record from 1935 to the present. I want to extract the annual maximum discharge for each hydrological year (running from 01/11 to 31/10 of the following year). However, I found that the hydroTSM package can only deal with calendar years. I tried the "zoo" package, but found the computation difficult because the years do not all have the same number of days. Does anyone have an idea? Thanks.
the data looks like:
01-11-1935 663
02-11-1935 596
03-11-1935 450
04-11-1935 381
05-11-1935 354
06-11-1935 312
my code:
mydata<-read.table("discharge")
colnames(mydata) <- c("date","discharge")
library(zoo)
z<-zooreg(mydata[,2],start=as.Date("1935-11-1"))
mydata$date <- as.POSIXct(mydata$date, format = "%d-%m-%Y")
q.month<-daily2monthly(z,FUN=max,na.rm = TRUE,date.fmt = "%Y-%m-%d",out.fmt="numeric")
q.month.plain=coredata(q.month)
z.month<-zooreg(q.month.plain,start=1,frequency=12)

With dates stored in a vector of class Date, you can just use cut() and tapply(), like this:
## Example data
df <- data.frame(date = seq(as.Date("1935-01-01"), length = 100, by = "week"),
flow = (runif(n = 100, min = 0, max = 1000)))
## Use vector of November 1st dates to cut data into hydro-years
breaks <- seq(as.Date("1934-11-01"), length=4, by="year")
df$hydroYear <- cut(df$date, breaks, labels=1935:1937)
## Find the maximum flow in each hydro-year
with(df, tapply(flow, hydroYear, max))
# 1935 1936 1937
# 984.7327 951.0440 727.4210
## Note: whenever using `cut()`, I take care to double-check that
## I've got the cuts exactly right
cut(as.Date(c("1935-10-31", "1935-11-01")), breaks, labels=1935:1937)
# [1] 1935 1936
# Levels: 1935 1936 1937
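The breaks and labels above are typed in by hand. A minimal sketch, assuming the dates are already of class Date, that derives both from the range of the data; the sample values and the names `dates`, `flow`, `first_year` and `last_year` are made up for illustration:

```r
# Made-up sample straddling two 1 November boundaries
dates <- as.Date(c("1935-10-31", "1935-11-01", "1936-10-31", "1936-11-01"))
flow  <- c(120, 663, 410, 705)

# One break on every 1 November, starting before the first date
# and ending after the last one
first_year <- as.integer(format(min(dates), "%Y")) - 1L
last_year  <- as.integer(format(max(dates), "%Y")) + 1L
breaks <- seq(as.Date(paste0(first_year, "-11-01")),
              as.Date(paste0(last_year, "-11-01")), by = "year")

# Each hydro-year is labelled by the calendar year in which it ends
hydro_year <- cut(dates, breaks, labels = (first_year + 1L):last_year)

# Hydro-years with no observations would show up as NA here
annual_max <- tapply(flow, hydro_year, max)
annual_max
```

Because the breaks always extend one year beyond the data on each side, an empty first or last hydro-year is possible; `droplevels(hydro_year)` removes it if that matters.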

Here is a one-liner to do that.
First convert the dates to "yearmon" class. This class represents a year and month as a number whose integer part is the year and whose fractional part is the month (Jan = 0, Feb = 1/12, etc.). Add 2/12 to shift November to January and then truncate to give just the years. Aggregate over those. Although the test data we used starts at the beginning of the hydro year, this solution works even if the data does not start at the beginning of the hydro year.
# test data
library(zoo)
z <- zooreg(1:1000, as.Date("2000-11-01")) # test input
aggregate(z, as.integer(as.yearmon(time(z)) + 2/12), max)
This gives:
2001 2002 2003
365 730 1000
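The same shift can also be written in whole months. A sketch in base R of that idea (count months, add 2, integer-divide by 12); the helper name `hydro_year` is hypothetical and not part of zoo:

```r
# Hydro-year label for a vector of Dates, with the year starting on 1 November.
# Integer arithmetic only: months since year 0, shifted forward by 2 months.
hydro_year <- function(d) {
  yr  <- as.integer(format(d, "%Y"))
  mon <- as.integer(format(d, "%m"))
  (12L * yr + (mon - 1L) + 2L) %/% 12L
}

d  <- as.Date(c("2000-10-31", "2000-11-01", "2001-10-31", "2001-11-01"))
hy <- hydro_year(d)
hy  # 2000 2001 2001 2002
```

The result can be used as the grouping argument in aggregate() or tapply() in the same way as the yearmon expression.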

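Try the xts package, which works together with zoo. Note that apply.yearly() as used below splits on calendar years; for a hydrological year the index would have to be shifted first (adding 61 days moves 1 November onto 1 January):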
require(zoo)
require(xts)
dates = seq(Sys.Date(), by = 'day', length = 365 * 3)
y = cumsum(rnorm(365 * 3))
serie = zoo(y, dates)
# if you need to specify `start` and `end`
# serie = window(serie, start = "2015-06-01")
# xts function
apply.yearly(serie, FUN = max)

Related

Converting a data frame into TS object in R

I have a dataframe that looks like this:
DAY X1996 X1997
1 1-Jul 98 86
2 2-Jul 97 90
3 3-Jul 97 93
....
I want to end up with a TS object so that I can do HoltWinters smoothing on it. I think I want it to look like this (though I'm not sure because I haven't done HoltWinters before):
Day Year Temp
1-Jul 1996 98
2-Jul 1996 98
3-Jul 1996 98
...
1-Jul 1997 86
2-Jul 1997 90
3-Jul 1997 93
This is what I'm trying to do:
df <- read.delim("temps.txt")
myts <- as.ts(df)
But this doesn't look close to what I'll need for a HoltWinters model. I've looked all over Stack Overflow and the docs for ts and zoo and I'm stuck on how to create this ts object. A push in the right direction would be much appreciated.
ts objects are normally used with monthly, quarterly or annual data, not daily data; however, if we remove Feb 29th then we can create a ts object whose times are the year plus a fraction 0/365, 1/365, ..., 364/365 which will be regularly spaced if there are no missing dates. The key point is that if the seasonality is based on a year then we must have the same number of points in each year to represent it as a ts object.
First convert to a zoo object z0 having an ordinary Date, remove Feb 29th giving z, create the time index described above in a zoo object zz and then convert that to ts.
library(data.table)
library(lubridate)
library(zoo)
m <- melt(as.data.table(df), id.vars = 1)
z0 <- with(m, zoo(value, as.Date(paste(variable, DAY), "X%Y %d-%b")))
z <- z0[! (month(time(z0)) == 2 & day(time(z0)) == 29)]
tt <- time(z)
zz <- zoo(coredata(z), year(tt) + (yday(tt) - ((month(tt) > 2) & leap_year(tt)) - 1)/365)
as.ts(zz)
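To see what that time index looks like, here is a sketch in base R only (no lubridate), assuming a short made-up run of dates around the end of February 1996; all variable names and values are illustrative:

```r
dates  <- seq(as.Date("1996-02-27"), as.Date("1996-03-02"), by = "day")
values <- c(50, 51, 52, 53, 54)           # made-up temperatures

# Drop Feb 29th (POSIXlt months are 0-based, so 1 means February)
lt     <- as.POSIXlt(dates)
keep   <- !(lt$mon == 1 & lt$mday == 29)
dates  <- dates[keep]
values <- values[keep]

# Day of year on a 365-day scale: after February in a leap year,
# subtract 1 to close the gap left by the removed Feb 29th
lt   <- as.POSIXlt(dates)
year <- lt$year + 1900
leap <- (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
doy  <- lt$yday + 1 - (leap & lt$mon > 1)

# Regular series with 365 observations per year
x <- ts(values, start = c(year[1], doy[1]), frequency = 365)
time(x)
```

With Feb 29th gone, `doy` runs 58, 59, 60, 61 without a gap, which is what makes the series regular.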
Remove Dec 31 in leap years
Above we removed Feb 29th in leap years but an alternate approach would be to remove Dec 31st in leap years giving slightly simpler code which avoids the need to use leap_year as we can simply remove any day for which yday is 366. z0 is from above.
zz0 <- z0[yday(time(z0)) <= 365]
tt <- time(zz0)
zz <- zoo(coredata(zz0), year(tt) + (yday(tt) - 1) / 365)
as.ts(zz)
Aggregating to Monthly
Another approach would be to reduce the data to monthly data. That is relatively straightforward since ts has facilities to represent monthly data. Below we use the last point in each month, but we could use the mean value or another scalar summary if desired.
ag <- aggregate(z0, as.yearmon, tail, 1) # use last point in each month
as.ts(ag)
Note
df in the question made into a reproducible form is the following (however, we would need to fill it out with more data to avoid generating a ts object with many NAs).
df <- structure(list(DAY = structure(1:3, .Label = c("1-Jul", "2-Jul",
"3-Jul"), class = "factor"), X1996 = c(98L, 97L, 97L), X1997 = c(86L,
90L, 93L)), class = "data.frame", row.names = c("1", "2", "3"
))

How do I calculate a monthly rate of change from a daily time series in R?

I'm beginning to get my feet wet with R, and I'm brand new to time series concepts. Can anyone point me in the right direction to calculate a monthly % change, based on a daily data point? I want the change between the first and last data points of each month. For example:
tseries data:
1/1/2000 10.00
...
1/31/2000 10.10
2/1/2000 10.20
...
2/28/2000 11.00
I'm looking for a return data frame of the form:
1/31/2000 .01
2/28/2000 .0784
Ideally, I'd be able to calculate from the endpoint of the prior month to the endpoint of current month, but I'm supposing partitioning by month is easier as a starting point. I'm looking at packages zoo and xts, but am still stuck. Any takers? Thanks...
Here's one way to do it using plyr and ddply.
I use ddply sequentially, first to get the first and last rows of each month, and again to calculate the monthlyReturn.
(Perhaps using xts or zoo might be easier, I am not sure.)
#Using plyr and the data in df
df$Date <- as.POSIXlt(as.Date(df$Date, "%m/%d/%Y"))
df$Month <- (df$Date$mon + 1) #0 = January
sdf <- df[,-1] #drop the Date Column, ddply doesn't like it
library("plyr")
#this function is called with 2 row data frames
monthlyReturn<- function(df) {
(df$Value[2] - df$Value[1])/(df$Value[1])
}
adf <- ddply(sdf, .(Month), function(x) x[c(1, nrow(x)), ]) #get first and last values for each Month
mon.returns <- ddply(adf, .(Month), monthlyReturn)
Here's the data I used to test it out:
> df
Date Value
1 1/1/2000 10.0
2 1/31/2000 10.1
3 2/1/2000 10.2
4 2/28/2000 11.0
5 3/1/2000 10.0
6 3/31/2000 24.1
7 5/10/2000 510.0
8 5/22/2000 522.0
9 6/04/2000 604.0
10 7/03/2000 10.1
11 7/30/2000 7.2
12 12/28/2000 11.0
13 12/30/2000 3.0
> mon.returns
Month V1
1 1 0.01000000
2 2 0.07843137
3 3 1.41000000
4 5 0.02352941
5 6 0.00000000
6 7 -0.28712871
7 12 -0.72727273
Hope that helps.
Here is another way to do this(using the quantmod package):
This calculates the monthly return from the daily price of AAPL.
library(quantmod) # load the quantmod package
getSymbols("AAPL") # download daily price for stock AAPL
monthlyReturn = periodReturn(AAPL,period="monthly")
monthlyReturn2014 = periodReturn(AAPL,period="monthly",subset='2014:') # for 2014
This is a pretty old thread, but for reference, here is a data.table solution using the same data as @Ram:
df <- structure(list(Date = structure(c(10957, 10987, 10988, 11015, 11017, 11047, 11087, 11099, 11112, 11141, 11168, 11319, 11321), class = "Date"), Value = c(10, 10.1, 10.2, 11, 10, 24.1, 510, 522, 604, 10.1, 7.2, 11, 3)), .Names = c("Date", "Value"), row.names = c(NA, -13L), class = c("data.table", "data.frame"))
It's essentially a one-liner that uses the data.table::month function:
library(data.table)
setDT(df)[ , diff(Value) / Value[1], by= .(month(Date))]
This will produce the change relative to the first recorded day in each month. If the change relative to the last day is preferred, then the expression in the middle should be changed to diff(Value) / Value[2].
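Two things are worth keeping in mind with per-month grouping: a month may hold one row or more than two, and data spanning several years needs the year in the grouping key as well. A base R sketch of a first-to-last return that covers both cases; the helper `first_last_return` and the sample data are made up for illustration:

```r
# Relative change from the first to the last value of a group;
# a group with a single value gives 0
first_last_return <- function(v) (v[length(v)] - v[1]) / v[1]

d <- data.frame(
  Date  = as.Date(c("2000-01-01", "2000-01-31", "2000-06-04",
                    "2001-01-02", "2001-01-30")),
  Value = c(10, 10.1, 604, 20, 25)
)

# Group by year AND month so January 2000 and January 2001 stay separate
key <- format(d$Date, "%Y-%m")
ret <- vapply(split(d$Value, key), first_last_return, numeric(1))
ret
```

This assumes the rows are already sorted by date within each month.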
1) no packages Try this:
DF <- read.table(text = Lines)
fmt <- "%m/%d/%Y"
ym <- format(as.Date(DF$V1, format = fmt), "%Y-%m")
ret <- function(x) diff(range(x))/x[1]
ag <- aggregate(V2 ~ ym, DF, ret)
giving:
> ag
ym V2
1 2000-01 0.01000000
2 2000-02 0.07843137
We could convert this to "ts" class, if desired. Assuming no missing months:
ts(ag$V2, start = 2000, freq = 12)
giving:
Jan Feb
2000 0.01000000 0.07843137
2) It's a bit easier if you use the zoo or xts time series packages. fmt and ret are from above:
library(zoo)
z <- read.zoo(text = Lines, format = fmt)
z.ret <- aggregate(z, as.yearmon, ret)
giving:
> z.ret
Jan 2000 Feb 2000
0.01000000 0.07843137
If you already have a data.frame DF then the read.zoo statement could be replaced with z <- read.zoo(DF, format = fmt) or omit the format arg if the first column is of "Date" class.
If "ts" class were desired then use as.ts(z.ret)
Note: The input Lines is:
Lines <- "1/1/2000 10.00
1/31/2000 10.10
2/1/2000 10.20
2/28/2000 11.00"
The ROC function in the TTR package will do this. You can use to.monthly or endpoints() first (see "From daily time series to weekly time series in R xts object") if you will only be looking at monthly behaviour.
library(xts) # to.monthly() and endpoints()
library(TTR) # ROC()
# data.monthly <- to.monthly( data, indexAt='periodEnd' ) # if OHLC data
# OR
data.monthly <- data[ endpoints(data, on="months", k=1), ]
data.roc <- ROC(data.monthly, n = 1, type = "discrete")
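For comparison, the same month-end to month-end change can be sketched in base R without any packages. This assumes the rows are sorted by date; the data frame `d` reuses the four sample rows from the question and the other names are illustrative:

```r
d <- data.frame(
  Date  = as.Date(c("2000-01-01", "2000-01-31", "2000-02-01", "2000-02-28")),
  Value = c(10, 10.1, 10.2, 11)
)

# Last observation of each month
ym       <- format(d$Date, "%Y-%m")
last_val <- vapply(split(d$Value, ym), function(v) v[length(v)], numeric(1))

# Discrete rate of change between consecutive month ends;
# the first month has no predecessor, hence NA
roc <- c(NA, diff(last_val) / head(last_val, -1))
roc
```

Here February's value is 11 / 10.1 - 1, roughly 0.089, because it is measured from the January month end rather than from 1 February.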

Handling time with zoo in R

I'm trying to load time series in R with the 'zoo' library.
The observations I have vary in precision. Some have the day/month/year, others only month and year, and others only the year:
02/10/1915
1917
07/1917
07/1918
30/08/2018
Subsequently, I need to aggregate the rows by year, year and month.
The basic R as.Date function doesn't handle that.
How can I model this data with zoo?
Thanks,
Mulone
We use the test data formed from the index data in the question followed by a number:
# test data
Lines <- "02/10/1915 1
1917 2
07/1917 3
07/1918 4
30/08/2018 5"
yearly aggregation
library(zoo)
to.year <- function(x) as.numeric(sub(".*/", "", as.character(x)))
read.zoo(text = Lines, FUN = to.year, aggregate = mean)
The last line returns:
1915 1917 1918 2018
1.0 2.5 4.0 5.0
year/month aggregation
Since year/month aggregation of data with no months makes no sense we first drop the year only data and aggregate the rest:
DF <- read.table(text = Lines, as.is = TRUE)
# remove year-only records. DF.ym has at least year and month.
yr <- suppressWarnings(as.numeric(DF[[1]]))
DF.ym <- DF[is.na(yr), ]
# remove day, if present, and convert to yearmon.
to.yearmon <- function(x) as.yearmon( sub("\\d{1,2}/(\\d{1,2}/)", "\\1", x), "%m/%Y" )
read.zoo(DF.ym, FUN = to.yearmon, aggregate = mean)
The last line gives:
Oct 1915 Jul 1917 Jul 1918 Aug 2018
1 3 4 5
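The parsing step can also be sketched without zoo, by splitting each string on "/" and reading the fields from the right. This is a base R illustration only, with made-up variable names, using the same five test rows:

```r
x     <- c("02/10/1915", "1917", "07/1917", "07/1918", "30/08/2018")
value <- 1:5

parts <- strsplit(x, "/", fixed = TRUE)

# The year is always the last field
year <- as.integer(vapply(parts, function(p) p[length(p)], character(1)))

# The month is the second-to-last field, or NA for year-only records
month <- as.integer(vapply(parts, function(p) {
  if (length(p) >= 2) p[length(p) - 1] else NA_character_
}, character(1)))

# Yearly means, matching the grouping used in the yearly aggregation above
by_year <- tapply(value, year, mean)
by_year
```

Records with `month` equal to NA are the ones that have to be dropped before any year/month aggregation.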

Aggregating weekly (7 day) data to monthly in R

I have data measured over a 7 day period. Part of the data looks as follows:
start wk end wk X1
2/1/2004 2/7/2004 89
2/8/2004 2/14/2004 65
2/15/2004 2/21/2004 64
2/22/2004 2/28/2004 95
2/29/2004 3/6/2004 79
3/7/2004 3/13/2004 79
I want to convert this weekly (7 day) data into monthly data using weighted averages of X1. Notice that some of the 7 day X1 data will overlap from one month to the other (X1=79 for the period 2/29 to 3/6 of 2004).
Specifically I would obtain the February 2004 monthly data (say, Y1) the following way
(7*89 + 7*65 + 7*64 + 7*95 + 1*79)/29 = 78.27
Does R have a function that will properly do this? (to.monthly in the xts library DOES NOT do what I need.) If not, what is the best way to do this in R?
Convert the data to daily data and then aggregate:
Lines <- "start end X1
2/1/2004 2/7/2004 89
2/8/2004 2/14/2004 65
2/15/2004 2/21/2004 64
2/22/2004 2/28/2004 95
2/29/2004 3/6/2004 79
3/7/2004 3/13/2004 79
"
library(zoo)
# read data into data frame DF
DF <- read.table(text = Lines, header = TRUE)
# convert date columns to "Date" class
fmt <- "%m/%d/%Y"
DF <- transform(DF, start = as.Date(start, fmt), end = as.Date(end, fmt))
# convert to daily zoo series
to.day <- function(i) with(DF, zoo(X1[i], seq(start[i], end[i], "day")))
z.day <- do.call(c, lapply(1:nrow(DF), to.day))
# aggregate by month
aggregate(z.day, as.yearmon, mean)
The last line gives:
Feb 2004 Mar 2004
78.27586 79.00000
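As a cross-check of the arithmetic in the question, the same expansion to daily values can be sketched in base R, assuming every row covers exactly 7 days from its start date; variable names are illustrative:

```r
start <- as.Date(c("2004-02-01", "2004-02-08", "2004-02-15",
                   "2004-02-22", "2004-02-29", "2004-03-07"))
x1 <- c(89, 65, 64, 95, 79, 79)

# One entry per day: each weekly value is repeated for its 7 days
days <- do.call(c, lapply(start, function(s) seq(s, by = "day", length.out = 7)))
vals <- rep(x1, each = 7)

# The mean of the daily values is the day-weighted mean of the weekly values
monthly <- tapply(vals, format(days, "%Y-%m"), mean)
monthly
```

February 2004 comes out as 2270 / 29, the same weighted average as the hand calculation in the question. March only covers the 13 days present in the sample.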
If you are willing to drop the "end wk" column from your DF, apply.monthly will work like a charm.
library(xts)
DF.xts <- xts(DF$X1, order.by=DF$start_wk) # start_wk: the week-start column, of class Date
DF.xts.monthly <- apply.monthly(DF.xts, "sum")
Then you can always recreate end dates if you absolutely need them by adding 30.

Extract Date in R

I struggle mightily with dates in R and could do this pretty easily in SPSS, but I would love to stay within R for my project.
I have a date column in my data frame and want to remove the year completely in order to leave the month and day. Here is a peek at my original data.
> head(ds$date)
[1] "2003-10-09" "2003-10-11" "2003-10-13" "2003-10-15" "2003-10-18" "2003-10-20"
> class((ds$date))
[1] "Date"
I "want" it to be.
> head(ds$date)
[1] "10-09" "10-11" "10-13" "10-15" "10-18" "10-20"
> class((ds$date))
[1] "Date"
If possible, I would love to set the first date to be October 1st instead of January 1st.
Any help you can provide will be greatly appreciated.
EDIT: I felt like I should add some context. I want to plot an NHL player's performance over the course of a season which starts in October and ends in April. To add to this, I would like to facet the plots by each season which is a separate column in my data frame. Because I want to compare cumulative performance over the course of the season, I believe that I need to remove the year portion, but maybe I don't; as I indicated, I struggle with dates in R. What I am looking to accomplish is a plot that compares cumulative performance over relative dates by season and have the x-axis start in October and end in April.
> d = as.Date("2003-10-09", format="%Y-%m-%d")
> format(d, "%m-%d")
[1] "10-09"
Is this what you are looking for?
library(ggplot2)
library(reshape2) # for melt()
## make up data for two seasons a and b
a = as.Date("2010/10/1")
b = as.Date("2011/10/1")
a.date <- seq(a, by='1 week', length=28)
b.date <- seq(b, by='1 week', length=28)
## make up some score data
a.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
b.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
## create a data frame
df <- data.frame(a.date, b.date, a.score, b.score)
df
## Since I am using ggplot I had better create a "long-format" data frame
df.molt <- melt(df, measure.vars = c("a.score", "b.score"))
levels(df.molt$variable) <- c("First season", "Second season")
df.molt
Then, I am using ggplot2 for plotting the data:
## plot it
ggplot(aes(y = value, x = a.date), data = df.molt) + geom_point() +
geom_line() + facet_wrap(~variable, ncol = 1) +
scale_x_date("Date", date_labels = "%m-%d")
If you want to modify the x-axis (e.g., display format), then you'll probably be interested in scale_x_date.
You have to remember Date is a numeric format, representing the number of days passed since the "origin" of the internal date counting :
> str(Date)
Class 'Date' num [1:10] 14245 14360 14475 14590 14705 ...
Excel stores dates in the same way (as a day count), if you want a reference. Hence the solution with format is perfectly valid.
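A small illustration of that point, using only base R: the numeric value is the number of days since 1970-01-01 (R's origin for the Date class), and format() only changes how the date is printed, returning a character vector rather than a Date.

```r
d <- as.Date(c("1970-01-01", "1970-01-02", "2003-10-09"))

day_count <- as.numeric(d)       # days since 1970-01-01; starts 0, 1
month_day <- format(d, "%m-%d")  # "01-01" "01-02" "10-09"

class(month_day)                 # "character", no longer a Date
```

This is why a column holding only month and day cannot keep class "Date": the year is part of the underlying day count.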
Now if you want to set the first date of a year as October 1st, you can construct some year index like this :
redefine.year <- function(x,start="10-1"){
year <- as.numeric(strftime(x,"%Y"))
yearstart <- as.Date(paste(year,start,sep="-"))
year + (x >= yearstart) - min(year) + 1
}
Testing code :
Start <- as.Date("2009-1-1")
Stop <- as.Date("2011-11-1")
Date <- seq(Start,Stop,length.out=10)
data.frame( Date=as.character(Date),
year=redefine.year(Date))
gives
Date year
1 2009-01-01 1
2 2009-04-25 1
3 2009-08-18 1
4 2009-12-11 2
5 2010-04-05 2
6 2010-07-29 2
7 2010-11-21 3
8 2011-03-16 3
9 2011-07-09 3
10 2011-11-01 4
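For the plotting goal in the question (an x-axis running from October to April, comparable across seasons), one option is to plot the number of days since the start of each season instead of the date itself. A sketch under that assumption; the helper `season_day` is hypothetical and the season is taken to start on the first day of `start_month`:

```r
# Days elapsed since the most recent season start (default: 1 October)
season_day <- function(d, start_month = 10L) {
  yr  <- as.integer(format(d, "%Y"))
  mon <- as.integer(format(d, "%m"))
  # Dates before the start month belong to the season that began last year
  start_year <- yr - (mon < start_month)
  start <- as.Date(sprintf("%d-%02d-01", start_year, start_month))
  as.integer(d - start)
}

d <- as.Date(c("2003-10-01", "2003-10-09", "2004-04-01"))
season_day(d)  # 0 8 183
```

Used as the x variable, together with a season column for faceting, this lines up all seasons on the same relative axis.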
