R: aggregating a list of time series by period

What would be the best way to aggregate several time series together by reference period? Ideally by using ts objects only.
For example, I have two monthly series TS1 and TS2, I want to get TSTOT:
TIME_PERIOD TS1 TS2 TSTOT
2000-01-01 25 25 50
2000-02-01 35 30 65
2000-03-01 40 30 70
I have several ts objects, so I could imagine some function working with a list.
Thank you!

If these are ts objects, we can use merge (note that the result is a data frame, not a ts object):
ts1 <- ts(c(25, 35, 40), start = c(2000, 1), frequency = 12)
ts2 <- ts(c(25, 30, 30), start = c(2000, 1), frequency = 12)
transform(merge(ts1, ts2, by = "row.names"), TSTOT = x.x + x.y)
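To stay with ts objects throughout, one possibility is to put the series in a list and fold it with `+`. This is only a minimal sketch, assuming all series share the same frequency and cover the same period; the names `series` and `tstot` are illustrative and not from the original answer.

```r
ts1 <- ts(c(25, 35, 40), start = c(2000, 1), frequency = 12)
ts2 <- ts(c(25, 30, 30), start = c(2000, 1), frequency = 12)

# Illustrative list of series; further ts objects can be appended
series <- list(ts1, ts2)

# Fold the list with `+`; the result is still a ts object
tstot <- Reduce(`+`, series)
tstot
```

Because the result keeps its time attributes, it can be passed directly to functions such as window() or plot().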

Related

R, to view data-frame chicken from the astsa [duplicate]

The output of a time-series looks like a data frame:
ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)
Jan Feb Mar Apr May Jun Jul ...
1981 14.064085 21.664250 14.800249 -5.773095 16.477470 1.129674 16.747669 ...
1982 23.973620 17.851890 21.387944 28.451552 24.177141 25.212271 19.123179 ...
1983 19.801210 11.523906 8.103132 9.382778 4.614325 21.751529 9.540851 ...
1984 15.394517 21.021790 23.115453 12.685093 -2.209352 28.318686 10.159940 ...
1985 20.708447 13.095117 32.815273 9.393895 19.551045 24.847337 18.703991 ...
It would be handy to transform it into a data frame with columns Jan, Feb, Mar... and rows 1981, 1982, ... and then back. What's the most elegant way to do this?
Here are two ways. The first creates the dimnames for the matrix about to be built, strings the data out into a matrix, transposes it, and converts it to a data frame. The second builds a list of year and month grouping variables and applies tapply over it, later converting to a data frame and adding names.
# create test data
set.seed(123)
tt <- ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)
1) matrix. This solution requires that we have whole consecutive years.
dmn <- list(month.abb, unique(floor(time(tt))))
as.data.frame(t(matrix(tt, 12, dimnames = dmn)))
If we don't care about the nice names it is just as.data.frame(t(matrix(tt, 12))) .
We could replace the dmn <- line with the following simpler line using @thelatemail's comment:
dmn <- dimnames(.preformat.ts(tt))
2) tapply. A more general solution using tapply is the following:
Month <- factor(cycle(tt), levels = 1:12, labels = month.abb)
tapply(tt, list(year = floor(time(tt)), month = Month), c)
Note: To invert this, suppose X is the result of either solution above. Then try:
ts(c(t(X)), start = 1981, frequency = 12)
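As a quick check of the inversion, the round trip can be sketched as follows. This assumes whole consecutive years, as the matrix solution requires; `tt2` and `round_trip_ok` are illustrative names.

```r
set.seed(123)
tt <- ts(rnorm(12 * 5, 17, 8), start = c(1981, 1), frequency = 12)

# Wide layout: one row per year, one column per month
dmn <- list(month.abb, unique(floor(time(tt))))
X <- as.data.frame(t(matrix(tt, 12, dimnames = dmn)))

# Back to a monthly series: c(t(X)) reads X row by row
tt2 <- ts(c(t(X)), start = 1981, frequency = 12)

# Compare the values and the time attributes of both series
round_trip_ok <- isTRUE(all.equal(as.numeric(tt), as.numeric(tt2))) &&
  isTRUE(all.equal(tsp(tt), tsp(tt2)))
round_trip_ok
```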
Update
Improvement motivated by the comments of @latemail below.
Example with the AirPassengers dataset:
Make the data available and check its type:
data(AirPassengers)
class(AirPassengers)
Convert Time-Series into a data frame:
df <- data.frame(AirPassengers, year = trunc(time(AirPassengers)),
month = month.abb[cycle(AirPassengers)])
Redo the creation of the Time-Series object:
tsData = ts(df$AirPassengers, start = c(1949,1), end = c(1960,12), frequency = 12)
Plot the results to ensure correct execution:
components.ts = decompose(tsData)
plot(components.ts)
Try the package "tsbox":
library(tsbox)
ts = ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)
df = ts_df(ts)
str(df)
'data.frame': 60 obs. of 2 variables:
 $ time : Date, format: "1981-01-01" "1981-02-01" ...
 $ value: num 23.15 22.77 5.1 1.05 13.87 ...

calculation of week number in R

I have found many answers regarding the week number of a particular date. What I want is a week number that runs across 2 years, i.e. the first year gives weeks 1 to 53 and the second year keeps counting from 53 instead of starting at 1 again. Is this possible in R? Example data is shown below:
We can use rep to add 53 to the vector ('vN2') after finding the number of observations for each year.
vN2 + rep(c(0, 53), tapply(vN2, cumsum(c(TRUE, diff(vN2) < 0)), FUN = length))
data
set.seed(24)
vN <- rep(1:53, sample(1:5, 53, replace=TRUE))
vN1 <- rep(1:53, sample(1:6, 53, replace=TRUE))
vN2 <- c(vN, vN1)
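The same idea can be written so that it also covers more than two years, by multiplying the offset by a running year index. This is a sketch under the same assumption as the answer above (every year is offset by 53); `weeks`, `year_idx` and `continuous` are illustrative names.

```r
# Small illustrative input: week numbers that restart twice
weeks <- c(52, 52, 53, 1, 1, 2, 53, 1)

# Running index that increases by one each time the week number drops
year_idx <- cumsum(c(TRUE, diff(weeks) < 0))

# Offset each block by 53 weeks per completed year
continuous <- weeks + 53 * (year_idx - 1)
continuous
```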

Calculate number of days between two dates in r

I need to calculate the number of days elapsed between multiple dates in two ways and then output those results to new columns: i) number of days that has elapsed as compared to the first date (e.g., RESULTS$FIRST) and ii) between sequential dates (e.g., RESULTS$BETWEEN). Here is an example with the desired results. Thanks in advance.
library(lubridate)
DATA = data.frame(DATE = mdy(c("7/8/2013", "8/1/2013", "8/30/2013", "10/23/2013",
"12/16/2013", "12/16/2015")))
RESULTS = data.frame(DATE = mdy(c("7/8/2013", "8/1/2013", "8/30/2013", "10/23/2013",
"12/16/2013", "12/16/2015")),
FIRST = c(0, 24, 53, 107, 161, 891), BETWEEN = c(0, 24, 29, 54, 54, 730))
#Using dplyr package
library(dplyr)
df1 %>% # your dataframe
  mutate(BETWEEN0 = as.numeric(difftime(DATE, lag(DATE, 1))),
         BETWEEN = ifelse(is.na(BETWEEN0), 0, BETWEEN0),
         FIRST = cumsum(as.numeric(BETWEEN))) %>%
  select(-BETWEEN0)
DATE BETWEEN FIRST
1 2013-07-08 0 0
2 2013-08-01 24 24
3 2013-08-30 29 53
4 2013-10-23 54 107
5 2013-12-16 54 161
6 2015-12-16 730 891
This will get you what you want:
d <- as.Date(DATA$DATE, format="%m/%d/%Y")
first <- c()
for (i in seq_along(d))
first[i] <- d[i] - d[1]
between <- c(0, diff(d))
This uses the as.Date() function in the base package to cast the vector of string dates to date values using the given format. Since you have dates as month/day/year, you specify format="%m/%d/%Y" to make sure it's interpreted correctly.
diff() is the lagged difference. Since it's lagged, it doesn't include the difference between element 1 and itself, so you can concatenate a 0.
Differences between Date objects are given in days by default.
Then constructing the output dataframe is simple:
RESULTS <- data.frame(DATE=DATA$DATE, FIRST=first, BETWEEN=between)
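The loop can also be replaced by vectorised subtraction. Here is a self-contained sketch of the same calculation in base R, using the dates from the question and parsing them with as.Date() instead of lubridate:

```r
d <- as.Date(c("7/8/2013", "8/1/2013", "8/30/2013", "10/23/2013",
               "12/16/2013", "12/16/2015"), format = "%m/%d/%Y")

# Days elapsed since the first date
first <- as.numeric(d - d[1])

# Days between consecutive dates, with 0 for the first row
between <- c(0, as.numeric(diff(d)))

RESULTS <- data.frame(DATE = d, FIRST = first, BETWEEN = between)
RESULTS
```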
For the first part:
DATA = data.frame((c("7/8/2013", "8/1/2013", "8/30/2013", "10/23/2013","12/16/2013", "12/16/2015")))
names(DATA)[1] = "V1"
date = as.Date(DATA$V1, format="%m/%d/%Y")
print(date-date[1])
Result:
[1] 0 24 53 107 161 891
For second part - simply use a for loop
You can add each column with simple difftime and lagged diff calculations.
DATA$FIRST <- c(0,
  with(DATA,
    difftime(DATE[2:length(DATE)], DATE[1], units = "days")
  )
)
DATA$BETWEEN <- c(0,
  with(DATA,
    diff(DATE)
  )
)
identical(DATA, RESULTS)
[1] TRUE

hydrological year time series

Currently I am working on a river discharge data analysis. I have the daily discharge record from 1935 to now. I want to extract the annual maximum discharge for each hydrological year (starting on 01/11 and ending on 31/10 of the next year). However, I found that the hydroTSM package can only deal with the natural year. I tried to use the "zoo" package, but I found it difficult to compute, as each year has a different number of days. Does anyone have an idea? Thanks.
the data looks like:
01-11-1935 663
02-11-1935 596
03-11-1935 450
04-11-1935 381
05-11-1935 354
06-11-1935 312
my code:
mydata<-read.table("discharge")
colnames(mydata) <- c("date","discharge")
library(zoo)
library(hydroTSM) # provides daily2monthly()
z<-zooreg(mydata[,2],start=as.Date("1935-11-1"))
mydata$date <- as.POSIXct(mydata$date, format = "%d-%m-%Y")
q.month<-daily2monthly(z,FUN=max,na.rm = TRUE,date.fmt = "%Y-%m-%d",out.fmt="numeric")
q.month.plain=coredata(q.month)
z.month<-zooreg(q.month.plain,start=1,frequency=12)
With dates stored in a vector of class Date, you can just use cut() and tapply(), like this:
## Example data
df <- data.frame(date = seq(as.Date("1935-01-01"), length = 100, by = "week"),
flow = (runif(n = 100, min = 0, max = 1000)))
## Use vector of November 1st dates to cut data into hydro-years
breaks <- seq(as.Date("1934-11-01"), length=4, by="year")
df$hydroYear <- cut(df$date, breaks, labels=1935:1937)
## Find the maximum flow in each hydro-year
with(df, tapply(flow, hydroYear, max))
# 1935 1936 1937
# 984.7327 951.0440 727.4210
## Note: whenever using `cut()`, I take care to double-check that
## I've got the cuts exactly right
cut(as.Date(c("1935-10-31", "1935-11-01")), breaks, labels=1935:1937)
# [1] 1935 1936
# Levels: 1935 1936 1937
Here is a one-liner to do that.
First convert the dates to "yearmon" class. This class represents a year month as the sum of a year as the integer part and a month as the fractional part (Jan = 0, Feb = 1/12, etc.). Add 2/12 to shift November to January and then truncate to give just the years. Aggregate over those. Although the test data we used starts at the beginning of the hydro year this solution works even if the data does not start on the beginning of the hydro year.
# test data
library(zoo)
z <- zooreg(1:1000, as.Date("2000-11-01")) # test input
aggregate(z, as.integer(as.yearmon(time(z)) + 2/12), max)
This gives:
2001 2002 2003
365 730 1000
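The same shift can be sketched in base R without zoo. The helper `hydro_year` below is hypothetical (it is not part of any package); it assumes the hydrological year starts on 1 November and is labelled by the calendar year in which it ends, which matches the labelling used in the answers above.

```r
# Hypothetical helper: hydrological year of each date, assuming the
# year starts on 1 November and is labelled by the year it ends in
hydro_year <- function(d) {
  yr <- as.integer(format(d, "%Y"))
  mo <- as.integer(format(d, "%m"))
  yr + (mo >= 11)
}

# Illustrative data: 1000 daily values starting on 1 November 2000
dates <- seq(as.Date("2000-11-01"), by = "day", length.out = 1000)
flow <- 1:1000

# Annual maximum per hydrological year
annual_max <- tapply(flow, hydro_year(dates), max)
annual_max
```

Because the grouping is derived from each date individually, the data do not have to start on the first day of a hydrological year.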
Try the xts package, which works together with zoo:
require(zoo)
require(xts)
dates = seq(Sys.Date(), by = 'day', length = 365 * 3)
y = cumsum(rnorm(365 * 3))
serie = zoo(y, dates)
# if you need to specify `start` and `end`
# serie = window(serie, start = "2015-06-01")
# xts function
apply.yearly(serie, FUN = max)
