Convert time series to Data frame in R [duplicate] - r

The output of a time-series looks like a data frame:
ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)
Jan Feb Mar Apr May Jun Jul ...
1981 14.064085 21.664250 14.800249 -5.773095 16.477470 1.129674 16.747669 ...
1982 23.973620 17.851890 21.387944 28.451552 24.177141 25.212271 19.123179 ...
1983 19.801210 11.523906 8.103132 9.382778 4.614325 21.751529 9.540851 ...
1984 15.394517 21.021790 23.115453 12.685093 -2.209352 28.318686 10.159940 ...
1985 20.708447 13.095117 32.815273 9.393895 19.551045 24.847337 18.703991 ...
It would be handy to transform it into a data frame with columns Jan, Feb, Mar... and rows 1981, 1982, ... and then back. What's the most elegant way to do this?

Here are two ways. The first builds dimnames for the matrix about to be created, strings the data out into a 12-row matrix, transposes it and converts it to a data frame. The second builds a list of year and month grouping variables and passes it to tapply, which returns a year-by-month matrix that can then be converted to a data frame.
# create test data
set.seed(123)
tt <- ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)
1) matrix. This solution requires that we have whole consecutive years
dmn <- list(month.abb, unique(floor(time(tt))))
as.data.frame(t(matrix(tt, 12, dimnames = dmn)))
If we don't care about the nice names it is just as.data.frame(t(matrix(tt, 12))) .
We could replace the dmn <- line with the following simpler line using @thelatemail's comment:
dmn <- dimnames(.preformat.ts(tt))
2) tapply. A more general solution using tapply is the following:
Month <- factor(cycle(tt), levels = 1:12, labels = month.abb)
tapply(tt, list(year = floor(time(tt)), month = Month), c)
Note: To invert this, suppose X is the result of any of the solutions above. Then try:
ts(c(t(X)), start = 1981, frequency = 12)
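As a quick check of the round trip described above, here is a minimal sketch that converts the test series to a data frame with the matrix approach and back again. It assumes whole consecutive years, as solution 1 does; building the year labels from start() and end() is my own choice here, not part of the original answer.

```r
# Round trip: ts -> data frame (rows = years, columns = months) -> ts
set.seed(123)
tt <- ts(rnorm(12 * 5, 17, 8), start = c(1981, 1), frequency = 12)

# Year labels taken from start()/end() (assumes whole consecutive years)
years <- start(tt)[1]:end(tt)[1]
X <- as.data.frame(t(matrix(tt, 12, dimnames = list(month.abb, years))))

# Invert: transpose back so months run fastest, then rebuild the series
tt2 <- ts(c(t(X)), start = c(1981, 1), frequency = 12)
all.equal(tt, tt2)
```

If all.equal() does not report TRUE, the series most likely does not cover whole years, and the tapply approach is the safer one.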
Update
Improvement motivated by comments of @thelatemail below.

Example with the AirPassengers dataset:
Make the data available and check its type:
data(AirPassengers)
class(AirPassengers)
Convert Time-Series into a data frame:
df <- data.frame(AirPassengers, year = trunc(time(AirPassengers)),
month = month.abb[cycle(AirPassengers)])
Redo the creation of the Time-Series object:
tsData = ts(df$AirPassengers, start = c(1949,1), end = c(1960,12), frequency = 12)
Plot the results to ensure correct execution:
components.ts = decompose(tsData)
plot(components.ts)

Try the package "tsbox":
library(tsbox)
x <- ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)
df <- ts_df(x)
str(df)
'data.frame': 60 obs. of 2 variables:
$ time : Date, format: "1981-01-01" "1981-02-01" ...
$ value: num 23.15 22.77 5.1 1.05 13.87 ...
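If installing a package is not an option, a similar long format (one row per observation, with a Date column) can be sketched in base R. This is only an illustration and assumes a regular monthly series; the names x and long are made up for the example.

```r
# Long-format data frame from a monthly ts, base R only (assumes frequency = 12)
x <- ts(1:24, start = c(1981, 1), frequency = 12)

# First day of the starting month, built from start(x) = c(year, month)
first_day <- as.Date(sprintf("%d-%02d-01", start(x)[1], start(x)[2]))

long <- data.frame(
  time  = seq(first_day, by = "month", length.out = length(x)),
  value = as.numeric(x)
)
head(long, 3)
```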

Related

R create a multivariate time series matrix from normalised data.frame for hts()

I have a data.frame that has 3 columns:
telar <- data.frame(
class = c("A","B","A","B"),
date = as.Date(c("2019-01-01", "2019-01-01", "2019-02-01", "2019-02-01")),
number = c(10, 20, 11, 21)
)
The first one contains the class, the second one the date and the third one the number. I want to create a multivariate time series matrix that can be used by the hts function from the hts package. It should have a root node, with the rest as leaves of the tree.
The code should look like this:
nodes <- list(length(unique(telar$class)))
## Here something to create the new time series matrix
my_hts <- hts(new_time_series_matrix, nodes)
Thank you everyone!
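# assumed packages: dcast() from reshape2, select() from dplyr, year()/month() from lubridate
library(reshape2)
library(dplyr)
library(lubridate)
new_time_series_matrix <- ts(
select(
dcast(telar, date ~ class, value.var = "number"), -date
),
start=c(year(telar$date[1]), month(telar$date[1])),
frequency = 12
)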
new_time_series_matrix
Output
A B
Jan 2019 10 20
Feb 2019 11 21
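The same reshaping can also be sketched with base R only, using tapply to spread the classes into columns. This assumes one row per class and date and no missing months; the names wide and m are made up for the example.

```r
# Wide monthly ts matrix from the normalised data frame, base R only
telar <- data.frame(
  class  = c("A", "B", "A", "B"),
  date   = as.Date(c("2019-01-01", "2019-01-01", "2019-02-01", "2019-02-01")),
  number = c(10, 20, 11, 21)
)

# Rows = dates (sorted), columns = classes
wide <- tapply(telar$number, list(telar$date, telar$class), sum)

# Start year and month taken from the earliest date
first <- as.POSIXlt(min(telar$date))
m <- ts(wide, start = c(first$year + 1900, first$mon + 1), frequency = 12)
m
```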

hydrological year time series

I am currently working on a river discharge data analysis. I have the daily discharge record from 1935 to now, and I want to extract the annual maximum discharge for each hydrological year (from 01/11 to 31/10 of the next year). However, I found that the hydroTSM package can only deal with the natural (calendar) year. I tried the "zoo" package, but found it difficult to compute, as each year has a different number of days. Does anyone have an idea? Thanks.
the data looks like:
01-11-1935 663
02-11-1935 596
03-11-1935 450
04-11-1935 381
05-11-1935 354
06-11-1935 312
my code:
mydata<-read.table("discharge")
colnames(mydata) <- c("date","discharge")
library(zoo)
z<-zooreg(mydata[,2],start=as.Date("1935-11-1"))
mydata$date <- as.POSIXct(mydata$date, format = "%d-%m-%Y")
q.month<-daily2monthly(z,FUN=max,na.rm = TRUE,date.fmt = "%Y-%m-%d",out.fmt="numeric")
q.month.plain=coredata(q.month)
z.month<-zooreg(q.month.plain,start=1,frequency=12)
With dates stored in a vector of class Date, you can just use cut() and tapply(), like this:
## Example data
df <- data.frame(date = seq(as.Date("1935-01-01"), length = 100, by = "week"),
flow = (runif(n = 100, min = 0, max = 1000)))
## Use vector of November 1st dates to cut data into hydro-years
breaks <- seq(as.Date("1934-11-01"), length=4, by="year")
df$hydroYear <- cut(df$date, breaks, labels=1935:1937)
## Find the maximum flow in each hydro-year
with(df, tapply(flow, hydroYear, max))
# 1935 1936 1937
# 984.7327 951.0440 727.4210
## Note: whenever using `cut()`, I take care to double-check that
## I've got the cuts exactly right
cut(as.Date(c("1935-10-31", "1935-11-01")), breaks, labels=1935:1937)
# [1] 1935 1936
# Levels: 1935 1936 1937
Here is a one-liner to do that.
First convert the dates to "yearmon" class. This class represents a year month as the sum of a year as the integer part and a month as the fractional part (Jan = 0, Feb = 1/12, etc.). Add 2/12 to shift November to January and then truncate to give just the years. Aggregate over those. Although the test data we used starts at the beginning of the hydro year this solution works even if the data does not start on the beginning of the hydro year.
# test data
library(zoo)
z <- zooreg(1:1000, as.Date("2000-11-01")) # test input
aggregate(z, as.integer(as.yearmon(time(z)) + 2/12), max)
This gives:
2001 2002 2003
365 730 1000
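For comparison, the same November-to-October grouping can be sketched without any packages by computing the hydro-year label directly from the date. The helper name hydro_year is made up for this example; like the zoo one-liner, it labels each hydro year by the calendar year in which it ends.

```r
# Hydro year = calendar year, plus one for November and December
hydro_year <- function(d) {
  lt <- as.POSIXlt(d)
  lt$year + 1900L + (lt$mon >= 10L)  # mon is 0-based, so 10 = November
}

# Same test input as above: 1000 daily values starting 2000-11-01
dates <- seq(as.Date("2000-11-01"), by = "day", length.out = 1000)
flow  <- 1:1000

annual_max <- tapply(flow, hydro_year(dates), max)
annual_max
```

To also get the date of each maximum, which.max can be used inside the grouping function in the same way.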
Try the xts package, which works together with zoo:
require(zoo)
require(xts)
dates = seq(Sys.Date(), by = 'day', length = 365 * 3)
y = cumsum(rnorm(365 * 3))
serie = zoo(y, dates)
# if you need to specify `start` and `end`
# serie = window(serie, start = "2015-06-01")
# xts function
apply.yearly(serie, FUN = max)

How do I calculate a monthly rate of change from a daily time series in R?

I'm beginning to get my feet wet with R, and I'm brand new to time series concepts. Can anyone point me in the right direction to calculate a monthly % change, based on a daily data point? I want the change between the first and last data points of each month. For example:
tseries data:
1/1/2000 10.00
...
1/31/2000 10.10
2/1/2000 10.20
...
2/28/2000 11.00
I'm looking for a return data frame of the form:
1/31/2000 .01
2/28/2000 .0784
Ideally, I'd be able to calculate from the endpoint of the prior month to the endpoint of current month, but I'm supposing partitioning by month is easier as a starting point. I'm looking at packages zoo and xts, but am still stuck. Any takers? Thanks...
Here's one way to do it using plyr and ddply.
I use ddply sequentially, first to get the first and last rows of each month, and again to calculate the monthlyReturn.
(Perhaps using xts or zoo might be easier, I am not sure.)
#Using plyr and the data in df
df$Date <- as.POSIXlt(as.Date(df$Date, "%m/%d/%Y"))
df$Month <- (df$Date$mon + 1) #0 = January
sdf <- df[,-1] #drop the Date Column, ddply doesn't like it
library("plyr")
#this function is called with 2 row data frames
monthlyReturn<- function(df) {
(df$Value[2] - df$Value[1])/(df$Value[1])
}
adf <- ddply(sdf, .(Month), function(x) x[c(1, nrow(x)), ]) #get first and last values for each Month
mon.returns <- ddply(adf, .(Month), monthlyReturn)
Here's the data I used to test it out:
> df
Date Value
1 1/1/2000 10.0
2 1/31/2000 10.1
3 2/1/2000 10.2
4 2/28/2000 11.0
5 3/1/2000 10.0
6 3/31/2000 24.1
7 5/10/2000 510.0
8 5/22/2000 522.0
9 6/04/2000 604.0
10 7/03/2000 10.1
11 7/30/2000 7.2
12 12/28/2000 11.0
13 12/30/2000 3.0
> mon.returns
Month V1
1 1 0.01000000
2 2 0.07843137
3 3 1.41000000
4 5 0.02352941
5 6 0.00000000
6 7 -0.28712871
7 12 -0.72727273
Hope that helps.
Here is another way to do this(using the quantmod package):
This calculates the monthly return from the daily price of AAPL.
library(quantmod) # load the quantmod package
getSymbols("AAPL") # download daily price for stock AAPL
monthlyReturn = periodReturn(AAPL,period="monthly")
monthlyReturn2014 = periodReturn(AAPL,period="monthly",subset='2014::') # from 2014 onwards
This is a pretty old thread, but for reference, here comes a data.table solution using the same data as @Ram:
df <- data.frame(
Date = structure(c(10957, 10987, 10988, 11015, 11017, 11047, 11087, 11099, 11112, 11141, 11168, 11319, 11321), class = "Date"),
Value = c(10, 10.1, 10.2, 11, 10, 24.1, 510, 522, 604, 10.1, 7.2, 11, 3)
)
It's essentially a one-liner that uses the data.table::month function:
library(data.table)
setDT(df)[ , diff(Value) / Value[1], by= .(month(Date))]
This will produce the change relative to the first recorded day in each month. If the change relative to the last day is preferred, then the expression in the middle should be changed to diff(Value) / Value[2].
1) no packages Try this:
DF <- read.table(text = Lines)
fmt <- "%m/%d/%Y"
ym <- format(as.Date(DF$V1, format = fmt), "%Y-%m")
ret <- function(x) (x[length(x)] - x[1]) / x[1] # change from first to last value, so declines come out negative
ag <- aggregate(V2 ~ ym, DF, ret)
giving:
> ag
ym V2
1 2000-01 0.01000000
2 2000-02 0.07843137
We could convert this to "ts" class, if desired. Assuming no missing months:
ts(ag$V2, start = 2000, freq = 12)
giving:
Jan Feb
2000 0.01000000 0.07843137
2) It's a bit easier if you use the zoo or xts time series packages. fmt and ret are from above:
library(zoo)
z <- read.zoo(text = Lines, format = fmt)
z.ret <- aggregate(z, as.yearmon, ret)
giving:
> z.ret
Jan 2000 Feb 2000
0.01000000 0.07843137
If you already have a data.frame DF then the read.zoo statement could be replaced with z <- read.zoo(DF, format = fmt) or omit the format arg if the first column is of "Date" class.
If "ts" class were desired then use as.ts(z.ret)
Note: The input Lines is:
Lines <- "1/1/2000 10.00
1/31/2000 10.10
2/1/2000 10.20
2/28/2000 11.00"
The ROC function in the TTR package will do this. If you will only be looking at monthly behaviour, you can first use to.monthly or endpoints() (see the question "From daily time series to weekly time series in R xts object").
library(xts) # endpoints() and to.monthly() come from xts
library(TTR)
# data.monthly <- to.monthly( data, indexAt='periodEnd' ) # if OHLC data
# OR
data.monthly <- data[ endpoints(data, on="months", k=1), ]
data.roc <- ROC(data.monthly, n = 1, type = "discrete")
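The variant the question calls ideal, the change from the end of the prior month to the end of the current month, can also be sketched in base R. This assumes the rows are already sorted by date; the names last_val and ret_eom are made up for the example.

```r
# Month-end to month-end return, base R only (rows assumed sorted by date)
Lines <- "1/1/2000 10.00
1/31/2000 10.10
2/1/2000 10.20
2/28/2000 11.00"
DF <- read.table(text = Lines, col.names = c("date", "value"))
DF$date <- as.Date(DF$date, format = "%m/%d/%Y")

# Last observed value in each month, as a named vector
ym <- format(DF$date, "%Y-%m")
last_val <- vapply(split(DF$value, ym), function(x) x[length(x)], numeric(1))

# Change relative to the previous month's last value
ret_eom <- diff(last_val) / head(last_val, -1)
ret_eom
```

The first month has no prior month end, so it drops out of the result.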

How to get the date of maximum values of rainfall in programming language R

I have a data frame with a year of daily rainfall values (complete dates in column 1, months in column 2, rainfall in column 3). I am trying to calculate the monthly maximum rainfall, and I would also like to know the date when the maximum occurred.
I tried the following code:
for (imonth in 1:12) {
month <- which(data[,2]==imonth)
monthly_max[imonth] <- max(data[month,3])
maxi[imonth] <- which.max(data[month,3])
}
tabela <- cbind(monthly_max, maxi)
write.table(tabela, col.names=TRUE, row.names=TRUE, append=FALSE, sep="\t")
The monthly maximum worked perfectly, but the which.max function is not working correctly: it gives me rows that do not correspond to the maximum values of rainfall. Can anybody tell me why, or suggest a better way of doing this?
Thank you for helping!
Here is a possible solution using the plyr package
library(plyr)
# create a dummy data frame
df = data.frame(date = sample(LETTERS, 100, replace = T),
month = sample(12, 100, replace = T),
rainfall = sample(1000, 100, replace = F));
# use plyr to figure out max rainfall and date for each month
df.max = ddply(df, .(month), summarize,
max.rain = max(rainfall),
date.max.rain = date[which.max(rainfall)])
Let me know if this works.
EDIT. If there are multiple dates with max rainfall, the code needs to be modified slightly
# find max rainfall for each month
df.max = ddply(df, .(month), transform, max.rain = max(rainfall))
# extract subset such that max.rain = rainfall
df.max = subset(df.max, max.rain == rainfall)
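As for why the loop in the question returns the wrong rows: which.max is applied to the monthly subset, so it returns a position within that subset rather than a row of the full data frame. A minimal sketch of the fix, using made-up data with the same column layout as in the question (dates, months, rainfall), maps that position back through the subset's row indices:

```r
# Made-up year of daily rainfall: date in column 1, month in column 2, rain in column 3
set.seed(1)
data <- data.frame(date = seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day"))
data$month <- as.integer(format(data$date, "%m"))
data$rain <- round(runif(nrow(data), 0, 50), 1)

monthly_max <- numeric(12)
max_row <- integer(12)
for (imonth in 1:12) {
  rows <- which(data[, 2] == imonth)
  monthly_max[imonth] <- max(data[rows, 3])
  # which.max() indexes into the subset; rows[...] converts it to a row of `data`
  max_row[imonth] <- rows[which.max(data[rows, 3])]
}
tabela <- data.frame(month = 1:12, max_rain = monthly_max, date = data$date[max_row])
tabela
```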
The index function works well here:
library(zoo)
data(AirPassengers)
APZ = zoo(AirPassengers)
ndx = which.max(APZ)
dmax = index(APZ[ndx])
# returns '1960.5' which is Jul 1960 once you know the series freq
frequency(APZ)
# returns 12
I have assumed that you are working with a timeseries object; for those (objects created using eg, ts, zooreg, xts) the dates are actually the value indices. If instead you have a dataframe (ie, so that date is a column in the data frame and the value is another column) then you can just access the row directly.
Edit in light of OP's comment below. For data stored as a data frame:
Suppose your data looks like this, a data frame, D0:
D0[1:10,]
# returns
Time Value
1 2011-03-12 10:48:24 -3.077784
2 2011-03-12 10:49:24 -20.145500
3 2011-03-12 10:50:24 -45.047560
4 2011-03-12 10:51:24 -69.949640
5 2011-03-12 10:52:24 -94.571920
6 2011-03-12 10:53:24 -112.199200
7 2011-03-12 10:54:24 -118.914400
8 2011-03-12 10:55:24 -114.997200
9 2011-03-12 10:56:24 -97.369900
10 2011-03-12 10:57:24 -78.063800
ndx = which.max(D0$Value)
dmax = D0[ndx,] # dmax is the row holding the max value; dmax$Time is its date
