I've got an R time series object that is measured in 1 hour intervals.
library(lubridate)
library(timeSeries)
set.seed(100)
c <- Sys.time()
d <- c + hours(1:200)
e <- rnorm(200)
f <- data.frame(d,e)
g <- as.timeSeries(f)
I would like to convert this to a daily time series. I am fine with using the average of the data column for this conversion.
The outcome would be a time series object with one entry per day whose value is the average of all the hourly values of that particular day.
How can this be done?
First, use the lubridate package to compute the day each observation falls on:
library(lubridate)
f$date <- floor_date(f$d, "day")  # f$d is already POSIXct, so no parsing is needed
Then calculate the average for each day with
library(dplyr)
dplyr::group_by(f, date) %>%
dplyr::summarise(avg = mean(e))
Then build the daily time series from the result.
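As a possible way to finish that last step, here is a minimal sketch that goes from hourly values to a daily series with zoo instead of dplyr. It assumes a fixed UTC start time (in place of Sys.time(), so the result is reproducible) and the names hourly and daily are made up for the example.

```r
library(zoo)

set.seed(100)
# Assumption: fixed UTC start instead of Sys.time()
start <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
d <- start + 3600 * (1:200)   # 200 hourly time stamps
e <- rnorm(200)

hourly <- zoo(e, order.by = d)

# aggregate.zoo applies the `by` function to the index;
# as.Date() collapses each time stamp to its calendar day (in UTC here)
daily <- aggregate(hourly, function(t) as.Date(t, tz = "UTC"), mean)
```

The result is a zoo series with one entry per day. If a timeSeries object is needed, it can be built from the daily values and dates afterwards.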
I am working with a time series that looks something like this:
# making a df with POSIXct datetime sequence with just minutes
#Make reproducible data frame:
set.seed(1234)
datetime <- rep(lubridate::ymd_hm("2016-08-01 15:10"), 60)
# Generate measured value
value <- runif(n = 60, min = 280, max = 1000)
df <- data.frame(datetime, value)
The data is actually recorded at 1 second intervals, but it appears as 60 rows with the same hour and minute, with the seconds part always at 00. I want to change it so that each minute has its seconds value increasing at one-second intervals. The actual dataset includes many hours of data. Thank you
We can use
df$datetime <- with(df, datetime + seconds(seq_along(datetime)) -1)
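The one-liner above numbers the rows across the whole data frame, which fits the single-minute example. For data covering several minutes, the counter would have to restart within each minute. A minimal base R sketch of that idea follows; it assumes rows are already in recording order and that no minute has more than 60 rows. The three-minute data set and the name pos are made up for illustration.

```r
set.seed(1234)
# Assumption: three consecutive minutes, 60 rows each, seconds all at 00
minutes <- as.POSIXct("2016-08-01 15:10:00", tz = "UTC") + 60 * (0:2)
df <- data.frame(datetime = rep(minutes, each = 60),
                 value = runif(180, min = 280, max = 1000))

# Position of each row within its minute: 1..60, restarting every minute
pos <- ave(seq_along(df$datetime), as.numeric(df$datetime), FUN = seq_along)

# Adding a number to a POSIXct value adds that many seconds
df$datetime <- df$datetime + (pos - 1)
```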
I know there has been a lot on this topic already but I can't seem to get what I want working.
I've read:
how to convert data frame into time series in R
Convert data frame with date column to timeseries
As well as several others but can't get it to work.
I have the following df
df <- data.frame(CloseTime = c("2017-09-13 19:15:00","2017-09-13 19:30:00","2017-09-13 19:45:00","2017-09-13 20:00:00","2017-09-13 20:15:00"),
OpenPice = c(271.23,269.50,269.82,269.10,269.50),
HightPrice = c(271.23,269.50,269.82,269.10,269.50),
LowPrice = c(271.23,269.50,269.82,269.10,269.50),
ClosePrice = c(271.23,269.50,269.82,269.10,269.50))
I'd like to convert it into a ts object with 15-minute intervals and decompose the time series.
I also read that the zoo package allows you to decompose over specific intervals, e.g. 15 mins, 1h, 1 day?
Can someone please help? How can I convert this into a ts object and decompose it?
For reproducibility, here is another toy example covering a longer period of time.
df <-
data.frame(
CloseTime = seq(as.POSIXct("2017-09-13 19:15:00"),as.POSIXct("2018-10-20 21:45:00"),by="15 mins"),
ClosePrice1 = cumsum(rnorm(38603)),
ClosePrice2 = cumsum(rnorm(38603)),
ClosePrice3 = cumsum(rnorm(38603))
)
I found it much better to aggregate time series into different intervals using dplyr and lubridate::floor_date. Instead of mean, one can summarise using min, max, first, or last. I would recommend staying within the tidyverse to keep the code readable. The example below converts the data to 30-minute intervals.
library(lubridate); library(dplyr); library(magrittr)
df30m <-
df %>%
group_by( CloseTime = floor_date( CloseTime, "30 mins")) %>%
summarize_all(mean)
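The same flooring idea can be sketched in base R, without dplyr or lubridate, by rounding the time stamps down to a multiple of 1800 seconds. This is only an illustration on a small made-up data set; the names bucket and avg are hypothetical.

```r
set.seed(3)
# Assumption: eight made-up 15-minute observations in UTC
times <- seq(as.POSIXct("2017-09-13 19:15:00", tz = "UTC"),
             by = "15 min", length.out = 8)
df <- data.frame(CloseTime = times, ClosePrice = cumsum(rnorm(8)))

# Round each time stamp down to the start of its 30-minute bucket
bucket <- floor(as.numeric(df$CloseTime) / 1800) * 1800

# Average the prices within each bucket
avg <- tapply(df$ClosePrice, bucket, mean)

df30m <- data.frame(
  CloseTime  = as.POSIXct(as.numeric(names(avg)),
                          origin = "1970-01-01", tz = "UTC"),
  ClosePrice = as.numeric(avg)
)
```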
The data frame can be converted to a time series object such as zoo, and then to ts for decomposition.
library(zoo)
df30m_zoo <- zoo( df30m[-1], order.by = df30m$CloseTime )
df30m_ts <- ts(df30m_zoo, start=1, frequency = 48) # 48 half-hour observations per day, assuming a daily seasonal period
df30m_decomposed <- decompose(df30m_ts)
The points are already 15 minutes apart, so assuming that you want a period of 1 day, this will convert it. There are 24 * 60 * 60 seconds in a day (which is the period), but you can change the denominator to the number of seconds in a different period to get that period instead. You will need at least two periods of data to decompose it.
library(zoo)
z <- read.zoo(df)
time(z) <- (as.numeric(time(z)) - as.numeric(start(z))) / (24 * 60 * 60)
as.ts(z)
giving:
Time Series:
Start = c(0, 1)
End = c(0, 5)
Frequency = 96
OpenPice HightPrice LowPrice ClosePrice
0.00000000 271.23 271.23 271.23 271.23
0.01041667 269.50 269.50 269.50 269.50
0.02083333 269.82 269.82 269.82 269.82
0.03125000 269.10 269.10 269.10 269.10
0.04166667 269.50 269.50 269.50 269.50
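To illustrate the two-period requirement, here is a small base R sketch. The simulated random-walk prices and the names obs_per_day, price_ts, and too_short are made up for the example; only ts(), decompose(), and try() are real functions.

```r
set.seed(1)
obs_per_day <- 24 * 60 / 15   # 96 quarter-hour observations per day
n_days <- 3                   # three full periods, enough for decompose()

# Assumption: simulated random-walk prices stand in for real data
price <- cumsum(rnorm(obs_per_day * n_days))
price_ts <- ts(price, frequency = obs_per_day)
parts <- decompose(price_ts)  # trend, seasonal, and random components

# Five observations are far less than two periods, so decompose() stops
short <- ts(price[1:5], frequency = obs_per_day)
too_short <- inherits(try(decompose(short), silent = TRUE), "try-error")
```

Here parts$figure holds one seasonal value per quarter-hour of the day, and too_short records that the five-row example from the question cannot be decomposed with a daily period.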
Although not asked for in the question, in another answer the data was converted to 30 minutes. That could readily be done like this:
library(xts) # also loads zoo
z <- read.zoo(df)
to.minutes30(z)
I have a data frame of daily temperature measurements spanning 20 years. I would like to calculate the annual range in the data series for each year (i.e. end up with 20 values, representing the range for each year). Example data:
begin_date = as.POSIXlt("1990-01-01", tz = "GMT")
dat = data.frame(dt = begin_date + (0:(20*365)) * (86400))
dat = within(dat, {speed = runif(length(dt), 1, 10)})
I was thinking of writing a loop which goes through each year and then calculate the range, but was hoping there was another solution.
I think the best way forward would be to have the maximum and minimum values for each year and then calculate the range from that. Can anyone suggest a method to do this without writing a loop to go through each year individually?
Try
library(dplyr)
library(lubridate) # provides year()
dat %>%
group_by(year=year(dt)) %>%
summarise(Range=diff(range(speed)))
Or
library(data.table)
setDT(dat)[, list(Range=diff(range(speed))), year(dt)]
Or
aggregate(speed~cbind(year=year(dt)), dat, function(x) diff(range(x)))
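For comparison, a base R sketch of the same "maximum minus minimum per year" idea, with no extra packages. It rebuilds the example data with a fixed seed (an assumption made here for reproducibility); yr and yearly_range are illustrative names.

```r
set.seed(42)
begin_date <- as.POSIXct("1990-01-01", tz = "GMT")
dat <- data.frame(dt = begin_date + (0:(20 * 365)) * 86400)
dat$speed <- runif(nrow(dat), 1, 10)

# Extract the calendar year of each observation as a grouping key
yr <- format(dat$dt, "%Y")

# range() returns c(min, max); diff() turns that into max - min
yearly_range <- tapply(dat$speed, yr, function(x) diff(range(x)))
```

The result is a named vector with one range per year, with the years as names.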
I have a two variable dataframe (df) in R of daily sales for a ten year period from 2004-07-09 through 2014-12-31. Not every single date is represented in the ten year period, but pretty much most days Monday through Friday.
My objective is to aggregate sales by quarter, convert to a time series object, and run a seasonal decomposition and other time series forecasting.
I am having trouble with the conversion, as ultimately I receive an error:
time series has no or less than 2 periods
Here's the structure of my code.
# create a time series object
library(xts)
x <- xts(df$amount, df$date)
# create a time series object aggregated by quarter
q.x <- apply.quarterly(x, sum)
When I try to run
fit <- stl(q.x, s.window = "periodic")
I get the error message
series is not periodic or has less than two periods
When I try to run
q.x.components <- decompose(q.x)
# or
decompose(x)
I get the error message
time series has no or less than 2 periods
So, how do I take my original dataframe, with a date variable and an amount variable (sales), aggregate that quarterly as a time series object, and then run a time series analysis?
I think I was able to answer my own question. I did this. Can anyone confirm if this structure makes sense?
library(lubridate)
# add a new variable indicating the calendar year.quarter (i.e. 2004.3) of each observation
df$year.quarter <- quarter(df$date, with_year = TRUE)
library(plyr)
# summarize gift amount by year.quarter
new.data <- ddply(df, .(year.quarter), summarize,
sum = round(sum(amount), 2))
# convert the new data to a quarterly time series object beginning
# in July 2004 (2004, Q3) and ending in December 2014 (2014, Q4)
nd.ts <- ts(new.data$sum, start = c(2004,3), end = c(2014,4), frequency = 4)
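A self-contained base R sketch of the same structure, followed by the decomposition step, may help check it. The data here are simulated (weekday sales with made-up amounts), and the column names yr and qtr are introduced only for this example. The key point is that stl() and decompose() read the seasonal period from the frequency of a ts object, so the quarterly sums are wrapped in ts() with frequency = 4.

```r
set.seed(7)
# Assumption: simulated weekday sales standing in for the real data
dates <- seq(as.Date("2004-07-09"), as.Date("2014-12-31"), by = "day")
dates <- dates[!as.POSIXlt(dates)$wday %in% c(0, 6)]  # drop Sundays and Saturdays
df <- data.frame(date = dates,
                 amount = round(runif(length(dates), 10, 500), 2))

# Calendar year and quarter of each observation
lt <- as.POSIXlt(df$date)
df$yr  <- lt$year + 1900
df$qtr <- lt$mon %/% 3 + 1

# Sum by quarter, then sort chronologically before building the ts
quarterly <- aggregate(amount ~ qtr + yr, data = df, FUN = sum)
quarterly <- quarterly[order(quarterly$yr, quarterly$qtr), ]

sales_ts <- ts(quarterly$amount, start = c(2004, 3), frequency = 4)
fit <- stl(sales_ts, s.window = "periodic")
```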
I have a chunk of data logging temperatures from a few dozen devices every hour for over a year. The data are stored as a zoo object. I'd very much like to summarize those data by looking at the average values for every one of the 24 hours in a day (1am, 2am, 3am, etc.). So that for each device I can see what its average value is for all the 1am times, 2am times, and so on. I can do this with a loop but sense that there must be a way to do this in zoo with an artful use of aggregate.zoo. Any help?
require(zoo)
# random hourly data over 30 days for five series
x <- matrix(rnorm(24 * 30 * 5),ncol=5)
# Assign hourly data with a real time and date
x.DateTime <- as.POSIXct("2014-01-01 0100",format = "%Y-%m-%d %H") +
seq(0, by=3600, length.out=nrow(x)) # one time stamp per row of x
# make a zoo object
x.zoo <- zoo(x, x.DateTime)
#plot(x.zoo)
# what I want:
# the average value for each series at 1am, 2am, 3am, etc. so that
# the dimensions of the output are 24 (hours) by 5 (series)
# If I were just working on x I might do something like:
res <- matrix(NA,ncol=5,nrow=24)
for(i in 1:nrow(res)){
res[i,] <- apply(x[seq(i,nrow(x),by=24),],2,mean)
}
res
# how can I avoid the loop and write an aggregate statement in zoo that
# will get me what I want?
Calculate the hour for each time point and then aggregate by that:
hr <- as.numeric(format(time(x.zoo), "%H"))
ag <- aggregate(x.zoo, hr, mean)
dim(ag)
## [1] 24 5
ADDED
Alternately use hours from chron or hour from data.table:
library(chron)
ag <- aggregate(x.zoo, hours, mean)
This is quite similar to the other answer but takes advantage of the fact that the by=... argument to aggregate.zoo(...) can be a function which will be applied to time(x.zoo):
as.hour <- function(t) as.numeric(format(t,"%H"))
result <- aggregate(x.zoo,as.hour,mean)
identical(result,ag) # ag from G. Grothendieck answer
# [1] TRUE
Note that this produces a result identical to the other answer, but not the same as yours. This is because your dataset starts at 1:00am, not midnight, so your loop produces a matrix wherein the 1st row corresponds to 1:00am and the last row corresponds to midnight. These solutions produce zoo objects wherein the first row corresponds to midnight.
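The row offset can be checked with a small base R sketch that avoids zoo entirely. It assumes the series starts at 1:00am UTC with exactly one time stamp per row; by_hour and reordered are illustrative names.

```r
set.seed(1)
x <- matrix(rnorm(24 * 30 * 5), ncol = 5)
# Assumption: hourly time stamps starting at 1:00am, one per row of x
times <- as.POSIXct("2014-01-01 01:00", tz = "UTC") + 3600 * (0:(nrow(x) - 1))
hr <- as.integer(format(times, "%H"))

# Hour-of-day means per column; rows are ordered hour 0, 1, ..., 23
by_hour <- apply(x, 2, function(col) tapply(col, hr, mean))

# Loop version from the question; rows are ordered hour 1, ..., 23, 0
res <- matrix(NA_real_, nrow = 24, ncol = 5)
for (i in 1:24) res[i, ] <- colMeans(x[seq(i, nrow(x), by = 24), ])

# Moving the midnight row to the end lines the two results up
reordered <- by_hour[c(2:24, 1), ]
```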