I have a .csv file that looks like this:
Date       Time   Demand
01-Jan-05  6:30   6
01-Jan-05  6:45   3
...
23-Jan-05  21:45  0
23-Jan-05  22:00  1
The days are broken into 15-minute increments from 6:30 to 22:00.
Now, I am trying to build a time series from this, but I am a little lost on the notation.
I have the following so far:
library(tidyverse)
library(forecast)
library(zoo)
tp <- read.csv(".csv")
tp.ts <- ts(tp$Demand, start = c(), end = c(), frequency = 63)
The frequency I am after is an entire day, which I believe makes the number 63.***
However, I am unsure as to how to notate the dates in c().
***Edit
If the frequency is meant to be the number of observations per unit of time, and I am trying to observe just (Demand) by the 15-minute time slots (Time) in each day (Date), maybe my frequency is 1?
***Edit 2
So I think I am struggling with doing the time series because I have a Date column (which is characters) and a Time column.
Since I need the data for Demand at the given hours on the dates, maybe I need to convert the dates to be used in ts() and combine the Date and Time data into a new column?
If I do this, I am assuming this should give me the times I need (6:30 to 22:00) but with the addition of having the date?
However, the data is to be used to predict the Demand for the rest of the month. So maybe the Date is an important variable if the day of the week impacts Demand?
We assume you are starting with tp, shown reproducibly in the Note at the end. A complete cycle of 24 * 4 = 96 points should be represented by one unit of time internally. The chron class does that, so read the data in as a zoo series z with a chron time index and then convert it to ts, giving ts_ser, or leave it as a zoo series, depending on what you are going to do next.
library(zoo)
library(chron)
to_chron <- function(date, time) as.chron(paste(date, time), "%d-%b-%y %H:%M")
z <- read.zoo(tp, index = 1:2, FUN = to_chron, frequency = 4 * 24)
ts_ser <- as.ts(z)
Note
tp <- structure(list(Date = c("01-Jan-05", "01-Jan-05"), Time = c("6:30",
"6:45"), Demand = c(6L, 3L)), row.names = 1:2, class = "data.frame")
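As an alternative sketch in base R (not the chron approach above), the Date and Time columns can be pasted into a single POSIXct stamp, and each day can be treated as one cycle of 63 slots, as the question proposed. The demo data, the names tp_demo, slots and demand_ts, and the Demand values are made up for illustration, and the sketch assumes every day has all 63 slots with no gaps.

```r
# Hypothetical demo data: two complete days of 15-minute slots, 6:30 to 22:00
slot_minutes <- seq(6 * 60 + 30, 22 * 60, by = 15)   # minutes after midnight
slots <- sprintf("%d:%02d", slot_minutes %/% 60, slot_minutes %% 60)

tp_demo <- data.frame(
  Date   = rep(c("01-Jan-05", "02-Jan-05"), each = length(slots)),
  Time   = rep(slots, times = 2),
  Demand = seq_len(2 * length(slots))                # made-up values
)

# Combine Date and Time into one time stamp.
# Note: %b matches month abbreviations in the current locale (English assumed).
tp_demo$stamp <- as.POSIXct(paste(tp_demo$Date, tp_demo$Time),
                            format = "%d-%b-%y %H:%M", tz = "UTC")

# One cycle = one day = 63 observations; start = c(day 1, slot 1)
demand_ts <- ts(tp_demo$Demand, start = c(1, 1), frequency = length(slots))
length(slots)    # 63 slots per day
end(demand_ts)   # day 2, slot 63
```

With this layout the time index counts days and slots rather than calendar dates, so the stamp column is what ties each observation back to its date.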
Related
How to create for every date hourly timestamps?
For example, from 00:00 until 23:59. The result of the function could be 10:00. I read on the internet that a loop could work, but we couldn't make it fit.
Data sample:
df = data.frame( id = c(1, 2, 3, 4), Date = c(2021-04-18, 2021-04-19, 2021-04-21
07:07:08.000, 2021-04-22))
A few points:
The input shown in the question is not valid R syntax so we assume what we have is the data frame shown reproducibly in the Note at the end.
The question did not describe the specific output desired, so we will assume that what is wanted is a POSIXct vector of hourly values. In (1) below we assume it runs from the first hour of the minimum date to the last hour of the maximum date in the current time zone; in (2) below we assume that we only want hourly sequences for the dates in df, also in the current time zone.
We assume that any times in the input should be dropped.
We assume that the id column of the input should be ignored.
No packages are used.
1) This calculates hour 0 of the first date and hour 0 of the day after the last date giving rng. The as.Date takes the Date part, range extracts out the smallest and largest dates into a vector of two components, adding 0:1 adds 0 to the first date leaving it as is and 1 to the second date converting it to the date after the last date. The format ensures that the Dates are converted to POSIXct in the current time zone rather than UTC. Then it creates an hourly sequence from those and uses head to drop the last value since it would be the day after the input's last date.
rng <- as.POSIXct(format(range(as.Date(df$Date)) + 0:1))
head(seq(rng[1], rng[2], "hour"), -1)
2) Another possibility is to paste together each date with each hour from 0 to 23 and then convert that to POSIXct. This will give the same result if the input dates are sequential; otherwise, it will give the hours only for those dates provided.
with(expand.grid(Date = as.Date(df$Date), hour = paste0(0:23, ":00:00")),
sort(as.POSIXct(paste(Date, hour))))
Note
df <- data.frame( id = c(1, 2, 3, 4),
Date = c("2021-04-18", "2021-04-19", "2021-04-21 07:07:08.000", "2021-04-22"))
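To make the difference between the two approaches concrete, here is a small sketch that counts the hourly stamps each one produces for the sample data. It is a variation on the code above rather than a copy: it fixes the time zone to UTC (an assumption, so that the counts do not depend on local daylight saving rules) and builds the strings with sprintf. The names dates, hours_range, grid and hours_listed are illustrative.

```r
df <- data.frame(id = c(1, 2, 3, 4),
                 Date = c("2021-04-18", "2021-04-19",
                          "2021-04-21 07:07:08.000", "2021-04-22"))

dates <- as.Date(df$Date)   # the time part of the third entry is dropped

# (1) every hour from the first date through the last date
rng <- as.POSIXct(format(range(dates) + 0:1), tz = "UTC")
hours_range <- head(seq(rng[1], rng[2], by = "hour"), -1)

# (2) hours only for the dates that actually appear in df
grid <- expand.grid(hour = 0:23, Date = dates)
hours_listed <- as.POSIXct(sprintf("%s %02d:00:00", format(grid$Date), grid$hour),
                           tz = "UTC")

length(hours_range)    # 5 days * 24 hours = 120
length(hours_listed)   # 4 days * 24 hours = 96
```

The 24 stamps that differ are those of 2021-04-20, which falls inside the range but is not one of the dates in df.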
What is the meaning of the frequency below? When I converted my xts object to a ts object and printed it, I got the following information.
My data is hourly, but I do not understand how the frequency below is calculated. I want to make sure my ts object treats my data as hourly.
Time Series:
Start = 1
End = 15548401
Frequency = 0.000277777777777778 (how this is equivalent to hourly frequency?)
So, my dataframe initially looks like this:
y
1484337600 19.22819
1484341200 19.28906
1484344800 19.28228
1484348400 19.21669
1484352000 19.32759
1484355600 19.21833
1484359200 19.20626
1484362800 19.28737
1484366400 19.20651
1484370000 19.18424
It has epoch times and values. Epoch times are row.names in this dataframe.
Then I converted it into an xts object using:
xts_dataframe <- xts(x = dataframe$y,
order.by = as.POSIXct(as.numeric(row.names(dataframe)), origin="1970-01-01"))
ts_dataframe <- as.ts(xts_dataframe)
Can you suggest what I'm doing wrong? Basically, I want to convert my initial dataframe to a ts() object because I need to apply ARIMA to it. The data is hourly. I'm having a really hard time working with it.
The frequency is equivalent to 1/deltat, where deltat is the fraction of the sampling period between successive observations. ?frequency gives the example that deltat would be "1/12 for monthly data".
In this case the series was converted from an xts object whose POSIXct index is measured in seconds, so for hourly data deltat is 3600, since there are 3600 seconds in an hour. Since frequency = 1 / deltat, that means frequency = 1 / 3600, or 0.0002777778.
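The arithmetic can be checked against the epoch values shown in the question. The sketch below also builds a ts directly with frequency = 24, which is one possible choice if a day is meant to be the seasonal cycle for the ARIMA model; that choice is an assumption, not something the question states. The names epoch, y and hourly_ts are illustrative.

```r
# Epoch seconds and values copied from the question's sample
epoch <- seq(1484337600, by = 3600, length.out = 10)
y <- c(19.22819, 19.28906, 19.28228, 19.21669, 19.32759,
       19.21833, 19.20626, 19.28737, 19.20651, 19.18424)

# Successive observations are 3600 seconds apart, so deltat = 3600
deltat_seconds <- unique(diff(epoch))
1 / deltat_seconds   # 0.0002777778, the frequency printed by as.ts()

# Assumption: one day is the cycle of interest, so 24 observations per unit
hourly_ts <- ts(y, frequency = 24)
frequency(hourly_ts)
```

Built this way, the time index counts days from 1 rather than seconds, so the original timestamps need to be kept separately if they are required later.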
I need to process five years of weekly data. I used the following command to create a time series from that:
my.ts <- ts(x[,3], start = c(2009,12), freq=52)
When plotting the series it looks good. However, the time points of the observations are stored as:
time(my.ts)
# Time Series:
# Start = c(2009, 12)
# End = c(2014, 26)
# Frequency = 52
# [1] 2009.212 2009.231 2009.250 2009.269 2009.288 2009.308 2009.327 ...
I expected to see proper dates instead (which should be aligned with a Calendar). What shall I do?
That is how the "ts" class works.
The zoo package can represent time series with dates (and other indexes):
library(zoo)
z <- zooreg(1:3, start = as.Date("2009-12-01"), deltat = 7)
giving:
> z
2009-12-01 2009-12-08 2009-12-15
1 2 3
> time(z)
[1] "2009-12-01" "2009-12-08" "2009-12-15"
The xts package and a number of other packages can also represent time series with dates although they do it by converting to POSIXct internally whereas zoo maintains the original class.
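To see where the decimal values come from, the sketch below reproduces the first two time points from the question: with frequency = 52, observation 12 of 2009 is stored as 2009 + (12 - 1) / 52. It then pairs the values with a base R Date sequence as a package-free alternative. The series values and the first calendar date (first_obs) are hypothetical, since the question does not give them.

```r
my_ts <- ts(1:5, start = c(2009, 12), frequency = 52)   # made-up values

# "ts" stores time as year + (period - 1) / frequency
decimal_time <- as.numeric(time(my_ts))
round(decimal_time[1:2], 3)       # 2009.212 2009.231
2009 + (12 - 1) / 52              # same as decimal_time[1]

# Hypothetical date of the first observation; substitute the real one
first_obs <- as.Date("2009-03-16")
obs_dates <- seq(first_obs, by = "week", length.out = length(my_ts))
weekly <- data.frame(date = obs_dates, value = as.numeric(my_ts))
weekly
```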
I have a two variable dataframe (df) in R of daily sales for a ten year period from 2004-07-09 through 2014-12-31. Not every single date is represented in the ten year period, but pretty much most days Monday through Friday.
My objective is to aggregate sales by quarter, convert to a time series object, and run a seasonal decomposition and other time series forecasting.
I am having trouble with the conversion, as ultimately I receive an error:
time series has no or less than 2 periods
Here's the structure of my code.
# create a time series object
library(xts)
x <- xts(df$amount, df$date)
# create a time series object aggregated by quarter
q.x <- apply.quarterly(x, sum)
When I try to run
fit <- stl(q.x, s.window = "periodic")
I get the error message
series is not periodic or has less than two periods
When I try to run
q.x.components <- decompose(q.x)
# or
decompose(x)
I get the error message
time series has no or less than 2 periods
So, how do I take my original dataframe, with a date variable and an amount variable (sales), aggregate that quarterly as a time series object, and then run a time series analysis?
I think I was able to answer my own question. I did this. Can anyone confirm if this structure makes sense?
library(lubridate)
# add a new variable indicating the calendar year.quarter (i.e. 2004.3) of each observation
df$year.quarter <- quarter(df$date, with_year = TRUE)
library(plyr)
# summarize gift amount by year.quarter
new.data <- ddply(df, .(year.quarter), summarize,
sum = round(sum(amount), 2))
# convert the new data to a quarterly time series object beginning
# in July 2004 (2004, Q3) and ending in December 2014 (2014, Q4)
nd.ts <- ts(new.data$sum, start = c(2004,3), end = c(2014,4), frequency = 4)
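For comparison, here is a base R sketch of the same idea, run on synthetic data so that it can be checked end to end. The likely reason the earlier attempts failed is that stl() and decompose() need a series with a seasonal frequency greater than 1, which ts(..., frequency = 4) supplies. The data frame df_demo, its random amounts and the helper columns year and qtr are made up for illustration.

```r
set.seed(1)

# Synthetic stand-in for df: weekdays only, 2004-07-09 through 2014-12-31
all_days <- seq(as.Date("2004-07-09"), as.Date("2014-12-31"), by = "day")
weekdays_only <- all_days[!format(all_days, "%u") %in% c("6", "7")]  # drop Sat/Sun
df_demo <- data.frame(date = weekdays_only,
                      amount = round(runif(length(weekdays_only), 10, 100), 2))

# Calendar year and quarter of each observation
df_demo$year <- as.integer(format(df_demo$date, "%Y"))
df_demo$qtr  <- (as.integer(format(df_demo$date, "%m")) - 1L) %/% 3L + 1L

# Sum by quarter, then put the rows in chronological order
q_sum <- aggregate(amount ~ year + qtr, data = df_demo, FUN = sum)
q_sum <- q_sum[order(q_sum$year, q_sum$qtr), ]

# Quarterly ts starting at the first observed quarter (2004 Q3 here)
q_ts <- ts(q_sum$amount, start = c(q_sum$year[1], q_sum$qtr[1]), frequency = 4)
fit <- stl(q_ts, s.window = "periodic")
```

Note that the first quarter is partial here (it starts on 9 July), so its total is not directly comparable with the later ones.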
I have one question: how do I convert a date and time in the format 20110711201023 to a number of hours? This is the output of the software I use for image analysis, and I can't change it. It is very important to define the starting date and time.
Format: 2011 year, 07 month, 11 day, 20 hour, 10 minute, 23 second.
Example:
Starting date and time - 20110709201023,
First date and time - 20110711214020
Result = 49,5h.
I have 10000 values in this format, so I don't want to do this manually.
I would be very grateful for any advice.
Best is to first make it a real R time object using strptime:
time_obj = strptime("20110711201023", format = "%Y%m%d%H%M%S")
If you do this with both the start and the end date, you can simply say:
end_time - start_time
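to get the time difference. Note that the units of the result are chosen automatically (seconds, minutes, hours or days), so use difftime(end_time, start_time, units = "hours") to get the number of hours directly. To convert a whole list of these time strings, simply do: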
time_vector = strptime(dat$time_string, format = "%Y%m%d%H%M%S")
where dat is the data.frame with the data, and time_string the column containing the time strings. Note that strptime works also on a vector (it is vectorized). You can also make the new time vector part of dat:
dat$time = strptime(dat$time_string, format = "%Y%m%d%H%M%S")
or more elegantly (at least if you hate $ as much as me :)):
dat = within(dat, { time = as.POSIXct(strptime(time_string, format = "%Y%m%d%H%M%S")) })
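As a worked check, the two stamps from the question can be run through this approach. The sketch uses as.POSIXct with an explicit format and fixes the time zone to UTC (an assumption, to keep daylight saving changes out of the result); the second entry of time_string is a made-up extra value to show the vectorized case.

```r
fmt <- "%Y%m%d%H%M%S"
start_time <- as.POSIXct("20110709201023", format = fmt, tz = "UTC")
first_time <- as.POSIXct("20110711214020", format = fmt, tz = "UTC")

# Ask for hours explicitly instead of relying on the automatic units
hours <- as.numeric(difftime(first_time, start_time, units = "hours"))
round(hours, 1)   # 49.5, matching the expected result in the question

# Vectorized over a column of stamps (second value is hypothetical)
dat <- data.frame(time_string = c("20110711214020", "20110712201023"))
dat$hours <- as.numeric(difftime(as.POSIXct(dat$time_string, format = fmt, tz = "UTC"),
                                 start_time, units = "hours"))
dat
```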