How to conduct timeseries analysis on half-hourly data? - r

I have the dataset below with half hourly timeseries data.
Date <- c("2018-01-01 08:00:00", "2018-01-01 08:30:00",
"2018-01-01 08:59:59","2018-01-01 09:29:59")
Volume <- c(195, 188, 345, 123)
Dataset <- data.frame(Date, Volume)
I would like to know how to read this dataframe in order to conduct time series analysis. How should I define starting and ending date and what the frequency will be?

I'm not sure exactly what you mean by "half-hourly data", since two of the timestamps (08:59:59 and 09:29:59) do not fall on a half-hour boundary. If you want to round to the nearest half hour, we can adapt this solution to your case.
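# Parse the character column first, then round the epoch seconds to the nearest 30 minutes
Dataset$Date <- as.POSIXlt(round(as.double(as.POSIXct(Dataset$Date))/(30*60))*(30*60),
origin=(as.POSIXlt('1970-01-01')))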
In case you don't want to round it just do
Dataset$Date <- as.POSIXct(Dataset$Date)
Either way, the Date column must hold a date-time class such as "POSIXct" or "POSIXlt" rather than character. In the rounded case, for example:
> class(Dataset$Date)
[1] "POSIXlt" "POSIXt"
Then we can convert the data into time series with xts.
library(xts)
Dataset.xts <- xts(Dataset$Volume, order.by=Dataset$Date)
Result (rounded case):
> Dataset.xts
[,1]
2018-01-01 08:00:00 195
2018-01-01 08:30:00 188
2018-01-01 09:00:00 345
2018-01-01 09:30:00 123
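A quick way to confirm that the rounded timestamps really sit on a half-hourly grid is to check the gaps between consecutive observations. The following is a minimal base R sketch, not part of the original answer; the variable names and the fixed tz = "UTC" are assumptions made so that the example is reproducible.

```r
# Timestamps from the question, parsed in a fixed time zone (assumption: UTC)
Date <- c("2018-01-01 08:00:00", "2018-01-01 08:30:00",
          "2018-01-01 08:59:59", "2018-01-01 09:29:59")
times <- as.POSIXct(Date, tz = "UTC")

# Round the epoch seconds to the nearest multiple of 30 minutes (1800 s)
rounded <- as.POSIXct(round(as.numeric(times) / 1800) * 1800,
                      origin = "1970-01-01", tz = "UTC")

# A half-hourly series has a gap of exactly 1800 s between neighbours
gaps <- diff(as.numeric(rounded))
is_regular <- all(gaps == 1800)
```

If is_regular is TRUE, the series can be treated as regular half-hourly data; the raw timestamps fail this check because of the 08:59:59 entry.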

You can use dplyr and lubridate from the tidyverse to get the Date column into a POSIX date-time format, then convert the values to a time series with ts, where start and frequency define the time grid.
library(dplyr)
library(lubridate)
Dataset2 <- Dataset %>%
mutate(Date = ymd_hms(as.character(Date)))
# ts() keeps no timestamps, only a regular grid: with 48 half-hour slots per day,
# 08:00 is slot 17 of day 1
Volume.ts <- ts(Dataset2$Volume, start = c(1, 17), frequency = 48)
Try ?ts for more details on the parameters. Personally I think zoo and xts provide a better framework for time series analysis.

Related

What's the best way to aggregate date time by hourly interval?

I'm having trouble parsing my date-time columns, which are currently of type 'chr'. I want to group the date-times into hourly intervals, sum the corresponding values, and then merge the two data frames.
a <- c("2016-04-12 12:00:00", "2016-04-12 12:01:00")
b <- c(10, 20)
df_1 <- data.frame(a,b)
names(df_1) <- c('Date', 'Steps')
c1 <- c("4/12/2016 12:00:00 AM", "4/12/2016 05:00:00 PM")
d <- c(20,8)
df_2 <- data.frame(c1,d)
names(df_2) <- c('Date', 'Intensity')
I want to join df_1 (minute intervals) to df_2 (hourly intervals, with times in 12-hour AM/PM format).
I have tried converting the column to a date-time type using as.POSIXct and ymd, but both return NA values. I also tried the code below from a post I saw earlier. It worked, but it did not record the PM times of the day.
df_1 <- aggregate(df_1["Steps"],
list(Date=cut(as.POSIXct(df_1$Date), "hour")),
sum)
Also, I want to remove the AM/PM marker from the second data frame.
While the aggregate for df_1 appears to work fine, for df_2 you need to specify the time format using strptime, which converts character vectors to "POSIXlt".
aggregate(df_2["Intensity"],
list(Date=cut(strptime(df_2$Date, '%m/%d/%Y %I:%M:%S %p'), "hour")),
sum)
# With the AM/PM marker parsed, "12:00:00 AM" falls in the 00:00 hour (Intensity 20)
# and "05:00:00 PM" falls in the 17:00 hour (Intensity 8).
Explanation:
%m/%d/%Y month, day, year, separated by a slash
the space between date and time
%I:%M:%S hour (12h format), minute, second, separated by a colon
another space
%p the AM/PM indicator
Read ?strptime for different options, since this may also depend on your locale.
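The format codes above can also be used to bring both data frames onto a common hourly key, which drops the AM/PM marker and makes the merge straightforward. This is a rough base R sketch rather than the answerer's method: the Hour column, the result names, and tz = "UTC" are assumptions for illustration, and %p parsing assumes an English or C locale.

```r
df_1 <- data.frame(Date = c("2016-04-12 12:00:00", "2016-04-12 12:01:00"),
                   Steps = c(10, 20))
df_2 <- data.frame(Date = c("4/12/2016 12:00:00 AM", "4/12/2016 05:00:00 PM"),
                   Intensity = c(20, 8))

# Parse each column with its own format, then truncate to the hour as text
df_1$Hour <- format(as.POSIXct(df_1$Date, tz = "UTC"),
                    "%Y-%m-%d %H:00:00")
df_2$Hour <- format(as.POSIXct(df_2$Date, format = "%m/%d/%Y %I:%M:%S %p",
                               tz = "UTC"),
                    "%Y-%m-%d %H:00:00")

# Sum within each hour, then join on the shared 24-hour key
steps_hourly <- aggregate(Steps ~ Hour, data = df_1, FUN = sum)
intensity_hourly <- aggregate(Intensity ~ Hour, data = df_2, FUN = sum)
merged <- merge(steps_hourly, intensity_hourly, by = "Hour", all = TRUE)
```

Using all = TRUE keeps hours that appear in only one of the two data frames, with NA in the missing column; drop it if only matching hours are wanted.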

How to calculate time difference in R using a dataframe

I have a large data frame with two POSIXct columns and need to calculate the length of each ride.
Dates are formatted as follows:
format: "2020-10-31 19:39:43"
Can use the difftime function, correct?
Thanks
Given your data is using the correct POSIXct format you can simply subtract two dates to get the difference. No need for additional functions.
date1 <- as.POSIXct(strptime("2020-10-31 19:39:43", format = "%Y-%m-%d %H:%M:%OS"))
date2 <- as.POSIXct(strptime("2020-10-31 19:20:43", format = "%Y-%m-%d %H:%M:%OS"))
date1 - date2
Output: Time difference of 19 mins
It depends what output format you want.
For example if you want month difference between two dates, you can use the "interval" function from library "lubridate"
library(lubridate)
interval(as.Date(df$date1), as.Date(df$date2)) %/% months(1)
It also works with years, weeks, days, hours
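Since the question asks about difftime: plain subtraction chooses the unit automatically, so difftime is useful when the ride length should always be reported in the same unit. A small base R sketch using the timestamps from the answer above; the variable names and tz = "UTC" are assumptions.

```r
start_time <- as.POSIXct("2020-10-31 19:20:43", tz = "UTC")
end_time   <- as.POSIXct("2020-10-31 19:39:43", tz = "UTC")

# Fix the unit explicitly instead of letting subtraction pick one
ride_mins <- difftime(end_time, start_time, units = "mins")

# Convert to a plain number when the value feeds into further calculations
ride_secs <- as.numeric(difftime(end_time, start_time, units = "secs"))
```

The same calls work on whole columns, for example difftime(df$end, df$start, units = "mins"), where df$end and df$start stand in for the two POSIXct columns.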

Can I specify the dates and times of a time series in R?

I have a dataset that contains times and dates in the first column, and the stock prices in the second column.
I used the following format.
Time Price
2015-02-01 10:00 50
I want to turn this into a time series object. I tried ts(data) function, but when I plot the data I cannot observe the dates in the x-axis. Also I tried ts(data, start=) function. Because I have some hours with missing prices, and those hours are not included in my data set, if I set start date and frequency, my plot will be misleading.
Here is the sample data that I have. It is called df.
time price
1 2013-05-01 00:00:00 124.30
2 2013-05-01 01:00:00 98.99
3 2013-05-01 02:00:00 64.00
4 2013-05-01 03:00:00 64.00
This is the code that I used
Time1 <- ts(df)
autoplot(Time1)
Also tried this,
Time1 <- zoo(Time_series_data[,2], order.by = Time_series_data[,1])
Time_n <- ts(Time1)
autoplot(Time1)
However, when I plot the graph with autoplot(Time1) the x-axis doesn't show the times that I specified but numbers from 0 to 4. I want to have plot of a ts object that includes the date columns in the x-axis and values in Y
Is there any way to convert it to a time series object in R? Thanks.
Try the following:
Create some data using the nifty tribble function from the tibble package.
library(tibble)
df <- tribble(~time, ~price,
"2013-05-01 00:00:00", 124.30,
"2013-05-01 01:00:00", 98.99,
"2013-05-01 02:00:00", 64.00,
"2013-05-01 03:00:00", 64.00)
The time column is of class character and cannot be plotted in the usual way, so convert it using as.POSIXct. I'll use the dplyr package here but that's optional.
library(dplyr)
df <- df %>%
mutate(time=as.POSIXct(time))
Next, convert the data to a time series object. This requires the xts package, although I'm sure there are other options including zoo.
library(xts)
df.ts <- xts(df[, -1], order.by=df$time)
Now you can visualise the data.
plot(df.ts) # This should call the `plot.xts` method
And if you prefer ggplot2.
library(ggplot2)
autoplot(df.ts)
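The question also mentions hours that are missing from the data. One way to make such gaps explicit before plotting is to merge the data onto a complete hourly grid, so that missing hours show up as NA. The sketch below uses base R only; it deliberately drops the 02:00 row from the sample data to create a gap, and the names grid and full are assumptions for illustration.

```r
# Sample data with the 02:00 observation removed to simulate a missing hour
df <- data.frame(
  time  = as.POSIXct(c("2013-05-01 00:00:00", "2013-05-01 01:00:00",
                       "2013-05-01 03:00:00"), tz = "UTC"),
  price = c(124.30, 98.99, 64.00)
)

# Complete hourly grid covering the observed range
grid <- data.frame(time = seq(min(df$time), max(df$time), by = "hour"))

# Left join onto the grid: hours without a price become NA
full <- merge(grid, df, by = "time", all.x = TRUE)
```

Line plots then break at the NA instead of silently joining 01:00 to 03:00.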

xts - how to subset on each day of the week

I understand similar questions have been answered. My problem is that I have time series data covering 2033 days at 15-minute intervals. I would like to plot the series for each day of the week (Mon-Sun), for instance what an average Monday looks like.
I tried to subset by using .indexwday, but the series for the day starts at 13:00.
I am a novice, so please let me know if I need to provide additional details.
Sample data (xts)
2008-01-01 00:00:00 16
2008-01-01 00:15:00 56
2008-01-01 00:30:00 136
2008-01-01 00:45:00 170
2008-01-01 01:00:00 132
....
2013-07-25 22:30:00 95
2013-07-25 22:45:00 82
2013-07-25 23:00:00 66
2013-07-25 23:15:00 65
2013-07-25 23:30:00 66
2013-07-25 23:45:00 46
The plot below may make clearer what I want (it shows the average of all Mondays).
Here's another solution, which does not depend on packages other than xts and zoo.
# example data
ix <- seq(as.POSIXct("2008-01-01"), as.POSIXct("2013-07-26"), by="15 min")
set.seed(21)
x <- xts(sample(200, length(ix), TRUE), ix)
# aggregate by 15-minute observations for each weekday
a <- lapply(split.default(x, format(index(x), "%A")), # split by weekday
function(x) aggregate(x, format(index(x), "%H:%M"), mean)) # aggregate by 15-min
# merge aggregated data into one zoo object, ordering columns
z <- do.call(merge, a)[,c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")]
# convert index to POSIXct to make plotting easier
index(z) <- as.POSIXct(index(z), format="%H:%M")
# plot
plot(z, type="l", nc=1, ylim=range(z), main="Average daily volume", las=1)
Setting ylim forces each plot to have the same y-axis range. Otherwise they would depend on each individual series, which may make them difficult to compare if the values vary greatly.
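The same weekday-by-time-of-day averaging can be sketched in base R without xts or zoo, which may help to see what the split and aggregate steps compute. This is an illustrative sketch under assumptions: it uses two weeks of random data, tz = "UTC", and the locale-independent "%u" code (1 = Monday, 7 = Sunday) instead of weekday names.

```r
# Two weeks of 15-minute timestamps with random values (example data only)
ix <- seq(as.POSIXct("2008-01-01 00:00:00", tz = "UTC"),
          as.POSIXct("2008-01-14 23:45:00", tz = "UTC"), by = "15 min")
set.seed(21)
vals <- sample(200, length(ix), TRUE)

wday <- format(ix, "%u")     # weekday as "1".."7", Monday is "1"
slot <- format(ix, "%H:%M")  # time of day, one of 96 slots

# Mean value for every (time slot, weekday) pair: a 96 x 7 matrix
profile <- tapply(vals, list(slot, wday), mean)

# Average Monday, one value per 15-minute slot
monday <- profile[, "1"]
```

A call such as plot(monday, type = "l") then shows the average Monday profile.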
Try this:
#Get necessary packages
install.packages("lubridate")
install.packages("magrittr")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("scales")
#Import packages
library(lubridate, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(magrittr)
library(ggplot2, warn.conflicts = FALSE)
library(scales, warn.conflicts = FALSE)
#Getting the data
tstart = as.POSIXct('2008-01-01 00:00:00')
tend = as.POSIXct('2013-07-25 23:45:00')
ttimes <- seq(from = tstart,to=tend,by='15 mins')
tvals <- sample(seq(1,200),length(ttimes),T)
tsdata <- data.frame(Dates=ttimes,Vals=tvals)
tsdata <- tsdata %>% mutate(DayofWeek = wday(Dates,label=T), Hours = as.POSIXct(strftime(Dates,format="%H:%M:%S"),format="%H:%M:%S"))
#Pick a day at a time. I am using Mondays for this example.
tsdata_monday <- tsdata %>% filter(DayofWeek=='Mon') %>% group_by(Hours) %>% summarise(meanVals=mean(Vals)) %>% as.data.frame()
#Plotting the graph of mean values versus times for Monday:
ggplot(tsdata_monday) + aes(x=Hours,y=meanVals) + geom_line() + scale_x_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M"))
#If you want you can go ahead and plot all the days. But please keep in mind
#that this does not look good at all. Too many plots for the plot window to
#Display nicely.
alltsdata <- tsdata %>% group_by(DayofWeek, Hours) %>% summarise(MeanVals=mean(Vals)) %>% as.data.frame()
ggplot(alltsdata) + aes(x=Hours,y=MeanVals) + geom_line() + scale_x_datetime(breaks=date_breaks("4 hour"), labels=date_format("%H:%M")) + facet_grid(.~DayofWeek)
I recommend you plot one day at a time or use a for loop or one of the apply function variations to get the plots.
Also when filtering by day of the week, please keep in mind that the days are shortened as follows:
unique(tsdata$DayofWeek)
[1] Tues Wed Thurs Fri Sat Sun Mon
Hope it helps.
apply.daily does exactly what you want (assuming your data is an xts object called d.xts).
apply.daily(d.xts,sum)
Another solution would be to use aggregate:
aggregate(d.xts,as.Date(index(d.xts)),sum)
Note that the answers are slightly different: apply.daily starts from start(d.xts) to end(d.xts) whereas aggregate goes by day from midnight to midnight.

Extract Time and date from POSIXct

I have a character vector of date-times (e.g. "2014-04-17 23:33:00") and want to make a matrix with the date and the time as separate columns.
This is my code:
dat <- as.POSIXct(dates)
date = data.frame(
date=dat,
time=format(dat, "%H:%M")
)
I took a look at the question "extract hours and seconds from POSIXct for plotting purposes in R" and it helped, but the problem is that I only get 00:00 as the time in my time column. It does not extract the time from the dates vector.
Any help is appreciated.
Using the following vector as an example:
dates<- c("2012-02-06 15:47:00","2012-02-06 15:02:00")
dat <- as.POSIXct(dates)
date.df = data.frame(
date=dat,
time=format(dat, "%H:%M")
)
You will obtain the correct times ("%H:%M")
> date.df
date time
1 2012-02-06 15:47:00 15:47
2 2012-02-06 15:02:00 15:02
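If the time column still shows 00:00, one possible cause (an assumption, since the asker's full data is not shown) is that the time part was dropped while parsing, for instance because some entries did not match the default format. Passing the format explicitly makes the parsing step visible. A minimal sketch with assumed names and tz = "UTC":

```r
dates <- c("2014-04-17 23:33:00", "2012-02-06 15:47:00")

# Explicit format: entries that do not match become NA instead of losing their time
dat <- as.POSIXct(dates, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Separate character columns for the date and the time of day
parts <- data.frame(date = format(dat, "%Y-%m-%d"),
                    time = format(dat, "%H:%M"))
```

Any NA in dat then points at an entry whose text does not match the expected format.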
