What's the best way to aggregate date time by hourly interval? - r

I'm having trouble parsing my date-time columns, which are currently of type 'chr'. I want to group the date-times by hourly interval, sum the corresponding values, and then merge the two data frames.
a <- c("2016-04-12 12:00:00", "2016-04-12 12:01:00")
b <- c(10, 20)
df_1 <- data.frame(a,b)
names(df_1) <- c('Date', 'Steps')
c1 <- c("4/12/2016 12:00:00 AM", "4/12/2016 05:00:00 PM")
d <- c(20,8)
df_2 <- data.frame(c1,d)
names(df_2) <- c('Date', 'Intensity')
I want to join df_1 (minute intervals) to df_2 (hourly intervals, with the day split into AM and PM).
I have tried converting the columns to a date-time type with as.POSIXct and ymd, but both return NA values. I also tried the code below from an earlier post; it worked, but it did not capture the PM times of the day:
df_1 <- aggregate(df_1["Steps"],
list(Date=cut(as.POSIXct(df_1$Date), "hour")),
sum)
Also, I want to remove the AM/PM from the second data frame.

While the aggregate for df_1 appears to work fine, for df_2 you need to define the time format using strptime, which converts character vectors to "POSIXlt" date-times.
aggregate(df_2["Intensity"],
list(Date=cut(strptime(df_2$Date, '%m/%d/%Y %I:%M:%S %p'), "hour")),
sum)
# Date Intensity
# 1 2016-04-12 05:00:00 8
# 2 2016-04-12 12:00:00 20
Explanation:
%m/%d/%Y  month, day, year, separated by slashes
(space)   the space between date and time
%I:%M:%S  hour (12-hour format), minute, second, separated by colons
(space)   another space
%p        the AM/PM indicator
Read ?strptime for different options, since this may also depend on your locale.
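Putting the pieces together for the original question, the following is a minimal sketch rather than the only way to do it. It parses each column with its own format, truncates to the hour by formatting as 24-hour text (which also removes the AM/PM marker), sums per hour, and merges. The Hour column name, the tz = "UTC" setting, and the all = TRUE outer join are assumptions made for illustration, and parsing %p assumes a locale that recognises AM/PM.

```r
# Toy data from the question
df_1 <- data.frame(Date = c("2016-04-12 12:00:00", "2016-04-12 12:01:00"),
                   Steps = c(10, 20))
df_2 <- data.frame(Date = c("4/12/2016 12:00:00 AM", "4/12/2016 05:00:00 PM"),
                   Intensity = c(20, 8))

# Parse each column with its own format; tz = "UTC" is an assumption
t1 <- as.POSIXct(df_1$Date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
t2 <- as.POSIXct(df_2$Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Truncate to the hour and store as 24-hour text, which drops AM/PM
df_1$Hour <- format(t1, "%Y-%m-%d %H:00:00")
df_2$Hour <- format(t2, "%Y-%m-%d %H:00:00")

# Sum the values within each hour
steps_hourly <- aggregate(Steps ~ Hour, data = df_1, FUN = sum)
intensity_hourly <- aggregate(Intensity ~ Hour, data = df_2, FUN = sum)

# all = TRUE keeps hours that appear in only one of the data frames
merged <- merge(steps_hourly, intensity_hourly, by = "Hour", all = TRUE)
print(merged)
```

Hours present in only one data frame get NA in the other column; use all = FALSE if only matching hours should be kept.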

Related

I'm getting NA on date column while trying to change date time format to date only in R programming language

I want to merge two data frames in R using the date as the primary key. While trying to change the date-time format in one of the data frames to date only, I'm getting NA in the date column. Below is the date-time format that I want to change to mm dd yy only.
4/12/2016 0:00
This is the code chunk I used:
sleep_day <- sleep_day %>%
rename(date = sleepday) %>%
mutate(date = as.Date(date,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone()))
I am expecting the date column to change from date and time to date alone, i.e. from mm dd yy 00:00 to mm dd yy. The result I got in the date column is NA.
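Your format is not correct: the string "4/12/2016 0:00" uses 24-hour time with no seconds and no AM/PM marker, so "%I:%M:%S %p" cannot match it. This format: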
test <- "4/12/2016 0:00"
as.Date(test,format ="%m/%d/%Y %H" , tz=Sys.timezone())
will work. Look at ?strptime.
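Since the goal is a date-only key for the merge, the conversion can be sketched as below. This is an illustration under assumptions: the variable names parsed and date_only are hypothetical, and tz = "UTC" is chosen only to keep the result independent of the local time zone.

```r
test <- "4/12/2016 0:00"

# Parse the full date-time with a format that matches the string
parsed <- as.POSIXct(test, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Keep only the calendar date
date_only <- as.Date(parsed, tz = "UTC")

# Optionally render it back as month/day/year text
date_text <- format(date_only, "%m/%d/%Y")
print(date_only)
print(date_text)
```

Keeping the column as a Date (rather than text) on both data frames is usually the safer join key, because the two sides then compare as dates and not as strings.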
As a piece of advice, prefer the lubridate library, which has easy-to-use functions that parse many different formats:
library(lubridate)
mdy_hm(test)
"2016-04-12 UTC"

Convert date time in R to date time for time series

I have this data frame where DT is of type character. I would like to convert it to a date-time format in R so that I can plot a time series.
DT Name
12-12-21 1:30 James
01-01-22 12:30 Job
03-02-22 1:00 Seth
03-02-22 1:14 Michael
I explored the following code
time <- as.POSIXct(dataframe$DT, format="%m-%d-%Y %H:%M")
but it returned the year as 01-01-0021 instead of 01-01-2021. How can I specify the year so that it is read as 2021?
Instead of %Y (four-digit year), use %y (two-digit year):
time <- as.POSIXct(dataframe$DT, format="%m-%d-%y %H:%M")
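Another possible solution (the separators in the format must match the dashes in the data):
time <- as.POSIXct(dataframe$DT, format = "%m-%d-%y %H:%M")
or, if only the time of day is needed, use the hms package. Its as_hms() function replaces the deprecated as.hms() and cannot read these strings directly, so parse the column first:
# install.packages("hms")
library(hms)
time <- as_hms(as.POSIXct(dataframe$DT, format = "%m-%d-%y %H:%M"))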
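To check that two-digit years are read as intended, a small sketch using the sample values from the question follows. The vector name DT and tz = "UTC" are assumptions for illustration; the month-day order follows the format used in the question.

```r
# Sample values from the question
DT <- c("12-12-21 1:30", "01-01-22 12:30", "03-02-22 1:00", "03-02-22 1:14")

# %y reads a two-digit year (values 00 to 68 map to 20xx)
time <- as.POSIXct(DT, format = "%m-%d-%y %H:%M", tz = "UTC")

# Inspect the parsed years and the full timestamps
years <- format(time, "%Y")
print(years)
print(format(time, "%Y-%m-%d %H:%M"))
```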

How to add days of the week to the existing date's column in Rstudio

For my analysis, I need to add days of the week to the dates in RStudio. My data starts on the first day of January ("2019-01-01 00:00:00") and the time step is 5 minutes, so the second entry is "2019-01-01 00:05:00", and so on to the last day of the year. Unfortunately, some rows are missing; for example, in one case the next reading after 2019-05-01 10:05:00 is 2019-05-17 23:05:00. How can I assign days of the week to my dates?
How about this? First, convert the string to R's date-time class with strptime, then write it back out with strftime in the format we want.
posix <- strptime(x = "2019-01-01 00:00:00",
format = "%Y-%m-%d %H:%M:%S")
strftime(posix, format = "%Y-%m-%d %H:%M:%S %A")
Here, we're essentially exporting it in the exact same format, plus the %A conversion specification, which yields the full weekday name.
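The same idea extends to a whole column, and gaps in the data do not matter because each weekday is computed from its own timestamp. The sketch below uses a few timestamps from the question; the data frame name df, its column names, and tz = "UTC" are assumptions. Note that weekdays() returns names in the current locale.

```r
# A few timestamps from the question, including the gap in May
stamps <- c("2019-01-01 00:00:00", "2019-01-01 00:05:00",
            "2019-05-01 10:05:00", "2019-05-17 23:05:00")

df <- data.frame(
  time = as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
)

# Weekday name for every row (locale-dependent text)
df$weekday <- weekdays(df$time)

# Locale-independent alternative: 0 = Sunday, ..., 6 = Saturday
df$wday_num <- as.POSIXlt(df$time)$wday
print(df)
```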

How to conduct timeseries analysis on half-hourly data?

I have the dataset below with half hourly timeseries data.
Date <- c("2018-01-01 08:00:00", "2018-01-01 08:30:00",
"2018-01-01 08:59:59","2018-01-01 09:29:59")
Volume <- c(195, 188, 345, 123)
Dataset <- data.frame(Date, Volume)
I would like to know how to read this dataframe in order to conduct time series analysis. How should I define starting and ending date and what the frequency will be?
I'm not sure exactly what you mean by "half-hourly data", since the timestamps are not all on half-hour boundaries. In case you want to round them to half hours, we can adapt this solution to your case.
# Parse the character column first; as.double() on the raw strings would give NA
Dataset$Date <- as.POSIXlt(round(as.double(as.POSIXct(Dataset$Date))/(30*60))*(30*60),
origin=(as.POSIXlt('1970-01-01')))
In case you don't want to round it, just do
Dataset$Date <- as.POSIXct(Dataset$Date)
Basically, your Date column should be stored as a date-time class, e.g. "POSIXlt", so that:
> class(Dataset$Date)
[1] "POSIXlt" "POSIXt"
Then we can convert the data into time series with xts.
library(xts)
Dataset.xts <- xts(Dataset$Volume, order.by=Dataset$Date)
Result (rounded case):
> Dataset.xts
[,1]
2018-01-01 08:00:00 195
2018-01-01 08:30:00 188
2018-01-01 09:00:00 345
2018-01-01 09:30:00 123
You can use dplyr and lubridate from the tidyverse to get the data into a POSIX date-time format, then convert to a time series with ts. Within that you can define the parameters.
library(dplyr)
library(lubridate)
Dataset2 <- Dataset %>%
mutate(Date = as.character(Date),
Date = ymd_hms(Date)) %>%
ts(start = c(2018, 1), end = c(2018, 2), frequency = 1)
Try ?ts for more details on the parameters. Personally, I think zoo and xts provide a better framework for time series analysis.
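The rounding step can also be sketched in base R alone, which may help when checking the result before handing it to xts or ts. This is an illustration under assumptions: tz = "UTC" is fixed so the output does not depend on the local time zone, and frequency = 48 (48 half-hours per day) is one plausible choice for a daily cycle, not something stated in the question.

```r
Date <- c("2018-01-01 08:00:00", "2018-01-01 08:30:00",
          "2018-01-01 08:59:59", "2018-01-01 09:29:59")
Volume <- c(195, 188, 345, 123)

# Seconds since the epoch, parsed in a fixed time zone
secs <- as.numeric(as.POSIXct(Date, tz = "UTC"))

# Round to the nearest multiple of 30 minutes
half_hour <- 30 * 60
rounded <- as.POSIXct(round(secs / half_hour) * half_hour,
                      origin = "1970-01-01", tz = "UTC")

Dataset <- data.frame(Date = rounded, Volume = Volume)
print(Dataset)

# Regular series: 48 observations per day (assumed daily cycle)
vol_ts <- ts(Dataset$Volume, frequency = 48)
```

With regularly spaced timestamps like these, ts only needs the values and the frequency; xts or zoo keep the actual timestamps as the index, which is more convenient when rows are missing.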

Extract Time and date from POSIXct

I have a vector with DateTime character ("2014-04-17 23:33:00") and want to make a matrix with date and time as my columns.
This is my code:
dat <- as.POSIXct(dates)
date = data.frame(
date=dat,
time=format(dat, "%H:%M")
)
I took a look at extract hours and seconds from POSIXct for plotting purposes in R and it helped, but the problem is that I only get 00:00 as the time in my time column. It does not extract the time from the dates vector.
Any help is appreciated.
Using the following vector as an example:
dates<- c("2012-02-06 15:47:00","2012-02-06 15:02:00")
dat <- as.POSIXct(dates)
date.df = data.frame(
date=dat,
time=format(dat, "%H:%M")
)
You will obtain the correct times ("%H:%M"):
> date.df
date time
1 2012-02-06 15:47:00 15:47
2 2012-02-06 15:02:00 15:02
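The question does not show how the dates vector was built, so the cause of the 00:00 values is not confirmed. One way the symptom can arise is when the time of day is dropped during parsing, for example by a date-only format. The sketch below illustrates that case; the names full and day_only, and tz = "UTC", are assumptions.

```r
dates <- c("2014-04-17 23:33:00")

# Full format: the time of day is kept
full <- as.POSIXct(dates, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Date-only format: trailing characters are ignored, so the time becomes midnight
day_only <- as.POSIXct(dates, format = "%Y-%m-%d", tz = "UTC")

time_full <- format(full, "%H:%M")
time_day_only <- format(day_only, "%H:%M")
print(time_full)
print(time_day_only)

# Separate date and time columns, as asked in the question
date.df <- data.frame(date = as.Date(full, tz = "UTC"),
                      time = format(full, "%H:%M"))
```

Running str(dates) on the real input is a quick way to check whether it is a character vector with times, or something that has already been reduced to dates.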
