Creating a dummy variable for certain hours of the day - r

I need some help. I'm currently trying to fit a linear model to hourly electricity prices, so I was thinking of creating a dummy that takes the value 1 if the hour of the day is between 06:00 and 20:00. Unfortunately, I have struggled so far.
time.cet <- as.POSIXct(time.numeric, origin = "1970-01-01", tz=local.time.zone)
hours.S <- strftime(time.cet, format = "%H:%M:%S", tz=local.time.zone)
head(time.cet)
[1] "2007-01-01 00:00:00 CET" "2007-01-01 01:00:00 CET" "2007-01-01 02:00:00 CET"
[4] "2007-01-01 03:00:00 CET" "2007-01-01 04:00:00 CET" "2007-01-01 05:00:00 CET"
I hope someone can help.

When I do time cutoffs I like to make the cutoffs as objects. This way, if you need to change the cutoffs, it's much easier to change the object's value instead of the value in the conditional statements.
My code below uses lubridate, which is a great package for managing dates and times.
It should give you the info you need to incorporate a dummy variable into your analysis.
###
### Load Package
###
library(lubridate)
###
### Designate Time Cut-Offs
###
Beginning <- hms("06:00:00")
End <- hms("20:00:00")
###
### Designate Test Cut-Offs
###
Test.1 <- hms("5:00:00")
Test.2 <- hms("11:00:00")
###
### Test Conditional Logic
###
### Value will be 1 if time is between, value will be 0 if it is not.
###
ifelse( ((Test.1 >= Beginning) & (Test.1 <= End)) , 1, 0)
########## This should (and does) return a 0
ifelse( ((Test.2 >= Beginning) & (Test.2 <= End)) , 1, 0)
####### This should (and does) return a 1
###
### Create New Variable On Previous Data Frame (Your.DF) named Time.Dummy
###
### Value for new variable will be 1 if time is between, value will be 0 if it is not.
###
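### time.cet holds full date-times, so compare the time of day (hours.S from the question) instead
Your.DF$Time.Dummy <- ifelse( ((hms(hours.S) >= Beginning) & (hms(hours.S) <= End)) , 1, 0)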

ifelse() statements are a convenient way to create a dummy variable. I don't know much about working with time personally, but creating a dummy variable would take a form similar to:
dummy <- with(data, ifelse(time > 6 & time < 20, 1, 0))
Where data is whatever your data is called, and time is the column holding the hour of the day as a plain number (6 and 20 rather than 06:00 and 20:00, which are not valid R). Use >= or <= if the end points should count. You may need to play around with the conditions a little bit if the times don't behave like normal numeric vectors (which I assume for this purpose they will).

library(lubridate)
# Create fake data
set.seed(2)
dat = data.frame(time = seq(ymd_hms("2016-01-01 00:00:00"), ymd_hms("2016-01-31 00:00:00"), by="hour"))
dat$price = 1 + cumsum(rnorm(nrow(dat), 0, 0.01))
# Create time dummy
dat$dummy = ifelse(hour(dat$time) >=6 & hour(dat$time) <= 20, 1, 0)

Try to include reproducible code next time. Looks like you're missing time.numeric for instance.
Okay, I had to make up some random times.
time.cet <- c( ymd_hms( "2007-01-01 00:00:00" ),
ymd_hms( "2007-01-01 06:00:00" ),
ymd_hms( "2007-01-01 12:00:00" ) )
time.cet
[1] "2006-12-31 18:00:00 CST" "2007-01-01 00:00:00 CST" "2007-01-01 06:00:00 CST"
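Note the time zone shift: ymd_hms() parses the strings as UTC, but the combined vector prints in my local zone (CST). This doesn't change the approach, although the hours tested below are the local ones (18, 0 and 6).
You can use dplyr::between and lubridate::hour to get a logical vector of TRUE/FALSE (or 1/0) for whether time X is between A and B.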
library(dplyr)
library(lubridate)
A <- 6
B <- 20
between( hour(time.cet), A, B )
[1] TRUE FALSE TRUE
Note that between() is inclusive at both ends (>= and <=), so with B <- 20 every time from 20:00 to 20:59 also counts.
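For completeness, here is a minimal base R sketch of the same idea without any packages. It assumes hourly timestamps in the CET zone like those in the question; the names hour.of.day and peak.dummy are made up for illustration, and it treats the window as 06:00 up to but not including 20:00, which is a choice rather than something the question settles.

```r
# Assumption: two days of hourly timestamps standing in for the real data
time.cet <- seq(as.POSIXct("2007-01-01 00:00:00", tz = "CET"),
                by = "hour", length.out = 48)

# Extract the hour of the day (0-23) in the vector's own time zone
hour.of.day <- as.POSIXlt(time.cet)$hour

# 1 for 06:00 to 19:59, 0 otherwise; change "< 20" to "<= 20" to include hour 20
peak.dummy <- as.integer(hour.of.day >= 6 & hour.of.day < 20)

head(data.frame(time.cet, hour.of.day, peak.dummy), 8)
```

The dummy can then go straight into a model formula, for example lm(price ~ peak.dummy), once it sits in the same data frame as the prices.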

Related

How to convert length of time to numeric in R?

I have a data frame with the amount of time it takes to do a lap and I'm trying to separate that into individual data frames for each driver.
These time values look like this, being in minutes:seconds.milliseconds, except for the first lap, which has a colon between seconds and milliseconds.
13:14:50 1:28.322 1:24.561 1:23.973 1:23.733 1:24.752
I'd like to have these in a separate data frame in a seconds format like this.
794.500 88.322 84.561 83.973 83.733 84.752
When I convert this to a numeric it gives the following values.
214 201 174 150 133 183
And when I use strptime or POSIXlt it gives me huge values which are also wrong, even when I use the format codes. However, when I subtracted two of the values the time difference was correct, and from that I found that they were all off by 1609164020. These values also drop the decimal part, which I need.
You can use POSIXlt in conjunction with a conversion to seconds.
First, add a date to your first time element:
ds <- c("13:14:50", "1:28.322", "1:24.561", "1:23.973", "1:23.733", "1:24.752")
ds[1] <- paste( Sys.Date(), ds[1] )
#[1] "2020-12-29 13:14:50" "1:28.322" "1:24.561"
#[4] "1:23.973" "1:23.733" "1:24.752"
Create a function to convert the subsequent minutes:seconds.milliseconds to seconds.milliseconds:
to_sec <- function(x){ as.numeric(sub( ":.*","", x )) * 60 +
as.numeric( sub( ".*:","", x ) ) }
Convert the vector to dates that enable calculation of time differences:
ds[2:6] <- to_sec(ds[2:6])
ds[2:6] <- cumsum(ds[2:6])
dv <- c( as.POSIXlt(ds[1]), as.POSIXlt(ds[1]) + as.numeric(ds[2:6]) )
# [1] "2020-12-29 13:14:50 CET" "2020-12-29 13:16:18 CET"
# [3] "2020-12-29 13:17:42 CET" "2020-12-29 13:19:06 CET"
# [5] "2020-12-29 13:20:30 CET" "2020-12-29 13:21:55 CET"
dv[6] - dv[1]
# Time difference of 7.089017 mins
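If the goal is just the plain seconds shown in the question, a direct conversion may be enough. The sketch below assumes the first lap "13:14:50" means 13 minutes 14.50 seconds (that is what the desired value 794.500 implies); lap_to_seconds is a hypothetical helper name, not something from a package.

```r
laps <- c("13:14:50", "1:28.322", "1:24.561", "1:23.973", "1:23.733", "1:24.752")

lap_to_seconds <- function(x) {
  # Assumption: a second colon is a typo for the decimal point (13:14:50 -> 13:14.50)
  x <- sub("^(\\d+:\\d+):(\\d+)$", "\\1.\\2", x)
  # Split into minutes and seconds, then combine into seconds
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

lap_seconds <- lap_to_seconds(laps)
lap_seconds
# values: 794.5, 88.322, 84.561, 83.973, 83.733, 84.752
```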

How can I alter the date of a datetime vector depending on the time with lubridate?

I am trying to manipulate a date inside a datetime vector depending on time of day.
Each item in the vector newmagic looks something like this "2020-03-05 02:03:54 UTC"
For all the items that have a time between 19:00 and 23:59 I want to go back one day.
I tried writing an if statement:
if(hour(newmagic)>=19&hour(newmagic)<=23){
date(newmagic)<-date(newmagic)-1
}
giving me no output but
Warning message: In if (hour(newmagic) >= 19 & hour(newmagic) <= 23) {
: the condition has length > 1 and only the first element will be
used
When I limit the data to the condition and simply execute date()-1:
newmagic[hour(newmagic)>=19&hour(newmagic)<=23&!is.na(newmagic)] <- date(newmagic[hour(newmagic)>=19&hour(newmagic)<=23&!is.na(newmagic)])-1
The output does remove 1 day but also sets the time to 0
Original:
"2020-03-07 20:58:00 UTC"
After date()-1
"2020-03-06 00:00:00 UTC"
I don't really know how to go on.
How can I adapt the if statement so that it will actually do what I intend to?
How can I rewrite the limitation in the second approach so that the time itself will stay intact?
Thank you for the help
You can try this out on your original data set. I have used the lubridate and tidyverse packages. First I split the date-time column into a date and a time, then I converted those variables to date and time formats and applied an ifelse() condition.
The code and the output are as follows:
library(tidyverse)
library(lubridate)
ab <- data.frame(ymd_hms(c("2000-11-01 2:23:15", "2028-03-25 20:47:51",
"1990-05-14 22:45:30")))
colnames(ab) <- paste(c("Date_time"))
ab <- ab %>% separate(Date_time, into = c("Date", "Time"),
sep = " ", remove = FALSE)
ab$Date <- as.Date(ab$Date)
ab$Time <- hms(ab$Time)
ab$date_condition <- ifelse(hour(ab$Time) %in% 19:23,
ab$Date - 1,
ab$Date)
# ifelse() drops the Date class, so turn the day counts back into dates
ab$date_condition <- as.Date(ab$date_condition, origin = "1970-01-01")
ab
# Date_time Date Time date_condition
1 2000-11-01 02:23:15 2000-11-01 2H 23M 15S 2000-11-01
2 2028-03-25 20:47:51 2028-03-25 20H 47M 51S 2028-03-24
3 1990-05-14 22:45:30 1990-05-14 22H 45M 30S 1990-05-13
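A shorter route that keeps the time of day intact, and addresses both questions, is to index the vector directly instead of using if (which only looks at a single value). This is a sketch under the assumption that newmagic is a POSIXct vector in UTC; the sample values are made up, and which() is used so that any NA entries are simply skipped.

```r
library(lubridate)

# Assumption: sample values standing in for the real newmagic vector
newmagic <- ymd_hms(c("2020-03-05 02:03:54", "2020-03-07 20:58:00"), tz = "UTC")

# Positions whose time of day is 19:00 or later
late <- which(hour(newmagic) >= 19)

# Subtract one calendar day from those elements only; the clock time is unchanged
newmagic[late] <- newmagic[late] - days(1)

newmagic
```

Subtracting days(1) from the date-time itself, rather than assigning date() - 1, is what stops the time from being reset to 00:00:00.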

Cannot Correctly Subset R DF

I have pulled some data via SQL into a data frame. I am now trying to subset that data and have had no luck.
I wish to loop through each row and identify the previous hour, after which I wish to select a subset of the DF where date == previous hour. (I understand there are other ways of doing this; however, I wish to understand why this isn't working.) When I do this it returns an empty df. However, if I directly paste the value of the previous hour as a string, I get the result I desire.
Both variables are POSIXct and any attempt to convert to character fails. Can someone please tell me what on earth is going on? :S
My code:
for(row in 1:3){
PreviousHour <- as.POSIXct(Data$mydate[row] - hours(1), tz = "UTC")
Date <- Data$mydate[row]
print(c(Data$mydate[row],PreviousHour))
#"2019-11-20 23:00:00 GMT" "2019-11-20 22:00:00 GMT"
print(Data$mydate[row] == PreviousHour)
#FALSE
print(subset(Data,Data$mydate == PreviousHour))
# A tibble 0x5
print(subset(Data,Data$mydate == "2019-11-20 22:00:00 GMT"))
# A tibble 1x5
}
Code if I manually create the df (This works):
mydate <- c(as.POSIXct("2019-11-20 22:00:00", tz = "UTC"),as.POSIXct("2019-11-20 21:00:00", tz = "UTC"))
Data <- data.frame(mydate)
for(row in 1:1){
PreviousHour <- as.POSIXct(Data$mydate[row] - hours(1), tz = "UTC")
Date <- Data$mydate[row]
print(c(Data$mydate[row],PreviousHour))
#"2019-11-20 22:00:00 GMT" "2019-11-20 21:00:00 GMT"
print(Data$mydate[row] == PreviousHour)
#FALSE
print(subset(Data,Data$mydate == PreviousHour))
# A tibble 1x1
}

Round time by X hours in R?

While doing predictive modeling on timestamped data, I want to write a function in R (possibly using data.table) that rounds the date by X number of hours. E.g. rounding by 2 hours should give this:
"2014-12-28 22:59:00 EDT" becomes "2014-12-28 22:00:00 EDT"
"2014-12-28 23:01:00 EDT" becomes "2014-12-29 00:00:00 EDT"
It's very easy to do when you round by 1 hour - using round.POSIXt(.date, "hour") function.
Writing a generic function, like I'm doing below using multiple if statements, becomes quite ugly however:
d7.dateRoundByHour <- function (.date, byHours) {
if (byHours == 1)
return (round.POSIXt(.date, "hour"))
hh = hour(.date); dd = mday(.date); mm = month(.date); yy = year(.date)
hh = round(hh/byHours,digits=0) * byHours
if (hh>=24) {
hh=0; dd=dd+1
}
if ((mm==2 & dd==28) |
(mm %in% c(1,3,5,7,8,10,12) & dd==31) |
(mm %in% c(2,4,6,9,11) & dd==30)) { # NB: it won't work on 29 Feb leap year.
dd=1; mm=mm+1
}
if (mm==13) {
mm=1; yy=yy+1
}
str = sprintf("%i-%02.0f-%02.0f %02.0f:%02.0f:%02.0f EDT", yy,mm,dd, hh,0,0)
as.POSIXct(str, format="%Y-%m-%d %H:%M:%S")
}
Can anyone show a better way to do that?
(perhaps by converting to numeric and back to POSIXt or some other POSIXt functions?)
Use the round_date function from the lubridate package. Assuming you had a data.table with a column named date you could do the following:
dt[, date := round_date(date, '2 hours')]
A quick example will give you exactly the results you were looking for:
x <- as.POSIXct("2014-12-28 22:59:00 EDT")
round_date(x, '2 hours')
This is actually really easy with just base R. The basic idea for rounding by "odd lots" is that you
scale down by an appropriate scale factor
round to the nearest integer in the downscaled unit
scale back up and re-convert
Or in two R code statements:
R> pt <- as.POSIXct(c("2014-12-28 22:59:00", "2014-12-28 23:01:00 EDT"))
R> pt # just to check
[1] "2014-12-28 22:59:00 CST" "2014-12-28 23:01:00 CST"
R>
R> scalefactor <- 60*60*2 # 2 hours of 60 minutes times 60 seconds
R>
R> as.POSIXct(round(as.numeric(pt)/scalefactor) * scalefactor, origin="1970-01-01")
[1] "2014-12-28 22:00:00 CST" "2014-12-29 00:00:00 CST"
R>
The key last line just does what I outlined: it converts the POSIXct to a numeric representation, scales it down, then rounds before scaling back up and converting to a POSIXct again.
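The same steps can be wrapped in a small function to get the generic "round by X hours" helper the question asks for. This is only a sketch: round_by_hours is a made-up name, and because it rounds the seconds since 1970-01-01 UTC, the results line up with local clock hours only when the zone's UTC offset is a multiple of the chosen step.

```r
round_by_hours <- function(x, by_hours) {
  # Keep the input's time zone; fall back to the session zone if none is set
  tz <- attr(x, "tzone")
  if (is.null(tz)) tz <- ""
  scalefactor <- by_hours * 60 * 60   # step size in seconds
  rounded <- round(as.numeric(x) / scalefactor) * scalefactor
  as.POSIXct(rounded, origin = "1970-01-01", tz = tz[1])
}

pt <- as.POSIXct(c("2014-12-28 22:59:00", "2014-12-28 23:01:00"), tz = "UTC")
out <- round_by_hours(pt, 2)
out
```

With a 2-hour step, 22:59 goes back to 22:00 and 23:01 moves forward to 00:00 on the next day, matching the two cases in the question.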

Length of lubridate interval

What's the best way to get the length of time represented by an interval in lubridate, in specified units? All I can figure out is something like the following messy thing:
> ival
[1] 2011-01-01 03:00:46 -- 2011-10-21 18:33:44
> difftime(attr(ival, "start") + as.numeric(ival), attr(ival, "start"), 'days')
Time difference of 293.6479 days
(I also added this as a feature request at https://github.com/hadley/lubridate/issues/105, under the assumption that there's no better way available - but maybe someone here knows of one.)
Update - apparently the difftime function doesn't handle this either. Here's an example.
> (d1 <- as.POSIXct("2011-03-12 12:00:00", 'America/Chicago'))
[1] "2011-03-12 12:00:00 CST"
> (d2 <- d1 + days(1)) # Gives desired result
[1] "2011-03-13 12:00:00 CDT"
> (i2 <- d2 - d1)
[1] 2011-03-12 12:00:00 -- 2011-03-13 12:00:00
> difftime(attr(i2, "start") + as.numeric(i2), attr(i2, "start"), 'days')
Time difference of 23 hours
As I mention below, I think one nice way to handle this would be to implement a /.interval function that doesn't first cast its input to a period.
The as.duration function is what lubridate provides. The interval class is represented internally as the number of seconds from the start, so if you wanted the number of hours you could simply divide as.numeric(ival) by 3600, or by (3600*24) for days.
If you want worked examples of functions applied to your object, you should provide the output of dput(ival). I did my testing on the objects created on the help(duration) page which is where ?interval sent me.
date <- as.POSIXct("2009-03-08 01:59:59") # DST boundary
date2 <- as.POSIXct("2000-02-29 12:00:00")
span <- date2 - date #creates interval
span
#[1] 2000-02-29 12:00:00 -- 2009-03-08 01:59:59
str(span)
#Classes 'interval', 'numeric' atomic [1:1] 2.85e+08
# ..- attr(*, "start")= POSIXct[1:1], format: "2000-02-29 12:00:00"
as.duration(span)
#[1] 284651999s (9.02y)
as.numeric(span)/(3600*24)
#[1] 3294.583
# A check against the messy method:
difftime(attr(span, "start") + as.numeric(span), attr(span, "start"), 'days')
# Time difference of 3294.583 days
This question is really old, but I'm adding an update because this question has been viewed many times and when I needed to do something like this today, I found this page. In lubridate you can now do the following:
d1 <- ymd_hms("2011-03-12 12:00:00", tz = 'America/Chicago')
d2 <- ymd_hms("2011-03-13 12:00:00", tz = 'America/Chicago')
(d1 %--% d2)/dminutes(1)
(d1 %--% d2)/dhours(1)
(d1 %--% d2)/ddays(1)
(d1 %--% d2)/dweeks(1)
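To make the difference between the unit types concrete, here is a small sketch using the same two timestamps, which straddle the 2011 switch to daylight saving time in Chicago. The variable names are only for illustration; time_length() is another lubridate function that returns the length in a given unit.

```r
library(lubridate)

d1 <- ymd_hms("2011-03-12 12:00:00", tz = "America/Chicago")
d2 <- ymd_hms("2011-03-13 12:00:00", tz = "America/Chicago")
ival <- d1 %--% d2

# Duration-based units count elapsed seconds, so the lost DST hour shows up
hours_elapsed <- ival / dhours(1)          # 23 elapsed hours
days_elapsed  <- ival / ddays(1)           # 23/24 of an 86400-second day
hours_again   <- time_length(ival, "hour") # same figure via time_length()

# Period-based units follow the calendar: noon to noon is one day
days_calendar <- ival / days(1)

c(hours_elapsed, days_elapsed, hours_again, days_calendar)
```

Which of the two day counts is "right" depends on whether you care about elapsed time or calendar dates.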
Ken, dividing by days(1) will give you what you want. Lubridate doesn't coerce periods to durations when you divide intervals by periods. (Although the algorithm for finding the exact number of whole periods in the interval does begin with an estimate that uses the interval divided by the analogous number of durations, which might be what you are noticing.)
The end result is the number of whole periods that fit in the interval. The warning message alerts the user that it is an estimate because there will be some fraction of a period that is dropped from the answer. It's not sensible to do math with a fraction of a period since we can't modify a clock time with it unless we convert it to multiples of a shorter period, but there won't be a consistent way to make the conversion. For example, the day you mention would be equal to 23 hours, but other days would be equal to 24 hours. You are thinking the right way: periods are an attempt to respect the variations caused by DST, leap years, etc., but they only do this as whole units.
I can't reproduce the error in subtraction that you mention above. It seems to work for me.
three <- force_tz(ymd_hms("2011-03-12 12:00:00"), "")
# note: here in TX, "" *is* CST
(four <- three + days(1))
> [1] "2011-03-13 12:00:00 CDT"
four - days(1)
> [1] "2011-03-12 12:00:00 CST"
Be careful when dividing time in seconds to obtain days, because then you are no longer working with abstract representations of time but with bare numbers, which can lead to the following:
> date_f <- now()
> date_i <- now() - days(23)
> as.duration(date_f - date_i)/ddays(1)
[1] 22.95833
> interval(date_i,date_f)/ddays(1)
[1] 22.95833
> int_length(interval(date_i,date_f))/as.numeric(ddays(1))
[1] 22.95833
This suggests that days or months are events in a calendar, not amounts of time that can be measured in seconds, milliseconds, etc.
The best way to calculate differences in days is to avoid the transformation into seconds and work with days as the unit:
> e <- now()
> s <- now() - days(23)
> as.numeric(as.Date(s))
[1] 18709
> as.numeric(as.Date(e) - as.Date(s))
[1] 23
However, if you are considering a day as a pure 86400 seconds time span, as ddays() does, the previous approach can lead to the following:
> e <- ymd_hms("2021-03-13 00:00:10", tz = 'UTC')
> s <- ymd_hms("2021-03-12 23:59:50", tz = 'UTC')
> as.duration(e - s)
[1] "20s"
> as.duration(e - s)/ddays(1)
[1] 0.0002314815
> as.numeric(as.Date(e) - as.Date(s))
[1] 1
Hence, it depends on what you are looking for: time difference or calendar difference.

Resources