Discretize a date-time variable to "in-hours" and "after-hours" - r

I have date-times like:
x = c("2015-09-12 03:52:00", "2017-06-15 21:37:28", "2017-04-08 20:44:11")
I want to create two categories: if the time is between 6:30 pm and 8:00 am, return "after-hours"; otherwise return "in-hours".
I first tried to solve this by extracting the time part, but that converted it to character, which meant ifelse was not working.
Thank you in advance.

base R
Cheating a little, converting to %H%M as an integer on a 24h clock.
vec <- as.POSIXct(c("2015-09-12 03:52:00", "2017-06-15 21:37:28", "2017-04-08 20:44:11"))
hhmm <- as.integer(format(vec, format = "%H%M"))
ifelse(hhmm < 0800 | hhmm > 1830, "after-hours", "in-hours")
# [1] "after-hours" "after-hours" "after-hours"
lubridate
Similar, but using decimal hours instead of fake-hour/minute.
library(lubridate)
hhmm2 <- hour(vec) + minute(vec)/60
ifelse(hhmm2 < 8 | hhmm2 > 18.5, "after-hours", "in-hours")
# [1] "after-hours" "after-hours" "after-hours"

times_as_char <- c("2015-09-12 03:52:00", "2017-06-15 21:37:28", "2017-04-08 20:44:11")
# Convert character to date-time
times_as_datetimes <- lubridate::ymd_hms(times_as_char)
# Decimal hours make the time comparison easier
times_as_hour_dec <- lubridate::hour(times_as_datetimes) +
  lubridate::minute(times_as_datetimes) / 60
time_status <- ifelse(times_as_hour_dec < 8 | times_as_hour_dec >= 18.5,
                      "after-hours",
                      "in-hours")
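The answers above ignore the seconds and differ on the 18:30 boundary (> in one, >= in the other). A minimal base R sketch that works in minutes since midnight and makes both boundaries explicit; classify_hours and its arguments are hypothetical names, and treating 08:00:00 as in-hours and 18:30:00 as after-hours is an assumption about what the asker wants:

```r
# Hypothetical helper, not taken from the answers above.
# Assumes in-hours means 08:00:00 <= time < 18:30:00.
classify_hours <- function(x, start_min = 8 * 60, end_min = 18 * 60 + 30, tz = "UTC") {
  lt <- as.POSIXlt(x, tz = tz)
  mins <- lt$hour * 60 + lt$min + lt$sec / 60   # minutes since midnight
  ifelse(mins >= start_min & mins < end_min, "in-hours", "after-hours")
}

x <- c("2015-09-12 03:52:00", "2017-06-15 21:37:28", "2017-04-08 20:44:11",
       "2017-04-08 08:00:00", "2017-04-08 18:30:00")
classify_hours(x)
# [1] "after-hours" "after-hours" "after-hours" "in-hours"    "after-hours"
```

Keeping the cut-offs as arguments makes it easy to move a boundary without touching the comparison.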

Related

How can I create a Month-Day vector in R?

I want to create a vector in R, which contains month abbreviation and date:
Jan1, Jan2, Jan3, ..., Dec29, Dec30, Dec31.
How can I create such a vector?
I have tried different approaches. Using paste0(month.abb,1:31) gives me a vector containing Jan1,Feb2,Mar3.
I also created 12 different vectors for each month (df_jan <- paste0("Jan",1:31) and so on). Then I attempted to rbind the 12 vectors, but that also doesn't help.
Can you suggest any way to do this?
The solution depends on the year, since a leap year includes Feb29.
Here's a function which takes the year as an argument and returns the dates in the required format.
monthday <- function(year) {
  format(seq(as.Date(paste0(year, '-01-01')),
             as.Date(paste0(year, '-12-31')), by = 'day'), '%b%d')
}
monthday(2021)
# [1] "Jan01" "Jan02" "Jan03" "Jan04" "Jan05" "Jan06" "Jan07" "Jan08" "Jan09"
#...
#[55] "Feb24" "Feb25" "Feb26" "Feb27" "Feb28" "Mar01" "Mar02" "Mar03" "Mar04"
#....
#[361] "Dec27" "Dec28" "Dec29" "Dec30" "Dec31"
monthday(2020)
# [1] "Jan01" "Jan02" "Jan03" "Jan04" "Jan05" "Jan06" "Jan07" "Jan08" "Jan09"
#....
#[55] "Feb24" "Feb25" "Feb26" "Feb27" "Feb28" "Feb29" "Mar01" "Mar02" "Mar03"
#....
#[361] "Dec26" "Dec27" "Dec28" "Dec29" "Dec30" "Dec31"
You can create a sequence of dates and convert it using format. For details on the conversion specifications, see ?strptime.
whole_year_dates <- seq.Date(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
whole_year_dates_abbr <- format(whole_year_dates, format= "%b%d")
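Both answers return zero-padded days ("Jan01"), while the question asks for "Jan1". A possible variant, sketched under the assumption that English month abbreviations are wanted regardless of locale; monthday_nopad is a hypothetical name:

```r
# Hypothetical variant of monthday() without the zero padding.
# month.abb is always English, so the result does not depend on the locale.
monthday_nopad <- function(year) {
  d <- seq(as.Date(paste0(year, "-01-01")),
           as.Date(paste0(year, "-12-31")), by = "day")
  lt <- as.POSIXlt(d)                 # gives $mon (0-11) and $mday (1-31)
  paste0(month.abb[lt$mon + 1], lt$mday)
}

head(monthday_nopad(2021), 3)
# [1] "Jan1" "Jan2" "Jan3"
length(monthday_nopad(2020))
# [1] 366
```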

How to create a sequence of *%Year%Week* from numeric?

My inputs are numeric and represent a year and a week number. I need to create a sequence from one input to the other.
Inputs example :
input.from <- 202144
input.to <- 202208
Desired output would be :
c(202144:202152, 202201:202208)
In my view this is a little more complex because of these constraints:
Years with 53 weeks: I tried lubridate::isoweek(), the %W or %v format, ...
Always keep two digits for the week: I tried "%02d", ...
I also tried to convert my input to a date, ...
Anyway, I made many attempts to create my function, without success.
Thanks for your help !
In case it would be useful to someone one day, here is finally the function I wrote, which respects ISO 8601 :
library(ISOweek)
foo <- function(pdeb, pfin) {
  from <- ISOweek::ISOweek2date(paste0(substr(pdeb, 1, 4), "-W", substr(pdeb, 5, 6), "-1"))
  to <- ISOweek::ISOweek2date(paste0(substr(pfin, 1, 4), "-W", substr(pfin, 5, 6), "-1"))
  res <- seq.Date(from, to, by = "week")
  return(format(res, format = "%G%V"))
}
foo(201950, 202205)
Step 1: transform the input to a character string of the form YYYY-"W"WW-1
Step 2: convert that ISO week string to a date
Step 3: build the sequence by week
Step 4: return the sequence in the "%G%V" format, which follows ISO 8601 and gives YYYYWW
I'd go with
x <- c("202144", "202208")
out <- do.call(seq, c(as.list(as.Date(paste0(x, "1"), format="%Y%U%u")), by = "week"))
out
# [1] "2021-11-01" "2021-11-08" "2021-11-15" "2021-11-22" "2021-11-29" "2021-12-06" "2021-12-13" "2021-12-20" "2021-12-27"
# [10] "2022-01-03" "2022-01-10" "2022-01-17" "2022-01-24" "2022-01-31" "2022-02-07" "2022-02-14" "2022-02-21"
If you really want to keep them in the %Y%W format, then
format(out, format = "%Y%W")
# [1] "202144" "202145" "202146" "202147" "202148" "202149" "202150" "202151" "202152" "202201" "202202" "202203" "202204"
# [14] "202205" "202206" "202207" "202208"
(This answer heavily informed by Transform year/week to date object)
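The answers in this thread use three different week conventions (%U, %W and the ISO 8601 %V with %G), and they disagree around New Year. A small illustration on one date, chosen here only because it shows the difference:

```r
d <- as.Date("2021-01-01")   # a Friday

format(d, "%U")   # "00"   weeks start on Sunday; days before the first Sunday are week 00
format(d, "%W")   # "00"   weeks start on Monday; days before the first Monday are week 00
format(d, "%V")   # "53"   ISO 8601 week number
format(d, "%G")   # "2020" ISO 8601 week-based year that %V belongs to
format(d, "%Y")   # "2021" calendar year
```

So "%Y%W" and "%G%V" can label the same date differently, which matters when the sequence has to respect 53-week years.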
We could do some mathematics. Note that this keeps only weeks 01 to 52, so it drops week 53 in years that have one.
f <- function(from, to) {
  r <- from:to
  r[r %% 100 > 0 & r %% 100 < 53]
}
input.from <- 202144; input.to <- 202208
f(input.from, input.to)
# [1] 202144 202145 202146 202147 202148 202149 202150 202151 202152
# [10] 202201 202202 202203 202204 202205 202206 202207 202208
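For ISO 8601 weeks without an extra package, one option is to rely on the fact that 4 January always falls in ISO week 1. The sketch below is an assumption-laden alternative, not one of the answers above; iso_week_monday and iso_week_seq are hypothetical names, and the inputs are assumed to be valid YYYYWW numbers:

```r
# Monday of the ISO week given as a YYYYWW number (hypothetical helper).
iso_week_monday <- function(yw) {
  yr <- yw %/% 100
  wk <- yw %% 100
  jan4 <- as.Date(sprintf("%04d-01-04", yr))               # always in ISO week 1
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1L)
  week1_monday + 7L * (wk - 1L)
}

# Weekly sequence between two YYYYWW numbers, returned as integers.
iso_week_seq <- function(from, to) {
  days <- seq(iso_week_monday(from), iso_week_monday(to), by = "week")
  as.integer(format(days, "%G%V"))
}

iso_week_seq(202144, 202208)
# [1] 202144 202145 202146 202147 202148 202149 202150 202151 202152
# [10] 202201 202202 202203 202204 202205 202206 202207 202208
iso_week_seq(202051, 202102)   # 2020 has 53 ISO weeks
# [1] 202051 202052 202053 202101 202102
```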

Print all hours:minutes from 00:00 to 23:59

I would like to print all the hours: minutes in a day from 00:00 to 23:59.
This part goes beyond the question, but if you want to help me, this is the whole idea:
Once that is done, I would like to find all the "curious" times that can be interpreted as serendipities: patterns like 00:00, 22:22, 01:10, 12:34, 11:44, and the like.
Later on, I would like to count all the "serendipities" and divide the count by the total number of times, to get the probability of finding a "serendipity" each time a person looks at the time on their smartphone.
To be honest, I am pretty lost. It has been some months since I last coded. For the first part of the problem, I guess a loop can do the task.
For the second part, an if conditional can probably do it.
For the first part of the problem I have tried loops like this
for(i in x){
for(k in y){
cat(i,":",k, ",")
}
}
For the second, something like
Assuming the digits of the time are ab:cd
if(a==b & a==c & a==d){
print(ab:cd)
TRUE
}
if(a==b & c==d){
print(ab:cd)
TRUE
}
I would like to get the whole list of numbers first. Then, the list of "serendipities", and finally the count of both to make the percentage.
I find it interesting how people find patterns in numbers when they look at the time, and I would like to know how probable it is to get one of these patterns out of the 24*60 = 1440 possible times.
I hope I have explained myself. (I used to be better with coding and maths, but after some months, I have forgotten almost everything).
Here's a way to generate the list of all possible times.
h <- seq(from=0, to=23)
m <- seq(from=0, to=59)
h <- sprintf('%02d', h)
m <- sprintf('%02d', m)
df <- data.frame(expand.grid(h, m))
df$times <- paste0(df$Var1, ':', df$Var2)
df <- df[order(df$times), ]
df$times
Partial output
df$times[1:25]
[1] "00:00" "00:01" "00:02" "00:03" "00:04" "00:05" "00:06" "00:07" "00:08"
[10] "00:09" "00:10" "00:11" "00:12" "00:13" "00:14" "00:15" "00:16" "00:17"
[19] "00:18" "00:19" "00:20" "00:21" "00:22" "00:23" "00:24"
Length of variable
dim(df)
[1] 1440 3
We can create a sequence of 1 minute interval starting from 00:00:00 to 23:59:00 and then use format to get output in desired format.
format(seq(as.POSIXct("00:00:00", format = "%T"),
           as.POSIXct("23:59:00", format = "%T"), by = "1 min"), "%H:%M")
#[1] "00:00" "00:01" "00:02" "00:03" "00:04" "00:05" "00:06" "00:07" "00:08" "00:09"
# "00:10" "00:11" "00:12" "00:13" "00:14" "00:15" "00:16" "00:17" "00:18" "00:19" ...
Yet another way of doing it:
> result <- character(1440)
> for (i in 0:1439) result[i+1L] <- sprintf("%02d:%02d",
+ i %/% 60,
+ i %% 60
+ )
> head(result)
[1] "00:00" "00:01" "00:02" "00:03" "00:04" "00:05"
> tail(result)
[1] "23:54" "23:55" "23:56" "23:57" "23:58" "23:59"
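None of the answers covers the second part of the question, counting the "curious" times. A possible sketch follows; the four pattern rules are assumptions chosen to cover the asker's examples (00:00, 22:22, 01:10, 12:34, 11:44), and other definitions of "curious" would give other counts:

```r
mins <- 0:1439
h <- mins %/% 60L
m <- mins %% 60L
times <- sprintf("%02d:%02d", h, m)

# Digits of a time written as d1 d2 : d3 d4
d1 <- h %/% 10L; d2 <- h %% 10L
d3 <- m %/% 10L; d4 <- m %% 10L

pairs    <- d1 == d2 & d3 == d4                          # aa:bb, e.g. 11:44, 22:22
mirror   <- d1 == d4 & d2 == d3                          # ab:ba, e.g. 01:10
repeated <- d1 == d3 & d2 == d4                          # ab:ab, e.g. 12:12
run      <- d2 == d1 + 1 & d3 == d2 + 1 & d4 == d3 + 1   # ascending digits, e.g. 12:34

curious <- pairs | mirror | repeated | run
times[run]
# [1] "01:23" "12:34" "23:45"
sum(curious)     # 55 under these four rules
mean(curious)    # share of the 1440 times, 55 / 1440
```

Because each rule is a logical vector over all 1440 times, adding or removing a pattern only changes the line that builds curious.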

How do I change the index in a csv file to a proper time format?

I have a CSV file of 1700 daily prices.
They are of this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index runs from 1 to 1700.
But I need to specify a begin date and an end date this way:
The start of the period is, let's say, 25 January 2009,
and the last (1700th) value corresponds to 14 May 2013.
So far I've gotten this close to solving the problem:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? Thanks.
UPDATE:
I managed to create a separate object with dates as suggested in the answers and plotted it, but the y axis is weird, as shown in the screenshot.
Something like this?
as.Date("25-01-2009",format="%d-%m-%Y") + (seq(1:1700)-1)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
  myDate = seq(as.Date("2009-01-25"), by = "1 day", length.out = 1700),
  myPrice = runif(1700)
)
plot(df)
R stores Date-classed objects as the number of days since "1970-01-01", but the as.Date.numeric function needs an offset ('origin'), which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You didn't need to add the extension .numeric, since R would automatically dispatch to that method if you called the generic, as.Date, with a numeric argument. I just put it in because as.Date.numeric has different arguments than as.Date.character.
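One thing worth checking before attaching dates: the stated start and end dates span fewer calendar days than there are observations, so the series may not be one value per calendar day. A minimal sketch, in which prices is a random stand-in for the CSV column and consecutive calendar days from the start date are an assumption:

```r
set.seed(1)
prices <- runif(1700)                  # stand-in for the CSV price column
start  <- as.Date("2009-01-25")
end    <- as.Date("2013-05-14")

# Calendar days between the stated start and end dates, inclusive
n_days <- as.integer(end - start) + 1L
n_days
# [1] 1571

# One date per observation, assuming consecutive calendar days from the start
dseries <- data.frame(date  = seq(start, by = "day", length.out = length(prices)),
                      price = prices)
plot(price ~ date, data = dseries, type = "l")
```

Since 1571 is not 1700, either the end date or the "one value per day" assumption would need revisiting for the real data.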

Date sequence in R spanning B.C.E. to A.D

I would like to generate a sequence of dates from 10,000 B.C.E. to the present. This is easy from 0 C.E. (or A.D.) onward:
ADtoNow <- seq.Date(from = as.Date("0/1/1"), to = Sys.Date(), by = "day")
But I am stumped as to how to generate dates before 0 AD. Obviously, I could do years before present but it would be nice to be able to graph something as BCE and AD.
To expand on Ricardo's suggestion, here is some testing of how things work. Or don't work for that matter.
I will repeat Joshua's warning taken from ?as.Date for future searchers in big bold letters:
"Note: Years before 1CE (aka 1AD) will probably not be handled correctly."
as.integer(as.Date("0/1/1"))
[1] -719528
as.integer(seq(as.Date("0/1/1"),length=2,by="-10000 years"))
[1] -719528 -4371953
seq(as.Date(-4371953,origin="1970-01-01"),Sys.Date(),by="1000 years")
# nonsense
[1] "0000-01-01" "'000-01-01" "(000-01-01" ")000-01-01" "*000-01-01"
[6] "+000-01-01" ",000-01-01" "-000-01-01" ".000-01-01" "/000-01-01"
[11] "0000-01-01" "1000-01-01" "2000-01-01"
> as.integer(seq(as.Date(-4371953,origin="1970-01-01"),Sys.Date(),by="1000 years"))
# also possibly nonsense
[1] -4371953 -4006710 -3641468 -3276225 -2910983 -2545740 -2180498 -1815255
[9] -1450013 -1084770 -719528 -354285 10957
Though this does seem to work for graphing somewhat:
yrs1000 <- seq(as.Date(-4371953,origin="1970-01-01"),Sys.Date(),by="1000 years")
plot(yrs1000,rep(1,length(yrs1000)),axes=FALSE,ann=FALSE)
box()
axis(2)
axis(1,at=yrs1000,labels=c(paste(seq(10000,1000,by=-1000),"BC",sep=""),"0AD","1000AD","2000AD"))
title(xlab="Year",ylab="Value")
Quite some time has gone by since this question was asked. With that time came a new R package, gregorian which can handle BCE time values in the as_gregorian method.
Here's an example of piecewise constructing a list of dates that range from 10000 BCE to the current year.
library(lubridate)
library(gregorian)
# Container for the dates
dates <- c()
starting_year <- year(now())
# Add the CE dates to the list
for (year in starting_year:0){
  date <- sprintf("%s-%s-%s", year, "1", "1")
  dates <- c(dates, gregorian::as_gregorian(date))
}
starting_year <- -10000
# Add the BCE dates to the list
for (year in starting_year:0){
  date <- sprintf("%s-%s-%s", year, "1", "1")
  dates <- c(dates, gregorian::as_gregorian(date))
}
How you use the list is up to you; just know that the relevant properties of the date objects are year and bce. For example, you can loop over the list of dates, parse the year, and determine whether it's BCE or not.
> gregorian_date <- gregorian::as_gregorian("-10000-1-1")
> gregorian_date$bce
[1] TRUE
> gregorian_date$year
[1] 10001
Notes on 0AD
The gregorian package assumes that when you say Year 0, you really mean year 1 (shown below). I personally think an exception should be thrown, but that's the mapping users need to keep in mind.
> gregorian::as_gregorian("0-1-1")
[1] "Monday January 1, 1 CE"
This is also the case with BCE
> gregorian::as_gregorian("-0-1-1")
[1] "Saturday January 1, 1 BCE"
As @JoshuaUlrich commented, the short answer is no.
However, you can splice out the year into a separate column and then convert to integer. Would this work for you?
The package lubridate seems to handle "negative" years ok, although it does create a year 0, which from the above comments seems to be inaccurate. Try:
library(lubridate)
start <- -10000
stop <- 2013
myrange <- NULL
for (x in start:stop) {
  myrange <- c(myrange, ymd(paste0(x, '-01-01')))
}
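If the goal is only to graph values against BCE and CE years, plain integer years avoid the Date class altogether. A sketch under the assumption of astronomical year numbering, where 0 stands for 1 BCE and -1 for 2 BCE; year_label is a hypothetical helper:

```r
# Astronomical year numbering (assumption): 0 is 1 BCE, -1 is 2 BCE, and so on.
year_label <- function(y) {
  ifelse(y <= 0, paste0(1 - y, " BCE"), paste0(y, " CE"))
}

year_label(c(-10000, 0, 1, 2013))
# [1] "10001 BCE" "1 BCE"     "1 CE"      "2013 CE"

# Plot against the numeric years and relabel the axis
years <- seq(-10000, 2000, by = 1000)
plot(years, rep(1, length(years)), xaxt = "n", xlab = "Year", ylab = "Value")
axis(1, at = years, labels = year_label(years))
```

This gives up day-level resolution, so it suits yearly or coarser data only.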
