Next week day for a given vector of dates - r

I'm trying to get the next weekday for a vector of dates in R. My approach was to create a vector of weekdays and then find the weekday closest to the weekend date I have. The problem is that for Saturdays and some holidays (of which there are a lot in my country) I end up getting the previous weekday, which is not what I want.
This is an example of my problem:
vecDates = as.Date(c("2011-01-11","2011-01-12","2011-01-13","2011-01-14","2011-01-17","2011-01-18",
"2011-01-19","2011-01-20","2011-01-21","2011-01-24"))
testDates = as.Date(c("2011-01-22","2011-01-23"))
findInterval(testDates,vecDates)
For both dates the correct answer should be 10, which is "2011-01-24", but I get 9.
I thought of a solution where I remove all dates before the one I'm analyzing and then use findInterval. It works, but it is not vectorized and is therefore too slow for my actual purpose.

Does this do what you want?
vecDates = as.Date(c("2011-01-11","2011-01-12",
"2011-01-13","2011-01-14",
"2011-01-17","2011-01-18",
"2011-01-19","2011-01-20",
"2011-01-21","2011-01-24"))
testDates = as.Date(c("2011-01-20","2011-01-22","2011-01-23"))
get_next_biz_day <- function(testdays, bizdays){
  # index of the last business day <= each test day, shifted to the next one
  o <- findInterval(testdays, bizdays) + 1
  bizdays[o]
}
get_next_biz_day(testDates, vecDates)
#[1] "2011-01-21" "2011-01-24" "2011-01-24"
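One caveat with the +1 approach: a test date that is itself a business day is moved on to the following one (2011-01-20 becomes 2011-01-21 above). If you would rather keep such dates unchanged, here is a minimal sketch. The function name same_or_next_biz_day is made up for illustration, and it assumes bizdays is sorted and R >= 3.3.0 (for the left.open argument of findInterval).

```r
vecDates <- as.Date(c("2011-01-11", "2011-01-12", "2011-01-13", "2011-01-14",
                      "2011-01-17", "2011-01-18", "2011-01-19", "2011-01-20",
                      "2011-01-21", "2011-01-24"))

# Hypothetical helper: returns the test day itself if it is a business day,
# otherwise the first business day after it.
same_or_next_biz_day <- function(testdays, bizdays) {
  # With left.open = TRUE the intervals are (b[i], b[i+1]], so a date equal
  # to b[i] gets index i - 1, and adding 1 points back at b[i].
  o <- findInterval(as.numeric(testdays), as.numeric(bizdays),
                    left.open = TRUE) + 1
  # Dates after the last business day index past the end and come back as NA.
  bizdays[o]
}

res <- same_or_next_biz_day(
  as.Date(c("2011-01-20", "2011-01-22", "2011-01-23", "2011-01-25")),
  vecDates
)
```

Here res should hold 2011-01-20, 2011-01-24, 2011-01-24 and NA; the NA signals that the calendar of business days needs to be extended.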

Related

Removing specific times and days every week from time dataframe

Been learning R for a couple months and stumbled across an issue that I can't seem to find yet on stackoverflow. I have a timeframe dataset dictated by:
ts <- seq.POSIXt(as.POSIXlt("2014-08-01 15:00"), as.POSIXlt("2017-08-04 19:33"), by="min")
ts <- format.POSIXct(ts,'%Y%m%d %H%M')
df <- data.frame(timestamp=ts)
I have seen how to remove specific times from every day, and how to remove complete days such as weekends/holidays, but I am looking to remove a subset from every week: specifically, from 8:00 on every Saturday to 9:00 on every Monday throughout the entire dataset. I have tried doing the reverse, subsetting the period I need using lubridate (thanks @Christian):
dfc = ymd_hm(df$timestamp)
df[day(dfc) == 2 & hour(dfc) >= 9 | day(dfc) == 7 & hour(dfc) >= 8,]
but it didn't seem to work.
Cheers.
You can't use square brackets with lubridate accessors; they are called like regular functions. Try replacing, e.g., hour[dfc] with hour(dfc) and you should be fine.
edit: to subset a range, be aware that == is not the same as >=.
edit2: a bit more of a pointer in the right direction:
ts_sat_until_monday = seq.POSIXt(as.POSIXlt("2014-08-02 09:00"),
as.POSIXlt("2014-08-04 08:00"), by = 1)
unique(day(ts_sat_until_monday))
unique(hour(ts_sat_until_monday))
#what about sunday? up to you
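To make the hint concrete, here is a rough base R sketch of the filter. Note that day() is the day of the month, so the weekday is what you want to test (lubridate's wday(), or the wday field of POSIXlt used here). The names stamps, in_gap and kept are made up for illustration, the data is shortened to a few days, and I am assuming UTC and that "to 9:00 on Monday" means rows up to 08:59 are dropped.

```r
# A few days of minute data; 2014-08-02 is a Saturday.
stamps <- seq(as.POSIXct("2014-08-01 15:00", tz = "UTC"),
              as.POSIXct("2014-08-05 00:00", tz = "UTC"), by = "min")
df <- data.frame(timestamp = stamps)

lt <- as.POSIXlt(df$timestamp)
wd <- lt$wday   # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
hr <- lt$hour

# Saturday from 08:00, all of Sunday, and Monday before 09:00
in_gap <- (wd == 6 & hr >= 8) | wd == 0 | (wd == 1 & hr < 9)

kept <- df[!in_gap, , drop = FALSE]
```

With lubridate loaded, wday(df$timestamp, week_start = 7) gives the same information with 1 = Sunday through 7 = Saturday, so the conditions shift by one.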

as.Date produces unexpected result in a sequence of week-based dates

I am working on the transformation of week based dates to month based dates.
When checking my work, I found the following problem in my data which is the result of a simple call to as.Date()
as.Date("2016-50-4", format = "%Y-%U-%u")
as.Date("2016-50-5", format = "%Y-%U-%u")
as.Date("2016-50-6", format = "%Y-%U-%u")
as.Date("2016-50-7", format = "%Y-%U-%u") # this is the problem
The previous code yields correct date for the first 3 lines:
"2016-12-15"
"2016-12-16"
"2016-12-17"
The last line of code however, goes back 1 week:
"2016-12-11"
Can anybody explain what is happening here?
Working with week of the year can become very tricky. You may try to convert the dates using the ISOweek package:
# create date strings in the format given by the OP
wd <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1", "2016-52-7")
# convert to "normal" dates
ISOweek::ISOweek2date(stringr::str_replace(wd, "-", "-W"))
The result
#[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19" "2017-01-01"
is of class Date.
Note that the ISO week-based date format is yyyy-Www-d with a capital W preceding the week number. This is required to distinguish it from the standard month-based date format yyyy-mm-dd.
So, in order to convert the date strings provided by the OP using ISOweek2date() it is necessary to insert a W after the first hyphen which is accomplished by replacing the first - by -W in each string.
Also note that ISO weeks start on Monday and the days of the week are numbered 1 to 7. The year which belongs to an ISO week may differ from the calendar year. This can be seen from the sample dates above where the week-based date 2016-W52-7 is converted to 2017-01-01.
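If you would rather not add a dependency, the same rules can be applied by hand in base R. This is only a sketch: the function name iso_week_to_date is hypothetical, and it assumes well-formed "yyyy-ww-d" strings as given by the OP. It relies on the ISO 8601 rule that week 1 is the week containing 4 January.

```r
# Hypothetical helper: convert "yyyy-ww-d" (ISO year, ISO week, ISO weekday)
# to Date without using %V on input.
iso_week_to_date <- function(x) {
  parts <- do.call(rbind, strsplit(x, "-", fixed = TRUE))
  year <- as.integer(parts[, 1])
  week <- as.integer(parts[, 2])
  day  <- as.integer(parts[, 3])   # 1 = Monday, ..., 7 = Sunday

  jan4 <- as.Date(paste0(year, "-01-04"))
  # %u works on output: ISO weekday of 4 January
  jan4_wday <- as.integer(format(jan4, "%u"))
  week1_monday <- jan4 - (jan4_wday - 1)

  week1_monday + (week - 1) * 7 + (day - 1)
}

wd <- c("2016-50-4", "2016-50-5", "2016-50-6", "2016-50-7",
        "2016-51-1", "2016-52-7")
res <- iso_week_to_date(wd)
```

For the sample strings this should reproduce the ISOweek2date() result above, including 2017-01-01 for the last element.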
About the ISOweek package
Back in 2011, the %G, %g, %u, and %V format specifications weren't available to strptime() in the Windows version of R. This was annoying as I had to prepare weekly reports including week-on-week comparisons. I spent hours to find a solution for dealing with ISO weeks, ISO weekdays, and ISO years. Finally, I ended up creating the ISOweek package and publishing it on CRAN. Today, the package still has its merits as the aforementioned formats are ignored on input (see ?strptime for details).
As @lmo said in the comments, %u stands for the weekday as a decimal number (1–7, with Monday as 1) and %U stands for the week of the year as a decimal number (00–53) using Sunday as the first day. Thus, as.Date("2016-50-7", format = "%Y-%U-%u") will result in "2016-12-11".
However, if that should give "2016-12-18", then you should use a week format that also has Monday as the starting day. According to the documentation in ?strptime you would expect the format "%Y-%V-%u" to give the correct output, where %V stands for the week of the year as a decimal number (01–53) with Monday as the first day.
Unfortunately, it doesn't:
> as.Date("2016-50-7", format = "%Y-%V-%u")
[1] "2016-01-18"
However, at the end of the explanation of %V it says "Accepted but ignored on input", meaning that it won't work.
You can circumvent this behavior as follows to get the correct dates:
# create a vector of dates
d <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1")
# convert to the correct dates
as.Date(paste0(substr(d,1,8), as.integer(substring(d,9))-1), "%Y-%U-%w") + 1
which gives:
[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19"
The issue is because for %u, 1 is Monday and 7 is Sunday of the week. The problem is further complicated by the fact that %U assumes week begins on Sunday.
For the given input and expected behavior of format = "%Y-%U-%u", the output of line 4 is consistent with the output of previous 3 lines.
That is, if you want to use format = "%Y-%U-%u", you should pre-process your input. In this case, the fourth line would have to be as.Date("2016-51-7", format = "%Y-%U-%u") as revealed by
format(as.Date("2016-12-18"), "%Y-%U-%u")
# "2016-51-7"
Instead, you are currently passing "2016-50-7".
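To see the mismatch at a glance, you can format a run of consecutive dates with the same specification. A small sketch (the variable names d and labels are just for illustration):

```r
# Eight consecutive days, from Sunday 2016-12-11 to Sunday 2016-12-18
d <- as.Date("2016-12-11") + 0:7

# %U week numbers roll over on Sunday, while %u numbers Sunday as 7,
# so "week 50, day 7" comes before "week 50, day 1".
labels <- format(d, "%Y-%U-%u")
data.frame(date = d, label = labels)
```

The first row should read 2016-50-7 and the last 2016-51-7, with 2016-50-1 to 2016-50-6 in between, which is why "2016-50-7" parses to the earlier Sunday.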
A better way might be to use the approach suggested in Uwe Block's answer. Since you are happy with "2016-50-4" being transformed to "2016-12-15", I suspect that Monday is counted as 1 in your raw data too. You could also write a custom function that changes the value of %U to count the week number as if the week began on Monday, so that the output is as you expected.
# Function to change the value of %U so that the week begins on Monday
# (works on a single string, not on a vector)
pre_process = function(x, delim = "-"){
  y = unlist(strsplit(x, delim))
  # If the last day of the year is 7 (Sunday for %u),
  # add 1 to the week to make it the week 00 of the next year
  # I think there might be a better solution for this
  if (y[2] == "53" && y[3] == "7"){
    x = paste(as.integer(y[1]) + 1, "00", y[3], sep = delim)
  } else if (y[3] == "7"){
    # If the day is 7 (Sunday for %u), add 1 to the week
    x = paste(y[1], as.integer(y[2]) + 1, y[3], sep = delim)
  }
  return(x)
}
And usage would be
as.Date(pre_process("2016-50-7"), format = "%Y-%U-%u")
# [1] "2016-12-18"
I'm not quite sure how to handle when the year ends on a Sunday.

sequence of monthly dates making sure it's the same day, or the last day of month in case of invalid

Given an initial date, I want to generate a sequence of dates with monthly intervals, ensuring every element has the same day as the initial date or the last day of the month in case the same day would yield an invalid date.
Sounds pretty standard, right?
Using difftime is not possible. Here's what the help file of difftime says:
Units such as "months" are not possible as they are not of constant
length. To create intervals of months, quarters or years use seq.Date
or seq.POSIXt.
But then looking at the help file of seq.POSIXt I find that:
Using "month" first advances the month without changing the day: if
this results in an invalid day of the month, it is counted forward
into the next month: see the examples.
This is the example in the help file.
seq(ISOdate(2000,1,31), by = "month", length.out = 4)
> seq(ISOdate(2000,1,31), by = "month", length.out = 4)
[1] "2000-01-31 12:00:00 GMT" "2000-03-02 12:00:00 GMT"
"2000-03-31 12:00:00 GMT" "2000-05-01 12:00:00 GMT"
So, given that the initial date is on day 31, this would yield invalid dates in February, April, etc. The sequence ends up skipping those months because it "counts forward", giving March 2 instead of February 29.
If I start on 2000-01-31, I would like the sequence as follows:
2000-01-31
2000-02-29
2000-03-31
2000-04-30
...
And it should properly handle leap-years, so if the initial date is 2015-01-31 the sequence should be:
2015-01-31
2015-02-28
2015-03-31
2015-04-30
...
These are just examples to illustrate the problem and I do not know the initial date in advance, nor can I assume anything about it. The initial date may well be in the middle of the month (2015-01-15) in which case seq works fine. But it can also be, as in the examples, towards the end of the month on dates that using seq alone would be problematic (days 29, 30 and 31). I cannot assume either that the initial date is the last day of the month.
I have looked around trying to find a solution. In some questions here in SO (e.g. here) there is a "trick" to get the last day of a month, by getting the first day of the next month and simply subtract 1. And finding the first day is "easy" because it is just day 1.
So my solution so far is:
# Given an initial date for my sequence
initial_date <- as.Date("2015-01-31")
# Find the first day of the month
library(magrittr) # to use pipes and make the code more readable
first_day_of_month <- initial_date %>%
  format("%Y-%m") %>%
  paste0("-01") %>%
  as.Date()
# Generate a sequence from the initial date, using seq
# This is the sequence that will have incorrect values in months that would
# have invalid dates
given_date_seq <- seq(initial_date, by = "month", length.out = 4)
# And then generate an auxiliary sequence for the last day of the month
# I do this by generating a sequence that starts on the first day of the
# same month as the initial date and goes one month further
# (length 5 instead of 4), and subtracting 1 from all the elements
last_day_seq <- seq(first_day_of_month, by = "month", length.out = 5) - 1
# And finally, for each pair of elements, I take the min date of both
pmin(given_date_seq, last_day_seq[2:5])
It works, but it is, at the same time, kinda dumb, hacky and convoluted. So I do not like it. And most importantly, I cannot believe there is no easier way to do this in R.
Can someone please point me to a simpler solution? (I guess it should have been as simple as seq(initial_date, "month", 4), but apparently it is not). I've googled it and looked here in SO and R mailing lists, but apart from the tricks I mentioned above, I couldn't find a solution.
The simplest solution is %m+% from lubridate, which solves this exact problem. So:
library(lubridate)
seq_monthly <- function(from, length.out) {
  # %m+% rolls back to the last valid day of the month instead of overflowing
  from %m+% months(0:(length.out - 1))
}
Output:
> seq_monthly(as.Date("2015-01-31"),length.out=4)
[1] "2015-01-31" "2015-02-28" "2015-03-31" "2015-04-30"
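If adding a package is not an option, the same clamping can be done in base R. This is only a sketch under the assumption that from is a single Date; the function name seq_monthly_base is made up for illustration. It advances the month arithmetically and caps the day at the number of days in each target month.

```r
# Hypothetical base R helper: monthly sequence that keeps the day of month
# of `from`, falling back to the last day of shorter months.
seq_monthly_base <- function(from, length.out) {
  lt <- as.POSIXlt(from)
  k <- seq_len(length.out) - 1L

  # Months counted from January 1900 (lt$year is years since 1900, lt$mon is 0-11)
  m <- lt$year * 12L + lt$mon + k
  first_of_month <- as.Date(sprintf("%04d-%02d-01",
                                    1900L + m %/% 12L, m %% 12L + 1L))
  m_next <- m + 1L
  first_of_next <- as.Date(sprintf("%04d-%02d-01",
                                   1900L + m_next %/% 12L, m_next %% 12L + 1L))

  days_in_month <- as.integer(first_of_next - first_of_month)
  first_of_month + pmin(lt$mday, days_in_month) - 1L
}

res_2015 <- seq_monthly_base(as.Date("2015-01-31"), 4)
res_2000 <- seq_monthly_base(as.Date("2000-01-31"), 4)
```

For 2015-01-31 this should give the same four dates as the lubridate version above, and for 2000-01-31 the second element should be 2000-02-29.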
Similar to the lubridate answer, here is one using RcppBDT (which wraps the Boost Date.Time library from C++)
R> dt <- new(bdtDt, 2010, 1, 31); for (i in 1:5) { dt$addMonths(i); print(dt) }
[1] "2010-02-28"
[1] "2010-04-30"
[1] "2010-07-31"
[1] "2010-11-30"
[1] "2011-04-30"
R> dt <- new(bdtDt, 2000, 1, 31); for (i in 1:5) { dt$addMonths(i); print(dt) }
[1] "2000-02-29"
[1] "2000-04-30"
[1] "2000-07-31"
[1] "2000-11-30"
[1] "2001-04-30"
R>

Lubridate week() to find consecutive week number for multi-year periods

Within R, say I have a vector of some Lubridate dates:
> Date
"2012-01-01 UTC"
"2013-01-01 UTC"
Next, suppose I want to see what week number these days fall in:
> week(Date)
1
1
Lubridate is fantastic!
But wait... I'm dealing with a time series with 10,000 rows of data, and the data spans 3 years.
I've been struggling with finding some way to make this happen:
> result of awesome R code here
1
54
The question: is there a succinct way to coax out a list of week numbers over multi-year periods with lubridate? More directly, I would like the first week of the second year to be represented as the 54th week, the first week of the third year as the 107th week, and so on.
So far I've attempted a number of hacky schemes, but I cannot seem to create something that isn't held together with Scotch tape. Any advice would be greatly appreciated. Thanks in advance.
To get the interval from a particular date to another date, you can just subtract...
If tda is your vector of dates, then
tda - min(tda)
will be the difference in seconds between them.
To get the units out in weeks:
(tda - min(tda))/eweeks(1)
To do it from a particular date:
tda - ymd(19960101)
This gives the number of days from 1996 to each value.
From there, you can divide by days per week, or seconds per week.
(tda - ymd(19960101))/eweeks(1)
To get only the integer part, and starting from January 2012:
trunc((tda - ymd(20111225))/eweeks(1))
Test data:
tda = ymd(c(20120101, 20120106, 20130101, 20130108))
Output:
1 1 53 54
Since eweeks() is now deprecated, I thought I'd add to @beroe's answer.
If tda is your date vector, you can get the week numbers with:
weeknos <- (interval(min(tda), tda) %/% weeks(1)) + 1
where %/% causes integer division. ( 5 / 3 = 1.667; 5 %/% 3 = 1)
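The same integer division can be written in base R if the dates are stored as Date. A minimal sketch using the test data from the earlier answer (this counts 7-day blocks from the earliest date, which is an assumption about what "consecutive week number" should mean; it is not the calendar week that week() returns):

```r
tda <- as.Date(c("2012-01-01", "2012-01-06", "2013-01-01", "2013-01-08"))

# Whole days elapsed since the earliest date in the vector
days_elapsed <- as.integer(tda - min(tda))

# Integer division by 7 gives completed weeks; +1 makes the first week 1
weeknos <- days_elapsed %/% 7L + 1L
```

For this test vector weeknos should come out as 1 1 53 54, matching the output shown earlier.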
You can do something like this :
week(dat) +53*(year(dat)-min(year(dat)))
Given you like lubridate (as do I)
year_week <- function(x,base) week(x) - week(base) + 52*(year(x) - year(base))
test <- ymd(c(20120101, 20120106, 20130101, 20130108))
year_week(test, "2012-01-01")
Giving
[1] 0 0 52 53

Creating a specific sequence of date/times in R

I want to create a single column with a sequence of date/time increasing every hour for one year or one month (for example). I was using a code like this to generate this sequence:
start.date<-"2012-01-15"
start.time<-"00:00:00"
interval<-60 # 60 minutes
increment.mins<-interval*60
x<-paste(start.date,start.time)
for(i in 1:365){
print(strptime(x, "%Y-%m-%d %H:%M:%S")+i*increment.mins)
}
However, I am not sure how to specify the range of the sequence of dates and hours. Also, I have been having problems dealing with the first hour, "00:00:00". What is the best way to specify the length of the date/time sequence for a month, a year, etc.? Any suggestions will be appreciated.
I would strongly recommend you to use the POSIXct datatype. This way you can use seq without any problems and use those data however you want.
start <- as.POSIXct("2012-01-15")
interval <- 60
end <- start + as.difftime(1, units="days")
seq(from=start, by=interval*60, to=end)
Now you can do whatever you want with your vector of timestamps.
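To set the length of the sequence directly, seq also accepts unit names for by, together with length.out or to. A short sketch, assuming UTC to sidestep daylight saving jumps (the names one_day, end_month and one_month are just for illustration):

```r
start <- as.POSIXct("2012-01-15 00:00:00", tz = "UTC")

# 24 hourly steps, starting at midnight
one_day <- seq(from = start, by = "hour", length.out = 24)

# Hourly steps up to the same time one calendar month later
end_month <- seq(start, by = "month", length.out = 2)[2]
one_month <- seq(from = start, to = end_month, by = "hour")

# Printing drops the time part when every element is exactly midnight,
# so format explicitly if "00:00:00" seems to go missing.
first_last <- format(one_day[c(1, 24)], "%Y-%m-%d %H:%M:%S")
```

For a year, by = "year" works the same way as by = "month" when computing the end point. The possible cause of the "00:00:00" trouble mentioned in the question is a guess on my part.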
Try this. mondate is very clever about advancing by a month. For example, it will advance the last day of Jan to the last day of Feb, whereas other date/time classes tend to overshoot into Mar. chron does not use time zones, so you can't get the time zone bugs that you can get when using POSIXct. Here x and start.time are from the question.
library(chron)
library(mondate)
start.time.num <- as.numeric(as.chron(x))
# +1 means one month. Use +12 if you want one year.
end.time.num <- as.numeric(as.chron(paste(mondate(x)+1, start.time)))
# 1/24 means one hour. Change as needed.
hours <- as.chron(seq(start.time.num, end.time.num, 1/24))
