ceiling date in R - r

I was using ceiling_date when I saw that it was behaving in a way inconsistent with floor_date. For example,
> floor_date(as.Date("05/10/2020","%m/%d/%Y"),unit="week",week_start=7)
[1] "2020-05-10"
> ceiling_date(as.Date("05/10/2020","%m/%d/%Y"),unit="week",week_start=7)
[1] "2020-05-17"
But floor(5)=ceiling(5)=5 in R.
One has to set change_on_boundary = False in the ceiling_date function to make it behave like floor_date, but I think this should be the default behavior. I read up on the rationale behind having ceiling_date behave the way it does above, and it did not make sense to me. In fact, there was a time when what I think should be default behavior was indeed the default behavior. Please see my comments in italics below against the documentation.
change_on_boundary
If NULL (the default) don't change instants on the boundary (ceiling_date(ymd_hms('2000-01-01 00:00:00')) is 2000-01-01 00:00:00), but round up Date objects to the next boundary (ceiling_date(ymd("2000-01-01"), "month") is "2000-02-01"). When TRUE, instants on the boundary are rounded up to the next boundary. When FALSE, date-time on the boundary are never rounded up (this was the default for lubridate prior to v1.6.0. See section Rounding Up Date Objects below for more details. <- So there was a time when what I indicated should be default behavior was default behavior.
By default rounding up Date objects follows 3 steps:
Convert to an instant representing lower bound of the Date: 2000-01-01 –> 2000-01-01 00:00:00
Round up to the next closest rounding unit boundary. For example, if the rounding unit is month then next closest boundary of 2000-01-01 is 2000-02-01 00:00:00.
The motivation for this is that the "partial" 2000-01-01 is conceptually an interval (2000-01-01 00:00:00 – 2000-01-02 00:00:00) and the day hasn't started clocking yet at the exact boundary 00:00:00. Thus, it seems wrong to round up a day to its lower boundary.
<-I don't follow what "and the day hasn't started clocking yet at the exact boundary 00:00:00" means, and how and why " 2000-01-01 is conceptually an interval (2000-01-01 00:00:00 – 2000-01-02 00:00:00) " is relevant.
Even if 5/10/2020 is considered as a whole day its ceiling date should, for unit=week and week_start=7, still be 5/10/2020 because ceiling_date(as.Date("05/10/2020","%m/%d/%Y"),unit=week and week_start=7) should return the earliest Sunday no earlier than 5/10/2020. And this day is clearly 5/10/2020. It is not 5/17/2020.
Can someone weigh in on this?

I understand what you're saying, but I do see what they mean in the documentation as well.
I think the easiest way to illustrate this is to imagine someone telling you "happy Monday!" You would think that what they imply is that Monday has already started, right? Therefore, "Monday" loosely refers to the time period between Monday and Tuesday. If you think of days of a week on a real line and let the integers 1 to 7 represent days Monday to Sunday, then what the documentation is trying to say is that Monday loosely refers to the interval (1, 2), Tuesday loosely refers to the interval (2, 3), and so on. So basically you wanna think of a date as an instant that's strictly between that date and the next date. Thus it's not entirely illogical that they make the default behavior of rounding up dates the way they do.

Related

Is IDL able to add / subtract from date?

As you can see the question above, I was wondering if IDL is able to add or subtract days / months / years to a given date.
For example:
given_date = anytim('01-jan-2000')
print, given_date
1-Jan-2000 00:00:00.000
When I would add 2 weeks to the given_date, then this date should appear:
15-Jan-2000 00:00:00.000
I was already looking for a solution for this problem, but I unfortunately couldn't find any solution.
Note:
I am using a normal calendar date, not the julian date.
Are you only concerned with dates after 1582? Is accuracy to the second important?
The ANYTIM routine is not part of the IDL distribution. Possibly there are third party routines to handle time increments, but I don't know of any builtin to the IDL library.
By default, which you are using, ANYTIM returns seconds from Jan 1, 1979. So to add/subtract some number of days, weeks, or years, you could calculate the number of seconds in the time interval. Of course, this does not take into account leap seconds/years (but leap years are fairly easy to take into account, leap seconds requires a database of when they were added). And adding months is going to require determining which month so to determine the number of days in it.
IDL can convert to and from Julian dates using JULDAY and CALDAT.
You may also read and write Julian dates (which are doubles or long integers) to and from strings using the format keyword to PRINT, STRING, and READS.
You'll want to use the (C()) calendar date format code.
format='(c(cdi0,"-",cMoa,"-"cyi04," ",cHi02,":",cmi02,":",csf06.3))'
date = julday(1, 1, 2000)
print, date, format=format
; 1-Jan-2000 00:00:00.000
date = date + 14
print, date, format=format
; 15-Jan-2000 00:00:00.000

Finding the Min & Max Times for Multiple Individuals

For work I have a report where I compile the number of calls, emails, and texts a person makes each day. Along with this I need to pick out the earliest (Min) and the latest (max) times for each of those actions. I'm wondering if there isn't an easier way to me to pull this data from the date column rather than scrolling down for each person and finding the information.
You are right, there is definitely an easier way. What we need to rely on is that Excel stores times as the number of days since 0th January 1900 (so 1st January 1900 is day 1). Therefore, finding the earliest and latest times is simple a matter of finding the min and max values within a specific day.
I'm assuming that your data is set out as in the following. If it isn't, you can just edit my formula as appropriate.
A B C D
1 Person Date Time
2 Steve Monday 14:00
3 Steve Monday 14:05
4 Sharon Monday 12:00
5 Steve Tuesday 09:00
6 Sharon Tuesday 15:00
What we need to do is find the minimum time for Steve, given that date = Monday. We need to use an array formula. Array formulas let us 'look up' against more than one cell at the same time. The formula I would use is:
=MIN(IF(A2:A6="Steve",IF(B2:B6="Monday",C2:C6)))
Instead of clicking 'Enter' when you use this formula, you need to click Ctrl+Shift+Enter i.e., you enter the formula above and click Ctrl+Shift+Enter and Excel will return:
={MIN(IF(A2:A6="Steve",IF(B2:B6="Monday",C2:C6)))}
Can you see how to add more constraints to the look up? I've included an example screenshot below, where I've made a bigger table and also made the 'Steve' and 'Monday' references refer out to a cell, rather than just being hardcoded into the formula.

Fix split.xts behaviour prior to the epoch (1-1-1970)

I noticed some strange xts behaviour when trying to split an object that goes back a long way. The behaviour of split changes at the epoch.
#Create some data
dates <- seq(as.Date("1960-01-01"),as.Date("1980-01-01"),"days")
x <- rnorm(length(dates))
data <- xts(x, order.by=dates)
If we split the xts object by week, it defines the last day of the week as Monday prior to 1970. Post-1970, it defines it as Sunday (expected behaviour).
#Split the data, keep the last day of the week
lastdayofweek <- do.call(rbind, lapply(split(data, "weeks"), last))
head(lastdayofweek)
tail(lastdayofweek)
1960 Calendar
1979 Calendar
This seems to only be a problem for weeks, not months or years.
#Split the data, keep the last day of the month
lastdayofmonth <- do.call(rbind, lapply(split(data, "months"), last))
head(lastdayofmonth)
tail(lastdayofmonth)
The behaviour seems likely to do with the following, though I am not sure why it would apply to weeks only. From the xts cran.
For dates prior to the epoch (1970-01-01) the ending time is aligned to the 59.0000 second. This is
due to a bug/feature in the R implementation of asPOSIXct and mktime0 at the C-source level. This
limits the precision of ranges prior to 1970 to 1 minute granularity with the current xts workaround.
My workaround has been to shift the dates before splitting the objects for data prior to 1970, if I am splitting on weeks. I expect someone else has a more elegant solution (or a way to avoid the error).
EDIT: To be clear as to what the question is, I am looking for an answer that
a) specifies why this happens (so I can understand the nature of the error better, and therefore avoid it) and/or
b) the best workaround to deal with it.
One "workaround" would be to check out Rev. 743 or earlier, as it appears to me that this broke in Rev. 744.
svn checkout svn://svn.r-forge.r-project.org/svnroot/xts/#743
But, a much better idea is to file a bug report so that you don't have to use an old version forever. (also, of course, other bugs may have been patched and/or new features added since Rev. 743)

convert difftime time to years, months and days

How can I accurately convert the products (units is in days) of the difftime below to years, months and days?
difftime(Sys.time(),"1931-04-10")
difftime(Sys.time(),"2012-04-10")
This does years and days but how could I include months?
yd.conv<-function(days, print=TRUE){
x<-days*0.00273790700698851
x2<-floor(x)
x3<-x-x2
x4<-floor(x3*365.25)
if (print) cat(x2,"years &",x4,"days\n")
invisible(c(x2, x4))
}
yd.conv(difftime(Sys.time(),"1931-04-10"))
yd.conv(difftime(Sys.time(),"2012-04-10"))
I'm not sure how to even define months either. Would 4 weeks be considered a month or the passing of the same month day. So for the later definition of a month if the initial date was 2012-01-10 and the current 2012-05-31 then we'd have 0 years, 5 months and 21 days. This works well but what if the original date was on the 31st of the month and the end date was on feb 28 would this be considered a month?
As I wrote this question the question itself evolved so I'd better clarify:
What would be the best (most logical approach) to defining months and then how to find diff time in years, months and days?
If you're doing something like
difftime(Sys.time(), someDate)
It comes as implied that you must know what someDate is. In that case, you can convert this to a POSIXct class object that gives you the ability to extract temporal information directly (package chron offers more methods, too). For instance
as.POSIXct(c(difftime(Sys.time(), someDate, units = "sec")), origin = someDate)
This will return your desired date object. If you have a timezone tz to feed into difftime, you can also pass that directly to the tz parameter in as.POSIXct.
Now that you have your date object, you can run things like months(.) and if you have chron you can do years(.) and days(.) (returns ordered factor).
From here, you could do more simple math on the difference of years, months, and days separately (converting to appropriate numeric representations). Of course, convert someDate to POSIXct will be required.
EDIT: On second thought, months(.) returns a character representation of the month, so that may not be efficient. At least, it'll require a little processing (not too difficult) to give a numeric representation.
R has not implemented these features out of ignorance. difftime objects are transitive. A 700 day difference on any arbitrary start-date can yield a differing number of years depending on whether there was a leap year or not. Similarly for months, they take between 28-31 days.
For research purposes, we use these units a lot (months and years) and pragmatically, we define a year as 365.25 days and a month as 365.25/12 = 30.4375 days.
To do arithmetic on a given difftime, you must convert this value to numeric using as.numeric(difftime.obj) which is, in default, days so R stops spouting off the units.
You can not simply convert a difftime to month, since the definition of months depends on the absolute time at which the difftime has started.
You'll need to know the start date or the end date to accurately tell the number of months.
You could then, e.g., calculate the number of months in the first year of your timespan, the number of month in the last your of the timespan, and add the number of years between times 12.
Hmm. I think the most sensible would be to look at the various units themselves. So compare the day of the month first, then compare the month of the year, then compare the year. At each point, you can introduce a carry to avoid negative values.
In other words, don't work with the product of difftime, but recode your own difftime.

ISO 8601 Repeating Interval

Wikipedia gives an example of an ISO 8601 example of a repeating interval:
R5/2008-03-01T13:00:00Z/P1Y2M10DT2H30M
This is what this means:
R5 means that the interval after the slash is repeated 5 times.
2008-03-01T13:00:00Z means that the interval begins at this given datetime.
P1Y2M10DT2H30M means that the interval lasts for
1 year
2 months
10 days
2 hours
30 minutes
My problem is that I do not know exactly what is being repeated here. Does the repetition
occur immediately after the interval ends? Can I specify that every Monday something happens from 13:00 to 14:00?
The standard itself doesn't clarify, but the only obvious interpretation here is that the interval repeats back-to-back. So this recurring interval:
R2/2008-03-01T13:00:00Z/P1Y2M10DT2H30M
Will be equivalent to these non-recurring intervals:
2008-03-01T13:00:00Z/P1Y2M10DT2H30M
2009-05-01T15:30:00Z/P1Y2M10DT2H30M
(Note: my reading is that the number of repetitions does include the first occurrence)
There is no way to represent "every Monday from 13:00 to 14:00" inside of ISO 8601, but it's natural to do for a VEVENT in the iCalendar format. (If you could do that entirely within ISO 8601, then that would give rise to a slew of further feature requests)
Yes, ISO8601 does define a regular repeating interval (or as regular as a "month" can be as one of the units).
R5/2008-03-01T13:00:00Z/P1Y2M10DT2H30M
Should generate these times:
2009-05-11T15:30:00Z
2010-07-21T18:00:00Z
2011-10-01T20:30:00Z
2012-12-11T23:00:00Z
2014-02-22T00:30:00Z
It doesn't define a "start time" and "end time" like RFC5545 (iCalendar) does, or even irregular repetition like RRULE or crontab can.
You should be able to specify a weekly repetition using the ISO Week Date as a starting point, but you'll need separate repetitions for "start" and "end" times:
R/2021-W01-1T13:00:00Z/P1W
R/2021-W01-1T14:00:00Z/P1W
The first interval is for the start times: Mondays at 13:00 (starting in 2021), and the second is for the end times: Mondays at 14:00 (starting in 2021).
I'm probably being an idiot (Long Covid Brain) but isn't the obvious extension to ISO-8601 a second duration part? In the absence of the second duration, the repeats are back to back, in its presence what is actually repeating is a smaller duration event at the start of each period. e.g.
R/2021-W01-1T13:00:00Z/P1W/P1H
indefinite weekly repeat of hour long slots every Monday 1pm starting week 1 2021.
EDIT: Maybe you could even nest them ...
R/2021-W01-1T09:00:00Z/P1W/R5/P1D/P8H
Mon to Fri, 9am to 5pm, every week? Ok I'll get my coat

Resources