Is there a way in R to do calendar arithmetic, e.g.
> as.Date('2014-03-30') - months(1)
[1] 2014-02-28
except that in reality there is no months function that behaves like this. This can be done with awareness of leap years and daylight saving time in SQL and Java, but I can't find a way to do it in R. I thought I'd get clever and use seq, but no:
> seq(as.POSIXct('2014-03-30', tz='UTC'), by = '-1 months', length=2)[2]
[1] "2014-03-02 UTC"
Here is one way using RcppBDT, which wraps (parts of) Boost Date_Time for use by R:
R> library(RcppBDT)
R> dt <- new(bdtDt, 2014, 3, 30)
R> dt
[1] "2014-03-30"
R> dt$addMonths(-1)
R> dt
[1] "2014-02-28"
R>
This is too long for an organized comment, but months is a function in base R (it returns the month name), and using as.POSIXlt (as opposed to as.POSIXct) allows for easy extraction of date attributes.
test <- as.POSIXlt('2014-03-30', tz='UTC')
attributes(test)$names
test$mon
months(test)
Given that test$mon returns a numeric value, it would be easy to perform arithmetic on the months. However, subtracting 1 month from January just gives you -1 (Jan is 0), and redefining test$mon <- test$mon - 1 doesn't seem to be of much help.
Nonetheless, depending on your application, the above information may still be useful.
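Building on the $mon and $mday fields above, one possible base R sketch is to do the month arithmetic on a running month count and then clamp the day to the length of the target month. The helper name add_months and the clamping rule are assumptions made for illustration; they are not part of base R.

```r
# Hypothetical helper: shift a Date by n calendar months, clamping the day
# to the last valid day of the target month.
add_months <- function(date, n) {
  lt <- as.POSIXlt(date, tz = "UTC")
  # Running count of months, so that January - 1 wraps into the previous year
  total <- (lt$year + 1900) * 12 + lt$mon + n
  # First day of the target month and of the month after it
  first      <- as.Date(sprintf("%04d-%02d-01", total %/% 12, total %% 12 + 1))
  next_first <- as.Date(sprintf("%04d-%02d-01", (total + 1) %/% 12, (total + 1) %% 12 + 1))
  days_in_month <- as.integer(next_first - first)
  # Keep the original day of month unless the target month is too short
  first + pmin(lt$mday, days_in_month) - 1
}

add_months(as.Date("2014-03-30"), -1)   # 2014-02-28
add_months(as.Date("2014-01-15"), -1)   # 2013-12-15
```

This only handles Date input; time-of-day and daylight saving questions are deliberately left out of the sketch.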
Given an initial date, I want to generate a sequence of dates with monthly intervals, ensuring every element has the same day as the initial date or the last day of the month in case the same day would yield an invalid date.
Sounds pretty standard, right?
Using difftime is not possible. Here's what the help file of difftime says:
Units such as "months" are not possible as they are not of constant
length. To create intervals of months, quarters or years use seq.Date
or seq.POSIXt.
But then looking at the help file of seq.POSIXt I find that:
Using "month" first advances the month without changing the day: if
this results in an invalid day of the month, it is counted forward
into the next month: see the examples.
This is the example in the help file.
> seq(ISOdate(2000,1,31), by = "month", length.out = 4)
[1] "2000-01-31 12:00:00 GMT" "2000-03-02 12:00:00 GMT"
"2000-03-31 12:00:00 GMT" "2000-05-01 12:00:00 GMT"
So, given that the initial date is on day 31, this would yield invalid dates in February, April, etc. The sequence ends up skipping those months because it "counts forward" and lands on March 2 instead of February 29.
If I start on 2000-01-31, I would like the sequence as follows:
2000-01-31
2000-02-29
2000-03-31
2000-04-30
...
And it should properly handle leap-years, so if the initial date is 2015-01-31 the sequence should be:
2015-01-31
2015-02-28
2015-03-31
2015-04-30
...
These are just examples to illustrate the problem; I do not know the initial date in advance, nor can I assume anything about it. The initial date may well be in the middle of the month (2015-01-15), in which case seq works fine. But it can also be, as in the examples, towards the end of the month, on days for which using seq alone would be problematic (days 29, 30 and 31). Nor can I assume that the initial date is the last day of the month.
I have looked around trying to find a solution. In some questions here on SO there is a "trick" to get the last day of a month: take the first day of the next month and subtract 1. Finding the first day is "easy" because it is just day 1.
So my solution so far is:
# Given an initial date for my sequence
initial_date <- as.Date("2015-01-31")
# Find the first day of the month
library(magrittr) # to use pipes and make the code more readable
first_day_of_month <- initial_date %>%
  format("%Y-%m") %>%
  paste0("-01") %>%
  as.Date()
# Generate a sequence from the initial date, using seq
# This is the sequence that will have incorrect values in months that would
# have invalid dates
given_date_seq <- seq(initial_date, by = "month", length.out = 4)
# And then generate an auxiliary sequence for the last day of each month
# I do this by generating a sequence that starts on the first day of the
# same month as the initial date, goes one month further
# (length 5 instead of 4), and subtracting 1 from all the elements
last_day_seq <- seq(first_day_of_month, by = "month", length.out = 5) - 1
# And finally, for each pair of elements, I take the earlier of the two dates
pmin(given_date_seq, last_day_seq[2:5])
It works, but it is, at the same time, kinda dumb, hacky and convoluted. So I do not like it. And most importantly, I cannot believe there is no easier way to do this in R.
Can someone please point me to a simpler solution? (I guess it should have been as simple as seq(initial_date, "month", 4), but apparently it is not). I've googled it and looked here in SO and R mailing lists, but apart from the tricks I mentioned above, I couldn't find a solution.
The simplest solution is %m+% from lubridate, which solves this exact problem. So:
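library(lubridate)
seq_monthly <- function(from, length.out) {
  # Offsets 0, 1, ..., length.out - 1 months, rolled back to the month end when needed
  from %m+% months(seq_len(length.out) - 1)
}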
Output:
> seq_monthly(as.Date("2015-01-31"),length.out=4)
[1] "2015-01-31" "2015-02-28" "2015-03-31" "2015-04-30"
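If adding a package is not an option, the same sequence can be sketched in base R by stepping through the first day of each month (where seq never rolls over) and then clamping the day of month. The name seq_monthly_base is hypothetical, and this is only a sketch for Date input.

```r
# Hypothetical base R counterpart of seq_monthly(): monthly sequence that keeps
# the day of month of `from`, falling back to the month end when needed.
seq_monthly_base <- function(from, length.out) {
  # First day of each month in the sequence, plus one extra month at the end
  first  <- as.Date(format(from, "%Y-%m-01"))
  firsts <- seq(first, by = "month", length.out = length.out + 1)
  # Number of days in each target month
  days_in_month <- as.integer(diff(firsts))
  day <- as.integer(format(from, "%d"))
  firsts[seq_len(length.out)] + pmin(day, days_in_month) - 1
}

seq_monthly_base(as.Date("2015-01-31"), 4)
# "2015-01-31" "2015-02-28" "2015-03-31" "2015-04-30"
```

Starting from the first of the month sidesteps the "counted forward" behaviour of seq, because day 1 is valid in every month.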
Similar to the lubridate answer, here is one using RcppBDT (which wraps the Boost Date.Time library from C++)
R> dt <- new(bdtDt, 2010, 1, 31); for (i in 1:5) { dt$addMonths(i); print(dt) }
[1] "2010-02-28"
[1] "2010-04-30"
[1] "2010-07-31"
[1] "2010-11-30"
[1] "2011-04-30"
R> dt <- new(bdtDt, 2000, 1, 31); for (i in 1:5) { dt$addMonths(i); print(dt) }
[1] "2000-02-29"
[1] "2000-04-30"
[1] "2000-07-31"
[1] "2000-11-30"
[1] "2001-04-30"
R>
The title has it: how do you convert a POSIX date to day-of-year?
An alternative is to format the "POSIXt" object using strftime():
R> today <- Sys.time()
R> today
[1] "2012-10-19 19:12:04 BST"
R> doy <- strftime(today, format = "%j")
R> doy
[1] "293"
R> as.numeric(doy)
[1] 293
which is preferable to remembering that the day of the year is zero-based in the POSIX standard.
As ?POSIXlt reveals, a $yday suffix to a POSIXlt date (or even a vector of such) will convert to day of year. Beware that POSIX counts Jan 1 as day 0, so you might want to add 1 to the result.
It took me embarrassingly long to find this, so I thought I'd ask and answer my own question.
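As a small illustration of the $yday field and its zero-based counting, here is a sketch using fixed dates in 2012 (a leap year); the dates are chosen only for the example.

```r
d <- as.POSIXlt(c("2012-01-01", "2012-10-19", "2012-12-31"), tz = "UTC")
d$yday        # zero-based: 0 292 365
d$yday + 1    # one-based:  1 293 366
```

The one-based value for 2012-10-19 agrees with what format "%j" reports for that date.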
Alternatively, the excellent lubridate package provides the yday function, which is just a wrapper for the above method. It conveniently defines similar functions for other units (month, year, hour, ...).
library(lubridate)
today <- Sys.time()
yday(today)
I realize it isn't quite what the poster was looking for, but I needed to convert POSIX date-times into a fractional day of the year for time series analysis and ended up doing this:
today <- Sys.time()
doy2015f <- difftime(today, as.POSIXct("2015-01-01 00:00", tz = "GMT"), units = "days")
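The same idea can be sketched without hard-coding 2015, by measuring from midnight on 1 January of the timestamp's own year. The helper name frac_doy and its tz argument are assumptions for illustration; like the line above, the result is zero-based (midnight on 1 January gives 0).

```r
# Hypothetical helper: fractional day of the year, measured from midnight
# on 1 January of the timestamp's own year (in the given time zone).
frac_doy <- function(x, tz = "GMT") {
  lt <- as.POSIXlt(x, tz = tz)
  start <- as.POSIXct(sprintf("%d-01-01 00:00:00", lt$year + 1900), tz = tz)
  as.numeric(difftime(as.POSIXct(lt), start, units = "days"))
}

x <- as.POSIXct("2015-01-02 12:00:00", tz = "GMT")
frac_doy(x)   # 1.5
```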
The data.table package also provides a yday() function.
require(data.table)
today <- Sys.time()
yday(today)
This is how I do it:
as.POSIXlt(c("15.4", "10.5", "15.5", "10.6"), format = "%d.%m")$yday
# [1] 104 129 134 160
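One caveat with the call above: when the format has no year, strptime assumes the current year, so the result shifts by one after February in a leap year. A sketch that makes the year explicit (2015 and 2016 are picked here only as examples):

```r
x <- c("15.4", "10.5", "15.5", "10.6")
# Non-leap year
as.POSIXlt(paste0(x, ".2015"), format = "%d.%m.%Y", tz = "UTC")$yday
# [1] 104 129 134 160
# Leap year: every date after February is one day later
as.POSIXlt(paste0(x, ".2016"), format = "%d.%m.%Y", tz = "UTC")$yday
# [1] 105 130 135 161
```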
Here is what I think is a possible bug:
require(lubridate)
d = as.Date("1994-03-31")
> d + months(1)
[1] "1994-05-01"
My understanding is that this should return the end of month 4 of year 1994. Please advise if this is indeed a bug.
It is not a bug, it is a documented feature. It is definitely not a bug in lubridate, as months and as.Date are both base package functions. (Edit: months.numeric and months.integer are both non-visible functions from the lubridate package.)
However, lubridate does have an answer!
It is doing exactly as specified in the help file for %m+% (which is part of the lubridate package):
Adding months frustrates basic arithmetic because consecutive months have different lengths. With other elements, it is helpful for arithmetic to perform automatic roll over. For example, 12:00:00 + 61 seconds becomes 12:01:01. However, people often prefer that this behavior NOT occur with months. For example, we sometimes want January 31 + 1 month = February 28 and not March 3. Date %m+% months(n) always returns a date in the nth month after Date. If the new date would usually spill over into the n + 1th month, %m+% will return the last day of the nth month. Date %m-% months(n) always returns a date in the nth month before Date.
The function %m+% is designed to provide the behaviour you want: it ensures that the date does not roll over into the following month.
d %m+% months(1)
## [1] "1994-04-30"
Note that this feature was introduced in version 1.2.0 and is therefore not documented in http://www.jstatsoft.org/v40/i03/paper, which was written before it was implemented.
You could also use duration(1, 'months'), although a duration is a fixed length of time rather than a calendar month, so it will not land on the month end in general:
d + duration(1, 'months')
## [1] "1994-04-30"
How to convert between year,month,day and dates in R?
I know one can do this via strings, but I would prefer to avoid converting to strings, partly because there may be a performance hit, and partly because I worry about regionalization issues, where some of the world uses "year-month-day" and some uses "year-day-month".
It looks like ISOdate provides the direction year,month,day -> DateTime, although it first converts the numbers to a string, so if there is a way that doesn't go via a string, I would prefer it.
I couldn't find anything that goes the other way, from datetimes to numerical values. I would prefer not to need strsplit or things like that.
Edit: just to be clear, what I have is a data frame that looks like:
year month day hour somevalue
2004 1 1 1 1515353
2004 1 1 2 3513535
....
I want to be able to freely convert to this format:
time(hour units) somevalue
1 1515353
2 3513535
....
... and also be able to go back again.
Edit: to clear up some confusion about what 'time' (hour units) means, ultimately what I did was the following, using information from "How to find the difference between two dates in hours in R?":
forwards direction:
lh$time <- as.numeric( difftime(ISOdate(lh$year,lh$month,lh$day,lh$hour), ISOdate(2004,1,1,0), units="hours"))
lh$year <- NULL; lh$month <- NULL; lh$day <- NULL; lh$hour <- NULL
backwards direction:
... well, I didn't do backwards yet, but I imagine something like:
create difftime object out of lh$time (somehow...)
add ISOdate(2004,1,1,0) to difftime object
use one of the solution below to get the year,month,day, hour back
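The backwards steps listed above could be sketched as follows in base R. The hours vector stands in for lh$time and is made up for the example; the origin is the same ISOdate(2004,1,1,0) used in the forwards direction.

```r
origin <- ISOdate(2004, 1, 1, 0)     # same reference point as the forwards direction
hours  <- c(1, 2, 25)                # example values standing in for lh$time
stamp  <- origin + hours * 3600      # POSIXct arithmetic works in seconds
lt     <- as.POSIXlt(stamp, tz = "GMT")
# Undo the POSIXlt offsets: years count from 1900, months from 0
back <- data.frame(year  = lt$year + 1900,
                   month = lt$mon + 1,
                   day   = lt$mday,
                   hour  = lt$hour)
back
```

No string handling is involved here beyond what ISOdate does internally when building the origin.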
I suppose in the future, I could ask the exact problem I'm trying to solve, but I was trying to factorize my specific problem into generic reusable questions, but maybe that was a mistake?
Because there are so many ways in which a date can be passed in from files, databases etc and for the reason you mention of just being written in different orders or with different separators, representing the inputted date as a character string is a convenient and useful solution. R doesn't hold the actual dates as strings and you don't need to process them as strings to work with them.
Internally R is using the operating system to do these things in a standard way. You don't need to manipulate strings at all - just perhaps convert some things from character to their numerical equivalent. For example, it is quite easy to wrap up both operations (forwards and backwards) in simple functions you can deploy.
toDate <- function(year, month, day) {
ISOdate(year, month, day)
}
toNumerics <- function(Date) {
stopifnot(inherits(Date, c("Date", "POSIXt")))
day <- as.numeric(strftime(Date, format = "%d"))
month <- as.numeric(strftime(Date, format = "%m"))
year <- as.numeric(strftime(Date, format = "%Y"))
list(year = year, month = month, day = day)
}
I forgo a single call to strptime() and subsequent splitting on a separator character because you don't like that kind of manipulation.
> toDate(2004, 12, 21)
[1] "2004-12-21 12:00:00 GMT"
> toNumerics(toDate(2004, 12, 21))
$year
[1] 2004
$month
[1] 12
$day
[1] 21
Internally R's datetime code works well and is well tested and robust if a bit complex in places because of timezone issues etc. I find the idiom used in toNumerics() more intuitive than having a date time as a list and remembering which elements are 0-based. Building on the functionality provided would seem easier than trying to avoid string conversions etc.
I'm a bit late to the party, but one other way to convert from integers to date is the lubridate::make_date function. See the example below from R for Data Science:
library(lubridate)
library(nycflights13)
library(tidyverse)
a <- flights %>%
mutate(date = make_date(year, month, day))
Found one solution for going from date to year,month,day.
Let's say we have a date object, that we'll create here using ISOdate:
somedate <- ISOdate(2004,12,21)
Then, we can get the numerical components of this as follows:
unclass(as.POSIXlt(somedate))
Gives:
$sec
[1] 0
$min
[1] 0
$hour
[1] 12
$mday
[1] 21
$mon
[1] 11
$year
[1] 104
Then one can get what one wants for example:
unclass(as.POSIXlt(somedate))$mon
Note that $year is [actual year] - 1900, $mon is 0-based, and $mday is 1-based (as per the POSIX standard).