How to convert in both directions between year,month,day and dates in R? - r

How to convert between year,month,day and dates in R?
I know one can do this via strings, but I would prefer to avoid converting to strings, partly because maybe there is a performance hit?, and partly because I worry about regionalization issues, where some of the world uses "year-month-day" and some uses "year-day-month".
It looks like ISODate provides the direction year,month,day -> DateTime , although it does first converts the number to a string, so if there is a way that doesn't go via a string then I prefer.
I couldn't find anything that goes the other way, from datetimes to numerical values? I would prefer not needing to use strsplit or things like that.
Edit: just to be clear, what I have is, a data frame which looks like:
year month day hour somevalue
2004 1 1 1 1515353
2004 1 1 2 3513535
....
I want to be able to freely convert to this format:
time(hour units) somevalue
1 1515353
2 3513535
....
... and also be able to go back again.
Edit: to clear up some confusion on what 'time' (hour units) means, ultimately what I did was, and using information from How to find the difference between two dates in hours in R?:
forwards direction:
lh$time <- as.numeric( difftime(ISOdate(lh$year,lh$month,lh$day,lh$hour), ISOdate(2004,1,1,0), units="hours"))
lh$year <- NULL; lh$month <- NULL; lh$day <- NULL; lh$hour <- NULL
backwards direction:
... well, I didnt do backwards yet, but I imagine something like:
create difftime object out of lh$time (somehow...)
add ISOdate(2004,1,1,0) to difftime object
use one of the solution below to get the year,month,day, hour back
I suppose in the future, I could ask the exact problem I'm trying to solve, but I was trying to factorize my specific problem into generic reusable questions, but maybe that was a mistake?

Because there are so many ways in which a date can be passed in from files, databases etc and for the reason you mention of just being written in different orders or with different separators, representing the inputted date as a character string is a convenient and useful solution. R doesn't hold the actual dates as strings and you don't need to process them as strings to work with them.
Internally R is using the operating system to do these things in a standard way. You don't need to manipulate strings at all - just perhaps convert some things from character to their numerical equivalent. For example, it is quite easy to wrap up both operations (forwards and backwards) in simple functions you can deploy.
toDate <- function(year, month, day) {
ISOdate(year, month, day)
}
toNumerics <- function(Date) {
stopifnot(inherits(Date, c("Date", "POSIXt")))
day <- as.numeric(strftime(Date, format = "%d"))
month <- as.numeric(strftime(Date, format = "%m"))
year <- as.numeric(strftime(Date, format = "%Y"))
list(year = year, month = month, day = day)
}
I forego the a single call to strptime() and subsequent splitting on a separation character because you don't like that kind of manipulation.
> toDate(2004, 12, 21)
[1] "2004-12-21 12:00:00 GMT"
> toNumerics(toDate(2004, 12, 21))
$year
[1] 2004
$month
[1] 12
$day
[1] 21
Internally R's datetime code works well and is well tested and robust if a bit complex in places because of timezone issues etc. I find the idiom used in toNumerics() more intuitive than having a date time as a list and remembering which elements are 0-based. Building on the functionality provided would seem easier than trying to avoid string conversions etc.

I'm a bit late to the party, but one other way to convert from integers to date is the lubridate::make_date function. See the example below from R for Data Science:
library(lubridate)
library(nycflights13)
library(tidyverse)
a <- flights %>%
mutate(date = make_date(year, month, day))

Found one solution for going from date to year,month,day.
Let's say we have a date object, that we'll create here using ISOdate:
somedate <- ISOdate(2004,12,21)
Then, we can get the numerical components of this as follows:
unclass(as.POSIXlt(somedate))
Gives:
$sec
[1] 0
$min
[1] 0
$hour
[1] 12
$mday
[1] 21
$mon
[1] 11
$year
[1] 104
Then one can get what one wants for example:
unclass(as.POSIXlt(somedate))$mon
Note that $year is [actual year] - 1900, month is 0-based, mday is 1-based (as per the POSIX standard)

Related

R date variable operation

I have a string/character variable contains a calendar date, eg,
x <- "2018-10-31"
I also have a variable y contains time, say 200 days.
y <- 200
How do I find out the calendar date for x + y?
I am not familiar with date type in R and struggle with how to approach this.
An add-on question, would this calculation be different if y = 4.3 months? Of course I can convert this into days, though wonder if there is more direct way to handle months without converting.
You could utilise the lubridate package, which is specifically designed for handling date time data.
library(lubridate)
x <- ymd("2018-10-31")
x + days(200)
[1] "2019-05-19"
lubridate works with 'period' objects, which require integers, so you would need to convert "4.3" months into something interpretable beforehand. "4.3" doesn't mean anything concrete in terms of date-time calculation anyways.

single digit month format

I have a date variable that includes month and year where the month is a single digit (e.g. this month/year is '62017', but October is '102017'). I need the final format to be '6/1/2017'. I have tried using as.Date to convert but it will not work as %m requires two digits.
My workaround is to add a leading zero to dates that did not start with '102' (October), '112' (November), or '122' (December). I also have a few NA that I have to ignore. Code:
index <- substr(ll$Maturity.Date,1,3) != 102 & substr(ll$Maturity.Date,1,3) != 112 & substr(ll$Maturity.Date,1,3) != 122 & !is.na(ll$Maturity.Date)
ll$Maturity.Date[index] <- paste0(0,ll$Maturity.Date[index])
From here, I can convert to other formats as needed. However, I want to know if there is a better way to do this aside from hard coding as this code will break when using historical data in the 90's or data in the next century, both of which are future possibilities.
It is probably easiest to use sprintf to pad the 0s. Here is one solution:
sprintf("%06.0f", as.numeric(temp))
[1] "062017" "102017"
Then combine this with paste0 to add the day (1) and as.Date to get
as.Date(paste0(sprintf("%06.0f", as.numeric(temp)),"-1"), "%m%Y-%d")
[1] "2017-06-01" "2017-10-01"
data
temp <- c("62017", "102017")

Creating a specific sequence of date/times in R

I want to create a single column with a sequence of date/time increasing every hour for one year or one month (for example). I was using a code like this to generate this sequence:
start.date<-"2012-01-15"
start.time<-"00:00:00"
interval<-60 # 60 minutes
increment.mins<-interval*60
x<-paste(start.date,start.time)
for(i in 1:365){
print(strptime(x, "%Y-%m-%d %H:%M:%S")+i*increment.mins)
}
However, I am not sure how to specify the range of the sequence of dates and hours. Also, I have been having problems dealing with the first hour "00:00:00"? Not sure what is the best way to specify the length of the date/time sequence for a month, year, etc? Any suggestion will be appreciated.
I would strongly recommend you to use the POSIXct datatype. This way you can use seq without any problems and use those data however you want.
start <- as.POSIXct("2012-01-15")
interval <- 60
end <- start + as.difftime(1, units="days")
seq(from=start, by=interval*60, to=end)
Now you can do whatever you want with your vector of timestamps.
Try this. mondate is very clever about advancing by a month. For example, it will advance the last day of Jan to last day of Feb whereas other date/time classes tend to overshoot into Mar. chron does not use time zones so you can't get the time zone bugs that code as you can using POSIXct. Here x is from the question.
library(chron)
library(mondate)
start.time.num <- as.numeric(as.chron(x))
# +1 means one month. Use +12 if you want one year.
end.time.num <- as.numeric(as.chron(paste(mondate(x)+1, start.time)))
# 1/24 means one hour. Change as needed.
hours <- as.chron(seq(start.time.num, end.time.num, 1/24))

Converting time format to numeric with R

In most cases, we convert numeric time to POSIXct format using R. However, if we want to compare two time points, then we would prefer the numeric time format. For example, I have a date format like "2001-03-13 10:31:00",
begin <- "2001-03-13 10:31:00"
Using R, I want to covert this into a numeric (e.g., the Julian time), perhaps something like the passing seconds between 1970-01-01 00:00:00 and 2001-03-13 10:31:00.
Do you have any suggestions?
The Julian calendar began in 45 BC (709 AUC) as a reform of the Roman calendar by Julius Caesar. It was chosen after consultation with the astronomer Sosigenes of Alexandria and was probably designed to approximate the tropical year (known at least since Hipparchus). see http://en.wikipedia.org/wiki/Julian_calendar
If you just want to remove ":" , " ", and "-" from a character vector then this will suffice:
end <- gsub("[: -]", "" , begin, perl=TRUE)
#> end
#[1] "20010313103100"
You should read the section about 1/4 of the way down in ?regex about character classes. Since the "-" is special in that context as a range operator, it needs to be placed first or last.
After your edit then the answer is clearly what #joran wrote, except that you would need first to convert to a DateTime class:
as.numeric(as.POSIXct(begin))
#[1] 984497460
The other point to make is that comparison operators do work for Date and DateTime classed variables, so the conversion may not be necessary at all. This compares 'begin' to a time one second later and correctly reports that begin is earlier:
as.POSIXct(begin) < as.POSIXct(begin) +1
#[1] TRUE
Based on the revised question this should do what you want:
begin <- "2001-03-13 10:31:00"
as.numeric(as.POSIXct(begin))
The result is a unix timestamp, the number of seconds since epoch, assuming the timestamp is in the local time zone.
Maybe this could also work:
library(lubridate)
...
df <- '24:00:00'
as.numeric(hms(df))
hms() will convert your data from one time format into another, this will let you convert it into seconds. See full documentation.
I tried this because i had trouble with data which was in that format but over 24 hours.
The example from ?as.POSIX help gives
as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S"))
so for you it would be
as.numeric(as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S")))

Can data.table functions manipulate date and time columns on the fly?

I have started using data.table. Indeed it is very fast and quite nice syntax. I am having trouble with dates. I like to use lubridate. In many of my data sets I have dates or dates and times and have used lubridate to manipulate them. Lubridate stores the instant as a POSIX class. I have seen answers here that create new variables for instance just to get the year eg. 2005. I do not like that. There are times that I will be analyzing by year and other times by quarter and other times by month and other times by durations. I would like to do something simple such as this
mydatatable[,length(medical.record.number),by=year(date.of.service)]
that should give me the number of patient encounters in a given year. The by function is not working.
Error in names(byval) = as.character(bysuborig) :
'names' attribute [2] must be the same length as the vector [1]
Can you please point me to vignettes where data.tables is used with dates and where manipulations and categorizations of those dates are done on the fly.
This uses one of the examples in the help(IDateTime) page. It shows that you canc hange to syntax for the by=argument to a character value in the form " = " or (after #Matthew Dowle's comment below) you can try to use the functional form that you were using (although I have not been able to get it to work myself. I did get the preferred form: by=list(wday=wday(idate)) to work.) Note that the key creation assumes an IDateTime class since there is no idate or itime variable. Those are attributes of the class
datetime <- seq(as.POSIXct("2001-01-01"), as.POSIXct("2001-01-03"), by = "5 hour")
(af <- data.table(IDateTime(datetime), a = rep(1:2, 5), key = "a,idate,itime"))
af[, length(a), by = "wday = wday(idate)"]
wday V1
[1,] 2 4
[2,] 3 5
[3,] 4 1

Resources