Comparing POSIXct dates in R: best practice?

I am seeking some insight into the behavior of the code below. I create a new POSIXct variable by formatting an existing one as a string in a specific time zone, then parsing that string back into POSIXct using the same time zone.
eventTime1Converted <- as.POSIXct(format(eventTime1, tz = "GMT", usetz = TRUE), tz = "GMT")
Strangely, these two variables don't end up being equal:
> eventTime1Converted == eventTime1
[1] FALSE
In particular my 'GMT' timestamped variable seems to be 'less than' my 'EST' timestamped variable (the original variable). So it seems like when the numeric portion is equal, the timezones are then compared? If this is the case, what's the 'correct' way to check equality on two POSIXct variables? To compare their as.numeric values?
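One possible explanation, sketched under the assumption that eventTime1 carries fractional seconds (the question does not show its value): == on POSIXct compares the underlying number of seconds since the epoch and ignores the time zone attribute, while format() drops fractional seconds by default, so the round trip through a string can come back slightly smaller. The timestamp below is hypothetical.

```r
# Hypothetical timestamp with a fractional second; not the asker's actual data
eventTime1 <- as.POSIXct(1500000000.25, origin = "1970-01-01", tz = "EST")

# Round trip through a formatted string, as in the question
eventTime1Converted <- as.POSIXct(format(eventTime1, tz = "GMT", usetz = TRUE), tz = "GMT")

# The time zone attribute plays no part in ==; only the stored seconds count
same_instant <- as.POSIXct("2020-01-01 12:00:00", tz = "GMT") ==
  as.POSIXct("2020-01-01 07:00:00", tz = "EST")

# Seconds lost in the round trip (the fractional part)
lost <- as.numeric(eventTime1) - as.numeric(eventTime1Converted)

# One tolerant comparison: treat values less than one second apart as equal
close_enough <- abs(as.numeric(difftime(eventTime1, eventTime1Converted, units = "secs"))) < 1
```

If this is what is happening, comparing with an explicit tolerance (or rounding both sides to whole seconds first) would be more robust than a bare ==.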

Related

Creating multiple POSIXlt dates with multiple timezones in R

Well, first things first, I'm still a noob and am learning R. I have a dataset with 0.9 million rows and 36 columns. One of these columns, let's say DATE, has dates in string format, and another column, let's say TZ, has time zones as strings too.
What I'm trying to do is combine these two columns into one of type POSIXlt, which holds the date, time and time zone. Here's my code for trying to get a vector of all the converted dates:
# Let's suppose my data exist in a variable "data" with dates in "DATE" column and timezones in "TZ"
indices <- NULL
dates <- NULL
zones <- unique (data$TZ)
for(i in seq_along(zones)){
indices <<- which(data$TZ==zones[i])
dates <<- c(dates, as.POSIXlt(data$DATE[indices], format = "%m/%d/%Y %H:%M:%S", tz = zones[i]))
}
Now, although there are ~1 million observations, it seems to do the job in 3-4 seconds. Only, that it "seems" to. The result I get is a list with NAs.
It does work when I try to convert a group individually, i.e., store result for every iteration in a different variable, or not run a for loop and do each iteration manually, storing each result in a different variable and, in the end, concatenate it all using c() function.
What am I doing wrong?
For anyone who might stumble here, I figured it out.
You can't use c() on a POSIXlt object, as it will convert it into the local time zone. (Not the reason for the NAs, but it's helpful to know.)
POSIXlt is stored as a list of components such as mday, zone, etc., which makes it awkward to store in a data frame column. Instead of POSIXlt, we can use POSIXct, as that is internally represented as seconds since 1970-01-01.
Since we'll be replacing a data frame column with dates, it's easier to convert each result into a tibble using dplyr::as_tibble() and then combine the different results with rbind().
As far as I can tell, the NAs were introduced by lexical scoping in R. I used dates <<- c(dates, as.POSIXlt(data$DATE[indices], format = "%m/%d/%Y %H:%M:%S", tz = zones[i])), due to which the value of i in zones[i] was NA or unknown.
So, the correct working code is:
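library(dplyr)

# Sorted so the stacked results line up with arrange(data, TZ) below
# (assuming sort() and arrange() order the zone names the same way)
zones <- sort(unique(data$TZ))
dates <- NULL
for (i in seq_along(zones)) {
  indices <- which(data$TZ == zones[i])
  # The format string must match how the date strings are actually written
  dts <- as.POSIXct(data$BGN_DATE[indices], format = "%m/%d/%Y %H%M", tz = zones[i])
  # Plain <- is enough at top level; <<- is not needed here
  dates <- rbind(dates, as_tibble(dts))
}
# Further, to combine the dates into the data frame
data <- arrange(data, TZ) %>% mutate(DATEandTime = dates$value) %>% select(-c("DATE", "TZ"))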
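A further point that may explain the "list" result in the question, offered as a sketch rather than a confirmed diagnosis: c() dispatches on its first argument, and when that argument is NULL the default method runs and flattens the POSIXlt object into its list components. The example below uses a small hypothetical data frame with the question's column names (DATE, TZ) and its original format string, and fills a pre-allocated POSIXct vector by logical index instead of growing it with c() or rbind(). This keeps the original row order, so no sorting is needed afterwards.

```r
# Hypothetical sample data; column names follow the question
data <- data.frame(
  DATE = c("01/02/2020 03:04:05", "06/07/2020 08:09:10", "11/12/2020 13:14:15"),
  TZ   = c("UTC", "America/New_York", "UTC"),
  stringsAsFactors = FALSE
)

# NULL has no class, so c() falls back to its default method and the
# POSIXlt object is flattened into a plain list of its components
flat <- c(NULL, as.POSIXlt(data$DATE[1], format = "%m/%d/%Y %H:%M:%S", tz = "UTC"))
is.list(flat)
inherits(flat, "POSIXlt")

# Pre-allocate a POSIXct vector, then fill it one time zone group at a time
dates <- as.POSIXct(rep(NA_real_, nrow(data)), origin = "1970-01-01", tz = "UTC")
for (z in unique(data$TZ)) {
  idx <- data$TZ == z
  dates[idx] <- as.POSIXct(data$DATE[idx], format = "%m/%d/%Y %H:%M:%S", tz = z)
}

# Each element is the correct instant; the vector as a whole displays in UTC
data$DATEandTime <- dates
```

Note that a single POSIXct vector has only one display time zone, so the per-row zone is used for parsing but not preserved for printing.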

Formatting dates using a condition

I am using "R" to format a character variable that has two different kinds of date formats (MM-DD-YYYY & YYYY-MM-DD). The second is an excel origin date.
DateVar <- c("12-07-2017", "43229", "43137", "03-27-2018")
I created a vector using grepl to identify both types and then a for loop to apply the as.date function to only the "excel origin dates".
indicator <- !grepl("-", DateVar)
for(i in indicator == TRUE){
as.date(DateVar, origin = "1899-12-30")
It is not working for me however, so I am looking if someone can point me in the right direction.
Thanks.
Couple of things: The for loop is unnecessary - just subset DateVar with [indicator]. Second, it's as.Date, not as.date (note the "D"). Third, since it's a character vector, you need to pass the origin numbers through as.integer for as.Date to be able to work with them:
as.Date(as.integer(DateVar[indicator]), origin = "1899-12-30")
Or, without the intervening indicator assignment:
as.Date(as.integer(DateVar[!grepl("-",DateVar)]), origin = "1899-12-30")
[1] "2018-05-09" "2018-02-06"
If you wish to put these dates back into DateVar, subset with [indicator] again on the left-hand side:
DateVar[indicator]<-format(as.Date(as.integer(DateVar[indicator]), origin = "1899-12-30"), "%m-%d-%Y")
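If the goal is a single Date vector rather than a character vector, the same subsetting idea can be sketched as follows. The names is_serial and out are introduced here for illustration, and the dashed strings are assumed to be MM-DD-YYYY, as in the sample data.

```r
DateVar <- c("12-07-2017", "43229", "43137", "03-27-2018")

# TRUE for Excel serial numbers, FALSE for dashed date strings
is_serial <- !grepl("-", DateVar)

# Start with an all-NA Date vector and fill each kind separately
out <- rep(as.Date(NA), length(DateVar))
out[is_serial]  <- as.Date(as.integer(DateVar[is_serial]), origin = "1899-12-30")
out[!is_serial] <- as.Date(DateVar[!is_serial], format = "%m-%d-%Y")

out
# [1] "2017-12-07" "2018-05-09" "2018-02-06" "2018-03-27"
```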

How to convert integer to date format in R?

I am trying to convert integer data from my data frame in R, to date format.
The data is under column named svcg_cycle within orig_svcg_filtered data frame.
The original data looking something like 200502, 200503, and so forth, and I expect to turn it into yyyy-mm-dd format.
I am trying to use this code:
as.Date(orig_svcg_filtered$svcg_cycle, origin = "2000-01-01")
but the output is not something that I expected:
[1] "2548-12-15" "2548-12-15" "2548-12-15" "2548-12-15" "2548-12-15"
while it is supposed to be 2005-02-01, 2005-03-01, and so forth.
How to solve this?
If you have
x <- c(200502, 200503)
Then
as.Date(x, origin = "2000-01-01")
tells R you want the days 200,502 and 200,503 days after 2000-01-01. From help("as.Date"):
as.Date will accept numeric data (the number of days since an epoch),
but only if origin is supplied.
So, integer data gives days after the origin supplied, not some sort of numeric code for the dates like 200502 for "2005-02-01".
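A quick illustration of that day-count behavior, using small numbers so the arithmetic is easy to follow (the variable names are just for this sketch):

```r
# Numeric input is read as a number of days after the origin
zero_days  <- as.Date(0,  origin = "2000-01-01")   # the origin itself
one_month  <- as.Date(31, origin = "2000-01-01")   # 31 days later
far_future <- as.Date(200502, origin = "2000-01-01")

format(zero_days)    # "2000-01-01"
format(one_month)    # "2000-02-01"
format(far_future)   # "2548-12-15", the value reported in the question
```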
What you want is
as.Date(paste(substr(x, 1, 4), substr(x, 5, 6), "01", sep = "-"))
# [1] "2005-02-01" "2005-03-01"
The
paste(substr(x, 1, 4), substr(x, 5, 6), "01", sep = "-")
part takes your integers and creates strings like
# [1] "2005-02-01" "2005-03-01"
Then as.Date() knows how to deal with them.
You could alternatively do something like
as.Date(paste0(x, "01"), format = "%Y%m%d")
# [1] "2005-02-01" "2005-03-01"
This just pastes on an "01" to each element (for the day), converts to character, and tells as.Date() what format to read the date into. (See help("as.Date") and help("strptime")).
I like to use regex to fix these kinds of string formatting issues. By default, as.Date only checks a few standard date formats such as YYYY-MM-DD. origin is used when you have a numeric date (i.e. a number of days from some reference point), but in this case your value is not a numeric date; it's just a date written as a string of digits.
We simply split the year and month with a dash and add a day, in this case the first of the month, to make it a valid date (you must have a day to store it as a date object in R). The regex captures the first four digits in group one and the final two digits in group two. We then combine the two groups, separated by dashes, along with the day.
as.Date(gsub("^(\\d{4})(\\d{2})", "\\1-\\2-01", x))
[1] "2005-02-01" "2005-03-01"
You don't need to specify format in this case, because YYYY-MM-DD is one of the standard formats as.Date checks. If you did want to specify it, the argument would be format = "%Y-%m-%d".

R apply function returns numeric value on date variables

I have an R data frame which contains a sequence of dates. I want to create a data frame from the existing one which consists of the dates one month earlier.
For example let x be the initial dataframe
x = data.frame(dt = c("28/02/2000","29/02/2000","1/03/2000","02/03/2000"))
My required dataframe y would be
y = c("28/01/2000","29/01/2000","1/02/2000","02/02/2000")
The list is quite big, so I don't want to loop. I have created an inline function which works fine when I give it individual dates.
datefun <- function(x) seq(as.Date(strptime(x,format = "%d/%m/%Y")), length =2, by = "-1 month")[2]
datefun("28/02/2000") gives "28/01/2000" as an output
But while I use it inside R apply it gives random numerical values.
apply(x,1,function(x) datefun(x))
The output for this is
[1] 10984 10985 10988 10989
I don't know where these numbers are coming from. Am I missing something?
You should not use apply here: it coerces the data frame to a matrix and simplifies the results into a plain vector, which drops the Date class. Matrices in R cannot store values of class Date. Use lapply instead, which returns a list of results. These results can be combined with Reduce and c to create a Date vector.
Reduce(c, lapply(x$dt, datefun))
# [1] "2000-01-28" "2000-01-29" "2000-02-01" "2000-02-02"
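A self-contained version of the same idea, for anyone who wants to run it end to end. This sketch swaps in do.call(c, ...) for Reduce(c, ...), which gives the same Date vector here, and it parses with as.Date(d, format = ...) directly; as.character() guards against dt being a factor in older R versions.

```r
x <- data.frame(dt = c("28/02/2000", "29/02/2000", "1/03/2000", "02/03/2000"))

# Step one month back from a single date string
datefun <- function(d) {
  seq(as.Date(d, format = "%d/%m/%Y"), length.out = 2, by = "-1 month")[2]
}

# lapply keeps each result as a Date; c() on Dates keeps the class
y <- do.call(c, lapply(as.character(x$dt), datefun))

class(y)
format(y, "%d/%m/%Y")   # back to the day/month/year layout used in the question
```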
I believe that R internally stores your dates as the number of days elapsed since the UNIX epoch, which is January 1, 1970. You can easily view your updated dates as readable strings using as.Date with an appropriate origin, e.g.
y <- apply(x,1,function(x) datefun(x))
as.Date(y, origin='1970-01-01')
[1] "2000-01-28" "2000-01-29" "2000-02-01" "2000-02-02"
The gist here is that the numerical output you saw perhaps misled you into thinking that your date information were somehow lost. To the contrary, the dates are stored in a numerical format, and it is up to you to tell R how you want to view that information as dates.
You could also skip your function with lubridate:
require(lubridate)
format(dmy(x$dt) %m+% months(-1),"%d/%m/%Y")

How to avoid date formatted values getting converted to numeric when assigned to a matrix or data frame?

I have run into an issue I do not understand, and I have not been able to find an answer to this issue on this website (I keep running into answers about how to convert dates to numeric or vice versa, but that is exactly what I do not want to know).
The issue is that R converts values that are formatted as a date (for instance "20-09-1992") to numeric values when you assign them to a matrix or data frame.
For example, we have "20-09-1992" with a date format, we have checked this using class().
as.Date("20-09-1992", format = "%d-%m-%Y")
class(as.Date("20-09-1992", format = "%d-%m-%Y"))
We now assign this value to a matrix, imaginatively called Matrix:
Matrix <- matrix(NA,1,1)
Matrix[1,1] <- as.Date("20-09-1992", format = "%d-%m-%Y")
Matrix[1,1]
class(Matrix[1,1])
Suddenly the previously date formatted "20-09-1992" has become a numeric with the value 8298. I don't want a numeric with the value 8298, I want a date that looks like "20-09-1992" in date format.
So I was wondering whether this is simply how R works, and we are not allowed to assign dates to matrices and data frames (somehow I have managed to have dates in other matrices/data frames, but it beats me why those other times were different)? Is there a special method to assigning dates to data frames and matrices that I have missed and have failed to deduce from previous (somehow successful) attempts at assigning dates to data frames/matrices?
I don't think you can store dates in a matrix. Use a data frame or data table. If you must store dates in a matrix, you can use a matrix of lists.
Matrix <- matrix(NA,1,1)
Matrix[1,1] <- as.list(as.Date("20-09-1992", format = "%d-%m-%Y"))
Matrix[1,1]
[[1]]
[1] "1992-09-20"
Edited: I also just re-read you had this issue with data frame. I'm not sure why.
mydate<-as.Date("20-09-1992", format = "%d-%m-%Y")
mydf<-data.frame(mydate)
mydf
mydate
1 1992-09-20
Edited: This has been a learning experience for me with R and dates. Apparently the date you supplied was converted to the number of days since the origin, which is defined as January 1, 1970. To convert this back to a date at some point:
Matrix
[,1]
[1,] 8298
as.Date(Matrix, origin ="1970-01-01")
[1] "1992-09-20"
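To make the contrast concrete, here is a minimal sketch of the three behaviors discussed above: a matrix cell drops the Date class, a data frame column keeps it, and a character matrix can hold the formatted text if a matrix is really required. The names m, df and m_chr are introduced here for illustration.

```r
d <- as.Date("20-09-1992", format = "%d-%m-%Y")

# A matrix holds one atomic type and no per-element class, so only the day count survives
m <- matrix(NA, 1, 1)
m[1, 1] <- d
class(m[1, 1])   # "numeric"
m[1, 1]          # 8298

# A data frame column is its own vector and keeps the Date class
df <- data.frame(when = as.Date(NA))
df$when[1] <- d
class(df$when)   # "Date"

# If a matrix is required, store the formatted text instead
m_chr <- matrix(format(d, "%d-%m-%Y"), 1, 1)
m_chr[1, 1]      # "20-09-1992"
```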
Try the following: first specify your date vector, and then use
rownames(mat) <- as.character(date_vector)
The dates will then appear as text.
This mostly happens when loading an Excel workbook.
You need to add detectDates = TRUE in the read.xlsx call (from the openxlsx package):
DataFrame <- read.xlsx("File_Name", sheet = 3, detectDates = TRUE)
