SAS to R datetime conversion - r

SAS documentation states the following for date and datetime values:
SAS time value: a value representing the number of seconds since midnight of the current day. SAS time values are between 0 and 86400.
SAS datetime value: a value representing the number of seconds between January 1, 1960 and an hour/minute/second within a specified date.
I want to convert the following date and hour values with R, but I am unsure about the hour (datetime) conversion: which of the "HH:MM:SS" values in R_hour1 and R_hour2 is correct?
I have two separate columns in my table, SAS date = 20562 and SAS hour = 143659.
R: R_date <- as.Date(as.integer(20562), origin="1960-01-01"); R_date
[1] "2016-04-18"
R: R_hour1 <- as.POSIXct(143659, origin = R_date); R_hour1
[1] "2016-04-19 17:54:19 CEST"
R: R_hour2 <- as.POSIXct(143659, origin = "1960-01-01"); R_hour2
[1] "1960-01-02 16:54:19 CET"

Similar to R, SAS Date and DateTime values can have whatever origin you wish them to. The default formats have a default (1/1/1960 for both), but you can use the datetime field to mean any origin you wish, and it will generally still work perfectly well with any of the datetime functions (though it will not display properly unless you write a custom format). It is very possible to have a different origin, as you show above with R_hour1.
As such, you would have to ask the person who generated the data what the meaning of the field is and what its origin should be.
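One thing worth checking before asking: 143659 is larger than 86400, so by the definition quoted in the question it cannot be a plain SAS time value. A possible reading (an assumption, not something the data confirms) is that the column holds a clock time packed as HHMMSS digits, i.e. 14:36:59. The sketch below compares that reading with the "seconds since 1960-01-01" reading; the variable names are made up for illustration and UTC is used only to keep the output independent of the local time zone.

```r
sas_date <- 20562
sas_hour <- 143659

# SAS dates count days from 1960-01-01
r_date <- as.Date(sas_date, origin = "1960-01-01")

# Reading A (assumption): sas_hour is a clock time packed as HHMMSS digits
hhmmss <- sprintf("%06d", as.integer(sas_hour))
reading_a <- as.POSIXct(paste(r_date, hhmmss),
                        format = "%Y-%m-%d %H%M%S", tz = "UTC")

# Reading B: sas_hour is a count of seconds from the default SAS origin
reading_b <- as.POSIXct(sas_hour, origin = "1960-01-01", tz = "UTC")

format(reading_a, "%Y-%m-%d %H:%M:%S")  # "2016-04-18 14:36:59"
format(reading_b, "%Y-%m-%d %H:%M:%S")  # "1960-01-02 15:54:19"
```

Only the data owner can say which reading is intended, but reading A at least yields a time of day on the same date as the SAS date column.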

Related

How do I stop implicit date conversion when using ifelse with date time data? [duplicate]

This question already has answers here:
How to prevent ifelse() from turning Date objects into numeric objects
I have a data frame that contains one column that is a series of dates, collected via a Google form. The date and time were collected separately. The date was entered by selecting a day from a calendar, and the time was entered manually. It should have been on a 24-hour clock, but the field appears to have checked only that the hour and minute were in the correct range.
I've read the file in from .csv. I converted the date time character field (as read in from the .csv) to a date time format in a new variable by using as.POSIXct(foo$When, tz="NZ", format="%Y-%m-%d %H:%M"). The dates and times were correctly constructed.
Except: I have some incorrect date/time entries in the original data. These have all been set to NA in the new field, as you would expect. For those that do include a time, I have been trying to fix them while still retaining a POSIXct format.
I have been unsuccessful.
Here is an example of the data I have, and what I have tried to do:
TestDataForHelp <- data.frame(OldDateTime =
c("2013-12-04 21:10", "2013-12-15 09:07", "2014-01-01 06:27",
"2014-11-02 21:15", "2014-11-07 23:00", "2015-01-04 21:42",
"201508-11-02 20:15", "201508-11-02 20:15", "2017-11-02"))
TestDataForHelp$ActualDateTime <-
as.POSIXct(TestDataForHelp$OldDateTime, tz="NZ", format="%Y-%m-%d %H:%M")
TestDataForHelp$FixedDateTime <-
ifelse(TestDataForHelp$OldDateTime=="201508-11-02 20:15",
as.POSIXct("2015-11-02 20:15", tz="NZ", format="%Y-%m-%d %H:%M"),
TestDataForHelp$ActualDateTime)
The new variable, FixedDateTime, does not have a POSIXct type. It has been implicitly converted to a numeric type. How can I retain the POSIXct format from ActualDateTime and not have the implicit type conversion?
I would like to not have FixedDateTime but, rather, put the corrected data into ActualDateTime. The ifelse() seems to be the part of the code causing the format to shift from POSIXct to numeric. If I do:
TestDataForHelp$CopiedDateTime <- TestDataForHelp$ActualDateTime
The new variable, which is simply a copy of the original, retains the POSIXct type.
The previous question linked in the comments relates to date values only, not date time values. The data manipulation becomes more complicated when dealing with date time values, given that mine also do not include seconds. The other difference is that the original variable contains a mix of date, date-time, and incorrect date-time values, whereas that previous question had values that were all the same. It was unclear whether the non-uniform content of the variable was causing the problem.
Edit: I fixed the problem by fixing the strings before I converted them to dates. This removed the need to try to loop through the dates.
I can replicate the numeric result, but I can't explain it. It is, however, calculating the values correctly for you; I'm just not sure why they come back as numeric. Converting from numeric back to a date-time is easy enough if you know the origin, which should be 1970-01-01 (UTC). So I believe the following does the trick:
(Note: the first block is just what you already have.)
TestDataForHelp$FixedDateTime <- ifelse(TestDataForHelp$OldDateTime=="201508-11-02 20:15",
as.POSIXct("2015-11-02 20:15", tz="NZ", format="%Y-%m-%d %H:%M"),
TestDataForHelp$ActualDateTime)
# The numbers are seconds since 1970-01-01 00:00:00 UTC, so pass the origin as a plain
# string (read as UTC) and set the display time zone separately with tz. An origin built
# with tz="NZ" would be midnight in New Zealand and shift every value by the UTC offset.
TestDataForHelp$FixedDateTime <- as.POSIXct(TestDataForHelp$FixedDateTime,
origin = "1970-01-01", tz = "NZ")
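As for why the class disappears: the documentation for ifelse() notes that the result takes its attributes from the test argument, so the POSIXct class carried by the yes and no values is dropped and only the underlying seconds remain. A minimal sketch of one way around this, assuming it is acceptable to overwrite ActualDateTime in place, is to assign into the existing POSIXct column by logical index. The data frame below is a cut-down copy of the question's data, and "Pacific/Auckland" is used as the canonical name of the "NZ" zone.

```r
df <- data.frame(
  OldDateTime = c("2013-12-04 21:10", "201508-11-02 20:15", "2017-11-02"),
  stringsAsFactors = FALSE
)
df$ActualDateTime <- as.POSIXct(df$OldDateTime, tz = "Pacific/Auckland",
                                format = "%Y-%m-%d %H:%M")

# Rows holding the known-bad string
bad <- df$OldDateTime == "201508-11-02 20:15"

# Sub-assignment keeps the class and time zone of the existing column
df$ActualDateTime[bad] <- as.POSIXct("2015-11-02 20:15", tz = "Pacific/Auckland",
                                     format = "%Y-%m-%d %H:%M")

class(df$ActualDateTime)  # "POSIXct" "POSIXt"
```

The date-only entry stays NA here, which matches the behaviour described in the question.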

R - Numeric to Date gives wrong value

I have a data frame DP with a numeric column, variable, that holds a numeric representation of a date.
Example: 43282 corresponds to 7/1/2018 (try it in Excel).
But in R, when I call as.Date() to convert it to a date, I get the wrong date:
DP$Time <- as.Date(DP$variable)
variable Time
1 43282 2088-07-02
What am I doing wrong here?
If it is based on Excel, then change the origin from the default 1970-01-01 to 1899-12-30:
as.Date(43282, origin = '1899-12-30')
#[1] "2018-07-01"
Excel's origin of date is "1899-12-30" and R's origin of date is "1970-01-01". Since they have different origins for the date, while importing the data from Excel, you are getting a different date in R.
Specify the right origin and it will print the right values:
DP$Time <- as.Date(DP$variable, origin = "1899-12-30")
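As a small sketch of the same idea applied to a whole vector, the hypothetical helper below wraps the origin so it is written only once. It assumes the workbook uses Excel's 1900 date system and that the serial numbers are greater than 60 (earlier serials are affected by Excel's fictitious 29 February 1900). The second helper, also hypothetical, handles serials with a fractional time-of-day part by rescaling to seconds since 1970-01-01; 25569 is the serial number of that date.

```r
# Hypothetical helper: Excel (1900 date system) serial number -> Date
excel_serial_to_date <- function(serial) {
  as.Date(serial, origin = "1899-12-30")
}

# Hypothetical helper: serial with a fractional day -> POSIXct in UTC
excel_serial_to_datetime <- function(serial) {
  as.POSIXct((serial - 25569) * 86400, origin = "1970-01-01", tz = "UTC")
}

excel_serial_to_date(c(43282, 37007))
# [1] "2018-07-01" "2001-04-26"

format(excel_serial_to_datetime(43282.5), "%Y-%m-%d %H:%M")
# [1] "2018-07-01 12:00"
```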

How to convert ordinal date day-month-year format using R

I have log files where the date is given in the ordinal date format (see the Wikipedia page for ordinal date).
For example, 14273 means the 273rd day of 2014, so 14273 is 30-Sep-2014.
Is there a function in R to convert an ordinal date (14273) to (30-Sep-2014)?
I tried the date package but didn't come across a function that would do this.
Try as.Date with the indicated format:
as.Date(sprintf("%05d", 14273), format = "%y%j")
## [1] "2014-09-30"
Notes
For more information see ?strptime.
The 273 part is sometimes referred to as the day of the year (as opposed to the day of the month), the day number, or the Julian day relative to the beginning of the year.
If the input were a character string of the form yyjjj (rather than numeric), then as.Date(x, format = "%y%j") will do.
Update: the code now also handles years with one digit, as per the comments.
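The notes above can be folded into one small function. This is only a sketch: parse_yyjjj is a made-up name, and it assumes every input is either a number or a string of at most five digits. Padding to five digits restores a leading zero lost when the value was read as a number. Per ?strptime, %y maps 00 to 68 onto 2000 to 2068 and 69 to 99 onto 1969 to 1999, so logs from other centuries would need different handling.

```r
# Hypothetical helper: yyjjj ordinal date (numeric or character) -> Date
parse_yyjjj <- function(x) {
  padded <- sprintf("%05d", as.integer(x))  # restore dropped leading zeros
  as.Date(padded, format = "%y%j")
}

parse_yyjjj(c("14273", "09001", "1033"))
# [1] "2014-09-30" "2009-01-01" "2001-02-02"
```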
Data example
x<-as.character(c("14273", "09001", "07031", "01033"))
Data conversion
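x1 <- substr(x, start = 1, stop = 2)  # two-digit year
x2 <- substr(x, start = 3, stop = 5)  # day of the year
# Parse the year and day of year together. Parsing "%j" on its own assumes the
# current year, which gives the wrong month and day after February whenever the
# current year and the target year differ in leap-year status.
date <- as.Date(paste(x1, x2), format = "%y %j")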
You can use the lubridate package as follows:
> library(lubridate)
# Create a template date object
> date <- as.POSIXlt("2009-02-10")
# Update the year and the day of the year
> update(date, year=2014, yday=273)
[1] "2014-09-30 JST"

How to determine the correct argument for origin in as.Date, R

I have a data set in R that contains a column of dates in the format yyyy/mm/dd. I am trying to use as.Date to convert these dates to date objects in R. However, I cannot seem to find the correct argument for origin to pass to as.Date. The following code is an example of what I have been trying. I am using a CSV file from Excel, so I used origin="1899/12/30" based on other sites I have looked at.
> as.Date(2001/04/26, origin="1899/12/30")
[1] "1900-01-18"
However, this is not working since the input date 2001/04/26 is returned as "1900-01-18". I need to convert the dates into date objects so I can then convert the dates into julian dates.
You can use as.Date either with a numeric value or with a character value. When you type just 2001/04/26 into R, that's doing division and getting 19.24 (a numeric value). Numeric values require an origin, and the number you supply is the offset in days from that origin. So you're getting 19 days away from your origin, i.e. "1900-01-18". As an Excel serial number, a date like Apr 26 2001 would be
as.Date(37007, origin="1899-12-30")
# [1] "2001-04-26"
If your dates from Excel "look like" dates, chances are they are character values (or factors). To convert a character value to a Date with as.Date() you want to specify a format. Here
as.Date("2001/04/26", format="%Y/%m/%d")
# [1] "2001-04-26"
See ?strptime for details on the special % variables. Now if you've read your data into a data.frame with read.table or something similar, there's a chance your variable may be a factor. If that's the case, you'll want to convert it to character with
as.Date(as.character(mydf$datecol), format="%Y/%m/%d")
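Putting the pieces together, here is a short sketch of the three cases: the unquoted input that R evaluates as division, the quoted string parsed with a format, and a factor column converted via as.character(). The variable names are made up for illustration. The last line reads "julian date" as the day of the year, which is only one interpretation of what the question is after; julian() from base R gives days since 1970-01-01 instead.

```r
# Unquoted, this is arithmetic: 2001 / 4 / 26
not_a_date <- 2001/04/26
round(not_a_date, 2)  # 19.24

# Quoted, it is a string that as.Date() can parse with a format
d <- as.Date("2001/04/26", format = "%Y/%m/%d")

# A factor column (e.g. from read.table) goes through as.character() first
datecol <- factor(c("2001/04/26", "2011/04/26"))
dates <- as.Date(as.character(datecol), format = "%Y/%m/%d")

# Day of the year, one reading of "julian date"
as.integer(format(d, "%j"))  # 116
```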

convert string to time in r

I have an array of time strings, for example 115521.45 which corresponds to 11:55:21.45 in terms of an actual clock.
I have another array of time strings in the standard format (HH:MM:SS.0) and I need to compare the two.
I can't find any way to convert the original time format into something useable.
I've tried using strptime but all it does is add a date (the wrong date) and get rid of time decimal places. I don't care about the date and I need the decimal places:
For example:
t <- strptime(105748.35, '%H%M%OS')  # gives ... 10:57:48
Using %OSn (n = 1, 2, etc.) gives NA.
Alternatively, is there a way to convert a time such as 10:57:48 to 105748?
Set the options to allow digits in seconds, and then add the date you wish before converting (so that the start date is meaningful).
options(digits.secs=3)
strptime(paste0('2013-01-01 ',105748.35), '%Y-%m-%d %H%M%OS')
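If the date really is irrelevant, another option is to skip date-time classes and compare both arrays as seconds since midnight. The sketch below is an alternative to the strptime approach, not part of it; both helper names are hypothetical, and it assumes the packed values are non-negative numbers of the form HHMMSS.ss and the clock strings have the form HH:MM:SS with optional decimals. Because the values are floating point, all.equal() is used for the comparison rather than ==.

```r
# Hypothetical helper: packed HHMMSS.ss number -> seconds since midnight
hhmmss_to_seconds <- function(x) {
  x <- as.numeric(x)
  hours   <- x %/% 10000
  minutes <- (x %/% 100) %% 100
  seconds <- x %% 100
  hours * 3600 + minutes * 60 + seconds
}

# Hypothetical helper: "HH:MM:SS.s" string -> seconds since midnight
clock_to_seconds <- function(s) {
  parts <- strsplit(s, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

isTRUE(all.equal(hhmmss_to_seconds(115521.45),
                 clock_to_seconds("11:55:21.45")))  # TRUE

# The reverse question: "10:57:48" -> 105748
as.numeric(gsub(":", "", "10:57:48"))  # 105748
```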
