I have the following code:
as.POSIXct(c('03/08/2015 03:08:18 AM','03/09/2014 02:01:05 AM'),
format="%m/%d/%Y %l:%M:%S %p")
[1] "2015-03-08 03:08:18 EDT" NA
Why is the 2nd time returning NA when converted?
Your session appears to be using a US Eastern time zone (the output prints EDT, Eastern Daylight Time).
On 9 March 2014 the clocks there went forward one hour at 02:00:00, so the local time 02:01:05 doesn't actually exist.
First, check the source of the data: were these times actually recorded in US Eastern time?
Most likely not, so you'll want to set the tz argument to the actual time zone.
For example:
as.POSIXct(
c('03/08/2015 03:08:18 AM','03/09/2014 02:01:05 AM')
, format="%m/%d/%Y %l:%M:%S %p"
, tz = "EST" ## change this to the actual timezone you need.
)
#"2015-03-08 03:08:18 EST" "2014-03-09 02:01:05 EST"
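A minimal sketch of how one might flag such non-existent wall-clock times before converting. It assumes the data really were recorded in "America/New_York" (substitute your own zone), and the helper name in_dst_gap is hypothetical. What as.POSIXct returns for a non-existent time is platform-dependent (often NA, sometimes a shifted time), so the sketch compares a round trip rather than testing for NA alone. It uses %I for the 12-hour field.

```r
# Hypothetical helper: TRUE where a well-formed wall-clock time does not
# exist in `tz` because it falls in a spring-forward gap.
in_dst_gap <- function(x, fmt, tz) {
  # UTC has no DST, so parsing as UTC recovers the wall-clock fields as written.
  wall <- format(as.POSIXct(x, format = fmt, tz = "UTC"), "%Y-%m-%d %H:%M:%S")
  # Parse in the target zone and print the result back in that same zone.
  back <- format(as.POSIXct(x, format = fmt, tz = tz), "%Y-%m-%d %H:%M:%S")
  # A gap time either fails to convert (NA) or comes back as a different time.
  !is.na(wall) & (is.na(back) | back != wall)
}

x <- c("03/08/2015 03:08:18 AM", "03/09/2014 02:01:05 AM")
# Assumption: the strings are US Eastern wall-clock times.
gap <- in_dst_gap(x, "%m/%d/%Y %I:%M:%S %p", "America/New_York")
print(gap)
```

Rows flagged TRUE are the ones to check against the data source before choosing a tz.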
Related
I'm converting a string vector to date format with as.POSIXct().
Here is the strange thing:
as.POSIXct("2017-03-26 03:00:00.000",format="%Y-%m-%d %H")
#Gives
"2017-03-26 03:00:00 CEST"
#While
as.POSIXct("2017-03-26 02:00:00.000",format="%Y-%m-%d %H")
#Outputs
NA
This is really confusing and frustrating. It seems like the function really doesn't like the specific time:
02:00:00.000
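We can specify %T for the time. The string contains minutes, seconds and milliseconds, so %H on its own only matches the hour part. (The outputs below print EDT, so they come from a session in a US Eastern time zone, where this time exists.)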
as.POSIXct("2017-03-26 02:00:00.000",format="%Y-%m-%d %T")
[1] "2017-03-26 02:00:00 EDT"
Or to take care of the milliseconds as well
as.POSIXct("2017-03-26 02:00:00.000",format="%Y-%m-%d %H:%M:%OS")
#[1] "2017-03-26 02:00:00 EDT"
Or using lubridate
library(lubridate)
ymd_hms("2017-03-26 02:00:00.000")
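Whether the NA from the question appears depends on the session time zone. A minimal sketch, assuming the asker's zone was "Europe/Stockholm" (an assumption based on the CEST in the question's output), showing that the local clock there goes from 01:59:59 straight to 03:00:00 on that date:

```r
# One second before the switch, written in UTC (01:59:59 CET is 00:59:59 UTC).
before <- as.POSIXct("2017-03-26 00:59:59",
                     format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Changing the tzone attribute changes only how the instant is displayed.
attr(before, "tzone") <- "Europe/Stockholm"

last_cet   <- format(before,     "%H:%M:%S %Z")  # "01:59:59 CET"
first_cest <- format(before + 1, "%H:%M:%S %Z")  # "03:00:00 CEST"
print(c(last_cet, first_cest))
```

No instant prints as 02:00:00 in that zone on that date, which is why a conversion of that wall-clock time cannot succeed there.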
This was a daylight saving time issue: the time "2017-03-26 02:00:00.000" does not exist in Sweden, as we lost an hour on this date when changing to "summer time".
I have a data set containing the following date, along with several others
03/12/2017 02:17:13
I want to put the whole data set into a data table, so I used read_csv and as.data.table to create DT which contained the date/time information in date.
Next I used
DT[, date := as.POSIXct(date, format = "%m/%d/%Y %H:%M:%S")]
Everything looked fine except I had some NA values where the original data had dates. The following expression returns an NA
as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S")
The question is why this happens and how to fix it.
Just use the functions anytime() or utctime() from the anytime package:
R> library(anytime)
R> anytime("03/12/2017 02:17:13")
[1] "2017-03-12 01:17:13 CST"
R>
or
R> utctime("03/12/2017 02:17:13")
[1] "2017-03-11 20:17:13 CST"
R>
The real crux is that this time did not exist in North America due to DST. You could parse it as UTC, as UTC does not observe daylight saving time:
R> utctime("03/12/2017 02:17:13", tz="UTC")
[1] "2017-03-12 02:17:13 UTC"
R>
You can express that UTC time as Mountain time, but it gets you the previous day:
R> utctime("03/12/2017 02:17:13", tz="America/Denver")
[1] "2017-03-11 19:17:13 MST"
R>
Ultimately, you (as the analyst) have to decide what was actually measured. UTC would make sense; the others may need adjustment.
My solution is below, but ways to improve it would be appreciated.
The explanation for the NA is that in the Mountain time zone in the US, that date and time falls in the window of the switch to daylight saving time, where the time doesn't exist, hence NA. While the time zone is not explicitly specified, I guess R must be picking it up from the computer's time zone, which is "America/Denver".
The solution is to explicitly state that the date/time string is in UTC and then convert back as follows:
time.utc <- as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
> time.utc
[1] "2017-03-12 02:17:13 UTC"
>
Next, add 6 hours to the UTC time, which is the difference between UTC and MDT (Mountain Daylight Time, UTC-6; MST is UTC-7)
time.utc2 <- time.utc + 6 * 60 * 60
> time.utc2
[1] "2017-03-12 08:17:13 UTC"
>
Now convert to America/Denver time, which applies the daylight saving rules. Note that format() returns a character string, not a date-time object.
time.mdt <- format(time.utc2, usetz = TRUE, tz = "America/Denver")
> time.mdt
[1] "2017-03-12 01:17:13 MST"
>
Note that this is in standard time, because daylight savings doesn't start until 2 am.
If you change the original string from 2 am to 3 am, you get the following
> time.mdt
[1] "2017-03-12 03:17:13 MDT"
>
The hour between 2 and 3 is lost in the change from standard to daylight savings but the data are now correct.
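The same steps can be sketched as one small function that keeps the result as a POSIXct object instead of a formatted string. This is only a sketch: the helper name from_fixed_offset is hypothetical, and it assumes the recording clock ran at a fixed UTC-6 offset, which is an assumption about the data and not something R can detect.

```r
# Hypothetical helper: treat wall-clock strings as recorded at a fixed UTC
# offset (in hours), then display the resulting instants in `tz`.
from_fixed_offset <- function(x, fmt, offset_hours, tz) {
  # Read the fields as UTC, then subtract the offset to get the true instant.
  instant <- as.POSIXct(x, format = fmt, tz = "UTC") - offset_hours * 3600
  # Only the display zone changes here; the instant stays the same.
  attr(instant, "tzone") <- tz
  instant
}

fmt <- "%m/%d/%Y %H:%M:%S"
# Assumption: the clock was fixed at UTC-6.
t2 <- from_fixed_offset("03/12/2017 02:17:13", fmt, -6, "America/Denver")
t3 <- from_fixed_offset("03/12/2017 03:17:13", fmt, -6, "America/Denver")

print(format(t2, "%Y-%m-%d %H:%M:%S %Z"))  # "2017-03-12 01:17:13 MST"
print(format(t3, "%Y-%m-%d %H:%M:%S %Z"))  # "2017-03-12 03:17:13 MDT"
```

Because the result is still POSIXct, it can go straight into a data.table column and be used in date arithmetic.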
My data are in the EST time zone, and I am trying to use this time zone.
I want to count the week (in local time, not GMT), so I manually define an originTime in EDT:
originTime = as.POSIXlt('2000-01-02 00:00:00 EDT')
dt2 = data.frame(time=c(as.POSIXlt('2000-01-09 00:00:05 EDT')))
dt2$week = as.integer( floor( ( as.numeric(dt2$time) - as.numeric(originTime) ) /(3600*24*7) ) )
dt2$wday = weekdays(dt2$time)
This works.
Now I want to find out, what's one week after a given time?
> as.POSIXlt( 1 * 3600*24*7 , origin = originTime)
[1] "2000-01-08 19:00:00 EST"
Here's the problem, R seems to think originTime is in GMT. Can somebody help? Thanks
Two serious problems. "EDT" does not really exist as a time zone name, and even if it did it would not be appropriate for a January date. The (US) Eastern timezone is "EST5EDT", to make it distinct from the Aussie EST. (Furthermore, these names may differ between OSes.) Safest would be tz="America/New_York". For data entry, you need to use the 'tz' parameter; 'usetz' (default FALSE) is an argument of format() and only controls whether the zone abbreviation is printed:
(originTime = as.POSIXlt('2000-01-02 00:00:00', format="%Y-%m-%d %H:%M:%S",
tz="EST5EDT", usetz=TRUE) )
[1] "2000-01-02 EST"
A test using "%Z" which is only for output:
> format( as.POSIXlt('2000-01-02 00:00:00', format="%Y-%m-%d %H:%M:%S",
tz="America/New_York", usetz=TRUE) ,
format="%Y-%m-%d %H:%M:%S %Z")
[1] "2000-01-02 00:00:00 EST"
I've never used the origin argument in as.POSIXlt, so I cannot really explain why it fails to deliver the expected result. I've always used +.POSIXt or seq.POSIXt to construct intervals:
format(as.POSIXlt('2000-01-02 00:00:00', tz="America/New_York", usetz=TRUE)+ 1*3600*24*7,
format="%Y-%m-%d %H:%M:%S %Z") # %Z is only used for output
# [1] "2000-01-09 00:00:00 EST" A week later at midnight.
I think the reason one needs to force time printing with a format argument is that midnight times are shortened to just the date, with no H:M:S information.
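A minimal sketch of the whole task (week counting plus "one week later") done with POSIXct values in an explicit zone. It assumes "America/New_York" is the intended zone, and the variable names are illustrative.

```r
zone <- "America/New_York"   # assumption: the data are US Eastern local times
fmt  <- "%Y-%m-%d %H:%M:%S"

origin <- as.POSIXct("2000-01-02 00:00:00", format = fmt, tz = zone)
obs    <- as.POSIXct("2000-01-09 00:00:05", format = fmt, tz = zone)

# Whole weeks elapsed since the origin.
week <- as.integer(floor(as.numeric(difftime(obs, origin, units = "weeks"))))

# One week after the origin: add seconds to the POSIXct value directly.
one_week_later <- origin + 7 * 24 * 3600

print(week)                                            # 1
print(format(one_week_later, "%Y-%m-%d %H:%M:%S %Z"))  # "2000-01-09 00:00:00 EST"
```

Adding a fixed number of seconds shifts the wall-clock time by an hour when the interval crosses a DST change; seq() with by = "DSTday" keeps the clock time in that case.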
For output, the specification is %Z (see ?strptime). But for input, how does that work?
To clarify, it'd be great for the time zone abbreviation to be parsed into useful information by as.POSIXct(), but more core to the question is how to get the function to at least ignore the time zone.
Here is my best workaround, but is there a particular format code to pass to as.POSIXct() that will work for all time zones?
times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")
as.POSIXct(times, format="%a %b %d %H:%M:%S %Z %Y") # nope! strptime can't handle %Z in input
formats <- paste("%a %b %d %H:%M:%S", gsub(".+ ([A-Z]{3}) [0-9]{4}$", "\\1", times),"%Y")
as.POSIXct(times, format=formats) # works
Edit: Here is the output from the last line, as well as its class (from a separate call); the output is as expected. From the console:
> as.POSIXct(times, format=formats)
[1] "2015-07-03 00:15:00 EDT" "2015-07-03 00:15:00 EDT"
> attributes(as.POSIXct(times, format=formats))
$class
[1] "POSIXct" "POSIXt"
$tzone
[1] ""
The short answer is, "no, you can't." Those are abbreviations and they are not guaranteed to uniquely identify a specific timezone.
For example, is "EST" Eastern Standard Time in the US or Australia? Is "CST" Central Standard Time in the US or Australia, or is it China Standard Time, or is it Cuba Standard Time?
I just noticed that you're not trying to parse the timezone abbreviation, you are simply trying to avoid it. I don't know of a way to tell strptime to ignore arbitrary characters. I do know that it will ignore anything in the character representation of the time after the end of the format string. For example:
R> # The year is not parsed, so the current year is used
R> as.POSIXct(times, format="%a %b %d %H:%M:%S")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
Other than that, a regular expression is the only thing I can think of that solves this problem. Unlike your example, I would use the regex on the input character vector to remove all 3-5 character timezone abbreviations.
R> times_no_tz <- gsub(" [[:upper:]]{3,5} ", " ", times)
R> as.POSIXct(times_no_tz, format="%a %b %d %H:%M:%S %Y")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
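If the abbreviations should be honoured and not discarded, one option is to supply the mapping yourself, since R cannot resolve them unambiguously. A minimal sketch, where tz_lookup is a hypothetical, user-maintained table (the two entries are assumptions about what the data mean), and where %a and %b are assumed to be parsed in an English locale:

```r
times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")

# Hypothetical lookup: you decide what each abbreviation means in your data.
tz_lookup <- c(EDT = "America/New_York", GMT = "UTC")

# Pull out the abbreviation, then remove it from the string.
abbr     <- sub("^.* ([[:upper:]]{3,5}) [0-9]{4}$", "\\1", times)
stripped <- sub(" [[:upper:]]{3,5} ([0-9]{4})$", " \\1", times)

# as.POSIXct() takes a single tz, so parse element by element and keep the
# result as seconds since the epoch, which is zone-independent.
secs <- mapply(function(s, z) {
  as.numeric(as.POSIXct(s, format = "%a %b %d %H:%M:%S %Y", tz = z))
}, stripped, tz_lookup[abbr], USE.NAMES = FALSE)

# Rebuild one POSIXct vector, displayed in UTC.
parsed <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
print(format(parsed, "%Y-%m-%d %H:%M:%S"))  # "2015-07-03 04:15:00" "2015-07-03 00:15:00"
```

An abbreviation missing from the table yields an NA zone name for that element, so it is worth checking anyNA(tz_lookup[abbr]) before parsing.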
How can I convert local DateTime in the following format "12/31/2014 6:42:52 PM" to UTC in R? I tried this
as.POSIXct(as.Date("12/31/2014 6:42:52 PM", format="%m/%d/%Y %H:%M:%S"),tz="UTC")
but it doesn't seem to be valid.
If you want to shift a datetime from your current timezone to UTC, you need to import it in your local timezone, then just shift the display timezone to "UTC". E.g., in Australian EST I am UTC+10.
out <- as.POSIXct("12/31/2014 6:42:52 PM", format="%m/%d/%Y %I:%M:%S %p") # %I with %p honours the PM; %H would silently drop it
out
#"2014-12-31 18:42:52 EST"
#(Australian Eastern Standard Time)
as.numeric(out)
#[1] 1420015372
Now shift the timezone for display purposes:
attr(out, "tzone") <- "UTC"
out
#[1] "2014-12-31 08:42:52 UTC"
# display goes 10 hours backwards as I'm UTC+10
as.numeric(out)
#[1] 1420015372
Note that this doesn't affect the underlying numeric data (seconds since 1970-01-01), it only changes what is displayed.
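To make the conversion reproducible on any machine, the local zone can be named explicitly, so the result does not depend on the session default. A minimal sketch, assuming the timestamps are in "Australia/Brisbane" (UTC+10, an assumption standing in for your real zone) and using %I with %p so that the PM marker is honoured:

```r
# Parse the wall-clock string in an explicitly named local zone.
local_time <- as.POSIXct("12/31/2014 6:42:52 PM",
                         format = "%m/%d/%Y %I:%M:%S %p",
                         tz = "Australia/Brisbane")

# Render the same instant in UTC without modifying the object.
utc_string <- format(local_time, "%Y-%m-%d %H:%M:%S", tz = "UTC", usetz = TRUE)

# The underlying value is seconds since 1970-01-01 UTC, whatever the display zone.
secs <- as.numeric(local_time)

print(utc_string)  # "2014-12-31 08:42:52 UTC"
print(secs)        # 1420015372
```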