I'm getting inconsistent results when trying to subset data based on a date being before or after some POSIXct date and time. When I make a string of dates like this:
myDates <- c(as.POSIXct("2014-12-27 08:10:00 UTC"),
as.POSIXct("2014-12-27 08:15:00 UTC"),
as.POSIXct("2014-12-27 09:30:00 UTC"))
and then try to subset to find all the entries in myDates that were before 8:15 a.m. on Dec. 27, 2014 like this:
myDates[myDates < as.POSIXct("2014-12-27 08:15:00")]
that works fine and I get
"2014-12-27 08:10:00 PST"
(although I don't understand why it says "PST" for the time zone; that's where I am, but I set it to UTC).
However, my original date and time data were in Excel, where they were in numeric format. I imported them as a data.frame called Samples and converted the date and time column into POSIXct format by doing:
as.POSIXct(Samples$DateTime, origin = "1970-01-01", tz = "UTC")
Now, I'm having hair-pulling, head-onto-desk-bashing frustrations with subsetting those dates. Take one date in particular, x <- Samples$DateTime[34], which, according to the output R gives me, is "2014-12-27 08:10:00 UTC". If I check whether x < 2014-12-27 08:15, that should be true, and here's what I see:
x < as.POSIXct("2014-12-27 08:15:00 UTC")
TRUE
But x should NOT be less 2014-12-27 8:09:00 UTC, right? This is what I see:
X < as.POSIXct("2014-12-27 08:09:00 UTC")
TRUE
Why, for the love of Pete, does R tell me that 8:10 is before 8:09?!? This doesn't seem to be a problem for data that I just type in like above, only for data I've imported from Excel.
You probably need to get everything in the same timezone first. Try
as.numeric(as.POSIXct("2014-12-27 08:10:00 UTC", tz="UTC"))
#[1] 1419667800
# equivalent to "2014-12-27 08:10:00 UTC"
vs.
as.numeric(as.POSIXct("2014-12-27 08:10:00 UTC"))
#[1] 1419631800
# equivalent to 8:10 in local timezone - in my case Aust. EST.
# "2014-12-27 08:10:00 AEST"
You can see that they are actually numerically different.
To fix this, specify the tz= explicitly when importing as the "UTC" in your text strings will not be detected on input.
Also, be really careful with variable names. Likely you just slipped in typing it in here, but in the description of the problem and the first logical comparison you used x and in the second one you used X.
R is case sensitive, so it would not compare your date to the one stored in x. If anything else was stored in memory with X it may actually be that you were given the right answer for the question you asked.
Related
I have dates and times stored in two columns. The first has the date as "20180831." The time is stored as the number of seconds from midnight; 3am would be stored as 10,800.
I need a combined date time column and am having a hard time with something that should be simple.
I can get the dates in no problem but lubridate "hms" interprets the time field as a period, not a 'time' per se.
I tried converting the date to posix.ct format and then using that as the origin for the time field but posix.ct does not set the time for midnight, instead it sets it for either 1800 or 1900 hours depending on the date. I need it set to midnight for all rows, I don't want any daylight savings time adjustment.
Here's the code:
First I made a function because there are several date and time fields I have to do this for.
mkdate<-function(x){
a<-as.Date(as.character(x),format='%Y%m%d')
a<-as.POSIXct(a)
return(a)
}
df$date<-mkdate(df$date) #applies date making function to date field
df$datetime<-as.POSIXct(df$time,origin=df$date)
I'm sure this has to do with time zones. I'm in Central time zone and I have experimented with adding the "tz" specification into these commands in both the mkdate function and in the time code creating "datetime" column.
I've tried:
tz="America/Chicago"
tz="CST"
tz="UTC"
Help would be much appreciated!
Edited with example:
x<-c(20180831,20180710,20160511,20170105,20180101) #these are dates.
as.POSIXct(as.Date(as.character(x),format="%Y%m%d"))
Above code converts dates to seconds from the Jan 1 1970. I could convert this to numeric and add my 'seconds' value to this field BUT it is not correct. This is what I see instead as the output:
[1] "2018-08-30 19:00:00 CDT" "2018-07-09 19:00:00 CDT" "2016-05-10 19:00:00 CDT" "2017-01-04 18:00:00 CST" "2017-12-31 18:00:00 CST"
Look at the first date - it should be 8/31 but instead it is 8/30. Somewhere in there there is a timezone adjustment taking place. It's moving the clock back 5 or 6 hours because I am on central time. The first entry should be 2018-08-31 00:00:00. I would then convert it to numeric and add the seconds field on and convert back to POSIXct format. I've tried including tz specification all over the place with no luck.
Sys.getlocale("LC_TIME")
returns "English_United States.1252"
I believe the following does what you want.
My locale is the following, so the results are different from yours.
Sys.getlocale("LC_TIME")
#[1] "Portuguese_Portugal.1252"
The difference will be due to the daylight savings time, the summer hour.
As for your problem, all you have to do is to remeber that the objects of class "POSIXct are coded as the number of seconds since an origin, and that origin is usually the midnight of 1970-01-01. So you have to add your seconds since midnight to the seconds of as.Date.
x <- "20180831"
xd <- mkdate(x)
y <- 10800
as.POSIXct(as.integer(xd) + y, origin = "1970-01-01")
#[1] "2018-08-31 04:00:00 BST"
as.POSIXct(as.integer(xd) + y, origin = "1970-01-01", tz = "America/Chicago")
#[1] "2018-08-30 22:00:00 CDT"
There are very many ways to do this:
mktime = function(a, b)modifyList(strptime(a, '%Y%m%d'), list(sec = as.numeric(gsub(',', '', b))))
mktime("20180831",'10,800')
[1] "2018-08-31 03:00:00 PDT"
mktime('20180301','10800')
[1] "2018-03-01 03:00:00 PST"
mktime('20180321','10800')
[1] "2018-03-21 03:00:00 PDT"
Looking at the above code, it does not adjust for the daylight saving time. Irrespective of the date, the seconds still show that it Is 3 AM, including the dates when ST-->DT. This will also take into consideration, your LOCAL timezone.
I am working on a dataset in R with a time variable like this:
Time = data.frame("X1"=c(930,1130,914,1615))
The first one/two digits of X1 refers to hour and the last two refers to minute. I want to make R recognize it as a time variable.
I try to use lubridate hm function but it didnt work probably because a ":" is missing between the hour and minute in my data.
I also thought about using str_sub function to separate the hour and minute first and then put them together with a ":" in between and finally use the lubridate function but I dont know how to extract the hour since sometimes it is presented as one digit but sometimes it is presented as two digits.
How do I make R recognize this as a time variable?
Thanks very much!
You could 0-pad to 4 digits and then format using standard R date tools:
as.POSIXct(sprintf("%04d",Time$X1), format="%H%M")
#[1] "2018-04-22 09:30:00 AEST" "2018-04-22 11:30:00 AEST"
#[3] "2018-04-22 09:14:00 AEST" "2018-04-22 16:15:00 AEST
This converts them to chron "times" class. Internally such variables are stored as a fraction of a day and are rendered on output as shown below. The sub inserts a : before the last 2 characters and :00 after them so that they are in HH:MM:SS format which times understands.
library(chron)
times(sub("(..)$", ":\\1:00", Time$X1))
## [1] 09:30:00 11:30:00 09:14:00 16:15:00
It could also be done like this where we transform each to a fraction of a day:
with(Time, times( (X1 %/% 100) / 24 + (X1 %% 100) / (24 * 60) ))
## [1] 09:30:00 11:30:00 09:14:00 16:15:00
I have a large dataframe with a column containing date-times, encoded as a factor variable.My Sys.timezone() is "Europe/Berlin". The date-times have this format:
2015-05-05 17:27:04+05:00
where +05:00 represents the timeshift from GMT. Importantly, I have multiple timezones in my dataset, so I cannot set a specific timezone and ignore the last 6 characters of the strings. This is what I tried so far:
# Test Date
test <- "2015-05-05 17:27:04+05:00"
# Removing the ":" to make it readable by %z
A <- paste(substr(test,1,22),substr(test,24,25),sep = "");A
# Returns
# "2015-05-05 17:27:04+0500"
output <- as.POSIXct(as.character(A, "%Y-%B-%D %H:%M:%S%z"))
# Returns
# "2015-05-05 17:27:04 CEST"
The output of "CEST" for +0500 is incorrect. Moreover, when I run this code on the whole column I see that every date is coded as CEST, regardless of the offset.
How can I keep the specified timezone when converting to POSIXct?
In order to facilitate the process you can use lubridate package.
E.g.
library("lubridate")#load the package
ymd_hms("2015-05-05 17:27:04+05:00",tz="GMT")#set the date format
[1] "2015-05-05 12:27:04 GMT"
Therefore you keep the timezone info. Finally:
as.POSIXct(ymd_hms("2015-05-05 17:27:04+05:00",tz="GMT"),tz = "GMT")#transform the date into another timezone
[1] "2015-05-05 12:27:04 GMT"
I'm trying to combine two vectors of dates into a single vector. I have been using dates with the lubridate package.
First I create two vectors of dates:
library(lubridate)
mydate <- mdy("04/01/2016")
mydate_range <- mydate + (1:12)*months(1)
anotherdate_range <- mdy("05/01/2017") + (1:12)*months(1)
Inspecting mydate_range and anotherdate_range these seem to have worked fine.
But then when I try to combine these into one vector things get weird.
combineddates <- c(mydate_range, anotherdate_range)
combineddates
[1] "2016-04-30 19:00:00 CDT" "2016-05-31 19:00:00 CDT" "2016-06-30 19:00:00 CDT"
The first date of combineddates is now "2016-04-30". Before I combined them using the c() function the first date of mydate_range was "2016-05-01".
Not sure why this changed. How should I join these date vectors?
The reason for the date change is the conversion due to time zone adjustments. 2016-04-30 19:00:00 CDT is the same as 2016-05-01 GMT. Most likely your initial sequence was in GMT and somewhere along the way it got converted to local time.
I find it best to define the time zone in your initial definition and it should stay consistent throughout.
I have a data frame with roughly 8 million rows and 3 columns. I used strptime() in the following manner:
df$date.time <- strptime(df$date.time, "%m/%d/%y %I:%M:%S %p")
This works fine for all but 1104 of the rows, which I checked using
df[is.na(df$date.time), ]
When I look at these "problem" data, the date.time entries seem to be formatted in the way I would expect. For example, here is an observation that comes up as a problem, but doesn't appear to be an NA:
id date.time outcome
observation543490 2012-03-11 02:14:01 C
What could possibly be going on here that is.na(df$date.time) returns a TRUE value for this row that has apparently been converted correctly?
Here's a reproducible example (if you're in CST):
is.na(strptime("03/11/12 2:14:01 AM", "%m/%d/%y %I:%M:%S %p", "CST6CDT"))
#[1] TRUE
The problem is likely that all the times that return NA do not exist in whatever timezone you're using, due to daylight saving time.
Check with the data source to determine the timezone the data were recorded in, then set the tz argument to that value in your call to strptime.