standardize timezone in a large dataset - r

I have a large dataset covering different sites and time zones.
I'd like to standardize all of the sites to "UTC". I'm struggling to transform the dates (which are a factor) to get the proper date format.
A small sample of my data looks like this:
head(data_tz)
site DatetimeEnd tzone
FR01001 2014-10-28 00:00:00 UTC
FR01001 2014-11-02 00:00:00 UTC
FR01001 2014-01-20 00:00:00 UTC
FR01001 2014-11-01 00:00:00 UTC
FR01001 2014-01-13 00:00:00 UTC
FR01001 2014-09-17 00:00:00 UTC
..........
This is a large dataset with 4 different tzone:
unique(data_tz$tzone)
"UTC" "UTC-04" "UTC+04" "UTC-03"
And DatetimeEnd is a factor, which I need to convert to POSIXct, and then convert each site to "UTC". I have tried different approaches, but none of them worked.
I am using:
newdata$DatetimeEnd <- as.POSIXct(data_tz$DatetimeEnd, format="%Y-%m-%d %H:%M:%S",tz=data_tz$tzone)
But I got:
Error in strptime(x, format, tz = tz) : invalid 'tz' value
And the same when using:
newdata$DatetimeEnd <- as.POSIXct(strptime(data_tz$DatetimeEnd,
format="%Y-%m-%d %H:%M:%S",tz=data_tz$tzone))
If I use:
newdata$DatetimeEnd <- as.POSIXct(data_tz$DatetimeEnd, format="%Y-%m-%d %H:%M:%S",tz="UTC +01")
It works, but it is not what I want, since some sites have "UTC +02" or other values (different tzone).
How can I use tz as an argument here to get the right time zone? Any ideas or suggestions would be really helpful.
Thanks

You can use purrr::map2 to iterate over the rows of the columns DatetimeEnd and tzone, creating a new vector.
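The error in the question arises because the tz argument of as.POSIXct() must be a single string, not a vector. As a base R alternative to iterating row by row, here is a minimal sketch that parses every timestamp as if it were UTC and then subtracts each row's offset. It assumes that a label such as "UTC-04" means local time is four hours behind UTC and that all offsets are whole hours. The sites other than FR01001 and the column name DatetimeUTC are made up for illustration.

```r
# Hypothetical sample: only FR01001 comes from the question; the other sites are invented.
data_tz <- data.frame(
  site = c("FR01001", "XX00001", "XX00002"),
  DatetimeEnd = factor(rep("2014-10-28 00:00:00", 3)),
  tzone = c("UTC", "UTC-04", "UTC+04"),
  stringsAsFactors = FALSE
)

# Turn "UTC", "UTC-04", "UTC+04" into signed hour offsets 0, -4, 4.
offset_txt <- sub("^UTC", "", as.character(data_tz$tzone))
offset_txt[offset_txt == ""] <- "0"
offset_hours <- as.numeric(offset_txt)

# Parse the wall-clock times as if they were UTC (a single, valid tz value) ...
local_as_utc <- as.POSIXct(as.character(data_tz$DatetimeEnd),
                           format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# ... then shift each row by its own offset to get the actual UTC instant.
# Assumption: "UTC-04" means local time = UTC minus 4 hours.
data_tz$DatetimeUTC <- local_as_utc - offset_hours * 3600

format(data_tz$DatetimeUTC, "%Y-%m-%d %H:%M:%S")
```

If the labels in your data follow the opposite sign convention, flip the sign of offset_hours before subtracting.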

Related

Subsetting POSIXct date and time returns wrong date

I have the following data frame containing date and time in POSIXct format in time zone UTC:
date<-c("2013-12-12","2014-01-01","2014-01-01","2014-01-01")
time<-c("23:00:00","00:00:00","01:00:00","02:00:00")
x<-data.frame(date,time)
x$Date2<-as.POSIXct(paste(x$date, x$time), format="%Y-%m-%d %H:%M:%S", tz="UTC")
After subsetting the data frame with:
x<-subset(x, Date2<="2014-01-01 00:00:00")
I am not getting the correct date and time:
date time Date2
1 2013-12-12 23:00:00 2013-12-12 23:00:00
Shouldn't I rather get:
date time Date2
1 2014-01-01 00:00:00 2014-01-01 00:00:00
Any ideas why?
This is a time zone problem. When you compare the string to the POSIXct value, R converts the string (s below) to a POSIXct value using the current default time zone of your session.
We see how the answer changes when we change the default time zone for the session:
s <- "2000-01-01 00:00:00"
Sys.setenv(TZ = "GMT")
as.POSIXct(s, tz = "GMT") == s
## [1] TRUE
Sys.setenv(TZ = "") # "" will set your TZ to your usual session default
as.POSIXct(s, tz = "GMT") == s
## [1] FALSE
Thus you can either explicitly convert your strings to POSIXct specifying the time zone or else you can set your session time zone to the same time zone as your POSIXct objects.
That is because "2014-01-01 00:00:00" is a string, so you are comparing a POSIXct object with a character value. Convert it to POSIXct and it should work:
subset(x, Date2 <= as.POSIXct("2014-01-01 00:00:00", tz = "UTC"))
# date time Date2
#1 2013-12-12 23:00:00 2013-12-12 23:00:00
#2 2014-01-01 00:00:00 2014-01-01 00:00:00
Here, both rows are selected since both are less than or equal to the date-time being compared.

How to drop minutes in R?

I have a DateTime object in R.
tempDateTime<-as.POSIXct("2017-07-13 01:40:00 MDT")
class(tempDateTime)
[1] "POSIXct" "POSIXt"
I would like to drop the minutes from the DateTime object, i.e. have "2017-07-13 01:00:00 MDT".
Is there a simple way to do this?
In Base R
trunc(tempDateTime, units = "hours")
# "2017-07-13 01:00:00 AEST"
This works because the trunc function in base R has a method for POSIXt objects; it is documented together with round.
From ?round.POSIXt
Round or truncate date-time objects.
As @Thelatemail points out, this returns a POSIXlt object, so you may want to wrap the result in as.POSIXct() again.
Another note, POSIXct is an object that stores the number of seconds since "1970-01-01 00:00:00" (the Unix epoch).
as.numeric(tempDateTime)
# 1499874000
So the manual way to round down to the hour (assuming the time zone's UTC offset is a whole number of hours) would be
as.POSIXct(floor(as.numeric(tempDateTime) / 3600) * 3600, origin = "1970-01-01")
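To check that the two base R approaches above agree, here is a small sketch. It uses tz = "UTC" rather than the asker's session time zone so that the result is the same on any machine; the names by_trunc and by_floor are made up for illustration.

```r
# A fixed time zone keeps the sketch reproducible across machines.
tempDateTime <- as.POSIXct("2017-07-13 01:40:00", tz = "UTC")

# trunc() returns POSIXlt, so convert back to POSIXct.
by_trunc <- as.POSIXct(trunc(tempDateTime, units = "hours"))

# Manual floor on the number of seconds since the Unix epoch.
by_floor <- as.POSIXct(floor(as.numeric(tempDateTime) / 3600) * 3600,
                       origin = "1970-01-01", tz = "UTC")

# Both approaches should give the same instant, with minutes and seconds dropped.
by_trunc == by_floor
format(by_trunc, "%Y-%m-%d %H:%M:%S")
```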
Try this:
library(lubridate)
> floor_date(tempDateTime, "hour")
[1] "2017-07-13 01:00:00 PDT"

Two Timestamp Formats in R

I have a timestamp column that I am converting to POSIXct. The problem is that there are two different formats in the same column, so if I use the more common format for the conversion, the other gets converted to NA.
MC$Date
12/1/15 22:00
12/1/15 23:00
12/2/15
12/2/15 1:00
12/2/15 2:00
I use the following code to convert to a POSIXct:
MC$Date <- as.POSIXct(MC$Date, tz='MST', format = '%m/%d/%Y %H:%M')
The results:
MC$Date
15-12-01 22:00:00
15-12-01 23:00:00
NA
15-12-02 01:00:00
15-12-02 02:00:00
I have tried using a logical vector to identify the issue and then correct it, but can't find an easy solution.
The lubridate package was designed to deal with situations like this.
dt <- c(
"12/1/15 22:00",
"12/1/15 23:00",
"12/2/15",
"12/2/15 1:00",
"12/2/15 2:00"
)
dt
[1] "12/1/15 22:00" "12/1/15 23:00" "12/2/15" "12/2/15 1:00" "12/2/15 2:00"
lubridate::mdy_hm(dt, truncated = 2)
[1] "2015-12-01 22:00:00 UTC" "2015-12-01 23:00:00 UTC" "2015-12-02 00:00:00 UTC"
[4] "2015-12-02 01:00:00 UTC" "2015-12-02 02:00:00 UTC"
The truncated parameter indicates how many trailing components of the format (here, hours and minutes) may be missing from an input.
You may add the tz parameter to specify which time zone to parse the date with if UTC is not suitable.
I think the logical vector approach could work, maybe in tandem with a temporary vector that holds the parsed dates without clobbering the unparsed ones. Note the lowercase %y, since the years have two digits. Something like this:
dates <- as.POSIXct(MC$Date, tz='MST', format = '%m/%d/%y %H:%M')
dates[is.na(dates)] <- as.POSIXct(MC$Date[is.na(dates)], tz='MST', format = '%m/%d/%y')
MC$Date <- dates
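A self-contained version of this two-pass idea, as a sketch: it uses the sample values from the question, tz = "UTC" instead of 'MST' for portability, and %y because the years have two digits. The names raw, parsed and missing_time are made up for illustration.

```r
raw <- c("12/1/15 22:00", "12/1/15 23:00", "12/2/15", "12/2/15 1:00", "12/2/15 2:00")

# First pass: the common date-plus-time format. Date-only entries become NA.
parsed <- as.POSIXct(raw, tz = "UTC", format = "%m/%d/%y %H:%M")

# Second pass: re-parse only the failures with the date-only format,
# which places them at midnight.
missing_time <- is.na(parsed)
parsed[missing_time] <- as.POSIXct(raw[missing_time], tz = "UTC", format = "%m/%d/%y")

format(parsed, "%Y-%m-%d %H:%M")
```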
Since all of your datetimes are separated with a space between date and time, you could use strsplit to extract only the date part.
extractDate <- function(x){ strsplit(x, split = " " )[[1]][1] }
MC$Date <- sapply( MC$Date, extractDate )
Then go ahead and convert any way you like, without worrying about the time part getting in the way.

In R programming language how to convert 0815A into a 24 hour time format

I have a column in which time is in the following format: 0815A. I need it to be converted into a time format.
I have tried POSIXct but there are some errors.
We can use as.POSIXct with the matching format, after appending "M" so that the suffix becomes AM/PM for %p:
as.POSIXct(paste0(v1, "M"), format = '%I%M%p')
#[1] "2016-07-27 08:15:00 IST" "2016-07-27 21:20:00 IST"
data
v1 <- c("0815A", "0920P")
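Since %p depends on the session locale, a locale-independent alternative is to do the 12-hour to 24-hour arithmetic directly on the strings. This is a minimal sketch, assuming every value has exactly the form HHMM followed by A or P; the last two sample values and the names hour12, hour24 and time24 are added for illustration.

```r
v1 <- c("0815A", "0920P", "1200A", "1215P")  # last two values added for illustration

hour12 <- as.integer(substr(v1, 1, 2))
minute <- substr(v1, 3, 4)
suffix <- substr(v1, 5, 5)

# 12 o'clock maps to 0 before the PM shift: 12A -> 00, 12P -> 12.
hour24 <- as.integer(hour12 %% 12 + ifelse(suffix == "P", 12, 0))

time24 <- sprintf("%02d:%s", hour24, minute)
time24
```

The result is a character vector of 24-hour times; paste it onto a date and pass it to as.POSIXct() if a full date-time is needed.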

How to get the beginning of the day in POSIXct

My day starts at 2016-03-02 00:00:00. Not 2016-03-02 00:00:01.
How do I get the beginning of the day in POSIXct in local time?
My confusion probably comes from the fact that R sees this as the end of 2016-03-01, given that R uses ISO 8601?
For example if I try to find the beginning of the day using Sys.Date():
as.POSIXct(Sys.Date(), tz = "CET")
"2016-03-01 01:00:00 CET"
Which is not correct - but are there other ways?
I know I can hack my way out using a simple
as.POSIXct(paste(Sys.Date(), "00:00:00", sep = " "), tz = "CET")
But there has to be a more correct way to do this? Base R preferred.
It's a single command---but you want as.POSIXlt():
R> as.POSIXlt(Sys.Date())
[1] "2016-03-02 UTC"
R> format(as.POSIXlt(Sys.Date()), "%Y-%m-%d %H:%M:%S")
[1] "2016-03-02 00:00:00"
R>
It is only when the conversion to POSIXct happens that the time zone offset to UTC (six hours for me) enters:
R> as.POSIXct(Sys.Date())
[1] "2016-03-01 18:00:00 CST"
R>
Needless to say, by wrapping both you get the desired type and value:
R> as.POSIXct(as.POSIXlt(Sys.Date()))
[1] "2016-03-02 UTC"
R>
Filed under once again no need for lubridate or other non-Base R packages.
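If the goal is midnight in a specific local time zone rather than in UTC, one base R option is to truncate a POSIXct value that already carries that zone. This is a sketch with a fixed example instant instead of Sys.time(), so the result is reproducible; the names now and day_start are made up for illustration.

```r
# Fixed example instant in CET; in practice this could be Sys.time() with a tzone set.
now <- as.POSIXct("2016-03-02 14:35:10", tz = "CET")

# trunc() drops the time of day in the object's own time zone and returns POSIXlt,
# so convert back to POSIXct.
day_start <- as.POSIXct(trunc(now, units = "days"))

format(day_start, "%Y-%m-%d %H:%M:%S")
```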
Notwithstanding that you understandably prefer base R, a "smart way," for a certain meaning of "smart," would be:
library(lubridate)
x <- floor_date(Sys.Date(),"day")
> format(x,"%Y-%m-%d-%H-%M-%S")
[1] "2016-03-02-00-00-00"
From ?floor_date:
floor_date takes a date-time object and rounds it down to the nearest
integer value of the specified time unit.
Pretty handy.
Your example is a bit unclear.
You are talking about a 1 minute difference for the day start, but your example shows a 1 hour difference due to the timezone.
You can try
?POSIXct
to get the functionality explained.
Using Sys.Date() within as.POSIXct somehow overrides your time zone setting.
as.POSIXct(Sys.Date(), tz="EET")
"2016-03-01 01:00:00 CET"
While entering a string gives you
as.POSIXct("2016-03-01 00:00:00", tz="EET")
"2016-03-01 EET"
It looks like 00:00:00 is actually the beginning of the day. You can conclude it from the results of the following 2 inequalities
as.POSIXct("2016-03-02 00:00:02 CET")>as.POSIXct("2016-03-02 00:00:01 CET")
TRUE
as.POSIXct("2016-03-02 00:00:01 CET")>as.POSIXct("2016-03-02 00:00:00 CET")
TRUE
So somehow this is a time zone issue. Notice that 00:00:00 is automatically omitted when the as.POSIXct result is printed.
as.POSIXct("2016-03-02 00:00:00 CET")
"2016-03-02 CET"

Resources