Change the time zone only for certain rows in a column? - r

I am working with data in R and would like to change the time zone of some POSIXct data, but only for certain rows within the columns (Survey_Start and Survey_End). Some of the data is already in the proper time zone, so converting the entire column is a problem. My code to change the time zone is:
herps2021 <- herps2021 %>%
mutate(Survey_Start = as.POSIXct(Survey_Start, format = "%H:%M:%S",
tz = "UTC"),
Survey_End = as.POSIXct(Survey_End, format = "%H:%M:%S", tz =
"UTC"),
#Change to proper time zone
Survey_Start = with_tz(Survey_Start, tzone = "America/Los_Angeles"),
Survey_End = with_tz(Survey_End , tzone = "America/Los_Angeles")
)
Is there a way to specify which rows for the columns Survey_Start and Survey_End I want to convert, so that the data already in the correct time zone is unaffected?
Thanks!

you could try using parse_date_time that allows you to parse multiple dates and times in a column.
Looks something like this:
library(lubridate)
parse_date_time(c("2016", "2016-04"), orders = c("Y", "Ym"))
#> [1] "2016-01-01 UTC" "2016-04-01 UTC"
here is the link to the documentation: https://lubridate.tidyverse.org/reference/parse_date_time.html

Related

Creating a function to change a variable type to time

I'm playing around with functions in R and want to create a function that takes a character variable and converts it to a POSIXct.
The time variable currently looks like this:
"2020-01-01T05:00:00.283236Z"
I've successfully converted the time variable in my janviews dataset with the following code:
janviews$time <- gsub('T',' ',janviews$time)
janviews$time <- as.POSIXct(janviews$time, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
Since I have to perform this on multiple datasets, I want to create a function that will perform this. I created the following function but it doesn't seem to be working and I'm not sure why:
set.time <- function(dat, variable.name){
dat$variable.name <- gsub('T', ' ', dat$variable.name)
dat$variable.name <- as.POSIXct(dat$variable.name, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
}
Here's the first four rows of the janviews dataset:
structure(list(customer_id = c("S4PpjV8AgTBx", "p5bpA9itlILN",
"nujcp24ULuxD", "cFV46KwexXoE"), product_id = c("kq4dNGB9NzwbwmiE",
"FQjLaJ4B76h0l1dM", "pCl1B4XF0iRBUuGt", "e5DN2VOdpiH1Cqg3"),
time = c("2020-01-01T05:00:00.283236Z", "2020-01-01T05:00:00.895876Z",
"2020-01-01T05:00:01.362329Z", "2020-01-01T05:00:01.873054Z"
)), row.names = c(NA, -4L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x1488180e0>)
Also, if there is a better way to convert my time variable, I am open to changing my method!
I would use the lubridate package and the as_datetime() function.
lubridate::as_datetime("2020-01-01T05:00:00.283236Z")
Returns
"2020-01-01 05:00:00 UTC"
Lubridate Info

Writing functions to reference specific columns

I have to pull different data sets from the same API regularly but for different reasons, so I have to write out the code for many different pulls. I'd like to create some functions to help with this, but I need some help.
I haven't been able to figure out how to set up the function so that I can change the data set but still pull from the same column each time. In this example, I have 3 columns with timestamps that mean different things (made up in this data). I need to change the timezone here to my local time zone. The column name will remain the same in all of my datasets, but the name of the dataset will change. I have a few places in my code where I need to do this, and I haven't been able to figure it out, so any suggestions would be much appreciated!
The second section of this example code is not included in the actual code, but it is there to set the data up correctly. The data comes out of the API in the format shown as GMT.
df <- data.frame(col_1 = c(1, 2, 3, 4),
time_1 = c("2021-01-20 23:58:21", "2021-01-20 21:21:00", "2021-01-20 17:14:04", "2021-01-20 01:05:18"),
time_2 = c("2021-01-19 23:58:21", "2021-01-19 21:21:00", "2021-01-19 17:14:04", "2021-01-19 01:05:18"),
time_3 = c("2021-01-18 23:46:21", "2021-01-18 36:21:00", "2021-01-18 15:14:04", "2021-01-18 01:05:18"),
time_4 = c("2021-01-17 23:58:21", "2021-01-17 20:21:00", "2021-01-17 18:14:04", "2021-01-17 02:05:18"))
# Not part of actual code
df$time_1 <- as.POSIXlt(df$time_1, tz = "GMT")
df$time_2 <- as.POSIXlt(df$time_2, tz = "GMT")
df$time_3 <- as.POSIXlt(df$time_3, tz = "GMT")
df$time_4 <- as.POSIXlt(df$time_4, tz = "GMT")
# What I want it to do
# df$time_1 <- lubridate::with_tz(df$time_1, tz = "America/Los_Angeles")
# df$time_2 <- lubridate::with_tz(df$time_2, tz = "America/Los_Angeles")
# df$time_3 <- lubridate::with_tz(df$time_3, tz = "America/Los_Angeles")
# df$time_4 <- lubridate::with_tz(df$time_4, tz = "America/Los_Angeles")
# Attempted function
timezone_cleanup <- function(my_df){
my_df$time_1 <- lubridate::with_tz(my_df$time_1, tz = "America/Los_Angeles")
my_df$time_2 <- lubridate::with_tz(my_df$time_2, tz = "America/Los_Angeles")
my_df$time_3 <- lubridate::with_tz(my_df$time_3, tz = "America/Los_Angeles")
my_df$time_4 <- lubridate::with_tz(my_df$time_4, tz = "America/Los_Angeles")
}
# how I'd like to use this function. Not working now. Even if I wrap it with data.frame(), it's not what I wanted.
new_df <- timezone_cleanup(df)
I think you need to return my_df in your function to get the changed dataframe back. However, you can use lapply or across to apply the same function to multiple columns.
library(dplyr)
timezone_cleanup <- function(my_df){
my_df %>%
mutate(across(starts_with('time'),
lubridate::with_tz, tz = "America/Los_Angeles"))
}
new_df <- timezone_cleanup(df)
By the way, I do recive a warning message while using this Unrecognized time zone 'America/Los_Angeles'. Are you sure you are using the correct tz value?

R as.POSIXct try two input formats

I am reading in a .csv of dates and gps positions. I need to convert the date column to a date class.
I am using:
data = data.frame(rbind(c('2016/07/19 17:52:00',3674.64416424279,354.266660979476),
c('2016/07/19 17:54:00',3674.65121597935,354.246972537617),
c('2016/07/19 17:55:00',3674.65474186293,354.237128326737),
c('2016/07/19 17:56:00',3674.65826775671,354.227284122559)))
colnames(data) = (c('GMT_DateTime','northing','easting'))
data$GMT_DateTime<-as.POSIXct(data$GMT_DateTime, tz="GMT", format = "%Y/%m/%d %H:%M:%S")
Sometimes the date in the .csv to be read is formatted as "%Y/%m/%d %H:%M:%S" and sometimes as "%m/%d/%Y %H:%M"
Is there a way to feed in two possible formats to as.POSIXct() to try both possible formats? I imagine something like this:
data$GMT_DateTime<-as.POSIXct(data$GMT_DateTime, tz="GMT", format = "%m/%d/%Y %H:%M" or "%Y/%m/%d %H:%M:%S")
Thank you!
In what follows I will use package lubridate.
I have added two extra rows to the example dataset, with date/time values in the "%m/%d/%Y %H:%M" format. Note that that column is of class character, if it is of class factor it will probably throw an error.
As for the warnings, don't worry, they are just lubridate telling you that it found several formats and cannot process them all in one go.
tmp <- data$GMT_DateTime # work on a copy
na <- is.na(ymd_hms(tmp))
data$GMT_DateTime[!na] <- ymd_hms(tmp)[!na]
data$GMT_DateTime[na] <- mdy_hm(tmp)[na]
data$GMT_DateTime <- as.POSIXct(as.numeric(data$GMT_DateTime),
format = "%Y-%m-%d",
origin = "1970-01-01", tz = "GMT")
rm(tmp) # final clean up
Data in dput() format.
data <-
structure(list(GMT_DateTime = c("2016/07/19 17:52:00", "2016/07/19 17:54:00",
"2016/07/19 17:55:00", "2016/07/19 17:56:00", "07/22/2016 17:02",
"07/23/2016 17:15"), northing = c(3674.64416424279, 3674.65121597935,
3674.65474186293, 3674.65826775671, 3674.662, 3674.665), easting = c(354.266660979476,
354.246972537617, 354.237128326737, 354.227284122559, 354.2702,
354.3123)), row.names = c(NA, -6L), class = "data.frame")

How to initialize a time variable in R?

I am trying to set a time variable in order to use it for comparison against times stored in a vector and am writing the following:
> openingTime <- as.POSIXct('08:00:00 AM', format='%H:%M:S %p', tzone = "EET")
> openingTime
[1] NA
Also
> dput(openingTime)
structure(NA_real_, class = c("POSIXct", "POSIXt"), tzone = "")
What am I doing wrong there?
As commented and pointed out by #rhertel, the proper syntax in order to get the variable working is:
openingTime <- as.POSIXct('08:00:00 AM', format='%H:%M:%S %p', tz = "EET")

Dates in R differ between Windows and Linux

I have a data frame in R names as data.
data <- as.xts(read.zoo("data1.csv",sep=",",tz="" ,header=T))
data index in the format 2004-01-04 09:44:00 IST
I applied the operation to change the index to Dates
index(data) <- as.Date(index(data))
Output should be 2004-01-04 but system produces 2004-01-03.
This works correctly in Windows but does not work on Linux.
Arun is correct that the problem is with locale. Your data has an Indian Standard Time stamp, and you have a US locale, which is at least 10.5 hours behind. Hence the time 09:44 is actually late the previous evening in your time zone.
Dates and time are horribly complicated, and R uses underlying OS capabilities to make it's calculations, which is why you see different results on different machines. Linux is POSIX compliant and understands time zones like "IST", which allows it to make the change to the previous night. Windows does not, which is why it gives the date as 01-04. To get the correct time zone update on Windows, you need to specify the full name of the time zone, "Asia/Kolkata". Wikipedia has a list of time zone names.
EDIT: Actually, R ships with a file containing all the "Continent/City" (Olson-style) names that it accepts. It is stored in
file.path(R.home("share"), "zoneinfo", "zone.tab")
and the example on the help page ?Sys.timezone tells you how to programmatically read it.
I find the lubridate package makes things a little easier to see what is happening.
library(lubridate)
x <- ymd_hms("2004-01-04 09:44:00 IST", tz = "Asia/Kolkata")
x
# [1] "2004-01-04 09:44:00 IST"
with_tz(x, "America/New_York")
# [1] "2004-01-03 23:14:00 EST"
Date <- c("2010-01-04 09:04:00", "2010-01-04 09:05:00")
Open <- c(5222.9, 5220.2)
Low <- c(5224.6, 5222.95)
High <- c(5220.1, 5218.6)
Close <- c(5220.35, 5222.95)
x <- data.frame(Date = Date, Open = Open, Low = Low, High = High, Close = Close)
as.Date(x$Date)
Output:
[1] "2010-01-04" "2010-01-04"
It seems alright to me.
Edit:
require(zoo)
data <- as.xts(read.zoo("data1.csv",sep=",",tz="" ,header=T))
> dput(data)
structure(c(5222.9, 5220.2, 5224.6, 5222.95, 5220.1, 5218.6,
5220.35, 5222.95), .Dim = c(2L, 4L), .Dimnames = list(NULL, c("Open",
"Low", "High", "Close")), index = structure(c(1262592240, 1262592300
), tzone = "", tclass = c("POSIXct", "POSIXt")), class = c("xts",
"zoo"), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct",
"POSIXt"), .indexTZ = "", tzone = "")
> as.Date(index(data))
[1] "2010-01-04" "2010-01-04"
On my Mac it works right. I suspect your system locale is set wrong. Also, you may want to check it within R.
What does this command Sys.getlocale() give you in Windows and in Linux within R?

Resources