Aggregate function and time zone in R

I have two sections of code that theoretically do the same thing:
Mn_min_max_D <- with(Mn, aggregate(Depth ~ as.Date(Date_time), FUN = function(x) c(Min = min(x), Max = max(x))))
Mn_min_max_D <- do.call(data.frame, Mn_min_max_D)
names(Mn_min_max_D)[names(Mn_min_max_D) == "as.Date.Date_time."] <- "Date"
min_max_D <- with(Mn, aggregate(Depth ~ as.Date(Date), FUN = function(x) c(Min = min(x), Max = max(x))))
min_max_D <- do.call(data.frame, min_max_D)
names(min_max_D)[names(min_max_D) == "as.Date.Date."] <- "Date"
However, the output values differ. Inspecting the max depths, I can see that for some reason the time zone is being ignored in the first piece of code.
For example, the max depth happens at '2013-10-26 22:33:00', but with the time zone correction this is actually '2013-10-27 07:33:00'.
The $Date value comes from this code:
Mn$Date_time <- as.POSIXct(Mn$Date_time, format="%Y-%m-%d %H:%M:%S", tz = "Asia/Tokyo")
Mn$Date <- format(as.POSIXct(Mn$Date_time, format = "%Y/%m/%d %H:%M:%S"), format = "%Y/%m/%d")
Mn$Date <- as.Date(Mn$Date, "%Y/%m/%d")
It seems that maybe the process of removing the time fixes the date. I need to understand where the issue stems from so that I don't make the same mistake in the future.
I think I may need a %>% mutate() with a tz argument, but I don't understand how at the moment. Or maybe use dplyr to aggregate instead, as below, but I've tried that and the result is the same.
test <- Mn %>% group_by(as.Date(Date_time)) %>% dplyr::summarise(min = min(Depth), max = max(Depth))
Example data:
Date_time Depth
2013-10-14 12:30:00 64.45
2013-10-14 12:30:05 65.95
2013-10-14 12:30:10 65.95
2013-10-14 12:30:15 66.45
2013-10-14 12:30:20 67.95
2013-10-14 12:30:25 66.95

In its present format the data does not carry a time zone, so the default time zone is being used. If you know the time zone for those timestamps, it's better to set it explicitly.
dta <- with(
  asNamespace("readr"),
  read_table(
    file = "
Date_time Depth
2013-10-14-12:30:00 64.45
2013-10-14-12:30:05 65.95
2013-10-14-12:30:10 65.95
2013-10-14-12:30:15 66.45
2013-10-14-12:30:20 67.95
2013-10-14-12:30:25 66.95",
    col_types = cols(
      Date_time = col_datetime(format = "%Y-%m-%d-%H:%M:%S"),
      Depth = col_double()
    )
  )
)
library("lubridate")
library("tidyverse")
dta %>%
  mutate(DT_tz = force_tz(Date_time, tzone = "GMT"),
         DT_tz_NYC = with_tz(Date_time, tzone = "America/New_York"))
Explanation
Consider the following:
tz(now()) returns an empty string
Sys.timezone() returns local time zone, "Europe/London" in my case
tz(as.Date(now())) returns "UTC"
Without specified time zones, R falls back on your local settings:
as.POSIXlt(Sys.time(), "America/New_York")
# "2022-03-18 12:43:10 EDT"
as.POSIXlt(Sys.time())
# "2022-03-18 16:43:16 GMT"
This can get a little fiddly.
tz(as.POSIXlt(Sys.time()))
# [1] "Europe/London"
In particular, it's worth showing that as.Date strips the time zone information, even when a tz is supplied:
tz(as.Date(as.POSIXlt(Sys.time())))
# [1] "UTC"
tz(as.Date(as.POSIXlt(Sys.time()), tz = "Africa/Abidjan"))
# [1] "UTC"
Solution
When dealing with timestamps, it's always advisable to ensure that the time zone information is recorded within the data or, as a less robust alternative, stated explicitly within the script. Personally, I take the view that the time zone is an integral part of a timestamp and should reside with the data. Stripping time zone information from a timestamp leads to confusion when localised timestamps differ; a large enough offset can even change the date (consider a 2-hour time zone difference and events taking place close to midnight).
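Tying this back to the original aggregation: as.Date() on a POSIXct converts via UTC by default and ignores the stored tzone attribute, so the local day has to be requested explicitly. A minimal sketch, reusing the Asia/Tokyo zone from the question:

```r
# as.Date() defaults to tz = "UTC" for POSIXct input,
# regardless of the tzone attribute stored on the value
x <- as.POSIXct("2013-10-26 22:33:00", tz = "UTC")

as.Date(x)                     # [1] "2013-10-26"  (UTC day)
as.Date(x, tz = "Asia/Tokyo")  # [1] "2013-10-27"  (local day, UTC+9)
```

Passing tz to as.Date() inside the aggregate() call, e.g. as.Date(Date_time, tz = "Asia/Tokyo"), makes both code paths agree.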

Related

Convert a chr time into an actual time in R

I'm trying to convert an F1 lap time that is stored as a chr into an actual time, which I can then plot in a histogram.
This is what I tried, but with no success:
lapTimes <- lapTimes %>% mutate(Time = ms(Time))
format(as.POSIXct(lapTimes$time, tz = ""), format = "%M:%S.%OS")
The time always looks like this: 1:11.111, with minutes first, then seconds, then milliseconds.
If anyone has an idea I would greatly appreciate it.
Thanks in advance! :D
I am assuming your data looks something like this:
laptime <- c("1:11.111", "2:02.2222")
What this represents is a time interval, not a date-time. As such, you can convert it to the difftime class and then to numeric if needed.
as.difftime(laptime, format = "%M:%S.%OS")
#Time differences in mins
#[1] 1.183333 2.033333
Since you provided no example data, I assumed it is stored as a character.
laptime <- "1:11.111"
as.POSIXlt(laptime, format = "%M:%S.%OS", tz = "GMT")
# [1] "2021-01-14 00:01:11 GMT"
# compute the time difference from dates
t2 <- as.POSIXlt(laptime, format = "%M:%S.%OS", tz = "GMT")
t1 <- as.Date(t2)
difftime(t2, t1)
# Time difference of 1.183333 mins
You could also take a look at this link; it looks very useful for your specific problem: https://rstudio-pubs-static.s3.amazonaws.com/276999_042092be8e31414f82ef4f41e31fe5c8.html
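Since the original goal was a histogram, a numeric vector of seconds is probably the most convenient target. A sketch using lubridate's ms(), the parser the question already tried, with made-up lap times:

```r
library(lubridate)

# hypothetical lap times in minutes:seconds.milliseconds form
laptimes <- c("1:11.111", "1:12.345", "2:02.222")

# ms() parses minutes and (fractional) seconds into a Period;
# as.numeric() converts the Period to seconds
secs <- as.numeric(ms(laptimes))
secs  # 71.111 72.345 122.222

hist(secs, main = "Lap times", xlab = "seconds")
```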

Adding milliseconds to a timestamp in R, even though the original character does not have milliseconds?

I am doing some animal movement analysis and I want to submit data to an organisation called Movebank for annotation, but they require the timestamp to have milliseconds included with 3 decimal places.
I have a column in my data frame (dat) with my timestamps as characters (without milliseconds), for example "2017-07-19 16:30:24"
To convert them to time and date format with milliseconds I am using the code:
options(digits.secs = 3)
dat$timestamp <- as.POSIXct(dat$timestamp, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
This works fine at converting my timestamp column to POSIXct, which I can use to make tracks etc., but it does not add .000 milliseconds to the end of each timestamp as I was hoping it would.
I have also tried:
dat$timestamp <- as.POSIXct(dat$timestamp, format = "%Y-%m-%d %H:%M:%OS3", tz = "UTC")
(Note: I added .. %OS3 ...)
But this returns NA for all my timestamps.
Can anybody shed some light on this? I essentially need to add .000 to the end of each of my timestamps so that, using the example given above, I would have the format "2017-07-19 16:30:24.000"
The milliseconds will be dropped from the printed output if no value in the vector has non-zero milliseconds.
options(digits.secs=4)
x1 <- as.POSIXct("2017-07-19 16:30:25")
as.POSIXct(paste0(x1, ".000"), format="%Y-%m-%d %H:%M:%OS")
# [1] "2017-07-19 16:30:25 UTC"
However, they will be added automatically if there are.
x2 <- as.POSIXct("2017-07-19 16:30:25.002")
c(x1, x2)
# [1] "2017-07-19 18:30:25.000 CEST" "2017-07-19 18:30:25.002 CEST"
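If the .000 is only needed in the exported text (Movebank's three decimal places), it may be simplest to format the column explicitly when writing the file rather than relying on print options. A minimal sketch:

```r
x <- as.POSIXct("2017-07-19 16:30:24", tz = "UTC")

# %OS3 forces exactly three decimal places in the output string
format(x, "%Y-%m-%d %H:%M:%OS3")
# [1] "2017-07-19 16:30:24.000"
```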

How to format a time in R

I have been given a dataset that lists date and time separately. The dates are fine however the time is being treated as a character rather than a date/time object.
The current time column looks like "13:00", "13:05", "13:10" etc.
I tried mutating the column using as.POSIXct() however it changed the column to all NA.
This was my attempt:
data = data %>%
  mutate(time = as.POSIXct(time, format = "h:m"))
I expected a similar looking column but instead of strings I wanted it to be times/dates. Thanks for any help!
The times class in chron can represent times without dates:
library(chron)
library(dplyr)
# input data
data <- data.frame(date = "2000-01-01", time = c("13:00", "13:05", "13:10"))
data %>%
  mutate(date = as.chron(as.character(date)),
         time = times(paste0(time, ":00")),
         datetime = chron(date, time))
giving:
date time datetime
1 01/01/00 13:00:00 (01/01/00 13:00:00)
2 01/01/00 13:05:00 (01/01/00 13:05:00)
3 01/01/00 13:10:00 (01/01/00 13:10:00)
For a simple, non-package solution, I would first create a column holding both the date and the time:
dateandtime <- as.character(paste(date, time, sep = ' '))
and then use the strptime function:
dateandtime <- strptime(dateandtime,
                        format = "%Y-%m-%d %H:%M",
                        tz = 'GMT')
Just put the data frame name in front of all variables, e.g.:
df$dateandtime <- as.character(paste(df$date, df$time, sep = ' '))
Hope it helps!
If you use as.POSIXct, you need to provide the format differently:
as.POSIXct("13:05", format = "%H:%M")
This however returns [1] "2019-03-26 13:05:00 CET" since date/times are represented as calendar dates plus time to the nearest second.
If you only want to use the time, you could use data.table::as.ITime:
data.table::as.ITime(c("13:00", "13:05", "13:10"))
This returns:
str(data.table::as.ITime(c("13:00", "13:05", "13:10")))
'ITime' int [1:3] 13:00:00 13:05:00 13:10:00

R convert date, time and time zone strings to POSIXct

I am having trouble with character to date-time conversions and would appreciate help understanding what is going wrong. To do this, I define a very simple data frame with two rows, which holds an ID, a time zone, a date, and a time for each row. I would like to add a column that contains a (say) POSIXct entry for the combined date-time including the correct time zone. (This is a synthetic example but I want to apply this to a much larger data set.)
First we try combining these features into a unified representation of the date, time, and time zone using R’s base facilities.
d <- data.frame(id = c(111, 222),
                tzz = c("Europe/Berlin", "US/Eastern"),
                d = c("09-Sep-2017", "11-Sep-2017"),
                t = c("23:42:13", "22:05:17"),
                stringsAsFactors = FALSE)
d$dt <- strptime(paste(d$d, d$t), tz=d$tzz, format="%d-%b-%Y %T")
Error in strptime(paste(d$d, d$t), tz = d$tzz, format = "%d-%b-%Y %T") :
invalid 'tz' value
That approach fails, though it’s not clear to me why. For example, I can do the non-vectorized version of this easily. Also, the time zones I am using seem to be part of the officially supported list.
d$tzz %in% OlsonNames()
[1] TRUE TRUE
dt1 <- strptime(paste(d$d[1], d$t[1]), tz=d$tzz[1], format="%d-%b-%Y %T")
print(dt1)
[1] "2017-09-09 23:42:13 CEST"
print(tz(dt1))
[1] "Europe/Berlin"
dt2 <- strptime(paste(d$d[2], d$t[2]), tz=d$tzz[2], format="%d-%b-%Y %T")
print(dt2)
[1] "2017-09-11 22:05:17 EDT"
print(tz(dt2))
[1] "US/Eastern"
Thinking that perhaps my problem was in misunderstanding how to use strptime, I then tried a similar approach with lubridate:
library(lubridate)
d$dt <- dmy_hms(paste(d$d, d$t), tz=d$tzz)
Error in strptime(.enclose(x), .enclose(fmt), tz) : invalid 'tz' value
but got the same error. Again, a non-vectorized version works fine.
dt1l <- dmy_hms(paste(d$d[1], d$t[1]), tz=d$tzz[1])
print(dt1l)
[1] "2017-09-09 23:42:13 CEST"
print(tz(dt1l))
[1] "Europe/Berlin"
Trying mutate in the tidyverse yields the same problem. (Incidentally, CEST is not among the OlsonNames() set.)
Help for how to do this correctly, or at least an explanation of how this is going wrong, would be much appreciated.
Try computing it row by row like this:
library(dplyr)
d %>%
  rowwise() %>%
  mutate(ct = as.POSIXct(paste(d, t), format = "%d-%b-%Y %H:%M:%S", tz = tzz)) %>%
  ungroup()
giving:
# A tibble: 2 x 5
id tzz d t ct
<dbl> <chr> <chr> <chr> <dttm>
1 111. Europe/Berlin 09-Sep-2017 23:42:13 2017-09-09 17:42:13
2 222. US/Eastern 11-Sep-2017 22:05:17 2017-09-11 22:05:17
Similar to Gabor's but with data.table using the fact that the ids are unique:
R> dt <- data.table(d)
R> dt[ , ct := as.POSIXct(paste(d, t), "%d-%b-%Y %H:%M:%S", tz=tzz), by=id][]
id tzz d t ct
1: 111 Europe/Berlin 09-Sep-2017 23:42:13 2017-09-09 17:42:13
2: 222 US/Eastern 11-Sep-2017 22:05:17 2017-09-11 22:05:17
R>
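The underlying restriction is that strptime() and as.POSIXct() accept only a single tz string per call, so any solution has to apply the zone element-wise. A base-R sketch of the same idea with Map(), assuming an English locale for the %b month abbreviations:

```r
# the data frame from the question
d <- data.frame(id = c(111, 222),
                tzz = c("Europe/Berlin", "US/Eastern"),
                d = c("09-Sep-2017", "11-Sep-2017"),
                t = c("23:42:13", "22:05:17"),
                stringsAsFactors = FALSE)

# parse each row with its own tz, then rebuild one POSIXct vector
# from the epoch seconds (a POSIXct column can only display one zone)
dts <- Map(function(dd, tt, zz)
             as.POSIXct(paste(dd, tt), format = "%d-%b-%Y %H:%M:%S", tz = zz),
           d$d, d$t, d$tzz)
d$ct <- .POSIXct(vapply(dts, as.numeric, numeric(1), USE.NAMES = FALSE),
                 tz = "UTC")
d$ct
# [1] "2017-09-09 21:42:13 UTC" "2017-09-12 02:05:17 UTC"
```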

How to convert UTC timestamp to Australian time

I have a large amount of data with timestamps in the following format: 2013-11-14T23:52:29Z.
My research indicates that the timezone is UTC (denoted by a "Z" suffix).
I need to convert it to +1100 UTC (which is Australia/Sydney time), also known as "EDT" (or Eastern Daylight Time).
I have tried the following:
test_timestamp <- "2013-11-14T23:52:29Z"
as.POSIXct(test_timestamp,"Australia/Sydney")
This produces the output "2013-11-14 EST"
This does not pass a sanity test as it should roll the date over into the next calendar day (i.e. 2013-11-15 EST).
I have wasted many hours on this seemingly trivial task, so any help is greatly appreciated.
Try this, with a full format specified (see ?strptime):
format(
  as.POSIXct(test_timestamp, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"),
  tz = "Australia/Sydney"
)
#[1] "2013-11-15 10:52:29"
Compare your attempt (essentially):
format(as.POSIXct(test_timestamp,tz="Australia/Sydney"),tz="Australia/Sydney")
#[1] "2013-11-14"
Also, this will work to non-destructively edit the data, only altering the output:
result <- as.POSIXct(test_timestamp,format="%Y-%m-%dT%H:%M:%SZ",tz="UTC")
result
#[1] "2013-11-14 23:52:29 UTC"
#dput(result)
#structure(1384473149, class = c("POSIXct","POSIXt"), tzone = "UTC")
attr(result,"tzone") <- "Australia/Sydney"
#dput(result)
#structure(1384473149, class = c("POSIXct","POSIXt"), tzone = "Australia/Sydney")
result
#[1] "2013-11-15 10:52:29 EST"
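For completeness, lubridate wraps the same two steps (parse as UTC, then change the display zone) in a compact form. A sketch:

```r
library(lubridate)

ts <- ymd_hms("2013-11-14T23:52:29Z")   # the trailing Z is parsed as UTC
with_tz(ts, "Australia/Sydney")
# 2013-11-15 10:52:29 in Sydney local time (UTC+11 during daylight saving)
```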

Resources