R - Prevent aggregate function from converting date time timezones to local time? - r

Is there a way to stop aggregate converting datetimes to the computer's local timezone? For example:
dtUTC <- as.POSIXct(c('2010-01-01 01:01:01', '2015-01-02 07:23:11',
'2016-06-02 05:23:41', '2018-01-08 17:57:43'), tz='UTC')
groups <- c(1,1,2,2)
result <- aggregate(dtUTC, by=list(groups), FUN=min)
The result is converted to my computer's local timezone.
> dtUTC
[1] "2010-01-01 01:01:01 UTC" "2015-01-02 07:23:11 UTC" "2016-06-02 05:23:41 UTC"
[4] "2018-01-08 17:57:43 UTC"
> result$x
[1] "2010-01-01 12:01:01 AEDT" "2016-06-02 15:23:41 AEST"
I can convert it back post hoc, but this is an annoying extra step, especially if I have multiple datetime columns.
attr(result$x, 'tzone') <- 'UTC'
> result$x
[1] "2010-01-01 01:01:01 UTC" "2016-06-02 05:23:41 UTC"

I can't find anything that you can do with aggregate to change this behavior, but you can set your environment's TZ so any date-times will automatically be in UTC:
Sys.setenv(TZ='UTC') # <- set your TZ here
dtUTC <- as.POSIXct(c('2010-01-01 01:01:01', '2015-01-02 07:23:11',
'2016-06-02 05:23:41', '2018-01-08 17:57:43'))
groups <- c(1,1,2,2)
df <- data.frame(dtUTC, groups)
result <- aggregate(dtUTC ~ groups, df, min)
result$dtUTC
# [1] "2010-01-01 01:01:01 UTC" "2016-06-02 05:23:41 UTC"

You can use the dplyr package to aggregate instead:
library(lubridate)
library(dplyr)
dtUTC <- as.POSIXct(c('2010-01-01 01:01:01', '2015-01-02 07:23:11',
'2016-06-02 05:23:41', '2018-01-08 17:57:43'), tz='UTC')
groups <- c(1,1,2,2)
b <- data.frame(date = dtUTC, group = groups) %>%
  group_by(group) %>%
  dplyr::summarise(min = min(date))
> b$min
[1] "2010-01-01 01:01:01 UTC" "2016-06-02 05:23:41 UTC"
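
If several datetime columns need the post hoc fix from the question, it can be wrapped in a small helper. The following is only a sketch: `restore_tz` is a hypothetical name (not part of base R or any package), and it assumes every POSIXct column in the result should be displayed in the same time zone.

```r
# Hypothetical helper: reset the "tzone" attribute on every POSIXct column.
# This changes only how the instants are displayed, not the instants themselves.
restore_tz <- function(df, tz = "UTC") {
  is_dt <- vapply(df, inherits, logical(1), what = "POSIXct")
  df[is_dt] <- lapply(df[is_dt], function(col) {
    attr(col, "tzone") <- tz
    col
  })
  df
}

dtUTC <- as.POSIXct(c("2010-01-01 01:01:01", "2015-01-02 07:23:11",
                      "2016-06-02 05:23:41", "2018-01-08 17:57:43"), tz = "UTC")
groups <- c(1, 1, 2, 2)

# Aggregate as in the question, then restore the display time zone in one call
fixed <- restore_tz(aggregate(dtUTC, by = list(groups), FUN = min))
fixed$x
```

Non-datetime columns, such as the grouping column, are left untouched.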

Related

Floor datetime with custom start time (lubridate)

Is there a way to floor dates using a custom start time instead of the earliest possible time?
For example, flooring hours in a day into 2 12-hour intervals starting at 8am and 8pm rather than 12am and 12pm.
Example:
library(lubridate)
x <- ymd_hms("2009-08-03 21:00:00")
y <- ymd_hms("2009-08-03 09:00:00")
floor_date(x, '12 hours')
floor_date(y, '12 hours')
# default lubridate output:
[1] "2009-08-03 12:00:00 UTC"
[1] "2009-08-03 UTC"
# what I would like to have:
[1] "2009-08-03 20:00:00 UTC"
[1] "2009-08-03 08:00:00 UTC"
You could program a small switch (without lubridate, though).
FUN <- function(x) {
s <- switch(which.min(abs(mapply(`-`, c(8, 20), as.numeric(substr(x, 12, 13))))),
"08:00:00", "20:00:00")
as.POSIXct(paste(as.Date(x), s))
}
FUN("2009-08-03 21:00:00")
# [1] "2009-08-03 20:00:00 CEST"
FUN("2009-08-03 09:00:00")
# [1] "2009-08-03 08:00:00 CEST"
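
Note that the switch above picks whichever start time is nearer, so an input of 15:00 would map to 20:00 rather than 08:00. A possible alternative that floors strictly is to shift the time back by the offset, floor, and shift forward again. This is a minimal sketch, assuming lubridate is installed and the times are in UTC (or another zone without a DST change inside the window); `floor_with_offset` is a hypothetical name.

```r
library(lubridate)

# Hypothetical helper: floor to `unit`-sized intervals that start at `offset`
# past the usual boundary (here 08:00 and 20:00 instead of 00:00 and 12:00).
floor_with_offset <- function(x, unit = "12 hours", offset = hours(8)) {
  floor_date(x - offset, unit) + offset
}

x <- ymd_hms(c("2009-08-03 21:00:00",
               "2009-08-03 09:00:00",
               "2009-08-03 07:00:00"))  # before 08:00, belongs to previous day
res <- floor_with_offset(x)
res
```

With this approach a time before 08:00 falls into the interval that started at 20:00 on the previous day.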

Milliseconds separated by comma

I have the following data.frame called "data" (it is much larger, but I just give the first lines as an example):
Timestamp Weight Degrees
1 30-09-2016 11:45:00,000 38.19 40.00
2 01-10-2016 06:19:57,860 39.12 40.00
3 01-10-2016 06:20:46,393 42.11 41.00
I would like to convert the "Timestamp" to a date/time vector including milliseconds. This seems to be a problem because the milliseconds are separated by a comma.
Also, data.frame has mode "list" and Timestamp has mode "character" which clearly aren't right...
I have tried data$Timestamp <- as.POSIXct(data$Timestamp,format='%d-%m-%Y %H:%M:%OS'), but I only get "2016-09-30 11:45:00 UTC", without the milliseconds. The mode, however, becomes "numeric", which should be a step in the right direction. So far I have only set options(digits.secs=3).
I'd really appreciate your help. Thank you in advance.
x = c("30-09-2016 11:45:00,000", "01-10-2016 06:19:57,860", "01-10-2016 06:20:46,393")
format(as.POSIXct(gsub(",", ".", x), format='%d-%m-%Y %H:%M:%OS'), '%d-%m-%Y %H:%M:%OS3')
#[1] "30-09-2016 11:45:00.000" "01-10-2016 06:19:57.859" "01-10-2016 06:20:46.392"
OR
x = c("30-09-2016 11:45:00,000", "01-10-2016 06:19:57,860", "01-10-2016 06:20:46,393")
#Converting to POSIXct
options(digits.secs=3)
y = as.POSIXct(gsub(",", ".", x), format='%d-%m-%Y %H:%M:%OS', tz = "UTC")
y
#[1] "2016-09-30 11:45:00.000 UTC" "2016-10-01 06:19:57.859 UTC" "2016-10-01 06:20:46.392 UTC"
#Converting to numeric
as.numeric(y)
#[1] 1475253900 1475320798 1475320846
#Converting numeric back to POSIXct
as.POSIXct(as.numeric(y), origin = "1970-01-01", tz = "UTC")
#[1] "2016-09-30 11:45:00.000 UTC" "2016-10-01 06:19:57.859 UTC" "2016-10-01 06:20:46.392 UTC"
OR
x = c("30-09-2016 11:45:00,000", "01-10-2016 06:19:57,860", "01-10-2016 06:20:46,393")
library(lubridate)
options(digits.secs=3)
dmy_hms(gsub(",", ".", x))
#[1] "2016-09-30 11:45:00.000 UTC" "2016-10-01 06:19:57.860 UTC" "2016-10-01 06:20:46.393 UTC"
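
The base R outputs above show .859 and .392 where the input had ,860 and ,393: the parsed values are stored as doubles that can sit just below the intended millisecond, and %OS3 does not round them up. One possible workaround, sketched here under the assumption that the data has exactly millisecond resolution, is to nudge the values by a microsecond before formatting.

```r
x <- c("30-09-2016 11:45:00,000", "01-10-2016 06:19:57,860",
       "01-10-2016 06:20:46,393")

# Replace the decimal comma, then parse with fractional seconds (%OS)
y <- as.POSIXct(gsub(",", ".", x, fixed = TRUE),
                format = "%d-%m-%Y %H:%M:%OS", tz = "UTC")

# Assumption: inputs are whole milliseconds, so adding one microsecond
# cannot push a value into the next millisecond.
out <- format(y + 1e-6, "%d-%m-%Y %H:%M:%OS3")
out
```

The nudge is applied only for display; `y` itself keeps the parsed values.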

Modify dates in POSIXct format in R using lubridate

I have five dates in the following format:
five_dates <- c("2015-04-13 22:56:01 UTC", "2015-04-13 23:00:29 UTC", "2014-04-13 23:01:22 UTC", "2013-04-13 23:01:39 UTC", "2013-04-13 23:01:43 UTC")
Using the lubridate package, I processed them by doing the following:
five_dates <- lubridate::ymd_hms(five_dates)
str(five_dates)
[1] POSIXct[1:5], format: "2015-04-13 22:56:01" "2015-04-13 23:00:29" "2014-04-13 23:01:22" "2013-04-13 23:01:39" "2013-04-13 23:01:43"
I want to add one year to the dates in 2013:
five_dates <- ifelse(lubridate::year(five_dates) < 2014, five_dates + years(1), five_dates)
But doing so leads to this output:
five_dates
[1] 1428965761 1428966029 1397430082 1397430099 1397430103
How can I add one year to dates in 2013 so the output is also a date?
ifelse() strips the class attribute, so the dates come back as plain numbers (seconds since 1970-01-01). You need to convert the result back:
five_dates <- as.POSIXct(five_dates, origin="1970-01-01", tz = "UTC")
which gives:
> five_dates
[1] "2015-04-13 22:56:01 UTC" "2015-04-13 23:00:29 UTC"
[3] "2014-04-13 23:01:22 UTC" "2014-04-13 23:01:39 UTC"
[5] "2014-04-13 23:01:43 UTC"
An alternative for the ifelse operation which achieves the same:
five_dates <- five_dates + years(as.integer(year(five_dates) < 2014))
gives:
> five_dates
[1] "2015-04-13 22:56:01 UTC" "2015-04-13 23:00:29 UTC"
[3] "2014-04-13 23:01:22 UTC" "2014-04-13 23:01:39 UTC"
[5] "2014-04-13 23:01:43 UTC"
The problem is ifelse(). It strips attributes.
But since you are using the lubridate package anyway, why not use its year<- replacement function to replace the year with a different one? With it we can avoid ifelse() altogether.
yr <- 2013
year(five_dates[year(five_dates) == yr]) <- yr + 1
five_dates
# [1] "2015-04-13 22:56:01 UTC" "2015-04-13 23:00:29 UTC"
# [3] "2014-04-13 23:01:22 UTC" "2014-04-13 23:01:39 UTC"
# [5] "2014-04-13 23:01:43 UTC"
Or using your code, you could grab the class before the ifelse() call, then assign it back.
cl <- class(five_dates)
five_dates <- ifelse(...)
class(five_dates) <- cl
Examples are shown in help(ifelse). But I think year<- will help you out more here since you are already using the lubridate package.
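
For comparison, the same update can be sketched in base R without ifelse() by editing the year field of a POSIXlt copy through a logical index. This is an illustration under stated assumptions rather than the answerers' method; the names `lt`, `is_old`, and `bumped` are made up for the example.

```r
five_dates <- as.POSIXct(c("2015-04-13 22:56:01", "2015-04-13 23:00:29",
                           "2014-04-13 23:01:22", "2013-04-13 23:01:39",
                           "2013-04-13 23:01:43"), tz = "UTC")

# POSIXlt stores the year as "years since 1900"
lt <- as.POSIXlt(five_dates)
is_old <- (lt$year + 1900) < 2014

# Only the selected elements are modified, so the class is never dropped
lt$year[is_old] <- lt$year[is_old] + 1
bumped <- as.POSIXct(lt)
bumped
```

Because the assignment goes through an index rather than ifelse(), the result stays a date-time vector in UTC.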

as.Date function not working

I have a huge data set I am working with. Some of the dates are in the format 01/01/2010 and others are 1/1/2010.
When I run as.Date(Dates, format="%y/%d/%m") all of the latter dates change the year to 2020. What is going on here?
Your format string is not correct: %y matches a two-digit year, whereas your data has four-digit years, which need %Y. Try this:
d1 <- "01/01/2010"
d2 <- "1/1/2010"
> as.Date(d1, format='%d/%m/%Y')
#[1] "2010-01-01"
> as.Date(d2, format='%d/%m/%Y')
#[1] "2010-01-01"
For dates with different formats of the year, the lubridate package can be used:
library(lubridate)
d1 <- "1/1/10"
d2 <- "01/01/2010"
parse_date_time(d1, "dmy")
#[1] "2010-01-01 UTC"
parse_date_time(d2, "dmy")
#[1] "2010-01-01 UTC"
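
To see how a two-digit year specifier can produce 2020, here is a small sketch comparing %y and %Y on the two spellings from the question. The day/month order is assumed to be day first, as in the answer above; the vector names are made up for the example.

```r
dates <- c("01/01/2010", "1/1/2010")

# %y consumes only the first two digits of "2010" ("20"), giving the year 2020
wrong <- as.Date(dates, format = "%d/%m/%y")

# %Y consumes the full four-digit year; leading zeros on day/month are optional
right <- as.Date(dates, format = "%d/%m/%Y")

wrong
right

# Entries that do not match the format at all come back as NA
unparsed <- dates[is.na(right)]
```

Checking for NA after parsing is a cheap way to spot rows in a large data set that follow yet another format.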

Find previous hour and next hour in R

Suppose I pass "2015-01-01 01:50:50", then it should return "2015-01-01 01:00:00" and "2015-01-01 02:00:00". How to calculate these values in R?
Assuming your time were a variable "X", you can use round or trunc.
Try:
round(X, "hour")
trunc(X, "hour")
This would still require some work to determine whether the values had actually been rounded up or down (for round). So, if you don't want to have to think about that, you can consider using the "lubridate" package:
X <- structure(c(1430050590.96162, 1430052390.96162), class = c("POSIXct", "POSIXt"))
X
# [1] "2015-04-26 17:46:30 IST" "2015-04-26 18:16:30 IST"
library(lubridate)
ceiling_date(X, "hour")
# [1] "2015-04-26 18:00:00 IST" "2015-04-26 19:00:00 IST"
floor_date(X, "hour")
# [1] "2015-04-26 17:00:00 IST" "2015-04-26 18:00:00 IST"
I would go with the following wrapper using base R (you can specify your time zone using the tz argument of the strptime function):
Myfunc <- function(x){x <- strptime(x, format = "%F %H") ; c(x, x + 3600L)}
Myfunc("2015-01-01 01:50:50")
## [1] "2015-01-01 01:00:00 IST" "2015-01-01 02:00:00 IST"
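
A variant of the base R approach using trunc() might look like the sketch below. `hour_bounds` is a hypothetical name, and the default time zone of UTC is an assumption made so the example prints the same everywhere.

```r
# Hypothetical helper: return the start of the current hour and of the next one
hour_bounds <- function(x, tz = "UTC") {
  t <- as.POSIXct(x, tz = tz)
  lower <- as.POSIXct(trunc(t, units = "hours"))  # drop minutes and seconds
  lower + c(0, 3600)                              # add 0 and 1 hour, in seconds
}

bounds <- hour_bounds("2015-01-01 01:50:50")
bounds
```

If the input is already exactly on the hour, the first value is the input itself and the second is one hour later.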
