Modify dates in POSIXct format in R using lubridate

I have five dates in the following format:
five_dates <- c("2015-04-13 22:56:01 UTC", "2015-04-13 23:00:29 UTC", "2014-04-13 23:01:22 UTC", "2013-04-13 23:01:39 UTC", "2013-04-13 23:01:43 UTC")
Using the lubridate package, I processed them by doing the following:
five_dates <- lubridate::ymd_hms(five_dates)
str(five_dates)
[1] POSIXct[1:5], format: "2015-04-13 22:56:01" "2015-04-13 23:00:29" "2014-04-13 23:01:22" "2013-04-13 23:01:39" "2013-04-13 23:01:43"
I want to add one year to the dates in 2013:
five_dates <- ifelse(lubridate::year(five_dates) < 2014, five_dates + years(1), five_dates)
But doing so leads to this output:
five_dates
[1] 1428965761 1428966029 1397430082 1397430099 1397430103
How can I add one year to dates in 2013 so the output is also a date?

ifelse() strips attributes, including the POSIXct class, so it returns the underlying numbers (seconds since 1970-01-01 UTC). You need to convert the result back:
five_dates <- as.POSIXct(five_dates, origin="1970-01-01", tz = "UTC")
which gives:
> five_dates
[1] "2015-04-13 22:56:01 UTC" "2015-04-13 23:00:29 UTC"
[3] "2014-04-13 23:01:22 UTC" "2014-04-13 23:01:39 UTC"
[5] "2014-04-13 23:01:43 UTC"
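To make the cause concrete, here is a minimal base R sketch (no lubridate needed) showing that ifelse() hands back plain numbers and that as.POSIXct() with an origin restores them. The two-element vector x is only an illustration, not the asker's data.

```r
# Two date-times in UTC, stored internally as seconds since 1970-01-01.
x <- as.POSIXct(c("2015-04-13 22:56:01", "2013-04-13 23:01:39"), tz = "UTC")

# ifelse() builds its result from the logical test, so the POSIXct class
# and the tzone attribute of `x` are lost even when nothing is changed.
stripped <- ifelse(x < as.POSIXct("2014-01-01", tz = "UTC"), x, x)
class(stripped)
# [1] "numeric"

# Converting back needs the origin and the time zone spelled out.
restored <- as.POSIXct(stripped, origin = "1970-01-01", tz = "UTC")
```

The values themselves are untouched; only the metadata that tells R how to print them is dropped.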
An alternative to the ifelse() call, applied to the original dates, which achieves the same result:
five_dates <- five_dates + years(as.integer(year(five_dates) < 2014))
gives:
> five_dates
[1] "2015-04-13 22:56:01 UTC" "2015-04-13 23:00:29 UTC"
[3] "2014-04-13 23:01:22 UTC" "2014-04-13 23:01:39 UTC"
[5] "2014-04-13 23:01:43 UTC"

The problem is ifelse(). It strips attributes.
But since you are using the lubridate package anyway, why not use its year<- replacement function to replace the year with a different one? With it we can avoid ifelse() altogether.
yr <- 2013
year(five_dates[year(five_dates) == yr]) <- yr + 1
five_dates
# [1] "2015-04-13 22:56:01 UTC" "2015-04-13 23:00:29 UTC"
# [3] "2014-04-13 23:01:22 UTC" "2014-04-13 23:01:39 UTC"
# [5] "2014-04-13 23:01:43 UTC"
Or using your code, you could grab the class before the ifelse() call, then assign it back.
cl <- class(five_dates)
five_dates <- ifelse(...)
class(five_dates) <- cl
Examples are shown in help(ifelse). But I think year<- will help you out more here since you are already using the lubridate package.
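One caveat with restoring only the class: the tzone attribute is also stripped, so the result would print in the local time zone. A hedged base R sketch that copies all attributes back instead is shown here; the year is bumped through POSIXlt rather than lubridate, and the vector x and the variable names are illustrative assumptions.

```r
x <- as.POSIXct(c("2015-04-13 22:56:01", "2013-04-13 23:01:39"), tz = "UTC")

# Calendar year of each element (POSIXlt stores years since 1900).
yr <- as.POSIXlt(x)$year + 1900

# Same instants with the calendar year increased by one.
lt <- as.POSIXlt(x)
lt$year <- lt$year + 1L
plus_one <- as.POSIXct(lt)

# ifelse() returns bare numbers; copy class and tzone back from `x`.
out <- ifelse(yr < 2014, plus_one, x)
attributes(out) <- attributes(x)

format(out, "%Y-%m-%d %H:%M:%S")
# [1] "2015-04-13 22:56:01" "2014-04-13 23:01:39"
```

Copying attributes(x) is safe here because the result has the same length as x and x carries no names.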

Related

R - Prevent aggregate function from converting date time timezones to local time?

Is there a way to stop aggregate converting datetimes to the computer's local timezone? For example:
dtUTC <- as.POSIXct(c('2010-01-01 01:01:01', '2015-01-02 07:23:11',
'2016-06-02 05:23:41', '2018-01-08 17:57:43'), tz='UTC')
groups <- c(1,1,2,2)
result <- aggregate(dtUTC, by=list(groups), FUN=min)
The result is converted to my computer's local timezone.
> dtUTC
[1] "2010-01-01 01:01:01 UTC" "2015-01-02 07:23:11 UTC" "2016-06-02 05:23:41 UTC"
[4] "2018-01-08 17:57:43 UTC"
> result$x
[1] "2010-01-01 12:01:01 AEDT" "2016-06-02 15:23:41 AEST"
I can convert it back post hoc, but this is an annoying extra step, especially if I have multiple datetime columns.
attr(result$x, 'tzone') <- 'UTC'
> result$x
[1] "2010-01-01 01:01:01 UTC" "2016-06-02 05:23:41 UTC"
I can't find anything that you can do with aggregate to change this behavior, but you can set your environment's TZ so any date-times will automatically be in UTC:
Sys.setenv(TZ='UTC') # <- set your TZ here
dtUTC <- as.POSIXct(c('2010-01-01 01:01:01', '2015-01-02 07:23:11',
'2016-06-02 05:23:41', '2018-01-08 17:57:43'))
groups <- c(1,1,2,2)
df <- data.frame(dtUTC, groups)
result <- aggregate(dtUTC ~ groups, df, min)
result$dtUTC
# [1] "2010-01-01 01:01:01 UTC" "2016-06-02 05:23:41 UTC"
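If changing the session time zone is not an option, the post hoc fix from the question can be wrapped in a small helper that handles every datetime column at once. This is only a sketch: restore_tz is a hypothetical name, not part of base R or any package.

```r
# Hypothetical helper: set the display time zone on every POSIXct column.
restore_tz <- function(df, tz = "UTC") {
  for (nm in names(df)) {
    if (inherits(df[[nm]], "POSIXct")) {
      # Only the printing time zone changes; the stored instant does not.
      attr(df[[nm]], "tzone") <- tz
    }
  }
  df
}

dtUTC <- as.POSIXct(c("2010-01-01 01:01:01", "2015-01-02 07:23:11",
                      "2016-06-02 05:23:41", "2018-01-08 17:57:43"), tz = "UTC")
groups <- c(1, 1, 2, 2)

result <- restore_tz(aggregate(dtUTC, by = list(groups), FUN = min))
format(result$x, "%Y-%m-%d %H:%M:%S")
# [1] "2010-01-01 01:01:01" "2016-06-02 05:23:41"
```

Non-datetime columns such as the grouping column pass through unchanged.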
You can also use the dplyr package to aggregate:
library(lubridate)
library(dplyr)
dtUTC <- as.POSIXct(c('2010-01-01 01:01:01', '2015-01-02 07:23:11',
'2016-06-02 05:23:41', '2018-01-08 17:57:43'), tz='UTC')
groups <- c(1,1,2,2)
b<-data.frame(date= dtUTC, group = groups) %>% group_by(group) %>% dplyr::summarise(min = min(date))
> b$min
[1] "2010-01-01 01:01:01 UTC" "2016-06-02 05:23:41 UTC"

parse_date_time() converting DayofYear in date

Hi, I'm using the lubridate package and I want to convert a vector 1:365 (day of year) into dates,
e.g. 60 -> 2019-03-01 UTC.
For 1-99 it works fine, but for 100-365 I get a warning message.
lubridate::parse_date_time(99, "j")
[1] "2019-04-09 UTC"
lubridate::parse_date_time(100:365, "j")
[1] NA ...
[365] NA
Warning message:
All formats failed to parse. No formats found.
Does anyone get the same warning message, or have a solution?
If you provide character input, it works well:
lubridate::parse_date_time('100', "j")
# [1] "2019-04-10 UTC"
lubridate::parse_date_time(paste(100:365), "j")
# [1] "2019-04-10 UTC" "2019-04-11 UTC" "2019-04-12 UTC" "2019-04-13 UTC" "2019-04-14 UTC" "2019-04-15 UTC" "2019-04-16 UTC" "2019-04-17 UTC"
# ...
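# [265] "2019-12-30 UTC" "2019-12-31 UTC"
In base R you can do it by adding the day offset to an origin date. For numeric input the format argument is ignored, the origin must be given as year-month-day, and day 1 corresponds to an offset of 0:
as.Date(100:365 - 1, origin = "2019-01-01")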
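As a base R sketch that needs no parsing at all: day n of a year falls n - 1 days after 1 January of that year. The helper name doy_to_date is hypothetical, and the year 2019 is taken from the question.

```r
# Hypothetical helper: convert day-of-year numbers to Date objects.
doy_to_date <- function(doy, year) {
  # Day 1 is 1 January itself, hence the offset of doy - 1 days.
  as.Date(paste0(year, "-01-01")) + (doy - 1)
}

doy_to_date(c(60, 100, 365), 2019)
# [1] "2019-03-01" "2019-04-10" "2019-12-31"
```

The result is a Date vector; wrap it in as.POSIXct() if a date-time is needed.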

Why does the lubridate::ymd_hms function add an NA observation when the "silent" argument is set TRUE?

Could anyone explain why the "silent=T" argument triggers a warning and an NA observation, and tell me how to avoid this?
x <- c("2010-04-14-04-35-59", "20100401120000")
ymd_hms(x, silent=T)
[1] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC" NA
Warning message:
1 failed to parse.
R version 3.4.0, lubridate version 1.6.0
ymd_hms() has no silent argument, so lubridate treats "silent=T" as one more value to parse as a date-time. The argument that suppresses messages is quiet:
lubridate::ymd_hms(x, quiet=TRUE)
[1] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
This happens because you can pass several vectors to a lubridate parsing function:
x <- c("2010-04-14-04-35-59", "20100401120000")
y <- c("2010-04-14-04-35-59", "20100401120000")
z <- c("2010-04-14-04-35-59", "20100401120000")
lubridate::ymd_hms(x, y, z)
[1] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
[3] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
[5] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
So here, with silent=T, you're telling lubridate that silent=T is a vector to parse. Hence the NA.
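The mechanism can be reproduced with a toy base R function. This is not lubridate's implementation, only an illustrative stand-in (collect_inputs is a hypothetical name) showing how an unrecognised named argument ends up inside `...` next to the real inputs.

```r
# Toy stand-in: everything passed through `...` is treated as input,
# while `quiet` is the only named option the function knows about.
collect_inputs <- function(..., quiet = FALSE) {
  unlist(list(...), use.names = FALSE)
}

x <- c("2010-04-14-04-35-59", "20100401120000")

# `silent` is not a formal argument, so it is swallowed by `...`
# and becomes a third element to be parsed.
collect_inputs(x, silent = TRUE)
# [1] "2010-04-14-04-35-59" "20100401120000"      "TRUE"

# `quiet` matches a formal argument, so only the two strings remain.
collect_inputs(x, quiet = TRUE)
# [1] "2010-04-14-04-35-59" "20100401120000"
```

A string like "TRUE" cannot be read as a date-time, which is consistent with the extra NA in the question.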
I ran into this issue when the dates did not all follow the same format, so first check whether yours do. If they do not, parse_date_time() with several candidate orders can solve the problem:
parse_date_time(df$date, c("y/m/d","y/m/d HMS","m/d/y","m/d/y HM"))
Make sure the format of every date is covered by one of the orders in the list.
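For comparison, the idea of trying several formats in turn can be sketched in base R. This is an assumption-laden illustration, not how parse_date_time() works internally: parse_multi is a hypothetical helper, and the two strptime-style formats are made up for the example.

```r
# Hypothetical helper: try each format in order, filling only the
# elements that earlier formats could not parse.
parse_multi <- function(x, formats, tz = "UTC") {
  out <- .POSIXct(rep(NA_real_, length(x)), tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

mixed <- c("2010-04-14 04:35:59", "14.04.2010", "not a date")
res <- parse_multi(mixed, c("%Y-%m-%d %H:%M:%S", "%d.%m.%Y"))
format(res, "%Y-%m-%d %H:%M:%S")
# [1] "2010-04-14 04:35:59" "2010-04-14 00:00:00" NA
```

Formats that share separators can match the wrong input, so list the most specific formats first and check the result.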

Get date of timeseries object

I have a separately created time series object with daily frequency:
my.timeseries= ts(data= 1:10, start= c(2014,1,1), frequency = 365.25)
How can I get back the dates as POSIXct vector ("2014-01-01 UTC" ...) from this time series object?
Here's one potential method. I'm not really sure if it should be done this way, but it seems to work.
With your existing time series, try
p <- paste(attr(my.timeseries, "tsp")[1], my.timeseries)
as.POSIXct(as.Date(p, "%Y %j"))
# [1] "2014-01-01 UTC" "2014-01-02 UTC" "2014-01-03 UTC"
# [4] "2014-01-04 UTC" "2014-01-05 UTC" "2014-01-06 UTC"
# [7] "2014-01-07 UTC" "2014-01-08 UTC" "2014-01-09 UTC"
# [10] "2014-01-10 UTC"
As noted by G. Grothendieck in the comments, here is a more general solution
p <- paste(start(my.timeseries), seq_along(my.timeseries))
as.Date(p, "%Y %j")
# [1] "2014-01-01" "2014-01-02" "2014-01-03" "2014-01-04"
# [5] "2014-01-05" "2014-01-06" "2014-01-07" "2014-01-08"
# [9] "2014-01-09" "2014-01-10"
as.Date might be better to avoid any time-zone issues.
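A ts object stores only a start value and a frequency, not calendar dates, so any conversion has to assume what the positions mean. The sketch below assumes the series starts on 1 January of its start year and has exactly one observation per calendar day; the variable names are illustrative.

```r
my.timeseries <- ts(data = 1:10, start = c(2014, 1), frequency = 365.25)

# Assumption: the first observation falls on 1 January of the start year.
first_day <- as.Date(paste0(start(my.timeseries)[1], "-01-01"))

# One calendar day per observation, regardless of the data values.
dates <- first_day + seq_along(my.timeseries) - 1

# Midnight UTC as POSIXct, built from the printed dates.
dates_ct <- as.POSIXct(format(dates), tz = "UTC")
format(dates_ct, "%Y-%m-%d %Z")
```

Unlike pasting the data values into the date string, this does not depend on the series containing 1:10.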
I strongly advise you to use an xts object instead of ts.
Here is some code that replicates what you want:
library(xts)
my.index = seq(from = as.Date("2014-01-01"), by = "day", length.out = 10)
my.timeseries = xts(x = 1:10, order.by = my.index)
index(my.timeseries)
Let us know if that helps :)
Romain

How to create a range of dates in R

From two integers (1, 5) one can create a range in the following way
1:5
[1] 1 2 3 4 5
How can you make a range of dates if you are given two dates ("2014-09-04 JST", "2014-09-11 JST")?
The output must be
[1] ("2014-09-04 JST", "2014-09-05 JST", "2014-09-06 JST", "2014-09-07 JST", "2014-09-08 JST")
Does this help?
seq(as.Date("2014/09/04"), by = "day", length.out = 5)
# [1] "2014-09-04" "2014-09-05" "2014-09-06" "2014-09-07" "2014-09-08"
edit: adding in something about timezones
this works for my current timezone
seq(c(ISOdate(2014,4,9)), by = "DSTday", length.out = 5)
#[1] "2014-04-09 08:00:00 EDT" "2014-04-10 08:00:00 EDT" "2014-04-11 08:00:00 EDT" "2014-04-12 08:00:00 EDT"
#[5] "2014-04-13 08:00:00 EDT"
edit2:
OlsonNames() # I used this to find out what to write for the JST tz - it's "Japan"
x <- as.POSIXct("2014-09-04 23:59:59", tz="Japan")
format(seq(x, by="day", length.out=5), "%Y-%m-%d %Z")
# [1] "2014-09-04 JST" "2014-09-05 JST" "2014-09-06 JST" "2014-09-07 JST" "2014-09-08 JST"
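When both endpoints are known, seq() also accepts a start and an end, which mirrors 1:5 more closely than length.out does. A minimal sketch, assuming the IANA name "Asia/Tokyo" is available in the system's time zone database (it should appear in OlsonNames()):

```r
from <- as.POSIXct("2014-09-04", tz = "Asia/Tokyo")
to   <- as.POSIXct("2014-09-08", tz = "Asia/Tokyo")

# One element per calendar day, both endpoints included.
rng <- seq(from, to, by = "day")

# %Z prints the zone abbreviation of each element.
format(rng, "%Y-%m-%d %Z")
```

With a working time zone database this gives five date-times at midnight Tokyo time.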
To get a sequence of dates ( days, weeks,.. ) using only start and end dates you can use:
seq(as.Date("2014/1/1"), as.Date("2014/1/10"), "days")
[1] "2014-01-01" "2014-01-02" "2014-01-03" "2014-01-04" "2014-01-05" "2014-01-06" "2014-01-07"
[8] "2014-01-08" "2014-01-09" "2014-01-10"
Here's an answer, admittedly worse than @jalapic's, that doesn't use seq and instead uses a for loop. Note that the result starts on the day after date1:
date1 <- "2014-09-04"
date2 <- "2014-09-11"
dif <- as.numeric(abs(as.Date(date1) - as.Date(date2)))
# Start from an empty Date vector so that append() keeps the Date class
dates <- as.Date(character(0))
for (i in seq_len(dif)) {
  date <- as.Date(date1) + i
  dates <- append(dates, date)
}
dates
# [1] "2014-09-05" "2014-09-06" "2014-09-07" "2014-09-08" "2014-09-09" "2014-09-10" "2014-09-11"
Here's a shot, though the timezone JST isn't recognized by my system:
d1<-ISOdate(year=2014,month=9,day=4,tz="GMT")
seq(from=d1,by="day",length.out=5)
[1] "2014-09-04 12:00:00 GMT" "2014-09-05 12:00:00 GMT" "2014-09-06 12:00:00 GMT" "2014-09-07 12:00:00 GMT" "2014-09-08 12:00:00 GMT"
While using seq(date1, date2, "days") is by far the better option in nearly all cases, I'd just like to add, that the following works too, if you need a range of dates that are n_number of days from a date:
1:10 + as.Date("2020-01-01")
# [1] "2020-01-02" "2020-01-03" "2020-01-04" "2020-01-05"
# [5] "2020-01-06" "2020-01-07" "2020-01-08" "2020-01-09"
# [9] "2020-01-10" "2020-01-11"
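Note that 1:10 starts the range on the day after the given date. A small sketch of a variant that includes the start date, with n_days_from as a hypothetical helper name:

```r
# Hypothetical helper: n consecutive days beginning with `start` itself.
n_days_from <- function(start, n) {
  as.Date(start) + seq_len(n) - 1L
}

n_days_from("2020-01-01", 3)
# [1] "2020-01-01" "2020-01-02" "2020-01-03"
```

Using seq_len(n) rather than 1:n means n = 0 yields an empty Date vector instead of counting backwards.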
