vector of POSIXct and sapply - r

What if you want to apply a function other than format to a list of POSIXct objects? For instance, say I want to take a vector of times, truncate those times to the hour, and apply an arbitrary function to each one of those times.
> obs.times=as.POSIXct(c('2010-01-02 12:37:45','2010-01-02 08:45:45','2010-01-09 14:45:53'))
> obs.truncated=trunc(obs.times, units="hours")
> obs.truncated
[1] "2010-01-02 12:00:00 EST" "2010-01-02 08:00:00 EST"
[3] "2010-01-09 14:00:00 EST"
Now, I would expect the length of obs.truncated to be 3, but
> length(obs.truncated)
[1] 9
So you can see that trying to apply a function to this vector is not going to work. The class of obs.truncated is
> class(obs.truncated)
[1] "POSIXt" "POSIXlt"
Any idea what is going on here? apply and length appear to be taking the first element of the vector as its own list.

The length() of such a POSIXlt object used to be reported as nine (the number of its list components), but that was recently corrected.
Also, when I do trunc(obs.times), the wrong thing happens: trunc() operates only once on a string of three elements. You do need apply() et al.
So here is an example of using sapply() with component-wise resetting:
> sapply(obs.times, function(.) {
+ p <- as.POSIXlt(.);
+ p$min <- p$sec <- 0;
+ format(p) })
[1] "2010-01-02 12:00:00" "2010-01-02 08:00:00" "2010-01-09 14:00:00"
>
Whereas
> trunc(obs.times, units="hours")
[1] "2010-01-02 12:00:00 CST" "2010-01-02 08:00:00 CST"
[3] "2010-01-09 14:00:00 CST"
> class(trunc(obs.times, units="hours"))
[1] "POSIXt" "POSIXlt"
> length(trunc(obs.times, units="hours"))
[1] 1
>
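A minimal base R sketch of a workaround, assuming the times are in UTC and using a hypothetical per-element function label_hour(). Converting the POSIXlt returned by trunc() back to POSIXct gives a vector whose length() is the number of times, which can then be processed element by element.

```r
# Sample times from the question, pinned to UTC so the output is reproducible
obs.times <- as.POSIXct(c("2010-01-02 12:37:45", "2010-01-02 08:45:45",
                          "2010-01-09 14:45:53"), tz = "UTC")

# trunc() returns a POSIXlt (a list of components); converting back to
# POSIXct gives an ordinary vector with one element per time
obs.truncated <- as.POSIXct(trunc(obs.times, units = "hours"))
length(obs.truncated)

# Hypothetical per-element function; indexing with [i] keeps the
# POSIXct class and time zone of each element
label_hour <- function(t) format(t, "%Y-%m-%d %H:00")
hour_labels <- vapply(seq_along(obs.truncated),
                      function(i) label_hour(obs.truncated[i]), character(1))
hour_labels
```

Any other function of a single POSIXct value can be swapped in for label_hour(); vapply() just enforces that each call returns one string.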

Related

Mean of date interval in lubridate

How do you get the 'mean' (or, more accurately, the midpoint) of a lubridate interval object? I've tried the base mean() function, but it returns a double.
library(lubridate)
ex = interval(ymd("2009-05-01"), ymd("2009-07-01"))
mean(ex)
[1] 5270400
One option is to add half of the interval's duration to its start date. Try:
ex@start + as.duration(ex)/2
[1] "2009-05-31 12:00:00 UTC"
You could use the int_start() and int_end() functions to calculate the midpoint of an interval:
library(lubridate)
int_start(ex) + (int_end(ex) - int_start(ex))/2
[1] "2009-05-31 12:00:00 UTC"
You could wrap this in a simple function:
int_midpoint <- function(interval) {
int_start(interval) + (int_end(interval) - int_start(interval))/2
}
int_midpoint(ex)
[1] "2009-05-31 12:00:00 UTC"
This function will also work with lapply (though not sapply) on a vector:
lapply(c(ex, ex), int_midpoint)
[[1]]
[1] "2009-05-31 12:00:00 UTC"
[[2]]
[1] "2009-05-31 12:00:00 UTC"
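If the intervals are held as two plain POSIXct vectors of start and end times (the names starts, ends and midpoint() below are hypothetical), the same start + (end - start)/2 arithmetic works in base R without lapply, because it is vectorised:

```r
# Hypothetical interval endpoints stored as two POSIXct vectors
starts <- as.POSIXct(c("2009-05-01", "2010-01-01"), tz = "UTC")
ends   <- as.POSIXct(c("2009-07-01", "2010-01-03"), tz = "UTC")

# Midpoint = start + half the difference; adding a difftime to a POSIXct
# converts the difftime's units to seconds, element by element
midpoint <- function(starts, ends) starts + (ends - starts) / 2

m <- midpoint(starts, ends)
m
```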

Converting date to as.POSIXct in R and it subtracts two dates

I'm trying to calculate date - 1 (i.e. the day before a date) in R, but when the result is converted to POSIXct it seems to subtract another day.
The column is of type POSIXct:
class(df$Date)
[1] "POSIXct" "POSIXt"
Here's the initial value:
> df[12,"Date"]
[1] "2016-03-09 EST"
If I just use as.Date() and subtract one, it works fine:
as.Date(df[12,"Date"]-1, tz="EST")
[1] "2016-03-08"
But I'm saving it back to the same column, so it gets converted back to POSIXct automatically (I think), and I end up with March 7 at 7 pm in that column. If I do the conversion explicitly, I get this:
as.POSIXct(as.Date(df[12,"Date"]-1, tz="EST"))
[1] "2016-03-07 19:00:00 EST"
I've tried using America/New_York for the tz. I've tried wrapping as.Date() around just df[12,"Date"] and around the whole expression including the -1, but I have no idea what else to do!
Thanks!
Use as.difftime() to subtract amounts specified in units of time. It works consistently whether your data are in Date or POSIXct format. For example:
This fails as you described:
x <- as.POSIXct(c("2016-03-09","2016-03-10"), tz="US/Eastern")
#[1] "2016-03-09 EST" "2016-03-10 EST"
x[1] <- as.Date(x[1]-1, tz="US/Eastern")
#[1] "2016-03-07 19:00:00 EST" "2016-03-10 00:00:00 EST"
This works:
x <- as.POSIXct(c("2016-03-09","2016-03-10"), tz="US/Eastern")
#[1] "2016-03-09 EST" "2016-03-10 EST"
x[1] <- x[1] - as.difftime(1, units="days")
#[1] "2016-03-08 EST" "2016-03-10 EST"
If you don't want to do what Frank mentioned in the comments, you could consider using strptime() instead of as.POSIXct(). Note that strptime() returns a POSIXlt object, so wrap it in as.POSIXct() if you need POSIXct:
as.POSIXct(strptime(df[12,"Date"]-1, tz="EST", format="%Y-%m-%d"))
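A sketch of what is going on, assuming the column holds dates at local midnight in America/New_York: as.POSIXct() on a Date gives midnight UTC, which New York displays as 19:00 the day before. The hypothetical helper prev_day() below instead does the subtraction on the local calendar date and rebuilds local midnight, so unlike subtracting a fixed 24 hours it also stays at midnight across a DST change (13 March 2016 in the US). It discards any time of day, so it only suits date-only values.

```r
# Why 19:00 appears: as.POSIXct() on a Date is midnight UTC, which
# New York (EST, UTC-5) displays as 19:00 on the previous day
p <- as.POSIXct(as.Date("2016-03-08"))
format(p, "%Y-%m-%d %H:%M", tz = "America/New_York")

# Hypothetical helper: subtract a day on the local calendar date, then
# rebuild local midnight; any time of day in t is discarded
prev_day <- function(t, tz = "America/New_York") {
  d <- as.Date(t, tz = tz) - 1
  as.POSIXct(format(d), tz = tz)
}

x <- as.POSIXct(c("2016-03-09", "2016-03-14"), tz = "America/New_York")
prev_day(x)

# For comparison: a fixed 24 hours crosses the US DST change on
# 13 March 2016 and lands on 23:00 the evening before
x[2] - as.difftime(1, units = "days")
```

Assigning the result back, e.g. df$Date[12] <- prev_day(df$Date[12]), keeps the column POSIXct.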

Why does lubridate appear to change time zones for two dates combined into a vector?

I am seeing an unexpected result when using the lubridate package in R. I am simply trying to combine two dates into a vector. When I do so, the time zone changes. What is happening here?
> x <- ymd("2016-02-08")
> y <- ymd("2016-03-29")
> x
[1] "2016-02-08 UTC"
> y
[1] "2016-03-29 UTC"
> c(x,y)
[1] "2016-02-07 18:00:00 CST" "2016-03-28 19:00:00 CDT"
Using c() drops the tzone attribute (at least in older versions of R), so the combined values are printed in your local time zone. Hence you have to reassign it:
xy <- c(x,y)
attr(xy, "tzone") <- "UTC"
> xy
[1] "2016-02-08 UTC" "2016-03-29 UTC"
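A small base R variant of the same fix, using a hypothetical helper c_keep_tz() that copies the first argument's tzone onto the combined result, so the outcome does not depend on whether your R version's c() keeps the attribute:

```r
# Hypothetical helper: combine POSIXct values and keep the time zone of
# the first argument (..1), whatever c() itself does with it
c_keep_tz <- function(...) {
  out <- c(...)
  attr(out, "tzone") <- attr(..1, "tzone")
  out
}

x <- as.POSIXct("2016-02-08", tz = "UTC")
y <- as.POSIXct("2016-03-29", tz = "UTC")
xy <- c_keep_tz(x, y)
xy
```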
Source and more information: Peter Ehlers on R Help

Subtract exactly one year from a POSIXct object

Let's say we have the date "2014-05-11 14:45:00 UTC". I would like to get the POSIXct object for exactly one year earlier, i.e. "2013-05-11 14:45:00 UTC".
My first thought was to subtract one from the year part of the string, paste it back together with the rest, and create a new POSIXct object from that string, like so:
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
newTime <- as.POSIXct(paste(as.character(as.numeric(substr(time,1,4)) - 1),substr(time,5,19),sep=""),tz="UTC",origin="1970-01-01")
This works fine (except for leap years!), but I need to do this for every row of a large data.table and preferably put the results right back into the data.table.
Is there any other way of subtracting a year off an object like this?
Some extra detail: I need to apply this to a data.table like this one:
Time
1: 1349206200
2: 1349207100
3: 1349208000
4: 1349208900
5: 1349209800
6: 1349210700
7: 1349211600
8: 1349212500
9: 1349213400
10: 1349214300
11: 1349215200
But this happens when I run:
SOdata[,Time:=as.numeric(as.POSIXct(paste(as.character(as.numeric(substr(Time,1,4)) - 1),substr(Time,5,19),sep=""),tz="UTC",origin="1970-01-01"))]
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
I'm guessing I need to use something like lapply, but I always mess up the syntax with that function. Does anyone know how?
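lubridate is your friend. (The outputs below assume dyears(1) is exactly 365 days; recent lubridate versions define it as 365.25 days, which shifts the results by six hours. The period years(1) subtracts a calendar year instead.)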
library(lubridate)
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time-dyears(1)
#[1] "2013-05-11 14:45:00 UTC"
time+dyears(1)
#[1] "2015-05-11 14:45:00 UTC"
For leap years
> x <- as.POSIXct(c("2012-02-28", "2012-02-29"), tz="UTC",origin="1970-01-01")
> x - dyears(1)
[1] "2011-02-28 UTC" "2011-03-01 UTC"
I haven't tested the other answers, but the following should work as required regardless of leap years:
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time <- as.POSIXlt(time)
time$year <- time$year - 1
time <- as.POSIXct(time)
#[1] "2013-05-11 14:45:00 UTC"
With Gabor's leap year example:
time <- as.POSIXct("2012-02-29 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time <- as.POSIXlt(time)
time$year <- time$year - 1
time <- as.POSIXct(time)
#[1] "2011-03-01 14:45:00 UTC"
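The question's data.table attempt fails because Time holds epoch seconds, so substr(Time,1,4) takes the first four digits of a number such as 1349206200 (giving "1349"), not a year. A sketch of the POSIXlt approach above in vectorised form, assuming the column is seconds since 1970-01-01 UTC (minus_one_year() is a hypothetical name):

```r
# Hypothetical helper for a column of epoch seconds (UTC): convert to
# POSIXlt, subtract one from the year component, and return numeric seconds.
# A 29 February maps to 1 March, because the invalid date is normalised.
minus_one_year <- function(secs) {
  lt <- as.POSIXlt(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"))
  lt$year <- lt$year - 1L
  as.numeric(as.POSIXct(lt))
}

minus_one_year(c(1349206200, 1349207100))
```

Because it works on the whole vector, no lapply() is needed; with data.table loaded, something like DT[, Time := minus_one_year(Time)] should update the column in one step.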
seq in base can be used:
LastYr <- function(x) seq(x, length = 2, by = "-1 year")[2]
toPOSIXct <- function(x) as.POSIXct(x, origin = "1970-01-01")
# example 1
LastYr(as.POSIXct("2012-02-28"))
## [1] "2011-02-28 EST"
# example 2 - leap year
LastYr(as.POSIXct("2012-02-29"))
## [1] "2011-03-01 EST"
# example 3 - vector case
x <- as.POSIXct(c("2012-02-28", "2012-02-29")) # test data
toPOSIXct(sapply(x, LastYr))
## [1] "2011-02-28 EST" "2011-03-01 EST"
# example 4 - data.table shown in question
DT[, Time := sapply(toPOSIXct(Time), LastYr)]
Revised: simplified by using the functions LastYr and toPOSIXct.
Or you can try, in base R:
> time + as.difftime(52*7+1,units="days")
[1] "2015-05-11 14:45:00 UTC"
> time - as.difftime(52*7+1,units="days")
[1] "2013-05-11 14:45:00 UTC"
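Of course, it would be easier if the units could be years. Also note that a fixed 365 days falls one day short of a calendar year whenever a 29 February lies in between.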
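To illustrate that leap-year caveat with base R only (the timestamp below is made up for the example): subtracting a fixed 365 days from a date in May 2012 steps over 29 February 2012 and lands a day late, while seq() with by = "-1 year" goes back one calendar year:

```r
stamp <- as.POSIXct("2012-05-11 14:45:00", tz = "UTC")

# A fixed 365 days (52 * 7 + 1) falls one day short across 29 February 2012
fixed <- stamp - as.difftime(52 * 7 + 1, units = "days")

# seq() with by = "-1 year" steps back one calendar year instead
calendar <- seq(stamp, by = "-1 year", length.out = 2)[2]

format(fixed, "%Y-%m-%d %H:%M")
format(calendar, "%Y-%m-%d %H:%M")
```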

Round a POSIX date (POSIXct) with base R functionality

I'm currently playing around a lot with dates and times for a package I'm building.
Stumbling across this post reminded me again that it's generally not a bad idea to check whether something can be done with base R features before turning to contributed packages.
Thus, is it possible to round a date of class POSIXct with base R functionality?
I checked
methods(round)
which "only" gave me
[1] round.Date round.timeDate*
Non-visible functions are asterisked
This is what I'd like to do (pseudocode):
x <- as.POSIXct(Sys.time())
[1] "2012-07-04 10:33:55 CEST"
round(x, atom="minute")
[1] "2012-07-04 10:34:00 CEST"
round(x, atom="hour")
[1] "2012-07-04 11:00:00 CEST"
round(x, atom="day")
[1] "2012-07-04 CEST"
I know this can be done with timeDate, lubridate etc., but I'd like to keep package dependencies down. So before going ahead and checking out the source code of the respective packages, I thought I'd ask if someone has already done something like this.
Base R has round.POSIXt() for this (it returns a POSIXlt object). I'm not sure why it doesn't show up in methods(round).
x <- as.POSIXct(Sys.time())
x
[1] "2012-07-04 10:01:08 BST"
round(x,"mins")
[1] "2012-07-04 10:01:00 BST"
round(x,"hours")
[1] "2012-07-04 10:00:00 BST"
round(x,"days")
[1] "2012-07-04"
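A reproducible version of the round.POSIXt() answer, using a fixed timestamp in UTC so the results do not depend on Sys.time() or the local zone. Since round() and trunc() return POSIXlt, this sketch converts each result back with as.POSIXct():

```r
# Fixed timestamp in UTC so the results are reproducible
x <- as.POSIXct("2012-07-04 10:33:55", tz = "UTC")

# round() and trunc() on date-times return POSIXlt; as.POSIXct() converts
# each result back to an ordinary date-time
to_min  <- as.POSIXct(round(x, "mins"))   # nearest minute
to_hour <- as.POSIXct(round(x, "hours"))  # nearest hour
to_day  <- as.POSIXct(round(x, "days"))   # nearest day
down_hr <- as.POSIXct(trunc(x, "hours"))  # always rounds down

format(to_min, "%H:%M:%S")
format(to_hour, "%H:%M:%S")
format(down_hr, "%H:%M:%S")
```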
If you do use lubridate, also look into the ceiling_date() and floor_date() functions:
library(lubridate)
x <- as.POSIXct("2009-08-03 12:01:59.23")
ceiling_date(x, "second")
# "2009-08-03 12:02:00 CDT"
ceiling_date(x, "hour")
# "2009-08-03 13:00:00 CDT"
ceiling_date(x, "day")
# "2009-08-04 CDT"
ceiling_date(x, "week")
# "2009-08-09 CDT"
ceiling_date(x, "month")
# "2009-09-01 CDT"
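If you want ceiling_date(x, "hour")-style behaviour without lubridate, one base R sketch (the helper ceiling_hour() is hypothetical) is to truncate and add an hour whenever truncation removed something. Here a time already on the hour is left as it is; that boundary rule is a choice made for this sketch and may not match lubridate's defaults.

```r
# Hypothetical base R helper: round date-times up to the next whole hour
ceiling_hour <- function(x) {
  down <- as.POSIXct(trunc(x, "hours"))
  # add 3600 s only where truncation actually moved the time
  down + 3600 * (down < x)
}

x <- as.POSIXct(c("2009-08-03 12:01:59.23", "2009-08-03 13:00:00"), tz = "UTC")
ceiling_hour(x)
```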
If you don't want to use external packages and want to keep POSIXct, as I do, here is one idea (inspired by this question): use strftime() and paste a fake month and day. You might expect to do this more directly with strptime(), but as noted in this comment, quoting the documentation:
"For strptime the input string need not specify the date completely:
it is assumed that unspecified seconds, minutes or hours are zero, and
an unspecified year, month or day is the current one."
So it seems that you have to use strftime() to output a truncated string, paste the missing part, and convert the result back to POSIXct.
This is how an updated answer could look:
x <- as.POSIXct(Sys.time())
x
[1] "2018-12-27 10:58:51 CET"
round(x,"mins")
[1] "2018-12-27 10:59:00 CET"
round(x,"hours")
[1] "2018-12-27 11:00:00 CET"
round(x,"days")
[1] "2018-12-27 CET"
as.POSIXct(paste0(strftime(x,format="%Y-%m"),"-01")) #trunc by month
[1] "2018-12-01 CET"
as.POSIXct(paste0(strftime(x,format="%Y"),"-01-01")) #trunc by year
[1] "2018-01-01 CET"
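As an alternative to pasting strings, the month and year truncations can also be sketched by resetting POSIXlt components (trunc_to() is a hypothetical helper). Resetting isdst to -1 and clearing gmtoff is intended to let R recompute the UTC offset for the new date; the example uses UTC, where that does not matter.

```r
# Hypothetical helper: truncate date-times to the start of the month or
# year by resetting POSIXlt components rather than pasting strings
trunc_to <- function(x, unit = c("month", "year")) {
  unit <- match.arg(unit)
  lt <- as.POSIXlt(x)
  lt$sec[] <- 0
  lt$min[] <- 0L
  lt$hour[] <- 0L
  lt$mday[] <- 1L
  if (unit == "year") lt$mon[] <- 0L
  # mark DST and UTC offset as unknown so they are worked out for the new date
  lt$isdst[] <- -1L
  if (!is.null(lt$gmtoff)) lt$gmtoff[] <- NA
  as.POSIXct(lt)
}

x <- as.POSIXct(c("2018-12-27 10:58:51", "2018-02-14 08:00:00"), tz = "UTC")
trunc_to(x, "month")
trunc_to(x, "year")
```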
