Let's say we have the date "2014-05-11 14:45:00 UTC". I would like to get the exact POSIXct object for one year earlier, i.e. "2013-05-11 14:45:00 UTC".
My first thought was to create a whole new POSIXct object by subtracting one from the year portion, pasting it back together with the rest of the string, and then creating a new POSIXct object from that string, like so:
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
newTime <- as.POSIXct(paste(as.character(as.numeric(substr(time,1,4)) - 1),substr(time,5,19),sep=""),tz="UTC",origin="1970-01-01")
This works fine (except for leap years!), but I need to do this for each row of a large data.table and preferably put the results straight back into the data.table.
Is there any other way of subtracting a year from an object like this?
Some extra detail: I need to apply this to a data.table like this one:
Time
1: 1349206200
2: 1349207100
3: 1349208000
4: 1349208900
5: 1349209800
6: 1349210700
7: 1349211600
8: 1349212500
9: 1349213400
10: 1349214300
11: 1349215200
But this happens when I do:
SOdata[,Time:=as.numeric(as.POSIXct(paste(as.character(as.numeric(substr(Time,1,4)) - 1),substr(Time,5,19),sep=""),tz="UTC",origin="1970-01-01"))]
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
I am guessing I need to use something like lapply, but I always mess up the syntax when using that function. Does anyone know how?
lubridate is your friend.
library(lubridate)
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time-dyears(1)
#[1] "2013-05-11 14:45:00 UTC"
time+dyears(1)
#[1] "2015-05-11 14:45:00 UTC"
For leap years:
> x <- as.POSIXct(c("2012-02-28", "2012-02-29"), tz="UTC",origin="1970-01-01")
> x - dyears(1)
[1] "2011-02-28 UTC" "2011-03-01 UTC"
I haven't tested the other answers, but the following should work as required regardless of leap years:
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time <- as.POSIXlt(time)
time$year <- time$year - 1
time <- as.POSIXct(time)
#[1] "2013-05-11 14:45:00 UTC"
With Gabor's leap year example:
time <- as.POSIXct("2012-02-29 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time <- as.POSIXlt(time)
time$year <- time$year - 1
time <- as.POSIXct(time)
#[1] "2011-03-01 14:45:00 UTC"
seq in base can be used:
LastYr <- function(x) seq(x, length = 2, by = "-1 year")[2]
toPOSIXct <- function(x) as.POSIXct(x, origin = "1970-01-01")
# example 1
LastYr(as.POSIXct("2012-02-28"))
## [1] "2011-02-28 EST"
# example 2 - leap year
LastYr(as.POSIXct("2012-02-29"))
## [1] "2011-03-01 EST"
# example 3 - vector case
x <- as.POSIXct(c("2012-02-28", "2012-02-29")) # test data
toPOSIXct(sapply(x, LastYr))
## [1] "2011-02-28 EST" "2011-03-01 EST"
# example 4 - data.table shown in question
DT[, Time := sapply(toPOSIXct(Time), LastYr)]
Revised and simplified using the functions LastYr and toPOSIXct.
Or you can try, in base R:
> time + as.difftime(52*7+1,units="days")
[1] "2015-05-11 14:45:00 UTC"
> time - as.difftime(52*7+1,units="days")
[1] "2013-05-11 14:45:00 UTC"
Of course, it would be easier if units could be "years"...
I have a date that I convert to a numeric value and want to convert back to a date afterwards.
Converting date to numeric:
date1 = as.POSIXct('2017-12-30 15:00:00')
date1_num = as.numeric(date1)
# 1514646000
Reconverting numeric to date:
as.Date(date1_num, origin = '1/1/1970')
# "4146960-12-12"
What am I missing with the reconversion? I'd expect the last command to return my original date1.
As the numeric vector was created from an object with a time component, the reconversion should go the same way, i.e. first to POSIXct and then wrapped with as.Date:
as.Date(as.POSIXct(date1_num, origin = '1970-01-01'))
#[1] "2017-12-30"
You could use anytime() and anydate() from the anytime package:
R> pt <- anytime("2017-12-30 15:00:00")
R> pt
[1] "2017-12-30 15:00:00 CST"
R>
R> anydate(pt)
[1] "2017-12-30"
R>
R> as.numeric(pt)
[1] 1514667600
R>
R> anydate(as.numeric(pt))
[1] "2017-12-30"
R>
POSIXct counts the number of seconds since the Unix Epoch, while Date counts the number of days. So you can recover the date by dividing by (60*60*24) (let's ignore leap seconds), or convert back to POSIXct instead.
as.Date(as.numeric(date1)/(60*60*24), origin="1970-01-01")
[1] "2017-12-30"
as.POSIXct(as.numeric(date1),origin="1970-01-01")
[1] "2017-12-30 15:00:00 GMT"
Using lubridate:
lubridate::as_datetime(1514646000)
[1] "2017-12-30 15:00:00 UTC"
While doing predictive modeling on timestamped data, I want to write a function in R (possibly using data.table) that rounds a date by X number of hours. E.g. rounding by 2 hours should give this:
"2014-12-28 22:59:00 EDT" becomes "2014-12-28 22:00:00 EDT"
"2014-12-28 23:01:00 EDT" becomes "2014-12-29 00:00:00 EDT"
It's very easy to do when rounding by 1 hour, using the round.POSIXt(.date, "hour") function.
Writing a generic function, however, like I'm doing below with multiple if statements, becomes quite ugly:
d7.dateRoundByHour <- function (.date, byHours) {
  if (byHours == 1)
    return (round.POSIXt(.date, "hour"))
  hh = hour(.date); dd = mday(.date); mm = month(.date); yy = year(.date)
  hh = round(hh / byHours, digits = 0) * byHours
  if (hh >= 24) {
    hh = 0; dd = dd + 1
  }
  if ((mm == 2 & dd == 28) |
      (mm %in% c(1, 3, 5, 7, 8, 10, 12) & dd == 31) |
      (mm %in% c(2, 4, 6, 9, 11) & dd == 30)) {  # NB: it won't work on 29 Feb leap year.
    dd = 1; mm = mm + 1
  }
  if (mm == 13) {
    mm = 1; yy = yy + 1
  }
  str = sprintf("%i-%02.0f-%02.0f %02.0f:%02.0f:%02.0f EDT", yy, mm, dd, hh, 0, 0)
  as.POSIXct(str, format = "%Y-%m-%d %H:%M:%S")
}
Can anyone show a better way to do that?
(Perhaps by converting to numeric and back to POSIXct, or by using some other POSIXt functions?)
Use the round_date function from the lubridate package. Assuming you have a data.table with a column named date, you could do the following:
dt[, date := round_date(date, '2 hours')]
A quick example will give you exactly the results you were looking for:
x <- as.POSIXct("2014-12-28 22:59:00 EDT")
round_date(x, '2 hours')
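To check both of the question's timestamps at once (a quick sketch; the results print in the local time zone):
library(lubridate)
x <- as.POSIXct(c("2014-12-28 22:59:00", "2014-12-28 23:01:00"))
round_date(x, '2 hours')
# [1] "2014-12-28 22:00:00" "2014-12-29 00:00:00"  (in the local time zone)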
This is actually really easy with just base R. The basic idea for rounding by "odd lots" is that you
1) scale down by an appropriate scale factor,
2) round to the nearest integer in the downscaled unit, and
3) scale back up and re-convert.
Or in two R code statements:
R> pt <- as.POSIXct(c("2014-12-28 22:59:00", "2014-12-28 23:01:00 EDT"))
R> pt # just to check
[1] "2014-12-28 22:59:00 CST" "2014-12-28 23:01:00 CST"
R>
R> scalefactor <- 60*60*2 # 2 hours of 60 minutes times 60 seconds
R>
R> as.POSIXct(round(as.numeric(pt)/scalefactor) * scalefactor, origin="1970-01-01")
[1] "2014-12-28 22:00:00 CST" "2014-12-29 00:00:00 CST"
R>
The key last line just does what I outlined: it converts the POSIXct to a numeric representation, scales it down, rounds, and then scales back up and converts to a POSIXct again.
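If this is needed repeatedly, the same idea can be wrapped in a small helper; the name roundByHours is illustrative, not from the answer:
# round a POSIXct vector to the nearest byHours hours via numeric scaling
roundByHours <- function(x, byHours) {
  scalefactor <- 60 * 60 * byHours   # seconds per rounding window
  as.POSIXct(round(as.numeric(x) / scalefactor) * scalefactor,
             origin = "1970-01-01")
}
roundByHours(pt, 2)
# [1] "2014-12-28 22:00:00 CST" "2014-12-29 00:00:00 CST"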
While converting a dataframe to xts I realized that there is something wrong with the formatter. Here's an example dataframe:
effective_date          price
"1990-01-01"            "100"
"1990-01-02 00:05:00"   "200"
This is example output from a package that I use.
Converting this to xts is straightforward:
xts(df["price"], order_by=as.POSIXct(df["effective_date"], format="%Y-%m-%d %H:%M:%S"))
However, this errors out, saying NAs can't be in row names, and the result is:
<NA> 100
1990-01-02 00:05:00 200
Obviously xts can't figure out what to do with the weird date there (midnight) and it won't coerce it.
If I add tz="UTC" to as.POSIXct it doesn't work. Additionally, as.POSIXlt doesn't change anything here either.
What can I do to coerce that midnight date to the correct format?
Two issues:
1) You cannot parse a date alone as POSIXct when you supply a full datetime format:
R> as.POSIXct(c("2017-01-02", "2017-01-03 04:05:06"), format="%Y-%m-%d %H:%M:%S")
[1] NA "2017-01-03 04:05:06 CST"
R>
2) You can however use the anytime() function to do it:
R> anytime::anytime(c("2017-01-02", "2017-01-03 04:05:06"))
[1] "2017-01-02 00:00:00 CST" "2017-01-03 04:05:06 CST"
R>
Once you have a POSIXct, forming the xts is easy.
Also note that you have typos: you need a comma before the column indicator: df[, "price"].
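For instance, a sketch of that construction with the column names from the question (the as.numeric() wrapper is an assumption, since price is stored as character in the example data):
library(xts)
library(anytime)
# parse the mixed date / datetime strings, then build the xts;
# note order.by and the comma in df[, "price"]
x <- xts(as.numeric(df[, "price"]),
         order.by = anytime(df[, "effective_date"]))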
Edit: Getting a little tired of #42's comment about Gabor's (fine) solution "dominating" this one, so here's a minimal benchmark:
R> library(microbenchmark)
R> v <- c("2017-01-02", "2017-01-03 04:05:06")
R> library(anytime)
R> print(microbenchmark(anytime(v), do.call("c", lapply(v, as.POSIXct))), digits=3)
Unit: microseconds
                                expr   min    lq  mean median    uq   max neval cld
                          anytime(v)  33.6  36.8  42.1   45.6  46.6  80.7   100  a
 do.call("c", lapply(v, as.POSIXct)) 571.5 579.1 586.4  586.8 589.5 695.7   100   b
R>
So in short: "not really". It uses only base R, which is a plus, but it is a) harder to read and understand, b) more limited, as it deals with exactly one format (in ISO style), and c) about thirteen times slower.
1) To get the "POSIXct" datetime vector, try converting each datetime to "POSIXct" separately and then concatenating them together:
do.call("c", lapply(df$effective_date, as.POSIXct))
2) Another base solution that is even shorter and also substantially faster is the following, which relies on the fact that as.POSIXct will ignore junk at the end:
as.POSIXct(paste(df$effective_date, "00:00:00"))
Most of lubridate's parsing functions have a truncated parameter that takes a number indicating the number of elements that can be missing from the end. Missing elements will be replaced by zero.
Example with the data at hand:
lubridate::ymd_hms(c("2017-01-02", "2017-01-03 04:05:06"), truncated = 3)
## [1] "2017-01-02 00:00:00 UTC" "2017-01-03 04:05:06 UTC"
Assuming you want the timestamps, preprocess with something like:
temp <- c("1990-01-01", "1990-01-02 00:05:00")
# match a date-only string at the end of the string (indicated by $) and
# replace it with the full match (indicated by \\1) followed by " 00:00:00"
temp2 <- gsub("(\\d{4}\\-\\d{2}\\-\\d{2}$)", "\\1 00:00:00", temp)
# [1] "1990-01-01 00:00:00" "1990-01-02 00:05:00"
I'm trying to calculate date - 1 (basically the day before the date) in R, and when it gets converted back to POSIXct it seems to subtract more than just the day?
The column is of type POSIXct:
class(df$Date)
[1] "POSIXct" "POSIXt"
Here's the initial value:
> df[12,"Date"]
[1] "2016-03-09 EST"
If I just do as.Date and subtract one it works fine:
as.Date(df[12,"Date"]-1, tz="EST")
[1] "2016-03-08"
But I'm saving it back to the same column, so it gets converted back to POSIXct automatically (I think), and then I end up with March 7 in that column, at 7 pm. If I type it out here I get this:
as.POSIXct(as.Date(df[12,"Date"]-1, tz="EST"))
[1] "2016-03-07 19:00:00 EST"
I've tried using America/New_York for the tz. I've tried putting as.Date around just df[12,"Date"] and around the whole thing including the -1... I have no clue what to do!
Thanks!
Use as.difftime to subtract amounts specified in units of time. It works consistently regardless of whether your data is in Date or POSIXct format. E.g.:
This fails as you described:
x <- as.POSIXct(c("2016-03-09","2016-03-10"), tz="US/Eastern")
#[1] "2016-03-09 EST" "2016-03-10 EST"
x[1] <- as.Date(x[1]-1, tz="US/Eastern")
#[1] "2016-03-07 19:00:00 EST" "2016-03-10 00:00:00 EST"
This works:
x <- as.POSIXct(c("2016-03-09","2016-03-10"), tz="US/Eastern")
#[1] "2016-03-09 EST" "2016-03-10 EST"
x[1] <- x[1] - as.difftime(1, units="days")
#[1] "2016-03-08 EST" "2016-03-10 EST"
If you don't want to do what Frank mentioned in the comments, you could consider using strptime instead of as.POSIXct. Note that strptime returns a POSIXlt object, so wrap it in as.POSIXct if you need a POSIXct back:
as.POSIXct(strptime(df[12,"Date"] - 1, tz="EST", format="%Y-%m-%d"))
I am seeing an unexpected result when using the lubridate package in R. I am simply trying to combine two dates into a vector. When I do so, the time zone changes. What is happening here?
> x <- ymd("2016-02-08")
> y <- ymd("2016-03-29")
> x
[1] "2016-02-08 UTC"
> y
[1] "2016-03-29 UTC"
> c(x,y)
[1] "2016-02-07 18:00:00 CST" "2016-03-28 19:00:00 CDT"
Using c() will remove the timezone attribute. Hence you have to reassign it:
xy <- c(x,y)
attr(xy, "tzone") <- "UTC"
> xy
[1] "2016-02-08 UTC" "2016-03-29 UTC"
Source and more information: Peter Ehlers on R Help
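Alternatively, lubridate's with_tz() does the same reassignment in one step (a hedged sketch using the x and y from the question):
library(lubridate)
# c() still drops the tzone attribute; with_tz() re-displays the combined
# vector in UTC without changing the underlying instants.
with_tz(c(x, y), tzone = "UTC")
# [1] "2016-02-08 UTC" "2016-03-29 UTC"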