While doing predicting modeling on timestamped data, I want to write a function in R (possibly using data.table) that rounds the date by X number of hours. E.g. rounding by 2 hours should give this:
"2014-12-28 22:59:00 EDT" becomes "2014-12-28 22:00:00 EDT"
"2014-12-28 23:01:00 EDT" becomes "2014-12-29 00:00:00 EDT"
It's very easy to do when you round by 1 hour - using round.POSIXt(.date, "hour") function.
Writing a generic function, like I'm doing below using multiple if statements, becomes quite ugly however:
d7.dateRoundByHour <- function (.date, byHours) {
if (byHours == 1)
return (round.POSIXt(.date, "hour"))
hh = hour(.date); dd = mday(.date); mm = month(.date); yy = year(.date)
hh = round(hh/byHours,digits=0) * byHours
if (hh>=24) {
hh=0; dd=dd+1
}
if ((mm==2 & dd==28) |
(mm %in% c(1,3,5,7,8,10,12) & dd==31) |
(mm %in% c(2,4,6,9,11) & dd==30)) { # NB: it won't work on 29 Feb leap year.
dd=1; mm=mm+1
}
if (mm==13) {
mm=1; yy=yy+1
}
str = sprintf("%i-%02.0f-%02.0f %02.0f:%02.0f:%02.0f EDT", yy,mm,dd, hh,0,0)
as.POSIXct(str, format="%Y-%m-%d %H:%M:%S")
}
Anyone can show a better way to do that?
(perhaps by converting to numeric and back to POSIXt or some other POSIXt functions?)
Use the round_date function from the lubridate package. Assuming you had a data.table with a column named date you could do the following:
dt[, date := round_date(date, '2 hours')]
A quick example will give you exactly the results you were looking for:
x <- as.POSIXct("2014-12-28 22:59:00 EDT")
round_date(x, '2 hours')
This is actually really easy with just base R. The basic idea for round by "odd lots" that you
scale down by an appropriate scale factor
round down to integer in the downscaled unit
scale back up and re-convert
Or in two R code statements:
R> pt <- as.POSIXct(c("2014-12-28 22:59:00", "2014-12-28 23:01:00 EDT"))
R> pt # just to check
[1] "2014-12-28 22:59:00 CST" "2014-12-28 23:01:00 CST"
R>
R> scalefactor <- 60*60*2 # 2 hours of 60 minutes times 60 seconds
R>
R> as.POSIXct(round(as.numeric(pt)/scalefactor) * scalefactor, origin="1970-01-01")
[1] "2014-12-28 22:00:00 CST" "2014-12-29 00:00:00 CST"
R>
The key last line just does what I outlined: convert the POSIXct to a numeric representation, scales it down, then rounds before scaling back up and converting to a POSIXct again.
Related
I have a list of a series of dates, with mixed precision. Most have the format "1930-02-06T10:00:00", but a few have the format "2130-02-06" which I want to treat as 2130-02-06T00:00:00.
When I use
df$date <- as.POSIXct(df$date,tz=Sys.timezone())
I lose times from the data because some of the datetimes are missing time. I can write a little conversion routine
fixDateTime <- function (s) {
if(nchar(s) == 10) {
return (paste(s, "00:00:00"));
} else {
return (str_replace(s,"T", " "));
}
}
and then do
df$DATET <- as.POSIXct(fixDateTime(df$date),tz=Sys.timezone())
But that doesn't work because fixDateTime is actually given an array and I don't know how to adapt for that. I'm not sure which way to try to solve this. (and I'm sure this shows how newbie I am to R)
You can work with your fixDateTime function if you use ifelse which can handle vectors instead of if/else which works for scalars. Keeping everything in base R, we can do
fixDateTime <- function (s) {
ifelse(nchar(s) == 10, paste(s, "00:00:00"), sub("T", " ", s))
}
and then use it in as.POSIXct
as.POSIXct(fixDateTime(x), tz = "UTC")
#[1] "1930-02-06 10:00:00 UTC" "2130-02-06 00:00:00 UTC"
data
x <- c("1930-02-06T10:00:00", "2130-02-06")
Turns out lubridate is all you need:
library(lubridate)
data <- c("1930-02-06T10:00:00", "2130-02-06")
ymd_hms(data, truncated = 3)
#> [1] "1930-02-06 10:00:00 UTC" "2130-02-06 00:00:00 UTC"
Created on 2019-11-15 by the reprex package (v0.3.0)
#Ronak's answer is good as it uses just base R. Another solution is offered by the anytime() function of the anytime -- it does not need any formats.
R> library(anytime)
R> anytime(c("1930-02-06T10:00:00", "2130-02-06")) # localtime by default
[1] "1930-02-06 10:00:00 CST" "2130-02-06 00:00:00 CST"
R> anytime(c("1930-02-06T10:00:00", "2130-02-06"), tz="UTC", asUTC=TRUE) #override
[1] "1930-02-06 10:00:00 UTC" "2130-02-06 00:00:00 UTC"
R>
So you can have it as UTC, or in your local time.
The main key is that not giving hours:minutes:seconds is generally seen as midnight when you parse to datetime rather than date. So you may not need a helper function
I have a date that I convert to a numeric value and want to convert back to a date afterwards.
Converting date to numeric:
date1 = as.POSIXct('2017-12-30 15:00:00')
date1_num = as.numeric(date1)
# 1514646000
Reconverting numeric to date:
as.Date(date1_num, origin = '1/1/1970')
# "4146960-12-12"
What am I missing with the reconversion? I'd expect the last command to return my original date1.
As the numeric vector is created from an object with time component, reconversion can also be in the same way i.e. first to POSIXct and then wrap with as.Date
as.Date(as.POSIXct(date1_num, origin = '1970-01-01'))
#[1] "2017-12-30"
You could use anytime() and anydate() from the anytime package:
R> pt <- anytime("2017-12-30 15:00:00")
R> pt
[1] "2017-12-30 15:00:00 CST"
R>
R> anydate(pt)
[1] "2017-12-30"
R>
R> as.numeric(pt)
[1] 1514667600
R>
R> anydate(as.numeric(pt))
[1] "2017-12-30"
R>
POSIXct counts the number of seconds since the Unix Epoch, while Date counts the number of days. So you can recover the date by dividing by (60*60*24) (let's ignore leap seconds), or convert back to POSIXct instead.
as.Date(as.numeric(date1)/(60*60*24), origin="1970-01-01")
[1] "2017-12-30"
as.POSIXct(as.numeric(date1),origin="1970-01-01")
[1] "2017-12-30 15:00:00 GMT"
Using lubridate :
lubridate::as_datetime(1514646000)
[1] "2017-12-30 15:00:00 UTC"
I have several variables that exist in the following format:
/Date(1353020400000+0100)/
I want to convert this format to ddmmyyyy. I found this solution for the same problem using php, but I don't know anything about php, so I'm unable to convert that solution to what I need, which is a solution that I can use in R.
Any suggestions?
Thanks.
If the format is milliseconds since the epoch then anytime() or as.POSIXct() can help you:
R> anytime(1353020400000/1000)
[1] "2012-11-15 17:00:00 CST"
R> anytime(1353020400.000)
[1] "2012-11-15 17:00:00 CST"
R>
anytime() converts to local time, which is Chicago for me. You would have to deal with the UTC offset separately.
Base R can do it too, but you need the dreaded origin:
R> as.POSIXct(1353020400.000, origin="1970-01-01")
[1] "2012-11-15 17:00:00 CST"
R>
As far as I can tell from the linked question, this is milliseconds since the epoch:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "[()+]")
as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
#[1] "2012-11-15 23:00:00 UTC"
If you want to pick up the timezone difference as well, here's an attempt:
x <- "/Date(1353020400000+0100)/"
spl <- strsplit(x, "(?=[+-])|[()]", perl=TRUE)
tzo <- sapply(spl, function(x) paste(x[3:4],collapse="") )
dt <- as.POSIXct(as.numeric(sapply(spl,`[[`,2)) / 1000, origin="1970-01-01", tz="UTC")
as.POSIXct(paste(format(dt), tzo), tz="UTC", format = '%F %T %z')
#[1] "2012-11-15 22:00:00 UTC"
The package lubridate can come to the rescue as follows:
as.Date("1970-01-01") + lubridate::milliseconds(1353020400000)
Read: Number of milliseconds since epoch (= 1. January 1970, UTC + 0)
A parsing function can now be made using regular expressions:
parse.myDate <- function(text) {
num <- as.numeric(stringr::str_extract(text, "(?<=/Date\\()\\d+"))
as.Date("1970-01-01") + lubridate::milliseconds(num)
}
finally, format the Date with
format(theDate, "%d/%m/%Y %H:%M")
If you also need the time zone information, you can use this instead:
parse.myDate <- function(text) {
parts <- stringr::str_match(text, "^/Date\\((\\d+)([+-])(\\d{4})\\)/$")
as.POSIXct(as.numeric(parts[,2])/1000, origin = "1970-01-01", tz = paste0("Etc/GMT", parts[,3], as.integer(parts[,4])/100))
}
lets say we have this date "2014-05-11 14:45:00 UTC". I would like to get the exact POSIXct object for 1 year before so "2013-05-11 14:45:00 UTC".
My first thought is to create a whole new POSIXct object by subtracting one from the year bit and pasting it together with the remainder of the string and then creating a new POSIXct object with that string like so:
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
newTime <- as.POSIXct(paste(as.character(as.numeric(substr(time,1,4)) - 1),substr(time,5,19),sep=""),tz="UTC",origin="1970-01-01")
this works fine (except in case of leap years!) but the thing is I need to do this in a large data.table for each row and preferably put the results right back in data.table.
Is there any other way of subtracting a year off an object like this?
Some extra I need to apply this to a data.table like this one:
Time
1: 1349206200
2: 1349207100
3: 1349208000
4: 1349208900
5: 1349209800
6: 1349210700
7: 1349211600
8: 1349212500
9: 1349213400
10: 1349214300
11: 1349215200
but this happens when I do:
SOdata[,Time:=as.numeric(as.POSIXct(paste(as.character(as.numeric(substr(Time,1,4)) - 1),substr(Time,5,19),sep=""),tz="UTC",origin="1970-01-01"))]
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
I am guessing I need to use something like lapply, but I always mess up syntax when using that function. So does anyone know how?
lubridate is your friend.
library(lubridate)
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time-dyears(1)
#[1] "2013-05-11 14:45:00 UTC"
time+dyears(1)
#[1] "2015-05-11 14:45:00 UTC"
For leap years
> x <- as.POSIXct(c("2012-02-28", "2012-02-29"), tz="UTC",origin="1970-01-01")
> x - dyears(1)
[1] "2011-02-28 UTC" "2011-03-01 UTC"
I haven't tested the other answers, but the following should work as required regardless of leap years:
time <- as.POSIXct("2014-05-11 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time <- as.POSIXlt(time)
time$year <- time$year - 1
time <- as.POSIXct(time)
#[1] "2013-05-11 14:45:00 UTC"
With Gabor's leap year example:
time <- as.POSIXct("2012-02-29 14:45:00 UTC",tz="UTC",origin="1970-01-01")
time <- as.POSIXlt(time)
time$year <- time$year - 1
time <- as.POSIXct(time)
#[1] "2011-03-01 14:45:00 UTC"
seq in base can be used:
LastYr <- function(x) seq(x, length = 2, by = "-1 year")[2]
toPOSIXct <- function(x) as.POSIXct(x, origin = "1970-01-01")
# example 1
LastYr(as.POSIXct("2012-02-28"))
## [1] "2011-02-28 EST"
# example 2 - leap year
LastYr(as.POSIXct("2012-02-29"))
## [1] "2011-03-01 EST"
# example 3 - vector case
x <- as.POSIXct(c("2012-02-28", "2012-02-29")) # test data
toPOSIXct(sapply(x, LastYr))
## [1] "2011-02-28 EST" "2011-03-01 EST"
# example 4 - data.table shown in question
DT[, Time := sapply(toPOSIXct(Time), LastYr)]
Revised simplified using functions LastYr and toPOSIXct.
or you can try, in base R :
> time + as.difftime(52*7+1,units="days")
[1] "2015-05-11 14:45:00 UTC"
> time - as.difftime(52*7+1,units="days")
[1] "2013-05-11 14:45:00 UTC"
of course, it would be easier if units could be years...
I am using the fasttime package for its fastPOSIXct function that can read character datetimes very efficiently. My problem is that it can only read character datetimes THAT ARE EXPRESSED IN GMT.
R) fastPOSIXct("2010-03-15 12:37:17.223",tz="GMT") #very fast
[1] "2010-03-15 12:31:16.223 GMT"
R) as.POSIXct("2010-03-15 12:37:17.223",tz="GMT") #very slow
[1] "2010-03-15 12:31:16.223 GMT"
Now, say I have a file with datetimes expressed in "America/Montral" timezone, the plan is to load them (implicitely pretending they are in GMT) and modifying subsequently the timezone attribute without changing the underlying value.
If I use this function, refered in another post:
forceTZ = function(x,tz){
return(as.POSIXct(as.numeric(x), origin=as.POSIXct("1970-01-01",tz=tz), tz=tz))
}
I am seeing a bug ...
R) forceTZ(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"),"America/Montreal")
[1] "2010-03-15 13:37:17.223 EDT"
... because I would like it to be
R) as.POSIXct("2010-03-15 12:37:17.223",format="%Y-%m-%d %H:%M:%OS",tz="America/Montreal")
[1] "2010-03-15 12:37:17.223 EDT"
Is there a workaround ?
EDIT: I know about lubridate::force_tz but it is too slow (not point using fasttime::fastPOSIXct anymore )
The smart thing to do here is almost certainly to write readable, easy-to-maintain code, and throw more hardware at the problem if your code is too slow.
If you are desperate for a code speedup, then you could write a custom time-zone adjustment function. It isn't pretty, so if you have to convert between many time zones, you'll end up with spaghetti code. Here's my solution for the specific case of converting from GMT to Montreal time.
First precompute a list of dates for daylight savings time. You'll need to extend this to before 2010/after 2013 in order to fit your dataset. I found the dates here
http://www.timeanddate.com/worldclock/timezone.html?n=165
montreal_tz_data <- cbind(
start = fastPOSIXct(
c("2010-03-14 07:00:00", "2011-03-13 07:00:00", "2012-03-11 07:00:00", "2013-03-10 07:00:00")
),
end = fastPOSIXct(
c("2010-11-07 06:00:00", "2011-11-06 06:00:00", "2012-11-04 06:00:00", "2013-11-03 06:00:00")
)
)
For speed, the function to change time zones treats the times as numbers.
to_montreal_tz <- function(x)
{
x <- as.numeric(x)
is_dst <- logical(length(x)) #initialise as FALSE
#Loop over DST periods in each year
for(row in seq_len(nrow(montreal_tz_data)))
{
is_dst[x > montreal_tz_data[row, 1] & x < montreal_tz_data[row, 2]] <- TRUE
}
#Hard-coded numbers are 4/5 hours in seconds
ans <- ifelse(is_dst, x + 14400, x + 18000)
class(ans) <- c("POSIXct", "POSIXt")
ans
}
Now, to compare times:
#A million dates
ch <- rep("2010-03-15 12:37:17.223", 1e6)
#The easy way (no conversion of time zones afterwards)
system.time(as.POSIXct(ch, tz="America/Montreal"))
# user system elapsed
# 28.96 0.05 29.00
#A slight performance gain by specifying the format
system.time(as.POSIXct(ch, format = "%Y-%m-%d %H:%M:%S", tz="America/Montreal"))
# user system elapsed
# 13.77 0.01 13.79
#Using the fast functions
library(fasttime)
system.time(to_montreal_tz(fastPOSIXct(ch)))
# user system elapsed
# 0.51 0.02 0.53
As with all optimisation tricks, you've either got a 27-fold speedup (yay!) or you've saved 13 seconds processing time but added 3 days of code-maintenance time from an obscure bug when you DST table runs out in 2035 (boo!).
It's a daylight savings issue: http://www.timeanddate.com/time/dst/2010a.html
In 2010 it began on the 14th March in Canada, but not until the 28th March in the UK.
You can use POSIXlt objects to modify timezones directly:
lt <- as.POSIXlt(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"))
attr(lt,"tzone") <- "America/Montreal"
as.POSIXct(lt)
[1] "2010-03-15 12:37:17 EDT"
Or you could use format to convert to a string and set the timezone in a call to as.POSIXct. You can therefore modify forceTZ:
forceTZ <- function(x,tz)
{
return(as.POSIXct(format(x),tz=tz))
}
forceTZ(as.POSIXct("2010-03-15 12:37:17.223",tz="GMT"),"America/Montreal")
[1] "2010-03-15 12:37:17 EDT"
Could you not just add the appropriate number of seconds to correct the offset from GMT?
# Original problem
fastPOSIXct("2010-03-15 12:37:17.223",tz="America/Montreal")
# [1] "2010-03-15 08:37:17 EDT"
# Add 4 hours worth of seconds to the data. This should be very quick.
fastPOSIXct("2010-03-15 12:37:17.223",tz="America/Montreal") + 14400
# [1] "2010-03-15 12:37:17 EDT"