seq.Date giving problems in R

I have recently updated R to version 3.2.3, and now I have found a problem using seq with dates:
date1<-as.POSIXct("2014-01-30 02:00:00")
date2<-as.POSIXct("2014-12-24 11:00:00")
seq(date1,date2,by="month")
#[1] "2014-01-30 02:00:00 CET" "2014-03-02 02:00:00 CET"
#[3] NA "2014-04-30 02:00:00 CEST"
#[5] "2014-05-30 02:00:00 CEST" "2014-06-30 02:00:00 CEST"
#[7] "2014-07-30 02:00:00 CEST" "2014-08-30 02:00:00 CEST"
#[9] "2014-09-30 02:00:00 CEST" "2014-10-30 02:00:00 CET"
#[11] "2014-11-30 02:00:00 CET"
I don't understand where the NA comes from. I have tried on different machines, with both the same R version as mine and a previous one, and in place of that NA they correctly give "2014-03-30". Furthermore, if I change the year in the dates from 2014 to 2015, no NAs are returned!
I guess that during the installation something in my locale was modified but I cannot understand how to fix the problem.
Sys.getlocale() returns:
"en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
and my system is a MacBook Pro running OS X Mavericks.
Thanks for any help!

I was guessing Germany, and here's what the situation was in that CET timezone on Mar 30 (a Sunday):
http://www.timetemperature.com/utc-converter/utc-20140330-germany-12.html
UTC or GMT Time Germany
Sunday 30th March 2014 00:00:00 Sun 01:00 AM
Sunday 30th March 2014 01:00:00 Sun 03:00 AM*
Changing the setting to Italy, I get the same result:
UTC or GMT Time Italy
Sunday 30th March 2014 00:00:00 Sun 01:00 AM
Sunday 30th March 2014 01:00:00 Sun 03:00 AM*
The key here is to be suspicious of weirdness when the time is in the early morning hours of a spring or fall date, or when interval calculations cross such dates. The rules change from year to year, and since countries often do the switch on a Sunday or Saturday morning, the exact dates jump around.
The changes vary by country, and in the US they may vary by state or even by "sub-state" boundaries. In Washington State in 2014 the change happened on the second Sunday of March:
http://www.timetemperature.com/utc-converter/utc-20140309-us-washington+state-12.html
UTC or GMT Time US-Washington State
[several rows snipped]
Sunday 9th March 2014 07:00:00 Sat 11:00 PM
Sunday 9th March 2014 08:00:00 Sun 12:00 AM
Sunday 9th March 2014 09:00:00 Sun 01:00 AM
Sunday 9th March 2014 10:00:00 Sun 03:00 AM*
Sunday 9th March 2014 11:00:00 Sun 04:00 AM*
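If you'd rather verify a transition from inside R than from a website, one safe check is to format unambiguous UTC instants in the local zone (a small sketch; since it only converts from UTC, the result does not depend on OS quirks):
# show the 2014 spring-forward in America/Los_Angeles by formatting UTC instants
utc <- as.POSIXct(c("2014-03-09 09:59:00", "2014-03-09 10:00:00"), tz = "UTC")
format(utc, tz = "America/Los_Angeles", usetz = TRUE)
# [1] "2014-03-09 01:59:00 PST" "2014-03-09 03:00:00 PDT"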
I'm in the same TZ as Washington State. With that Sys.timezone set, one can reproduce the NA, at least on a Mac. The implementation of times and timezones is OS-specific, so there can be variation in how these oddities become visible:
> Sys.timezone(location = TRUE)
[1] "America/Los_Angeles"
> date1<-as.POSIXct("2014-01-09 02:00:00")
> date2<-as.POSIXct("2014-12-09 11:00:00")
> seq(date1,date2,by="month")
[1] "2014-01-09 02:00:00 PST" "2014-02-09 02:00:00 PST"
[3] NA "2014-04-09 02:00:00 PDT"
[5] "2014-05-09 02:00:00 PDT" "2014-06-09 02:00:00 PDT"
[7] "2014-07-09 02:00:00 PDT" "2014-08-09 02:00:00 PDT"
[9] "2014-09-09 02:00:00 PDT" "2014-10-09 02:00:00 PDT"
[11] "2014-11-09 02:00:00 PST" "2014-12-09 02:00:00 PST"

Inspecting the relevant code in seq.POSIXt, it appears that a call to seq with by="month" works as follows:
[some manipulation of the data]
conversion of date1 and date2 to POSIXlt
creation of a sequence of month numbers spanning the interval from date1 to date2 (in this case 0, ..., 11)
a manual update of date1$mon to this sequence of months (up to this point the dates are all handled properly)
finally, conversion of the resulting dates back to POSIXct, which is where the NA shows up
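A minimal sketch of that mechanism (Europe/Rome is an assumption standing in for the asker's CET locale; as noted below, the result of converting the gap time is OS-dependent, and on macOS it is NA):
lt <- as.POSIXlt("2014-01-30 02:00:00", tz = "Europe/Rome")
lt$mon <- lt$mon + 2   # bump January to March by hand, as seq.POSIXt does
as.POSIXct(lt)         # 2014-03-30 02:00 does not exist locally -> NA on macOS, possibly shifted elsewhere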
While the resulting NA is technically correct, since it comes from converting an invalid date ("2014-03-30 02:00:00 CET", which does not exist) to POSIXct, could the issue possibly be worked around by passing through difftimes? [*]
Not sure it is worth it, though...
[*] by difftimes I mean adding the correct number of seconds to the dates instead of just bumping the month...

Related

How to convert a column of UTC timestamps into several different timezones?

I have a dataset with dates stored in the DB as UTC; however, the actual timezone is different and varies by row.
mydat <- data.frame(
time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)
I want to apply the timezone to the UTC saved timestamps, over the entire column.
I looked into the with_tz function in the lubridate package, but I don't see how to reference the "timezone" column, rather than hardcoding in a value.
For example, if I try
with_tz(mydat$time_stamp, tzone = mydat$timezone)
I get the following error
Error in as.POSIXlt.POSIXct(x, tz) : invalid 'tz' value
However, if I try
mydat$time_stamp2 <- with_tz(mydat$time_stamp,"America/New_York")
that will render a new column without issue. How can I do this just by referencing the column values?
Welcome to Stack Overflow. This is a nice, common, and tricky problem! The following should do what you ask for:
Code
mydat <- data.frame(time_stamp=c("2022-08-01 05:00:00 UTC",
                                 "2022-08-01 17:00:00 UTC",
                                 "2022-08-02 22:30:00 UTC",
                                 "2022-08-04 05:00:00 UTC",
                                 "2022-08-05 02:00:00 UTC"),
                    timezone=c("America/Chicago", "America/New_York",
                               "America/Los_Angeles", "America/Denver",
                               "America/New_York"))
mydat$utc <- anytime::utctime(mydat$time_stamp, tz="UTC")
mydat$format <- ""
for (i in seq_len(nrow(mydat)))
    mydat[i, "format"] <- strftime(mydat[i, "utc"],
                                   "%Y-%m-%d %H:%M:%S",
                                   tz = mydat[i, "timezone"])
Output
> mydat
time_stamp timezone utc format
1 2022-08-01 05:00:00 UTC America/Chicago 2022-08-01 05:00:00 2022-08-01 00:00:00
2 2022-08-01 17:00:00 UTC America/New_York 2022-08-01 17:00:00 2022-08-01 13:00:00
3 2022-08-02 22:30:00 UTC America/Los_Angeles 2022-08-02 22:30:00 2022-08-02 15:30:00
4 2022-08-04 05:00:00 UTC America/Denver 2022-08-04 05:00:00 2022-08-03 23:00:00
5 2022-08-05 02:00:00 UTC America/New_York 2022-08-05 02:00:00 2022-08-04 22:00:00
>
Comment
We first parse your data as UTC; I once wrote a helper function for that in my anytime package (there are alternatives, but this is how I do it...). We then need to format from the given (numeric!) UTC representation to the given timezone. We need a loop for this, as the tz argument to strftime() is not vectorized.
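If you prefer to avoid the explicit for loop, the same per-row formatting can be written with vapply() (a purely stylistic sketch; it still calls strftime() once per row, since tz is not vectorized):
mydat$format <- vapply(seq_len(nrow(mydat)), function(i)
  strftime(mydat[i, "utc"], "%Y-%m-%d %H:%M:%S", tz = mydat[i, "timezone"]),
  character(1))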
Dirk gave a great answer that uses (mostly) base R tooling, if that is a requirement of yours. I wanted to also add an answer that uses the clock package that I developed, because it doesn't require working rowwise over your data frame. clock has a function called sys_time_info() that retrieves low-level information about a UTC time point in a specific time zone. It is one of the few functions where it makes sense to have a vectorized zone argument (which you need here), and it returns an offset from UTC that is useful for converting to a "local" time.
As others have mentioned, you won't be able to construct a date-time vector that stores multiple time zones in it, but if you just need to see what the local time would have been in those zones, this can still be useful.
library(clock)
mydat <- data.frame(
time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)
# Parse into a "sys-time" type, which can be thought of as a UTC time point
mydat$time_stamp <- sys_time_parse(mydat$time_stamp, format = "%Y-%m-%d %H:%M:%S")
mydat
#> time_stamp timezone
#> 1 2022-08-01T05:00:00 America/Chicago
#> 2 2022-08-01T17:00:00 America/New_York
#> 3 2022-08-02T22:30:00 America/Los_Angeles
#> 4 2022-08-04T05:00:00 America/Denver
#> 5 2022-08-05T02:00:00 America/New_York
# "Low level" information about DST, the time zone abbreviation,
# and offset from UTC in that zone. This is one of the few functions where
# it makes sense to have a vectorized `zone` argument.
info <- sys_time_info(mydat$time_stamp, mydat$timezone)
info
#> begin end offset dst abbreviation
#> 1 2022-03-13T08:00:00 2022-11-06T07:00:00 -18000 TRUE CDT
#> 2 2022-03-13T07:00:00 2022-11-06T06:00:00 -14400 TRUE EDT
#> 3 2022-03-13T10:00:00 2022-11-06T09:00:00 -25200 TRUE PDT
#> 4 2022-03-13T09:00:00 2022-11-06T08:00:00 -21600 TRUE MDT
#> 5 2022-03-13T07:00:00 2022-11-06T06:00:00 -14400 TRUE EDT
# Add the offset to the sys-time and then convert to a character column
# (these times don't really represent sys-time anymore since they are now localized)
mydat$localized <- as.character(mydat$time_stamp + info$offset)
mydat
#> time_stamp timezone localized
#> 1 2022-08-01T05:00:00 America/Chicago 2022-08-01T00:00:00
#> 2 2022-08-01T17:00:00 America/New_York 2022-08-01T13:00:00
#> 3 2022-08-02T22:30:00 America/Los_Angeles 2022-08-02T15:30:00
#> 4 2022-08-04T05:00:00 America/Denver 2022-08-03T23:00:00
#> 5 2022-08-05T02:00:00 America/New_York 2022-08-04T22:00:00

Create a time series with a row every 15 minutes

I'm having trouble creating a time series (POSIXct or dttm column) with a row every 15 minutes.
Something that will look like this for every 15 minutes between Jan 1st 2015 and Dec 31st 2016 (here as month/day/year hour:minutes):
1/15/2015 0:00
1/15/2015 0:15
1/15/2015 0:30
1/15/2015 0:45
1/15/2015 1:00
Perhaps a loop with a starting date of 01/01/2015 0:00 that then adds 15 minutes until 12/31/2016 23:45?
Does anyone have an idea of how this can be done easily?
A little bit easier to read:
library(lubridate)
seq(ymd_hm('2015-01-01 00:00'),ymd_hm('2016-12-31 23:45'), by = '15 mins')
intervals.15.min <- 0 : (366 * 24 * 60 * 60 / 15 / 60)
res <- as.POSIXct("2015-01-01","GMT") + intervals.15.min * 15 * 60
res <- res[res < as.POSIXct("2016-01-01 00:00:00 GMT")]
head(res)
# "2015-01-01 00:00:00 GMT" "2015-01-01 00:15:00 GMT" "2015-01-01 00:30:00 GMT"
tail(res)
# "2015-12-31 23:15:00 GMT" "2015-12-31 23:30:00 GMT" "2015-12-31 23:45:00 GMT"

as.Date converts wrong date from POSIXct data

I have 3848 rows of POSIXct data - stop times of bike trips in the month of April. As you can see, all of the data is in POSIXct format and is within the range of the month of April.
length(output2_stoptime)
[1] 3848
head(output2_stoptime)
[1] "2015-04-01 17:19:27 EST" "2015-04-02 07:26:06 EST" "2015-04-08 10:09:37 EST"
[4] "2015-04-12 20:08:00 EST" "2015-04-13 17:53:11 EST" "2015-04-14 07:17:34 EST"
class(output2_stoptime)
[1] "POSIXct" "POSIXt"
range(output2_stoptime)
[1] "2015-04-01 00:34:29 EST" "2015-04-30 20:49:22 EST"
Sys.timezone()
[1] "EST"
However, when I try converting this into a table of stop times per day, I get 4 dates that are converted to the 1st of May. I thought this might be occurring due to a different system timezone, as I am located in Europe at the moment, but even after setting the timezone to EST, the problem persists. For example:
by_day_output2 = as.data.frame(as.Date(output2_stoptime), tz = "EST")
colnames(by_day_output2)[1] = "SUM"
movements_Apr = as.data.frame(table(by_day_output2$SUM))
colnames(movements_Apr)[1] = "DATE"
tail(movements_Apr)
DATE Freq
26 2015-04-26 96
27 2015-04-27 125
28 2015-04-28 145
29 2015-04-29 151
30 2015-04-30 99
31 2015-05-01 4
Why are the four dates converting improperly when the time zones of the data and the system match? None of the data falls within May.
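A likely cause, sketched under the assumption that as.Date.POSIXct is using its historical default of tz = "UTC": late-evening EST times fall on the next UTC day. Note also that in the snippet above, tz = "EST" is passed to as.data.frame(), not to as.Date().
x <- as.POSIXct("2015-04-30 20:49:22", tz = "EST")
as.Date(x)              # "2015-05-01" under the old tz = "UTC" default (recent R uses the tzone attribute)
as.Date(x, tz = "EST")  # "2015-04-30" -- pass tz to as.Date() itself, not to as.data.frame()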

How to subtract an hour from a date parsed from a file name with as.POSIXct in R?

I have several files :
dir<- list.files("/data/test", "*.img$", full.names = TRUE)
dir:
/data/test/data.df_df_fg.20141231.jh.ds.0930.edfr.img
/data/test/data.df_df_fg.20141231.jh.ds.1030.edfr.img
/data/test/data.df_df_fg.20141231.jh.ds.1130.edfr.img
I want to extract the date from the file names:
dt <- as.POSIXct(strptime(basename(dir),"data.df_df_fg.%Y%m%d.jh.ds.%H%M.edfr", tz = "GMT"))
dt:
[1] "2014-12-31 09:30:00 GMT"
[2]"2014-12-31 10:30:00 GMT"
[3] "2014-12-31 11:30:00 GMT"
What I need is to subtract 1 hour from dt so I get:
[1] "2014-12-31 08:30:00 GMT"
[2]"2014-12-31 09:30:00 GMT"
[3] "2014-12-31 10:30:00 GMT"
and if the time is 2014-12-31 24:30:00 GMT, make it 23:30:00 GMT, but also reduce the date to 2014-12-30, because we are already in the previous day.
Try:
dt - as.difftime(1, units = "hours")
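The date rolls back automatically when the subtraction crosses midnight, so the end-of-day case handles itself. A quick check:
x <- as.POSIXct("2015-01-01 00:30:00", tz = "GMT")
x - as.difftime(1, units = "hours")
# [1] "2014-12-31 23:30:00 GMT"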

Create hourly intervals without regard to day-month-year in R

I have a list of dates as this:
"2014-01-20 18:47:09 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:41 GMT"
I used this code to split the dates into four-hour intervals:
data.frame(table(cut(datenormord, breaks = "4 hour")))
Results are these:
2013-07-22 06:00:00 144
2013-07-22 11:00:00 268
2013-07-22 16:00:00 331
2013-07-22 21:00:00 332
What I want is to see how many observations there are in each four-hour interval, without taking account of days, months, and years. For example, I would like to see how many observations there are from 00:00 to 04:00 by adding up the observations of every day of every year contained in my dataset.
For example, I want something like this:
01:00:00 1230
06:00:00 2430
11:00:00 3230
You can try removing the date portion from your timestamps using strftime, then re-parsing the result as a date-time, which will just add the current day, month, and year to all the data points. You can then break and count like you posted.
datenormord <- c("2014-01-20 01:47:09 GMT", "2014-01-20 07:46:59 GMT",
                 "2014-01-20 13:46:59 GMT", "2014-01-20 18:46:59 GMT",
                 "2014-01-20 18:46:41 GMT")
datenormord <- strftime(as.POSIXlt(datenormord), format = "%H:%M:%S")
datenormord <- as.POSIXlt(datenormord, format = "%H:%M:%S")
result <- data.frame(table(cut(datenormord, breaks = "4 hour")))
You can remove the date in the final data frame as well:
result$Var1 <- with(result, format(strftime(Var1, format = "%H:%M")))
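An alternative sketch, in case you'd rather avoid re-parsing fake dates: extract the hour of day and bucket it numerically (the stamps vector below is a hypothetical stand-in for your original timestamps):
stamps <- c("2014-01-20 01:47:09", "2014-01-20 07:46:59", "2014-01-20 13:46:59",
            "2014-01-20 18:46:59", "2014-01-20 18:46:41")
hrs <- as.integer(format(as.POSIXct(stamps, tz = "GMT"), "%H"))
table(cut(hrs, breaks = seq(0, 24, by = 4), right = FALSE))
# expected:
#  [0,4)   [4,8)  [8,12) [12,16) [16,20) [20,24)
#      1       1       0       1       2       0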
