I have a dataset with dates stored in the DB as UTC. However, the actual timezone is different for each row.
mydat <- data.frame(
time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)
I want to apply the timezone to the UTC saved timestamps, over the entire column.
I looked into the with_tz function in the lubridate package, but I don't see how to reference the "timezone" column, rather than hardcoding in a value.
For example, if I try
with_tz(mydat$time_stamp, tzone = mydat$timezone)
I get the following error:
Error in as.POSIXlt.POSIXct(x, tz) : invalid 'tz' value
However, if I try
mydat$time_stamp2 <- with_tz(mydat$time_stamp,"America/New_York")
that will render a new column without issue. How can I do this just referencing column values?
Welcome to StackOverflow. This is a nice, common, and tricky problem! The following should do what you ask for:
Code
mydat <- data.frame(time_stamp=c("2022-08-01 05:00:00 UTC",
"2022-08-01 17:00:00 UTC",
"2022-08-02 22:30:00 UTC",
"2022-08-04 05:00:00 UTC",
"2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago", "America/New_York",
"America/Los_Angeles", "America/Denver",
"America/New_York"))
mydat$utc <- anytime::utctime(mydat$time_stamp, tz="UTC")
mydat$format <- ""
for (i in seq_len(nrow(mydat)))
mydat[i, "format"] <- strftime(mydat[i,"utc"],
"%Y-%m-%d %H:%M:%S",
tz=mydat[i,"timezone"])
Output
> mydat
time_stamp timezone utc format
1 2022-08-01 05:00:00 UTC America/Chicago 2022-08-01 05:00:00 2022-08-01 00:00:00
2 2022-08-01 17:00:00 UTC America/New_York 2022-08-01 17:00:00 2022-08-01 13:00:00
3 2022-08-02 22:30:00 UTC America/Los_Angeles 2022-08-02 22:30:00 2022-08-02 15:30:00
4 2022-08-04 05:00:00 UTC America/Denver 2022-08-04 05:00:00 2022-08-03 23:00:00
5 2022-08-05 02:00:00 UTC America/New_York 2022-08-05 02:00:00 2022-08-04 22:00:00
>
Comment
We first parse your data as UTC; I once wrote a helper function for that in my anytime package (there are alternatives, but this is how I do it...). We then need to format from the given (numeric!!) UTC representation to the given timezone. We need a loop for this, as the tz argument to strftime() is not vectorized.
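Since the tz argument takes a single zone at a time, one way to shorten the loop is to iterate over the distinct zones instead of over the rows. The following is only a sketch under my own assumptions: it parses with base as.POSIXct() rather than anytime, and the names zoned and local are made up for illustration.

```r
# Sketch: format each UTC instant in its row's zone, looping once per
# distinct zone rather than once per row. `zoned` and `local` are
# illustrative names, not part of the original answer.
zoned <- data.frame(
  time_stamp = c("2022-08-01 05:00:00 UTC", "2022-08-01 17:00:00 UTC",
                 "2022-08-02 22:30:00 UTC", "2022-08-04 05:00:00 UTC",
                 "2022-08-05 02:00:00 UTC"),
  timezone = c("America/Chicago", "America/New_York", "America/Los_Angeles",
               "America/Denver", "America/New_York")
)

# Parse as UTC; the trailing " UTC" text is ignored once the format matches
zoned$utc <- as.POSIXct(zoned$time_stamp, tz = "UTC",
                        format = "%Y-%m-%d %H:%M:%S")

zoned$local <- character(nrow(zoned))
for (z in unique(zoned$timezone)) {
  idx <- zoned$timezone == z
  # format() is vectorized over the timestamps, just not over tz
  zoned$local[idx] <- format(zoned$utc[idx], "%Y-%m-%d %H:%M:%S", tz = z)
}
zoned$local
```

The result is a character column, as in the loop above, because a single POSIXct vector cannot carry a different time zone per element.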
Dirk gave a great answer that uses (mostly) base R tooling, if that is a requirement of yours. I wanted to also add an answer that uses the clock package that I developed because it doesn't require working rowwise over your data frame. clock has a function called sys_time_info() that retrieves low level information about a UTC time point in a specific time zone. It is one of the few functions where it makes sense to have a vectorized zone argument (which you need here) and returns an offset from UTC that will be useful here for converting to a "local" time.
As others have mentioned, you won't be able to construct a date-time vector that stores multiple time zones in it, but if you just need to see what the local time would have been in those zones, this can still be useful.
library(clock)
mydat <- data.frame(
time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)
# Parse into a "sys-time" type, which can be thought of as a UTC time point
mydat$time_stamp <- sys_time_parse(mydat$time_stamp, format = "%Y-%m-%d %H:%M:%S")
mydat
#> time_stamp timezone
#> 1 2022-08-01T05:00:00 America/Chicago
#> 2 2022-08-01T17:00:00 America/New_York
#> 3 2022-08-02T22:30:00 America/Los_Angeles
#> 4 2022-08-04T05:00:00 America/Denver
#> 5 2022-08-05T02:00:00 America/New_York
# "Low level" information about DST, the time zone abbreviation,
# and offset from UTC in that zone. This is one of the few functions where
# it makes sense to have a vectorized `zone` argument.
info <- sys_time_info(mydat$time_stamp, mydat$timezone)
info
#> begin end offset dst abbreviation
#> 1 2022-03-13T08:00:00 2022-11-06T07:00:00 -18000 TRUE CDT
#> 2 2022-03-13T07:00:00 2022-11-06T06:00:00 -14400 TRUE EDT
#> 3 2022-03-13T10:00:00 2022-11-06T09:00:00 -25200 TRUE PDT
#> 4 2022-03-13T09:00:00 2022-11-06T08:00:00 -21600 TRUE MDT
#> 5 2022-03-13T07:00:00 2022-11-06T06:00:00 -14400 TRUE EDT
# Add the offset to the sys-time and then convert to a character column
# (these times don't really represent sys-time anymore since they are now localized)
mydat$localized <- as.character(mydat$time_stamp + info$offset)
mydat
#> time_stamp timezone localized
#> 1 2022-08-01T05:00:00 America/Chicago 2022-08-01T00:00:00
#> 2 2022-08-01T17:00:00 America/New_York 2022-08-01T13:00:00
#> 3 2022-08-02T22:30:00 America/Los_Angeles 2022-08-02T15:30:00
#> 4 2022-08-04T05:00:00 America/Denver 2022-08-03T23:00:00
#> 5 2022-08-05T02:00:00 America/New_York 2022-08-04T22:00:00
I'm having trouble creating a time series (POSIXct or dttm column) with a row every 15 minutes.
Something that will look like this for every 15 minutes between Jan 1st 2015 and Dec 31st 2016 (here as month/day/year hour:minutes):
1/15/2015 0:00
1/15/2015 0:15
1/15/2015 0:30
1/15/2015 0:45
1/15/2015 1:00
A loop starting date of 01/01/2015 0:00 and then adding 15 minutes until 12/31/2016 23:45?
Does anyone have an idea of how this can be done easily?
A little bit easier to read:
library(lubridate)
seq(ymd_hm('2015-01-01 00:00'),ymd_hm('2016-12-31 23:45'), by = '15 mins')
# Number of 15-minute steps in 731 days (2015 has 365 days, 2016 has 366)
intervals.15.min <- 0 : (731 * 24 * 60 * 60 / 15 / 60)
res <- as.POSIXct("2015-01-01", tz = "GMT") + intervals.15.min * 15 * 60
# Drop the final step, which lands on 2017-01-01 00:00
res <- res[res < as.POSIXct("2017-01-01 00:00:00", tz = "GMT")]
head(res)
# "2015-01-01 00:00:00 GMT" "2015-01-01 00:15:00 GMT" "2015-01-01 00:30:00 GMT"
tail(res)
# "2016-12-31 23:15:00 GMT" "2016-12-31 23:30:00 GMT" "2016-12-31 23:45:00 GMT"
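For completeness, the same sequence can be sketched in base R without lubridate, since seq() has a POSIXt method that accepts a step such as "15 min". The variable name quarter_hours and the choice of UTC are my assumptions; substitute whatever zone your data uses.

```r
# Sketch: a 15-minute grid over 2015 and 2016 using base R only.
# `quarter_hours` is an illustrative name; UTC is an assumed zone.
quarter_hours <- seq(
  from = as.POSIXct("2015-01-01 00:00", tz = "UTC"),
  to   = as.POSIXct("2016-12-31 23:45", tz = "UTC"),
  by   = "15 min"
)

# 731 days times 96 quarter-hours per day
length(quarter_hours)

# Wrap in a data frame if a POSIXct column is what you need
grid <- data.frame(time = quarter_hours)
```

Using a fixed-offset zone such as UTC avoids surprises around daylight saving transitions; with a local zone the sequence still advances in true 15-minute steps, but the printed clock times skip or repeat an hour at the changeover.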
I have 3848 rows of POSIXct data - stop times of bike trips in the month of April. As you can see, all of the data is in POSIXct format and is within the range of the month of April.
length(output2_stoptime)
[1] 3848
head(output2_stoptime)
[1] "2015-04-01 17:19:27 EST" "2015-04-02 07:26:06 EST" "2015-04-08 10:09:37 EST"
[4] "2015-04-12 20:08:00 EST" "2015-04-13 17:53:11 EST" "2015-04-14 07:17:34 EST"
class(output2_stoptime)
[1] "POSIXct" "POSIXt"
range(output2_stoptime)
[1] "2015-04-01 00:34:29 EST" "2015-04-30 20:49:22 EST"
Sys.timezone()
[1] "EST"
However, when I try converting this into a table of stop times per day, I get 4 dates that are converted as the 1st of May. I thought this might be occurring due to the different system timezone as I am located in Europe at the moment, but even after setting the timezone to EST, the problem persists. For example:
by_day_output2 = as.data.frame(as.Date(output2_stoptime), tz = "EST")
colnames(by_day_output2)[1] = "SUM"
movements_Apr = as.data.frame(table(by_day_output2$SUM))
colnames(movements_Apr)[1] = "DATE"
tail(movements_Apr)
DATE Freq
26 2015-04-26 96
27 2015-04-27 125
28 2015-04-28 145
29 2015-04-29 151
30 2015-04-30 99
31 2015-05-01 4
Why are the four dates converting improperly when the time zones of the data and the system match? None of the data falls within May.
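A likely explanation, offered as a sketch rather than a confirmed diagnosis: in the call above, tz = "EST" is passed to as.data.frame() and not to as.Date(). To my knowledge, as.Date() on a POSIXct value then converts in UTC, so any stop time after 19:00 EST on April 30 already falls on May 1 in UTC. The single timestamp below is taken from the range() output; the variable names are made up.

```r
# Sketch: one late-evening stop time, taken from the range() output above
late_trip <- as.POSIXct("2015-04-30 20:49:22", tz = "EST")

# Converting in UTC pushes it past midnight:
# 20:49 EST is 01:49 UTC on the following day
date_in_utc <- as.Date(late_trip, tz = "UTC")

# Passing tz to as.Date() itself keeps the calendar day of the data's zone
date_in_est <- as.Date(late_trip, tz = "EST")

format(date_in_utc)
format(date_in_est)
```

If this is indeed the cause, moving the argument inside, as in as.data.frame(as.Date(output2_stoptime, tz = "EST")), should remove the four May entries.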
I have several files:
dir <- list.files("/data/test", "\\.img$", full.names = TRUE)
dir:
/data/test/data.df_df_fg.20141231.jh.ds.0930.edfr.img
/data/test/data.df_df_fg.20141231.jh.ds.1030.edfr.img
/data/test/data.df_df_fg.20141231.jh.ds.1130.edfr.img
I want to extract the date from the file names:
dt <- as.POSIXct(strptime(basename(dir),"data.df_df_fg.%Y%m%d.jh.ds.%H%M.edfr", tz = "GMT"))
dt:
[1] "2014-12-31 09:30:00 GMT"
[2] "2014-12-31 10:30:00 GMT"
[3] "2014-12-31 11:30:00 GMT"
What I need is to subtract 1 hour from dt so I get:
[1] "2014-12-31 08:30:00 GMT"
[2] "2014-12-31 09:30:00 GMT"
[3] "2014-12-31 10:30:00 GMT"
And if the time is 2014-12-31 24:30:00 GMT, make it 23:30:00 GMT but also move the date back to 2014-12-30, because we are then already in the previous day.
Try:
dt-as.difftime(1,units="hours")
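Because POSIXct values are stored as seconds since the epoch, the subtraction rolls the date back on its own when it crosses midnight. A small sketch, with made-up timestamps and illustrative variable names:

```r
# Sketch: subtracting one hour from POSIXct values, including one that
# sits just after midnight. The timestamps here are made up.
stamps <- as.POSIXct(c("2014-12-31 09:30:00", "2014-12-31 00:30:00"),
                     tz = "GMT")

shifted <- stamps - as.difftime(1, units = "hours")

# Equivalent, since POSIXct arithmetic is in seconds
shifted_secs <- stamps - 3600

format(shifted, "%Y-%m-%d %H:%M:%S")
```

The second element comes out on 2014-12-30 at 23:30:00, so no special handling of the day boundary is needed.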
I have a list of dates as this:
"2014-01-20 18:47:09 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:41 GMT"
I used this code to split the dates in four-hour intervals
data.frame(table(cut(datenormord, breaks = "4 hour")))
Results are these:
2013-07-22 06:00:00 144
2013-07-22 11:00:00 268
2013-07-22 16:00:00 331
2013-07-22 21:00:00 332
What I want is to see how many observations there are in each four-hour interval, ignoring the day, month, and year. For example, I would like to see how many observations there are from 00:00 to 04:00, summed over every day of every year in my dataset.
For example, I want something like this:
01:00:00 1230
06:00:00 2430
11:00:00 3230
You can try removing the dates from your date-times using strftime and then converting the result back to a date-time, which will just add the current day, month, and year to all the data points. You can then break and count like you posted.
datenormord<-c("2014-01-20 01:47:09 GMT", "2014-01-20 07:46:59 GMT","2014-01-20 13:46:59 GMT" ,"2014-01-20 18:46:59 GMT" ,"2014-01-20 18:46:41 GMT")
datenormord<-strftime(as.POSIXlt(datenormord), format="%H:%M:%S")
datenormord<-as.POSIXlt(datenormord, format="%H:%M:%S")
result<-data.frame(table(cut(datenormord, breaks = "4 hour")))
You can remove the date in the final data frame as well:
result$Var1<-with(result,format(strftime(Var1,format="%H:%M")))
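An alternative sketch that avoids attaching today's date: extract the hour of day as a number and cut it into fixed bins. This is not the approach above but a variation on it; the bins here always start at 00:00, and the sample data and the names stamps, hour_of_day, and counts are assumptions for illustration.

```r
# Sketch: count observations per four-hour block of the day, ignoring
# the date. Sample data and variable names are illustrative.
stamps <- as.POSIXct(c("2014-01-20 01:47:09", "2014-01-20 07:46:59",
                       "2014-01-20 13:46:59", "2014-01-20 18:46:59",
                       "2014-01-21 18:46:41"), tz = "GMT")

# Hour of day (0 to 23) in the zone stored on the vector
hour_of_day <- as.integer(format(stamps, "%H"))

# Left-closed bins [0,4), [4,8), ..., [20,24), labelled by start time
bins <- cut(hour_of_day,
            breaks = seq(0, 24, by = 4),
            right = FALSE,
            labels = sprintf("%02d:00", seq(0, 20, by = 4)))

counts <- as.data.frame(table(bins))
counts
```

Empty blocks are kept with a count of zero, which can be convenient when comparing several datasets on the same set of intervals.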