R: how to adjust day of week to local timezone?

I have a series of data with timestamps in a format like "2015010119", where "20150101" is the date and "19" is the hour.
I need to convert these bulk data sets into different timezones. How can I adjust the hour and, based on the new hour, adjust the date and then the day of the week?
Can anyone help? Thank you very much!

Convert the date/time into a date-time class R understands; you can then change its tzone (timezone) attribute at will and extract the correct weekday with base R's weekdays function.
> test_time <- as.POSIXct("2015010119",format="%Y%m%d%H",tz="America/New_York")
> test_time
[1] "2015-01-01 19:00:00 EST"
> weekdays(test_time)
[1] "Thursday"
> attributes(test_time)$tzone <- "Japan"
> test_time
[1] "2015-01-02 09:00:00 JST"
> weekdays(test_time)
[1] "Friday"
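Since the question is about bulk data, the same idea can be sketched on a whole vector at once. The sample values, the assumption that they were recorded in New York local time, and the variable names below are illustrative only. format() with a tz argument re-expresses the same instants in another zone without modifying the object, and "%u" gives a locale-independent weekday number (1 = Monday).

```r
# Hypothetical sample data in the question's "YYYYMMDDHH" layout.
stamps <- c("2015010119", "2015010412")

# Assumption: the raw values were recorded in New York local time.
ny <- as.POSIXct(stamps, format = "%Y%m%d%H", tz = "America/New_York")

# Same instants, shown in Tokyo time: date and hour shift together.
tokyo_hour <- format(ny, "%Y-%m-%d %H", tz = "Asia/Tokyo")
# ISO weekday number in Tokyo time (1 = Monday, 7 = Sunday).
tokyo_wday <- format(ny, "%u", tz = "Asia/Tokyo")

tokyo_hour  # "2015-01-02 09" "2015-01-05 02"
tokyo_wday  # "5" "1"
```

weekdays(ny) would give the day names instead, but those depend on the session locale.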

Related

R POSIXct returns NA with "03/12/2017 02:17:13"

I have a data set containing the following date, along with several others
03/12/2017 02:17:13
I want to put the whole data set into a data table, so I used read_csv and as.data.table to create DT which contained the date/time information in date.
Next I used
DT[, date := as.POSIXct(date, format = "%m/%d/%Y %H:%M:%S")]
Everything looked fine except I had some NA values where the original data had dates. The following expression returns an NA
as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S")
The question is why and how to fix.
Just use functions anytime() or utctime() from package anytime
R> library(anytime)
R> anytime("03/12/2017 02:17:13")
[1] "2017-03-12 01:17:13 CST"
R>
or
R> utctime("03/12/2017 02:17:13")
[1] "2017-03-11 20:17:13 CST"
R>
The real crux is that this time did not exist in much of North America because of the switch to DST. You could parse it as UTC, since UTC does not observe daylight saving time:
R> utctime("03/12/2017 02:17:13", tz="UTC")
[1] "2017-03-12 02:17:13 UTC"
R>
You can express that UTC time as Mountain time, but it gets you the previous day:
R> utctime("03/12/2017 02:17:13", tz="America/Denver")
[1] "2017-03-11 19:17:13 MST"
R>
Ultimately, you (as the analyst) have to decide what was actually measured. UTC would make sense; the others may need adjustment.
My solution is below, but suggestions for improvement are appreciated.
The explanation for the NA is that in the US Mountain time zone, that date and time falls in the window of the switch to daylight saving time, where the local time doesn't exist, hence NA. The time zone is not explicitly specified, so R must be picking it up from the computer's time zone setting, which is "America/Denver".
The solution is to explicitly state that the date/time string is in UTC and then convert back as follows:
time.utc <- as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
> time.utc
[1] "2017-03-12 02:17:13 UTC"
>
Next, add 6 hours to the UTC time. Six hours is the offset between UTC and MDT; MST is 7 hours behind UTC.
time.utc2 <- time.utc + 6 * 60 * 60
> time.utc2
[1] "2017-03-12 08:17:13 UTC"
>
Now convert to America/Denver time using daylight savings.
time.mdt <- format(time.utc2, usetz = TRUE, tz = "America/Denver")
> time.mdt
[1] "2017-03-12 01:17:13 MST"
>
Note that this is in standard time, because daylight savings doesn't start until 2 am.
If you change the original string from 2 am to 3 am, you get the following
> time.mdt
[1] "2017-03-12 03:17:13 MDT"
>
The hour between 2 and 3 is lost in the change from standard to daylight savings but the data are now correct.
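If the strings really were recorded in UTC (an assumption only the analyst can confirm), no manual offset arithmetic is needed: a minimal sketch is to parse as UTC and let format() display the same instant in Mountain time. The variable names are illustrative.

```r
# Parse the string as a UTC instant, so the DST gap in local time is irrelevant.
x <- as.POSIXct("03/12/2017 02:17:13",
                format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Show the same instant on the America/Denver clock; R applies the MST/MDT rules.
denver <- format(x, tz = "America/Denver", usetz = TRUE)
denver  # "2017-03-11 19:17:13 MST"
```

This agrees with the utctime(..., tz="America/Denver") result shown earlier in the thread.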

Formatting Date Strings in R

I have two columns of differently formatted date strings that I need to make the same format,
the first is in the form:
vt_dev_date = "6/20/2016 7:45"
the second is in the form
vt_other = "2016-06-14 20:21:29.0"
If I could get them both into the same form, down to the minute, that would be great. I have tried
strptime(vt_dev_date,format = "%Y-%m-%d %H:%M")
strptime(vt_other,"%Y-%m-%d %H:%M")
and for the second one, it works and I get
"2016-06-14 20:21:00 EDT"
But for the first string, it seems that because the month and hour are not padded with zeros, none of the formatting tricks will work, because if I try
test_string <- "06/20/2016 07:45"
strptime(test_string,format = "%m/%d/%Y %H:%M")
[1] "2016-06-20 07:45:00 EDT"
It works, but I don't think going through every row in the column and padding each date is a great option. Any help would be appreciated.
Thanks,
josh
How about using lubridate, as follows:
library(lubridate)
x <- c("6/20/2016 7:45","2016-06-14 20:21:29.0")
> x
[1] "6/20/2016 7:45" "2016-06-14 20:21:29.0"
> parse_date_time(x, orders = c("mdy hm", "ymd hms"))
[1] "2016-06-20 07:45:00 UTC" "2016-06-14 20:21:29 UTC"
>
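For comparison, here is a base R sketch of the same conversion. ?strptime notes that leading zeros are optional on input, so the unpadded string parses once the format matches its month/day/year layout; no padding step is required. The tz = "UTC" choice and the names a and b are assumptions made for reproducibility.

```r
vt_dev_date <- "6/20/2016 7:45"
vt_other    <- "2016-06-14 20:21:29.0"

# Each column gets the format that matches its own layout.
a <- as.POSIXct(vt_dev_date, format = "%m/%d/%Y %H:%M", tz = "UTC")
# Parsing stops after the minutes; the trailing ":29.0" is ignored.
b <- as.POSIXct(vt_other, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Both values rendered in one common form, down to the minute.
same_form <- format(c(a, b), "%Y-%m-%d %H:%M", tz = "UTC")
same_form  # "2016-06-20 07:45" "2016-06-14 20:21"
```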

How to get the beginning of the day in POSIXct

My day starts at 2016-03-02 00:00:00. Not 2016-03-02 00:00:01.
How do I get the beginning of the day in POSIXct in local time?
My confusion probably comes from the fact that R sees this as the end of 2016-03-01? Given that R uses ISO 8601?
For example if I try to find the beginning of the day using Sys.Date():
as.POSIXct(Sys.Date(), tz = "CET")
"2016-03-01 01:00:00 CET"
Which is not correct - but are there other ways?
I know I can hack my way out using a simple
as.POSIXct(paste(Sys.Date(), "00:00:00", sep = " "), tz = "CET")
But there has to be a more correct way to do this? Base R preferred.
It's a single command---but you want as.POSIXlt():
R> as.POSIXlt(Sys.Date())
[1] "2016-03-02 UTC"
R> format(as.POSIXlt(Sys.Date()), "%Y-%m-%d %H:%M:%S")
[1] "2016-03-02 00:00:00"
R>
It is only when converting to POSIXct happens that the timezone offset to UTC (six hours for me) enters:
R> as.POSIXct(Sys.Date())
[1] "2016-03-01 18:00:00 CST"
R>
Needless to say by wrapping both you get the desired type and value:
R> as.POSIXct(as.POSIXlt(Sys.Date()))
[1] "2016-03-02 UTC"
R>
Filed under once again no need for lubridate or other non-Base R packages.
Notwithstanding that you understandably prefer base R, a "smart way," for certain meaning of "smart," would be:
library(lubridate)
x <- floor_date(Sys.Date(),"day")
> format(x,"%Y-%m-%d-%H-%M-%S")
[1] "2016-03-02-00-00-00"
From ?floor_date:
floor_date takes a date-time object and rounds it down to the nearest
integer value of the specified time unit.
Pretty handy.
Your example is a bit unclear.
You are talking about a 1 minute difference for the day start, but your example shows a 1 hour difference due to the timezone.
You can try
?POSIXct
to get the functionality explained.
Using Sys.Date() within as.POSIXct somehow overrides your timezone setting.
as.POSIXct(Sys.Date(), tz="EET")
"2016-03-01 01:00:00 CET"
While entering a string gives you
as.POSIXct("2016-03-01 00:00:00", tz="EET")
"2016-03-01 EET"
It looks like 00:00:00 is actually the beginning of the day. You can conclude it from the results of the following 2 inequalities
as.POSIXct("2016-03-02 00:00:02 CET")>as.POSIXct("2016-03-02 00:00:01 CET")
TRUE
as.POSIXct("2016-03-02 00:00:01 CET")>as.POSIXct("2016-03-02 00:00:00 CET")
TRUE
So somehow this is a timezone issue. Notice that 00:00:00 is automatically removed from the as.POSIXct result.
as.POSIXct("2016-03-02 00:00:00 CET")
"2016-03-02 CET"
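The behaviour discussed in this thread comes from a Date being a count of whole days in UTC. One base R sketch that gives local midnight in a named zone is to turn the Date into its "YYYY-MM-DD" string first and let as.POSIXct parse that string in the target zone. The fixed date below is a stand-in for Sys.Date(), used so the result is reproducible.

```r
day <- as.Date("2016-03-02")  # stand-in for Sys.Date()

# format(day) gives "2016-03-02"; parsing it with tz = "CET" yields
# 00:00:00 on that calendar day in CET rather than in UTC.
start <- as.POSIXct(format(day), tz = "CET")

start_txt <- format(start, "%Y-%m-%d %H:%M:%S", usetz = TRUE)
start_txt  # "2016-03-02 00:00:00 CET"
```

This is the same idea as the paste() workaround in the question, without spelling out "00:00:00" by hand.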

R format date with time stamp

I would like to convert the following dates
dates <-c(1149318000L, 1151910000L, 1154588400L, 1157266800L, 1159858800L, 1162540800L)
into date and time format
I don't know the origin of the date but I know that
1146685218 = 2006/05/03 07:00:00
** Update 1 **
I have sorted the unformatted dates and replaced the sample above with a friendly sequence, but these are real dates. I was thinking of using the above key as the origin, but it does not seem to work.
let
seconds_in_days <- 3600*24
(dates[2]-dates[1])/seconds_in_days
## [1] 30
If you know
1146685218 = 2006/05/03 07:00:00
then just make that the origin
dates <- c(1149318000L, 1151910000L, 1154588400L, 1157266800L, 1159858800L, 1162540800L)
orig.int <- 1146685218
orig.date <- as.POSIXct("2006/05/03 07:00:00", format="%Y/%m/%d %H:%M:%S")
as.POSIXct(dates-orig.int, origin=orig.date)
# [1] "2006-06-02 18:19:42 EDT" "2006-07-02 18:19:42 EDT" "2006-08-02 18:19:42 EDT"
# [4] "2006-09-02 18:19:42 EDT" "2006-10-02 18:19:42 EDT" "2006-11-02 18:19:42 EST"
This works assuming your "date" values are the number of seconds since a particular date/time, which is how POSIXct stores its date/time values.
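Another reading worth checking, offered here as an assumption rather than something the asker confirmed, is that the values are ordinary Unix epoch seconds (seconds since 1970-01-01 00:00:00 UTC). Under that reading no custom origin is needed, and the sample values land on 07:00:00 UTC on the 3rd of consecutive months:

```r
dates <- c(1149318000L, 1151910000L, 1154588400L,
           1157266800L, 1159858800L, 1162540800L)

# Assumption: plain Unix epoch seconds, interpreted and displayed in UTC.
as_utc <- as.POSIXct(dates, origin = "1970-01-01", tz = "UTC")

first_two <- format(as_utc[1:2], "%Y-%m-%d %H:%M:%S")
first_two  # "2006-06-03 07:00:00" "2006-07-03 07:00:00"
```

If the true origin or time zone differs, the custom-origin approach above still applies.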

Extracting time from POSIXct

How would I extract the time from a series of POSIXct objects discarding the date part?
For instance, I have:
times <- structure(c(1331086009.50098, 1331091427.42461, 1331252565.99979,
1331252675.81601, 1331262597.72474, 1331262641.11786, 1331269557.4059,
1331278779.26727, 1331448476.96126, 1331452596.13806), class = c("POSIXct",
"POSIXt"))
which corresponds to these dates:
"2012-03-07 03:06:49 CET" "2012-03-07 04:37:07 CET"
"2012-03-09 01:22:45 CET" "2012-03-09 01:24:35 CET"
"2012-03-09 04:09:57 CET" "2012-03-09 04:10:41 CET"
"2012-03-09 06:05:57 CET" "2012-03-09 08:39:39 CET"
"2012-03-11 07:47:56 CET" "2012-03-11 08:56:36 CET"
Now, I have some values for a parameter measured at those times:
val <- c(1.25343125e-05, 0.00022890575,
3.9269125e-05, 0.0002285681875,
4.26353125e-05, 5.982625e-05,
2.09575e-05, 0.0001516951251,
2.653125e-05, 0.0001021391875)
I would like to plot val vs time of the day, irrespectively of the specific day when val was measured.
Is there a specific function that would allow me to do that?
You can use strftime to convert datetimes to any character format:
> t <- strftime(times, format="%H:%M:%S")
> t
[1] "02:06:49" "03:37:07" "00:22:45" "00:24:35" "03:09:57" "03:10:41"
[7] "05:05:57" "07:39:39" "06:47:56" "07:56:36"
But that doesn't help very much, since you want to plot your data. One workaround is to strip the date element from your times, and then to add an identical date to all of your times:
> xx <- as.POSIXct(t, format="%H:%M:%S")
> xx
[1] "2012-03-23 02:06:49 GMT" "2012-03-23 03:37:07 GMT"
[3] "2012-03-23 00:22:45 GMT" "2012-03-23 00:24:35 GMT"
[5] "2012-03-23 03:09:57 GMT" "2012-03-23 03:10:41 GMT"
[7] "2012-03-23 05:05:57 GMT" "2012-03-23 07:39:39 GMT"
[9] "2012-03-23 06:47:56 GMT" "2012-03-23 07:56:36 GMT"
Now you can use these datetime objects in your plot:
plot(xx, rnorm(length(xx)), xlab="Time", ylab="Random value")
For more help, see ?DateTimeClasses
The data.table package has a function, as.ITime, which can do this efficiently. Use it as below:
library(data.table)
x <- "2012-03-07 03:06:49 CET"
as.IDate(x) # Output is "2012-03-07"
as.ITime(x) # Output is "03:06:49"
There have been previous answers that showed the trick. In essence:
you must retain POSIXct types to take advantage of all the existing plotting functions;
if you want to 'overlay' several days' worth on a single plot, highlighting the intra-daily variation, the best trick is to
impose the same day (and month and even year if need be, which is not the case here),
which you can do by overriding the day-of-month and month components when in POSIXlt representation, or just by offsetting the 'delta' relative to 0:00:00 between the different days.
So with times and val as helpfully provided by you:
## impose month and day based on first obs
ntimes <- as.POSIXlt(times) # convert to 'POSIX list type'
ntimes$mday <- ntimes[1]$mday # and $mon if it differs too
ntimes <- as.POSIXct(ntimes) # convert back
par(mfrow=c(2,1))
plot(times,val) # old times
plot(ntimes,val) # new times
yields this contrasting the original and modified time scales:
Here's an update for those looking for a tidyverse method to extract hh:mm:ss.sssss from a POSIXct object. Note that the time zone is not included in the output.
library(hms)
as_hms(times)
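Many solutions have been provided, but I have not seen this one, which uses package chron:
library(chron)
# chron's times() function is still found even though the data vector is also named times
hours = times(strftime(times, format="%T"))
plot(val~hours)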
(sorry, I am not entitled to post an image, you'll have to plot it yourself)
I can't find anything that deals with clock times exactly, so I'd just use some functions from package:lubridate and work with seconds-since-midnight:
require(lubridate)
clockS = function(t){hour(t)*3600+minute(t)*60+second(t)}
plot(clockS(times),val)
You might then want to look at some of the axis code to figure out how to label axes nicely.
The time_t value for midnight GMT is always divisible by 86400 (24 * 3600). The value for seconds-since-midnight GMT is thus time %% 86400.
The hour in GMT is (time %% 86400) / 3600 and this can be used as the x-axis of the plot:
plot((as.numeric(times) %% 86400)/3600, val)
To adjust for a time zone, adjust the time before taking the modulus, by adding the number of seconds that your time zone is ahead of GMT. For example, US central daylight saving time (CDT) is 5 hours behind GMT. To plot against the time in CDT, the following expression is used:
plot(((as.numeric(times) - 5*3600) %% 86400)/3600, val)
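A fixed offset such as 5*3600 ignores daylight saving changes within the data. As a sketch of a zone-aware variant, the clock time can be read from the POSIXlt fields after converting to a named zone. "Europe/Paris" is an assumption standing in for the CET times in the question, and only the first two values of the question's vector are used here.

```r
# First two values from the question's vector.
times <- structure(c(1331086009.50098, 1331091427.42461),
                   class = c("POSIXct", "POSIXt"))

# Break each instant into clock fields in the chosen zone (assumed here).
lt <- as.POSIXlt(times, tz = "Europe/Paris")

# Decimal hours since local midnight, suitable as a plot x-axis.
hours <- lt$hour + lt$min / 60 + lt$sec / 3600
```

With the full vector, plot(hours, val) then shows val against local time of day.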
