I have a data set, rfm, which has a column named cohort, of class Date as shown below. I also have a vector of calendar dates, rather cleverly named dates, whose elements are also of class Date:
> class(rfm)
[1] "tbl_df" "data.frame"
> class(rfm$cohort)
[1] "Date"
> class(dates)
[1] "Date"
Both rfm$cohort and dates show the date of the first of the month over a certain time period, with the possibility that dates covers a more recent set of months. My problem is simple: I just want to see how many months there are between max(rfm$cohort) and max(dates).
The lubridate package makes this easy with the interval() function, but the arguments of that function must be POSIXct, not Date objects:
> as.period(interval(ymd(as.character(max(rfm$cohort))),ymd(as.character(max(dates)))), months)
[1] "1m 0d 0H 0M 0S"
But do I really need a chained call to ymd() and as.character()? Wouldn't as.POSIXct() suffice? Here's a try:
> as.period(interval(as.POSIXct(max(rfm$cohort), tz = 'GMT'),as.POSIXct(max(dates), tz = 'GMT')), months)
Error in while (any(start + est * per < end)) est[start + est * per < :
missing value where TRUE/FALSE needed
This didn't work. It seems that lubridate wants me to set the time zone for the interval, not separately for its ends. Like this:
> as.period(interval(as.POSIXct(max(rfm$cohort)),as.POSIXct(max(dates)), tz = 'GMT'), months)
[1] "1m 0d 0H 0M 0S"
I know that this is less typing, so I should just do what lubridate wants, but why wouldn't setting the time zone of interval ends separately also work? By my reading of it, ?interval suggests that the third argument of this function, tzone, should default to getting its value from the time zone of the first. Somehow as.POSIXct() won't reveal the time zone to interval(), even if it's explicitly set in the same call. What am I missing?
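As an aside, when both values are known to fall on the first of a month, the month count can be computed without any time zone handling at all. The following is only a minimal base R sketch: the helper name months_between is hypothetical, and the dates are made-up stand-ins for max(rfm$cohort) and max(dates).

```r
# Whole calendar months between two Date values.
# Assumes both dates are the first of their month, as described above.
months_between <- function(from, to) {
  from <- as.POSIXlt(from)
  to <- as.POSIXlt(to)
  # $year counts years since 1900, $mon counts months 0-11
  12 * (to$year - from$year) + (to$mon - from$mon)
}

# Made-up example values standing in for the real maxima
months_between(as.Date("2015-03-01"), as.Date("2015-04-01"))
```

Because only the year and month fields are compared, the day of the month is ignored, so this is not a general replacement for interval() on arbitrary dates.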
I have a data frame with a column labeled Time that indicates when an order was placed. All of the Time values are in a 07:16:00 format. I would like to convert these from character to numeric so that I can add another column that gives the type of menu based on the time.
Using as.numeric, all the values become NA. I also tried strptime, and the values were NA as well.
One approach is lubridate's hms(). From its documentation:
Transforms a character or numeric vector into a period object with
the specified number of hours, minutes, and seconds.
library(lubridate)
times <- c("07:16:00", "07:18:12", "08:56:00")
new_time <- hms(times)
new_time - new_time[1]
[1] "0S" "2M 12S" "1H 40M 0S"
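If a plain numeric value is needed for the extra column, a minimal base R sketch could look like the following. The helper name to_seconds and the menu cut-offs are assumptions made up for illustration; they are not from the question.

```r
# Convert "HH:MM:SS" strings to seconds since midnight (base R only).
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

times <- c("07:16:00", "07:18:12", "08:56:00")
secs <- to_seconds(times)

# Hypothetical menu boundaries in hours: [0, 11), [11, 16), [16, 24)
menu <- cut(secs / 3600,
            breaks = c(0, 11, 16, 24),
            labels = c("breakfast", "lunch", "dinner"),
            right = FALSE)
```

With these assumed boundaries, all three sample times fall into the first category.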
I wanted to get the interval between two dates displayed in days, using the lubridate package.
Method 1 using interval function.
library(lubridate)
date1 <- as.Date("2022-08-08")
date2 <- as.Date("2022-09-08")
x <- interval(date1, date2)
print(x)
days(x)
Output as follows
[1] 2022-08-08 UTC--2022-09-08 UTC
[1] "2678400d 0H 0M 0S"
Question: Why is the answer not correct? 2678400 days!
Method 2 using difftime function.
y <- difftime(date2, date1, units="days")
print(y)
Output as follows
Time difference of 31 days
The thing is, I want it to display only 31 instead of the whole sentence "Time difference of 31 days"
Need some guidance here.
lubridate::days() works with numerics. You've given it an interval. as.numeric(x) gives 2678400, the number of seconds between 2022-08-08 and 2022-09-08 (31 days of 86400 seconds each). You're a victim of implicit coercion.
@jay.sf has given you the solution for difftime. To get the correct answer using lubridate:
time_length(x, "days")
[1] 31
@JustJames gave the full explanation of what went wrong in their now-deleted answer:
"According to the docs
as.interval changes difftime, Duration, Period and numeric class objects to intervals that begin at the specified date-time. Numeric objects are first coerced to timespans equal to the numeric value in seconds."
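The arithmetic behind the surprising number can be checked in base R without lubridate. This is just an illustrative sketch using the dates from the question:

```r
date1 <- as.Date("2022-08-08")
date2 <- as.Date("2022-09-08")

# Length of the span in seconds: the value that days() ended up receiving
secs <- as.numeric(difftime(date2, date1, units = "secs"))
secs          # 2678400

# Dividing by the number of seconds in a day recovers the expected answer
secs / 86400  # 31
```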
Try this,
date1 <- as.Date("2022-08-08")
date2 <- as.Date("2022-09-08")
dateDiff <- as.numeric(difftime(date2, date1))
print(dateDiff)
Output
> dateDiff = as.numeric(difftime(date2, date1))
> print(dateDiff)
[1] 31
Hope this helps!
I have some MATLAB serial date numbers that I need to use in R, but I have to convert them to normal dates first.
Matlab:
datestr(733038.6)
ans =
27-Dec-2006 14:24:00
As you can see, it gives both the date and the time.
Now we try in R:
Matlab2Rdate <- function(val) as.Date(val - 1, origin = '0000-01-01')
> Matlab2Rdate(733038.6)
[1] "2006-12-27"
It gives only the date, but I also need the time. Any ideas?
The trick is that Matlab uses "January 01, 0000", a fictional reference date, to calculate its date number. The origin of time for the "POSIXct" class in R is '1970-01-01 00:00:00 UTC'. You can read about how different systems handle dates here.
Before converting, you need to account for this difference in reference from one format to another. The POSIX manual contains such an example. Here's my output:
> val<-733038.6
> as.POSIXct((val - 719529)*86400, origin = "1970-01-01", tz = "UTC")
[1] "2006-12-27 14:23:59 UTC"
Here 719529 is '1970-01-01 00:00:00 UTC' in Matlab's datenum, and 86400 is the number of seconds in a standard UTC day.
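The output above shows 14:23:59 rather than 14:24:00 because 733038.6 cannot be represented exactly as a floating-point number, so the product lands a hair below the whole second. A minimal sketch of a wrapper that rounds to the nearest second is shown below. The function name matlab_to_posixct is hypothetical, and the rounding is an assumption that discards any sub-second precision.

```r
# Convert a Matlab datenum to POSIXct, rounded to the nearest second.
# 719529 is the datenum of 1970-01-01; 86400 is the number of seconds per day.
matlab_to_posixct <- function(val, tz = "UTC") {
  secs <- round((val - 719529) * 86400)
  as.POSIXct(secs, origin = "1970-01-01", tz = tz)
}

format(matlab_to_posixct(733038.6), "%Y-%m-%d %H:%M:%S")
# [1] "2006-12-27 14:24:00"
```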
I run discrete event simulations where the time originates from dates. I think that the simulations run much faster when I convert all the dates to integers (relative time in seconds).
What is the best way to switch between dates and seconds in a well-defined way, where I want to
set the reference time (e.g. "1970-01-01 00:00:00 GMT" or "2016-01-01 00:00:00 GMT") manually,
the time zone and
the origin (Not possible in lubridate?)
I thought I could use the origin argument for this purpose, but it does not influence the result:
> as.numeric(as.POSIXct("2016-01-01 00:00:00 GMT",origin="2016-01-01",tz="GMT"))
> as.numeric(as.POSIXct("2016-01-01 00:00:00 GMT",origin="1970-01-01",tz="GMT"))
both result in [1] 1451606400.
(Only the tz argument changes the result, which is ok of course:
> as.numeric(as.POSIXct("2016-01-01 00:00:00 CEST", tz= "America/Chicago"))
[1] 1451628000)
You can use difftime() to calculate the difference between some timestamp and a reference time:
as.numeric(difftime(as.POSIXct("2016-01-01 00:00:00",tz="GMT"),
as.POSIXct("1970-01-01 00:00:00",tz="GMT"), units = "secs"))
## [1] 1451606400
By choosing another value for units, you could also get the number of minutes, hours, etc.
The reason that you get the same result for both choices of origin is that this argument is only intended to be used when converting a number into a date. Then, the number is interpreted as seconds since the origin that you pass to the function.
Internally, a POSIXct object is always stored as seconds since 1970-01-01 00:00:00, UTC, independent of the origin that you specified when doing the conversion. And accordingly, converting to numeric gives the same result for any choice of origin.
You can have a look at the documentation of as.POSIXct(), which shares its help page with as.POSIXlt():
## S3 method for class 'character'
as.POSIXlt(x, tz = "", format, ...)
## S3 method for class 'numeric'
as.POSIXlt(x, tz = "", origin, ...)
As you can see, origin is only an argument for the method for numeric, but not for character.
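Putting both directions together, a pair of small helpers could look like the sketch below. The names to_secs and from_secs are made up for illustration, and the reference time is just one of the examples from the question.

```r
# Seconds elapsed from a chosen reference time to x
to_secs <- function(x, ref) as.numeric(difftime(x, ref, units = "secs"))

# Inverse: adding a number to a POSIXct value adds that many seconds
from_secs <- function(s, ref) ref + s

ref <- as.POSIXct("2016-01-01 00:00:00", tz = "GMT")
t1 <- as.POSIXct("2016-01-02 12:00:00", tz = "GMT")

s <- to_secs(t1, ref)      # 129600, i.e. 1.5 days
back <- from_secs(s, ref)  # the same instant as t1
```

Since the reference is itself a POSIXct value, the time zone is fixed once when ref is created, and the result of from_secs() inherits it.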
I am attempting to eliminate the leading zero of a 12-hour time value, but for graphing purposes the result must be a POSIXlt value. Therefore, I cannot use regular expressions, because they would leave the result as a character instead of a POSIXlt value.
My time value begins as a character.
a <- "02:57"
Then I use strptime to convert the character to the POSIXlt class. Within strptime, I use the conversion specification %l, which according to the strptime help, displays "12-hour clock time with single digits preceded by a blank".
b <- strptime(x = a, tz = "UTC", format = "%l")
The variable b is a POSIXlt value, and consists of "current date" + "02:57:00" + "local time zone". I can live with the date and time zone, but the leading zero of the 12-hour time value remains.
How can I eliminate the leading zero of the 12-hour time value and still retain POSIXlt class?
I appreciate any insight.
I would use lubridate, an excellent package for working with date-time objects. Note that hm() returns a Period object rather than a POSIXlt value.
library(lubridate)
a <- "02:57"
b <- hm(a)
yields
> b
[1] "2H 57M 0S"
no pesky 0 and sooo much other time goodness to boot. Good luck.
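If the POSIXlt class really has to be kept, another option is to leave the object alone and strip the zero only from the text used for display, since the leading zero is a matter of formatting rather than of the stored value. A minimal base R sketch, assuming the input is parsed with "%H:%M" (an assumption, not the format from the question):

```r
a <- "02:57"

# b stays a POSIXlt value and can still be used for plotting
b <- strptime(a, format = "%H:%M", tz = "UTC")

# Build a display label on a 12-hour clock and drop a single leading zero
label <- sub("^0", "", format(b, "%I:%M"))
label
# [1] "2:57"
```

The label is a character string, so it would be used for axis text or annotations while b supplies the actual positions.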