Why do I always get month=0 in as.period (lubridate package) in R?

Using as.period, I would like to get the exact difference in years, months and days. However, the month value is always zero and the months are folded into the day count, as follows:
library(lubridate)
as.period((dmy("01/06/1981")- dmy("30/07/1979")),units="year")
estimate only: convert difftimes to intervals for accuracy
[1] "1y 0m 306d 18H 0M 0S"
How can I get the corresponding month as well?
And how can I format the output so that it does not show hours, minutes and seconds?
Thanks

Create an interval, then coerce it to a period:
library(lubridate)
ii <- interval(dmy("01/05/1981"), dmy("30/07/1979"))  # new_interval() in older lubridate releases
as.period(ii, unit = "months")
[1] "-21m -1d 0H 0M 0S"
res <- as.period(ii, unit = "years")
res
[1] "-1y -9m -1d 0H 0M 0S"
EDIT
I don't think you can remove the hour and minute parts, but you can write your own print function:
paste0(c(res@year,res@month,res@day),c('y','m','d'),collapse=' ')
"-1y -9m -1d"

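The idea of printing only the year, month and day parts can be wrapped in a small helper. This is a minimal sketch, not part of lubridate: `format_ymd` is a hypothetical name, and the example assumes the interval runs from the earlier date to the later one, using the dates from the question.

```r
library(lubridate)

# Hypothetical helper: keep only the year, month and day slots of a Period
format_ymd <- function(p) {
  sprintf("%dy %dm %dd",
          as.integer(p@year), as.integer(p@month), as.integer(p@day))
}

# Interval from the earlier date to the later date, then coerced to a period
p <- as.period(interval(dmy("30/07/1979"), dmy("01/06/1981")))
format_ymd(p)  # "1y 10m 2d"
```

Because `sprintf` is vectorised, the same helper should also work on a vector of periods.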
Related

A question about difference between two dates in R [duplicate]

This question already has answers here:
Get the difference between dates in terms of weeks, months, quarters, and years
(9 answers)
How do I calculate age as years, months, and days in R?
(1 answer)
Closed 3 days ago.
I have two columns, START and END. I need to calculate the difference between the dates (in years, months and days) and then calculate the average of the differences, mean(START-END). Thanks for the help. This question is not the same as the other questions that have already been posted.
Data:
start end diff
"2020-5-16" "2029-7-15" 9 years, 2 months and etc
"2024-5-16" "2028-3-13"
"2023-4-17" "2025-4-12"
"2026-7-18" "2028-5-16"
star<-c("2020-5-16" , "2024-5-16" , "2023-4-17" , "2026-7-18")
end<-c( "2029-7-15", "2028-3-13","2025-4-12","2028-5-16")
data<-data.frame(star,end)
You can use lubridate::as.period:
library(lubridate)
star<-c("2020-5-16" , "2024-5-16" , "2023-4-17" , "2026-7-18")
end<-c( "2029-7-15", "2028-3-13","2025-4-12","2028-5-16")
as.period(interval(as.Date(star), as.Date(end)))
#[1] "9y 1m 29d 0H 0M 0S" "3y 9m 26d 0H 0M 0S" "1y 11m 26d 0H 0M 0S" "1y 9m 28d 0H 0M 0S"
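For the average of the differences, one possible approach is to work in days with base R, since a mean of calendar periods (years, months, days) is not well defined. This is a sketch under that assumption; the conversion to years assumes 365.25 days per year and is only approximate.

```r
star <- c("2020-5-16", "2024-5-16", "2023-4-17", "2026-7-18")
end  <- c("2029-7-15", "2028-3-13", "2025-4-12", "2028-5-16")

# Difference in days for each row (a difftime vector)
days_between <- as.Date(end) - as.Date(star)

# Average difference, still expressed in days
mean_days <- mean(days_between)
mean_days

# Rough conversion to years (assumption: 365.25 days per year)
as.numeric(mean_days, units = "days") / 365.25
```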

Converting Timestamp from chr to Time/Data-Type in R

I have started working with R and I have a small problem with timestamps.
I am cleaning my data for a case study and I don't want the timestamps to be classified as the chr data type.
$ time <chr> "00:00:00", "01:00:00", "02:00:00", "03:00:00", "04:00:00…
I want to convert them to an adequate data type.
I am first of all confused about the right data type for this information.
I saw people converting them into a date, which does not seem right to me?
I tried different strategies that I saw online but I was just not successful.
There must be an easy way to convert this data to another data type.
If someone could help me, I would be very thankful.
These could be represented as chron "times" objects (which are internally represented as a fraction of a day) or lubridate "Period" S4 objects. Externally they are rendered as shown.
ch <- c("00:00:00", "01:00:00", "02:00:00", "03:00:00")
library(chron)
times(ch)
## [1] 00:00:00 01:00:00 02:00:00 03:00:00
library(lubridate)
hms(ch)
## [1] "0S" "1H 0M 0S" "2H 0M 0S" "3H 0M 0S"
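If an extra package is not wanted, a base R "difftime" is another option for time-of-day values. The sketch below assumes the strings are always in hh:mm:ss form; `elapsed` is just an illustrative name.

```r
ch <- c("00:00:00", "01:00:00", "02:00:00", "03:00:00")

# Parse the strings as elapsed time since midnight, measured in hours
elapsed <- as.difftime(ch, format = "%H:%M:%S", units = "hours")

class(elapsed)       # "difftime"
as.numeric(elapsed)  # 0 1 2 3
```

A difftime stores a plain number plus a unit, so it can be used directly in arithmetic and comparisons.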

Time of day conversion into numeric and mutation in R

I am working in R. My data set has a column of times (hh:mm:ss). The column is listed as factors.
Ultimately, I would like to be able to get the diff() between the values in this column (that is, the difference between time1 and time2 in the same column). I can't do that with factors.
How would I convert the times to another form that will allow me to calculate the difference in times within the same column?
Any help would be really appreciated!!
Try this code:
library(lubridate)
library(dplyr)
id <- c(1,2,3,4)
time1 <- hms(factor(c("09:00:01","04:02:50","10:30:21","11:15:25")))
time2 <- hms(factor(c("00:00:01","01:02:00","09:30:11","14:15:25")))
df<-data.frame(id, time1, time2)
df %>%
group_by(id) %>%
summarize(t_diff = time1 - time2)
t_diff
1 9H 0M 0S
2 3H 0M 50S
3 1H 0M 10S
4 -3H 0M 0S
taking what you've suggested:
x <- factor(c("12:34:56", "12:35:45", "12:48:00"))
y <- factor(c("12:42:56", "13:22:41", "17:11:21"))
## convert to time
x <- strptime(x, format = "%H:%M:%S")
y <- strptime(y, format = "%H:%M:%S")
## now you want a vector of differences
difftime(y,x, units="hours")
Time differences in hours
[1] 0.1333333 0.7822222 4.3891667
Edit: I misread your question. You want differences within one column. Remember the fencepost problem: the output will have one fewer element than your column has rows. Try
diff(x)
Time differences in secs
[1] 49 735
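The within-column case, including the fencepost issue, can be sketched in base R as follows. The variable names are illustrative, and padding with NA is only one possible way to line the result up with the original rows.

```r
# A factor column of times, as in the question
times_col <- factor(c("12:34:56", "12:35:45", "12:48:00"))

# Convert factor -> character -> seconds since midnight
secs <- as.numeric(
  as.difftime(as.character(times_col), format = "%H:%M:%S", units = "secs")
)

# Differences between consecutive rows: one fewer element than the column
gaps <- diff(secs)
gaps  # 49 735

# Pad with NA so the result has the same length as the column
padded <- c(NA, gaps)
```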

How do I make a period or duration of a month in lubridate?

How do I make a period or duration of a month in lubridate?
minutes(2) ## period
dminutes(2) ## duration
# Does not work:
months(1) # is from base r
dmonths(1) # does not exist
What am I missing?
EDIT: I just noticed that even though months seems to be from base R when I check ?months... if I detach lubridate and then run months(1), package lubridate is loaded... what's up with that?
edit: as the commenters clarified, this doesn't return a duration object.
library(lubridate)
months(1)
# [1] "1m 0d 0H 0M 0S"
I think this is what you're looking for. You can find more details in the lubridate vignette: https://github.com/hadley/lubridate/blob/master/vignettes/lubridate.Rmd
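The period/duration distinction matters most at month ends. The sketch below uses an arbitrary example date: adding a one-month period follows the calendar, while adding a duration adds a fixed number of seconds.

```r
library(lubridate)

jan31 <- ymd("2021-01-31")

# Period arithmetic: 31 February does not exist, so the result is NA
plus_period <- jan31 + months(1)

# %m+% rolls back to the last valid day of the target month
plus_rolled <- jan31 %m+% months(1)  # "2021-02-28"

# Duration arithmetic: exactly 30 * 86400 seconds later
plus_days <- jan31 + ddays(30)       # "2021-03-02"
```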

Setting the POSIXct time zone in lubridate::interval()

I have a data set, rfm, which has a column named cohort, of class Date as shown below. I also have a vector of calendar dates, rather cleverly named dates, whose elements are also of class Date:
> class(rfm)
[1] "tbl_df" "data.frame"
> class(rfm$cohort)
[1] "Date"
> class(dates)
[1] "Date"
Both rfm$cohort and dates show the date of the first of the month over a certain time period, with the possibility that dates covers a more recent set of months. My problem is simple: I just want to see how many months there are between max(rfm$cohort) and max(dates).
The lubridate package makes this easy with the interval() function, but the arguments of that function must be POSIXct, not Date objects:
> as.period(interval(ymd(as.character(max(rfm$cohort))),ymd(as.character(max(dates)))), months)
[1] "1m 0d 0H 0M 0S"
But do I really need a chained call to ymd() and as.character()? Wouldn't as.POSIXct() suffice? Here's a try:
> as.period(interval(as.POSIXct(max(rfm$cohort), tz = 'GMT'),as.POSIXct(max(dates), tz = 'GMT')), months)
Error in while (any(start + est * per < end)) est[start + est * per < :
missing value where TRUE/FALSE needed
This didn't work. It seems that lubridate wants me to set the time zone for the interval, not separately for its ends. Like this:
> as.period(interval(as.POSIXct(max(rfm$cohort)),as.POSIXct(max(dates)), tz = 'GMT'), months)
[1] "1m 0d 0H 0M 0S"
I know that this is less typing, so I should just do what lubridate wants, but why wouldn't setting the time zone of interval ends separately also work? By my reading of it, ?interval suggests that the third argument of this function, tzone, should default to getting its value from the time zone of the first. Somehow as.POSIXct() won't reveal the time zone to interval(), even if it's explicitly set in the same call. What am I missing?
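Since both vectors hold first-of-month dates, the month count can also be computed without any time zone handling. This is a base R sketch, not an explanation of the lubridate error: `months_between` is a hypothetical helper, and the two dates are made-up stand-ins for max(rfm$cohort) and max(dates).

```r
# Hypothetical helper: whole calendar months from one Date to another
months_between <- function(from, to) {
  from_lt <- as.POSIXlt(from)
  to_lt   <- as.POSIXlt(to)
  12 * (to_lt$year - from_lt$year) + (to_lt$mon - from_lt$mon)
}

# Made-up example values standing in for max(rfm$cohort) and max(dates)
cohort_max <- as.Date("2014-01-01")
dates_max  <- as.Date("2014-02-01")

months_between(cohort_max, dates_max)  # 1
```

The helper ignores the day of the month, so it is only exact when both dates fall on the same day of their months, as they do here.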
