How do I subtract Date columns stored as character in R?

I want to add a column that is the result of subtracting Store_Entry_Time from Store_Exit_Time.
For example, the result for row 1 should be (2014-12-02 18:49:05.402863 - 2014-12-02 16:56:32.394052) = approximately 1 hour 53 minutes. (I want this result in hours only.)
I ran class(Store_Entry_Time) and it returns "character".
How do I compute the difference and put it into a new column called "Time Spent"?

You can use ymd_hms from lubridate to convert the columns to POSIXct and then use difftime to calculate the difference in time.
library(dplyr)
df <- df %>%
  mutate(across(c(Store_Entry_Time, Store_Exit_Time), lubridate::ymd_hms),
         Time_Spent = as.numeric(difftime(Store_Exit_Time,
                                          Store_Entry_Time, units = 'hours')))

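For a base R option, we can use as.POSIXct:
df$Time_Spent <- as.numeric(as.POSIXct(df$Store_Exit_Time) -
                            as.POSIXct(df$Store_Entry_Time), units = "hours")
The new column gives the difference in time, measured in hours. Passing units = "hours" to as.numeric() fixes the unit; without it, R chooses the units of the difference automatically (for example minutes for gaps under an hour, days for gaps over a day).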
Example:
Store_Exit_Time <- "2014-12-02 18:49:05.402863"
Store_Entry_Time <- "2014-12-02 16:56:32.394052"
Time_Spent <- as.numeric(as.POSIXct(Store_Exit_Time) - as.POSIXct(Store_Entry_Time), units = "hours")
Time_Spent
[1] 1.875836
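A minimal base R sketch of the same calculation on a whole data frame. The data frame df below is hypothetical (the first row uses the question's timestamps, the second row is made up), and UTC is an assumption so that daylight-saving shifts cannot affect the result. It also shows why fixing the units matters: subtracting POSIXct values returns a difftime whose units R picks automatically, so as.numeric() alone can silently return minutes for short visits.

```r
# Hypothetical sample data: timestamps stored as character, as in the question
df <- data.frame(
  Store_Entry_Time = c("2014-12-02 16:56:32.394052", "2014-12-02 17:00:00"),
  Store_Exit_Time  = c("2014-12-02 18:49:05.402863", "2014-12-02 17:30:00"),
  stringsAsFactors = FALSE
)

# Parse with fractional seconds (%OS); tz = "UTC" is an assumption to avoid DST shifts
fmt <- "%Y-%m-%d %H:%M:%OS"
entry <- as.POSIXct(df$Store_Entry_Time, format = fmt, tz = "UTC")
exit  <- as.POSIXct(df$Store_Exit_Time,  format = fmt, tz = "UTC")

# Explicit units: always hours, whatever the size of the gap
df$Time_Spent <- as.numeric(difftime(exit, entry, units = "hours"))

# Pitfall: for the 30-minute visit alone, automatic units pick minutes
short_visit <- exit[2] - entry[2]
as.numeric(short_visit)                   # 30 (minutes, not hours)
as.numeric(short_visit, units = "hours")  # 0.5
```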

Related

How to use cut function on dates

I have the following two dates:
dates <- c("2019-02-01", "2019-06-30")
I want to create the following bins from above two dates:
2019-05-31, 2019-04-30, 2019-03-31, 2019-02-28
I used the cut function along with seq:
dt <- as.Date(dates)
cut(seq(dt[1], dt[2], by = "month"), "month")
but this does not produce the correct results.
Could you please shed some light on the use of the cut function on dates?
We assume that what is wanted is all month-end dates strictly between the two dates in dates. In the question, dates[1] is the beginning of a month and dates[2] is the end of a month; we do not rely on that, although assuming it would allow some simplification. We produce descending series below, although in R ascending series are more usual.
The first approach below uses a monthly sequence and cut; the second uses a daily sequence.
No packages are used.
1) We define a first-of-the-month function, fom, which, given a Date or a character date, returns the Date of the first of that month using cut. Then we compute monthly dates between the first of the months of the two dates, convert those to month ends, and remove any dates that are not strictly between the dates in dates.
fom <- function(x) as.Date(cut(as.Date(x), "month"))
s <- seq(fom(dates[2]), fom(dates[1]), "-1 month")
ss <- fom(fom(s) + 32) - 1
ss[ss > dates[1] & ss < dates[2]]
## [1] "2019-05-31" "2019-04-30" "2019-03-31" "2019-02-28"
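To shed some light on what cut contributes in approach (1), here is a small sketch on a few made-up dates: cut(..., "month") labels each date with the first day of its month, which is why the question's attempt returns month starts rather than month ends. Jumping 32 days past a month start and cutting again lands on the first of the next month, and subtracting one day then gives the month end.

```r
# A few made-up dates spread over two months
d <- as.Date(c("2019-02-15", "2019-02-28", "2019-03-31"))

# cut(..., "month") returns a factor whose labels are the first day of
# each date's month, not the month end
month_start <- as.Date(cut(d, "month"))
month_start   # 2019-02-01 2019-02-01 2019-03-01

# Month end = first day of the following month minus one day
# (the same trick as fom(fom(s) + 32) - 1 above)
month_end <- as.Date(cut(month_start + 32, "month")) - 1
month_end     # 2019-02-28 2019-02-28 2019-03-31
```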
2) Another approach is to compute a daily sequence between the two elements of dates after converting them to Date class, and then keep only those days for which the next day falls in a different month and which lie strictly between the dates in dates. This does not use cut.
dt <- as.Date(dates)
s <- seq(dt[2], dt[1], "-1 day")
s[as.POSIXlt(s)$mon != as.POSIXlt(s+1)$mon & s > dt[1] & s < dt[2]]
## [1] "2019-05-31" "2019-04-30" "2019-03-31" "2019-02-28"
There is no need for cut here:
library(lubridate)
dates <- c("2019-02-01", "2019-06-30")
seq(min(ymd(dates)), max(ymd(dates)), by = "months") - 1
#> [1] "2019-01-31" "2019-02-28" "2019-03-31" "2019-04-30" "2019-05-31"
Created on 2021-11-25 by the reprex package (v2.0.1)
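seq() on Date objects also accepts by = "month", so the same first-of-month-minus-one idea can be sketched in base R without lubridate. The final filter is an addition: it drops 2019-01-31, which falls before the first date, so that only month ends strictly between the two dates remain, as the question asks. Like the lubridate version, this assumes dates[1] falls on the first of a month.

```r
dates <- c("2019-02-01", "2019-06-30")
dt <- as.Date(dates)

# First days of the months from dt[1] to dt[2], minus one day = month ends
ends <- seq(dt[1], dt[2], by = "month") - 1

# Keep only month ends strictly between the two dates
ends <- ends[ends > dt[1] & ends < dt[2]]

# Descending order, as listed in the question
rev(ends)  # 2019-05-31 2019-04-30 2019-03-31 2019-02-28
```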

How to calculate a time difference in R using a data frame

I have a large data frame with two POSIXct columns and need to calculate the length of each ride.
Dates are formatted as follows:
format: "2020-10-31 19:39:43"
Can I use the difftime function for this?
Thanks
Given that your data are already in POSIXct format, you can simply subtract the two date-times to get the difference; no additional functions are needed.
date1 <- as.POSIXct(strptime("2020-10-31 19:39:43", format = "%Y-%m-%d %H:%M:%OS"))
date2 <- as.POSIXct(strptime("2020-10-31 19:20:43", format = "%Y-%m-%d %H:%M:%OS"))
date1 - date2
Output: Time difference of 19 mins
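Applied to whole columns of a data frame, difftime() with an explicit units argument returns every ride length in the same unit, instead of letting R choose the units automatically. A sketch, where the data frame rides and its columns start_time and end_time are hypothetical names to be replaced with your own:

```r
# Hypothetical ride data with two POSIXct columns
rides <- data.frame(
  start_time = as.POSIXct(c("2020-10-31 19:20:43", "2020-10-31 18:00:00"), tz = "UTC"),
  end_time   = as.POSIXct(c("2020-10-31 19:39:43", "2020-10-31 19:30:00"), tz = "UTC")
)

# Ride length in minutes for every row; the unit is fixed, not chosen automatically
rides$ride_min <- as.numeric(difftime(rides$end_time, rides$start_time, units = "mins"))
rides$ride_min  # 19 90
```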
It depends on which output format you want.
For example, if you want the difference in months between two dates, you can use the interval function from the lubridate package:
library(lubridate)
interval(as.Date(df$date1), as.Date(df$date2)) %/% months(1)
The same works with years(1), weeks(1), days(1) and hours(1).

Splitting date/time data into constant time intervals

I already posted a similar question here:
time split to constant daily intervals and summarise the results
Now I'm trying a simpler version:
I have data containing a date/time variable (call it x) of class POSIXct in the following format: yyyy-mm-dd HH:MM:SS.
The date is not really of interest to me. What I'm trying to do is split my time data into constant time intervals.
To make it clear, let's start with a reproducible example. Using dput, my x variable looks like:
structure(c(1495608914, 1495642528, 1495642529, 1495607831, 1495641488, 1495643715), class = c("POSIXct", "POSIXt"), tzone="")
I've been able to split it into time intervals using: split(x, cut((x), "30 mins"))
However, this method starts the splitting from the minimum time value in x, whereas I want to split the data into constant time intervals.
Using the splitting method above, I get 20 groups starting at 06:37:00 at 30-minute intervals (and x is split across 3 of those 20 groups, with 2, 1 and 3 observations). Instead, I'm looking for an indicator of which fixed time interval each data point falls in:
x          v1  v2  ...  x.ind
06:37:11                14
06:55:14                14
15:58:08                32
...
where x.ind is 1 for 00:00:00-00:30:00, 2 for 00:30:00-01:00:00, ..., 14 for 06:30:00-07:00:00, ..., and 48 for 23:30:00-00:00:00.
A solution using dplyr for the join and grouping, and lubridate's floor_date for rounding the times:
library(dplyr)
library(lubridate)
observations <- data.frame(period = floor_date(x, unit = "30 minutes"), n = rep(1, length(x)))
# all 30-minute periods between the first and the last observation
intervals <- data.frame(period = seq.POSIXt(min(observations$period), max(observations$period), by = 30*60))
result <- intervals %>%
  full_join(observations, by = "period") %>%
  group_by(period) %>%
  summarize(n = sum(n, na.rm = TRUE))
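The answer above counts observations per 30-minute period but does not build the x.ind column from the question. A base R sketch, assuming the times should be read in UTC (the times listed in the question, such as 06:37:11, match the dput values in UTC): the slot number is twice the hour, plus one if the minute is 30 or later, plus one.

```r
# The question's data (dput), with tzone = "UTC" as an assumption
x <- structure(c(1495608914, 1495642528, 1495642529, 1495607831,
                 1495641488, 1495643715),
               class = c("POSIXct", "POSIXt"), tzone = "UTC")

# Half-hour-of-day index: 1 = 00:00-00:30, 2 = 00:30-01:00, ..., 48 = 23:30-00:00
lt <- as.POSIXlt(x, tz = "UTC")
x.ind <- lt$hour * 2 + lt$min %/% 30 + 1

data.frame(x = format(x, "%H:%M:%S"), x.ind = x.ind)

# Split into the fixed daily slots (empty slots kept, so all 48 appear)
groups <- split(x, factor(x.ind, levels = 1:48))
```

Because the index depends only on the clock time, observations from different days that share a half-hour slot end up in the same group.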

Extracting "yearmon" from character format and calculating age in R

I have two columns of data:
DoB: yyyy/mm
Reported date: yyyy/mm/dd
Both are in character format.
I'd like to calculate an age, by subtracting DoB from Reported Date, without adding a fictional day to the DoB, so that the age comes out as 28.5 (meaning 28 and a half years old).
Please can someone help me with the coding, I'm struggling!
Many thanks from an R newbie.
library(lubridate)
a <- "2010/02"
b <- "2014/12/25"
age_days <- ymd(b) - ymd(paste0(a, "/01")) # I don't think this can be done without adding a fictional day
age_years <- as.numeric(age_days, units = "days") / 365.25
What would you want the age to be if the dates are:
DoB: 2015/01
Reported date: 2015/01/30
As suggested, lubridate is a great package for working with dates. You probably want some version using difftime. You can still use ymd for the yyyy/mm dates by setting truncated = 1, which allows one trailing field (here the day) to be missing; ymd then uses the first day of the month.
df <- data.frame(DoB = c("1987/08", "1994/04"),
                 Report_Date = c("2015/03/05", "2014/07/04"))
library(lubridate)
df$age_years <- with(df,
  as.numeric(difftime(ymd(Report_Date),
                      ymd(DoB, truncated = 1),
                      units = "days")) / 365.25)
df
DoB Report_Date age_years
1 1987/08 2015/03/05 27.59206023
2 1994/04 2014/07/04 20.25735797
Unfortunately difftime doesn't have a 'years' unit, so you also need to divide the output in days by 365.25.
Use the "yearmon" class in zoo. It represents time as years + fraction (where fraction is in the set 0, 1/12, ..., 11/12) and so does not require that fictitious days be added:
library(zoo)
as.yearmon("2012/01/10", "%Y/%m/%d") - as.yearmon("1983/07", "%Y/%m")
giving:
[1] 28.5
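Since a yearmon value is simply year + (month - 1)/12, the same arithmetic can be sketched in base R when zoo is not available. The helper ym_num below is hypothetical (not part of zoo), and, like yearmon, it ignores the day of the reported date, so ages are counted in whole months.

```r
# Hypothetical helper: "yyyy/mm" or "yyyy/mm/dd" string -> year + (month - 1) / 12
ym_num <- function(s) {
  parts <- strsplit(s, "/", fixed = TRUE)
  year  <- as.numeric(vapply(parts, `[`, character(1), 1))
  month <- as.numeric(vapply(parts, `[`, character(1), 2))
  year + (month - 1) / 12
}

df <- data.frame(DoB = c("1983/07", "1987/08"),
                 Report_Date = c("2012/01/10", "2015/03/05"),
                 stringsAsFactors = FALSE)

# Age in years, counted in whole months (day of month ignored)
df$age_years <- ym_num(df$Report_Date) - ym_num(df$DoB)
df$age_years  # 28.5 and about 27.58
```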

How do I subset every day except the last five days of zoo data?

I am trying to extract all dates except for the last five days from a zoo dataset into a single object.
This question is somewhat related to How do I subset the last week for every month of a zoo object in R?
You can reproduce the dataset with this code:
library(zoo)
set.seed(123)
price <- rnorm(365)
data <- cbind(seq(as.Date("2013-01-01"), by = "day", length.out = 365), price)
zoodata <- zoo(data[,2], as.Date(data[,1], origin = "1970-01-01"))
For my output, I'm hoping to get a combined dataset of everything except the last five days of each month. For example, if there are 20 days in the first month's data and 19 days in the second month's, I only want to subset the first 15 and 14 days of data respectively.
I tried using the head() and first() functions to extract the first three weeks, but since the number of days differs from month to month (and in leap years), that's not ideal.
Thank you.
Here are a few approaches:
1) as.Date. Let tt be the dates. We compute a Date vector, the same length as tt, holding the last date of the corresponding month, and then pick out those dates that are at least 5 days before it:
tt <- time(zoodata)
last.date.of.month <- as.Date(as.yearmon(tt), frac = 1)
zoodata[ last.date.of.month - tt >= 5 ]
2) tapply/head. For each month, use tapply to apply head(x, -5) to the data, and then concatenate the reduced months back together:
do.call("c", tapply(zoodata, as.yearmon(time(zoodata)), head, -5))
3) ave Define revseq which given a vector or zoo object returns sequence numbers in reverse order so that the last element corresponds to 1. Then use ave to create a vector ix the same length as zoodata which assigns such reverse sequence numbers to the days of each month. Thus the ix value for the last day of the month will be 1, for the second last day 2, etc. Finally subset zoodata to those elements corresponding to sequence numbers greater than 5:
revseq <- function(x) rev(seq_along(x))
ix <- ave(seq_along(zoodata), as.yearmon(time(zoodata)), FUN = revseq)
z <- zoodata[ ix > 5 ]
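If the series sits in a plain data frame rather than a zoo object, approach (3) carries over to base R: ave() numbers the rows of each month in reverse, so rows numbered 1 to 5 are the last five observations of their month. A sketch, assuming a hypothetical data frame prices with a Date column date and one row per day:

```r
set.seed(123)
prices <- data.frame(
  date  = seq(as.Date("2013-01-01"), by = "day", length.out = 365),
  price = rnorm(365)
)

# Month key such as "2013-01"
month_key <- format(prices$date, "%Y-%m")

# Reverse sequence number within each month: last day = 1, second last = 2, ...
rev_ix <- ave(seq_len(nrow(prices)), month_key,
              FUN = function(i) rev(seq_along(i)))

# Drop the last five observations of every month
trimmed <- prices[rev_ix > 5, ]
```

As with revseq, this drops the last five observations of each month, which equals the last five calendar days only when no days are missing.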
Exactly the same way as in the answer to your other question:
Split the dataset by month and remove the last 5 days; you just need to add a "-" to the period:
library(xts)
xts.data <- as.xts(zoodata)
lapply(split(xts.data, "months"), last, "-5 days")
In the same way, to get the result as one single object:
do.call(rbind, lapply(split(xts.data, "months"), last, "-5 days"))
