Calculating time between different rows - r

I am having issues coming up with a solution to calculate the difference in time between two dates that are not in the same row. For instance I have the following data:
dates_edited end_fast_c start_fast_c
1 4/1/21 2021-04-01 12:00:00 2021-04-01 21:30:00
2 4/2/21 2021-04-02 12:30:00 2021-04-02 23:30:00
I was using mutate(hours_fasted = difftime(start_fast_c,end_fast_c))
That only calculates the difference within the same row. Is there a way to calculate between rows 2 and 1, so that I get the time between 2021-04-01 21:30 and 2021-04-02 12:30?

You could use difftime, as already suggested by camille:
If your datetimes are not yet in dttm format, you can convert them with the ymd_hms function from the lubridate package.
The window function lag from the dplyr package returns the value from the previous row, so you can calculate the difference between the current row and the row above it.
With the units argument you can get the difference in minutes (units = "mins"), in hours (units = "hours"), and so on.
library(lubridate)
library(dplyr)
df %>%
  mutate(across(ends_with("_c"), ymd_hms)) %>%
  mutate(time_diff_min = difftime(end_fast_c, lag(start_fast_c), units = "mins"))
dates_edited end_fast_c start_fast_c time_diff_min
<chr> <dttm> <dttm> <drtn>
1 4/1/21 2021-04-01 12:00:00 2021-04-01 21:30:00 NA mins
2 4/2/21 2021-04-02 12:30:00 2021-04-02 23:30:00 900 mins
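For anyone who prefers to avoid extra packages, the same previous-row comparison can be sketched in base R. This is only an illustration: the data frame is rebuilt from the two rows in the question, the UTC time zone and the helper name prev_start are assumptions, and the result is in hours rather than minutes.

```r
# Sample data rebuilt from the question; tz = "UTC" is an assumption.
df <- data.frame(
  dates_edited = c("4/1/21", "4/2/21"),
  end_fast_c   = as.POSIXct(c("2021-04-01 12:00:00", "2021-04-02 12:30:00"), tz = "UTC"),
  start_fast_c = as.POSIXct(c("2021-04-01 21:30:00", "2021-04-02 23:30:00"), tz = "UTC")
)

# Shift start_fast_c down by one row; the first row has no predecessor, so it becomes NA.
prev_start <- df$start_fast_c[c(NA, seq_len(nrow(df) - 1))]

# Hours between the previous row's start_fast_c and this row's end_fast_c.
df$hours_fasted <- as.numeric(difftime(df$end_fast_c, prev_start, units = "hours"))
df$hours_fasted
# NA 15
```

Indexing with a leading NA does the same job as dplyr's lag here and keeps the POSIXct class intact.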

You can generate a sheet in a spreadsheet and populate it with formulas and data, then retrieve results programmatically if a sheet is not your preferred output.
In Google Sheets (and maybe others), you can label a calculated cell as 'duration' using the '123' dropdown menu.
I entered your four time values in A1, A2 and B1 B2, changing their display formats to make sure they were entered correctly.
Then in cell C1 I entered the formula B1-A2 and got 9:00:00
Cell C1 can, of course, be formatted differently and/or used as the source of further formulas.

Related

How to import date from CSV in dd/mm/yyyy hh:mm format in to general number string in R

Hope you can help. I have a massive CSV file that I need to import into R, manipulate, and export to Excel. All the other data imports and manipulates fine apart from the dates. The CSV is supplied (and can't be changed) with all dates as dd/mm/yyyy hh:mm, and I need a way to strip them down to dd/mm/yyyy (or dd/mm/yy). All the methods I have tried so far have either altered the date to mm/dd/yyyy or given me multiple errors.
The only workaround I have found is to convert the data in the CSV to General format before importing it; however, the "live" CSVs are too big to open and convert.
Any help would be great
One potential solution could be to use lubridate to parse the text strings of the date/time columns after import. From this you can extract the date and time (using date() and hms::as_hms()):
library(readr)
library(dplyr)
library(lubridate)
read_csv("Date_time\n
01/09/2021 19:30\n
19/12/2020 12:45\n
16/03/2019 00:15") %>%
  mutate(Date_time = dmy_hm(Date_time),
         Date = date(Date_time),
         Time = hms::as_hms(Date_time))
#> # A tibble: 3 x 3
#> Date_time Date Time
#> <dttm> <date> <time>
#> 1 2021-09-01 19:30:00 2021-09-01 19:30
#> 2 2020-12-19 12:45:00 2020-12-19 12:45
#> 3 2019-03-16 00:15:00 2019-03-16 00:15
This at least gives you tidy and workable data imported into R, able to be formatted for printing. Does this reach your solution? If it's not working on your data then perhaps post a small sample (or representative sample) as an example to try and get working.
Created on 2021-12-07 by the reprex package (v2.0.1)
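If the end goal is a dd/mm/yyyy (or dd/mm/yy) text column for export, one possible follow-up is to parse with an explicit day-first format and then format the result back to text. This is a base R sketch on the same three sample strings; the UTC time zone and the variable names are assumptions.

```r
# Sample strings in dd/mm/yyyy hh:mm, as in the example above.
raw <- c("01/09/2021 19:30", "19/12/2020 12:45", "16/03/2019 00:15")

# Parse with an explicit day-first format so nothing is read as mm/dd/yyyy.
# tz = "UTC" is an assumption; use whatever zone the data was recorded in.
parsed <- as.POSIXct(raw, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Format back to character strings without the time part.
date_long  <- format(parsed, "%d/%m/%Y")
date_short <- format(parsed, "%d/%m/%y")

date_long
# "01/09/2021" "19/12/2020" "16/03/2019"
date_short
# "01/09/21" "19/12/20" "16/03/19"
```

The formatted columns are plain character vectors, so they are written out exactly as shown rather than being reinterpreted as dates.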

How to convert time to standard format and calculate time difference

newdf=data.frame(date=as.Date(c("2021-01-04","2021-01-05","2021-01-06","2021-01-07")),
time=c("10:32:29","11:25","12:18:42","09:58"))
This is my data frame. I want to calculate time difference between two consecutive days in hours. Could you please suggest a method to calculate? Note, some time values do not contain seconds. So, first we have to convert it to standard form. Could you please give me a method to solve all these problems. This is completely R programming.
Paste date and time together into one column, use parse_date_time to convert the values to a standard date-time format (POSIXct), and use difftime to calculate the difference between consecutive times in hours.
library(dplyr)
library(tidyr)
library(lubridate)
newdf %>%
  unite(datetime, date, time, sep = ' ') %>%
  # units must be passed by name: the third positional argument of difftime is tz
  mutate(datetime = parse_date_time(datetime, c('Ymd HMS', 'Ymd HM')),
         difference_in_hours = round(as.numeric(difftime(datetime,
                                     lag(datetime), units = 'hours')), 2))
# datetime difference_in_hours
#1 2021-01-04 10:32:29 NA
#2 2021-01-05 11:25:00 24.88
#3 2021-01-06 12:18:42 24.90
#4 2021-01-07 09:58:00 21.66
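The same steps can also be sketched in base R, for comparison. This version pads the times that lack seconds instead of using parse_date_time; the UTC time zone and the names time_full, datetime and diff_hours are assumptions made for the illustration.

```r
newdf <- data.frame(
  date = as.Date(c("2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07")),
  time = c("10:32:29", "11:25", "12:18:42", "09:58")
)

# Pad "HH:MM" values with ":00" so every time has the form HH:MM:SS.
time_full <- ifelse(nchar(newdf$time) == 5, paste0(newdf$time, ":00"), newdf$time)

# Combine date and time into one POSIXct column (tz = "UTC" is an assumption).
datetime <- as.POSIXct(paste(newdf$date, time_full), tz = "UTC")

# Difference between each timestamp and the one before it, in hours.
diff_hours <- as.numeric(difftime(datetime[-1], datetime[-length(datetime)],
                                  units = "hours"))
round(diff_hours, 2)
```

The result has one element fewer than the data frame, because the first row has no earlier timestamp to compare against.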

Issue merging dataframes in R using POSIXct

I have two dataframes (per_frame, values) - The first contains POSIXct values for a 24 hour period at 15 minute intervals.
periods = as.POSIXct(seq.POSIXt("2019-06-01 04:00:00 UTC","2019-06-02 03:45:00 UTC", by=900))
per_frame = data.frame(Period = periods)
The second contains a column for some of the time values above (but not all) and another for 'average value'.
Period avg_value
2019-06-01 04:45:00 4
2019-06-01 05:00:00 7
2019-06-01 05:45:00 9
2019-06-01 08:45:00 2
2019-06-01 10:00:00 4
I want to create a new dataframe that adds the average values where available to the first dataframe, leaving 'missing values' where there aren't any. I thought this could be achieved easily using the below:
Combined= merge(per_frame, values, by = "Period", all.x = TRUE)
However, the new dataframe it creates has incorrect values for each Period. It is adding values to some time periods that don't have a corresponding average value in the values dataframe. I'm not sure what I'm doing incorrectly here.
Apologies - I realised after some investigation that the time zones used in the two databases were different, hence the mismatch when merging. I'm not actually sure why this happened, as I'm using the same data import to generate both the start and end values for the first dataframe and the values for the second. I was able to override it, though, using the 'tz' argument of the as.POSIXct function.
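A small sketch of how such a mismatch can arise, and how setting tz explicitly avoids it. The sample values are taken from the table in the question; the second zone ("America/New_York") and the names values_shifted, values_utc, shifted and aligned are assumptions chosen purely for illustration, since the actual zones involved are not stated.

```r
# Reference grid: one row per 15 minutes, explicitly in UTC.
periods <- seq(as.POSIXct("2019-06-01 04:00:00", tz = "UTC"),
               as.POSIXct("2019-06-02 03:45:00", tz = "UTC"),
               by = "15 min")
per_frame <- data.frame(Period = periods)

stamps <- c("2019-06-01 04:45:00", "2019-06-01 05:00:00", "2019-06-01 05:45:00",
            "2019-06-01 08:45:00", "2019-06-01 10:00:00")
avg <- c(4, 7, 9, 2, 4)

# The same text parsed in two different zones (the second zone is hypothetical).
values_shifted <- data.frame(Period = as.POSIXct(stamps, tz = "America/New_York"),
                             avg_value = avg)
values_utc     <- data.frame(Period = as.POSIXct(stamps, tz = "UTC"),
                             avg_value = avg)

shifted <- merge(per_frame, values_shifted, by = "Period", all.x = TRUE)
aligned <- merge(per_frame, values_utc,     by = "Period", all.x = TRUE)

# Where did the five values land, expressed in UTC clock time?
format(shifted$Period[!is.na(shifted$avg_value)], "%H:%M", tz = "UTC")
format(aligned$Period[!is.na(aligned$avg_value)], "%H:%M", tz = "UTC")
```

In the shifted merge the values attach to periods several hours away from the intended ones, which matches the symptom described in the question; passing the same tz on both sides lines them up.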

difftime for multiple dates in r

I have chemistry water data taken from a river. Normally, the sample dates were on a Wednesday every two weeks. The data record starts in 1987 and ends in 2013.
Now, I want to re-check if there are any inconsistencies within the data, that is if the samples are really taken every 14 days. For that task I want to use the r function difftime. But I have no idea on how to do that for multiple dates.
Here is some data:
Date Value
1987-04-16 12:00:00 1,5
1987-04-30 12:00:00 1,2
1987-06-25 12:00:00 1,7
1987-07-14 12:00:00 1,3
Can you tell me on how to use the function difftime properly in that case or any other function that does the job. The result should be the number of days between the samplings and/or a true and false for the 14 days.
Thanks to you guys in advance. Any google-fu was to no avail!
Assuming your data.frame is named dd, you'll want to verify that the Date column is being treated as a date. Most of the time R will read such values as character, which may get converted to a factor in a data.frame. If class(dd$Date) is "character" or "factor", run
dd$Date <- as.POSIXct(as.character(dd$Date), format="%Y-%m-%d %H:%M:%S")
Then you can do a simple diff() to get the time difference in days
diff(dd$Date)
# Time differences in days
# [1] 14 56 19
# attr(,"tzone")
# [1] ""
so you can check which ones are over 14 days.
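To get both the number of days and a TRUE/FALSE flag in one go, the diff() result can be converted to numeric days and compared with 14. This is a sketch on the four sample rows; the UTC time zone, the decimal points in Value (the question shows decimal commas) and the column names gap_days and on_schedule are assumptions.

```r
dd <- data.frame(
  Date  = as.POSIXct(c("1987-04-16 12:00:00", "1987-04-30 12:00:00",
                       "1987-06-25 12:00:00", "1987-07-14 12:00:00"), tz = "UTC"),
  Value = c(1.5, 1.2, 1.7, 1.3)
)

# Ask for days explicitly so the result does not depend on diff()'s automatic unit choice.
gap_days <- as.numeric(diff(dd$Date), units = "days")

# The first sample has no predecessor, so its gap is NA.
dd$gap_days    <- c(NA, gap_days)
dd$on_schedule <- dd$gap_days == 14

dd[, c("Date", "gap_days", "on_schedule")]
```

Rows where on_schedule is FALSE are the samples that were not taken exactly 14 days after the previous one.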

Binning time series in R?

I'm new to R. My data has 600k objects defined by three attributes: Id, Date and TimeOfCall.
TimeOfCall has a 00:00:00 format and ranges from 00:00:00 to 23:59:59.
I want to bin the TimeOfCall attribute, into 24 bins, each one representing hourly slot (first bin 00:00:00 to 00:59:59 and so on).
Can someone talk me through how to do this? I tried using cut() but apparently my format is not numeric. Thanks in advance!
While you could convert to a formal time representation, in this case it might be easier to just use substr:
test <- c("00:00:01","02:07:01","22:30:15")
as.numeric(substr(test,1,2))
#[1] 0 2 22
Using a POSIXct time to deal with it would also work, and might be handy if you plan on further calculations (differences in time etc):
testtime <- as.POSIXct(test,format="%H:%M:%S")
#[1] "2013-12-09 00:00:01 EST" "2013-12-09 02:07:01 EST" "2013-12-09 22:30:15 EST"
as.numeric(format(testtime,"%H"))
#[1] 0 2 22
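Once the hour is numeric, cut() works as the question originally intended. A minimal sketch, assuming the times are always zero-padded HH:MM:SS strings; the label format and the names calls, hour, labels and bin are illustrative choices.

```r
calls <- c("00:00:01", "02:07:01", "22:30:15")

# Extract the hour as an integer (0 to 23).
hour <- as.integer(substr(calls, 1, 2))

# 24 hourly bins: [0,1), [1,2), ..., [23,24), each labelled with its time range.
labels <- sprintf("%02d:00:00-%02d:59:59", 0:23, 0:23)
bin <- cut(hour, breaks = 0:24, right = FALSE, labels = labels)

as.character(bin)
# "00:00:00-00:59:59" "02:00:00-02:59:59" "22:00:00-22:59:59"

# Number of calls per hourly slot, including empty slots.
table(bin)
```

Setting right = FALSE makes each bin include its lower bound, so hour 0 falls in the first bin rather than being dropped.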
You can use the cut.POSIXt method, but you first need to coerce your data to a valid time object. Here I am using the handy hms function from lubridate, and strftime to get the time format.
library(lubridate)
x <- c("09:10:01", "08:10:02", "08:20:02","06:10:03 ", "Collided at 9:20:04 pm")
x.h <- strftime(cut(as.POSIXct(hms(x),origin=Sys.Date()),'hours'),
format='%H:%M:%S')
data.frame(x,x.h)
x x.h
1 09:10:01 10:00:00
2 08:10:02 09:00:00
3 08:20:02 09:00:00
4 06:10:03 07:00:00
5 Collided at 9:20:04 pm 22:00:00

Resources