How to get hours and minutes by subtracting two dates with R

I have two columns in R, both holding dates in the following format:
1/2/2015 3:00:00 PM
I need to create a new column, 'Hours', holding the number of hours that have passed between the two dates. I've tried this, but it gives me the difference in days:
col1 <- df$collection.when
col2 <- df$discardable_when
col3 <- as.Date(col1) - as.Date(col2)
head(col3)
# Time differences in days
# [1] -393 NA NA NA -485 NA
EDIT:
It seems that @HubertL's answer is the correct solution; however, for an unknown reason I cannot get a complete data.frame. Here's a screenshot that shows my workflow. Any help is greatly appreciated.

Instead of as.Date, use as.POSIXct, which keeps the time information.
You can also use difftime to choose the units of the output.
time1 = as.POSIXct("2015-01-01 01:00:00")
time2 = as.POSIXct("2015-03-02 05:00:00")  # 24-hour clock; an AM/PM suffix needs an explicit format with %I and %p
difftime(time2, time1, units = "mins")
Time difference of 86640 mins

Date objects do not store a time of day, so as.Date drops the time part.
You could use POSIXct instead:
a <- as.POSIXct(strptime("1/2/2015 2:00:00 PM", "%m/%d/%Y %I:%M:%S %p"), tz="")
b <- as.POSIXct(strptime("1/2/2015 5:00:00 PM", "%m/%d/%Y %I:%M:%S %p"), tz="")
elapsed <- b - a
units(elapsed) <- "hours"
as.numeric(elapsed)
[1] 3
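Applied to whole columns, the same idea might look like the sketch below. The sample data frame is made up for illustration; only the column names collection.when and discardable_when come from the question. tz = "UTC" is an assumption, chosen so that daylight-saving shifts do not affect the result, and %p relies on the locale's AM/PM markers.

```r
# Hypothetical sample data in the question's format (month/day/year, 12-hour clock)
df <- data.frame(
  collection.when  = c("1/2/2015 3:00:00 PM", "1/3/2015 9:30:00 AM"),
  discardable_when = c("1/2/2015 5:45:00 PM", "1/4/2015 9:30:00 AM"),
  stringsAsFactors = FALSE
)

fmt <- "%m/%d/%Y %I:%M:%S %p"   # %I = 12-hour clock, %p = AM/PM marker
start <- as.POSIXct(df$collection.when,  format = fmt, tz = "UTC")
end   <- as.POSIXct(df$discardable_when, format = fmt, tz = "UTC")

# Elapsed time as a plain number of hours (fractions included)
df$Hours <- as.numeric(difftime(end, start, units = "hours"))

# Split the same interval into whole hours and leftover minutes
total_mins <- as.numeric(difftime(end, start, units = "mins"))
df$whole_hours <- total_mins %/% 60
df$minutes     <- total_mins %% 60
```

Values that do not match the format string come back as NA, so it is worth checking anyNA(start) and anyNA(end) before trusting the new columns.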

Related

Converting character to dates with hours and minutes

I'm having trouble converting character values into date (hour + minutes), I have the following codes:
start <- c("2022-01-10 9:35PM","2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM","2022-01-11 8:00AM")
dat <- data.frame(start,end)
These are all in character form. I would like to:
Convert all the datetimes into date format and into 24hr format like: "2022-01-10 9:35PM" into "2022-01-10 21:35",
and "2022-01-11 7:00AM" into "2022-01-11 7:00" because I would like to calculate the difference between the dates in hrs.
Also I would like to add an ID column with a specific ID, the desired data would like this:
ID <- c(101,101)
start <- c("2022-01-10 21:35","2022-01-10 22:35")
end <- c("2022-01-11 7:00","2022-01-11 8:00")
diff <- c(9,10) # I'm not sure how the calculations would turn out to be
dat <- data.frame(ID,start,end,diff)
I would appreciate all the help there is! Thanks!!!
You can use lubridate::ymd_hm. Don't use floor if you want the exact value.
library(dplyr)
library(lubridate)
dat %>%
  mutate(ID = 101,
         across(c(start, end), ymd_hm),
         diff = floor(end - start))
#                 start                 end  ID    diff
# 1 2022-01-10 21:35:00 2022-01-11 07:00:00 101 9 hours
# 2 2022-01-10 22:35:00 2022-01-11 08:00:00 101 9 hours
The base R approach with strptime needs %I (12-hour clock) rather than %H, otherwise %p (AM/PM) is ignored:
strptime(dat$start, "%Y-%m-%d %I:%M%p")
[1] "2022-01-10 21:35:00 CET" "2022-01-10 22:35:00 CET"
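A fuller base R sketch of the whole task could look like this. It assumes the strings always use a 12-hour clock with an AM/PM suffix, and tz = "UTC" is my own choice to keep the arithmetic free of daylight-saving effects; the helper names fmt, start_time and end_time are not from the question.

```r
dat <- data.frame(
  start = c("2022-01-10 9:35PM", "2022-01-10 10:35PM"),
  end   = c("2022-01-11 7:00AM", "2022-01-11 8:00AM"),
  stringsAsFactors = FALSE
)

fmt <- "%Y-%m-%d %I:%M%p"   # %I pairs with %p; %H would ignore AM/PM
start_time <- as.POSIXct(dat$start, format = fmt, tz = "UTC")
end_time   <- as.POSIXct(dat$end,   format = fmt, tz = "UTC")

dat$ID    <- 101
# Exact difference in hours, as a number (no rounding)
dat$diff  <- as.numeric(difftime(end_time, start_time, units = "hours"))
# Store the timestamps back as 24-hour strings
dat$start <- format(start_time, "%Y-%m-%d %H:%M")
dat$end   <- format(end_time,   "%Y-%m-%d %H:%M")
```

Both rows span 9 hours 25 minutes, so diff comes out a little above 9; wrap it in floor() or round() if whole hours are wanted.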

R: Date Operation Results in Empty Dates

I am using the R programming language. I am trying to take the difference between two date columns. Both dates are in the following format : 2010-01-01 12:01
When I bring my file into R, the dates are in "Factor" format. Here is my attempt to recreate the file in R:
# what my file looks like when I import it into R
date_1 = c("2010-01-01 13:01 ", "2010-01-01 14:01" )
date_2 = c("2010-01-01 15:01 ", "2010-01-01 16:01" )
file = data.frame(date_1, date_2)
file$date_1 = as.factor(file$date_1)
file$date_2 = as.factor(file$date_2)
Now, I am trying to create a new column which takes the difference between these dates (in minutes)
I first tried to convert both date variables into the appropriate "Date" formats:
#convert to date formats:
file$date_a = as.POSIXlt(file$date_1,format="%Y-%m-%dT%H:%M")
file$date_b = as.POSIXlt(file$date_2,format="%Y-%m-%dT%H:%M")
Then, I tried to take the difference :
file$diff = difftime(file$date_a, file$date_b, units="mins")
But this results in "NA's":
> file
date_1 date_2 date_a date_b diff
1 2010-01-01 13:01 2010-01-01 13:01 <NA> <NA> NA mins
2 2010-01-01 13:01 2010-01-01 13:01 <NA> <NA> NA mins
Can someone please show me what I am doing wrong?
Thanks
Reference: How to get difference (in minutes) between two date strings?
There is no T in the string, so the format needs a space between the date and the time:
difftime(as.POSIXct(file$date_1, format = '%Y-%m-%d %H:%M'),
         as.POSIXct(file$date_2, format = '%Y-%m-%d %H:%M'), units = 'mins')
#Time differences in mins
#[1] -120 -120
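A quick way to see the cause is to parse a single value with both format strings. This is only a sketch: the variable names are made up, tz = "UTC" is an assumption, and the factor is converted with as.character() explicitly to keep the steps visible.

```r
x <- factor("2010-01-01 13:01 ")   # same shape as the question's data

# Format with a literal "T" between date and time: no match, so NA
with_T <- as.POSIXct(as.character(x), format = "%Y-%m-%dT%H:%M", tz = "UTC")

# Format with a space between date and time: parses
with_space <- as.POSIXct(as.character(x), format = "%Y-%m-%d %H:%M", tz = "UTC")

is.na(with_T)       # TRUE
is.na(with_space)   # FALSE
```

A format string that does not match produces NA silently rather than an error, which is why the difference column was NA as well.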

Change the format of date into default format and add weekday column to the respective timestamps

I have a column called trip_start_timestamp with values like "01/23/2020 03:00:00 PM"; its data type is factor. I would like the values to look like "2020/01/23 15:00:00", and I also want the weekday for each value, e.g. "thursday".
You can use mdy_hms from lubridate to get data into POSIXct and use weekdays to get day of the week.
library(dplyr)
library(lubridate)
df %>%
  mutate(trip_start_timestamp = mdy_hms(trip_start_timestamp),
         weekday = weekdays(trip_start_timestamp))
# trip_start_timestamp weekday
#1 2020-01-23 15:00:00 Thursday
#2 2020-01-25 01:00:00 Saturday
In base R :
df$trip_start_timestamp <- as.POSIXct(df$trip_start_timestamp,
                                      format = '%m/%d/%Y %I:%M:%S %p', tz = 'UTC')
df$weekday <- weekdays(df$trip_start_timestamp)
df
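If the goal is the exact text layout from the question ("2020/01/23 15:00:00" and a lower-case weekday), the parsed values can be formatted back to character. The sketch below assumes the same two sample values used in this answer; the intermediate name parsed is my own, and weekday names depend on the session locale.

```r
df <- data.frame(
  trip_start_timestamp = factor(c("01/23/2020 03:00:00 PM",
                                  "01/25/2020 01:00:00 AM"))
)

# Parse the 12-hour timestamps into POSIXct first
parsed <- as.POSIXct(as.character(df$trip_start_timestamp),
                     format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Then write them back as text in the requested layout
df$trip_start_timestamp <- format(parsed, "%Y/%m/%d %H:%M:%S")
df$weekday <- tolower(weekdays(parsed))
```

Note that after format() the column is character again, so keep the POSIXct version around if further date arithmetic is needed.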
Data:
df <- data.frame(trip_start_timestamp = factor(c("01/23/2020 03:00:00 PM",
                                                 "01/25/2020 01:00:00 AM")))

Remove seconds from some observations to work in HM format using R

I have a column called "time" with some observations in "hours: minutes: seconds" and others only with "hours: minutes". I would like to remove the seconds and be left with only hours and minutes.
So far I have loaded the lubridate package and tried:
format(data$time ,format = "%H:%M")
but no change occurs.
And with:
data$time <- hm(data$time)
all the observations with h:m:s become NAs
What should I do?
You can use parse_date_time from lubridate to bring time into POSIXct format and then use format to keep the information that you need.
data <- data.frame(time = c('10:04:00', '14:00', '15:00', '12:34:56'))
data$time1 <- format(lubridate::parse_date_time(data$time, c('HMS', 'HM')), '%H:%M')
data
# time time1
#1 10:04:00 10:04
#2 14:00 14:00
#3 15:00 15:00
#4 12:34:56 12:34
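If the column is plain text and only the trailing seconds need to go, a base R alternative is to strip them with a regular expression. This is a sketch that assumes every value is either "H:MM", "HH:MM" or the same with ":SS" appended; anything else is left unchanged.

```r
data <- data.frame(time = c("10:04:00", "14:00", "15:00", "12:34:56"),
                   stringsAsFactors = FALSE)

# Keep the hours:minutes part and drop a trailing ":SS" when present
data$time1 <- sub("^([0-9]{1,2}:[0-9]{2}):[0-9]{2}$", "\\1", data$time)
```

Unlike the parse_date_time approach, this does no validation, so a value such as "25:99:00" would pass through as "25:99".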

Filling not observed observations

I want to make a time series of how often each date and time is observed. The raw data looked something like this:
dd-mm-yyyy hh:mm
28-2-2018 0:12
28-2-2018 11:16
28-2-2018 12:12
28-2-2018 13:22
28-2-2018 14:23
28-2-2018 14:14
28-2-2018 16:24
The date and time format is in the wrong way for R, so I had to adjust it:
extracted_times <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
I ordered the data with frequency in a table using the following code:
timeserieswithoutzeros <- table(extracted_times)
The data looks something like this now:
2018-02-28 00:11:00 2018-02-28 01:52:00 2018-02-28 03:38:00
1 2 5
2018-02-28 04:10:00 2018-02-28 04:40:00 2018-02-28 04:45:00
2 1 1
As you may see there are a lot of unobserved dates and times.
I want to add these unobserved dates and times with the frequency of 0.
I tried the complete function, but the error states that it can't be used because I use as.POSIXct().
Any ideas?
As already mentioned in the comments by @eric-lecoutre, you can combine your observations with a sequence beginning at the earliest and ending at the latest date using seq, then subtract 1 from the frequency table.
timeseriesWithzeros <- table(c(extracted_times, seq(min(extracted_times), max(extracted_times), "1 min")))-1
Maybe the following is what you want.
First, coerce the data to class "POSIXct" and create the sequence of all date/times between min and max in steps of 1 minute.
bedrijf.CSV$viewed_at <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
new <- seq(min(bedrijf.CSV$viewed_at),
           max(bedrijf.CSV$viewed_at),
           by = "1 mins")
tmp <- data.frame(viewed_at = new)
Now see if these values are in the original data.
tmp$viewed <- tmp$viewed_at %in% bedrijf.CSV$viewed_at
tbl <- xtabs(viewed ~ viewed_at, tmp)
sum(tbl != 0)
#[1] 7
Final clean up.
rm(new, tmp)
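As far as I can tell, the %in% approach above flags each minute as seen or not seen (0 or 1) rather than counting repeated views. If the counts matter, one option is to tabulate a factor whose levels are every minute in the range. The sketch below uses three made-up timestamps in the question's format; the names viewed_at, times, all_minutes and counts are for illustration only, and tz = "UTC" is an assumption.

```r
# Hypothetical sample: one minute viewed twice, then a gap, then one more view
viewed_at <- c("28-2-2018 0:12", "28-2-2018 0:12", "28-2-2018 0:15")
times <- as.POSIXct(viewed_at, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Every minute from the first to the last observation
all_minutes <- seq(min(times), max(times), by = "1 min")

# Unobserved minutes are levels with no data, so table() reports them as 0
counts <- table(factor(format(times, "%Y-%m-%d %H:%M"),
                       levels = format(all_minutes, "%Y-%m-%d %H:%M")))
```

Here counts has four entries (00:12 to 00:15) with values 2, 0, 0 and 1.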
