Bifurcate count based on datetime in R

I have the dataframe below in R.
DF
ID Datetime Value
T-1 2020-01-01 15:12:14 10
T-2 2020-01-01 00:12:10 20
T-3 2020-01-01 03:11:11 25
T-4 2020-01-01 14:01:01 20
T-5 2020-01-01 18:07:11 10
T-6 2020-01-01 20:10:09 15
T-7 2020-01-01 15:45:23 15
Using the dataframe above, I want to break down the count and the sum by month and by time bucket, based on the Datetime column.
Required Output:
Month Count Sum
Jan-20 7 115
12:00 AM to 05:00 AM 2 45
06:00 AM to 12:00 PM 0 0
12:00 PM to 03:00 PM 1 20
03:00 PM to 08:00 PM 3 35
08:00 PM to 12:00 AM 1 15

You can bin the hours of the day by using hour from the lubridate package and then cut from base R, before summarizing with dplyr.
Here, I am assuming that your Datetime column is actually in a date-time format and not just a character string or factor. If it is not, convert it first with DF$Datetime <- as.POSIXct(as.character(DF$Datetime)).
library(tidyverse)
DF$bins <- cut(lubridate::hour(DF$Datetime), c(-1, 5.99, 11.99, 14.99, 19.99, 24))
levels(DF$bins) <- c("00:00 to 05:59", "06:00 to 11:59", "12:00 to 14:59",
"15:00 to 19:59", "20:00 to 23:59")
newDF <- DF %>%
group_by(bins, .drop = FALSE) %>%
summarise(Count = length(Value), Total = sum(Value))
This gives the following result:
newDF
#> # A tibble: 5 x 3
#> bins Count Total
#> <fct> <int> <dbl>
#> 1 00:00 to 05:59 2 45
#> 2 06:00 to 11:59 0 0
#> 3 12:00 to 14:59 1 20
#> 4 15:00 to 19:59 3 35
#> 5 20:00 to 23:59 1 15
And if you want to add January as a first row (though I'm not sure how much sense this makes in this context) you could do:
newDF %>%
summarise(bins = "January", Count = sum(Count), Total = sum(Total)) %>% bind_rows(newDF)
#> # A tibble: 6 x 3
#> bins Count Total
#> <chr> <int> <dbl>
#> 1 January 7 115
#> 2 00:00 to 05:59 2 45
#> 3 06:00 to 11:59 0 0
#> 4 12:00 to 14:59 1 20
#> 5 15:00 to 19:59 3 35
#> 6 20:00 to 23:59 1 15
Incidentally, the reproducible version of the data I used for this was:
structure(list(ID = structure(1:7, .Label = c("T-1", "T-2", "T-3",
"T-4", "T-5", "T-6", "T-7"), class = "factor"), Datetime = structure(c(1577891534,
1577837530, 1577848271, 1577887261, 1577902031, 1577909409, 1577893523
), class = c("POSIXct", "POSIXt"), tzone = ""), Value = c(10,
20, 25, 20, 10, 15, 15)), class = "data.frame", row.names = c(NA,
-7L))
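If you would rather avoid the fractional break points and the lubridate dependency, the same binning can be sketched in base R with left-closed intervals (`right = FALSE`). This is only a sketch under a few assumptions: the sample data is rebuilt by hand, the timestamps are pinned to UTC so the hours are reproducible, and the names `hr`, `counts`, `totals` and `result` are my own.

```r
# Sample data mirroring the question (time zone fixed to UTC as an assumption)
DF <- data.frame(
  ID = paste0("T-", 1:7),
  Datetime = as.POSIXct(c("2020-01-01 15:12:14", "2020-01-01 00:12:10",
                          "2020-01-01 03:11:11", "2020-01-01 14:01:01",
                          "2020-01-01 18:07:11", "2020-01-01 20:10:09",
                          "2020-01-01 15:45:23"), tz = "UTC"),
  Value = c(10, 20, 25, 20, 10, 15, 15)
)

# Hour of day as an integer (0-23)
hr <- as.integer(format(DF$Datetime, "%H"))

# Left-closed bins: [0,6), [6,12), [12,15), [15,20), [20,24)
DF$bins <- cut(hr, breaks = c(0, 6, 12, 15, 20, 24), right = FALSE,
               labels = c("00:00 to 05:59", "06:00 to 11:59", "12:00 to 14:59",
                          "15:00 to 19:59", "20:00 to 23:59"))

# table() and tapply() both keep empty factor levels
counts <- table(DF$bins)
totals <- tapply(DF$Value, DF$bins, sum)
totals[is.na(totals)] <- 0  # an empty bin has no values to sum

result <- data.frame(bins = names(counts),
                     Count = as.integer(counts),
                     Total = as.numeric(totals))
result
```

With integer break points and `right = FALSE`, an hour of exactly 6 or 12 falls into the later bin, which matches the labels.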

Concatenate year, month, day and time problem in R

I have these columns in my dataframe, df:
year month day hour minute
2013 1 7 21 54
2013 3 20 13 59
2013 1 3 18 40
.. cols(
.. year = col_double(),
.. month = col_double(),
.. day = col_double(),
.. hour = col_double(),
.. minute = col_double(),
I want to have a new column, datetime:
datetime
2013/1/7 21:54
2013/3/20 13:59
2013/1/3 18:40
I have tried this:
library(readr)
library(dplyr)
df$datetime <- with(df, as.POSIXct(paste(year, month, day, hour, minute),
format = "%Y/%m/%d %H:%M:%S"))
and this:
df$DT <- as.POSIXct((paste(df$year, df$month, df$day, df$hour, df$minute)), format="%Y/%m/%d %H:%M:%S")
However, it gives me all NA values.
I could merge just the year, month and day with as.Date() though. How can I add times to it?
Also, how can I sort by datetime later on?
You could use your original syntax, but make sure you put the right separators between the various components of the date-time:
dat <- tibble::tribble(
~year, ~month, ~day, ~hour, ~minute,
2013, 1, 7, 21, 54,
2013, 3, 20, 13, 59,
2013, 1, 3, 18, 40)
dat$datetime <- with(dat, as.POSIXct(paste0(year, "/", month, "/", day, " ", hour, ":", minute, ":00"),
format = "%Y/%m/%d %H:%M:%S"))
dat
#> # A tibble: 3 × 6
#> year month day hour minute datetime
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
#> 1 2013 1 7 21 54 2013-01-07 21:54:00
#> 2 2013 3 20 13 59 2013-03-20 13:59:00
#> 3 2013 1 3 18 40 2013-01-03 18:40:00
Created on 2022-12-06 by the reprex package (v2.0.1)
When you tell as.POSIXct() that the format is "%Y/%m/%d %H:%M:%S", it expects a string that looks like that (e.g., "2013/01/03 18:40:00"). Your syntax pasted the components together with just spaces, producing something like "2013 1 3 18 40", so when as.POSIXct() tried to parse the string, it didn't find the expected separators and returned NA. You could also get the same result by keeping your original paste() call and changing the format instead:
library(dplyr)
dat <- tibble::tribble(
~year, ~month, ~day, ~hour, ~minute,
2013, 1, 7, 21, 54,
2013, 3, 20, 13, 59,
2013, 1, 3, 18, 40)
dat$datetime <- with(dat, as.POSIXct(paste(year, month, day, hour, minute),
format = "%Y %m %d %H %M"))
arrange(dat, desc(datetime))
#> # A tibble: 3 × 6
#> year month day hour minute datetime
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
#> 1 2013 3 20 13 59 2013-03-20 13:59:00
#> 2 2013 1 7 21 54 2013-01-07 21:54:00
#> 3 2013 1 3 18 40 2013-01-03 18:40:00
Created on 2022-12-06 by the reprex package (v2.0.1)
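To make the mismatch concrete, here is a small base-R sketch that parses the same pasted strings with the mismatched and the matching format, and compares the result with `ISOdatetime()`, which takes the numeric components directly. The `tz = "UTC"` argument and the names `joined`, `bad`, `good`, `iso` and `sorted` are assumptions for illustration.

```r
dat <- data.frame(year   = c(2013, 2013, 2013),
                  month  = c(1, 3, 1),
                  day    = c(7, 20, 3),
                  hour   = c(21, 13, 18),
                  minute = c(54, 59, 40))

# paste() separates the pieces with single spaces, e.g. "2013 1 7 21 54"
joined <- with(dat, paste(year, month, day, hour, minute))

# Format does not match the string: every element becomes NA
bad <- as.POSIXct(joined, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Format matches the string: parsing succeeds
good <- as.POSIXct(joined, format = "%Y %m %d %H %M", tz = "UTC")

# Base R alternative that skips string building entirely
iso <- with(dat, ISOdatetime(year, month, day, hour, minute, 0, tz = "UTC"))

# Sorting by the parsed date-time in base R
sorted <- dat[order(good), ]
```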
The easiest way is to use make_datetime from lubridate. This function accepts the double inputs directly so you don't need to concatenate into a string yourself.
library(dplyr)
library(lubridate)
df |> mutate(datetime = make_datetime(year, month, day, hour, minute))
Output:
# A tibble: 3 × 6
year month day hour minute datetime
<dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
1 2013 1 7 21 54 2013-01-07 21:54:00
2 2013 3 20 13 59 2013-03-20 13:59:00
3 2013 1 3 18 40 2013-01-03 18:40:00
Data:
library(readr)
df <- read_table("year month day hour minute
2013 1 7 21 54
2013 3 20 13 59
2013 1 3 18 40")
Update: This can also be sorted using arrange:
library(dplyr)
library(lubridate)
df |>
mutate(datetime = make_datetime(year, month, day, hour, minute)) |>
arrange(datetime)
Output:
# A tibble: 3 × 6
year month day hour minute datetime
<dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
1 2013 1 3 18 40 2013-01-03 18:40:00
2 2013 1 7 21 54 2013-01-07 21:54:00
3 2013 3 20 13 59 2013-03-20 13:59:00
An alternative to @DaveArmstrong's answer, using tidyverse:
suppressPackageStartupMessages({
library(tidyr)
library(lubridate)
library(dplyr)
})
#> Warning: package 'lubridate' was built under R version 4.2.2
#> Warning: package 'timechange' was built under R version 4.2.2
test <- tibble::tribble(
~year, ~month, ~day, ~hour, ~minute,
2013, 1, 7, 21, 54,
2013, 3, 20, 13, 59,
2013, 1, 3, 18, 40)
test
#> # A tibble: 3 × 5
#> year month day hour minute
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2013 1 7 21 54
#> 2 2013 3 20 13 59
#> 3 2013 1 3 18 40
test |>
unite(col = datetime, everything(), sep = "-", remove = FALSE) |>
mutate(
datetime = ymd_hm(datetime)
)
#> # A tibble: 3 × 6
#> datetime year month day hour minute
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2013-01-07 21:54:00 2013 1 7 21 54
#> 2 2013-03-20 13:59:00 2013 3 20 13 59
#> 3 2013-01-03 18:40:00 2013 1 3 18 40
Created on 2022-12-06 with reprex v2.0.2
library(magrittr)
df <- tibble::tribble(
~year, ~month, ~day, ~hour, ~minute,
2013, 1, 7, 21, 54,
2013, 3, 20, 13, 59,
2013, 1, 3, 18, 40,
)
df %>%
# pad date elements with leading zeros so parsing works out
dplyr::mutate(month = stringr::str_pad(month, width = 2, pad = "0"),
day = stringr::str_pad(day, width = 2, pad = "0")) %>%
# parse as actual datetime
dplyr::mutate(datetime = lubridate::ymd_hm(paste0(year, month, day, hour, minute)))
#> # A tibble: 3 x 6
#> year month day hour minute datetime
#> <dbl> <chr> <chr> <dbl> <dbl> <dttm>
#> 1 2013 01 07 21 54 2013-01-07 21:54:00
#> 2 2013 03 20 13 59 2013-03-20 13:59:00
#> 3 2013 01 03 18 40 2013-01-03 18:40:00
Created on 2022-12-06 by the reprex package (v2.0.1)
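One caveat with gluing the components together without separators: the sample data happens to have two-digit hours and minutes, but a value such as 9:05 would shorten the string and may make it ambiguous. A hedged base-R sketch that zero-pads every field with `sprintf()` before parsing (the single-row `dat` below is hypothetical, chosen to have a one-digit hour and minute):

```r
# Hypothetical row with a single-digit hour and minute
dat <- data.frame(year = 2013, month = 1, day = 7, hour = 9, minute = 5)

# Zero-pad every component so each field has a fixed width
stamp <- with(dat, sprintf("%04d%02d%02d%02d%02d",
                           as.integer(year), as.integer(month),
                           as.integer(day), as.integer(hour),
                           as.integer(minute)))

# Fixed-width string, so a separator-free format is unambiguous
parsed <- as.POSIXct(stamp, format = "%Y%m%d%H%M", tz = "UTC")
```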

Adding an extra column that gives a value for day of the week

I would like to add a column to my dataset that assigns each date a number based on the week it falls in.
So for day 1 through day 7 the column would equal 1, for day 8 through day 14 it would equal 2, and so on.
What would be the best way to add that column?
dput(head(sdata0))
structure(list(date = structure(c(18628, 18629, 18630, 18631,
18632, 18633), class = "Date"), launches = c(-0.423325435196192,
-0.95406180171082, -0.95406180171082, -0.95406180171082, 0.107410931318437,
-0.423325435196192), pledged = c(-0.242997575062835, -0.300759417946595,
-0.300759417946595, -0.300759417946595, 0.120035260531115, -0.103075942164302
), backers = c(-0.124417670254619, -0.269239525943361, -0.269239525943361,
-0.269239525943361, 0.0620404689446357, -0.0918327527246523),
total_goal = c(-0.314834573033319, -0.33600837985916, -0.33600837985916,
-0.33600837985916, -0.205436571099805, -0.283073862794557
), mean_goal = c(-0.350195946618206, -0.422316295398803,
-0.422316295398803, -0.422316295398803, -0.199945219991962,
-0.24201542344731), US = c(0.179454667531907, -0.720497098001238,
-0.720497098001238, -0.720497098001238, 0.179454667531907,
-0.720497098001238), `number of success` = c(0.23782061224498,
-0.594551530612449, -0.594551530612449, -0.594551530612449,
1.07019275510241, 0.23782061224498), duration_days = c(-0.0399540270332042,
-1.6958261375219, -1.6958261375219, -1.6958261375219, 0.0152417099830856,
-0.0399540270332042), Twitter = c(-2.35635395414648, -1.37949565613006,
-2.47410026685382, -1.21813959797556, -0.995729896195041,
-1.226861547065), replies = c(-1.11872430995012, -0.454408610464075,
-1.06845177052955, -0.874543404193084, -1.24799655417443,
-0.906861465249162), likes = c(-0.812127568832484, -0.63113030668481,
-1.40968119485432, -1.1127549475184, -1.2106558412922, -1.22498280135666
), retweets = c(-0.606241425199139, -0.766152931679175, -1.64441036779204,
-1.39868247694445, -1.31077301003134, -1.3509601949059),
group_date = c("01", "01", "01", "01", "01", "01")), row.names = c(NA,
6L), class = "data.frame")
You can use the function week from lubridate like this:
library(dplyr)
library(lubridate)
sdata0 %>%
mutate(week_number = week(ymd(date)))
#> date launches pledged backers total_goal mean_goal US
#> 1 2021-01-01 -0.4233254 -0.2429976 -0.12441767 -0.3148346 -0.3501959 0.1794547
#> 2 2021-01-02 -0.9540618 -0.3007594 -0.26923953 -0.3360084 -0.4223163 -0.7204971
#> 3 2021-01-03 -0.9540618 -0.3007594 -0.26923953 -0.3360084 -0.4223163 -0.7204971
#> 4 2021-01-04 -0.9540618 -0.3007594 -0.26923953 -0.3360084 -0.4223163 -0.7204971
#> 5 2021-01-05 0.1074109 0.1200353 0.06204047 -0.2054366 -0.1999452 0.1794547
#> 6 2021-01-06 -0.4233254 -0.1030759 -0.09183275 -0.2830739 -0.2420154 -0.7204971
#> number of success duration_days Twitter replies likes retweets
#> 1 0.2378206 -0.03995403 -2.3563540 -1.1187243 -0.8121276 -0.6062414
#> 2 -0.5945515 -1.69582614 -1.3794957 -0.4544086 -0.6311303 -0.7661529
#> 3 -0.5945515 -1.69582614 -2.4741003 -1.0684518 -1.4096812 -1.6444104
#> 4 -0.5945515 -1.69582614 -1.2181396 -0.8745434 -1.1127549 -1.3986825
#> 5 1.0701928 0.01524171 -0.9957299 -1.2479966 -1.2106558 -1.3107730
#> 6 0.2378206 -0.03995403 -1.2268615 -0.9068615 -1.2249828 -1.3509602
#> group_date week_number
#> 1 01 1
#> 2 01 1
#> 3 01 1
#> 4 01 1
#> 5 01 1
#> 6 01 1
Created on 2022-07-30 by the reprex package (v2.0.1)
Base R approach without any dependencies:
sdata0["week_number"] <- sdata0["date"] |> format("%V")
sdata0["week_number"]
#> week_number
#> 1 53
#> 2 53
#> 3 53
#> 4 01
#> 5 01
#> 6 01
Also have a look at %U and %W in ?strptime if you need week numbers following US/UK conventions instead of ISO 8601.
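If what you want is the numbering described in the question (days 1 to 7 get 1, days 8 to 14 get 2, counted from the first date in the data rather than from the calendar), a minimal base-R sketch could look like this. The `dates` vector is made-up illustration data, not the asker's `sdata0`.

```r
# Fifteen consecutive days starting on the first sampling date (illustrative)
dates <- as.Date("2021-01-01") + 0:14

# Whole days elapsed since the first date, grouped into blocks of seven
week_index <- as.integer(dates - min(dates)) %/% 7L + 1L
week_index
```

Applied to the data in the question, this would be something like `sdata0$week_number <- as.integer(sdata0$date - min(sdata0$date)) %/% 7L + 1L`. Unlike calendar or ISO week numbers, it does not restart at the new year.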

Creating Labels for Dates

I am working in R. I have a data frame that consists of Sampling Date and water temperature. I have provided a sample dataframe below:
Date Temperature
2015-06-01 11
2015-08-11 13
2016-01-12 2
2016-07-01 12
2017-01-08 4
2017-08-13 14
2018-03-04 7
2018-09-19 10
2019-8-24 8
Due to the erratic nature of the sampling dates (which depend on the samplers' ability to access the site), I am unable to classify years in the usual way, from January 1st to December 31st. Instead, I am using the beginning of the sampling period as the start of a year. In this case a year would start on June 1st and end on May 31st, so that I can accurately compare the years to one another. Thus I want 4 years with the following labels:
Year_One = "2015-06-01" - "2016-05-31"
Year_Two = "2016-06-01" - "2017-05-31"
Year_Three = "2017-06-01" - "2018-05-31"
Year_Four = "2018-06-01" - "2019-08-24"
My goal is to create an additional column with these labels but have thus far been unable to do so.
I created two columns, year1 and year2, using two different approaches. The year2 approach requires every period to start on June 1st and end on May 31st (in your question Year_Four ends on 2019-08-24), so it may not be exactly what you need:
library(tidyverse)
library(lubridate)
dt$Date <- as.Date(dt$Date)
dt %>%
mutate(year1= case_when(between(Date, as.Date("2015-06-01") , as.Date("2016-05-31")) ~ "Year_One",
between(Date, as.Date("2016-06-01") , as.Date("2017-05-31")) ~ "Year_Two",
between(Date, as.Date("2017-06-01") , as.Date("2018-05-31")) ~ "Year_Three",
between(Date, as.Date("2018-06-01") , as.Date("2019-08-24")) ~ "Year_Four",
TRUE ~ "0")) %>%
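# %m-% rolls back to the last valid day of the month; a plain `- months(5)` returns NA for dates such as July 31st
mutate(year2 = paste0(year(Date %m-% months(5)), "/", year(Date %m-% months(5)) + 1))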
The output:
# A tibble: 9 x 4
Date Temperature year1 year2
<date> <dbl> <chr> <chr>
1 2015-06-01 11 Year_One 2015/2016
2 2015-08-11 13 Year_One 2015/2016
3 2016-01-12 2 Year_One 2015/2016
4 2016-07-01 12 Year_Two 2016/2017
5 2017-01-08 4 Year_Two 2016/2017
6 2017-08-13 14 Year_Three 2017/2018
7 2018-03-04 7 Year_Three 2017/2018
8 2018-09-19 10 Year_Four 2018/2019
9 2019-08-24 8 Year_Four 2019/2020
You can use strftime to get the years, then make a factor with levels based on the unique values. Note that this groups by calendar year rather than by a June-to-May year. I'd recommend numbers instead of words, because they can be generated automatically. Otherwise, use labels=c("one", "two", ...).
d <- within(d, {
year <- strftime(Date, "%Y")
year <- paste("Year", factor(year, labels=seq(unique(year))), sep="_")
})
# Date temperature year
# 1 2017-06-01 1 Year_1
# 2 2017-09-01 2 Year_1
# 3 2017-12-01 3 Year_1
# 4 2018-03-01 4 Year_2
# 5 2018-06-01 5 Year_2
# 6 2018-09-01 6 Year_2
# 7 2018-12-01 7 Year_2
# 8 2019-03-01 8 Year_3
# 9 2019-06-01 9 Year_3
# 10 2019-09-01 10 Year_3
# 11 2019-12-01 11 Year_3
# 12 2020-03-01 12 Year_4
# 13 2020-06-01 13 Year_4
Data:
d <- structure(list(Date = structure(c(17318, 17410, 17501, 17591,
17683, 17775, 17866, 17956, 18048, 18140, 18231, 18322, 18414
), class = "Date"), temperature = 1:13), class = "data.frame", row.names = c(NA,
-13L))
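For the June-to-May years the question asks about, one possible base-R sketch is to work out the calendar year in which each sampling year starts and number those. The dates are typed in from the question; `start_year`, `index` and `label` are names I made up.

```r
dates <- as.Date(c("2015-06-01", "2015-08-11", "2016-01-12", "2016-07-01",
                   "2017-01-08", "2017-08-13", "2018-03-04", "2018-09-19",
                   "2019-08-24"))

yr <- as.integer(format(dates, "%Y"))
mo <- as.integer(format(dates, "%m"))

# A sampling year starts on June 1st: January-May belong to the previous start year
start_year <- ifelse(mo >= 6, yr, yr - 1L)

# Number the sampling years consecutively from the earliest one
index <- start_year - min(start_year) + 1L
label <- paste0("Year_", index)
label
```

Under this rule 2019-08-24 starts a fifth year. If it should be folded into the fourth, as in the question's Year_Four range, something like `pmin(index, 4L)` could be used before pasting.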

Performing between-row comparisons of dates in R

I have a dataset of several thousand ICU patients covering several years. Some patients (each with a unique identifier, ID) have had multiple ICU admissions. Each row covers a single ICU admission, and therefore an individual patient may have multiple rows of data.
For each patient, I want to determine whether their ICU admission was:
A readmission during the same hospital stay. This could be identified by an icu_adm time occurring prior to their previous hosp_dis time, or by multiple rows with the same hosp_dis time.
A transfer to a different ICU for management of the same illness. I am defining this as an icu_adm time occurring within 24 hours of their previous hosp_dis time. These patients' icu_dis time and hosp_dis time should be the same, as their hospital discharge occurred from ICU.
A new admission of the same patient
I am able to use lubridate to compare times without difficulty, but I am stuck on how to do the between-row comparisons, especially for patients with multiple ICU admissions (who have new admissions, readmissions, and transfers all in the time period of interest).
Some example data:
ID site icu_adm icu_dis hosp_adm hosp_dis
1 A 2016-02-02 15:38:00 2016-02-06 14:25:00 2016-02-02 15:17:00 2016-02-06 14:25:00
1 B 2016-02-06 16:17:00 2016-02-16 14:16:00 2016-02-06 16:16:00 2016-03-16 17:50:00
2 C 2009-08-09 14:27:00 2009-08-10 15:06:00 2009-08-03 02:51:00 2009-09-02 00:00:00
2 C 2009-08-18 20:32:00 2009-08-27 15:10:00 2009-08-03 02:51:00 2009-09-02 00:00:00
3 A 2010-02-20 21:00:00 2010-03-03 13:00:00 2010-02-18 03:00:00 2010-03-18 15:21:00
3 B 2010-05-05 17:00:00 2010-05-08 09:13:00 2010-05-03 11:21:00 2010-05-20 17:18:00
Desired output would be:
ID … readmission transferred new_adm
1 0 0 1
1 0 1 0
2 0 0 1
2 1 0 0
3 0 0 1
3 0 0 1
I'm not entirely sure this will work with all of your data, but thought this might be helpful.
Using tidyverse (or the dplyr package in this case), you can start by grouping by ID to look at transfers. Based on your definition, if the icu_adm time is within 24 hours of the previous row's hospital discharge time (hosp_dis), it is considered an ICU transfer. You can use lag to compare with the previous row, assuming the dates/times are in chronological order (if not, you can use arrange to order them first).
Next, you can group by ID, hosp_adm, and hosp_dis. This will help look at readmissions. After grouping, all rows of data after the first row (for the same hospital admission) will be considered ICU readmissions.
Then, everything left that is not a transfer or readmission could be considered a new ICU admission.
Let me know if this is what you had in mind.
library(dplyr)
df %>%
group_by(ID) %>%
mutate(transfer = ifelse(abs(difftime(icu_adm, lag(hosp_dis, default = first(hosp_dis)), units = "hours")) < 24, 1, 0)) %>%
group_by(ID, hosp_adm, hosp_dis) %>%
mutate(readmission = ifelse(row_number() > 1, 1, 0),
new_adm = ifelse(transfer != 1 & readmission != 1, 1, 0))
Output
ID site icu_adm icu_dis hosp_adm hosp_dis transfer readmission new_adm
<int> <chr> <dttm> <dttm> <dttm> <dttm> <dbl> <dbl> <dbl>
1 1 A 2016-02-02 15:38:00 2016-02-06 14:25:00 2016-02-02 15:17:00 2016-02-06 14:25:00 0 0 1
2 1 B 2016-02-06 16:17:00 2016-02-16 14:16:00 2016-02-06 16:16:00 2016-03-16 17:50:00 1 0 0
3 2 C 2009-08-09 14:27:00 2009-08-10 15:06:00 2009-08-03 02:51:00 2009-09-02 00:00:00 0 0 1
4 2 C 2009-08-18 20:32:00 2009-08-27 15:10:00 2009-08-03 02:51:00 2009-09-02 00:00:00 0 1 0
5 3 A 2010-02-20 21:00:00 2010-03-03 13:00:00 2010-02-18 03:00:00 2010-03-18 15:21:00 0 0 1
6 3 B 2010-05-05 17:00:00 2010-05-08 09:13:00 2010-05-03 11:21:00 2010-05-20 17:18:00 0 0 1
Data
df <- structure(list(ID = c(1L, 1L, 2L, 2L, 3L, 3L), site = c("A",
"B", "C", "C", "A", "B"), icu_adm = structure(c(1454427480, 1454775420,
1249828020, 1250627520, 1266699600, 1273078800), tzone = "", class = c("POSIXct",
"POSIXt")), icu_dis = structure(c(1454768700, 1455632160, 1249916760,
1251385800, 1267621200, 1273309980), tzone = "", class = c("POSIXct",
"POSIXt")), hosp_adm = structure(c(1454426220, 1454775360, 1249267860,
1249267860, 1266462000, 1272885660), tzone = "", class = c("POSIXct",
"POSIXt")), hosp_dis = structure(c(1454768700, 1458150600, 1251849600,
1251849600, 1268925660, 1274375880), tzone = "", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -6L))

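The three definitions from the question can also be applied directly to the gap between each ICU admission and the same patient's previous hospital discharge. The following is only a base-R sketch: it keeps just the columns it needs, fixes the time zone to UTC, and uses my own helper names (`same_patient`, `prev_dis`, `gap_hours`).

```r
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L, 3L, 3L),
  icu_adm = as.POSIXct(c("2016-02-02 15:38:00", "2016-02-06 16:17:00",
                         "2009-08-09 14:27:00", "2009-08-18 20:32:00",
                         "2010-02-20 21:00:00", "2010-05-05 17:00:00"), tz = "UTC"),
  hosp_dis = as.POSIXct(c("2016-02-06 14:25:00", "2016-03-16 17:50:00",
                          "2009-09-02 00:00:00", "2009-09-02 00:00:00",
                          "2010-03-18 15:21:00", "2010-05-20 17:18:00"), tz = "UTC")
)

# Chronological order within each patient
df <- df[order(df$ID, df$icu_adm), ]
n <- nrow(df)

# TRUE when the previous row belongs to the same patient
same_patient <- c(FALSE, df$ID[-1] == df$ID[-n])

# Previous hospital discharge (seconds since epoch); NA for a patient's first row
prev_dis <- c(NA, as.numeric(df$hosp_dis)[-n])
prev_dis[!same_patient] <- NA

# Hours between the previous hospital discharge and this ICU admission
gap_hours <- (as.numeric(df$icu_adm) - prev_dis) / 3600

# Admitted before the previous hospital discharge: same hospital stay
df$readmission <- as.integer(!is.na(gap_hours) & gap_hours < 0)
# Admitted within 24 hours after the previous hospital discharge
df$transferred <- as.integer(!is.na(gap_hours) & gap_hours >= 0 & gap_hours < 24)
# Everything else, including each patient's first row
df$new_adm <- as.integer(df$readmission == 0 & df$transferred == 0)
df
```

Because the first row of each patient has no previous discharge to compare against, it is always classified as a new admission here.
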
creating indicator variables if a time is within certain intervals

I have a column of times that have been entered as raw text. An example is below (code for data input at the bottom of the post):
#> id time
#> 1 NA <NA>
#> 2 1 7:50 pm
#> 3 2 7:20 pm
#> 4 3 3:20 pm
I would like to add indicator variables, that for example, indicate if the time is:
before 7.30pm
between 7pm and 7.30pm
So my desired output would look like this:
#> id time before_1930 between_1900_1930
#> 1 NA <NA> NA NA
#> 2 1 7:50 pm 0 0
#> 3 2 7:20 pm 1 1
#> 4 3 3:20 pm 1 0
So far, I have tried reading in the times with parse_date_time, but this adds on a date:
library(lubridate)
df <- df %>% mutate(time = lubridate::parse_date_time(time, '%I:%M %p'))
df
#> id time
#> 1 NA <NA>
#> 2 1 0000-01-01 19:50:00
#> 3 2 0000-01-01 19:20:00
#> 4 3 0000-01-01 15:20:00
Is there an easy way to work directly with the hours and minutes, and then create the dummy variables I mentioned?
Code for data input
df <- data.frame(
id = c(NA, 1, 2, 3),
time = c(NA, "7:50 pm", "7:20 pm", "3:20 pm")
)
Try this one:
library(dplyr)
library(lubridate)
data.frame(
id = c(NA, 1, 2, 3),
time = c(NA, "7:50 pm", "7:20 pm", "3:20 pm")
) %>%
mutate(real_time = lubridate::parse_date_time(time, '%I:%M %p'),
is_before = case_when(
hour(real_time) < 19 ~ "Before 19",
hour(real_time) == 19 & minute(real_time) < 30 ~ "19:00 - 19:30",
T ~ "After 19:30"
))
id time real_time is_before
1 NA <NA> <NA> After 19:30
2 1 7:50 pm 0000-01-01 19:50:00 After 19:30
3 2 7:20 pm 0000-01-01 19:20:00 19:00 - 19:30
4 3 3:20 pm 0000-01-01 15:20:00 Before 19
Rather than trying to deal with it as a date/time, use your output from parse_date_time to calculate the number of hours since midnight on 0000-01-01.
df <- data.frame(
id = c(NA, 1, 2, 3),
time = c(NA, "7:50 pm", "7:20 pm", "3:20 pm")
)
library(dplyr)
library(lubridate)
df <- df %>% mutate(time = lubridate::parse_date_time(time, '%I:%M %p'),
time = difftime(time,
as.POSIXct("0000-01-01", tz = "UTC"),
units = "hours"),
before_1930 = as.numeric(time < 19.5),
between_1900_1930 = as.numeric(time > 19 & time < 19.5))
df
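Another option, sketched here in base R, is to reduce each time to minutes since midnight and compare plain numbers, which also keeps the missing time as NA in both indicators. It assumes `strptime()` recognises "am"/"pm" through `%p` in your locale (it does in English and C locales, but this is locale-dependent), and it treats the 19:00 boundary as inclusive, which is my reading of "between". The names `parsed` and `minutes` are my own.

```r
df <- data.frame(
  id = c(NA, 1, 2, 3),
  time = c(NA, "7:50 pm", "7:20 pm", "3:20 pm"),
  stringsAsFactors = FALSE
)

# Parse as clock times; the date part that strptime fills in is ignored below
parsed <- strptime(df$time, format = "%I:%M %p", tz = "UTC")

# Minutes since midnight (NA where the time is missing)
minutes <- parsed$hour * 60 + parsed$min

# 19:30 is 1170 minutes after midnight, 19:00 is 1140
df$before_1930 <- as.integer(minutes < 19 * 60 + 30)
df$between_1900_1930 <- as.integer(minutes >= 19 * 60 & minutes < 19 * 60 + 30)
df
```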
