I am looking at how extubation rates in an intensive care unit have changed over the course of the pandemic.
I have a data set which has hourly timestamps next to a category of airway types which simplified looks like this:
Time              AirwayStatus
2020/01/01 00:00  ETT/LMA
2020/01/01 01:00  ETT/LMA
2020/01/01 02:00  Own Airway
2020/01/01 03:00  Own Airway
2020/01/01 04:00  ETT/LMA
What I am effectively looking to do is find the times when the patient is extubated (ETT/LMA turns to Own Airway) and also when the patient is intubated (Own Airway turns to ETT/LMA). Eventually I want to be able to see how often an extubated patient has to be re-intubated.
Re-intubation within 48 hours is known as a failed extubation, and we are expecting to see vastly different data during the pandemic compared to before.
The idea I have so far is to create a separate column with the AirwayStatus of the prior hour and then count the rows where the two columns differ. This seems unsophisticated though, and I was hoping some of you clever people may have a nicer option.
Thank you in advance
Using dplyr from tidyverse:
Supposing you have a dataframe (or tibble) df and patient(?) id ID:
library(dplyr)
df <- tibble(
ID = c(1,1,1,1,1),
Time = c("2020/01/01 00:00", "2020/01/01 01:00", "2020/01/01 02:00", "2020/01/01 03:00", "2020/01/01 04:00"),
AirwayStatus = c("ETT/LMA", "ETT/LMA", "Own Airway", "Own Airway", "ETT/LMA"))
df <- df %>%
group_by(ID) %>%
arrange(Time) %>%
mutate(
Extubated = ifelse(AirwayStatus == "Own Airway" & lag(AirwayStatus) == "ETT/LMA", TRUE, FALSE),
Intubated = ifelse(AirwayStatus == "ETT/LMA" & lag(AirwayStatus) == "Own Airway", TRUE, FALSE))
result <- df %>%
summarise_at(c("Extubated", "Intubated"), sum, na.rm = TRUE)
result
Result:
# A tibble: 1 x 3
ID Extubated Intubated
<dbl> <int> <int>
1 1 1 1
This allows grouping by patient id which you will most likely do.
It's a bit longer than Oliver's answer though.
Your idea is the right way to go. You can skip storing the intermediate results, but they have to be computed anyway. Let's assume your data is called df; then we could do something similar to
# Read table (couldn't get read.table to work)
library(data.table)
df <- fread("Time AirwayStatus
2020/01/01 00:00 ETT/LMA
2020/01/01 01:00 ETT/LMA
2020/01/01 02:00 Own Airway
2020/01/01 03:00 Own Airway
2020/01/01 04:00 ETT/LMA")
setDF(df)
# Convert time to a date format
df$Time <- as.POSIXct(df$Time)
n <- nrow(df)
# Find changes
df$change <- with(df, c(FALSE, AirwayStatus[seq(n - 1)] != AirwayStatus[seq(2, n)]))
# Compute the hours elapsed since the previous change
df$hours_between_change[df$change] <- with(df, diff(c(NA, Time[change])) / 3600)
df
Time AirwayStatus change hours_between_change
1 2020-01-01 00:00:00 ETT/LMA FALSE NA
2 2020-01-01 01:00:00 ETT/LMA FALSE NA
3 2020-01-01 02:00:00 Own Airway TRUE NA
4 2020-01-01 03:00:00 Own Airway FALSE NA
5 2020-01-01 04:00:00 ETT/LMA TRUE 2
Note I store the intermediate results here. We likely could make it a bit more readable using dplyr but this does the job.
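The same change detection can also be sketched with base R's rle(), which collapses consecutive identical values into runs. This is only an illustration on the five sample rows. It assumes the rows are already sorted by time and spaced one hour apart, so that a run length equals a duration in hours. The names runs, run_start and run_table are made up for the example.

```r
# Sample statuses from the question, assumed sorted by time (one row per hour)
status <- c("ETT/LMA", "ETT/LMA", "Own Airway", "Own Airway", "ETT/LMA")

# Collapse consecutive identical statuses into runs
runs <- rle(status)

# Row index at which each run starts; every start after the first is a change
run_start <- cumsum(c(1, head(runs$lengths, -1)))

run_table <- data.frame(
  status    = runs$values,
  start_row = run_start,
  hours     = runs$lengths
)
run_table
```

Each row of run_table after the first marks either an extubation or an intubation, depending on the status the run switches to.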
Here is an approach using dplyr.
First, you might want to consider a separate column to indicate an intubation or extubation "event." If someone is "Own Airway" and then the previous row has "ETT/LMA", we assume the person has been extubated. The opposite can also be determined for intubation.
Then, you can filter and only focus on these events.
For each event, check whether the event is "Extubation", the following event is "Intubation", and the time between them is less than 48 hours. If so, the extubation is actually a "failed extubation."
This may handle situations where someone has data that begins with "Own Airway" and gets intubated (if no extubation event, then cannot be failed extubation). It will also keep extubation events where the time difference is > 48 hrs as well.
library(tidyverse)
df %>%
mutate(Event = case_when(
AirwayStatus == "Own Airway" & lag(AirwayStatus) == "ETT/LMA" ~ "Extubation",
AirwayStatus == "ETT/LMA" & lag(AirwayStatus) == "Own Airway" ~ "Intubation",
TRUE ~ NA_character_)
) %>%
filter(!is.na(Event)) %>%
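mutate(Event = ifelse(
# Compare in explicit hours (a bare lead(Time) - Time picks its units automatically);
# %in% keeps a final extubation with no following event from turning into NA
Event == "Extubation" & lead(Event) %in% "Intubation" &
difftime(lead(Time), Time, units = "hours") < 48,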
"Failed Extubation",
Event
))
Output
Time AirwayStatus Event
1 2020-01-01 02:00:00 Own Airway Failed Extubation
2 2020-01-01 04:00:00 ETT/LMA Intubation
Data
df <- structure(list(Time = structure(c(1577858400, 1577862000, 1577865600,
1577869200, 1577872800), class = c("POSIXct", "POSIXt"), tzone = ""),
AirwayStatus = c("ETT/LMA", "ETT/LMA", "Own Airway", "Own Airway",
"ETT/LMA"), Event = c(NA, NA, "Extubated", NA, "Intubated"
)), row.names = c(NA, -5L), class = "data.frame")
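To get from labelled events to a count of failed extubations per patient, something along these lines might work. It is a sketch under assumptions: the ID column and the two-patient data are invented for illustration, only the two statuses from the question occur, and the 48-hour window is measured from the extubation to the next intubation of the same patient.

```r
library(dplyr)

# Hypothetical data for two patients; the ID column is an assumption
airway <- tibble(
  ID = c(1, 1, 1, 1, 2, 2, 2),
  Time = as.POSIXct(c("2020-01-01 00:00", "2020-01-01 02:00",
                      "2020-01-01 04:00", "2020-01-04 00:00",
                      "2020-01-01 00:00", "2020-01-01 06:00",
                      "2020-01-05 06:00"), tz = "UTC"),
  AirwayStatus = c("ETT/LMA", "Own Airway", "ETT/LMA", "Own Airway",
                   "ETT/LMA", "Own Airway", "ETT/LMA")
)

events <- airway %>%
  group_by(ID) %>%
  arrange(Time, .by_group = TRUE) %>%
  mutate(prev = lag(AirwayStatus)) %>%
  # Keep only rows where the status differs from the previous row
  filter(!is.na(prev), AirwayStatus != prev) %>%
  mutate(
    Event = if_else(AirwayStatus == "Own Airway", "Extubation", "Intubation"),
    # Hours until the same patient's next event, in explicit units
    hours_to_next = as.numeric(difftime(lead(Time), Time, units = "hours")),
    Failed = Event == "Extubation" & !is.na(hours_to_next) & hours_to_next < 48
  ) %>%
  ungroup()

# One row per patient with at least one extubation
per_patient <- events %>%
  filter(Event == "Extubation") %>%
  group_by(ID) %>%
  summarise(extubations = n(), failed = sum(Failed))
per_patient
```

Because only status changes are kept, events alternate within a patient, so the event after an extubation is always an intubation. Patients who were never extubated drop out of per_patient.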
Related
I have participant data from an exercise test, which includes the participant ID, the condition (either Experimental or Control) and the total time taken to complete the test. A small example of my data:
RawData <- data.frame(
ParticipantID = c (1:6),
Condition = c("Control","Experimental","Experimental","Control","Experimental","Control"),
Time = c("04:34:22","02:48:47","04:22:06","02:57:11","02:07:11","05:34:22"))
I then used the hms package to store the time as an hms value via:
RawData <- RawData %>%
mutate(TotalTime = hms::as_hms(Time))
Now I wish to create a new column, that bins each RawData$TotalTime result into a category including: Sub2, Sub230, Sub3, Sub330, Sub4, Sub430, Sub5, Sub530 and Sub6. I could probably do this via a long case_when statement but is there an easy way to do this in lubridate given I am after 30 minute intervals?
My desired output would be:
RawData <- data.frame(
ParticipantID = c (1:6),
Condition = c("Control","Experimental","Experimental","Control","Experimental","Control"),
Time = c("04:34:22","02:48:47","04:22:06","02:57:11","02:07:11","05:34:22"),
Category = c("Sub5","Sub3","Sub430","Sub3","Sub230","Sub6"))
Thank you!
You can use lubridate's ceiling_date function with the unit set to "30 mins".
library(dplyr)
library(lubridate)
RawData %>%
mutate(TotalTime = as.POSIXct(Time, format = '%T'),
Category = format(ceiling_date(TotalTime, '30 mins'), "%H%M")) %>%
select(-TotalTime)
# ParticipantID Condition Time Category
#1 1 Control 04:34:22 0500
#2 2 Experimental 02:48:47 0300
#3 3 Experimental 04:22:06 0430
#4 4 Control 02:57:11 0300
#5 5 Experimental 02:07:11 0230
#6 6 Control 05:34:22 0600
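If the exact labels from the question (Sub5, Sub430 and so on) are wanted rather than 0500 or 0430, one possible base R sketch is to round the seconds up to the next half hour and build the label by hand. The helper to_seconds is made up for this example. A time that falls exactly on a boundary, such as 03:00:00, stays in that boundary's bin here, which may or may not be what "Sub3" should mean.

```r
RawData <- data.frame(
  ParticipantID = 1:6,
  Time = c("04:34:22", "02:48:47", "04:22:06", "02:57:11", "02:07:11", "05:34:22")
)

# Hypothetical helper: convert "HH:MM:SS" strings to seconds
to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE),
         function(p) sum(as.numeric(p) * c(3600, 60, 1)),
         numeric(1))
}

secs <- to_seconds(RawData$Time)

# Round up to the next multiple of 30 minutes (1800 seconds)
ceil_secs <- ceiling(secs / 1800) * 1800

hrs  <- ceil_secs %/% 3600            # whole hours of the upper bound
half <- (ceil_secs %% 3600) == 1800   # TRUE when the bound ends in :30

RawData$Category <- paste0("Sub", hrs, ifelse(half, "30", ""))
RawData
```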
I am trying to summarize time from 4 am to 12 pm as morning, 12 pm to 9 pm as evening, and 9 pm to 4 am as night. I am doing this to build a logistic regression model that predicts whether an arrest happens, given the type of crime and the time of the crime.
I have tried using lubridate functions, but because the values are strings I am not able to use them. The as.Date function does not help either, since some of the strings look like 03/26/2015 06:56:30 PM while other rows look like 04-12-15 20:24. The two formats are completely different, so as.Date cannot handle both.
One alternative would be to convert every 04-12-15 20:24 value to the 03/26/2015 06:56:30 PM format, for example by replacing each - with / in the date part.
I don't know how to achieve this.
You can use case_when() from the dplyr library to determine the format of the date and then proceed with the conversion based on the format type. From there we check the 24H time component to determine the time of day based on the bins in the OP.
library(dplyr)
chicago15 <- data.frame(Date = c("03/26/2015 06:56:30 PM","04-12-15 20:24",
"03/26/2015 06:56:30 AM","04-12-15 21:24",
"12/31/2017 03:28:43 AM"))
chicago15 %>%
dplyr::mutate(Date2 = dplyr::case_when(
grepl('-',Date) ~ as.POSIXct(Date,format = '%m-%d-%y %H:%M'),
TRUE ~ as.POSIXct(Date,format = '%m/%d/%Y %I:%M:%S %p')
)) %>%
dplyr::mutate(Time_of_Day = dplyr::case_when(
as.numeric(format(Date2,'%H')) >= 21 ~ 'night',
as.numeric(format(Date2,'%H')) >= 12 ~ 'evening',
as.numeric(format(Date2,'%H')) >= 4 ~ 'morning',
TRUE ~ 'night'
))
Date Date2 Time_of_Day
1 03/26/2015 06:56:30 PM 2015-03-26 18:56:30 evening
2 04-12-15 20:24 2015-04-12 20:24:00 evening
3 03/26/2015 06:56:30 AM 2015-03-26 06:56:30 morning
4 04-12-15 21:24 2015-04-12 21:24:00 night
5 12/31/2017 03:28:43 AM 2017-12-31 03:28:43 night
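The hour-to-label step can also be written as a lookup with base R's findInterval(), which may be easier to adjust if the bins change. This is a small sketch using only the hours visible in the Date2 column above. The names breaks and labels are chosen for the example.

```r
# Hours of day taken from the Date2 column in the output above
hours <- c(18, 20, 6, 21, 3)

# Bins: [0, 4) night, [4, 12) morning, [12, 21) evening, [21, 24) night
breaks <- c(0, 4, 12, 21)
labels <- c("night", "morning", "evening", "night")

# findInterval() returns the index of the bin each hour falls into
time_of_day <- labels[findInterval(hours, breaks)]
time_of_day
```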
How do I write this code (hour is from lubridate package)?
Objective: if hour part of PICK_DATE is later than 16:00, the ADJ_PICK_DATE should be next day 03:00. If the hour part of PICK_DATE is earlier than 03:00, then ADJ_PICK_DATE is to be same day 03:00. Problem is, when there is no change needed, the code still adds 3 hours to the PICK_DATE i.e. when the hour part of PICK_DATE is within 03:00 and 16:00.
x$PICK_TIME <- cut(hour(x$PICK_DATE), c(-1, 2, 15, 24), c("EARLY", "OKAY", "LATE"))
x$ADJ_PICK_DATE <- ifelse(x$PICK_TIME=="EARLY",
as.POSIXct(paste(format(x$PICK_DATE, "%d-%b-%Y"), "03:00"),
format="%d-%b-%Y %H:%M"), x$PICK_DATE)
x$ADJ_PICK_DATE <- ifelse(x$PICK_TIME=="LATE",
as.POSIXct(paste(format(x$PICK_DATE+86400, "%d-%b-%Y"),
"03:00"), format="%d-%b-%Y %H:%M"),
x$ADJ_PICK_DATE)
x$ADJ_PICK_DATE <- as.POSIXct(x$ADJ_PICK_DATE, origin = "1970-01-01")
Help please.
Sample data:
PICK_DATE SHIP_DATE
01-APR-2017 00:51 02-APR-2017 06:55 AM
01-APR-2017 00:51 02-APR-2017 12:11 PM
01-APR-2017 00:51 02-APR-2017 12:11 PM
01-APR-2017 00:51 02-APR-2017 09:39 AM
Here is a simple, reproducible example. I had to make up some sample data, based on an earlier question you asked. I suggest reading into dplyr and lubridate as they will help you with your work on manipulating dates.
EDIT: Updated to work with end-of-month dates.
library(lubridate)
library(dplyr)
df <- data.frame(pick_date = c("01-APR-2017 00:51", "02-APR-2017 08:53", "15-APR-2017 16:12", "23-APR-2017 02:04", "30-APR-2017 20:08"), ship_date = c("05-APR-2017 06:55", "09-APR-2017 12:11", "30-APR-2017 13:11", "02-MAY-2017 15:16", "05-MAY-2017 09:57"))
df %>%
mutate(pick_date = dmy_hm(pick_date)) %>%
mutate(ship_date = dmy_hm(ship_date)) %>%
mutate(pick_time = case_when(
hour(pick_date) < 3 ~ "early",  # "earlier than 03:00" covers hours 0 to 2
hour(pick_date) >= 16 ~ "late",
TRUE ~ "okay")
) %>%
mutate(new_pick_time = case_when(
pick_time == "early" ~ hms(hours(3)),
pick_time == "late" ~ hms(hours(3)),
TRUE ~ hms(paste0(hour(pick_date), "H ", minute(pick_date), "M ", second(pick_date), "S")))
) %>%
mutate(temp_pick_date = case_when(
pick_time == "early" ~ pick_date,
pick_time == "late" ~ pick_date + days(1),
TRUE ~ pick_date)
) %>%
mutate(new_pick_date = make_datetime(year(temp_pick_date), month(temp_pick_date), day(temp_pick_date), hour(new_pick_time), minute(new_pick_time), second(new_pick_time))) %>%
select(-new_pick_time, -temp_pick_date)
This returns
pick_date ship_date pick_time new_pick_date
1 2017-04-01 00:51:00 2017-04-05 06:55:00 early 2017-04-01 03:00:00
2 2017-04-02 08:53:00 2017-04-09 12:11:00 okay 2017-04-02 08:53:00
3 2017-04-15 16:12:00 2017-04-30 13:11:00 late 2017-04-16 03:00:00
4 2017-04-23 02:04:00 2017-05-02 15:16:00 early 2017-04-23 03:00:00
5 2017-04-30 20:08:00 2017-05-05 09:57:00 late 2017-05-01 03:00:00
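The same adjustment can probably be expressed more compactly by computing "03:00 on the same day" once with floor_date() and then overwriting only the early and late rows. The sketch below is an alternative, not the method above. It uses ISO-formatted versions of the sample pick dates (an assumption made so that parsing does not depend on the locale's month names), and the variable names are illustrative.

```r
library(lubridate)

# Sample pick dates from above, rewritten in ISO format for this sketch
pick <- ymd_hm(c("2017-04-01 00:51", "2017-04-02 08:53", "2017-04-15 16:12",
                 "2017-04-23 02:04", "2017-04-30 20:08"))

# 03:00 on the same calendar day as each pick
three_am <- floor_date(pick, "day") + hours(3)

early <- hour(pick) < 3    # before 03:00: move to same-day 03:00
late  <- hour(pick) >= 16  # 16:00 or later: move to next-day 03:00

adj_pick <- pick
adj_pick[early] <- three_am[early]
adj_pick[late]  <- three_am[late] + days(1)

data.frame(pick, adj_pick)
```

Rows that are neither early nor late are left untouched, which avoids the unwanted three-hour shift described in the question.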
So it sounds like you just need to do two different arithmetic operations, conditional on the hour of a date time?
The simplest way I can think of to access the hour component is to store the time as a POSIXlt. I believe the "l" stands for "list", and this lets you treat a timestamp like a list, with the different time components accessible as named elements.
Like this:
> time <- as.POSIXlt('2017-07-29 15:12:01')
> time
[1] "2017-07-29 15:12:01 EDT"
> time$hour
[1] 15
So you could write a function that does the operation you desire, and feed it your date column. Hard for me to take it further because I don't quite understand the question, but here's a skeleton:
ComputeDifference <- function(time) {
if (time$hour < 3) {
# code to count orders between 0 and 3 "from same day 3:00"
}
if (time$hour >= 16) {  # 16:00 or later counts as late
# code to consider late orders
}
}
If you throw in sample data and refine the question, maybe I can take a more thorough crack at this.
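Note that if() only handles a single timestamp at a time. For a whole column, a vectorised classification on the $hour element could look like the sketch below. The sample times and the early/okay/late labels are assumptions for illustration.

```r
# Hypothetical sample times; tz is fixed so the result does not depend on the machine
times <- as.POSIXlt(c("2017-07-29 01:12:01",
                      "2017-07-29 15:12:01",
                      "2017-07-29 17:40:00"), tz = "UTC")

# $hour is a plain integer vector, so ifelse() can classify every row at once
category <- ifelse(times$hour < 3, "early",
                   ifelse(times$hour >= 16, "late", "okay"))
category
```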
I have a large amount of time series data stored in a dataframe called "Tag.data" where one record is taken every 30 seconds over the course of several months. For example:
2013-09-30 23:59:00
2013-09-30 23:59:30
2013-10-01 00:00:00
2013-10-01 00:00:30
2013-10-01 00:01:00
2013-10-01 00:01:30
2013-10-01 00:02:00
...
2013-10-15 05:00:00
2013-10-15 05:00:30
2013-10-15 05:01:00
2013-10-15 05:01:30
2013-10-15 05:02:00
...
This data is stored in Tag.data$dt.
Within my data I would like to identify the 1st and 15th day of each month so that these can be used on a later plot.
I was successfully able to identify the first day of each month with this code:
locs <- tapply (X=Tag.data$dt, FUN=min, INDEX=format(Tag.data$dt, '%Y%m'))
at <- Tag.data$dt %in% locs
at <- at & format(Tag.data$dt, '%m') %in% c('01', '02', '03','04', '05', '06','07', '08', '09','10', '11', '12') & format(Tag.data$dt, '%d') == '01'
Unfortunately I was less successful when I attempted to also identify the 15th day of each month with this code:
locs <- tapply (X=Tag.data$dt, FUN=min, INDEX=format(Tag.data$dt, '%Y%m'))
at <- Tag.data$dt %in% locs
at <- at & format(Tag.data$dt, '%m') %in% c('01', '02', '03','04', '05', '06','07', '08', '09','10', '11', '12') & format(Tag.data$dt, '%d') == '01'|
format(Tag.data$dt, '%m') %in% c('01', '02', '03','04', '05', '06','07', '08', '09','10', '11', '12') & format(Tag.data$dt, '%d') == '15'
While this did identify both the 1st and the 15th days of each month, for some reason it identifies only one record for the 1st day of the month but every record for the 15th day of the month (of which there are a great many). I would like to identify only the first record for both the 1st and 15th days of each month. Any help would be much appreciated.
Judging from your code:
locs <- tapply (X=Tag.data$dt, FUN=min, INDEX=format(Tag.data$dt, '%Y%m'))
I assume Tag.data$dt is stored as one of the POSIX classes.
I would like to identify only the first record for both the 1st and 15th days of each month.
Probably slow, but this does the work.
ymd <- format(Tag.data$dt,"%Y%m%d")
index.01.15 <- !duplicated(ymd) & grepl("01$|15$", ymd)
You can use the logical vector to select the rows Tag.data[index.01.15, ]
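A quick way to check the idea is to run it on generated data. The sketch below builds a 30-second series starting at the first timestamp shown in the question. The length of 45000 records is an arbitrary choice that covers a little more than 15 days.

```r
# Generated 30-second series; length.out is an arbitrary choice for this sketch
dt <- seq(as.POSIXct("2013-09-30 23:59:00", tz = "UTC"),
          by = "30 sec", length.out = 45000)

ymd <- format(dt, "%Y%m%d")

# First record of each calendar day, restricted to days ending in 01 or 15
index.01.15 <- !duplicated(ymd) & grepl("(01|15)$", ymd)

first_records <- dt[index.01.15]
format(first_records, "%Y-%m-%d %H:%M:%S")
```

Like the answer above, this relies on the series being sorted by time, since !duplicated() keeps whichever record of a day appears first.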
Try this. It makes use of lubridate. You can select all rows where the day is either 1 or 15.
library(lubridate)
options(stringsAsFactors=FALSE)
Tag.data = structure(list(dt = c("30/09/2013 23:59", "1/10/2013 0:00", "1/10/2013 0:00",
"1/10/2013 0:01", "1/10/2013 0:01", "1/10/2013 0:02", "2/10/2013 0:04",
"15/10/2013 5:00", "15/10/2013 5:00", "15/10/2013 5:01", "15/10/2013 5:01",
"15/10/2013 5:02")), .Names = "dt", class = "data.frame", row.names = c(NA,
-12L))
Tag.data$dt = parse_date_time(Tag.data$dt, '%d/%m/%Y %H%M')
at = Tag.data[day(Tag.data$dt) %in% c(1,15), ]
This is more flexible, as you can specify any day you wish to subset on. For example, replace the values in c(1,15) with any other days, or use month(Tag.data$dt) %in% c(<INSERT MONTH NUMBER>) to subset on month.
It looks like your data are already stored as dates of some sort (e.g., POSIXct). Something like this, but with even more rows?
Tag.data <- data.frame(dt=seq(ISOdate(2013,10,1), by = "30 min", length.out = 10000))
Then if you want just the first record from each 1st or 15th day, this might work:
daychars <- format(Tag.data$dt, '%d')
day1or15 <- daychars %in% c("01","15")
newday <- c(TRUE, (daychars[1:(length(daychars)-1)] != daychars[2:length(daychars)]))
format(Tag.data[day1or15 & newday,"dt"],"%m/%d/%Y %H:%M:%S")
The newday line helpfully does not require that the day begins at any particular time, but it does assume that your time series is ordered.
I suggest you use the excellent xts package for time series data in R.
You didn't provide reproducible data, so I made some of my own.
require(xts)
Tag.data <- xts(rnorm(1e5), order.by = Sys.time() + seq(30, 3e6, 30))
Sub-setting by day of the month is a simple one-liner.
days_1n15 <- Tag.data[.indexmday(Tag.data) %in% c(1, 15)]
This returns all records on the 1st and 15th day of any month.
Now we just need to pull out the first observations on each matching day.
firstOf <- do.call(rbind, lapply(split(days_1n15, 'days'), first))
Which contains the data you want:
R> firstOf
[,1]
2014-02-01 21:29:01 1.284222
2014-02-15 00:00:01 -1.262235
2014-03-01 00:00:01 -0.465001
Here is an example of a subset data in .csv files. There are three columns with no header. The first column represents the date/time and the second column is load [kw] and the third column is 1= weekday, 0 = weekends/ holiday.
9/9/2010 3:00 153.94 1
9/9/2010 3:15 148.46 1
I would like to write R code that selects the first and second columns for times between 10:00 and 20:00 on all weekdays (when the third column is 1) within the month of September, and I do not know the best and most efficient way to code this.
dt <- read.csv("file", header = F, sep=",")
#Select a column with weekday designation = 1, weekend or holiday = 0
y <- data.frame(dt[,3])
#Select a column with timestamps and loads
x <- data.frame(dt[,1:2])
t <- data.frame(dt[,1])
#convert timestamps into readable format
s <- strptime("9/1/2010 0:00", format="%m/%d/%Y %H:%M")
e <- strptime("9/30/2010 23:45", format="%m/%d/%Y %H:%M")
range <- seq(s,e, by = "min")
df <- data.frame(range)
The OP asks for the "best and efficient way to code" this without showing any "inefficient code", so @Justin is right.
It seems that the OP is new to R (and it's officially the summer of love), so I'll give it a try. Here is a solution (not sure about efficiency):
index <- c("9/9/2010 19:00", "9/9/2010 21:15", "10/9/2010 11:00", "3/10/2010 10:30")
index <- as.POSIXct(index, format = "%d/%m/%Y %H:%M")
set.seed(1)
Data <- data.frame(Date = index, load = rnorm(4, mean = 120, sd = 10), weeks = c(0, 1, 1, 1))
## Data
## Date load weeks
## 1 2010-09-09 19:00:00 113.74 0
## 2 2010-09-09 21:15:00 121.84 1
## 3 2010-09-10 11:00:00 111.64 1
## 4 2010-10-03 10:30:00 135.95 1
cond <- expression(format(Date, "%H:%M") < "20:00" &
format(Date, "%H:%M") > "10:00" &
weeks == 1 &
format(Date, "%m") == "09")
subset(Data, eval(cond))
## Date load weeks
## 3 2010-09-10 11:00:00 111.64 1
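The answer above uses strict inequalities, so records stamped exactly 10:00 or 20:00 are dropped. If the range is meant to include both ends, a variant with plain logical indexing might look like this. The data and the column name weekday are invented for illustration. Comparing "%H:%M" strings works because the hours are zero-padded.

```r
# Hypothetical sample; weekday is 1 for weekdays, 0 for weekends/holidays
Data <- data.frame(
  Date = as.POSIXct(c("2010-09-09 10:00", "2010-09-09 20:15",
                      "2010-09-11 12:00", "2010-10-01 12:00",
                      "2010-09-10 20:00"), tz = "UTC"),
  load = c(153.94, 148.46, 120.10, 131.50, 140.25),
  weekday = c(1, 1, 0, 1, 1)
)

hm <- format(Data$Date, "%H:%M")  # zero-padded, so string comparison is safe

keep <- hm >= "10:00" & hm <= "20:00" &   # inclusive time window
  Data$weekday == 1 &                     # weekdays only
  format(Data$Date, "%m") == "09"         # September only

# Timestamp and load columns for the matching rows
Data[keep, c("Date", "load")]
```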