Divide Results Into Three Groups Based On Condition And Date Check - r

This is one I have been having trouble with for days. I need to take my data and divide results into three groups based on conditions and date check. You can see this in the original data table that I have provided.
[Image: table with original data]
Basically, I need to do this by individual. If they fail then they have 7 days to pass. If they fail and pass within 7 days then they go in the Yes category. If they fail and then have another failure within 7 days, they go in the No category. If they have a failed result and nothing after that, then they go in the Refused category.
So, I need to test the row after a Fail for a Pass or Fail or Nothing by individual and then check that it is within 7 days.
An individual can be in more than one group at the same time. Sam, for example, did not take another test after his second failure. Luke, on the other hand, passed, but only after the 7-day period, so that result counts as Refused. The final table would then look like this:
[Image: final table]
I have tried to use if-else statements but I don't know how to check the next row of the same individual and ignore any other rows other than the row that exist, if any, right after Fail per individual.
I don't know if this can be done in R but I appreciate any help I can get on this.
Thank you!

This is not a complete solution, just a suggestion.
Your dataset:
# A tibble: 13 x 4
   name    result time_1     time_2
   <chr>   <chr>  <date>     <date>
 1 Joe     Fail   2022-03-01 NA
 2 Joe     Pass   NA         2022-03-05
 3 Heather Fail   2022-03-21 NA
 4 Heather Pass   NA         2022-03-26
 5 Heather Pass   NA         2022-03-27
 6 Heather Fail   2022-03-13 NA
 7 Heather Pass   NA         2022-03-17
 8 Sam     Fail   2022-03-20 NA
 9 Sam     Fail   2022-03-21 NA
10 Luke    Fail   2022-03-11 NA
11 Luke    Pass   NA         2022-03-13
12 Luke    Fail   2022-03-19 NA
13 Luke    Pass   NA         2022-03-29
library(lubridate)
library(tidyverse)
df_clean <- df %>%
  arrange(name, result, time_1, time_2) %>%
  group_by(name, result) %>%
  mutate(attempt = 1:n()) %>%
  unite(col = "result",
        c("result", "attempt"),
        sep = "_", remove = TRUE) %>%
  unite(col = "time",
        c("time_1", "time_2"),
        sep = "", remove = TRUE) %>%
  mutate(time = time %>% str_remove_all("NA") %>% as.Date()) %>%
  ungroup() %>%
  spread(key = result, value = time)
"Cleaned dataset":
# A tibble: 4 x 6
  name    Fail_1     Fail_2     Pass_1     Pass_2     Pass_3
  <chr>   <date>     <date>     <date>     <date>     <date>
1 Heather 2022-03-13 2022-03-21 2022-03-17 2022-03-26 2022-03-27
2 Joe     2022-03-01 NA         2022-03-05 NA         NA
3 Luke    2022-03-11 2022-03-19 2022-03-13 2022-03-29 NA
4 Sam     2022-03-20 2022-03-21 NA         NA         NA
df_clean %>%
  mutate(yes = case_when(interval(Fail_1, Pass_1) %>%
                           as.numeric("days") <= 7 ~ 1,
                         TRUE ~ 0),
         # is.Date() tests the class of the whole column, so check for a non-missing date instead
         refused = case_when(!is.na(Fail_1) & is.na(Pass_1) ~ 1,
                             TRUE ~ 0))
# A tibble: 4 x 8
  name    Fail_1     Fail_2     Pass_1     Pass_2     Pass_3       yes refused
  <chr>   <date>     <date>     <date>     <date>     <date>     <dbl>   <dbl>
1 Heather 2022-03-13 2022-03-21 2022-03-17 2022-03-26 2022-03-27     1       0
2 Joe     2022-03-01 NA         2022-03-05 NA         NA             1       0
3 Luke    2022-03-11 2022-03-19 2022-03-13 2022-03-29 NA             1       0
4 Sam     2022-03-20 2022-03-21 NA         NA         NA             0       1
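The question asks for the row right after each Fail, per individual. One possible way to do that is dplyr's lead() within each name. The sketch below is not part of the suggestion above. It assumes time_1 and time_2 have already been merged into a single column, here given the hypothetical name `date`, and it assumes that any follow-up test more than 7 days later counts as Refused whatever its result.

```r
library(dplyr)

# The question's data, with time_1/time_2 merged into one `date` column (assumed name)
tests <- data.frame(
  name = c("Joe", "Joe", "Heather", "Heather", "Heather", "Heather", "Heather",
           "Sam", "Sam", "Luke", "Luke", "Luke", "Luke"),
  result = c("Fail", "Pass", "Fail", "Pass", "Pass", "Fail", "Pass",
             "Fail", "Fail", "Fail", "Pass", "Fail", "Pass"),
  date = as.Date(c("2022-03-01", "2022-03-05", "2022-03-21", "2022-03-26",
                   "2022-03-27", "2022-03-13", "2022-03-17", "2022-03-20",
                   "2022-03-21", "2022-03-11", "2022-03-13", "2022-03-19",
                   "2022-03-29"))
)

classified <- tests %>%
  arrange(name, date) %>%
  group_by(name) %>%
  mutate(
    next_result  = lead(result),                   # NA when there is no later test
    days_to_next = as.numeric(lead(date) - date)   # NA when there is no later test
  ) %>%
  ungroup() %>%
  filter(result == "Fail") %>%                     # one output row per failure
  mutate(category = case_when(
    next_result == "Pass" & days_to_next <= 7 ~ "Yes",
    next_result == "Fail" & days_to_next <= 7 ~ "No",
    TRUE ~ "Refused"                               # nothing after, or too late (assumption)
  ))

classified
```

Each failure gets its own category, so an individual such as Sam can appear in both No and Refused.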

Related

How to filter for all instances of X happening only if nothing else is in the data before the associated date

I'm not sure how to word the title better - I have a list of names, dates, and services. I want to find all instances of a specific service occurring only when there were 0 other services BEFORE the date of the specific one.
Example data below.
The desired output would be ONLY returning row 5 because Bruce Wayne had a surgery with 0 services beforehand. John Doe is disqualified because there was a check-up beforehand and Jane Doe is disqualified because there was no surgery.
Extra question - Instead of checking for any occurrence beforehand, how would I check for any occurrence within 6 months instead?
Date <- c("2022-01-01","2022-04-01","2022-05-01","2022-07-01","2022-08-01","2022-08-05")
Name <- c("John Doe","John Doe","John Doe","Jane Doe","Bruce Wayne","Bruce Wayne")
Service <- c("Check-up","Surgery","Follow-up", "Check-up", "Surgery", "Follow-up")
df <- data.frame(Date,Name,Service)
df
        Date        Name   Service
1 2022-01-01    John Doe  Check-up
2 2022-04-01    John Doe   Surgery
3 2022-05-01    John Doe Follow-up
4 2022-07-01    Jane Doe  Check-up
5 2022-08-01 Bruce Wayne   Surgery
6 2022-08-05 Bruce Wayne Follow-up
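I don't always trust the ordering of the frame, so I would compare each surgery against the earliest date in its group rather than rely on row position:
library(dplyr)
df %>%
  group_by(Name) %>%
  filter(Service == "Surgery", Date == min(Date)) %>%
  ungroup()
# # A tibble: 1 × 3
#   Date       Name        Service
#   <chr>      <chr>       <chr>
# 1 2022-08-01 Bruce Wayne Surgery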
You could filter on surgery and check whether it is the first row of its group (this assumes the rows are already sorted by date within each Name):
library(dplyr)
df %>%
  group_by(Name) %>%
  filter(Service == "Surgery" & row_number() == 1)
#> # A tibble: 1 × 3
#> # Groups:   Name [1]
#>   Date       Name        Service
#>   <chr>      <chr>       <chr>
#> 1 2022-08-01 Bruce Wayne Surgery
Created on 2023-01-27 with reprex v2.0.2
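Neither answer covers the extra question about checking only the 6 months before the surgery. A minimal sketch of one way to do it follows, assuming Date is first converted to a real Date and using lubridate's %m-% for the month arithmetic. The names `services`, `other_date` and `disqualified` are made up for illustration.

```r
library(dplyr)
library(lubridate)

services <- data.frame(
  Date = as.Date(c("2022-01-01", "2022-04-01", "2022-05-01",
                   "2022-07-01", "2022-08-01", "2022-08-05")),
  Name = c("John Doe", "John Doe", "John Doe",
           "Jane Doe", "Bruce Wayne", "Bruce Wayne"),
  Service = c("Check-up", "Surgery", "Follow-up",
              "Check-up", "Surgery", "Follow-up")
)

surgeries <- services %>% filter(Service == "Surgery")

# Surgeries that have at least one other service in the 6 months before them
disqualified <- surgeries %>%
  inner_join(select(services, Name, other_date = Date), by = "Name") %>%
  filter(other_date < Date, other_date >= Date %m-% months(6)) %>%
  distinct(Name, Date)

# Keep only the surgeries that were not disqualified
result <- anti_join(surgeries, disqualified, by = c("Name", "Date"))
result
```

With the example data this keeps only Bruce Wayne's surgery, because John Doe's check-up on 2022-01-01 falls within 6 months of his surgery.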

Create date of "X" column, when I have age in days at "X" column and birth date column in R

I'm having some trouble finding out how to do a specific thing in R.
In my dataset, I have a column with the date of birth of participants. I also have a column giving me the age in days at which a disease was diagnosed.
What I want to do is to create a new column showing the date of diagnosis. I'm guessing it's a pretty easy thing to do since I have all the information needed, basically it's birth date + X number of days = Date of diagnosis, but I'm unable to figure out how to do it.
All of my searches give me information on the opposite, going from date to age. So if you're able to help me, it would be much appreciated!
library(tidyverse)
library(lubridate)
df <- tibble(
  birth = sample(seq(as.Date("1950-01-01"), today(), by = "day"),
                 10, replace = TRUE),
  age = sample(3650:15000, 10, replace = TRUE)
)
df %>%
  mutate(diagnosis_date = birth %m+% days(age))
#> # A tibble: 10 x 3
#>    birth        age diagnosis_date
#>    <date>     <int> <date>
#>  1 1955-01-16  6684 1973-05-05
#>  2 1958-11-03  6322 1976-02-24
#>  3 2007-02-23  4312 2018-12-14
#>  4 2002-07-11  8681 2026-04-17
#>  5 2021-12-28 11892 2054-07-20
#>  6 2017-07-31  3872 2028-03-07
#>  7 1995-06-30 14549 2035-04-30
#>  8 1955-09-02 12633 1990-04-04
#>  9 1958-10-10  4534 1971-03-10
#> 10 1980-12-05  6893 1999-10-20
Created on 2022-06-30 by the reprex package (v2.0.1)
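R stores a Date as a number of days, so a count of days can also be added to it directly without lubridate. A small base R sketch, using two rows from the output above (the variable names are only illustrative):

```r
# Two birth dates and ages in days, taken from the printed output above
birth    <- as.Date(c("1955-01-16", "2007-02-23"))
age_days <- c(6684L, 4312L)

# Date + number of days gives a Date
diagnosis_date <- birth + age_days
diagnosis_date
# [1] "1973-05-05" "2018-12-14"
```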

How to look up date value from same table based on a condition in R?

I am not sure whether this task calls for a self join. I am trying to look up, for each State.UnionTerritory in the data frame below, the latest date on which Daily_confirmed was <= half of its value on the current Date.
This will give me the doubling time of cases for each State on each date.
library(tidyverse)
library(lubridate)
df_ind <- read_csv("https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/sample_data.csv")
df_ind %>% head()
# output
  Date       State.UnionTerritory Daily_confirmed
  <date>     <chr>                          <dbl>
1 2021-12-23 Haryana                           46
2 2021-12-23 Maharashtra                     1179
3 2021-12-23 Delhi                            118
4 2021-12-22 Haryana                           55
5 2021-12-22 Maharashtra                     1201
6 2021-12-22 Delhi                            125
For example, Delhi has 118 cases on 2021-12-23, and the latest date on which Delhi had at most half of that is 2021-12-15 with 57 cases, so the doubling time would be 2021-12-23 - 2021-12-15 = 8 days.
So I should get something like:
[Image: expected output]
This should be applied to each State in the data and to all dates.
df_ind <- df_ind %>%
  mutate(Daily_confirmed_half = as.integer(Daily_confirmed / 2))
I am not sure how to get the proper dates into a Doubling_Date column, that is, the dates on which Daily_confirmed met the <= half condition.
I can group, summarize and use first() to pull the latest dates, but I am not sure what would be an efficient way of bringing the right dates into another column of this data frame.
I'm not sure if this is your desired output. The approach uses a full_join together with filter to simulate a non-equi join in dplyr. Then we do some data cleaning, and in the last step we need a left_join to the original data, since there are quite a few days for which the doubling date cannot be calculated because it is not included in the time series.
library(tidyverse)
library(lubridate)
df_ind %>%
  group_by(State.UnionTerritory) %>%
  full_join(., ., by = c("State.UnionTerritory")) %>%
  filter(Date.x > Date.y,
         Daily_confirmed.x > (Daily_confirmed.y * 2)) %>%
  group_by(Date.x, State.UnionTerritory) %>%
  filter(Date.y == max(Date.y)) %>%
  filter(Daily_confirmed.y == max(Daily_confirmed.y)) %>%
  rename("Date" = Date.x,
         "Daily_confirmed" = Daily_confirmed.x,
         "Doubling_Date" = Date.y) %>%
  select(- Daily_confirmed.y) %>%
  mutate(Day_to_double = Date - Doubling_Date) %>%
  left_join(df_ind,
            .,
            by = c("Date", "State.UnionTerritory", "Daily_confirmed")) %>%
  arrange(State.UnionTerritory, desc(Date))
#> # A tibble: 252 x 5
#>    Date       State.UnionTerritory Daily_confirmed Doubling_Date Day_to_double
#>    <date>     <chr>                          <dbl> <date>        <drtn>
#>  1 2021-12-23 Delhi                            118 2021-12-15     8 days
#>  2 2021-12-22 Delhi                            125 2021-12-15     7 days
#>  3 2021-12-21 Delhi                            102 2021-12-14     7 days
#>  4 2021-12-20 Delhi                             91 2021-12-14     6 days
#>  5 2021-12-19 Delhi                            107 2021-12-14     5 days
#>  6 2021-12-18 Delhi                             86 2021-12-13     5 days
#>  7 2021-12-17 Delhi                             69 2021-12-13     4 days
#>  8 2021-12-16 Delhi                             85 2021-12-13     3 days
#>  9 2021-12-15 Delhi                             57 2021-11-27    18 days
#> 10 2021-12-14 Delhi                             45 2021-11-15    29 days
#> # … with 242 more rows
Created on 2021-12-25 by the reprex package (v0.3.0)
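For comparison, the same lookup can be sketched without a self join by scanning each state's series with a small helper. This is only an illustration: `latest_half_date` and `cases` are hypothetical names, the data are the ten Delhi rows printed above, and the condition is the question's "<= half" rather than the strict inequality used in the join. Because the sample stops at 2021-12-14, rows whose doubling date lies earlier come out as NA here.

```r
library(dplyr)

# The ten Delhi rows shown in the output above (2021-12-23 back to 2021-12-14)
cases <- data.frame(
  Date = as.Date("2021-12-23") - 0:9,
  State.UnionTerritory = "Delhi",
  Daily_confirmed = c(118, 125, 102, 91, 107, 86, 69, 85, 57, 45)
)

# For each row: the latest earlier date whose count is <= half the current count
latest_half_date <- function(dates, counts) {
  idx <- vapply(seq_along(dates), function(i) {
    ok <- which(dates < dates[i] & counts <= counts[i] / 2)
    if (length(ok) == 0) NA_integer_ else ok[which.max(as.numeric(dates[ok]))]
  }, integer(1))
  dates[idx]  # an NA index yields an NA date
}

result <- cases %>%
  group_by(State.UnionTerritory) %>%
  mutate(
    Doubling_Date = latest_half_date(Date, Daily_confirmed),
    Day_to_double = as.numeric(Date - Doubling_Date)
  ) %>%
  ungroup()

result
```

The helper compares every row with every other row of the same state, so it is quadratic per group. That is fine for a few hundred rows, but the join is likely the better fit for large data.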

R - Purrr - Apply Function to Tibbles in List

I need some help applying a function to four tibbles individually stored in the same list.
Function:
status_fun <- function(Status,
                       Escalated,
                       Created,
                       Resolved) {
  if (Escalated == "Yes") {
    return("Escalated")
  } else if (Status == "Closed" &&
             (month(Created) == month(Resolved) || Resolved - Created < 5)) {
    return("Closed")
  } else {
    return("Not Solved")
  }
}
I have a list with 4 tibbles inside of different sizes.
I simply want to apply the function above that uses four columns to each tibble, but I'm getting all sorts of errors. I've searched as much as I can and read R4DS and other posts here, but I can't find a solution.
dummy %>%
  map(., status_fun)
Error in .f(.x[[i]], ...) :
  argument "Escalated" is missing with no default
dummy %>%
  map(~ map(., status_fun))
Error in .f(.x[[i]], ...) :
  argument "Escalated" is missing with no default
The following returns a list with only one value per tibble, which I'm not interested in. I want a list of four tibbles with the same number of rows as the input:
dummy %>%
  map(., ~ status_fun(Status = 'Status', Escalated = 'Escalated', Created = 'Created', Resolved = 'Resolved'))
[[1]]
[1] "Not Solved"
[[2]]
[1] "Not Solved"
[[3]]
[1] "Not Solved"
[[4]]
[1] "Not Solved"
The dummy list is the following:
[[1]]
# A tibble: 589 x 5
   Created    Resolved   Status      Country    Escalated
   <date>     <date>     <chr>       <chr>      <chr>
 1 2020-04-03 2020-04-08 Closed      Luxembourg No
 2 2020-03-31 NA         In Progress France     No
 3 2020-03-31 NA         In Progress France     No
 4 2020-03-31 NA         In Progress Luxembourg No
 5 2020-03-31 NA         In Progress Luxembourg No
 6 2020-03-30 NA         In Progress France     Yes
 7 2020-03-27 NA         In Progress Ireland    No
 8 2020-03-27 2020-04-10 Closed      Luxembourg No
 9 2020-03-27 NA         In Progress Luxembourg No
10 2020-03-27 2020-03-30 Closed      Ireland    No
# ... with 579 more rows
[[2]]
# A tibble: 316 x 5
   Created    Resolved   Status               Country    Escalated
   <date>     <date>     <chr>                <chr>      <chr>
 1 2020-04-13 NA         Open                 Luxembourg No
 2 2020-04-13 NA         Open                 Spain      No
 3 2020-04-07 NA         Open                 France     No
 4 2020-04-03 NA         In Progress          Luxembourg No
 5 2020-03-30 NA         Awaiting Information Luxembourg No
 6 2020-03-30 NA         Awaiting Information France     Yes
 7 2020-03-30 2020-03-31 Closed               France     No
 8 2020-03-30 NA         Awaiting Information France     No
 9 2020-03-30 NA         Awaiting Information Spain      No
10 2020-03-30 NA         Awaiting Information Sweden     No
# ... with 306 more rows
[[3]]
# A tibble: 64 x 5
   Created    Resolved   Status               Country Escalated
   <date>     <date>     <chr>                <chr>   <chr>
 1 2020-04-13 NA         Open                 Chile   No
 2 2020-04-10 NA         Open                 Mexico  Yes
 3 2020-04-10 NA         Awaiting Information Mexico  No
 4 2020-04-09 NA         Open                 Chile   No
 5 2020-04-03 2020-04-06 Closed               Mexico  Yes
 6 2020-04-02 2020-04-02 Closed               Mexico  No
 7 2020-04-01 2020-04-01 Closed               Mexico  No
 8 2020-03-31 2020-04-01 Closed               Brazil  No
 9 2020-03-30 2020-03-31 Closed               Mexico  No
10 2020-03-27 2020-04-06 Closed               Mexico  No
# ... with 54 more rows
[[4]]
# A tibble: 30 x 5
   Created    Resolved   Status      Country Escalated
   <date>     <date>     <chr>       <chr>   <chr>
 1 2020-04-13 NA         Open        Chile   No
 2 2020-04-07 NA         Open        Brazil  No
 3 2020-03-23 2020-03-25 Closed      Chile   No
 4 2020-03-17 2020-03-18 Closed      Chile   No
 5 2020-03-16 NA         Open        Mexico  No
 6 2020-03-11 2020-03-11 Closed      Brazil  No
 7 2020-03-11 2020-03-12 Closed      Brazil  No
 8 2020-03-10 2020-03-10 Closed      Brazil  No
 9 2020-03-09 NA         In Progress Brazil  No
10 2020-03-02 2020-03-03 Closed      Brazil  No
# ... with 20 more rows
What am I missing?
I've tried all sorts of pmap and map2 variants, as well as the instructions in "Code not working using map from purrr package in R" and "Apply function to nested loop (purrr package?)", with no success.
Thanks in advance for someone willing to take their time to solve my problem.
> version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          4
minor          0.0
year           2020
month          04
day            24
svn rev        78286
language       R
version.string R version 4.0.0 (2020-04-24)
nickname       Arbor Day
packageVersion("tidyverse")
[1] ‘1.3.0’
packageVersion("lubridate")
[1] ‘1.7.8’
One issue is that you are passing a single data.frame to a function that expects four arguments. To fix that, you could change your function to:
new_fx = function (DF) {
  Status = DF$Status
  Escalated = DF$Escalated
  ...
}
map(dummy, new_fx)
The next potential issue is your use of if ... else. Because this was not a reproducible example with expected output, I am assuming you want to add a column based on the if ... else logic. if() only uses a single logical value, and && and || likewise evaluate to a single logical value, so replace them with the vectorized & and |.
Along with that, switch to ifelse() or, since you are in the tidyverse, case_when(), either of which produces a vector of the expected length.
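To illustrate the point about vector length, here is a tiny base R sketch with made-up values (the vectors `status` and `escalated` are hypothetical). ifelse() evaluates the condition for every element and returns one result per element:

```r
status    <- c("Closed", "Open", "Closed")
escalated <- c("No", "Yes", "Yes")

# Vectorized: one result per element, unlike if () ... else
outcome <- ifelse(escalated == "Yes", "Escalated",
                  ifelse(status == "Closed", "Closed", "Not Solved"))
outcome
# [1] "Closed"    "Escalated" "Escalated"
```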
For anyone struggling with mutating columns on several tibbles inside a list object, the below code worked on the problem above:
status_fun <- function(df){
  Escalated = df$Escalated
  Status = df$Status
  Created = df$Created
  Resolved = df$Resolved
  dplyr::mutate(df,
    Status = case_when(
      Escalated == "Yes" ~ "Escalated",
      (Status == "Closed" &
        (month(Created) == month(Resolved) | Resolved - Created < 5)) ~ "Closed",
      TRUE ~ "Not Solved"
    )
  )
}
dummy <- dummy %>% map(., status_fun)
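A self-contained variant of the same idea follows, written as a sketch rather than as the original poster's code. It writes the result to a new column, hypothetically named `Outcome`, so the original Status is kept. The list `tickets` is built from a few rows shown in the question's dummy list, and `add_outcome` is also a made-up name.

```r
library(dplyr)
library(purrr)
library(lubridate)

# Two small tibbles built from rows shown in the question's `dummy` list
tickets <- list(
  tibble(
    Created   = as.Date(c("2020-04-03", "2020-03-31", "2020-03-30", "2020-03-27")),
    Resolved  = as.Date(c("2020-04-08", NA, NA, "2020-04-10")),
    Status    = c("Closed", "In Progress", "In Progress", "Closed"),
    Escalated = c("No", "No", "Yes", "No")
  ),
  tibble(
    Created   = as.Date(c("2020-04-13", "2020-04-03")),
    Resolved  = as.Date(c(NA, "2020-04-06")),
    Status    = c("Open", "Closed"),
    Escalated = c("No", "Yes")
  )
)

# Takes one tibble, returns it with an extra Outcome column (same number of rows)
add_outcome <- function(df) {
  mutate(df, Outcome = case_when(
    Escalated == "Yes" ~ "Escalated",
    Status == "Closed" &
      (month(Created) == month(Resolved) |
         as.numeric(Resolved - Created) < 5) ~ "Closed",
    TRUE ~ "Not Solved"
  ))
}

result <- map(tickets, add_outcome)
result
```

Rows with a missing Resolved date that are neither escalated nor closed fall through to the final TRUE branch and become "Not Solved".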

time differences for multiple events for same ID in R

I'm new to Stackoverflow and looked at similar posts but couldn't find a solution that can capture time differences from multiple events from the same ID.
What I've got:
Time<-c('2016-10-04','2016-10-18', '2016-10-04','2016-10-18','2016-10-19','2016-10-28','2016-10-04','2016-10-19','2016-10-21','2016-10-22', '2017-01-02', '2017-03-04')
Value<-c(0,1,0,1,0,0,0,1,0,1,1,0)
StoreID<-c('a','a','b','b','c','c','d','d','a','a','d','c')
Unit<-c(1,1,2,2,5,5,6,6,1,1,6,5)
Helper<-c('a1','a1','b2','b2','c5','c5','d6','d6','a1','a1','d6','c5')
The helper column is the StoreID and Unit combined because I couldn't figure out how to group by both Store ID and the Unit. I want to sort the data to show when the unit was disabled (value =0) and enabled again (value =1).
Ultimately, I'd want:
Store_ID Unit Helper Time(v=0)  Time(v=1)  Time2(v=0) Time2(v=1)
a        1    a1     2016-10-04 2016-10-18 2016-10-21 2016-10-22
b        2    b2     2016-10-04 2016-10-18
c        5    c5     2016-10-19 2016-10-28 2017-03-04
d        6    d6     2016-10-04 2016-10-19
Any thoughts?
I'm thinking something in dplyr but am stumped about where to go further.
Create a Headers column that combines the row number within each group (to distinguish duplicates) with the Value column, then spread to wide format.
I didn't use the helper column; I grouped by StoreID and Unit instead.
library(dplyr)
library(tidyr)
df <- data.frame(StoreID, Unit, Time, Value)
df %>%
  group_by(StoreID, Unit, Value) %>%
  mutate(Headers = sprintf('Time %s (v=%s)', row_number(), Value)) %>%
  ungroup() %>%
  select(-Value) %>%
  spread(Headers, Time)
# A tibble: 4 x 7
#  StoreID  Unit `Time 1 (v=0)` `Time 1 (v=1)` `Time 2 (v=0)` `Time 2 (v=1)` `Time 3 (v=0)`
#* <fctr>  <dbl> <fctr>         <fctr>         <fctr>         <fctr>         <fctr>
#1 a           1 2016-10-04     2016-10-18     2016-10-21     2016-10-22     NA
#2 b           2 2016-10-04     2016-10-18     NA             NA             NA
#3 c           5 2016-10-19     NA             2016-10-28     NA             2017-03-04
#4 d           6 2016-10-04     2016-10-19     NA             2017-01-02     NA
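In current tidyr, spread() is superseded by pivot_wider(). A sketch of the same reshaping with pivot_wider() is shown below. As assumptions of mine, it converts Time to Date and sorts by time before numbering the events, so the event numbers do not depend on the input row order.

```r
library(dplyr)
library(tidyr)

Time <- c("2016-10-04", "2016-10-18", "2016-10-04", "2016-10-18", "2016-10-19",
          "2016-10-28", "2016-10-04", "2016-10-19", "2016-10-21", "2016-10-22",
          "2017-01-02", "2017-03-04")
Value   <- c(0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0)
StoreID <- c("a", "a", "b", "b", "c", "c", "d", "d", "a", "a", "d", "c")
Unit    <- c(1, 1, 2, 2, 5, 5, 6, 6, 1, 1, 6, 5)

wide <- data.frame(StoreID, Unit, Time = as.Date(Time), Value) %>%
  arrange(StoreID, Unit, Time) %>%
  group_by(StoreID, Unit, Value) %>%
  # Number the events within each StoreID/Unit/Value combination
  mutate(Headers = sprintf("Time %s (v=%s)", row_number(), Value)) %>%
  ungroup() %>%
  select(-Value) %>%
  pivot_wider(names_from = Headers, values_from = Time)

wide
```

Unlike spread(), pivot_wider() orders the new columns by first appearance rather than alphabetically, so the column order may differ from the output above.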
