dplyr::rowwise and min outputs a single value - r

I have an odd situation: when I use dplyr::rowwise() and min() inside mutate(), the result is a single value repeated across all rows rather than a per-row minimum. The same code works with my other data frames in the same session, and I am not sure what the issue is. Restarting RStudio did not help.
df <- indf %>%
dplyr::rowwise(.) %>%
mutate(test = min(as.Date(date1), as.Date(date2), na.rm = TRUE))
structure(list(id = structure(c("5001", "3002", "2001", "1001",
"6001", "9001"), label = "Subject name or identifier", format.sas = "$"),
date1 = structure(c(NA, 18599, NA, NA, NA, NA), class = "Date"),
date2 = structure(c(18472, 18597, 18638, 18675, 18678, 18696
), class = "Date"), test = structure(c(18472, 18472, 18472,
18472, 18472, 18472), class = "Date")), class = c("rowwise_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
.rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame")))

This could be the result of loading the plyr package after dplyr, which masks dplyr's mutate. plyr::mutate ignores the rowwise grouping, so min() is evaluated once over the whole columns:
library(dplyr)
indf %>%
rowwise %>%
plyr::mutate(test = min(date1, date2, na.rm = TRUE))
# A tibble: 6 × 4
# Rowwise:
id date1 date2 test
<chr> <date> <date> <date>
1 5001 NA 2020-07-29 2020-07-29
2 3002 2020-12-03 2020-12-01 2020-07-29
3 2001 NA 2021-01-11 2020-07-29
4 1001 NA 2021-02-17 2020-07-29
5 6001 NA 2021-02-20 2020-07-29
6 9001 NA 2021-03-10 2020-07-29
versus using :: to call mutate from dplyr explicitly:
indf %>%
rowwise %>%
dplyr::mutate(test = min(date1, date2, na.rm = TRUE))
# A tibble: 6 × 4
# Rowwise:
id date1 date2 test
<chr> <date> <date> <date>
1 5001 NA 2020-07-29 2020-07-29
2 3002 2020-12-03 2020-12-01 2020-12-01
3 2001 NA 2021-01-11 2021-01-11
4 1001 NA 2021-02-17 2021-02-17
5 6001 NA 2021-02-20 2021-02-20
6 9001 NA 2021-03-10 2021-03-10
Note that rowwise is slow; it may be better to use the vectorized pmin:
indf %>%
ungroup %>%
dplyr::mutate(test = pmin(date1, date2, na.rm = TRUE))
# A tibble: 6 × 4
id date1 date2 test
<chr> <date> <date> <date>
1 5001 NA 2020-07-29 2020-07-29
2 3002 2020-12-03 2020-12-01 2020-12-01
3 2001 NA 2021-01-11 2021-01-11
4 1001 NA 2021-02-17 2021-02-17
5 6001 NA 2021-02-20 2021-02-20
6 9001 NA 2021-03-10 2021-03-10
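To see why the masked mutate produces one repeated value, here is a minimal base R sketch. The two vectors are a hypothetical three-row excerpt of the data above, not the full data frame. Without per-row evaluation, min() receives the whole columns and collapses them to one date, while pmin() compares element by element.

```r
# Hypothetical excerpt of date1 and date2 (rows 1 to 3 of the data above)
date1 <- as.Date(c(NA, "2020-12-03", NA))
date2 <- as.Date(c("2020-07-29", "2020-12-01", "2021-01-11"))

# min() pools every value from both vectors and returns one date
overall <- min(date1, date2, na.rm = TRUE)

# pmin() returns the smaller of each pair, skipping NAs
per_row <- pmin(date1, date2, na.rm = TRUE)

print(overall)
print(per_row)
```

The single date in overall is what gets recycled down the test column when the rowwise grouping is ignored.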

Related

Sum cumulative time between changes in a single status variable in R

I've been looking for answers and messing around with my code for a couple hours. I have a dataset that looks like the following for a specific ID:
# A tibble: 14 × 3
ID state orderDate
<dbl> <chr> <dttm>
1 4227631 1 2022-03-14 19:00:00
2 4227631 1 2022-03-14 20:00:00
3 4227631 1 2022-03-15 11:00:00
4 4227631 0 2022-03-15 11:00:00
5 4227631 1 2022-03-15 20:00:00
6 4227631 1 2022-03-16 04:00:00
7 4227631 0 2022-03-16 04:00:00
8 4227631 1 2022-03-16 05:00:00
9 4227631 0 2022-03-16 13:00:00
10 4227631 1 2022-03-16 15:00:00
This occurs for hundreds of IDs. For this example, I am using dplyr to group_by ID. I only care when state changes between values, not when it stays the same.
I want to calculate the cumulative time each ID remains in state 1. Instances where state 1 is repeated multiple times before it changes should be ignored. I have been planning to use lubridate and dplyr to perform the analysis.
Tibble I am using for this example:
structure(list(ID = c(4227631, 4227631, 4227631, 4227631, 4227631,
4227631, 4227631, 4227631, 4227631, 4227631), state = c("1",
"1", "1", "0", "1", "1", "0", "1", "0", "1"), orderDate = structure(c(1647284400,
1647288000, 1647342000, 1647342000, 1647374400, 1647403200, 1647403200,
1647406800, 1647435600, 1647442800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
I've tried various solutions, such as "Cumulative time with reset"; however, I'm having trouble with lag and with incorporating it into this specific analysis.
The expected output would maybe look something like this:
And then I would plan to sum all statusOne together to figure out cumulative time spent in this state.
I welcome more elegant solutions, or a link to a prior question if one exists.
EDIT
Using solution below I figured it out!
The solution didn't look at the situations where state 0 immediately followed state 1 and we wanted to look at the total time elapsed between these states.
df %>%
group_by(ID) %>%
mutate(max = cumsum(ifelse(state == lag(state, default = "1"), 0, 1))) %>%
mutate(hours1 = ifelse(max == lag(max) &
state == "1", difftime(orderDate, lag(orderDate), units = "h"), NA)) %>%
mutate(hours2 = ifelse(state == "0" & lag(state) == "1",
difftime(orderDate, lag(orderDate), units = "h"), NA)) %>%
mutate(hours1 = replace_na(hours1, 0),
hours2 = replace_na(hours2, 0)) %>%
mutate(hours = hours1+hours2) %>%
select(-hours1, -hours2) %>%
summarise(total_hours = sum(hours, na.rm = TRUE)) %>%
filter(total_hours!=0)
This is far from elegant, but at least it appears to provide the correct answer:
library(tidyverse)
df <- structure(list(ID = c(4227631, 4227631, 4227631, 4227631, 4227631,
4227631, 4227631, 4227631, 4227631, 4227631),
state = c("1", "1", "1", "0", "1", "1", "0", "1", "0", "1"),
orderDate = structure(c(1647284400, 1647288000, 1647342000,
1647342000, 1647374400, 1647403200,
1647403200, 1647406800, 1647435600,
1647442800),
tzone = "UTC",
class = c("POSIXct", "POSIXt"))),
row.names = c(NA, -10L),
class = c("tbl_df", "tbl", "data.frame"))
df2 <- df %>%
group_by(ID) %>%
mutate(tmp = ifelse(state == lag(state, default = "1"), 0, 1),
max = cumsum(tmp)) %>%
mutate(hours = ifelse(max == lag(max), difftime(orderDate, lag(orderDate), units = "h"), NA)) %>%
select(-tmp)
df3 <- df2 %>%
group_by(max) %>%
summarise(max, statusOne = sum(hours, na.rm = TRUE))
df4 <- left_join(df2, df3, by = "max") %>%
distinct() %>%
select(-c(max, hours)) %>%
mutate(statusOne = ifelse(statusOne != 0 & lag(statusOne, default = 1) == statusOne, 0, statusOne))
df4
#> # A tibble: 10 × 4
#> # Groups: ID [1]
#> ID state orderDate statusOne
#> <dbl> <chr> <dttm> <dbl>
#> 1 4227631 1 2022-03-14 19:00:00 16
#> 2 4227631 1 2022-03-14 20:00:00 0
#> 3 4227631 1 2022-03-15 11:00:00 0
#> 4 4227631 0 2022-03-15 11:00:00 0
#> 5 4227631 1 2022-03-15 20:00:00 8
#> 6 4227631 1 2022-03-16 04:00:00 0
#> 7 4227631 0 2022-03-16 04:00:00 0
#> 8 4227631 1 2022-03-16 05:00:00 0
#> 9 4227631 0 2022-03-16 13:00:00 0
#> 10 4227631 1 2022-03-16 15:00:00 0
Created on 2022-04-04 by the reprex package (v2.0.1)
Edit
It's a lot more straightforward to get the total hours in state 1 for each ID:
df %>%
group_by(ID) %>%
mutate(max = cumsum(ifelse(state == lag(state, default = "1"), 0, 1))) %>%
mutate(hours = ifelse(max == lag(max), difftime(orderDate, lag(orderDate), units = "h"), NA)) %>%
summarise(total_hours = sum(hours, na.rm = TRUE))
#> # A tibble: 1 × 2
#> ID total_hours
#> <dbl> <dbl>
#> 1 4227631 24
Created on 2022-04-04 by the reprex package (v2.0.1)
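As a cross-check, here is a minimal base R sketch for the single example ID. It follows the interpretation in the question's edit, where the time from a state 1 row to the next row counts even when that next row is state 0. The variable names (order_date, hours_to_next, total_hours) are illustrative and not from the original posts.

```r
# State and timestamps copied from the example tibble
state <- c("1", "1", "1", "0", "1", "1", "0", "1", "0", "1")
order_date <- as.POSIXct(
  c(1647284400, 1647288000, 1647342000, 1647342000, 1647374400,
    1647403200, 1647403200, 1647406800, 1647435600, 1647442800),
  origin = "1970-01-01", tz = "UTC"
)

# Hours from each row to the next one; the last row has no successor
hours_to_next <- c(diff(as.numeric(order_date)) / 3600, 0)

# Keep only the gaps that start in state "1"
total_hours <- sum(hours_to_next[state == "1"])
print(total_hours)  # 32
```

Under this interpretation the total is 32 hours rather than 24, because the 8 hours between row 8 (state 1) and row 9 (state 0) are included. For many IDs the same calculation would need to be applied per group, for example via group_by(ID).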

How can we check if any 2 intervals of a unique ID overlaps?

I have data of patient prescription of oral DM drugs, i.e. DPP4 and SU, and would like to find out if patients had taken the drugs concurrently (i.e. whether there are overlapping intervals for DPP4 and SU within the same patient ID).
Sample data:
ID DRUG START END
1 1 DPP4 2020-01-01 2020-01-20
2 1 DPP4 2020-03-01 2020-04-01
3 1 SU 2020-03-15 2020-04-30
4 2 SU 2020-10-01 2020-10-31
5 2 DPP4 2020-12-01 2020-12-31
In the sample data above,
ID == 1, patient had DPP4 and SU concurrently from 2020-03-15 to 2020-04-01.
ID == 2, patient had consumed both medications at separate intervals.
I thought of splitting the data into 2, one for DPP4 and another for SU. Then, do a full join, and compare each DPP4 interval with each SU interval. This may be okay for small data, but if a patient has like 5 rows for DPP4 and another 5 for SU, we will have 25 comparisons, which may not be efficient. Add that with 10000+ patients.
I am not sure how to do it.
New data:
Hope to have a new df that looks like this. Or anything that is tidy.
ID DRUG START END
1 1 DPP4-SU 2020-03-15 2020-04-01
2 2 <NA> <NA> <NA>
Data Code:
df <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), DRUG = c("DPP4", "DPP4",
"SU", "SU", "DPP4"), START = structure(c(18262, 18322, 18336,
18536, 18597), class = "Date"), END = structure(c(18281, 18353,
18382, 18566, 18627), class = "Date")), class = "data.frame", row.names = c(NA,
-5L))
df_new <- structure(list(ID = 1:2, DRUG = c("DPP4-SU", NA), START = structure(c(18336,
NA), class = "Date"), END = structure(c(18353, NA), class = "Date")), class = "data.frame", row.names = c(NA,
-2L))
Edit:
I think from the sample data I gave, it may seem that there can only be 1 intersecting interval. But there may be more. So, I think this would be better data to illustrate.
structure(list(ID = c(3, 3, 3, 3, 3, 3, 3), DRUG = c("DPP4",
"DPP4", "SU", "SU", "DPP4", "DPP4", "DPP4"), START = structure(c(17004,
17383, 17383, 17418, 17437, 17649, 17676), class = c("IDate",
"Date")), END = structure(c(17039, 17405, 17405, 17521, 17625,
17669, 17711), class = c("IDate", "Date")), duration = c(35L,
22L, 22L, 103L, 188L, 20L, 35L), INDEX = c(1L, 0L, 0L, 0L, 0L,
0L, 0L)), row.names = c(NA, -7L), class = c("tbl_df", "tbl",
"data.frame"))
It's way more complicated than dear @AnoushiravanR's, but as an alternative you could try
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
full_join(x = ., y = ., by = "ID") %>%
# filter(DRUG.x != DRUG.y | START.x != START.y | END.x != END.y) %>%
filter(DRUG.x != DRUG.y) %>%
group_by(ID, intersection = intersect(interval(START.x, END.x), interval(START.y, END.y))) %>%
drop_na(intersection) %>%
filter(START.x == first(START.x)) %>%
summarise(DRUG = paste(DRUG.x, DRUG.y, sep = "-"),
START = as_date(int_start(intersection)),
END = as_date(int_end(intersection)),
.groups = "drop") %>%
select(-intersection)
returning
# A tibble: 1 x 4
ID DRUG START END
<int> <chr> <date> <date>
1 1 DPP4-SU 2020-03-15 2020-04-01
Edit: Changed the filter condition. The former one was flawed.
Updated Solution
I have made considerable modifications based on the newly provided data set. This time I first create an interval for each START and END pair and extract the intersecting period between consecutive intervals. As Martin nicely did in his answer, we can use lubridate::int_start and lubridate::int_end to extract the START and END dates of each interval:
library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)
df %>%
group_by(ID) %>%
arrange(START, END) %>%
mutate(int = interval(START, END),
is_over = c(NA, map2(int[-n()], int[-1],
~ intersect(.x, .y)))) %>%
unnest(cols = c(is_over)) %>%
select(-int) %>%
filter(!is.na(is_over) | !is.na(lead(is_over))) %>%
select(!c(START, END)) %>%
mutate(grp = cumsum(is.na(is_over))) %>%
group_by(grp) %>%
summarise(ID = first(ID),
DRUG = paste0(DRUG, collapse = "-"),
is_over = na.omit(is_over)) %>%
mutate(START = int_start(is_over),
END = int_end(is_over)) %>%
select(!is_over)
# A tibble: 1 x 5
grp ID DRUG START END
<int> <int> <chr> <dttm> <dttm>
1 1 1 DPP4-SU 2020-03-15 00:00:00 2020-04-01 00:00:00
Second data set:
# A tibble: 2 x 5
grp ID DRUG START END
<int> <dbl> <chr> <dttm> <dttm>
1 1 3 DPP4-SU 2017-08-05 00:00:00 2017-08-27 00:00:00
2 2 3 SU-DPP4 2017-09-28 00:00:00 2017-12-21 00:00:00
Update
As per updated df
df <- structure(list(ID = c(3, 3, 3, 3, 3, 3, 3), DRUG = c(
"DPP4",
"DPP4", "SU", "SU", "DPP4", "DPP4", "DPP4"
), START = structure(c(
17004,
17383, 17383, 17418, 17437, 17649, 17676
), class = c(
"IDate",
"Date"
)), END = structure(c(
17039, 17405, 17405, 17521, 17625,
17669, 17711
), class = c("IDate", "Date")), duration = c(
35L,
22L, 22L, 103L, 188L, 20L, 35L
), INDEX = c(
1L, 0L, 0L, 0L, 0L,
0L, 0L
)), row.names = c(NA, -7L), class = c(
"tbl_df", "tbl",
"data.frame"
))
we obtain
> dfnew
ID DRUG start end
3.3 3 DPP4-SU 2017-08-05 2017-08-27
3.7 3 SU-DPP4 2017-09-28 2017-12-21
A base R option (not as fancy as the answers by @Anoushiravan R or @Martin Gal)
f <- function(d) {
d <- d[with(d, order(START, END)), ]
idx <- subset(
data.frame(which((u <- with(d, outer(START, END, `<`))) & t(u), arr.ind = TRUE)),
row > col
)
if (nrow(idx) == 0) {
return(data.frame(ID = unique(d$ID), DRUG = NA, start = NA, end = NA))
}
with(
d,
do.call(rbind,
apply(
idx,
1,
FUN = function(v) {
data.frame(
ID = ID[v["row"]],
DRUG = paste0(DRUG[sort(unlist(v))], collapse = "-"),
start = START[v["row"]],
end = END[v["col"]]
)
}
))
)
}
dfnew <- do.call(rbind, Map(f, split(df, ~ID)))
gives
> dfnew
ID DRUG start end
1 1 DPP4-SU 2020-03-15 2020-04-01
2 2 <NA> <NA> <NA>
You may use a slightly different approach from the answers above, although it returns the results in a format different from the one requested. These can be joined back to the original data to get the expected result. You may try this:
df <- structure(list(ID = c(3, 3, 3, 3, 3, 3, 3), DRUG = c("DPP4", "DPP4", "SU", "SU", "DPP4", "DPP4", "DPP4"), START = structure(c(17004, 17383, 17383, 17418, 17437, 17649, 17676), class = c("IDate", "Date")), END = structure(c(17039, 17405, 17405, 17521, 17625, 17669, 17711), class = c("IDate", "Date"))), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
df
#> # A tibble: 7 x 4
#> ID DRUG START END
#> <dbl> <chr> <date> <date>
#> 1 3 DPP4 2016-07-22 2016-08-26
#> 2 3 DPP4 2017-08-05 2017-08-27
#> 3 3 SU 2017-08-05 2017-08-27
#> 4 3 SU 2017-09-09 2017-12-21
#> 5 3 DPP4 2017-09-28 2018-04-04
#> 6 3 DPP4 2018-04-28 2018-05-18
#> 7 3 DPP4 2018-05-25 2018-06-29
library(tidyverse)
df %>%
mutate(treatment_id = row_number()) %>%
pivot_longer(c(START, END), names_to = 'event', values_to = 'dates') %>%
mutate(event = factor(event, levels = c('END', 'START'), ordered = TRUE)) %>%
group_by(ID) %>%
arrange(dates, event, .by_group = TRUE) %>%
mutate(overlap = cumsum(ifelse(event == 'START', 1, -1))) %>%
filter((overlap > 1 & event == 'START') | (overlap > 0 & event == 'END'))
#> # A tibble: 4 x 6
#> # Groups: ID [1]
#> ID DRUG treatment_id event dates overlap
#> <dbl> <chr> <int> <ord> <date> <dbl>
#> 1 3 SU 3 START 2017-08-05 2
#> 2 3 DPP4 2 END 2017-08-27 1
#> 3 3 DPP4 5 START 2017-09-28 2
#> 4 3 SU 4 END 2017-12-21 1
On the originally provided data:
# A tibble: 2 x 6
# Groups: ID [1]
ID DRUG treatment_id event dates overlap
<int> <chr> <int> <ord> <date> <dbl>
1 1 SU 3 START 2020-03-15 2
2 1 DPP4 2 END 2020-04-01 1
To get the results back in the original shape, you can join the overlapping rows to the original data:
library(tidyverse)
df_new <- structure(list(ID = c(3, 3, 3, 3, 3, 3, 3), DRUG = c("DPP4", "DPP4", "SU", "SU", "DPP4", "DPP4", "DPP4"), START = structure(c(17004, 17383, 17383, 17418, 17437, 17649, 17676), class = c("IDate", "Date")), END = structure(c(17039, 17405, 17405, 17521, 17625, 17669, 17711), class = c("IDate", "Date"))), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
df_new %>%
mutate(treatment_id = row_number()) %>%
pivot_longer(c(START, END), names_to = 'event', values_to = 'dates') %>%
mutate(event = factor(event, levels = c('END', 'START'), ordered = TRUE)) %>%
group_by(ID) %>%
arrange(dates, event, .by_group = TRUE) %>%
mutate(overlap = cumsum(ifelse(event == 'START', 1, -1))) %>%
filter((overlap > 1 & event == 'START') | (overlap > 0 & event == 'END')) %>%
left_join(df_new %>% mutate(treatment_id = row_number()), by = c('ID', 'DRUG', 'treatment_id'))
#> # A tibble: 4 x 8
#> # Groups: ID [1]
#> ID DRUG treatment_id event dates overlap START END
#> <dbl> <chr> <int> <ord> <date> <dbl> <date> <date>
#> 1 3 SU 3 START 2017-08-05 2 2017-08-05 2017-08-27
#> 2 3 DPP4 2 END 2017-08-27 1 2017-08-05 2017-08-27
#> 3 3 DPP4 5 START 2017-09-28 2 2017-09-28 2018-04-04
#> 4 3 SU 4 END 2017-12-21 1 2017-09-09 2017-12-21
Created on 2021-08-10 by the reprex package (v2.0.0)
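The pairwise idea from the question (compare every DPP4 interval with every SU interval of the same patient) can also be sketched in base R. This is only an illustration on the first sample data set, assuming closed intervals: two intervals overlap when each one starts no later than the other ends, and the shared period runs from the later start to the earlier end. The names dpp4, su, pairs, hit, and overlaps are hypothetical.

```r
# First sample data set from the question
df <- data.frame(
  ID    = c(1L, 1L, 1L, 2L, 2L),
  DRUG  = c("DPP4", "DPP4", "SU", "SU", "DPP4"),
  START = as.Date(c("2020-01-01", "2020-03-01", "2020-03-15",
                    "2020-10-01", "2020-12-01")),
  END   = as.Date(c("2020-01-20", "2020-04-01", "2020-04-30",
                    "2020-10-31", "2020-12-31"))
)

# Split by drug and pair every DPP4 row with every SU row of the same ID
dpp4 <- df[df$DRUG == "DPP4", c("ID", "START", "END")]
su   <- df[df$DRUG == "SU",   c("ID", "START", "END")]
pairs <- merge(dpp4, su, by = "ID", suffixes = c(".dpp4", ".su"))

# Overlap test: each interval starts before (or when) the other ends
hit <- pairs$START.dpp4 <= pairs$END.su & pairs$START.su <= pairs$END.dpp4

# The shared period is [later start, earlier end]
overlaps <- data.frame(
  ID    = pairs$ID[hit],
  DRUG  = rep("DPP4-SU", sum(hit)),
  START = pmax(pairs$START.dpp4, pairs$START.su)[hit],
  END   = pmin(pairs$END.dpp4, pairs$END.su)[hit]
)
print(overlaps)
```

Patients without any overlap (ID 2 here) simply do not appear in overlaps. The number of comparisons grows with the product of DPP4 and SU rows per patient, which is the cost the question was worried about.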

Check if dates are within a time frame r

I have two datasets, one with values at specific time points for different IDs and another one with several time frames for the IDs. Now I want to check if the timepoint in dataframe one is within any of the time frames from dataset 2 matching the ID.
For example:
df1:
ID date time
1 2020-04-14 11:00:00
1 2020-04-14 18:00:00
1 2020-04-15 10:00:00
1 2020-04-15 20:00:00
1 2020-04-16 11:00:00
1 ...
2 ...
df2:
ID start end
1 2020-04-14 16:00:00 2020-04-14 20:00:00
1 2020-04-15 18:00:00 2020-04-16 13:00:00
2 ...
2
What I want:
df1_new:
ID date time mark
1 2020-04-14 11:00:00 0
1 2020-04-14 18:00:00 1
1 2020-04-15 10:00:00 0
1 2020-04-15 20:00:00 1
1 2020-04-16 11:00:00 1
1 ...
2 ...
Any help would be appreciated!
An option could be:
library(tidyverse)
library(lubridate)
#> date, intersect, setdiff, union
df_1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L), date = c("14.04.2020",
"14.04.2020", "15.04.2020", "15.04.2020", "16.04.2020"), time = c("11:00:00",
"18:00:00", "10:00:00", "20:00:00", "11:00:00"), date_time = structure(c(1586862000,
1586887200, 1586944800, 1586980800, 1587034800), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA,
-5L))
df_2 <- structure(list(ID = c(1L, 1L), start = c("14.04.2020 16:00",
"15.04.2020 18:00"), end = c("14.04.2020 20:00", "16.04.2020 13:00"
)), class = "data.frame", row.names = c(NA, -2L))
df_22 <- df_2 %>%
mutate(across(c("start", "end"), dmy_hm)) %>%
group_nest(ID)
left_join(x = df_1, y = df_22, by = "ID") %>%
as_tibble() %>%
mutate(mark = map2_dbl(date_time, data, ~+any(.x %within% interval(.y$start, .y$end)))) %>%
select(-data)
#> # A tibble: 5 x 5
#> ID date time date_time mark
#> <int> <chr> <chr> <dttm> <dbl>
#> 1 1 14.04.2020 11:00:00 2020-04-14 11:00:00 0
#> 2 1 14.04.2020 18:00:00 2020-04-14 18:00:00 1
#> 3 1 15.04.2020 10:00:00 2020-04-15 10:00:00 0
#> 4 1 15.04.2020 20:00:00 2020-04-15 20:00:00 1
#> 5 1 16.04.2020 11:00:00 2020-04-16 11:00:00 1
Created on 2021-05-25 by the reprex package (v2.0.0)
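For comparison, the same check can be sketched without tidyverse or lubridate. This is a minimal base R version on the example rows, assuming the time frames are closed at both ends and that timestamps are already POSIXct in UTC. The data frames below are hand-built copies of the example, not the asker's real data.

```r
# Time points (df1) and time frames (df2) from the example, as POSIXct
df1 <- data.frame(
  ID = 1L,
  date_time = as.POSIXct(
    c("2020-04-14 11:00:00", "2020-04-14 18:00:00", "2020-04-15 10:00:00",
      "2020-04-15 20:00:00", "2020-04-16 11:00:00"),
    tz = "UTC"
  )
)
df2 <- data.frame(
  ID    = c(1L, 1L),
  start = as.POSIXct(c("2020-04-14 16:00:00", "2020-04-15 18:00:00"), tz = "UTC"),
  end   = as.POSIXct(c("2020-04-14 20:00:00", "2020-04-16 13:00:00"), tz = "UTC")
)

# For each time point, look only at frames with the same ID and
# mark 1 if the point falls inside at least one of them
df1$mark <- vapply(seq_len(nrow(df1)), function(i) {
  frames <- df2[df2$ID == df1$ID[i], ]
  as.integer(any(df1$date_time[i] >= frames$start &
                 df1$date_time[i] <= frames$end))
}, integer(1))

print(df1)
```

An ID with no frames in df2 gets mark 0, because any() over an empty comparison is FALSE.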

How do I check whether dates are in chronological order in R?

I have a large data set with 4 date columns (let's say Date_1, Date_2, Date_3, Date_4). I would like to check whether Date_1 occurs before Date_2, Date_2 before Date_3, and Date_3 before Date_4. How would I do this? I've thought of doing a nested if statement but haven't had much luck.
Assuming your data is similar to this :
df <- structure(list(id = 1:4, Date_1 = structure(c(18534, 18544, 18536,
18547), class = "Date"), Date_2 = structure(c(18533, 18539, 18540,
18545), class = "Date"), Date_3 = structure(c(18532, 18535, 18543,
18541), class = "Date"), Date_4 = structure(c(18537, 18550, 18545,
18537), class = "Date")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
df
# A tibble: 4 x 5
# id Date_1 Date_2 Date_3 Date_4
# <int> <date> <date> <date> <date>
#1 1 2020-09-29 2020-09-28 2020-09-27 2020-10-02
#2 2 2020-10-09 2020-10-04 2020-09-30 2020-10-15
#3 3 2020-10-01 2020-10-05 2020-10-08 2020-10-10
#4 4 2020-10-12 2020-10-10 2020-10-06 2020-10-02
You can use rowwise with diff to check if all the date values occur before the next one.
library(dplyr)
df %>%
rowwise() %>%
mutate(check = all(diff(c_across(contains('date'))) > 0))
# id Date_1 Date_2 Date_3 Date_4 check
# <int> <date> <date> <date> <date> <lgl>
#1 1 2020-09-29 2020-09-28 2020-09-27 2020-10-02 FALSE
#2 2 2020-10-09 2020-10-04 2020-09-30 2020-10-15 FALSE
#3 3 2020-10-01 2020-10-05 2020-10-08 2020-10-10 TRUE
#4 4 2020-10-12 2020-10-10 2020-10-06 2020-10-02 FALSE
In base R, you can do this with apply :
cols <- grep('Date', names(df))
df$check <- apply(df[cols], 1, function(x) all(diff(as.Date(x)) > 0))
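Because the comparison is between neighbouring columns, it can also be done column-wise instead of row by row. The following is a sketch in base R on the same example data (rebuilt here as a plain data.frame): each date column is compared with the next one, and the results are combined with a logical AND.

```r
# Example data rebuilt from the dput() above
df <- data.frame(
  id     = 1:4,
  Date_1 = as.Date(c(18534, 18544, 18536, 18547), origin = "1970-01-01"),
  Date_2 = as.Date(c(18533, 18539, 18540, 18545), origin = "1970-01-01"),
  Date_3 = as.Date(c(18532, 18535, 18543, 18541), origin = "1970-01-01"),
  Date_4 = as.Date(c(18537, 18550, 18545, 18537), origin = "1970-01-01")
)

cols <- grep("Date", names(df))

# Compare each date column with its right-hand neighbour (vectorized over rows),
# then require every comparison to hold
earlier <- df[cols[-length(cols)]]
later   <- df[cols[-1]]
df$check <- Reduce(`&`, Map(`<`, earlier, later))

print(df)
```

This avoids the character conversion that apply() performs on a data frame and scales with the number of columns rather than the number of rows.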

How do you change dates in a data frame?

In a data frame that I've called into R, I'm trying to change the dates listed to a different date. For example, I want 2020-06-04 to become 2020-06-03.
Below is code that I've tried to write in order to do this, but haven't succeeded.
I also did this to the data frame prior:
AbsoluteCover$Date <- as.Date(AbsoluteCover$Date,
format = "%m/%d/%y")
1:
AC <- mutate(AbsoluteCover, NewDate = c("2020-06-04" == "2020-06-03" & "2020-06-19" == "2020-06-18" & "2020-07-12" == "2020-07-28"))
This just creates a new column called "NewDate" but with all FALSE in the cells. This outcome makes sense, but it's not what I want.
2:
AC <- AbsoluteCover %>% mutate(Date, "2020-06-04" == "2020-06-03" & "2020-06-19" == "2020-06-18" & "2020-07-12" == "2020-07-28")
This does the same thing as 1 above.
3:
AC <- replace(AbsoluteCover$Date, c("2020-06-04", "2020-06-19", "2020-07-12"), c("2020-06-03", "2020-06-18", "2020-07-28"))
This just returns a data frame with one column with dates.
Here is an example of my data frame:
dput(head(AbsoluteCover))
structure(list(Plot = c("A1", "A1", "A1", "A2", "A2", "A2"),
Date = structure(c(18417, 18432, 18455, 18417, 18432, 18455
), class = "Date"), Cover = c(12L, 34L, 17L, 2L, 50L, 3L)), row.names = c(NA,
-6L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(
Plot = c("A1", "A2"), .rows = list(1:3, 4:6)), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
As you haven't provided a sample dataframe, I have worked out the example using a test dataset.
You can use the which function to select the rows based on condition
dates = c(as.Date('2020-06-04'), as.Date('2020-01-03'))
df = data.frame('a' = sample(dates, 15, replace= TRUE))
df
#> a
#> 1 2020-01-03
#> 2 2020-06-04
#> 3 2020-06-04
#> 4 2020-01-03
#> 5 2020-06-04
#> 6 2020-06-04
#> 7 2020-01-03
#> 8 2020-06-04
#> 9 2020-01-03
#> 10 2020-01-03
#> 11 2020-01-03
#> 12 2020-01-03
#> 13 2020-01-03
#> 14 2020-06-04
#> 15 2020-06-04
df[which(df$a == as.Date('2020-06-04')), 'a'] = as.Date('2020-06-03')
df
#> a
#> 1 2020-01-03
#> 2 2020-06-03
#> 3 2020-06-03
#> 4 2020-01-03
#> 5 2020-06-03
#> 6 2020-06-03
#> 7 2020-01-03
#> 8 2020-06-03
#> 9 2020-01-03
#> 10 2020-01-03
#> 11 2020-01-03
#> 12 2020-01-03
#> 13 2020-01-03
#> 14 2020-06-03
#> 15 2020-06-03
Created on 2020-07-09 by the reprex package (v0.3.0)
You can use mutate and case_when:
library(dplyr)
df %>% mutate(Date = case_when(
Date == "2020-06-04" ~ "2020-06-03",
Date == "2020-06-19" ~ "2020-06-18",
Date == "2020-07-12" ~ "2020-07-28"))
# A tibble: 6 x 3
# Groups: Plot [2]
Plot Date Cover
<chr> <chr> <int>
1 A1 2020-06-03 12
2 A1 2020-06-18 34
3 A1 2020-07-28 17
4 A2 2020-06-03 2
5 A2 2020-06-18 50
6 A2 2020-07-28 3
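Note that the case_when call above returns a character column (see the <chr> in the output), and any date that matches none of the conditions would become NA. If the column should stay a Date and unmatched dates should be left alone, one option is a named lookup vector. The sketch below uses base R on an ungrouped copy of the example data; lookup, key, and hit are illustrative names.

```r
# Ungrouped copy of the example data from the question
AbsoluteCover <- data.frame(
  Plot  = c("A1", "A1", "A1", "A2", "A2", "A2"),
  Date  = as.Date(c("2020-06-04", "2020-06-19", "2020-07-12",
                    "2020-06-04", "2020-06-19", "2020-07-12")),
  Cover = c(12L, 34L, 17L, 2L, 50L, 3L)
)

# Old date (name) -> new date (value)
lookup <- c("2020-06-04" = "2020-06-03",
            "2020-06-19" = "2020-06-18",
            "2020-07-12" = "2020-07-28")

# Replace only the rows whose date appears in the lookup
key <- as.character(AbsoluteCover$Date)
hit <- key %in% names(lookup)
AbsoluteCover$Date[hit] <- as.Date(unname(lookup[key[hit]]))

print(AbsoluteCover)
```

The Date column keeps its class because the replacement values are converted with as.Date before assignment.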

Resources