I need to duplicate rows so that gaps between non-consecutive dates are filled and the data frame contains every date.
Suppose this df:
df <- data.frame(date = c("2022-07-05", "2022-07-07", "2022-07-11", "2022-07-15", "2022-07-18"), letter = c("a", "b", "a", "b", "c"))
The desired output is this df_new:
df_new <- data.frame(date = c("2022-07-05", "2022-07-06",
"2022-07-07", "2022-07-08", "2022-07-09", "2022-07-10",
"2022-07-11", "2022-07-12", "2022-07-13", "2022-07-14",
"2022-07-15"),
letter = c("a", "a",
"b", "b", "b", "b",
"a", "a", "a", "a",
"c"))
Could you please help?
We could use complete() from tidyr to expand the data to every date between the minimum and maximum date, in steps of one day, and then use fill() to replace the resulting NA values in letter with the previous non-NA value:
library(dplyr)
library(tidyr)
df %>%
  mutate(date = as.Date(date)) %>%
  complete(date = seq(min(date), max(date), by = '1 day')) %>%
  fill(letter)
Related
I have patient data that include the start and end of each hospitalization. I need to calculate the total number of patients by date and by daytime (08:00 to 17:00) or nighttime (17:00 to 08:00), which means I need to transform my wide, two-timepoint data into long format.
Simulated data:
library(tidyverse)
library(lubridate)
df = tibble(
  id = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  start = sample(seq(as.POSIXct('2022-01-01'), as.POSIXct('2022-02-02'), by = "sec"), 10),
  end = sample(seq(as.POSIXct('2022-02-02'), as.POSIXct('2022-03-03'), by = "sec"), 10))
The result should be something like this. I can use group_by() and summarize() to find necessary patient numbers.
Not the most efficient but maybe good enough?
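df %>%
  # round both timestamps down to the hour
  mutate(across(start:end, ~floor_date(., "hour"), .names = "{.col}_rnd")) %>%
  group_by(id, start, end) %>%
  # one row per hour of each stay; reframe() (dplyr >= 1.1.0) replaces
  # summarize() for results with more than one row per group
  reframe(day_shifts = seq(start_rnd, end_rnd, by = "hour")) %>%
  mutate(date = as_date(day_shifts),
         # hours 8 to 16 cover 08:00-16:59, i.e. daytime
         day_night = if_else(hour(day_shifts) %>% between(8, 16), "day", "night")) %>%
  distinct(id, date, day_night)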
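To get from that output to the patient numbers mentioned in the question, a count() per date and shift is one option. This is a minimal sketch, assuming dplyr 1.1.0 or later for reframe(). The two stays below are made-up values (not from the question) so that the result is reproducible, and n_patients is just an illustrative column name.

```r
library(dplyr)
library(lubridate)

# Hypothetical, fixed stays (instead of sample()) so the result is reproducible
df <- tibble(
  id    = c("A", "B"),
  start = as.POSIXct(c("2022-01-01 09:30:00", "2022-01-01 18:15:00"), tz = "UTC"),
  end   = as.POSIXct(c("2022-01-02 07:45:00", "2022-01-02 10:05:00"), tz = "UTC")
)

# One row per patient, date and shift in which the patient was present
presence <- df %>%
  mutate(across(start:end, ~ floor_date(.x, "hour"), .names = "{.col}_rnd")) %>%
  group_by(id) %>%
  reframe(day_shifts = seq(start_rnd, end_rnd, by = "hour")) %>%
  mutate(date = as_date(day_shifts),
         day_night = if_else(between(hour(day_shifts), 8, 16), "day", "night")) %>%
  distinct(id, date, day_night)

# Number of patients per date and shift
patients <- presence %>%
  count(date, day_night, name = "n_patients")

patients
```

Note that "night" here covers the night-time hours of one calendar date, so a single night is split across two dates.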
I am trying to generate a running number in dplyr using row_number(), but the results are not as desired. I would like, say, cat A to have 1, 2, 3 in var1. Any leads?
library(dplyr)
df <- tibble(
  cat = c("A", "B", "C", "A", "A", "B"),
  date = seq.Date(Sys.Date(), Sys.Date() + 5,
                  by = 1),
  age = c(12, 13, 34, 23, 32, 34)
)
df <- df %>%
  arrange(cat, date) %>%
  group_by(cat, date) %>%
  mutate(var1 = row_number())
df
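One likely cause: every date in this data is unique, so grouping by both cat and date puts each row in its own group and row_number() is always 1. A minimal sketch of a possible fix, assuming the goal is a counter that restarts for each cat, is to group by cat alone. The fixed start date is an assumption that replaces Sys.Date() so the example is reproducible.

```r
library(dplyr)

df <- tibble(
  cat  = c("A", "B", "C", "A", "A", "B"),
  # Fixed start date (an assumption, replacing Sys.Date()) for reproducibility
  date = seq.Date(as.Date("2022-01-01"), by = 1, length.out = 6),
  age  = c(12, 13, 34, 23, 32, 34)
)

df_numbered <- df %>%
  arrange(cat, date) %>%
  group_by(cat) %>%                # group by cat only, not by cat and date
  mutate(var1 = row_number()) %>%  # counter restarts at 1 within each cat
  ungroup()

df_numbered
```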
I have a subset of data that has a total count (n) for each observation from a bigger dataset. I want to drop duplicates by keeping, for each name, only the row with the highest count, so that codes that appear less often under the same name are dropped. How would I go about that? For instance:
name = c("a", "a", "b", "b", "b", "c", "d", "e", "e", "e")
code = c(1,1,2,3,4,1,1,2,2,3)
n = c(1,10,2,3,5,4,8,100,90,40)
data = data.frame(name,code,n)
The end product would be left with these:
name = c("a", "b", "c", "d", "e")
code = c(1,4,1,1,2)
n = c(10,5,4,8,100)
data2 = data.frame(name,code,n)
If you can use dplyr, this should do the trick:
library(dplyr)
data %>%
  group_by(name) %>%
  filter(n == max(n)) %>%
  ungroup()
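Note that filter(n == max(n)) keeps every row that ties for the maximum within a name. If exactly one row per name is wanted, a sketch using slice_max() (available in dplyr 1.0.0 and later) with with_ties = FALSE could look like this. Which of several tied rows survives then depends on row order.

```r
library(dplyr)

name <- c("a", "a", "b", "b", "b", "c", "d", "e", "e", "e")
code <- c(1, 1, 2, 3, 4, 1, 1, 2, 2, 3)
n    <- c(1, 10, 2, 3, 5, 4, 8, 100, 90, 40)
data <- data.frame(name, code, n)

data2 <- data %>%
  group_by(name) %>%
  # keep the single row with the largest n in each group
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

data2
```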
After grouping by id I wish to replace the NAs in dist_from_top with sequential values such that dist_from_top becomes c(5,4,3,2,1,5,4,3,2). I am using the one dist_from_top value within each id grouping as a seed of sorts to fill in the values of dist_from_top that are above and below.
tidyr::fill() can fill in the same value throughout the grouping, but I can't think of a way to make it increase and decrease by 1 as it fills. Any help is greatly appreciated.
library(dplyr)
library(tidyr)
df <-
  tribble(
    ~id, ~mgr, ~dist_from_top,
    "A", "B", NA,
    "A", "C", NA,
    "A", "D", 3,
    "A", "E", NA,
    "A", "F", NA,
    "B", "C", NA,
    "B", "D", 4,
    "B", "E", NA,
    "B", "F", NA
  )
An "almost there" solution using fill()
df %>%
  group_by(id) %>%
  fill(dist_from_top, .direction = "up") %>%
  fill(dist_from_top, .direction = "down")
Create a column that counts downwards in each group, from any starting point:
... %>% mutate(rn = -row_number())
Add the offset that is defined by the difference between dist_from_top and rn for the one row where dist_from_top is not NA:
... %>% mutate(dist_from_top = rn + max(dist_from_top - rn, na.rm = TRUE))
This uses max() merely to pick one value, assuming there is only one value that isn't NA.
Both mutate() operations operate on groups:
df %>%
  group_by(id) %>%
  mutate(rn = ...) %>%
  mutate(dist_from_top = ...) %>%
  ungroup() %>%
  select(-rn)
If there is an all-NA group, you'll see a warning.
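Put together, the two steps could look like this. It is a sketch that assumes exactly one non-NA dist_from_top per id, as in the example data:

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~id, ~mgr, ~dist_from_top,
  "A", "B", NA,
  "A", "C", NA,
  "A", "D", 3,
  "A", "E", NA,
  "A", "F", NA,
  "B", "C", NA,
  "B", "D", 4,
  "B", "E", NA,
  "B", "F", NA
)

result <- df %>%
  group_by(id) %>%
  # counts downwards within each group: -1, -2, -3, ...
  mutate(rn = -row_number()) %>%
  # shift the countdown so it passes through the one known value
  mutate(dist_from_top = rn + max(dist_from_top - rn, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-rn)

result
```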
I have the following problem: I have two data frames. df1 contains, among other variables (not shown in the code below), a date variable. In df2 I have an id (referring to the id in df1), a factor variable (type), and another date.
df1 <- data.frame(id=1:5, referenceDate=c("2018-01-20","2018-02-03","2018-05-20", "2018-08-01", "2018-07-31"))
df2 <- data.frame(id=c(1,1,1,2,2,4,4,5,5), type=c("A", "A", "B", "A", "A", "B", "A", "B", "B"), dates=c("2018-01-10", "2018-01-23", "2018-01-24", "2018-05-21", "2018-05-18", "2018-06-01", "2018-09-01", "2018-07-10", "2018-07-20"))
My goal is to create a new column in df1 indicating the number of rows in df2 where (e.g.) df2$type=='A' and df2$dates occurs before df1$referenceDate.
In R I have the following solution that gives me the number of rows where df2$type=='A'. But how can I additionally consider the date? I had the idea of first joining the two tables in order to get the referenceDate-Variable from df1 into df2 and then do the counting and join the two tables again in the other direction (in order to get the count variable back into df1). But this does not sound very elegant to me.
library(tidyverse)
reduced <- df2 %>% filter(type=='A') %>% group_by(id) %>% mutate(count=n()) %>% filter(!duplicated(id))
df1 %>% left_join(reduced[, c("id", "count")])
I think this might be what you want:
df1 <- tibble(id = 1:5,
referenceDate = as.Date(c("2018-01-20","2018-02-03","2018-05-20", "2018-08-01", "2018-07-31")))
df2 <- tibble(id = c(1,1,1,2,2,4,4,5,5),
type = c("A", "A", "B", "A", "A", "B", "A", "B", "B"),
dates = as.Date(c("2018-01-10", "2018-01-23", "2018-01-24", "2018-05-21", "2018-05-18", "2018-06-01", "2018-09-01", "2018-07-10", "2018-07-20")))
df1 %>%
  left_join(
    df2 %>%
      left_join(df1, by = 'id') %>%
      filter(dates < referenceDate) %>%
      group_by(id) %>%
      count(type) %>%
      ungroup(),
    by = 'id'
  )
The key is to join df1 to df2 first and then filter on the reference date, which lets filter() keep only the rows you want. Then use count(), and finally join the result back to df1.
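The pipeline above counts every type that occurs before the reference date. If only type 'A' is of interest, as in the question, a variant could look like the sketch below. The column name count_a and the choice to report 0 for ids without a match are assumptions, not something the question specifies.

```r
library(dplyr)

df1 <- tibble(id = 1:5,
              referenceDate = as.Date(c("2018-01-20", "2018-02-03", "2018-05-20",
                                        "2018-08-01", "2018-07-31")))
df2 <- tibble(id = c(1, 1, 1, 2, 2, 4, 4, 5, 5),
              type = c("A", "A", "B", "A", "A", "B", "A", "B", "B"),
              dates = as.Date(c("2018-01-10", "2018-01-23", "2018-01-24",
                                "2018-05-21", "2018-05-18", "2018-06-01",
                                "2018-09-01", "2018-07-10", "2018-07-20")))

# Rows of type "A" dated before the reference date, counted per id
counts_a <- df2 %>%
  inner_join(df1, by = "id") %>%
  filter(type == "A", dates < referenceDate) %>%
  count(id, name = "count_a")

# Join back to df1; ids without a qualifying row get 0 instead of NA
result <- df1 %>%
  left_join(counts_a, by = "id") %>%
  mutate(count_a = coalesce(count_a, 0L))

result
```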