How to duplicate rows with non-consecutive dates in R

I need to duplicate rows so that the gaps between non-consecutive dates are filled and every date appears in the data frame.
Suppose this df:
df <- data.frame(date = c("2022-07-05", "2022-07-07", "2022-07-11", "2022-07-15", "2022-07-18"), letter = c("a", "b", "a", "b", "c"))
The desired output is this df_new:
df_new <- data.frame(date = c("2022-07-05", "2022-07-06",
                              "2022-07-07", "2022-07-08", "2022-07-09", "2022-07-10",
                              "2022-07-11", "2022-07-12", "2022-07-13", "2022-07-14",
                              "2022-07-15", "2022-07-16", "2022-07-17",
                              "2022-07-18"),
                     letter = c("a", "a",
                                "b", "b", "b", "b",
                                "a", "a", "a", "a",
                                "b", "b", "b",
                                "c"))
Could you please help?

We could use complete() from tidyr to expand the data to every date between the minimum and maximum date, in steps of one day. The new rows have NA in letter, so we then use fill() to replace each NA with the previous non-NA value.
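library(dplyr)
library(tidyr)
df %>%
  # convert the character column to Date so seq() can step by day
  mutate(date = as.Date(date)) %>%
  # add one row for every missing date between the first and last date
  complete(date = seq(min(date), max(date), by = '1 day')) %>%
  # carry the last observed letter forward into the new rows
  fill(letter)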
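For comparison, here is a minimal base R sketch of the same idea, in case loading tidyr is not an option. It is an alternative, not the method above: it assumes df is already sorted by date, and the names all_dates, idx and df_filled are made up for illustration.

```r
df <- data.frame(
  date = c("2022-07-05", "2022-07-07", "2022-07-11", "2022-07-15", "2022-07-18"),
  letter = c("a", "b", "a", "b", "c")
)
df$date <- as.Date(df$date)

# Every calendar day between the first and last observed date
all_dates <- seq(min(df$date), max(df$date), by = "1 day")

# For each day, the index of the latest observed date that is not after it
# (assumes df$date is sorted in increasing order)
idx <- findInterval(as.numeric(all_dates), as.numeric(df$date))

# Look up the letter of that latest observed row
df_filled <- data.frame(date = all_dates, letter = as.character(df$letter[idx]))
df_filled
```

Because findInterval() does the "last observation carried forward" lookup directly, no NA rows are created and no separate fill step is needed.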

Related

Transforming wide two-timepoint data to long format by date and day/night time

I have patient data that includes the start and end of each hospitalization. I need to calculate the total number of patients by date and by daytime (08:00 to 17:00) or nighttime (17:00 to 08:00), which means transforming my wide, two-timepoint data into long format.
Simulated data:
library(tidyverse)
library(lubridate)
df <- tibble(
  id = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  # random admission times in January, random discharge times in February
  start = sample(seq(as.POSIXct('2022-01-01'), as.POSIXct('2022-02-02'), by = "sec"), 10),
  end = sample(seq(as.POSIXct('2022-02-02'), as.POSIXct('2022-03-03'), by = "sec"), 10)
)
The result should be in long format, with one row per patient, date, and day or night period. I can then use group_by() and summarize() to get the necessary patient counts.

Not the most efficient but maybe good enough?
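df %>%
  # round both timestamps down to the full hour
  mutate(across(start:end, ~floor_date(., "hour"), .names = "{.col}_rnd")) %>%
  group_by(id, start, end) %>%
  # one row per hour of each stay
  summarize(day_shifts = seq(start_rnd, end_rnd, "hour"), .groups = "drop") %>%
  # hours 8 to 16 count as day, everything else as night
  mutate(date = as_date(day_shifts),
         day_night = if_else(hour(day_shifts) %>% between(8, 16), "day", "night")) %>%
  # keep one row per patient, date, and period
  distinct(id, date, day_night)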
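A reproducible sketch of the same approach follows. It makes two assumptions that are not in the original answer: the two stays in the hypothetical tibble stays use fixed timestamps instead of sample(), and summarize() is swapped for reframe(), which needs dplyr 1.1.0 or later. From that version on, summarize() warns when a group returns more than one row, so reframe() may be the better fit there.

```r
library(dplyr)
library(lubridate)

# Two hypothetical stays with fixed timestamps so the result is reproducible
stays <- tibble(
  id    = c("A", "B"),
  start = as.POSIXct(c("2022-01-01 15:30:00", "2022-01-02 20:10:00"), tz = "UTC"),
  end   = as.POSIXct(c("2022-01-01 18:05:00", "2022-01-03 09:45:00"), tz = "UTC")
)

shifts <- stays %>%
  # round both timestamps down to the full hour
  mutate(across(start:end, ~ floor_date(.x, "hour"), .names = "{.col}_rnd")) %>%
  group_by(id, start, end) %>%
  # reframe() may return any number of rows per group and drops the grouping
  reframe(day_shifts = seq(start_rnd, end_rnd, by = "hour")) %>%
  mutate(
    date = as_date(day_shifts),
    # hours 8 to 16 (08:00 up to 16:59) are "day", all others "night"
    day_night = if_else(between(hour(day_shifts), 8, 16), "day", "night")
  ) %>%
  distinct(id, date, day_night)

shifts
```

Note that a night period running from 17:00 to 08:00 crosses midnight, so with this approach the hours after midnight are attributed to the following calendar date.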

Is there a way to generate a running number within groups in dplyr?

I am trying to generate a running number in dplyr using row_number(), but the results are not as desired. I would like, say, cat A to have 1, 2, 3 in var1. Any leads?
library(dplyr)
df <- tibble(
  cat = c("A", "B", "C", "A", "A", "B"),
  date = seq.Date(Sys.Date(), Sys.Date() + 5,
                  by = 1),
  age = c(12, 13, 34, 23, 32, 34)
)
df <- df %>%
  arrange(cat, date) %>%
  group_by(cat, date) %>%
  mutate(var1 = row_number())
df
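One likely cause: grouping by both cat and date puts every row in its own group, because each date occurs only once, so row_number() returns 1 everywhere. A minimal sketch of a possible fix is to group by cat alone. The fixed start date and the name df_numbered are assumptions added here to make the example reproducible.

```r
library(dplyr)

df <- tibble(
  cat = c("A", "B", "C", "A", "A", "B"),
  # fixed start date instead of Sys.Date(), for reproducibility
  date = seq.Date(as.Date("2022-01-01"), by = 1, length.out = 6),
  age = c(12, 13, 34, 23, 32, 34)
)

df_numbered <- df %>%
  arrange(cat, date) %>%
  # group by cat only, so numbering restarts once per category
  group_by(cat) %>%
  mutate(var1 = row_number()) %>%
  ungroup()

df_numbered
```

With this grouping, the three rows of cat A get 1, 2, 3 in var1, ordered by date.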

How to drop observations based on conditions

I have a subset of a bigger dataset that holds a total count (n) for each name and code combination. For each name, I want to keep only the row with the highest count, dropping duplicate codes with lower counts as well as codes that appear less often. How would I go about that? For instance:
name = c("a", "a", "b", "b", "b", "c", "d", "e", "e", "e")
code = c(1,1,2,3,4,1,1,2,2,3)
n = c(1,10,2,3,5,4,8,100,90,40)
data = data.frame(name,code,n)
The end product would be left with these:
name = c("a", "b", "c", "d", "e")
code = c(1,4,1,1,2)
n = c(10,5,4,8,100)
data2 = data.frame(name,code,n)

If you can use dplyr, this should do the trick:
library(dplyr)
data %>%
  group_by(name) %>%
  # keep the row(s) with the highest count within each name
  filter(n == max(n)) %>%
  ungroup()
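One caveat with filter(n == max(n)): if two rows of the same name tie for the highest count, both are kept. If exactly one row per name is needed, a sketch using slice_max() with with_ties = FALSE (available since dplyr 1.0.0) could look like this. Which of the tied rows survives is then decided by row order, which is an assumption you may want to control explicitly.

```r
library(dplyr)

name <- c("a", "a", "b", "b", "b", "c", "d", "e", "e", "e")
code <- c(1, 1, 2, 3, 4, 1, 1, 2, 2, 3)
n <- c(1, 10, 2, 3, 5, 4, 8, 100, 90, 40)
data <- data.frame(name, code, n)

data2 <- data %>%
  group_by(name) %>%
  # order by the column n, keep 1 row per group, break ties by row order
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

data2
```

For the example data there are no ties, so this returns the same five rows as the filter() version.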

tidyr::fill() with sequential integers rather than a repeated value

After grouping by id I wish to replace the NAs in dist_from_top with sequential values such that dist_from_top becomes c(5,4,3,2,1,5,4,3,2). I am using the one dist_from_top value within each id grouping as a seed of sorts to fill in the values of dist_from_top that are above and below.
tidyr::fill() can fill in the same value throughout the grouping, but I can't think of a way to make it increase and decrease by 1 as it fills. Any help is greatly appreciated.
library(dplyr)
library(tidyr)
df <-
tribble(
~id, ~mgr, ~dist_from_top,
"A", "B", NA,
"A", "C", NA,
"A", "D", 3,
"A", "E", NA,
"A", "F", NA,
"B", "C", NA,
"B", "D", 4,
"B", "E", NA,
"B", "F", NA
)
An "almost there" solution using fill():
df %>%
  group_by(id) %>%
  fill(dist_from_top, .direction = "up") %>%
  fill(dist_from_top, .direction = "down")

Create a column that counts downwards in each group, from any starting point:
... %>% mutate(rn = -row_number())
Add the offset that is defined by the difference between dist_from_top and rn for the one row where dist_from_top is not NA:
... %>% mutate(dist_from_top = rn + max(dist_from_top - rn, na.rm = TRUE))
This uses max() merely to pick one value, assuming there is only one value that isn't NA.
Both mutate() operations operate on groups:
df %>%
  group_by(id) %>%
  mutate(rn = ...) %>%
  mutate(dist_from_top = ...) %>%
  ungroup() %>%
  select(-rn)
If there is an all-NA group, you'll see a warning.
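Putting the two steps together with the example data gives the following runnable sketch. It uses data.frame() instead of tribble() only to keep the dependencies to dplyr, and the name result is added for illustration.

```r
library(dplyr)

df <- data.frame(
  id = c(rep("A", 5), rep("B", 4)),
  mgr = c("B", "C", "D", "E", "F", "C", "D", "E", "F"),
  dist_from_top = c(NA, NA, 3, NA, NA, NA, 4, NA, NA)
)

result <- df %>%
  group_by(id) %>%
  # counts downwards within each group: -1, -2, -3, ...
  mutate(rn = -row_number()) %>%
  # shift rn by the offset found on the one row where dist_from_top is known
  mutate(dist_from_top = rn + max(dist_from_top - rn, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-rn)

result$dist_from_top
```

For group A the known value 3 sits on the third row, where rn is -3, so the offset is 6 and the column becomes 5, 4, 3, 2, 1. Group B has the same offset (4 on the second row), which gives 5, 4, 3, 2.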

Count number of entries in another dataframe given a certain condition

I have the following problem: I have two data frames. df1 contains, among other variables (not shown in the code below), a date variable. df2 contains an id (referring to the id in df1), a factor variable (type), and another date.
df1 <- data.frame(id=1:5, referenceDate=c("2018-01-20","2018-02-03","2018-05-20", "2018-08-01", "2018-07-31"))
df2 <- data.frame(id=c(1,1,1,2,2,4,4,5,5), type=c("A", "A", "B", "A", "A", "B", "A", "B", "B"), dates=c("2018-01-10", "2018-01-23", "2018-01-24", "2018-05-21", "2018-05-18", "2018-06-01", "2018-09-01", "2018-07-10", "2018-07-20"))
My goal is to create a new column in df1 that gives the number of rows in df2 where, for example, df2$type == 'A' and df2$dates occurs before df1$referenceDate.
In R I have the following solution that gives me the number of rows where df2$type=='A'. But how can I additionally consider the date? I had the idea of first joining the two tables in order to get the referenceDate-Variable from df1 into df2 and then do the counting and join the two tables again in the other direction (in order to get the count variable back into df1). But this does not sound very elegant to me.
library(tidyverse)
reduced <- df2 %>%
  filter(type == 'A') %>%
  group_by(id) %>%
  mutate(count = n()) %>%
  filter(!duplicated(id))
df1 %>% left_join(reduced[, c("id", "count")])

I think this might be what you want:
df1 <- tibble(id = 1:5,
              referenceDate = as.Date(c("2018-01-20", "2018-02-03", "2018-05-20", "2018-08-01", "2018-07-31")))
df2 <- tibble(id = c(1, 1, 1, 2, 2, 4, 4, 5, 5),
              type = c("A", "A", "B", "A", "A", "B", "A", "B", "B"),
              dates = as.Date(c("2018-01-10", "2018-01-23", "2018-01-24", "2018-05-21", "2018-05-18", "2018-06-01", "2018-09-01", "2018-07-10", "2018-07-20")))
df1 %>%
  left_join(
    df2 %>%
      # bring referenceDate into df2
      left_join(df1, by = 'id') %>%
      # keep only rows dated before the reference date
      filter(dates < referenceDate) %>%
      group_by(id) %>%
      count(type) %>%
      ungroup(),
    by = 'id'
  )
The key is to join df1 to df2 first and then filter on the reference date, which lets filter() keep only the rows you want. Then use count(), and finally join the result back to df1.
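The pipeline above counts every type and leaves NA for ids with no matching rows. If only type 'A' is of interest and unmatched ids should show 0, a variation could look like the sketch below. The names counts and result and the column name count are choices made here, not part of the original answer.

```r
library(dplyr)

df1 <- tibble(id = 1:5,
              referenceDate = as.Date(c("2018-01-20", "2018-02-03", "2018-05-20",
                                        "2018-08-01", "2018-07-31")))
df2 <- tibble(id = c(1, 1, 1, 2, 2, 4, 4, 5, 5),
              type = c("A", "A", "B", "A", "A", "B", "A", "B", "B"),
              dates = as.Date(c("2018-01-10", "2018-01-23", "2018-01-24",
                                "2018-05-21", "2018-05-18", "2018-06-01",
                                "2018-09-01", "2018-07-10", "2018-07-20")))

# Count type "A" rows dated before the reference date, per id
counts <- df2 %>%
  inner_join(df1, by = "id") %>%
  filter(type == "A", dates < referenceDate) %>%
  count(id, name = "count")

# Attach the counts to df1; ids without any match get 0 instead of NA
result <- df1 %>%
  left_join(counts, by = "id") %>%
  mutate(count = coalesce(count, 0L))

result
```

In the example data only id 1 has a type 'A' row before its reference date (2018-01-10 is before 2018-01-20), so its count is 1 and all other ids get 0.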
