r - how to fill in values on stepped data hierarchy

Is there an elegant/tidy way to fill in the data if there are non-null values to the right? I have a wonky work-around but wanted to know if there was a nice dplyr way to do this.
actual <-
tibble(
a = c("A", NA, NA, NA, NA, NA, NA, "B", NA, NA, NA),
b = c(NA, "A", NA, NA, NA, "C", NA, NA, "E", NA, NA),
c = c(NA, NA, "B", NA, NA, NA, "D", NA, NA, "F", "G"),
d = c(NA, NA, NA, "C", "D", NA, NA, NA, NA, NA, NA)
)
desired <-
tibble(
w = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
x = c(NA, "A", "A", "A", "A", "C", "C", NA, "E", "E", "E"),
y = c(NA, NA, "B", "B", "B", NA, "D", NA, NA, "F", "G"),
z = c(NA, NA, NA, "C", "D", NA, NA, NA, NA, NA, NA)
)

We can use fill() from tidyr together with group_by() from dplyr, as follows.
library(dplyr)
library(tidyr)
dat <- actual %>%
fill(a) %>%
group_by(a) %>%
fill(b) %>%
group_by(b) %>%
fill(c) %>%
group_by(c) %>%
fill(d) %>%
ungroup()
print(dat)
# # A tibble: 11 x 4
# a b c d
# <chr> <chr> <chr> <chr>
# 1 A NA NA NA
# 2 A A NA NA
# 3 A A B NA
# 4 A A B C
# 5 A A B D
# 6 A C NA NA
# 7 A C D NA
# 8 B NA NA NA
# 9 B E NA NA
# 10 B E F NA
# 11 B E G NA
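One thing to keep in mind with the chain above: each group_by() call replaces the previous grouping, so c is filled within b alone rather than within the a/b combination. That gives the right result here, but it could mix groups if the same b value appeared under two different a values. Below is a minimal sketch of a more general variant that fills each column within groups defined by all columns to its left. The helper name fill_stepped is hypothetical, not part of tidyr or dplyr.

```r
library(dplyr)
library(tidyr)
library(tibble)

actual <- tibble(
  a = c("A", NA, NA, NA, NA, NA, NA, "B", NA, NA, NA),
  b = c(NA, "A", NA, NA, NA, "C", NA, NA, "E", NA, NA),
  c = c(NA, NA, "B", NA, NA, NA, "D", NA, NA, "F", "G"),
  d = c(NA, NA, NA, "C", "D", NA, NA, NA, NA, NA, NA)
)

# Hypothetical helper: fill each column downwards, grouped by every
# column to its left (no grouping at all for the first column).
fill_stepped <- function(data, cols) {
  for (i in seq_along(cols)) {
    parents <- cols[seq_len(i - 1)]
    data <- data %>%
      group_by(!!!rlang::syms(parents)) %>%
      fill(all_of(cols[i])) %>%
      ungroup()
  }
  data
}

filled <- fill_stepped(actual, c("a", "b", "c", "d"))
print(filled)
```

On the example data this should reproduce the desired table from the question, with the original column names a to d.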

Related

How to merge rows with duplicate ID, replacing NAs with data in the other row, and leading with data present in both duplicate rows?

I have a df like this:
data <- tribble(~id, ~othervar, ~it_1, ~it_2, ~it_3, ~it_4, ~it_5, ~it_6,
"k01", "lum", "a", "b", "c", "a", NA, NA,
"k01", "lum", NA, NA, NA, NA, "a", "d",
"k02", "nar", "a", "b", "c", "b", NA, NA,
"k03", "lum", "a", "b", "a", "c", NA, NA,
"k03", "lum", "b", "b", "a", NA, "d", "e")
I want to merge rows with duplicated IDs into a single row, where NAs are replaced with the information available in the other row. Where both rows have a non-NA value, either one can be kept. I've tried pivoting the table, but I don't know how to handle this.
I expect something like this:
id othervar it_1 it_2 it_3 it_4 it_5 it_6
k01 lum a b c a a d
k02 nar a b c b NA NA
k03 lum a b a c d e

With ifelse and summarise:
library(dplyr)
data %>%
group_by(id) %>%
summarise(across(everything(), ~ ifelse(any(complete.cases(.x)),
first(.x[!is.na(.x)]),
NA)))
# A tibble: 3 × 8
id othervar it_1 it_2 it_3 it_4 it_5 it_6
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 k01 lum a b c a a d
2 k02 nar a b c b NA NA
3 k03 lum a b a c d e

Without ifelse, using dplyr functions only:
data %>%
group_by(id) %>%
summarise(across(everything(),
~coalesce(.x) %>%
`[`(!is.na(.)) %>%
`[`(1) ))
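Both answers boil down to "take the first non-NA value per id and column". A shorter sketch of the same idea, using plain subsetting inside across(), might look like this. It relies on the base R behaviour that indexing an empty vector with [1] returns NA of the same type.

```r
library(dplyr)
library(tibble)

data <- tribble(
  ~id,   ~othervar, ~it_1, ~it_2, ~it_3, ~it_4, ~it_5, ~it_6,
  "k01", "lum",     "a",   "b",   "c",   "a",   NA,    NA,
  "k01", "lum",     NA,    NA,    NA,    NA,    "a",   "d",
  "k02", "nar",     "a",   "b",   "c",   "b",   NA,    NA,
  "k03", "lum",     "a",   "b",   "a",   "c",   NA,    NA,
  "k03", "lum",     "b",   "b",   "a",   NA,    "d",   "e"
)

# For every column, keep the first non-NA value within each id.
# If a group has only NAs, the empty subset indexed with [1] yields NA.
merged <- data %>%
  group_by(id) %>%
  summarise(across(everything(), ~ .x[!is.na(.x)][1]))

print(merged)
```

Where both duplicate rows hold a value (for example it_1 for k03), the value from the earlier row wins.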

Based on common values on one column, assign same values in another column

I'm an absolute newbie to R.
I have a data frame in which one column (C1) has repeated values, but only one row per C1 value has a value in the other column (C2). I want to copy that value into all the empty/NA cells of C2 that share the same C1 value.
An example makes this clearer:
df:
C1 C2
A NA
A val10
A NA
B val14
B NA
B NA
B NA
C NA
C val9
What I wanted it to look like is
C1 C2
A val10
A val10
A val10
B val14
B val14
B val14
B val14
C val9
C val9
(C2 and C1 don't have any particular pattern or sequence between each other)
I'm assuming I would use group_by on C1, but I'm a bit confused about how to copy the values, whether with transmute/mutate or paste. I tried a few iterations but wasn't successful.

You can use the fill function from tidyr, which makes it really easy to take care of the NAs.
library(tidyr)
library(dplyr)
df %>%
dplyr::group_by(C1) %>%
tidyr::fill(C2) %>% #default direction down
tidyr::fill(C2, .direction = "up")
Output
# A tibble: 9 × 2
# Groups: C1 [3]
C1 C2
<chr> <chr>
1 A val10
2 A val10
3 A val10
4 B val14
5 B val14
6 B val14
7 B val14
8 C val9
9 C val9
Data
df <- structure(list(C1 = c("A", "A", "A", "B", "B", "B", "B", "C",
"C"), C2 = c(NA, "val10", NA, "val14", NA, NA, NA, NA, "val9"
)), class = "data.frame", row.names = c(NA, -9L))
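The two fill() calls can likely be collapsed into one, since fill() accepts .direction = "downup", which fills down first and then up. A small sketch on the same data:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  C1 = c("A", "A", "A", "B", "B", "B", "B", "C", "C"),
  C2 = c(NA, "val10", NA, "val14", NA, NA, NA, NA, "val9"),
  stringsAsFactors = FALSE
)

# Fill down, then up, within each C1 group in a single call.
filled <- df %>%
  group_by(C1) %>%
  fill(C2, .direction = "downup") %>%
  ungroup()

print(filled)
```

Because each C1 group here has exactly one non-NA value, the order of the two directions does not matter; with several distinct values per group, "downup" and "updown" can give different results.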

I doubt this is the most elegant solution, but a tidyverse-style method using a join could be:
df <- tibble::tribble(
~C1, ~C2,
"A", NA,
"A", "val10",
"A", NA,
"B", "val14",
"B", NA,
"B", NA,
"B", NA,
"C", NA,
"C", "val9"
)
df %>%
filter(!is.na(C2)) %>%
rename(C3 = C2) %>%
right_join(df) %>%
select(-C2) %>%
rename(C2 = C3)
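This gives the desired result from the question: every NA in C2 is replaced by the single non-NA value for its C1 group.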

check if subset of rows is NA then move adjacent rows to replace them

I have a dataframe that's a result of combining multiple sheets from excel. The columns did not align properly. I need to check if a subset of rows is all NA. If they are NA, then I need to check if the adjacent equally sized subset has content, and if it does, I need to copy over that row to replace the NAs.
This is what the data looks like from my dput:
structure(list(id = 1:20, A = c(NA, NA, NA, NA, NA, "c", "d",
"q", "p", "m", NA, NA, NA, NA, NA, "k", "o", "i", "a", "b"),
B = c(NA, NA, NA, NA, NA, "h", "a", "f", "b", "e", NA, NA,
NA, NA, NA, "m", "c", "s", "g", "p"), C = c(NA, NA, NA, NA,
NA, "a", "f", "j", "s", "g", NA, NA, NA, NA, NA, "l", "m",
"o", "k", "t"), D = c(NA, NA, NA, NA, NA, "n", "r", "l",
"h", "g", NA, NA, NA, NA, NA, "j", "p", "f", "d", "q"), E = c("j",
"p", "n", "i", "g", NA, NA, NA, NA, NA, "k", "e", "s", "m",
"l", NA, NA, NA, NA, NA), F = c("o", "d", "r", "q", "a",
NA, NA, NA, NA, NA, "h", "s", "f", "j", "k", NA, NA, NA,
NA, NA), G = c("f", "c", "a", "l", "m", NA, NA, NA, NA, NA,
"n", "t", "s", "e", "r", NA, NA, NA, NA, NA), H = c("r",
"c", "h", "i", "j", NA, NA, NA, NA, NA, "f", "e", "b", "l",
"n", NA, NA, NA, NA, NA)), row.names = c(NA, -20L), class = "data.frame")

If each row has the same number of non-missing values, as in the shared example, you can simply drop the NA values from each row.
df1 <- as.data.frame(t(apply(df, 1, na.omit)))
# V1 V2 V3 V4 V5
#1 1 j o f r
#2 2 p d c c
#3 3 n r a h
#4 4 i q l i
#5 5 g a m j
#6 6 c h a n
#7 7 d a f r
#8 8 q f j l
#9 9 p b s h
#10 10 m e g g
#11 11 k h n f
#12 12 e s t e
#13 13 s f s b
#14 14 m j e l
#15 15 l k r n
#16 16 k m l j
#17 17 o c m p
#18 18 i s o f
#19 19 a g k d
#20 20 b p t q
To check the first half of each row and take the second half when the first half is all NA, we can do:
cbind(df[1], t(apply(df[-1], 1, function(x) {
x1 <- (length(x)/2)
if(all(is.na(x[1:x1]))) x[(x1+1):length(x)]
else x[1:x1]
})))
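An alternative sketch, assuming the misaligned columns pair up positionally (A with E, B with F, and so on), is to coalesce each pair with dplyr::coalesce(). The small data frame below is made up for illustration and only mimics the shape of the data in the question. Note that this works element by element rather than checking whole blocks of rows, so it is a slightly different technique from the one above.

```r
library(dplyr)

# Made-up miniature of the misaligned data: values sit either in A/B or in E/F.
df <- data.frame(
  id = 1:4,
  A = c(NA, "c", NA, "k"),
  B = c(NA, "h", NA, "m"),
  E = c("j", NA, "k", NA),
  F = c("o", NA, "h", NA),
  stringsAsFactors = FALSE
)

left  <- c("A", "B")  # columns that should hold the values
right <- c("E", "F")  # assumed positional counterparts

# Take the left value where present, otherwise the paired right value.
fixed <- df
fixed[left] <- Map(coalesce, df[left], df[right])
fixed <- fixed[c("id", left)]

print(fixed)
```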

Create date range based on sparse variable by group in R

I have sparse data with a score taken at periodic intervals and a measurement taken at more regular intervals for multiple subjects, along with the corresponding dates. I would like to generate date ranges based on the score dates for each subject ID, i.e. starting at one score date and ending at the next score date (or starting/ending at the first/last observation for the subject if no score falls on those dates).
I would then like to average the measurement variable within these date ranges. The averaging step should be straightforward but I am stuck on generating the date ranges.
Below is a sample of the data and an example of how I would envision the resulting data
sample data:
structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B",
"B", "B", "C", "C", "C", "D", "D", "D", "D", "D", "D", "D", "D",
"D", "D", "D", "D", "D", "D", "D"), date = c("1/21/2020", "1/27/2020",
"2/1/2020", "2/3/2020", "2/5/2020", "2/6/2020", "2/8/2020", "2/9/2020",
"2/11/2020", "2/12/2020", "2/13/2020", "2/15/2020", "2/18/2020",
"2/20/2020", "2/21/2020", "2/22/2020", "2/25/2020", "2/1/2020",
"2/5/2020", "2/7/2020", "2/8/2020", "2/11/2020", "2/12/2020",
"1/30/2020", "2/10/2020", "2/11/2020", "2/6/2020", "2/7/2020",
"2/8/2020", "2/9/2020", "2/11/2020", "2/13/2020", "2/14/2020",
"2/16/2020", "2/17/2020", "2/20/2020", "2/23/2020", "2/26/2020",
"3/1/2020", "3/3/2020", "3/5/2020"), score = c(0.5, 2, NA, NA,
3, NA, NA, NA, NA, NA, 2.5, NA, NA, 1.5, NA, NA, NA, 3, NA, NA,
2.5, NA, 1, 0.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 14,
NA, NA, 11.5, NA, 9.5, NA), measure = c(0.394160734, 0.722462998,
0.82984815, 0.738432745, 0.321792398, 0.167492308, 0.218020898,
0.929210786, 0.686818585, 0.939678073, 0.708172942, 0.299863884,
0.48216267, 0.290307369, 0.801947902, 0.579418467, 0.78101844,
0.219494852, 0.875129822, 0.517971003, 0.475625007, 0.723003744,
0.257473477, 0.629818537, 0.817369151, 0.628573413, 0.364660834,
0.5971024, 0.002274261, 0.318937617, 0.983917106, 0.685933928,
0.487922831, 0.151769304, 0.392413694, 0.012429414, 0.149627658,
0.011724992, 0.536998203, 0.798399999, 0.763353822)), class = "data.frame", row.names = c(NA,
-41L))
answer data:
structure(list(ID = c("A", "A", "A"), startDate = c("1/21/2020",
"1/27/2020", "2/5/2020"), endDate = c("1/27/2020", "2/5/2020",
"2/13/2020"), score = c(0.5, 2, 3), measure = c(0.394160734,
0.763581298, 0.543835508)), class = "data.frame", row.names = c(NA,
-3L))

Here's a way with dplyr :
library(dplyr)
df %>%
group_by(ID, grp = cumsum(!is.na(score))) %>%
summarise(start_date = first(date),
score = first(score),
measure = mean(measure)) %>%
mutate(end_date = lead(start_date, default = last(start_date))) %>%
select(-grp)
# ID start_date score measure end_date
# <chr> <chr> <dbl> <dbl> <chr>
# 1 A 1/21/2020 0.5 0.394 1/27/2020
# 2 A 1/27/2020 2 0.764 2/5/2020
# 3 A 2/5/2020 3 0.544 2/13/2020
# 4 A 2/13/2020 2.5 0.497 2/20/2020
# 5 A 2/20/2020 1.5 0.613 2/20/2020
# 6 B 2/1/2020 3 0.538 2/8/2020
# 7 B 2/8/2020 2.5 0.599 2/12/2020
# 8 B 2/12/2020 1 0.257 2/12/2020
# 9 C 1/30/2020 0.5 0.692 1/30/2020
#10 D 2/6/2020 NA 0.449 2/17/2020
#11 D 2/17/2020 14 0.185 2/26/2020
#12 D 2/26/2020 11.5 0.274 3/3/2020
#13 D 3/3/2020 9.5 0.781 3/3/2020

Using data.table
library(data.table)
setDT(df)[, .(start_date = first(date),
score = first(score),
measure = mean(measure)),
by = .(ID, grp = cumsum(!is.na(score)))
# by = ID keeps the lead within each subject, matching the dplyr version
][, end_date := shift(start_date, type = 'lead', fill = last(start_date)), by = ID
][, grp := NULL][]
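Both answers keep the dates as character strings, which is fine as long as the rows are already in chronological order. A hedged sketch of the dplyr approach with the dates parsed first (assuming the month/day/year format shown in the question) could look like this; the small obs data frame is made up for illustration.

```r
library(dplyr)

# Made-up sample in the same shape as the question's data.
obs <- data.frame(
  ID      = c("A", "A", "A", "A", "B", "B"),
  date    = c("1/21/2020", "1/27/2020", "2/1/2020", "2/5/2020",
              "2/1/2020", "2/5/2020"),
  score   = c(0.5, 2, NA, 3, 3, NA),
  measure = c(0.4, 0.7, 0.8, 0.3, 0.2, 0.9),
  stringsAsFactors = FALSE
)

ranges <- obs %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%  # real dates, so sorting is chronological
  arrange(ID, date) %>%
  group_by(ID) %>%
  mutate(grp = cumsum(!is.na(score))) %>%                # a new range starts at every score
  group_by(ID, grp) %>%
  summarise(start_date = first(date),
            score      = first(score),
            measure    = mean(measure),
            .groups    = "drop_last") %>%                # stay grouped by ID for lead()
  mutate(end_date = lead(start_date, default = last(start_date))) %>%
  ungroup() %>%
  select(-grp)

print(ranges)
```

As in the answers above, the last range of each subject ends on its own start date.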

Remove NAs after pivot_wider to match up rows

I spread a column using pivot_wider so I could compare two groups (var1 vs var2) using an xy plot. But I can't compare them because there is a corresponding NA in the column.
Here is an example dataframe:
df <- data.frame(group = c("a", "a", "b", "b", "c", "c"), var1 = c(3, NA, 1, NA, 2, NA),
var2 = c(NA, 2, NA, 4, NA, 8))
I would like it to look like:
df2 <- data.frame(group = c("a", "b", "c"), var1 = c(3, 1, 2),
var2 = c( 2, 4, 8))

You can use summarize. But this treats the symptom not the cause. You may have a column in id_cols which is one-to-one with your variable in values_from.
library(dplyr)
df %>%
group_by(group) %>%
summarize_all(sum, na.rm = TRUE)
# A tibble: 3 x 3
group var1 var2
<fct> <dbl> <dbl>
1 a 3 2
2 b 1 4
3 c 2 8
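To illustrate the point about id_cols: the staggered NAs typically appear when a column that is unique per row is left among the identifying columns. The sketch below uses a made-up long table with a hypothetical row_id column to reproduce the problem, then restricts id_cols to group to avoid it.

```r
library(tidyr)

# Made-up long data; row_id is a hypothetical per-row identifier.
long <- data.frame(
  row_id = 1:6,
  group  = c("a", "a", "b", "b", "c", "c"),
  name   = rep(c("var1", "var2"), 3),
  value  = c(3, 2, 1, 4, 2, 8),
  stringsAsFactors = FALSE
)

# row_id is treated as an id column, so every input row gets its own output row.
staggered <- pivot_wider(long, names_from = name, values_from = value)

# Identifying rows by group alone lines var1 and var2 up on one row per group.
aligned <- pivot_wider(long, id_cols = group,
                       names_from = name, values_from = value)

print(staggered)
print(aligned)
```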

This solution is a bit more robust, with a slightly more general data.frame to begin with:
df <- data.frame(col_1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
col_2 = c(1, 3, NA, NA, NA, NA, 4, NA, NA),
col_3 = c(NA, NA, 2, 5, NA, NA, NA, 5, NA),
col_4 = c(NA, NA, NA, NA, 5, 6, NA, NA, 7))
df %>% dplyr::group_by(col_1) %>%
dplyr::summarise_all(purrr::discard, is.na)

Here is a way to do it, assuming you have only two rows per group and exactly one of them is NA in each column:
library(dplyr)
df %>% group_by(group) %>%
summarise(var1=max(var1,na.rm=TRUE),
var2=max(var2,na.rm=TRUE))
With na.rm = TRUE, max() ignores the NAs and is computed on the single remaining non-NA value.
