rowMeans() grouping by variable [duplicate] - r

This is probably trivial, but my data looks like this:
t <- structure(list(var = 1:5, ID = c(1, 2, 1, 1, 3)), class = "data.frame", row.names = c(NA,
-5L))
> t
var ID
1 1 1
2 2 2
3 3 1
4 4 1
5 5 3
I would like to get a mean value for each ID, so my idea was to transform them into this (variable names are not important):
f <- structure(list(ID = 1:3, var.1 = c(1, 2, 5), var.2 = c(2, NA,
NA), var.3 = c(3, NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
> f
ID var.1 var.2 var.3
1 1 1 2 3
2 2 2 NA NA
3 3 5 NA NA
so that I could then calculate the mean for each var.x.
I know it's possible with tidyr (possibly pivot_wider?), but I can't figure out how to group it. How do I get a mean value for each ID?
Thank you in advance

You could use ave to get the mean of var for each ID:
t$mean = ave(t$var, t$ID, FUN = mean)
Result:
var ID mean
1 1 1 2.666667
2 2 2 2.000000
3 3 1 2.666667
4 4 1 2.666667
5 5 3 5.000000
If you want a simple table with the means, you could use aggregate:
aggregate(var ~ ID, data = t, FUN = mean)
ID var
1 1 2.666667
2 2 2.000000
3 3 5.000000
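As a further base R sketch of the same grouping idea (tapply is swapped in here for ave/aggregate, and the name means is only illustrative), the per-ID means can also be returned as a named vector:

```r
# Same data as in the question
t <- data.frame(var = 1:5, ID = c(1, 2, 1, 1, 3))

# Apply mean() to var within each level of ID
means <- tapply(t$var, t$ID, mean)
means
```

The names of the result are the ID values, so means[["1"]] looks up the mean for ID 1.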

If you want to use rowMeans on your t data frame, first reshape it with pivot_wider, then take the mean of each row.
library(tidyverse)
t %>%
group_by(ID) %>%
mutate(row = row_number()) %>%
ungroup() %>%
pivot_wider(names_from = row, values_from = var, names_prefix = "var.") %>%
mutate(mean = rowMeans(select(., starts_with("var")), na.rm = TRUE))
# ID var.1 var.2 var.3 mean
# <dbl> <int> <int> <int> <dbl>
# 1 1 1 3 4 2.67
# 2 2 2 NA NA 2
# 3 3 5 NA NA 5
Or, since t is already in long form, we can just group by ID and take the mean of all values in each group.
t %>%
group_by(ID) %>%
summarise(mean = mean(var))
# ID mean
# <dbl> <dbl>
#1 1 2.67
#2 2 2
#3 3 5
Or, for f, we can apply rowMeans across every column whose name starts with var.
f %>%
mutate(mean = rowMeans(select(., starts_with("var")), na.rm = TRUE))
# ID var.1 var.2 var.3 mean
#1 1 1 2 3 2
#2 2 2 NA NA 2
#3 3 5 NA NA 5

Related

summarise by group returns 0 instead of NA if all values are NA

library(dplyr)
dat <-
data.frame(id = rep(c(1,2,3,4), each = 3),
value = c(NA, NA, NA, 0, 1, 2, 0, 1, NA, 1, 2,3))
dat %>%
dplyr::group_by(id) %>%
dplyr::summarise(value_sum = sum(value, na.rm = T))
# A tibble: 4 x 2
id value_sum
1 0
2 3
3 1
4 6
Is there any way I can return NA if all the entries in a group are NA? For example, id 1 has all its entries as NA, so I want its value_sum to be NA as well.
# A tibble: 4 x 2
id value_sum
1 NA
2 3
3 1
4 6
One way is to use an if/else statement: if all values are NA, return NA; otherwise return sum():
dat %>%
dplyr::group_by(id) %>%
#dplyr::summarise(value_sum = sum(value, na.rm = F)) %>%
summarise(number = if(all(is.na(value))) NA_real_ else sum(value, na.rm = TRUE))
id number
<dbl> <dbl>
1 1 NA
2 2 3
3 3 1
4 4 6
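The if/else rule above can also be sketched in base R. This is only an illustration: sum_or_na is a hypothetical helper name, and tapply stands in for the grouped summarise.

```r
dat <- data.frame(id = rep(c(1, 2, 3, 4), each = 3),
                  value = c(NA, NA, NA, 0, 1, 2, 0, 1, NA, 1, 2, 3))

# Hypothetical helper: NA when every value is missing, otherwise the NA-free sum
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Apply the helper to value within each id
res <- tapply(dat$value, dat$id, sum_or_na)
res
```

Keeping the rule in one small function makes it easy to reuse for other columns.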
We could use fsum
library(collapse)
fsum(dat$value, g = dat$id)
1 2 3 4
NA 3 1 6
Or with dplyr
library(dplyr)
dat %>%
group_by(id) %>%
summarise(number = fsum(value))
# A tibble: 4 × 2
id number
<dbl> <dbl>
1 1 NA
2 2 3
3 3 1
4 4 6

R dplyr::c_across() strange behaviour in rowSums

I'm trying to see how to apply rowSums() to specific columns only.
here is a reprex:
df <- tibble(
"ride" = c("bicycle", "motorcycle", "car", "other"),
"A" = c(1, NA, 1, NA),
"B" = c(NA, 2, NA, 2)
)
I can get the desired result by indexing the columns with [2:3]:
df %>%
mutate(total = rowSums(.[2:3], na.rm = TRUE))
# A tibble: 4 × 4
ride A B total
<chr> <dbl> <dbl> <dbl>
1 bicycle 1 NA 1
2 motorcycle NA 2 2
3 car 1 NA 1
4 other NA 2 2
However, if I try specifying the columns by name, strange results occur:
df %>%
mutate(total = sum(c_across(c("A":"B")), na.rm = TRUE))
# A tibble: 4 × 4
ride A B total
<chr> <dbl> <dbl> <dbl>
1 bicycle 1 NA 6
2 motorcycle NA 2 6
3 car 1 NA 6
4 other NA 2 6
What am I doing wrong?
I can achieve what I want, by something like this:
df %>%
mutate_all(~replace(., is.na(.), 0)) %>%
mutate(total = A + B)
but I'd like to specify column names by passing a vector, so I can change to different combination of column names in future.
Something like this is what I'd like to achieve:
cols_to_sum <- c("A","B")
df %>%
mutate(total = sum(across(cols_to_sum), na.rm = TRUE))
You may use select to specify the columns you want to sum.
library(dplyr)
cols_to_sum <- c("A","B")
df %>%
mutate(total = rowSums(select(., all_of(cols_to_sum)), na.rm = TRUE))
# ride A B total
# <chr> <dbl> <dbl> <dbl>
#1 bicycle 1 NA 1
#2 motorcycle NA 2 2
#3 car 1 NA 1
#4 other NA 2 2
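c_across is meant to be used with rowwise(). Without it, sum() collapses every value in A and B into a single grand total (1 + 2 + 1 + 2 = 6), which is then recycled to every row. With rowwise(), the sum is computed per row:
df %>%
rowwise() %>%
mutate(total = sum(c_across(all_of(cols_to_sum)), na.rm = TRUE)) %>%
ungroup()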
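To see where the 6 comes from, here is a small base R sketch (a plain data.frame stands in for the tibble; grand_total and row_totals are illustrative names) that contrasts the whole-table sum with the per-row sums:

```r
df <- data.frame(ride = c("bicycle", "motorcycle", "car", "other"),
                 A = c(1, NA, 1, NA),
                 B = c(NA, 2, NA, 2))
cols_to_sum <- c("A", "B")

# One number for the whole table: what sum() gives without rowwise()
grand_total <- sum(unlist(df[cols_to_sum]), na.rm = TRUE)

# One number per row: what the question actually asks for
row_totals <- rowSums(df[cols_to_sum], na.rm = TRUE)

grand_total
row_totals
```

Because cols_to_sum is an ordinary character vector, swapping in a different set of column names needs no other change.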

How to replace any NAs in dataframe with the previous value in the same row in R

I have a data frame that contains several scattered NA values. I would like to fill each NA with the value immediately to its left (same row), or with the next value to its right (same row) if there is no non-NA value to the left. zoo::na.locf or tidyr::fill() look like they could help, but they only seem to take the previous/next value from above or below in the same column.
I currently have this code but it's only filling based on above values in same column:
lapply(df, function(x) zoo::na.locf(zoo::na.locf(x, na.rm = FALSE), fromLast = TRUE))
My dataframe df looks like this:
C1 C2 C3 C4
1 2 1 9 2
2 NA 5 1 1
3 1 NA 3 8
4 3 NA NA 4
structure(list(C1 = c(2, NA, 1, 3), C2 = c(1, 5, NA, NA), C3 = c(9,
1, 3, NA), C4 = c(2, 1, 8, 4)), row.names = c(NA, 4L), class = "data.frame")
After filling the NA values, I would like it to look like this:
C1 C2 C3 C4
1 2 1 9 2
2 5 5 1 1
3 1 1 3 8
4 3 3 3 4
This is indeed not the usual way to store data, but if you just transpose you can use tidyr::fill(). Only downside is that it adds quite a bit of wrapping code.
library(tidyverse)
xx <- structure(list(C1 = c(2, NA, 1, 3), C2 = c(1, 5, NA, NA), C3 = c(9,
1, 3, NA), C4 = c(2, 1, 8, 4)), row.names = c(NA, 4L), class = "data.frame")
xx %>%
t() %>%
as_tibble() %>%
tidyr::fill(everything(), .direction = "downup") %>%
t() %>%
as_tibble() %>%
set_names(names(xx))
# A tibble: 4 x 4
# C1 C2 C3 C4
# <dbl> <dbl> <dbl> <dbl>
#1 2 1 9 2
#2 5 5 1 1
#3 1 1 3 8
#4 3 3 3 4
With apply and na.locf0:
library(zoo)
df[] <- t(apply(df, 1, function(x) na.locf0(na.locf0(x), fromLast = TRUE)))
Output:
df
# C1 C2 C3 C4
#1 2 1 9 2
#2 5 5 1 1
#3 1 1 3 8
#4 3 3 3 4
na.locf can directly work on dataframes but it works column-wise. If you want to make it run row-wise you can transpose the dataframe. You can also use fromLast = TRUE to fill the data from opposite direction. Finally, we use coalesce to select the first non-NA value from the two vectors.
library(zoo)
df[] <- dplyr::coalesce(c(t(na.locf(t(df), na.rm = FALSE))),
c(t(na.locf(t(df), na.rm = FALSE, fromLast = TRUE))))
df
# C1 C2 C3 C4
#1 2 1 9 2
#2 5 5 1 1
#3 1 1 3 8
#4 3 3 3 4
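For comparison, the same left-then-right fill can be sketched without zoo. This is a minimal base R illustration, not one of the answers above; fill_row is a hypothetical helper name.

```r
df <- data.frame(C1 = c(2, NA, 1, 3), C2 = c(1, 5, NA, NA),
                 C3 = c(9, 1, 3, NA), C4 = c(2, 1, 8, 4))

# Hypothetical helper: carry the last non-NA value forward along one row;
# leading NAs take the first non-NA value found to their right
fill_row <- function(x) {
  ok <- !is.na(x)
  if (!any(ok)) return(x)   # an all-NA row has nothing to fill from
  idx <- cumsum(ok)         # position of the last non-NA value seen so far
  idx[idx == 0] <- 1        # nothing to the left yet: use the first non-NA value
  x[ok][idx]
}

# apply() returns one column per input row, so transpose back
df[] <- t(apply(df, 1, fill_row))
df
```

Like the apply answer above, this assumes all columns are numeric, since apply converts the data frame to a matrix.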

Summing values in R based on column value with dplyr

I have a data set that has the following information:
Subject Value1 Value2 Value3 UniqueNumber
001 1 0 1 3
002 0 1 1 2
003 1 1 1 1
If the value of UniqueNumber > 0, I would like to use dplyr to sum each subject's values from the first value column through the column indexed by UniqueNumber, and calculate their mean. So for Subject 001, sum = 2 and mean = .67.
total <- numeric(nrow(Data))
average <- numeric(nrow(Data))
for (i in seq_len(nrow(Data))) {
  n <- Data$UniqueNumber[i]
  if (n > 0) {
    # Value1 .. Value<n> sit in columns 2 through n + 1
    vals <- as.numeric(Data[i, 2:(n + 1)])
    total[i] <- sum(vals)
    average[i] <- mean(vals)
  }
}
Edit: I am only looking to sum through the number of columns listed in the 'UniqueNumber' column. So this is looping through every row and stopping at column listed in 'UniqueNumber'.
Example: Row 2 with Subject 002 should sum up the values in columns 'Value1' and 'Value2', while Row 3 with Subject 003 should only sum the value in column 'Value1'.
Not a tidyverse fan/expert, but I would try this using long format. Then, just filter by row index per group and then run any functions you want on a single column (much easier this way).
library(tidyr)
library(dplyr)
Data %>%
gather(variable, value, -Subject, -UniqueNumber) %>% # long format
group_by(Subject) %>% # group by Subject in order to get row counts
filter(row_number() <= UniqueNumber) %>% # filter by row index
summarise(Mean = mean(value), Total = sum(value)) %>% # do the calculations
ungroup()
## A tibble: 3 x 3
# Subject Mean Total
# <int> <dbl> <int>
# 1 1 0.667 2
# 2 2 0.5 1
# 3 3 1 1
A very similar way to achieve this could be filtering by the integers in the column names. The filter step comes before the group_by so it could potentially increase performance (or not?) but it is less robust as I'm assuming that the cols of interest are called "Value#"
Data %>%
gather(variable, value, -Subject, -UniqueNumber) %>% #long format
filter(as.numeric(gsub("Value", "", variable, fixed = TRUE)) <= UniqueNumber) %>% #filter
group_by(Subject) %>% # group by Subject
summarise(Mean = mean(value), Total = sum(value)) %>% # do the calculations
ungroup()
## A tibble: 3 x 3
# Subject Mean Total
# <int> <dbl> <int>
# 1 1 0.667 2
# 2 2 0.5 1
# 3 3 1 1
Just for fun, adding a data.table solution
library(data.table)
data.table(Data) %>%
melt(id = c("Subject", "UniqueNumber")) %>%
.[as.numeric(gsub("Value", "", variable, fixed = TRUE)) <= UniqueNumber,
.(Mean = round(mean(value), 3), Total = sum(value)),
by = Subject]
# Subject Mean Total
# 1: 1 0.667 2
# 2: 2 0.500 1
# 3: 3 1.000 1
Here is another method that uses tidyr::nest to collect the Values columns into a list so that we can iterate through the table with map2. In each row, we select the correct values from the Values list-col and take the sum or mean respectively.
library(tidyverse)
tbl <- read_table2(
"Subject Value1 Value2 Value3 UniqueNumber
001 1 0 1 3
002 0 1 1 2
003 1 1 1 1"
)
tbl %>%
filter(UniqueNumber > 0) %>%
nest(starts_with("Value"), .key = "Values") %>%
mutate(
sum = map2_dbl(UniqueNumber, Values, ~ sum(.y[1:.x], na.rm = TRUE)),
mean = map2_dbl(UniqueNumber, Values, ~ mean(as.numeric(.y[1:.x]), na.rm = TRUE)),
)
#> # A tibble: 3 x 5
#> Subject UniqueNumber Values sum mean
#> <chr> <dbl> <list> <dbl> <dbl>
#> 1 001 3 <tibble [1 × 3]> 2 0.667
#> 2 002 2 <tibble [1 × 3]> 1 0.5
#> 3 003 1 <tibble [1 × 3]> 1 1
Created on 2019-02-14 by the reprex package (v0.2.1)
Check this solution:
df %>%
gather(key, val, Value1:Value3) %>%
group_by(Subject) %>%
mutate(
Sum = sum(val[c(1:(UniqueNumber[1]))]),
Mean = mean(val[c(1:(UniqueNumber[1]))]),
) %>%
spread(key, val)
Output:
Subject UniqueNumber Sum Mean Value1 Value2 Value3
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 001 3 2 0.667 1 0 1
2 002 2 1 0.5 0 1 1
3 003 1 1 1 1 1 1
The OP may only be interested in a dplyr solution, but for comparison and for future readers, here is a base R option using mapply:
cols <- grep("^Value", names(df))
cbind(df, t(mapply(function(x, y) {
if (y > 0) {
vals = as.numeric(df[x, cols[1:y]])
c(Sum = sum(vals, na.rm = TRUE), Mean = mean(vals, na.rm = TRUE))
}
else
c(0, 0)
},1:nrow(df), df$UniqueNumber)))
# Subject Value1 Value2 Value3 UniqueNumber Sum Mean
#1 1 1 0 1 3 2 0.667
#2 2 0 1 1 2 1 0.500
#3 3 1 1 1 1 1 1.000
Here we subset each row based on its respective UniqueNumber and then calculate its sum and mean if the UniqueNumber value is greater than 0; otherwise we return 0 for both.
A solution that uses purrr::map_df (purrr is from the same author as dplyr).
library(dplyr)
library(purrr)
l_dat <- split(dat, dat$Subject) # first we need to split in a list
map_df(l_dat, function(x) {
n_cols <- x$UniqueNumber # finds the number of columns
x <- as.numeric(x[2:(n_cols+1)]) # subsets x and converts to numeric
mean(x, na.rm=T) # mean to be returned
})
# output:
# # A tibble: 1 x 3
# `1` `2` `3`
# <dbl> <dbl> <dbl>
# 1 0.667 0.5 1
Another option (output format closer to a dplyr solution):
map_df(l_dat, function(x) {
n_cols <- x$UniqueNumber
id <- x$Subject
x <- as.numeric(x[2:(n_cols+1)])
tibble(id=id, mean_values=mean(x, na.rm=T))
})
# # A tibble: 3 x 2
# id mean_values
# <int> <dbl>
# 1 1 0.667
# 2 2 0.5
# 3 3 1
Just as an example I added a sum() then divided by length(x)-1:
map_df(l_dat, function(x) {
n_cols <- x$UniqueNumber
id <- x$Subject
x <- as.numeric(x[2:(n_cols+1)])
tibble(id=id,
mean_values=sum(x, na.rm=T)/(length(x)-1)) # change here
})
# # A tibble: 3 x 2
# id mean_values
# <int> <dbl>
# 1 1 1.
# 2 2 1.
# 3 3 Inf #beware of this case where you end up dividing by 0
Data:
tt <- "Subject Value1 Value2 Value3 UniqueNumber
001 1 0 1 3
002 0 1 1 2
003 1 1 1 1"
dat <- read.table(text=tt, header=T)
I think the easiest way is to set to NA the values that should be ignored (those in columns beyond UniqueNumber), then use rowSums and rowMeans with na.rm = TRUE on the appropriate subset of columns.
Data[2:4][col(Data[2:4]) > Data[[5]]] <- NA
Data
# Subject Value1 Value2 Value3 UniqueNumber
# 1 1 1 0 1 3
# 2 2 0 1 NA 2
# 3 3 1 NA NA 1
library(dplyr)
Data%>%
mutate(sum = rowSums(.[2:4], na.rm = TRUE),
mean = rowMeans(.[2:4], na.rm = TRUE))
# Subject Value1 Value2 Value3 UniqueNumber sum mean
# 1 1 1 0 1 3 2 0.6666667
# 2 2 0 1 NA 2 1 0.5000000
# 3 3 1 NA NA 1 1 1.0000000
or transform(Data, sum = rowSums(Data[2:4],na.rm = TRUE), mean = rowMeans(Data[2:4],na.rm = TRUE)) to stay in base R.
data
Data <- structure(
list(Subject = 1:3,
Value1 = c(1L, 0L, 1L),
Value2 = c(0L, 1L, NA),
Value3 = c(1L, NA, NA),
UniqueNumber = c(3L, 2L, 1L)),
.Names = c("Subject","Value1", "Value2", "Value3", "UniqueNumber"),
row.names = c(NA, 3L), class = "data.frame")
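The masking idea can also be sketched without overwriting Data. This is only an illustration, assuming the data as shown in the question (values still present beyond UniqueNumber); vals and keep are illustrative names.

```r
Data <- data.frame(Subject = 1:3,
                   Value1 = c(1, 0, 1),
                   Value2 = c(0, 1, 1),
                   Value3 = c(1, 1, 1),
                   UniqueNumber = c(3, 2, 1))

# Work on a copy of the value columns so Data itself keeps its values
vals <- as.matrix(Data[2:4])

# TRUE where the column index is within that row's UniqueNumber
keep <- col(vals) <= Data$UniqueNumber
vals[!keep] <- NA

Data$sum <- rowSums(vals, na.rm = TRUE)
Data$mean <- rowMeans(vals, na.rm = TRUE)
Data
```

The comparison works because UniqueNumber is recycled down each column of the index matrix, so every cell is compared with the limit for its own row.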

Replace all NA values for variable with one row equal to 0

Slightly difficult to phrase, as far as I saw none of the similar questions answered my problem.
I have a data.frame such as:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
id val
1 a NA
2 a NA
3 a NA
4 a NA
5 b 1
6 b 2
7 b 2
8 b 3
9 c NA
10 c 2
11 c NA
12 c 3
and I want to get rid of all the NA values (easy enough using e.g. filter() ) but make sure that if this removes all of one id value (in this case it removes every instance of "a") that one extra row is inserted of (e.g.) a = 0
so that:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c 2
7 c 3
obviously easy enough to do this in a roundabout way but I was wondering if there's a tidy/elegant way to do this. I thought tidyr::complete() might help but not entirely sure how to apply it to a case like this
I don't care about the order of the rows
Cheers!
edit: updated with clearer desired output. might make desired answers submitted before that a bit less clear
Another idea using dplyr,
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(val = ifelse(row_number() == 1 & all(is.na(val)), 0, val)) %>%
na.omit()
which gives,
# A tibble: 5 x 2
# Groups: id [2]
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
We may do
df1 %>% group_by(id) %>% do(if(all(is.na(.$val))) replace(.[1, ], 2, 0) else na.omit(.))
# A tibble: 5 x 2
# Groups: id [2]
# id val
# <fct> <dbl>
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
After grouping by id, if everything in val is NA, then we leave only the first row with the second element replaced by 0, otherwise the same data is returned after applying na.omit.
In a more readable format that would be
df1 %>% group_by(id) %>%
do(if(all(is.na(.$val))) data.frame(id = .$id[1], val = 0) else na.omit(.))
(Here I presume that you indeed want to get rid of all NA values; otherwise there is no need for na.omit.)
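The same per-group rule can be sketched in base R with split and lapply. This is a minimal illustration using the updated example data with group c; fix_group is a hypothetical helper name.

```r
df1 <- data.frame(id = rep(c("a", "b", "c"), each = 4),
                  val = c(NA, NA, NA, NA, 1, 2, 2, 3, NA, 2, NA, 3))

# Hypothetical helper: one zero row for an all-NA group,
# otherwise the group with its NA rows dropped
fix_group <- function(g) {
  if (all(is.na(g$val))) {
    data.frame(id = g$id[1], val = 0)
  } else {
    g[!is.na(g$val), ]
  }
}

res <- do.call(rbind, lapply(split(df1, df1$id), fix_group))
rownames(res) <- NULL
res
```

With this data the result has seven rows: a single a row with val 0, the four b rows, and the two non-NA c rows.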
df1[is.na(df1)] <- 0
df1[!(duplicated(df1$id) & df1$val == 0), ]
id val
1 a 0
5 b 1
6 b 2
7 b 2
8 b 3
Base R option is to find groups with all NAs and transform them by changing their val to 0 and select only unique rows so that there is only one row per group. We rbind this dataframe with the groups which are !all_NA.
all_NA <- with(df1, ave(is.na(val), id, FUN = all))
rbind(unique(transform(df1[all_NA, ], val = 0)), df1[!all_NA, ])
# id val
#1 a 0
#5 b 1
#6 b 2
#7 b 2
#8 b 3
The dplyr option looks ugly, but one way is to make two sets of data frames: one with the groups whose values are all NA and the other with the groups whose values are all non-NA. For each all-NA group we add a row with its id and val set to 0, and bind this to the other set.
library(dplyr)
bind_rows(df1 %>%
group_by(id) %>%
filter(all(!is.na(val))),
df1 %>%
group_by(id) %>%
filter(all(is.na(val))) %>%
ungroup() %>%
summarise(id = unique(id),
val = 0)) %>%
arrange(id)
# id val
# <fct> <dbl>
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
Changed the df to make example more exhaustive -
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(case=sum(is.na(val))==n(), row_num=row_number() ) %>%
mutate(val=ifelse(is.na(val)&case,0,val)) %>%
filter( !(case&row_num!=1) ) %>%
select(id, val)
Output
id val
<fct> <dbl>
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
Another base approach, one that doesn't maintain the order of the rows and takes advantage of factors remembering lost values:
df1 <- na.omit(df1)
df1 <- rbind(
df1,
data.frame(
id = levels(df1$id)[!levels(df1$id) %in% df1$id],
val = 0)
)
I do personally prefer the dplyr approach given by Sotos, as I don't like rbind-ing data.frames back together so it's a matter of taste, but this isn't unbearably complicated by my eye. It's easy enough to adapt to a character id column with a unique(df1$id) variable.
Here is an option too:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(~ replace(., is.na(.), 0)) %>%
slice(4:nrow(.))
This gives:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
Alternative:
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate_all(~ replace(., is.na(.), 0)) %>%
unique()
UPDATE based on other requirements:
Some users suggested testing on this data frame. Note that this answer hard-codes the group name and row positions, so it assumes you have inspected the data by hand; it may be less useful if you cannot do that, but here goes:
df1 <- data.frame(id = rep(c("a", "b","c"), each = 4), val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1 %>%
mutate_if(is.factor,as.character) %>%
mutate(val=ifelse(id=="a",0,val)) %>%
slice(4:nrow(.))
This yields:
id val
1 a 0
2 b 1
3 b 2
4 b 2
5 b 3
6 c NA
7 c 2
8 c NA
9 c 3
Here is a base R solution.
res <- lapply(split(df1, df1$id), function(DF){
if(anyNA(DF$val)) {
i <- is.na(DF$val)
DF$val[i] <- 0
DF <- rbind(DF[i & !duplicated(DF[i, ]), ], DF[!i, ])
}
DF
})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
# id val
#1 a 0
#2 b 1
#3 b 2
#4 b 2
#5 b 3
Edit.
A dplyr solution could be the following.
It was tested with the original dataset posted by the OP, with the dataset in Vivek Kalyanarangan's answer and with the dataset in markus' comment, renamed df2 and df3, respectively.
library(dplyr)
na2zero <- function(DF){
DF %>%
group_by(id) %>%
mutate(val = ifelse(is.na(val), 0, val),
crit = val == 0 & duplicated(val)) %>%
filter(!crit) %>%
select(-crit)
}
na2zero(df1)
na2zero(df2)
na2zero(df3)
One may try this :
df1 = data.frame(id = rep(c("a", "b","c"), each = 4),
val = c(NA, NA, NA, NA, 1, 2, 2, 3,NA,2,NA,3))
df1
# id val
#1 a NA
#2 a NA
#3 a NA
#4 a NA
#5 b 1
#6 b 2
#7 b 2
#8 b 3
#9 c NA
#10 c 2
#11 c NA
#12 c 3
The task is to remove all rows for an id if and only if every val for that id is NA, and to add a new row with that id and val = 0.
In this example, that is id = a.
Note: c also has some NA values, but not all of its values are NA, so we only need to remove the rows of c where val is NA.
So let's create another column, say val2, where 0 means the group is all NA and 1 otherwise.
library(dplyr)
df1 = df1 %>%
group_by(id) %>%
mutate(val2 = if_else(condition = all(is.na(val)),true = 0, false = 1))
df1
# A tibble: 12 x 3
# Groups: id [3]
# id val val2
# <fct> <dbl> <dbl>
#1 a NA 0
#2 a NA 0
#3 a NA 0
#4 a NA 0
#5 b 1 1
#6 b 2 1
#7 b 2 1
#8 b 3 1
#9 c NA 1
#10 c 2 1
#11 c NA 1
#12 c 3 1
Get the ids whose val is NA in every row.
all_na = unique(df1$id[df1$val2 == 0])
Then remove the rows with val = NA from the dataframe df1.
df1 = na.omit(df1)
df1
# A tibble: 6 x 3
# Groups: id [2]
# id val val2
# <fct> <dbl> <dbl>
# 1 b 1 1
# 2 b 2 1
# 3 b 2 1
# 4 b 3 1
# 5 c 2 1
# 6 c 3 1
And create a new dataframe with ids in all_na and val = 0
all_na_df = data.frame(id = all_na, val = 0)
all_na_df
# id val
# 1 a 0
then combine these two dataframes.
df1 = bind_rows(all_na_df, df1[,c('id', 'val')])
df1
# id val
# 1 a 0
# 2 b 1
# 3 b 2
# 4 b 2
# 5 b 3
# 6 c 2
# 7 c 3
Hope this helps and Edits are most welcomed :-)
