How to take the maximum of several columns in R/dplyr [duplicate]

This question already has answers here:
Calculate max value across multiple columns by multiple groups
(5 answers)
Closed 2 years ago.
I have data which looks basically like this:
id <- c(1:5)
VolumeA <- c(12, NA, NA, NA, NA)
VolumeB <- c(NA, 34, NA, NA, NA)
VolumeC <- c(NA, NA, 56, NA, NA)
VolumeD <- c(NA, NA, NA, 78, NA)
VolumeE <- c(NA, NA, NA, NA, 90)
df_now <- tibble(id, VolumeA, VolumeB, VolumeC, VolumeD, VolumeE)
df_now
# A tibble: 5 x 6
id VolumeA VolumeB VolumeC VolumeD VolumeE
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 12 NA NA NA NA
2 2 NA 34 NA NA NA
3 3 NA NA 56 NA NA
4 4 NA NA NA 78 NA
5 5 NA NA NA NA 90
In the IRL dataset, there are MANY more Volume[label] columns, but in each row I only need one of them: the largest one. So I want to create a new variable which has the largest value:
Volume <- c(12, 34, 56, 78, 90)
df_desired <- cbind(df_now, Volume)
df_desired
id VolumeA VolumeB VolumeC VolumeD VolumeE Volume
1 1 12 NA NA NA NA 12
2 2 NA 34 NA NA NA 34
3 3 NA NA 56 NA NA 56
4 4 NA NA NA 78 NA 78
5 5 NA NA NA NA 90 90
After looking at the dplyr documentation, I tried this...
library(tidyverse)
df_try <- df_now %>%
  mutate(Volume = across(contains("Volume"), max, na.rm = TRUE))
...but got back a tibble of data, not a single column. Can someone tell me how to do this properly?
(Please assume, due to issues with my IRL data too complicated to explain here, that I cannot just gather and spread my data. I want to use a conditional mutate.)

Since you have "MANY more Volume[label] columns", any solution that works over each row (rowwise) or individually on each column (with reduce or Reduce) is going to be much slower than necessary.
df_now %>%
  mutate(Volume = do.call(pmax, c(select(., starts_with('Volume')), na.rm = TRUE)))
# # A tibble: 5 x 7
# id VolumeA VolumeB VolumeC VolumeD VolumeE Volume
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 12 NA NA NA NA 12
# 2 2 NA 34 NA NA NA 34
# 3 3 NA NA 56 NA NA 56
# 4 4 NA NA NA 78 NA 78
# 5 5 NA NA NA NA 90 90
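For the curious: do.call(pmax, c(select(...), na.rm = TRUE)) constructs a single call pmax(VolumeA, VolumeB, ..., VolumeE, na.rm = TRUE), so pmax is evaluated exactly once. The same thing in base R, outside the pipe (a sketch):
# Collect the Volume columns as a list and build one pmax() call over them.
vol_cols <- df_now[startsWith(names(df_now), "Volume")]
df_now$Volume <- do.call(pmax, c(vol_cols, na.rm = TRUE))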
Proof of relative improvement:
Using Reduce or purrr::reduce, or anything else that iterates per column (with nc columns, it will iterate nc-1 times):
mypmax <- function(...) { message("mypmax"); pmax(...) }
df_now %>%
  mutate(Volume = reduce(select(., starts_with('Volume')), mypmax, na.rm = TRUE))
# mypmax
# mypmax
# mypmax
# mypmax
# # A tibble: 5 x 7
# id VolumeA VolumeB VolumeC VolumeD VolumeE Volume
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 12 NA NA NA NA 12
# 2 2 NA 34 NA NA NA 34
# 3 3 NA NA 56 NA NA 56
# 4 4 NA NA NA 78 NA 78
# 5 5 NA NA NA NA 90 90
Anything rowwise does this once per row, which is likely even worse (assuming more rows than columns in your data):
mymax <- function(...) { message("mymax"); max(...) }
df_now %>%
  rowwise() %>%
  mutate(Volume = mymax(c_across(starts_with('Volume')), na.rm = TRUE))
# mymax
# mymax
# mymax
# mymax
# mymax
# # A tibble: 5 x 7
# # Rowwise:
# id VolumeA VolumeB VolumeC VolumeD VolumeE Volume
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 12 NA NA NA NA 12
# 2 2 NA 34 NA NA NA 34
# 3 3 NA NA 56 NA NA 56
# 4 4 NA NA NA 78 NA 78
# 5 5 NA NA NA NA 90 90
Do it once across all columns, all rows:
mypmax <- function(...) { message("mypmax"); pmax(...) }
df_now %>%
  mutate(Volume = do.call(mypmax, c(select(., starts_with('Volume')), na.rm = TRUE)))
# mypmax
# # A tibble: 5 x 7
# id VolumeA VolumeB VolumeC VolumeD VolumeE Volume
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 12 NA NA NA NA 12
# 2 2 NA 34 NA NA NA 34
# 3 3 NA NA 56 NA NA 56
# 4 4 NA NA NA 78 NA 78
# 5 5 NA NA NA NA 90 90
The benchmark differences are minor at this scale, but will be more dramatic with larger data:
microbenchmark::microbenchmark(
  red = df_now %>% mutate(Volume = reduce(select(., starts_with('Volume')), pmax, na.rm = TRUE)),
  row = df_now %>% rowwise() %>% mutate(Volume = max(c_across(starts_with('Volume')), na.rm = TRUE)),
  sgl = df_now %>% mutate(Volume = do.call(pmax, c(select(., starts_with('Volume')), na.rm = TRUE)))
)
# Unit: milliseconds
#  expr    min      lq     mean  median      uq     max neval
#   red 4.9736 5.36240 7.240561 5.68010 6.19915 70.7482   100
#   row 4.5813 5.02020 6.082047 5.34460 5.70345 63.1166   100
#   sgl 3.8270 4.18605 5.803043 4.43215 4.76030 65.7217   100
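To actually see the gap grow, the benchmark can be rerun on a larger, wider table; this sketch (with arbitrary, made-up dimensions) assumes the tidyverse is loaded as above:
# 100,000 rows x 50 Volume-style columns of random data.
set.seed(42)
big <- as_tibble(setNames(as.data.frame(matrix(runif(1e5 * 50), ncol = 50)),
                          paste0("Volume", 1:50)))
microbenchmark::microbenchmark(
  red = big %>% mutate(Volume = reduce(select(., starts_with('Volume')), pmax, na.rm = TRUE)),
  sgl = big %>% mutate(Volume = do.call(pmax, c(select(., starts_with('Volume')), na.rm = TRUE))),
  times = 10
)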

We can use pmax (the pmax solution was first posted here). Note that the relative improvement with do.call is very small.
library(dplyr)
library(purrr)
df_now %>%
  mutate(Volume = reduce(select(., starts_with('Volume')), pmax, na.rm = TRUE))
# A tibble: 5 x 7
# id VolumeA VolumeB VolumeC VolumeD VolumeE Volume
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 12 NA NA NA NA 12
#2 2 NA 34 NA NA NA 34
#3 3 NA NA 56 NA NA 56
#4 4 NA NA NA 78 NA 78
#5 5 NA NA NA NA 90 90
Or with c_across and max (using only tidyverse approaches)
df_now %>%
  rowwise() %>%
  mutate(Volume = max(c_across(starts_with('Volume')), na.rm = TRUE))
# A tibble: 5 x 7
# Rowwise:
# id VolumeA VolumeB VolumeC VolumeD VolumeE Volume
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 12 NA NA NA NA 12
#2 2 NA 34 NA NA NA 34
#3 3 NA NA 56 NA NA 56
#4 4 NA NA NA 78 NA 78
#5 5 NA NA NA NA 90 90
Benchmarks
system.time({df_now %>% mutate(Volume = reduce(select(., starts_with('Volume')), pmax, na.rm = TRUE))})
# user system elapsed
# 0.023 0.006 0.029
system.time({df_now %>% rowwise %>% mutate(Volume = max(c_across(starts_with('Volume')), na.rm = TRUE))})
# user system elapsed
# 0.012 0.002 0.015
system.time({df_now %>% mutate(Volume = do.call(pmax, c(select(., starts_with('Volume')), na.rm = TRUE)))})
# user system elapsed
# 0.011 0.001 0.011
NOTE: there is not much difference in the timings.
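One behavioural difference between the two approaches is worth knowing: on a row where every Volume value is NA, max(..., na.rm = TRUE) returns -Inf with a warning, while pmax(..., na.rm = TRUE) keeps NA:
max(c(NA, NA), na.rm = TRUE)   # -Inf, with a warning
pmax(NA, NA, na.rm = TRUE)     # NA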

Related

Filter a data frame if at least 3 columns meet a condition

I have a data frame like this:
df <- data.frame(student_name = c('U','V','X','Y','Z'),
                 grade = c('AA','CC','DD','AB','BB'),
                 math_marks = c(40,80,38,97,65),
                 eng_marks = c(95,78,36,41,25),
                 sci_marks = c(56,25,36,87,15),
                 Point_A = c(1,1,1,1,NA),
                 Point_B = c(NA,1,NA,1,1),
                 Point_C = c(NA,1,NA,NA,NA),
                 Point_D = c(NA,NA,NA,NA,1),
                 Point_E = c(NA,1,NA,NA,1))
I need to add a column called "Point" based on the values of columns Point_A to Point_E: it should be 1 if at least 3 of those columns equal 1.
Expected output:
df <- data.frame(student_name = c('U','V','X','Y','Z'),
                 grade = c('AA','CC','DD','AB','BB'),
                 math_marks = c(40,80,38,97,65),
                 eng_marks = c(95,78,36,41,25),
                 sci_marks = c(56,25,36,87,15),
                 Point_A = c(1,1,1,1,NA),
                 Point_B = c(NA,1,NA,1,1),
                 Point_C = c(NA,1,NA,NA,NA),
                 Point_D = c(NA,NA,NA,NA,1),
                 Point_E = c(NA,1,NA,NA,1),
                 Point = c(NA,1,NA,NA,1))
So far I have been doing this for every possible combination of 3 columns:
df %>% filter(Point_A == 1, Point_B == 1, Point_C == 1)
Is there any other way to do this?
To subset down to the rows with that condition, use rowSums with across to sum the 1's by row:
df %>% filter(rowSums(across(starts_with("Point")), na.rm = TRUE) >= 3)
## student_name grade math_marks eng_marks sci_marks Point_A Point_B Point_C Point_D Point_E
## 1 V CC 80 78 25 1 1 1 NA 1
## 2 Z BB 65 25 15 NA 1 NA 1 1
or to add a 0/1 Point column indicating whether that row satisfies the condition:
df %>% mutate(Point = +(rowSums(across(starts_with("Point")), na.rm = TRUE) >= 3))
## student_name grade math_marks eng_marks sci_marks Point_A Point_B Point_C Point_D Point_E Point
## 1 U AA 40 95 56 1 NA NA NA NA 0
## 2 V CC 80 78 25 1 1 1 NA 1 1
## 3 X DD 38 36 36 1 NA NA NA NA 0
## 4 Y AB 97 41 87 1 1 NA NA NA 0
## 5 Z BB 65 25 15 NA 1 NA 1 1 1
An ifelse with the rowwise option:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(Point = ifelse(rowSums(across(Point_A:Point_E, ~ .x == 1), na.rm = TRUE) >= 3, 1, NA))
Output:
# A tibble: 5 × 11
# Rowwise:
student_name grade math_marks eng_marks sci_marks Point_A Point_B Point_C Point_D Point_E Point
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 U AA 40 95 56 1 NA NA NA NA NA
2 V CC 80 78 25 1 1 1 NA 1 1
3 X DD 38 36 36 1 NA NA NA NA NA
4 Y AB 97 41 87 1 1 NA NA NA NA
5 Z BB 65 25 15 NA 1 NA 1 1 1
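As an aside, rowSums() is already vectorised over rows, so the rowwise() step above is probably unnecessary; this sketch should give the same Point column without it:
df %>%
  mutate(Point = ifelse(rowSums(across(Point_A:Point_E, ~ .x == 1), na.rm = TRUE) >= 3, 1, NA))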

Group by id, count the consecutive NA's, and restart counting when a new series of NA's is encountered

I have a dataframe like this:
df <- tibble(id = c(rep('A', 10), rep('B', 10)),
             value = c(1:3, rep(NA, 2), 1:2, rep(NA, 3), 1, rep(NA, 4), 1:3, rep(NA, 2)))
I need to count the number of consecutive NA's in the value column. The count needs to be grouped by id, and it needs to restart at 1 every time a new NA or a new series of NA's is encountered. The expected output should look like this:
df$expected_output <- c(rep(NA, 3), 1:2, rep(NA, 2), 1:3, NA, 1:4, rep(NA, 3), 1:2)
If anyone can give me a dplyr solution that would also be great :)
I've tried a few things but nothing is giving any sort of sensible result. Thanks in advance!
A solution using dplyr and data.table.
library(dplyr)
library(data.table)
df2 <- df %>%
  group_by(id) %>%
  mutate(info = rleid(value)) %>%
  group_by(id, info) %>%
  mutate(expected_output = row_number()) %>%
  ungroup() %>%
  mutate(expected_output = ifelse(!is.na(value), NA, expected_output)) %>%
  select(-info)
df2
# # A tibble: 20 x 3
# id value expected_output
# <chr> <dbl> <int>
# 1 A 1 NA
# 2 A 2 NA
# 3 A 3 NA
# 4 A NA 1
# 5 A NA 2
# 6 A 1 NA
# 7 A 2 NA
# 8 A NA 1
# 9 A NA 2
# 10 A NA 3
# 11 B 1 NA
# 12 B NA 1
# 13 B NA 2
# 14 B NA 3
# 15 B NA 4
# 16 B 1 NA
# 17 B 2 NA
# 18 B 3 NA
# 19 B NA 1
# 20 B NA 2
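The same logic can stay entirely in data.table, which may matter at scale (a sketch; rowid() numbers the rows within each run found by rleid()):
library(data.table)
dt <- as.data.table(df)
# Within each id, number the rows of each run of values, keep it only for NA rows.
dt[, expected_output := fifelse(is.na(value), rowid(rleid(value)), NA_integer_), by = id]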
We can use rle to get the lengths of runs that are or are not NA, then use purrr::map2 to apply seq to the NA runs (producing the growing count) and fill the other runs with NA values using rep.
library(tidyverse)
count_na <- function(x) {
  r <- rle(is.na(x))
  consec <- map2(r$lengths, r$values, ~ if (.y) seq(.x) else rep(NA, .x))
  unlist(consec)
}
df %>%
  mutate(expected_output = count_na(value))
#> # A tibble: 20 × 3
#> id value expected_output
#> <chr> <dbl> <int>
#> 1 A 1 NA
#> 2 A 2 NA
#> 3 A 3 NA
#> 4 A NA 1
#> 5 A NA 2
#> 6 A 1 NA
#> 7 A 2 NA
#> 8 A NA 1
#> 9 A NA 2
#> 10 A NA 3
#> 11 B 1 NA
#> 12 B NA 1
#> 13 B NA 2
#> 14 B NA 3
#> 15 B NA 4
#> 16 B 1 NA
#> 17 B 2 NA
#> 18 B 3 NA
#> 19 B NA 1
#> 20 B NA 2
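If an NA run could ever span the boundary between two ids, the same helper can be applied per group; the sample data happens not to need it, so the result here is identical:
df %>%
  group_by(id) %>%
  mutate(expected_output = count_na(value)) %>%
  ungroup()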
Here is a solution using rle:
x <- rle(is.na(df$value))
df$new[is.na(df$value)] <- sequence(x$lengths[x$values])
# A tibble: 20 x 3
id value new
<chr> <dbl> <int>
1 A 1 NA
2 A 2 NA
3 A 3 NA
4 A NA 1
5 A NA 2
6 A 1 NA
7 A 2 NA
8 A NA 1
9 A NA 2
10 A NA 3
11 B 1 NA
12 B NA 1
13 B NA 2
14 B NA 3
15 B NA 4
16 B 1 NA
17 B 2 NA
18 B 3 NA
19 B NA 1
20 B NA 2
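A grouped base-R variant of the same idea, in case runs could cross ids (a sketch using ave(); new2 is just a hypothetical column name):
df$new2 <- ave(df$value, df$id, FUN = function(v) {
  # Per id: find NA runs and number the positions within each run.
  r <- rle(is.na(v))
  out <- rep(NA_integer_, length(v))
  out[is.na(v)] <- sequence(r$lengths[r$values])
  out
})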
Yet another solution:
library(tidyverse)
df %>%
  mutate(aux = data.table::rleid(value)) %>%
  group_by(id, aux) %>%
  mutate(eout = ifelse(is.na(value), row_number(), NA_real_)) %>%
  ungroup() %>%
  select(-aux)
#> # A tibble: 20 × 4
#> id value expected_output eout
#> <chr> <dbl> <int> <dbl>
#> 1 A 1 NA NA
#> 2 A 2 NA NA
#> 3 A 3 NA NA
#> 4 A NA 1 1
#> 5 A NA 2 2
#> 6 A 1 NA NA
#> 7 A 2 NA NA
#> 8 A NA 1 1
#> 9 A NA 2 2
#> 10 A NA 3 3
#> 11 B 1 NA NA
#> 12 B NA 1 1
#> 13 B NA 2 2
#> 14 B NA 3 3
#> 15 B NA 4 4
#> 16 B 1 NA NA
#> 17 B 2 NA NA
#> 18 B 3 NA NA
#> 19 B NA 1 1
#> 20 B NA 2 2

Split row values into columns

I have some data that looks like this:
samp
# A tibble: 5 x 2
ID Source
<dbl> <chr>
1 34221 75
2 33861 75
3 59741 126,123
4 56561 111,105
5 55836 36,34,34,36,22
I want to make a new column for each of the distinct values. If a value exists in a row, I want to put an "x" in that column; otherwise the cell should stay empty.
Example (pseudocode) of the expected result:
ID 75 126 123 111 105 36 34 22
1 34221 x
2 33861 x
3 59741 x x
4 56561 x x
5 55836 x x x
I tried the separate function from the tidyr package, like this for a start:
into <- unique(unlist(strsplit(samp$Source, ",")))
samp %>% separate(col = "Source", into = into, sep = ",")
However, this doesn't work, because if there is more than one value in a row, the values are not assigned to their respective columns (e.g. for ID 59741 the value 126 ends up in column 75 and not in column 126).
A tibble: 5 x 9
ID `75` `126` `123` `111` `105` `36` `34` `22`
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 34221 75 NA NA NA NA NA NA NA
2 33861 75 NA NA NA NA NA NA NA
3 59741 126 123 NA NA NA NA NA NA
4 56561 111 105 NA NA NA NA NA NA
5 55836 36 34 34 36 22 NA NA NA
Here is a dput:
structure(list(ID = c(34221, 33861, 59741, 56561, 55836), Source = c("75",
"75", "126,123", "111,105", "36,34,34,36,22")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
Could also do:
library(tidyverse)
df %>%
  mutate(Source = strsplit(Source, ","),
         dummy = "x") %>%
  unnest() %>%
  distinct() %>%
  spread(Source, dummy)
Output:
ID `105` `111` `123` `126` `22` `34` `36` `75`
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 33861 NA NA NA NA NA NA NA x
2 34221 NA NA NA NA NA NA NA x
3 55836 NA NA NA NA x x x NA
4 56561 x x NA NA NA NA NA NA
5 59741 NA NA x x NA NA NA NA
The package splitstackshape is very handy for such operations, e.g.
library(splitstackshape)
cSplit_e(df, "Source", mode = "binary", type = "character", fill = 0, drop = TRUE)
which gives:
ID Source_105 Source_111 Source_123 Source_126 Source_22 Source_34 Source_36 Source_75
1 34221 0 0 0 0 0 0 0 1
2 33861 0 0 0 0 0 0 0 1
3 59741 0 0 1 1 0 0 0 0
4 56561 1 1 0 0 0 0 0 0
5 55836 0 0 0 0 1 1 1 0
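If the literal "x" marks from the question are needed, the 0/1 indicator columns can be converted afterwards (a sketch; res holds the cSplit_e result):
res <- cSplit_e(df, "Source", mode = "binary", type = "character", fill = 0, drop = TRUE)
# Turn every indicator column (all but ID) into "x"/NA.
res[-1] <- lapply(res[-1], function(col) ifelse(col == 1, "x", NA))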
Another option is using tidyr::separate_rows:
library(dplyr)
library(tidyr)
df %>%
  separate_rows(Source, sep = ',') %>%
  distinct() %>%
  mutate(dummy = 'X') %>%
  spread(Source, dummy)
ID 105 111 123 126 22 34 36 75
1 33861 <NA> <NA> <NA> <NA> <NA> <NA> <NA> X
2 34221 <NA> <NA> <NA> <NA> <NA> <NA> <NA> X
3 55836 <NA> <NA> <NA> <NA> X X X <NA>
4 56561 X X <NA> <NA> <NA> <NA> <NA> <NA>
5 59741 <NA> <NA> X X <NA> <NA> <NA> <NA>
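spread() is superseded in current tidyr; the same reshape with pivot_wider() (a sketch, assuming tidyr >= 1.0):
df %>%
  separate_rows(Source, sep = ',') %>%
  distinct() %>%
  mutate(dummy = 'X') %>%
  pivot_wider(names_from = Source, values_from = dummy)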

R: Mutate multiple columns with an ifelse() condition

I want to create several columns with an ifelse() condition. Here is my example code:
df <- tibble(
  date = lubridate::today() + 0:9,
  return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2))
And now I want to add new columns with ascending conditions (from 1 to 8). The first column should only contain the values from the return column that are higher than 1, the second column only the values higher than 2, and so on...
I can calculate each column with a mutate() function:
df <- df %>% mutate(`return>1` = ifelse(return > 1, return, NA))
df <- df %>% mutate(`return>2` = ifelse(return > 2, return, NA))
df <- df %>% mutate(`return>3` = ifelse(return > 3, return, NA))
df <- df %>% mutate(`return>4` = ifelse(return > 4, return, NA))
df <- df %>% mutate(`return>5` = ifelse(return > 5, return, NA))
df <- df %>% mutate(`return>6` = ifelse(return > 6, return, NA))
df <- df %>% mutate(`return>7` = ifelse(return > 7, return, NA))
df <- df %>% mutate(`return>8` = ifelse(return > 8, return, NA))
> head(df)
# A tibble: 6 x 10
date return `return>1` `return>2` `return>3` `return>4` `return>5` `return>6` `return>7` `return>8`
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-03-08 1 NA NA NA NA NA NA NA NA
2 2019-03-09 2.5 2.5 2.5 NA NA NA NA NA NA
3 2019-03-10 2 2 NA NA NA NA NA NA NA
4 2019-03-11 3 3 3 NA NA NA NA NA NA
5 2019-03-12 5 5 5 5 5 NA NA NA NA
6 2019-03-13 6.5 6.5 6.5 6.5 6.5 6.5 6.5 NA NA
Is there an easier way to create all these columns and reduce all this code? Maybe with a map function? And is there a way to name the new columns automatically?
An option with lapply:
n <- seq(1, 8)
df[paste0("return > ", n)] <- lapply(n, function(x)
  replace(df$return, df$return <= x, NA))
# date return `return > 1` `return > 2` `return > 3` .....
# <date> <dbl> <dbl> <dbl> <dbl>
#1 2019-03-08 1 NA NA NA
#2 2019-03-09 2.5 2.5 2.5 NA
#3 2019-03-10 2 2 NA NA
#4 2019-03-11 3 3 3 NA
#5 2019-03-12 5 5 5 5
#6 2019-03-13 6.5 6.5 6.5 6.5
#...
Here is a for loop solution:
for (i in 1:8) {
  varname <- paste0("return>", i)
  df[[varname]] <- with(df, ifelse(return > i, return, NA))
}
Use purrr::map_df:
bind_cols(df, purrr::map_df(setNames(1:8, paste0('return>', 1:8)),
                            function(x) ifelse(df$return > x, df$return, NA)))
# A tibble: 6 x 10
# date return `return>1` `return>2` `return>3` `return>4` `return>5` `return>6` `return>7` `return>8`
# <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2019-03-08 1 NA NA NA NA NA NA NA NA
# 2 2019-03-09 2.5 2.5 2.5 NA NA NA NA NA NA
# 3 2019-03-10 2 2 NA NA NA NA NA NA NA
# 4 2019-03-11 3 3 3 NA NA NA NA NA NA
# 5 2019-03-12 5 5 5 5 5 NA NA NA NA
# 6 2019-03-13 6.5 6.5 6.5 6.5 6.5 6.5 6.5 NA NA
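A compact base-R alternative (a sketch): outer() evaluates all eight thresholds in one vectorised step, and the resulting matrix columns are then named and bound on (df_new is just a hypothetical name for the result).
# 10 x 8 matrix: one column per threshold, NA where return does not exceed it.
m <- outer(df$return, 1:8, function(r, n) ifelse(r > n, r, NA))
colnames(m) <- paste0("return>", 1:8)
df_new <- bind_cols(df, as_tibble(m))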

R: Get multiple lagged rolling means of multiple columns

I want to get rolling means for the past 1 to 10 events, grouped by a column, for multiple columns. I also want it to be very fast (e.g. dplyr or data.table), because I want to run this on a 1,000,000 x 1,000 data frame.
Starting df:
data.table(a = c("bill", "bob", "bill", "bob", "bill", "bob"),
           b = c(1, 2, 1, 1, 3, 2),
           c = c(2, 3, 9, 1, 4, 1),
           d = c(4, 5, 1, 7, 3, 4))
      a b c d
1: bill 1 2 4
2:  bob 2 3 5
3: bill 1 9 1
4:  bob 1 1 7
5: bill 3 4 3
6:  bob 2 1 4
Desired df:
I want the rolling means of only b and c, grouped by column a, with windows of 1 to 10 for each column, each lagged by 1 row.
a b c d b_roll1 c_roll1 b_roll2 c_roll2 b_roll3 c_roll3
1: bill 1 2 4 NA NA NA NA NA NA
2: bob 2 3 5 NA NA NA NA NA NA
3: bill 1 9 1 1 2 1 2 1 2
4: bob 1 1 7 2 3 2 3 2 3
5: bill 3 4 3 1 9 1 5.5 1 5.5
6: bob 2 1 4 1 1 1 2 1 2
Your example outcome doesn't make much sense to me, but here is an example of how you can generate many mutate calls programmatically.
An extendable solution using lazyeval and RcppRoll:
library(tidyverse)
vars <- c('b', 'c')
ns <- 1:10
com <- expand.grid(vars, ns, stringsAsFactors = FALSE)
dots <- map2(com[[1]], com[[2]],
             ~lazyeval::interp(~RcppRoll::roll_meanr(x, y, fill = NA), x = as.name(.x), y = .y))
names(dots) <- apply(com, 1, paste0, collapse = '_')
D %>%
  group_by(a) %>%
  mutate_(.dots = dots)
Gives:
Source: local data frame [6 x 24]
Groups: a [2]
a b c d `b_ 1` `c_ 1` `b_ 2` `c_ 2` `b_ 3` `c_ 3` `b_ 4` `c_ 4` `b_ 5` `c_ 5` `b_ 6` `c_ 6` `b_ 7` `c_ 7` `b_ 8` `c_ 8` `b_ 9`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 bill 1 2 4 1 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 bob 2 3 5 2 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 bill 1 2 1 1 2 1.0 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
4 bob 1 1 7 1 1 1.5 2 NA NA NA NA NA NA NA NA NA NA NA NA NA
5 bill 3 4 3 3 4 2.0 3 1.666667 2.666667 NA NA NA NA NA NA NA NA NA NA NA
6 bob 2 1 4 2 1 1.5 1 1.666667 1.666667 NA NA NA NA NA NA NA NA NA NA NA
# ... with 3 more variables: `c_ 9` <dbl>, b_10 <dbl>, c_10 <dbl>
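mutate_() and its .dots interface are deprecated in current dplyr; roughly the same machinery with tidy evaluation might look like this (a sketch; D is the example table from above):
library(tidyverse)
vars <- c('b', 'c')
ns <- 1:10
com <- expand.grid(vars, ns, stringsAsFactors = FALSE)
# One quoted roll_meanr() expression per variable/window combination.
dots <- map2(com[[1]], com[[2]],
             ~ quo(RcppRoll::roll_meanr(!!sym(.x), !!.y, fill = NA)))
names(dots) <- paste(com[[1]], com[[2]], sep = '_')
D %>%
  group_by(a) %>%
  mutate(!!!dots)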
I am still not completely following you. It seems that you apply a combination of a lag and a rolling mean. For just the rolling mean, here is a solution using dplyr and RcppRoll.
roll_mean_na <- function(x, lag) {
  c(rep(NA, lag - 1), RcppRoll::roll_mean(x, lag, align = "left"))
}
library(dplyr)
df %>%
  group_by(a) %>%
  mutate(b_2 = roll_mean_na(b, 2), c_2 = roll_mean_na(c, 2),
         b_3 = roll_mean_na(b, 3), c_3 = roll_mean_na(c, 3),
         b_4 = roll_mean_na(b, 4), c_4 = roll_mean_na(c, 4))
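data.table's own frollmean() (available in data.table >= 1.12) plus shift() for the one-row lag is another route; a sketch, noting that unlike the question's partial-window example it yields NA until a full window is available:
library(data.table)
dt <- data.table(a = c("bill", "bob", "bill", "bob", "bill", "bob"),
                 b = c(1, 2, 1, 1, 3, 2),
                 c = c(2, 3, 9, 1, 4, 1))
cols <- c("b", "c")
for (n in 1:3) {
  # Right-aligned rolling mean of width n, then lagged one row, per group.
  dt[, (paste0(cols, "_roll", n)) := lapply(.SD, function(x) shift(frollmean(x, n))),
     by = a, .SDcols = cols]
}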
