modified cumsum - summarise with ratios

I have a df of moving particles that I'm tracking hourly. I have reference distances at hours 1,11,21,31,41, and the tracks all end at some point between those hours.
So what I want to do is find the total distance traveled for each group/trial between hour0 and hour(end). That means I'll need to add the cumulative sum of hour references before end, and the proportional distance for the hour after end.
For example, if the track ends at hour 34, I would know the length traveled would be (cumsum of lengths of hours 1,11,21,31) + 3/10 length(41).
I've got my code to where I can find the cumsum, but I can't figure out how to add the extra little proportional bit.
set.seed(1)
df1 <- data.frame(matrix(nrow=20,ncol=4))
colnames(df1) <- c("group","trial","hour","length")
df1$group <- rep(c("a","b"),each=10)
df1$trial <- rep(c(1,1,1,1,1,2,2,2,2,2),times=2)
df1$hour <- rep(c(1,11,21,31,41),times=4)
df1$length <- rep(c(10,12,13,17,21),times=4)
df2 <- data.frame(matrix(nrow=4,ncol=3))
colnames(df2) <- c("group","trial","end")
df2$group <- c("a","a","b","b")
df2$trial <- c(1,2,1,2)
df2$end <- runif(4,1,40)
library(dplyr)
df3 <- df2 %>%
left_join(df1,by=c("group","trial")) %>%
group_by(group,trial) %>%
mutate(cumlength = cumsum(length)) %>%
slice({i1 <- which(hour <= end)
c(i1, tail(i1, 1) + 1)})
That gets me to a df with all the data I should need, but I want to be able to summarise() to find the sum of lengths up to the last hour plus the proportional extra bit.
df3 %>% summarise(total = sum(length))
# sum of all lengths, but this overshoots.
Thanks for the help

If I understand your question, you want to linearly interpolate your cumsum(length) ~ hour for any arbitrary hour (end). There's a handy base R function for this, approxfun.
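As a quick check of the idea on the numbers from the question, here is a minimal base R sketch (the vector names hour and len are only illustrative). It interpolates the cumulative length at hour 34, which should reproduce the hand calculation from the question, 52 + 3/10 * 21.

```r
# Reference hours and per-interval lengths from the question's df1
hour <- c(1, 11, 21, 31, 41)
len  <- c(10, 12, 13, 17, 21)

# Linear interpolation of cumulative length against hour
f <- approxfun(hour, cumsum(len))

f(34)   # 58.3, i.e. 52 + 3/10 * 21
f(0.5)  # NA: with the default rule = 1, points outside the hour range are not extrapolated
```

Note that an end time before the first reference hour returns NA under the default settings, so such tracks would need separate handling.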
Given your df1 and df2:
library(dplyr)
df1 %>%
group_by(group, trial) %>%
summarise(
f = list(approxfun(cumsum(length) ~ hour))
)
# A tibble: 4 x 3
# Groups: group [2]
group trial f
<chr> <dbl> <list>
1 a 1 <fn>
2 a 2 <fn>
3 b 1 <fn>
4 b 2 <fn>
Now you have a list of functions, each of which can be evaluated at your selected time. So let's do that join:
df1 %>%
group_by(group, trial) %>%
summarise(
f = list(approxfun(cumsum(length) ~ hour))
) %>%
full_join(df2)
Joining, by = c("group", "trial")
# A tibble: 4 x 4
# Groups: group [2]
group trial f end
<chr> <dbl> <list> <dbl>
1 a 1 <fn> 11.4
2 a 2 <fn> 15.5
3 b 1 <fn> 23.3
4 b 2 <fn> 36.4
Now we can just purrr::map* along that list. We'll use map2 since we want to evaluate along f and end in parallel, and we know it should return a single number, so we'll specifically use map2_dbl.
library(purrr)
df1 %>%
group_by(group, trial) %>%
summarise(
f = list(approxfun(cumsum(length) ~ hour))
) %>%
full_join(df2) %>%
mutate(total = map2_dbl(f, end, ~.x(.y)))
Joining, by = c("group", "trial")
# A tibble: 4 x 5
# Groups: group [2]
group trial f end total
<chr> <dbl> <list> <dbl> <dbl>
1 a 1 <fn> 11.4 22.5
2 a 2 <fn> 15.5 27.9
3 b 1 <fn> 23.3 39.0
4 b 2 <fn> 36.4 63.4
If you haven't used purrr before, that might look like black magic. The map functions are iterators, similar to lapply in base R. They take an element of a list and apply a function on it. You can use these "anonymous" functions, written like formulas. Something like ~.x+.y is the same as function(arg1, arg2) {arg1 + arg2}.
The powerful application here is that one of the arguments is itself the function we want to use, the column f. By passing it first, it's .x in the anonymous function. The second argument, end, becomes .y. So then ~.x(.y) is the same as calling f(end) for each of the four pairs.
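If the purrr syntax is unfamiliar, the same "call each function on its paired value" pattern can be sketched in base R with mapply. The names fs and ends below are made up for illustration and stand in for the f and end columns.

```r
# A list of functions and a vector of inputs, paired element by element
fs   <- list(function(x) x + 1, function(x) x * 2)
ends <- c(10, 20)

# Equivalent in spirit to map2_dbl(fs, ends, ~ .x(.y)):
# the first argument of each pair is the function, the second is its input
totals <- mapply(function(f, end) f(end), fs, ends)
totals  # 11 40
```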
Let's do some sanity checking by visualizing the result. Store the above results in df3 and:
library(ggplot2)
df1 %>%
group_by(group, trial) %>%
mutate(cumlength = cumsum(length)) %>%
ggplot(aes(hour, cumlength)) +
geom_point() +
geom_path() +
geom_vline(
data = df3,
aes(xintercept = end),
color = "red"
) +
geom_point(
data = df3,
aes(end, total),
color = "red", size = 3, shape = 0
) +
facet_grid(group~trial)

Related

Search elements of a single character string in a dataframe column to subset it

I have two dataframes:
set.seed(1)
df1 <- data.frame(k1 = "AFD(1);Acf(2);Vgr7(2);"
,k2 = "ABC(7);BHG(46);TFG(675);")
df2 <- data.frame(site =c("AFD(1);AFD(2);", "Acf(2);", "TFG(677);",
"XX(275);", "ABC(7);", "ABC(9);")
,p1 = rnorm(6, mean = 5, sd = 2)
,p2 = rnorm(6, mean = 6.5, sd = 2))
The first dataframe is in fact a list of often very long strings, made of "elements". Each "element" is made of a few letters/numbers, followed by a number in brackets, followed by a semicolon. In this example I only put 3 "elements" into each string, but in my real dataframe there are tens to hundreds of them.
> df1
k1 k2
1 AFD(1);Acf(2);Vgr7(2); ABC(7);BHG(46);TFG(675);
The second dataframe shares some of the "elements" with df1. Its first column, called site, contains some (not all) "elements" from the first dataframe, sometimes the "element" forms the whole string, and sometimes is a part of a longer string:
> df2
site p1 p2
1 AFD(1);AFD(2); 4.043700 3.745881
2 Acf(2); 5.835883 5.670011
3 TFG(677); 7.717359 5.711420
4 XX(275); 4.794425 6.381373
5 ABC(7); 5.775343 8.700051
6 ABC(9); 4.892390 8.026351
I would like to filter the whole df2 using df2$site and each k column from df1 (there are many k columns, and not all of them contain k in their names).
The easiest way to explain this is to show what the desired output would look like.
> outcome
k site p1 p2
1 k1 AFD(1);AFD(2); 4.043700 3.745881
2 k1 Acf(2); 5.835883 5.670011
3 k2 ABC(7); 5.775343 8.700051
The first column of the outcome dataframe corresponds to the column names in df1. The second column corresponds to the site column of df2 and contains only sites from df1 columns that were found in df2$site. Other columns are from df2.
I appreciate that this question is made of two separate "problems", one grepping-related and one related to looping through df1 columns. I decided to show the task in its entirety in case there exists a solution that addresses both in one go.
FAILED SOLUTION 1
I can create a string to grep, but for each column separately:
# this replaces the semicolons with "|", but does not escape the brackets.
k1_pattern <- df1 %>%
select(k1) %>%
deframe() %>%
str_replace_all(";","|")
And then I am not sure how to use it. This (below) didn't work, maybe because I didn't escape brackets, but I am struggling with doing it:
k1_result <- df2 %>%
filter(grepl(pattern = k1_pattern, site))
But even if it did work, it would only deal with a single column from df1, and I have many, and would like to perform this operation on all df1 columns at the same time.
FAILED SOLUTION 2
I can create a list of sites to search in df2 from columns in df1:
k1_sites<- df1 %>%
select(k1) %>%
deframe() %>%
strsplit(., "[;]") %>%
unlist()
but the delimiter is lost here, and %in% cannot be used, as the match will sometimes be partial.
library(dplyr)
df2 %>%
mutate(site_list = strsplit(site, ";")) %>%
rowwise() %>%
filter(length(intersect(site_list,
unlist(strsplit(x = paste0(c(t(df1)), collapse=""),
split = ";")))) != 0) %>%
select(-site_list)
#> # A tibble: 3 x 3
#> # Rowwise:
#> site p1 p2
#> <chr> <dbl> <dbl>
#> 1 AFD(1);AFD(2); 3.75 7.47
#> 2 Acf(2); 5.37 7.98
#> 3 ABC(7); 5.66 9.52
Updated answer:
library(dplyr)
library(tidyr)
df1 %>%
tibble::rownames_to_column("id") %>%
pivot_longer(-id, names_to = "k", values_to = "site") %>%
separate_rows(site, sep = ";") %>%
filter(site != "") %>%
select(-id) -> df1_k
df2 %>%
tibble::rownames_to_column("id") %>%
separate_rows(site, sep = ";") %>%
full_join(., df1_k, by = c("site")) %>%
group_by(id) %>%
fill(k, .direction = "downup") %>%
filter(!is.na(id) & !is.na(k)) %>%
summarise(k = first(k),
site = paste0(site, collapse = ";"),
p1 = first(p1),
p2 = first(p2), .groups = "drop") %>%
select(-id)
#> # A tibble: 3 x 4
#> k site p1 p2
#> <chr> <chr> <dbl> <dbl>
#> 1 k1 AFD(1);AFD(2); 3.75 7.47
#> 2 k1 Acf(2); 5.37 7.98
#> 3 k2 ABC(7); 5.66 9.52
Here's a way going to a long format for exact matching (so no regex):
library(dplyr)
library(tidyr)
df1_long = df1 |> stack() |>
separate_rows(values, sep = ";") |>
filter(values != "")
df2 |>
mutate(id = row_number()) |>
separate_rows(site, sep = ";") |>
filter(site != "") |>
left_join(df1_long, by = c("site" = "values")) |>
group_by(id) |>
filter(any(!is.na(ind))) |>
summarize(
site = paste(site, collapse = ";"),
across(-site, \(x) first(na.omit(x)))
)
# # A tibble: 3 × 5
# id site p1 p2 ind
# <int> <chr> <dbl> <dbl> <fct>
# 1 1 AFD(1);AFD(2) 3.75 7.47 k1
# 2 2 Acf(2) 5.37 7.98 k1
# 3 5 ABC(7) 5.66 9.52 k2
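The exact-matching idea behind the long-format answers can also be sketched in base R for a single column. This is only an illustration for k1 (the helper names k1_elements, site_elements and hit are made up here), not a replacement for the multi-column solutions above. Splitting on the literal ";" avoids the bracket-escaping problem from the regex attempt.

```r
k1   <- "AFD(1);Acf(2);Vgr7(2);"
site <- c("AFD(1);AFD(2);", "Acf(2);", "TFG(677);",
          "XX(275);", "ABC(7);", "ABC(9);")

# Split both sides into individual elements; fixed = TRUE treats ";" literally
k1_elements   <- strsplit(k1, ";", fixed = TRUE)[[1]]
site_elements <- strsplit(site, ";", fixed = TRUE)

# A site row is kept if any of its elements exactly equals a k1 element
hit <- vapply(site_elements, function(s) any(s %in% k1_elements), logical(1))
site[hit]  # "AFD(1);AFD(2);" "Acf(2);"
```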

Standard deviation of average events per ID in R

Background
I've got this dataset d:
d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
stringsAsFactors=FALSE)
It's got 2 people (IDs) in it, and they each have some events.
The problem
I'm trying to get an average number (count) of events per person, along with a standard deviation for that average, all in one result (it can be a dataframe or not, doesn't matter).
In other words I'm looking for something like this:
| Mean | SD |
|------|------|
| 4.00 | 2.83 |
What I've tried
I'm not far off, I don't think -- it's just that I've got 2 separate pieces of code doing these calculations. Here's the mean:
d %>%
group_by(ID) %>%
summarise(event = length(event)) %>%
summarise(ratio = mean(event))
# A tibble: 1 x 1
ratio
<dbl>
1 4
And here's the SD:
d %>%
group_by(ID) %>%
summarise(event = length(event)) %>%
summarise(sd = sd(event))
# A tibble: 1 x 1
sd
<dbl>
1 2.83
But when I try to pipe them together like so...
d %>%
group_by(ID) %>%
summarise(event = length(event)) %>%
summarise(ratio = mean(event)) %>%
summarise(sd = sd(event))
... I get an error:
Error in `h()`:
! Problem with `summarise()` column `sd`.
i `sd = sd(event)`.
x object 'event' not found
Any insight?
You have to put the last two calls to summarise() in the same call. The only remaining columns after summarise() will be those you named and the grouping columns, so after your second summarise, the event column no longer exists.
library(dplyr)
d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
stringsAsFactors=FALSE)
d %>%
group_by(ID) %>%
# the next summarise will be within ID
summarise(event = length(event)) %>%
# this summarise is overall
summarise(sd = sd(event),
ratio = mean(event))
#> # A tibble: 1 × 2
#> sd ratio
#> <dbl> <dbl>
#> 1 2.83 4
The code is a bit confusing because you are renaming the event variable, and doing the first summarise() within groups and the second without grouping. This code would be a little easier to read and get the same result:
d %>%
count(ID) %>%
summarise(sd = sd(n),
ratio = mean(n))
Created on 2022-05-25 by the reprex package (v2.0.1)
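As a cross-check that does not depend on dplyr, the same two numbers can be sketched in base R. The vector name n is just illustrative; it holds the per-ID event counts, which is the column that has to survive until both statistics are computed.

```r
d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors = FALSE)

# Events per ID: a = 6, b = 2
n <- as.vector(table(d$ID))

# Both statistics are computed from the same vector of counts
c(mean = mean(n), sd = sd(n))  # mean 4, sd about 2.83
```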

dplyr::summarise with filtering inside

Inside of dplyr::summarise, how can I apply filters based on different rows than the one I'm summarising?
Example:
t = data.frame(
x = c(1,1,1,1,2,2,2,2,3,3, 3, 3),
y = c(1,2,3,4,5,6,7,8,9,10,11,12),
z = c(1,2,1,2,1,2,1,2,1,2, 1, 2)
)
t %>%
dplyr::group_by(x) %>%
dplyr::summarise(
mall = mean(y), # this should include all rows in each group
ma = mean(y), # this should only include rows where z == 1
mb = mean(y) # this should only include rows where z == 2
)
So, the problem here is to apply a summary function to one column, while filtering based on another, all within summarise.
One idea was double-grouping, so applying group_by on both x and z, but I don't want all summary columns to be based on double-grouping, some (like mall in the example above) should be based on single-grouping only.
One quick option would be to use ifelse to filter to the rows you need, make the rest missing and use the na.rm = T argument to ignore missing values, like the example below.
t %>%
dplyr::group_by(x) %>%
dplyr::summarise(
mall = mean(y), # this should include all rows in each group
ma = mean(ifelse(z == 1, y, NA), na.rm = T), # this should only include rows where z == 1
mb = mean(ifelse(z == 2, y, NA), na.rm = T) # this should only include rows where z == 2
)
# A tibble: 3 x 4
x mall ma mb
<dbl> <dbl> <dbl> <dbl>
1 1 2.5 2 3
2 2 6.5 6 7
3 3 10.5 10 11
While the answer by @Colin H is certainly the way to go for this specific example, a more flexible way to approach this could be to work within the subsets of the first grouping operation. This could be implemented with dplyr::group_split plus a subsequent purrr::map_dfr, but there is also dplyr::group_modify to do this in one step.
Note this relevant sentence from the documentation of dplyr::group_modify:
Use group_modify() when summarize() is too limited, in terms of what you need to do and return for each group.
So here is a solution for the example provided above:
t = data.frame(
x = c(1,1,1,1,2,2,2,2,3,3, 3, 3),
y = c(1,2,3,4,5,6,7,8,9,10,11,12),
z = c(1,2,1,2,1,2,1,2,1,2, 1, 2)
)
t %>%
dplyr::group_by(x) %>%
dplyr::group_modify(function(x, ...) {
x %>% dplyr::mutate(
mall = mean(y)
) %>%
dplyr::group_by(z, mall) %>%
dplyr::summarise(
m = mean(y),
.groups = "drop"
)
}) %>%
dplyr::ungroup()
# A tibble: 6 x 4
x z mall m
<dbl> <dbl> <dbl> <dbl>
1 1 1 2.5 2
2 1 2 2.5 3
3 2 1 6.5 6
4 2 2 6.5 7
5 3 1 10.5 10
6 3 2 10.5 11
group_modify applies a function to each subset tibble after grouping by x. This function has two arguments:
The subset of the data for the group, exposed as .x.
The key, a tibble with exactly one row and columns for each grouping variable, exposed as .y.
Within our function here we use mutate to cover the requested mall-case first. We do not need any further grouping for that, because that is already covered by the wrapping group_modify. Then we apply another group_by + summarise to cover the different iterations of z. Note that this solution is independent of the number of cases in z we want to consider. While the two cases in this example can be easily handled manually, this might change if there are more.
If the wide output format with individual columns for the cases in z is required, then you can further modify the output of my code with tidyr::pivot_wider.
Another option and perhaps a little more concise is via subsetting:
t %>%
group_by(x) %>%
summarise(mall = mean(y),
ma = mean(y[z == 1]),
mb = mean(y[z == 2]))
# A tibble: 3 x 4
x mall ma mb
* <dbl> <dbl> <dbl> <dbl>
1 1 2.5 2 3
2 2 6.5 6 7
3 3 10.5 10 11
Here is another generic way (just like group_modify) to perform custom filtering on a group's data while summarising. This uses dplyr's context-dependent expression cur_data(), which makes the current group's data available inside dplyr verbs like mutate/summarise (note that in dplyr 1.1.0 and later cur_data() is deprecated in favour of pick()):
t %>%
dplyr::group_by(x) %>%
dplyr::summarize(
mall = mean(y),
ma = mean(cur_data() %>% as.data.frame() %>% filter(z == 1) %>% pull(y)),
mb = mean(cur_data() %>% as.data.frame() %>% filter(z == 2) %>% pull(y))
)
The benefit of using cur_data() is that you can perform any complex filtering or munging before returning the final summary. For more information refer to: https://dplyr.tidyverse.org/reference/context.html

How can I make row_number() and quosures work together in an R function?

I have results from a within-participants design with timeseries info about each trial. I want to reshuffle the conditions for some permutation testing. I need to write a function though and this is where I run into problems.
My data looks something like this:
library(tidyverse)
sampel <- expand.grid(s0 = 1:5, r0 = 1:12)
sampel <- sampel %>% mutate(c0 = rep(c('A', 'B'), 30))
sampel <- sampel %>%
group_by(s0, c0, r0) %>%
nest() %>%
mutate(t0 = map(data, function(t) seq(1:8)), v0 = map(data, function(v) seq(from = 0, by = runif(1), length.out = 8))) %>%
unnest(cols = c(data, t0, v0)) %>%
ungroup() %>%
mutate(s0 = paste('s', s0, sep = ''))
head(sampel, n = 12)
(if you have any pointers how I could go about displaying this example in a better way I would much appreciate it too)
So to add some context, it's results of a within-subjects study. s0 stands for participant, c0 for condition, r0 for trial number (run). t0 is a timepoint and v0 a value of interest at this timepoint.
I am trying to reshuffle conditions within-participants
resampledSampel <-
sampel %>%
group_by(s0, r0, c0) %>%
nest() %>%
group_by(s0) %>%
mutate(c1 = c0[sample(row_number())])
resampledSampel %>%
head(n = 12)
This works as hoped, but when I try to make a function:
resample_within <- function(df, subject, trial, condition) {
subject <- enquo(subject)
trial <- enquo(trial)
condition <- enquo(condition)
resampled <-
df %>%
group_by(!!subject, !!trial, !!condition) %>%
nest() %>%
group_by(!!subject) %>%
mutate(condition = !!condition[sample(row_number())]) %>%
unnest(data)
return(resampled)
}
resample_within(sampel, s0, r0, c0)
throws an error:
Error: row_number() should only be called in a data context
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Subsetting quosures with `[` is deprecated as of rlang 0.4.0
Please use `quo_get_expr()` instead.
This warning is displayed once per session.
Any idea how I can use mutate(condition = !!condition[sample(row_number())]) in the function? Or how I could do all this without dplyr (it makes me realise that I probably rely on dplyr a bit too much...)
Thank you in advance. And, also in advance, I apologise if the way I presented the question is not ideal (I will gladly take any pointers on how to better formulate questions on stack exchange too. For instance I can't seem to figure out how to display the data structure)
Very nice first question!
This is actually just a matter of operator precedence. When you call !!condition[sample(row_number())], it is interpreted as !!(condition[sample(row_number())]) i.e. you are trying to subset the quosure then apply double bang, but you mean (!!condition)[sample(row_number())], that is, you want to subset the result of the double-bang. So just apply brackets to fix the order of evaluation and it works as expected:
resample_within <- function(df, subject, trial, condition) {
subject <- enquo(subject)
trial <- enquo(trial)
condition <- enquo(condition)
resampled <-
df %>%
group_by(!!subject, !!trial, !!condition) %>%
nest() %>%
group_by(!!subject) %>%
mutate(condition = (!!condition)[sample(row_number())]) %>%
unnest(data)
return(resampled)
}
Now:
resample_within(sampel, s0, r0, c0)
#> # A tibble: 480 x 6
#> # Groups: s0 [5]
#> s0 r0 c0 t0 v0 condition
#> <chr> <int> <chr> <int> <dbl> <chr>
#> 1 s1 1 A 1 0 B
#> 2 s1 1 A 2 0.981 B
#> 3 s1 1 A 3 1.96 B
#> 4 s1 1 A 4 2.94 B
#> 5 s1 1 A 5 3.93 B
#> 6 s1 1 A 6 4.91 B
#> 7 s1 1 A 7 5.89 B
#> 8 s1 1 A 8 6.87 B
#> 9 s2 1 B 1 0 A
#> 10 s2 1 B 2 0.976 A
#> # ... with 470 more rows
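The precedence point can be checked without dplyr by looking at how R parses the two expressions. This is a small sketch using quote(), with idx as a made-up placeholder for sample(row_number()). Nothing is evaluated here, only the structure of the parsed call is inspected.

```r
# Without parentheses: the outermost call is `!`, and the subsetting sits innermost
e1 <- quote(!!condition[idx])
identical(e1[[1]], as.name("!"))            # TRUE
identical(e1[[2]][[2]][[1]], as.name("["))  # TRUE: `[` is applied to condition first

# With parentheses: the outermost call is `[`, applied to the result of (!!condition)
e2 <- quote((!!condition)[idx])
identical(e2[[1]], as.name("["))            # TRUE
```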
We can use the curly-curly ({{}}) operator
library(dplyr)
library(tidyr)
resample_within <- function(df, subject, trial, condition) {
df %>%
group_by({{subject}}, {{trial}}, {{condition}}) %>%
nest() %>%
group_by({{subject}}) %>%
mutate(condition = ({{condition}})[sample(row_number())]) %>%
unnest(data)
}
resample_within(sampel, s0, r0, c0)

Return two objects from lapply

I have created a function which takes a little while to run (lots of crunching going on) and there are two distinct outputs that I need to return from this function. The inputs into these outputs are the same which is why I have combined them in the same function so that I don't have to crunch them twice, but the outputs are so entirely different in content and based on such entirely different calculations that there is no way to actually combine them into a one parse kinda statement. One object is tens of lines earlier than the other. But I need to return both, so I think it has to be in some type of format which mimics: store the two separate objects in a single list, lapply, then extract and rbind the two objects.
Any help on a solution to this would be appreciated - ideally not using a for loop or data.table. Dplyr solutions are fine.
Some dummy data:
df <- data.frame(ID = c(rep("A",10), rep("B", 10), rep("C", 10)),
subID = c(rep("U", 5),rep("V", 5),rep("W", 5),rep("X", 5),rep("Y", 5),rep("Z", 5)),
Val = c(1,6,3,8,6,5,2,4,7,20,4,2,3,5,7,3,2,5,7,12,5,3,7,1,6,1,34,9,5,3))
The function (again noting the function is much more complex than this, and I am calculating many more complex and unrelated things in each of the separate objects, not just the average!):
func <- function(x, df){
temp <- filter(df, ID == x)
average_id <- temp %>% group_by(ID) %>% summarise(avg = mean(Val))
average_subid <- temp %>% group_by(ID, subID) %>% summarise(avg = mean(Val))
df_list <- list(avgID=average_id, avgSubID=average_subid)
return(df_list)
}
Presently I have computed the results using this command, but am unsure whether this is correct or how to further extract the results after the objects are stored in this list (of lists) (i.e. I get stuck here):
result <- lapply(list("A","B","C"), func, df)
The result should look like:
> average_ID
ID avg
1 A 6.2
2 B 5.0
3 C 7.4
> average_subID
ID subID avg
1 A U 4.8
2 A V 7.6
3 B W 4.2
4 B X 5.8
5 C Y 4.4
6 C Z 10.4
I have previously used a for loop and stored the results in lists (i.e. avgListID[x] <- average_id), then bound them together. But I don't think this is ideal.
Thanks in advance!
I realize this is a bit old, but since neither provided answer seems to have done the trick, how about this? Split the function into two, and run each within your lapply, returning a list of lists?
library(dplyr)
df <- data.frame(ID = c(rep("A",10), rep("B", 10), rep("C", 10)),
subID = c(rep("U", 5),rep("V", 5),rep("W", 5),rep("X", 5),rep("Y", 5),rep("Z", 5)),
Val = c(1,6,3,8,6,5,2,4,7,20,4,2,3,5,7,3,2,5,7,12,5,3,7,1,6,1,34,9,5,3))
subfunc1 <- function(temp){
return(temp %>% group_by(ID) %>% summarise(avg = mean(Val)))
}
subfunc2 <- function(temp){
return(temp %>% group_by(ID, subID) %>% summarise(avg = mean(Val)))
}
func <- function(x, df){
temp <- filter(df, ID == x)
df_list <- list(avgID=subfunc1(temp), avgSubID=subfunc2(temp))
return(df_list)
}
result <- lapply(list("A","B","C"), func, df)
To get the structure/order you need, transpose the lists as explained here:
n <- length(result[[1]]) # assuming all lists in result have the same length
result <- lapply(1:n, function(i) lapply(result, "[[", i))
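After the transpose, each inner list holds one kind of output for all IDs, so it can be row-bound into a single data frame. The following is a minimal self-contained sketch in base R. Note the assumptions: func here is a stand-in built on aggregate() rather than the dplyr version above, so the average column is named Val, and transposed is an illustrative name.

```r
df <- data.frame(ID = c(rep("A",10), rep("B", 10), rep("C", 10)),
                 subID = c(rep("U", 5),rep("V", 5),rep("W", 5),rep("X", 5),rep("Y", 5),rep("Z", 5)),
                 Val = c(1,6,3,8,6,5,2,4,7,20,4,2,3,5,7,3,2,5,7,12,5,3,7,1,6,1,34,9,5,3))

# Stand-in for the real function: returns two different data frames per ID
func <- function(x, df) {
  temp <- df[df$ID == x, ]
  list(avgID    = aggregate(Val ~ ID, data = temp, FUN = mean),
       avgSubID = aggregate(Val ~ ID + subID, data = temp, FUN = mean))
}

result <- lapply(list("A", "B", "C"), func, df)

# Transpose: one inner list per output type, keeping the output names
n <- length(result[[1]])
transposed <- lapply(seq_len(n), function(i) lapply(result, `[[`, i))
names(transposed) <- names(result[[1]])

# Row-bind each output type across IDs
average_ID    <- do.call(rbind, transposed$avgID)
average_subID <- do.call(rbind, transposed$avgSubID)
```

With the dplyr version of func, dplyr::bind_rows(transposed$avgID) should serve the same purpose as do.call(rbind, ...).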
> average_ID <- aggregate(df$Val, by = list(df$ID), FUN = mean)
>
> average_ID
Group.1 x
1 A 6.2
2 B 5.0
3 C 7.4
> average_subID <- aggregate(df$Val, by = list(df$ID,df$subID), FUN = mean)
>
> average_subID
Group.1 Group.2 x
1 A U 4.8
2 A V 7.6
3 B W 4.2
4 B X 5.8
5 C Y 4.4
6 C Z 10.4
What about returning a list where each element represents the averages at a specific grouping level. For example:
library(tidyverse)
fnc = function(groups=NULL, data=df) {
groups=as.list(groups)
data %>%
group_by_(.dots=groups) %>%
summarise(avg=mean(Val))
}
list(Avg_Overall=NULL, Avg_by_ID="ID", Avg_by_SubID=c("ID","subID")) %>%
map(~fnc(.x))
$Avg_Overall
# A tibble: 1 x 1
avg
<dbl>
1 6.2
$Avg_by_ID
# A tibble: 3 x 2
ID avg
<fctr> <dbl>
1 A 6.2
2 B 5.0
3 C 7.4
$Avg_by_SubID
# A tibble: 6 x 3
# Groups: ID [?]
ID subID avg
<fctr> <fctr> <dbl>
1 A U 4.8
2 A V 7.6
3 B W 4.2
4 B X 5.8
5 C Y 4.4
6 C Z 10.4
You could also just calculate the average by subID and then the average by ID can be calculated from that:
# Average by subID
avg = df %>% group_by(ID, subID) %>%
summarise(n = n(),
avg = mean(Val))
# Average by ID
avg %>%
group_by(ID) %>%
summarise(avg = sum(avg*n)/sum(n))
# Overall average
avg %>%
ungroup %>%
summarise(avg = sum(avg*n)/sum(n))
