Aggregating the standard deviation and counting non-NAs in sparklyr

I have a large data.frame and I have been computing summary statistics for numerous variables using summarise() in conjunction with across(). Due to the size of my data.frame, I have had to start processing my data in sparklyr.
As sparklyr does not support across(), I am using summarise_each(). This works, except that summarise_each() in sparklyr does not appear to support sd() and sum(!is.na(.)).
Below is an example dataset and how I would usually process it with dplyr:
library(dplyr)
test <- data.frame(ID = c("Group1","Group1",'Group1','Group1','Group1','Group1','Group1',
"Group2","Group2","Group2",'Group2','Group2','Group2',"Group2",
"Group3","Group3","Group3"),
Value1 = c(-100,-10,-5,-5,-5,1,2,1,2,3,4,4,4,4,1,2,3),
Value2 = c(50,100,10,-5,3,1,2,2,2,3,4,4,4,4,1,2,3))
test %>%
summarise(across((Value1:Value2), ~sum(!is.na(.), na.rm = TRUE), .names = "{col}_count"),
across((Value1:Value2), ~min(., na.rm = TRUE), .names = "{col}_min"),
across((Value1:Value2), ~max(., na.rm = TRUE), .names = "{col}_max"),
across((Value1:Value2), ~mean(., na.rm = TRUE), .names = "{col}_mean"),
across((Value1:Value2), ~sd(., na.rm = TRUE), .names = "{col}_sd"))
# A tibble: 1 x 10
Value1_count Value2_count Value1_min Value2_min Value1_max Value2_max Value1_mean Value2_mean Value1_sd Value2_sd
<int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 17 17 -100 -5 4 100 -5.53 11.2 24.7 25.8
I can also compute these statistics per group with summarise_each(); the printed output below omits the sd columns:
test %>%
group_by(ID) %>%
summarise_each(funs(min = min(., na.rm = TRUE),
max = max(., na.rm = TRUE),
mean = mean(., na.rm = TRUE),
sum = sum(., na.rm = TRUE),
sd = sd(., na.rm = TRUE)))
ID Value1_min Value2_min Value1_max Value2_max Value1_mean Value2_mean Value1_sum Value2_sum
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Group1 -100 -5 2 100 -17.4 23 -122 161
2 Group2 1 2 4 4 3.14 3.29 22 23
3 Group3 1 1 3 3 2 2 6 6
When using sparklyr, I have been able to compute the min, max, mean and sum as shown below:
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local", version = "2.4.3")
test <- spark_read_csv(sc = sc, path = "C:\\path\\test space.csv")
test %>%
group_by(ID) %>%
summarise_each(funs(min = min(., na.rm = TRUE),
max = max(., na.rm = TRUE),
mean = mean(., na.rm = TRUE),
sum = sum(., na.rm = TRUE)))
# Source: spark<?> [?? x 9]
ID Value1_min Value_2_min Value1_max Value_2_max Value1_mean Value_2_mean Value1_sum Value_2_sum
<chr> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 Group2 1 2 4 4 3.14 3.29 22 23
2 Group3 1 1 3 3 2 2 6 6
3 Group1 -100 -5 2 100 -17.4 23 -122 161
But I get an error when I also try to compute the sd and sum(!is.na(.)). Below is the code and the error message I am receiving. Is there any workaround to aggregate these values?
test %>%
group_by(ID) %>%
summarise_each(funs(min = min(., na.rm = TRUE),
max = max(., na.rm = TRUE),
mean = mean(., na.rm = TRUE),
sum = sum(., na.rm = TRUE),
sd = sd(., na.rm = TRUE)))
Error: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'AS' expecting ')'(line 1, pos 298)
== SQL ==
SELECT `ID`, MIN(`Value1`) AS `Value1_min`, MIN(`Value_2`) AS `Value_2_min`, MAX(`Value1`) AS `Value1_max`, MAX(`Value_2`) AS `Value_2_max`, AVG(`Value1`) AS `Value1_mean`, AVG(`Value_2`) AS `Value_2_mean`, SUM(`Value1`) AS `Value1_sum`, SUM(`Value_2`) AS `Value_2_sum`, stddev_samp(`Value1`, TRUE AS `na.rm`) AS `Value1_sd`, stddev_samp(`Value_2`, TRUE AS `na.rm`) AS `Value_2_sd`
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^^^
FROM `test_space_30172a44_c0aa_4305_9a5e_d45fa77ba0b9`
GROUP BY `ID`
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sparklyr.Invoke.invoke(invoke.scala:147)
at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
at sparklyr.StreamHandler.read(stream.scala:61)
at sparklyr.BackendHandler$$anonfun$channelRead0$1.apply$mcV$sp(handler.scala:58)
at scala.util.control.Breaks.breakable(Breaks.scala:38)
at sparklyr.BackendHandler.channelRead0(handler.scala:38)
at sparklyr.BackendHandler.channelRead0(handler.scala:14)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
In addition: Warning messages:
1: Named arguments ignored for SQL stddev_samp
2: Named arguments ignored for SQL stddev_samp

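The problem is the na.rm parameter. Spark's stddev_samp function has no such parameter, and sparklyr passes it through into the generated SQL (note the stddev_samp(`Value1`, TRUE AS `na.rm`) in the error) instead of dropping it.
SQL aggregate functions always ignore missing values (NULLs), so you don't need to specify na.rm. In the code below, test_spark is the Spark table read in the question.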
test_spark %>%
group_by(ID) %>%
summarise_each(funs(min = min(.),
max = max(.),
mean = mean(.),
sum = sum(.),
sd = sd(.)))
#> # Source: spark<?> [?? x 11]
#> ID Value1_min Value2_min Value1_max Value2_max Value1_mean Value2_mean
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Group2 1 2 4 4 3.14 3.29
#> 2 Group1 -100 -5 2 100 -17.4 23
#> 3 Group3 1 1 3 3 2 2
#> Value1_sum Value2_sum Value1_sd Value2_sd
#> <dbl> <dbl> <dbl> <dbl>
#> 1 22 23 1.21 0.951
#> 2 -122 161 36.6 38.6
#> 3 6 6 1 1
This looks like a bug specific to summarise(), since sd() with na.rm works fine in mutate():
test_spark %>%
group_by(ID) %>%
mutate_each(funs(sd = sd(., na.rm = TRUE)))
For sum(!is.na(.)), you just need to write it as sum(ifelse(is.na(.), 0, 1)).
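To check the Spark results locally, here is a minimal base-R sketch (not sparklyr) that computes the same per-group statistics on the question's example data, including a non-NA count written in the ifelse() form. The helpers count_non_na() and summarise_column() are hypothetical names introduced only for illustration.

```r
# Example data from the question (three groups, two value columns)
test <- data.frame(
  ID = rep(c("Group1", "Group2", "Group3"), times = c(7, 7, 3)),
  Value1 = c(-100, -10, -5, -5, -5, 1, 2, 1, 2, 3, 4, 4, 4, 4, 1, 2, 3),
  Value2 = c(50, 100, 10, -5, 3, 1, 2, 2, 2, 3, 4, 4, 4, 4, 1, 2, 3)
)

# Count non-missing values; same result as sum(!is.na(x)),
# written with ifelse() as suggested for the Spark translation
count_non_na <- function(x) sum(ifelse(is.na(x), 0, 1))

# All summary statistics for one numeric column
summarise_column <- function(x) {
  c(count = count_non_na(x),
    min   = min(x, na.rm = TRUE),
    max   = max(x, na.rm = TRUE),
    mean  = mean(x, na.rm = TRUE),
    sum   = sum(x, na.rm = TRUE),
    sd    = sd(x, na.rm = TRUE))
}

# One statistics-by-column matrix per group
by_group <- lapply(split(test[c("Value1", "Value2")], test$ID),
                   function(d) sapply(d, summarise_column))

print(round(by_group$Group1, 2))
```

The sd values should agree with the Spark output above, e.g. about 36.6 (Value1) and 38.6 (Value2) for Group1.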

Related

Create parameterized summaries of a column

I have a tibble and I want to create several summaries of the same column, specifically the first, second and third quartiles.
To do this, I create a named list of functions, and that works fine.
library("tidyverse")
set.seed(1234)
df <- tibble(x = rnorm(100))
df %>%
summarise(
across(x,
list(
Q1 = ~ quantile(., 1 / 4),
Q2 = ~ quantile(., 2 / 4),
Q3 = ~ quantile(., 3 / 4)
),
.names = "{.fn}"
)
)
#> # A tibble: 1 × 3
#> Q1 Q2 Q3
#> <dbl> <dbl> <dbl>
#> 1 -0.895 -0.385 0.471
Can I achieve this by specifying only the vector of probabilities to pass to quantile()? That would save typing and, more importantly, avoid hard-coding the arguments passed to the aggregating function.
The following doesn't work because it creates one row per probability rather than one column per probability.
df %>%
summarise(
across(x, quantile, 1:3 / 4)
)
#> # A tibble: 3 × 1
#> x
#> <dbl>
#> 1 -0.895
#> 2 -0.385
#> 3 0.471

You're almost there (the numbers differ from the question's because df is regenerated here without set.seed()):
df <- tibble(x = rnorm(100))
df %>%
summarise(
across(x,
map(1:3, ~partial(quantile, probs=./4)),
.names = "Q{.fn}"
)
)
# A tibble: 1 x 3
Q1 Q2 Q3
<dbl> <dbl> <dbl>
1 -0.579 0.0815 0.475

If you define the quantiles like this:
Q <- c(0.25, 0.5, 0.75)
Then the following code will produce columns of the appropriate quantiles with sensible labels:
df %>%
summarise(
across(x,
setNames( lapply(Q,
# ~quantile(., b) is a template; the placeholder b is overwritten with each probability
function(x) { f <- ~quantile(., b); f[2][[1]][[3]] <- x; f }),
paste("Q", round(100 * Q), sep = "_")),
.names = "{.fn}"
)
)
#> # A tibble: 1 x 3
#> Q_25 Q_50 Q_75
#> <dbl> <dbl> <dbl>
#> 1 -0.895 -0.385 0.471
Created on 2022-06-29 by the reprex package (v2.0.1)
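A variant that avoids editing formula objects is to build the named list of functions from the probability vector with closures. This is a minimal base-R sketch; the name quantile_fns and the Q1/Q2/Q3 labels are illustrative choices, not part of either answer above.

```r
set.seed(1234)
x <- rnorm(100)

probs <- c(0.25, 0.5, 0.75)

# One function per probability; force() fixes p inside each closure
quantile_fns <- setNames(
  lapply(probs, function(p) {
    force(p)
    function(v) unname(quantile(v, probs = p))
  }),
  paste0("Q", seq_along(probs))
)

# Apply every function to the column: one column per probability
result <- as.data.frame(lapply(quantile_fns, function(f) f(x)))
print(result)
```

With dplyr loaded, the same named list can be passed to across(), e.g. summarise(df, across(x, quantile_fns, .names = "{.fn}")), since .fns accepts a named list of functions.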

Dplyr Summarise Groups as Column Names

I have a data frame with a lot of columns and want to summarise them with multiple functions.
test_df <- data.frame(Group = sample(c("A", "B", "C"), 10, T), var1 = sample(1:5, 10, T), var2 = sample(3:7, 10, T))
test_df %>%
group_by(Group) %>%
summarise_all(c(Mean = mean, Sum = sum))
# A tibble: 3 x 5
Group var1_Mean var2_Mean var1_Sum var2_Sum
<chr> <dbl> <dbl> <int> <int>
1 A 3.14 5.14 22 36
2 B 4.5 4.5 9 9
3 C 4 6 4 6
This gives one row per Group, with column names that combine the original column name and the function name.
What I want instead is one row per original column (var1, var2), with the groups and functions combined in the column names.
I can achieve this with
test_longer <- test_df %>% pivot_longer(cols = starts_with("var"), names_to = "var", values_to = "val")
# Add row number because spread needs unique identifiers for rows
test_longer <- test_longer %>%
group_by(Group) %>%
mutate(grouped_id = row_number())
spread(test_longer, Group, val) %>%
select(-grouped_id) %>%
group_by(var) %>%
summarise_all(c(Mean = mean, Sum = sum), na.rm = T)
# A tibble: 2 x 7
var A_Mean B_Mean C_Mean A_Sum B_Sum C_Sum
<chr> <dbl> <dbl> <dbl> <int> <int> <int>
1 var1 3.14 4.5 4 22 9 4
2 var2 5.14 4.5 6 36 9 6
But this seems like a rather long detour. There is probably a better way, but I could not find it. Any suggestions? Thank you

There are lots of ways to go about it, but I would simplify it by pivoting to a longer data frame first and then grouping by Group and var. Then you can pivot wider to get the final result you want. Note that I used summarize(across()), which supersedes summarize_all(), even though with a single value column I could have just specified Mean = ... and Sum = ... manually.
set.seed(123)
test_df %>%
pivot_longer(
var1:var2,
names_to = "var"
) %>%
group_by(Group, var) %>%
summarize(
across(
everything(),
list(Mean = mean, Sum = sum),
.names = "{.fn}"
),
.groups = "drop"
) %>%
pivot_wider(
names_from = "Group",
values_from = c(Mean, Sum),
names_glue = "{Group}_{.value}"
)
#> # A tibble: 2 × 7
#> var A_Mean B_Mean C_Mean A_Sum B_Sum C_Sum
#> <chr> <dbl> <dbl> <dbl> <int> <int> <int>
#> 1 var1 1 2.5 3.2 1 10 16
#> 2 var2 5 4.5 4.4 5 18 22
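For comparison, a base-R sketch of the same reshaping without pivoting: tapply() computes each statistic per group, and the result is transposed so that the original columns become rows. The data are regenerated after set.seed(123), so the numbers will not necessarily match either printed output above; vars, fns and blocks are illustrative names.

```r
set.seed(123)
test_df <- data.frame(Group = sample(c("A", "B", "C"), 10, TRUE),
                      var1 = sample(1:5, 10, TRUE),
                      var2 = sample(3:7, 10, TRUE))

vars <- c("var1", "var2")
fns <- list(Mean = mean, Sum = sum)

# For each function: a vars x groups matrix, columns renamed "<Group>_<fn>"
blocks <- lapply(names(fns), function(fn_name) {
  m <- t(sapply(vars, function(v) tapply(test_df[[v]], test_df$Group, fns[[fn_name]])))
  colnames(m) <- paste(colnames(m), fn_name, sep = "_")
  m
})

# One row per original variable, groups and functions in the column names
wide <- data.frame(var = vars, do.call(cbind, blocks))
rownames(wide) <- NULL
print(wide)
```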

dplyr "weighted sum" and across()

I have already asked a similar question, in which I wanted to aggregate my data frame by "number" and calculate a weighted mean; the code from that answer is below. Now I would like to calculate a weighted sum, but I cannot work out how to apply one to my data frame. The weighted.sum function no longer works with my R version.
df = data.frame(number=c("a","a","a","b","c","c"), y=c(1,2,3,4,1,7),
z=c(2,2,6,8,9,1), weight =c(1,1,3,1,2,1))
df %>%
group_by(number) %>%
summarise(across(c(y, z),
list( mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE),
weighted = ~weighted.mean(., w = weight))), .groups = 'drop')

We could use
library(dplyr)
df %>%
group_by(number) %>%
summarise(across(c(y, z),
list( mean = ~mean(., na.rm = TRUE),
sd = ~sd(., na.rm = TRUE),
weighted = ~weighted.mean(., w = weight),
weightedsum = ~ sum(. * weight))), .groups = 'drop')
# A tibble: 3 x 9
# number y_mean y_sd y_weighted y_weightedsum z_mean z_sd z_weighted z_weightedsum
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 a 2 1 2.4 12 3.33 2.31 4.4 22
#2 b 4 NA 4 4 8 NA 8 8
#3 c 4 4.24 3 9 5 5.66 6.33 19
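Since a weighted sum is just sum(x * w), no dedicated function is needed. A minimal base-R sketch on the same df; the helper weighted_sum() and its na.rm handling (dropping pairs where either the value or the weight is missing) are illustrative assumptions.

```r
df <- data.frame(number = c("a", "a", "a", "b", "c", "c"),
                 y = c(1, 2, 3, 4, 1, 7),
                 z = c(2, 2, 6, 8, 9, 1),
                 weight = c(1, 1, 3, 1, 2, 1))

# Weighted sum: sum of value * weight, optionally skipping incomplete pairs
weighted_sum <- function(x, w, na.rm = FALSE) {
  keep <- if (na.rm) !(is.na(x) | is.na(w)) else rep(TRUE, length(x))
  sum(x[keep] * w[keep])
}

# One row per value of "number", one column per summarised variable
res <- t(sapply(split(df, df$number), function(d)
  c(y = weighted_sum(d$y, d$weight), z = weighted_sum(d$z, d$weight))))
print(res)
```

The values match the y_weightedsum and z_weightedsum columns above (12 and 22 for a, 4 and 8 for b, 9 and 19 for c).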

Superfluous columns returned by dplyr::summarise() function

I am having some trouble with the new dplyr::summarise() function.
Here is the data
df <- data.frame(id = factor(1:10),
group = factor(rep(letters[1:2],each = 5)),
w1 = rnorm(10),
w2 = rnorm(10),
w3 = rnorm(10),
dummy = as.character(LETTERS[1:10]),
stringsAsFactors = F)
Now I want to get means and standard deviations for the numeric variables only, so I ran the following code:
df %>%
dplyr::select(id, group, w1:w3) %>%
group_by(group) %>%
dplyr::summarise(across(where(is.numeric), ~ mean(.x, na.rm = T), .names = "mean_{col}"),
across(where(is.numeric), ~ sd(.x, na.rm = T), .names = "sd_{col}"),
count = n())
This gives me the following output:
# A tibble: 2 x 11
# group mean_w1 mean_w2 mean_w3 sd_w1 sd_w2 sd_w3 sd_mean_w1 sd_mean_w2 sd_mean_w3 count
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# a -0.399 0.152 -0.151 1.07 0.703 1.15 NA NA NA 5
# b 0.560 -0.107 -0.0439 1.18 0.612 0.862 NA NA NA 5
Now the columns starting with mean_ and sd_ are exactly what I want, but I'm also getting this set of sd_mean_ columns, I assume because it is trying to find the sd of the new mean_ columns.
How do I get the output without the superfluous columns?

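The issue is that by the time the second across() runs, the mean_ columns created by the first one already exist and are numeric, so where(is.numeric) selects them too and sd() is applied to them as well (each group has a single mean value, so its sd is NA). To avoid this, apply multiple functions in the same across() by passing a list().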
library(dplyr)
df %>%
group_by(group) %>%
summarise(across(where(is.numeric), list(mean = ~mean(., na.rm = TRUE),
sd = ~sd(., na.rm = TRUE)),
.names = "{fn}_{col}"),
count = n())
# group mean_w1 sd_w1 mean_w2 sd_w2 mean_w3 sd_w3 count
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#1 a 0.0746 0.696 0.760 1.39 0.0530 1.29 5
#2 b 0.522 0.686 0.0979 0.566 -0.0133 1.12 5
Also, your original attempt works as expected if you select the columns by name rather than by type, because w1:w3 always refers to the same three columns:
df %>%
group_by(group) %>%
summarise(across(w1:w3, ~ mean(.x, na.rm = T), .names = "mean_{col}"),
across(w1:w3, ~ sd(.x, na.rm = T), .names = "sd_{col}"),
count = n())
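The underlying rule is to resolve the set of numeric columns once, before any summary columns exist. A minimal base-R sketch of that idea on data shaped like the question's; the helper summarise_group() is hypothetical, and set.seed(1) is added only for reproducibility.

```r
set.seed(1)
df <- data.frame(id = factor(1:10),
                 group = factor(rep(letters[1:2], each = 5)),
                 w1 = rnorm(10), w2 = rnorm(10), w3 = rnorm(10),
                 dummy = LETTERS[1:10],
                 stringsAsFactors = FALSE)

# Decide which columns are numeric ONCE, before any summaries are added
num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

summarise_group <- function(d) {
  means <- vapply(d[num_cols], mean, numeric(1), na.rm = TRUE)
  sds   <- vapply(d[num_cols], sd, numeric(1), na.rm = TRUE)
  c(setNames(means, paste0("mean_", num_cols)),
    setNames(sds, paste0("sd_", num_cols)),
    count = nrow(d))
}

# One row per group; no sd_mean_ columns because num_cols is fixed
out <- do.call(rbind, lapply(split(df, df$group), summarise_group))
print(round(out, 3))
```

In dplyr the same idea would be num_cols <- names(select(df, where(is.numeric))) followed by across(all_of(num_cols), ...).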

How can you add group percentages to tables using the gt( ) package?

In a separate post I outline a method for adding overall percentages to a table in the gt( ) package (How can you automate the addition of overall percentages to the row_summary in the gt( ) package?). The solution I identified involved a separate invocation of the summary_rows( ) function for each overall row percentage being added. But even this rather clunky solution doesn't work when applied to group percentages, as illustrated in the worked example below. Solutions?
# Create baseline data
set.seed(1)
df <- tibble(some_letter = sample(letters, size = 10, replace = FALSE),
some_group = sample(c("A", "B"), size = 10, replace = TRUE),
num1 = sample(100:200, size = 10, replace = FALSE),
num2 = sample(100:200, size = 10, replace = FALSE),
n = num1 + num2) %>%
mutate(across(starts_with("num"), ~(.x)/(n), .names = "pct_{col}"))
> df
# A tibble: 10 x 7
some_letter some_group num1 num2 n pct_num1 pct_num2
<chr> <chr> <int> <int> <int> <dbl> <dbl>
1 g A 194 148 342 0.567 0.433
2 j A 121 159 280 0.432 0.568
3 n B 164 200 364 0.451 0.549
4 u A 112 118 230 0.487 0.513
5 e B 125 180 305 0.410 0.590
6 s A 137 164 301 0.455 0.545
7 w B 101 175 276 0.366 0.634
8 m B 135 110 245 0.551 0.449
9 l A 180 167 347 0.519 0.481
10 b B 131 137 268 0.489 0.511
# Target: the weighted group percentages to be added to the table in gt( )
df %>% group_by(some_group) %>%
summarise_at(vars(num1, num2, n), funs(sum)) %>%
mutate(across(starts_with("num"), ~(.x)/(n), .names = "pct_{col}"))
# A tibble: 2 x 6
some_group num1 num2 n pct_num1 pct_num2
<chr> <int> <int> <int> <dbl> <dbl>
1 A 744 756 1500 0.496 0.504
2 B 656 802 1458 0.450 0.550
# Create table in gt( ), attempting to use the summary_rows( ) function to pass
# group-specific percentages for pct_num1, the result of which is that the last
# passed value is recycled across all groups...
gt(df, groupname_col = "some_group", rowname_col="some_letter") %>%
summary_rows(groups = TRUE, columns = vars(num1, num2, n), fns = list( TOTAL = "sum" ) ) %>%
summary_rows(groups = TRUE,
columns = vars(pct_num1),
fns = list(TOTAL = ~ c(0.493,0.454) )
)
Output from gt( )

As I answered in your other question ("How can you automate the addition of overall percentages to the row summary in gt() package?"), the gt package lets you control, cell by cell, everything shown in summary rows. The disadvantage is that the code for the table becomes pretty verbose.
I've used a shorter example than yours to save space, but the same approach applies to your data.
library(dplyr)
library(gt)
df2_ex <- tribble(
~some_letter, ~some_group, ~num1, ~num2,
"c" , "A", 1, 2,
"d" , "A", 3, 4,
"x" , "B", 5, 6,
"y" , "B", 7, 8
) %>%
rowwise() %>%
mutate(pct_num1 = num1 / sum(c_across(starts_with("num"))),
pct_num2 = num2 / sum(c_across(starts_with("num"))))
df2_ex
#> # A tibble: 4 x 6
#> # Rowwise:
#> some_letter some_group num1 num2 pct_num1 pct_num2
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 c A 1 2 0.333 0.667
#> 2 d A 3 4 0.429 0.571
#> 3 x B 5 6 0.455 0.545
#> 4 y B 7 8 0.467 0.533
The values for the group summary rows (one per value of some_group) are:
df2_ex_grouped <- df2_ex %>%
group_by(some_group) %>%
summarise_at(vars(num1, num2), sum) %>%
rowwise() %>%
mutate(pct_num1 = num1 / sum(c_across(starts_with("num"))),
pct_num2 = num2 / sum(c_across(starts_with("num"))))
df2_ex_grouped
#> # A tibble: 2 x 5
#> # Rowwise:
#> some_group num1 num2 pct_num1 pct_num2
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 A 4 6 0.4 0.6
#> 2 B 12 14 0.462 0.538
Finally, for completeness, I've included a grand summary computed the same way:
df2_ex_total <- df2_ex %>%
ungroup() %>%
summarise_at(vars(num1, num2), sum) %>%
rowwise() %>%
mutate(pct_num1 = num1 / sum(c_across(starts_with("num"))),
pct_num2 = num2 / sum(c_across(starts_with("num"))))
df2_ex_total
#> # A tibble: 1 x 4
#> # Rowwise:
#> num1 num2 pct_num1 pct_num2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 16 20 0.444 0.556
The code to get the table you wanted is shown below. Note that I used two ways to pick the value that should appear in each cell of the summary rows: base R indexing into df2_ex_grouped, or filter() plus pull(). Pick the one you prefer.
The piece missing from your code was to specify which value of the some_group column each summary_rows() call applies to (for example groups = "A"), instead of using groups = TRUE, which applies the same value to every group.
I hope this answers your question.
df2_ex %>%
gt(groupname_col = "some_group", rowname_col="some_letter") %>%
summary_rows(groups = TRUE, columns = vars(num1, num2), fns = list(TOTAL = "sum"),
formatter = fmt_number, decimals = 0) %>%
summary_rows(groups = "A", columns = vars(pct_num1),
fns = list(TOTAL = ~ df2_ex_grouped$pct_num1[1]),
formatter = fmt_number, decimals = 4) %>%
summary_rows(groups = "A", columns = vars(pct_num2),
fns = list(TOTAL = ~ df2_ex_grouped$pct_num2[1]),
formatter = fmt_number, decimals = 4) %>%
summary_rows(groups = "B", columns = vars(pct_num1),
fns = list(TOTAL = ~ df2_ex_grouped$pct_num1[2]),
formatter = fmt_number, decimals = 4) %>%
summary_rows(groups = "B", columns = vars(pct_num2),
fns = list(TOTAL = ~ (
df2_ex_grouped %>%
filter(some_group == "B") %>%
select(pct_num2) %>%
pull())),
formatter = fmt_number, decimals = 4) %>%
grand_summary_rows(columns = vars(num1, num2), fns = list(`grand TOTAL` = "sum"),
formatter = fmt_number, decimals = 0) %>%
grand_summary_rows(columns = vars(pct_num1),
fns = list(
`grand TOTAL` = ~ (df2_ex_total$pct_num1)),
formatter = fmt_number, decimals = 3) %>%
grand_summary_rows(columns = vars(pct_num2),
fns = list(
`grand TOTAL` = ~ (df2_ex_total$pct_num2)),
formatter = fmt_number, decimals = 3)
Created on 2020-11-14 by the reprex package (v0.3.0)
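The group rows in this answer use ratios of group sums, which is what makes them weighted percentages (the "Target" in the question). A base-R sketch on the same four-row example, contrasting that with a plain average of the row percentages; it only computes the numbers and does not build the gt table, and the names sums, pct and naive_pct1 are illustrative.

```r
df2_ex <- data.frame(some_letter = c("c", "d", "x", "y"),
                     some_group  = c("A", "A", "B", "B"),
                     num1 = c(1, 3, 5, 7),
                     num2 = c(2, 4, 6, 8))

# Weighted group shares: divide each group's column sums by the group total
sums <- rowsum(as.matrix(df2_ex[c("num1", "num2")]), df2_ex$some_group)
pct  <- sums / rowSums(sums)

# Unweighted alternative: average of the row-level shares (different values)
row_pct1   <- df2_ex$num1 / (df2_ex$num1 + df2_ex$num2)
naive_pct1 <- tapply(row_pct1, df2_ex$some_group, mean)

print(round(pct, 3))
print(round(naive_pct1, 3))
```

pct reproduces df2_ex_grouped above (0.4/0.6 for A, 0.462/0.538 for B), while the simple average gives about 0.381 for A's num1 share.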
