I have a data frame that looks like this:
df <- data.frame(Region = c("Asia","Asia","Africa","Europe","Europe"),
Emp = c(120,40,10,67,110),
Sales18 = c(12310, 4510, 1140, 5310, 16435),
Sales19 = c(15670, 6730, 1605, 6120, 1755))
I am running code that groups by Region and then takes the plain average and the Emp-weighted average of every 'Sales' column:
Result <- df %>% group_by(Region) %>%
summarise(sales18 = mean(Sales18, na.rm = T),
sales19 = mean(Sales19, na.rm = T),
weightedsales18 = weighted.mean(Sales18, .data[[Emp]], na.rm = T),
weightedsales19 = weighted.mean(Sales19, .data[[Emp]], na.rm = T))
However, I get the following error:
Error in splice(dot_call(capture_dots, frame_env = frame_env, named = named, :
object 'Emp' not found
I can't figure out what I am doing wrong.
An option could be:
library(tidyverse)
df <- data.frame(Region = c("Asia","Asia","Africa","Europe","Europe"),
Emp = c(120,40,10,67,110),
Sales18 = c(12310, 4510, 1140, 5310, 16435),
Sales19 = c(15670, 6730, 1605, 6120, 1755))
df %>%
group_by(Region) %>%
summarise(across(
.cols = starts_with("Sales"),
.fns = list(w_mean = ~ weighted.mean(.x, w = Emp), mean = ~ mean(.x)),
.names = "{.col}_{.fn}")
)
#> # A tibble: 3 x 5
#> Region Sales18_w_mean Sales18_mean Sales19_w_mean Sales19_mean
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Africa 1140 1140 1605 1605
#> 2 Asia 10360 8410 13435 11200
#> 3 Europe 12224. 10872. 3407. 3938.
Created on 2021-05-25 by the reprex package (v2.0.0)
This works. Data masking already takes place inside summarise(), so you can refer to Emp directly and don't need the .data pronoun.
library(tidyverse)
df <- data.frame(Region = c("Asia","Asia","Africa","Europe","Europe"),
Emp = c(120,40,10,67,110),
Sales18 = c(12310, 4510, 1140, 5310, 16435),
Sales19 = c(15670, 6730, 1605, 6120, 1755))
Result <- df %>% group_by(Region) %>%
summarise(sales18 = mean(Sales18, na.rm = T),
sales19 = mean(Sales19, na.rm = T),
weightedsales18 = weighted.mean(Sales18, Emp, na.rm = T),
weightedsales19 = weighted.mean(Sales19, Emp, na.rm = T))
Result
#> # A tibble: 3 x 5
#> Region sales18 sales19 weightedsales18 weightedsales19
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Africa 1140 1605 1140 1605
#> 2 Asia 8410 11200 10360 13435
#> 3 Europe 10872. 3938. 12224. 3407.
Created on 2021-05-25 by the reprex package (v2.0.0)
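As a quick sanity check on the weighted figures, the Asia row can be reproduced by hand in base R. This is only an illustrative sketch; the names emp, sales18, manual and builtin are mine, not from the question.

```r
# Asia rows from the question's data
emp     <- c(120, 40)
sales18 <- c(12310, 4510)

# weighted mean = sum(value * weight) / sum(weight)
manual  <- sum(sales18 * emp) / sum(emp)
builtin <- weighted.mean(sales18, w = emp)

manual   # 10360, the Asia value of weightedsales18 above
```

Both manual and builtin should agree, which confirms that Emp is being used as the weight within each Region.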
An unquoted Emp inside [[ tells R to look for a string variable called Emp that holds the name of the column containing the weights, like here:
df <- data.frame(Region = c("Asia","Asia","Africa","Europe","Europe"),
x = c(120,40,10,67,110),
Sales18 = c(12310, 4510, 1140, 5310, 16435),
Sales19 = c(15670, 6730, 1605, 6120, 1755))
Emp <- 'x'
df %>% group_by(Region) %>%
summarise(sales18 = mean(Sales18, na.rm = T),
sales19 = mean(Sales19, na.rm = T),
weightedsales18 = weighted.mean(Sales18, .data[[Emp]], na.rm = T),
weightedsales19 = weighted.mean(Sales19, .data[[Emp]], na.rm = T))
# A tibble: 3 x 5
Region sales18 sales19 weightedsales18 weightedsales19
<chr> <dbl> <dbl> <dbl> <dbl>
1 Africa 1140 1605 1140 1605
2 Asia 8410 11200 10360 13435
3 Europe 10872. 3938. 12224. 3407.
Since you do not have a variable Emp like this, R throws an error.
What to do? Just quote Emp inside [[:
df <- data.frame(Region = c("Asia","Asia","Africa","Europe","Europe"),
Emp = c(120,40,10,67,110),
Sales18 = c(12310, 4510, 1140, 5310, 16435),
Sales19 = c(15670, 6730, 1605, 6120, 1755))
df %>% group_by(Region) %>%
summarise(sales18 = mean(Sales18, na.rm = T),
sales19 = mean(Sales19, na.rm = T),
weightedsales18 = weighted.mean(Sales18, .data[['Emp']], na.rm = T),
weightedsales19 = weighted.mean(Sales19, .data[['Emp']], na.rm = T))
# A tibble: 3 x 5
Region sales18 sales19 weightedsales18 weightedsales19
<chr> <dbl> <dbl> <dbl> <dbl>
1 Africa 1140 1605 1140 1605
2 Asia 8410 11200 10360 13435
3 Europe 10872. 3938. 12224. 3407.
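The .data[[...]] form earns its keep when the column name really does arrive as a string, for example as a function argument. A minimal sketch under that assumption; the helper weighted_by() and its argument weight_col are hypothetical names of mine, not part of dplyr:

```r
library(dplyr)

df <- data.frame(Region = c("Asia", "Asia", "Africa", "Europe", "Europe"),
                 Emp = c(120, 40, 10, 67, 110),
                 Sales18 = c(12310, 4510, 1140, 5310, 16435),
                 Sales19 = c(15670, 6730, 1605, 6120, 1755))

# hypothetical helper: the weight column is supplied as a character string
weighted_by <- function(data, weight_col) {
  data %>%
    group_by(Region) %>%
    summarise(
      weightedsales18 = weighted.mean(Sales18, .data[[weight_col]], na.rm = TRUE),
      weightedsales19 = weighted.mean(Sales19, .data[[weight_col]], na.rm = TRUE)
    )
}

res <- weighted_by(df, "Emp")
res
```

Here weight_col holds the string "Emp", so .data[[weight_col]] picks the Emp column of the current group, which is the situation the pronoun was designed for.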
This question is very similar to this one: Error in UseMethod("select") : no applicable method for 'select' applied to an object of class "character". The differences are a few changes in the return_coef function and that I'm now filtering by Id, date and Category, whereas the answered question filtered only by date and Category.
You will see below that I can generate the coefficients for each Id/date/Category, but when I ask to do it for everyone at once, I get the following error:
Error: Problem with `mutate()` column `coef`.
i `coef = mapply(...)`.
x negative length vectors are not allowed
Executable code:
library(dplyr)
library(tidyverse)
library(lubridate)
library(data.table)
df1<- structure(
list(
Id = c(1, 1, 1, 1),
date1 = c("2022-01-06","2022-01-06","2022-01-06","2022-01-06"),
date2 = c("2022-01-02","2022-01-03","2022-01-09","2022-01-10"),
Week = c("Sunday","Monday","Sunday","Monday"),
Category = c("EFG", "ABC","EFG","ABC"),
DR1 = c(200, 300, 200, 200),
DRM01 = c(300, 300, 300, 300),
DRM02 = c(300, 300, 300, 300),
DRM03 = c(300,300,300,300),
DRM04 = c(300,300,300,300),
DRM05 = c(300,250,350,350)),row.names = c(NA, 4L), class = "data.frame")
return_coef <- function (df1, idd, dmda, CategoryChosse) {
selection = startsWith(names(df1), "DRM0")
df1[selection][is.na(df1[selection])] = 0
data1<-subset(df1,df1$date2<df1$date1)
dt1 <- as.data.table(data1)
cols <- grep("^DRM0", colnames(dt1), value = TRUE)
med <-
dt1[, (paste0(cols, "_PV")) := DR1 - .SD, .SDcols = cols
][, lapply(.SD, median), by = .(Id, Category, Week), .SDcols = paste0(cols, "_PV") ]
SPV<-df1%>%
inner_join(med, by = c('Id','Category', 'Week')) %>%
mutate(across(matches("^DRM0\\d+$"), ~.x +
get(paste0(cur_column(), '_PV')),
.names = '{col}_{col}_PV')) %>%
select(Id:Category, DRM01_DRM01_PV:last_col())%>%
data.frame()
mat1 <- df1 %>%
dplyr::filter(Id==idd, date2 == ymd(dmda), Category == CategoryChosse) %>%
select(starts_with("DRM0")) %>%
pivot_longer(cols = everything()) %>%
arrange(desc(row_number())) %>%
mutate(cs = cumsum(value)) %>%
dplyr::filter(cs == 0) %>%
pull(name)
(dropnames <- paste0(mat1,"_",mat1, "_PV"))
SPV <- SPV %>%
filter(Id==idd,date2 == ymd(dmda), Category == CategoryChosse) %>%
select(-any_of(dropnames))
if(length(grep("DRM0", names(SPV))) == 0) {
SPV[head(mat1,10)] <- NA_real_
}
datas <-SPV %>%
dplyr::filter(Id==idd,date2 == ymd(dmda)) %>%
group_by(Category) %>%
dplyr::summarize(dplyr::across(starts_with("DRM0"), sum)) %>%
pivot_longer(cols= -Category, names_pattern = "DRM0(.+)", values_to = "val") %>%
mutate(name = readr::parse_number(name))
colnames(datas)[-1]<-c("var1","var2")
datas$days <- datas[[as.name("var1")]]
datas$numbers <- datas[[as.name("var2")]]
datas <- datas %>%
group_by(Category) %>%
slice((as.Date(dmda) - min(as.Date(df1$date1) [
df1$Category == first(Category)])):max(days)+1) %>%
ungroup
m<-df1 %>%
group_by(Id,Category,Week) %>%
dplyr::summarize(dplyr::across(starts_with("DR1"), mean), .groups = 'drop')
m<-subset(m, Week == df1$Week[match(ymd(dmda), ymd(df1$date2))] & Category == CategoryChosse)$DR1
if (nrow(datas)<=2){
val<-as.numeric(m)
}
else{
mod <- nls(numbers ~ b1*days^2+b2,start = list(b1 = 0,b2 = 0),data = datas, algorithm = "port")
coef<-coef(mod)[2]
val<-as.numeric(coef(mod)[2])
}
return(val)
}
Find the coefs one by one (this works):
return_coef(df1,"1","2022-01-09","EFG")
[1] 200
return_coef(df1,"1","2022-01-10","ABC")
[1] 250
Find all coefs at once (this does not work):
subset_df1 <- subset(df1, date2 > date1)
All<-subset_df1%>%
transmute(
Id,date2,Category,
coef = mapply(return_coef, list(cur_data()), Id, as.Date(date2), Category))
Error: Problem with `mutate()` column `coef`.
i `coef = mapply(...)`.
x negative length vectors are not allowed
dput(head(df1))
structure(list(Id = c(1, 1, 1, 1), date1 = structure(c(1641427200,
1641427200, 1641427200, 1641427200), tzone = "UTC", class = c("POSIXct",
"POSIXt")), date2 = structure(c(1641081600, 1641168000, 1641686400,
1641772800), tzone = "UTC", class = c("POSIXct", "POSIXt")),
Week = c("Sunday", "Monday", "Sunday", "Monday"), Category = c("EFG",
"ABC", "EFG", "ABC"), DR1 = c(200, 300, 200, 200), DRM01 = c(300,
300, 300, 300), DRM02 = c(300, 300, 300, 300), DRM03 = c(300,
300, 300, 300), DRM04 = c(300, 300, 300, 300), DRM05 = c(300,
250, 350, 350)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
The function seems to use df1 as the full data, so pass df1 itself and call the function row by row:
library(dplyr)
subset_df1 %>%
rowwise %>%
mutate(coef = return_coef(df1, Id, date2, Category)) %>%
ungroup
-output
# A tibble: 2 × 12
Id date1 date2 Week Category DR1 DRM01 DRM02 DRM03 DRM04 DRM05 coef
<dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2022-01-06 2022-01-09 Sunday EFG 200 300 300 300 300 350 200
2 1 2022-01-06 2022-01-10 Monday ABC 200 300 300 300 300 350 250
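The same pattern on a toy example, in case it helps to see it without return_coef: the helper receives the full data frame, while the row-level values come from the subset being mutated. The data and the helper mean_for_id() are made up for illustration.

```r
library(dplyr)

full <- data.frame(id = c(1, 1, 2, 2), value = c(10, 20, 30, 40))

# hypothetical scalar helper: looks things up in the FULL data for one id
mean_for_id <- function(data, idd) {
  mean(data$value[data$id == idd])
}

sub <- subset(full, value > 15)            # keeps the rows with value 20, 30, 40

res <- sub %>%
  rowwise() %>%                            # one function call per row
  mutate(m = mean_for_id(full, id)) %>%    # pass `full`, not the subset
  ungroup()

res$m   # 15 35 35
```

With rowwise(), id is a single value in each call, so a function written for scalar arguments can be used without mapply.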
Update
In the new dataset, the columns 'date1' and 'date2' are POSIXct. Convert them to Date class and it should work:
df2 <- df1 %>%
mutate(across(c(date1, date2), as.Date))
subset_df2 <- subset(df2, date2 > date1)
subset_df2 %>%
rowwise %>%
mutate(coef = return_coef(df2, Id, date2, Category)) %>%
ungroup
# A tibble: 2 × 12
Id date1 date2 Week Category DR1 DRM01 DRM02 DRM03 DRM04 DRM05 coef
<dbl> <date> <date> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2022-01-06 2022-01-09 Sunday EFG 200 300 300 300 300 350 200
2 1 2022-01-06 2022-01-10 Monday ABC 200 300 300 300 300 350 250
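To see which class a date column has before and after the conversion, something like the following can be used. A small base R sketch with a made-up value x, not taken from the dataset:

```r
x <- as.POSIXct("2022-01-09", tz = "UTC")   # date-time, like the columns in the dput() output
class(x)                                    # "POSIXct" "POSIXt"

d <- as.Date(x)                             # keeps only the calendar date
class(d)                                    # "Date"

d == as.Date("2022-01-10") - 1              # Date arithmetic works in whole days
```

Running class() on date1 and date2 is a cheap way to confirm the input matches what the function was written for.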
Or using pmap
library(purrr)
subset_df1 %>%
transmute(Id, date2, Category,
coeff = pmap_dbl(across(c(Id, date2, Category)),
~ return_coef(df1, ..1, ..2, ..3)))
-output
Id date2 Category coeff
3 1 2022-01-09 EFG 200
4 1 2022-01-10 ABC 250
I am trying to calculate the median and mean for a group of columns, but it only calculates them for one column. What am I doing wrong here?
df <- data.frame(Name = c("ABC", "DCA", "GOL",NA, "MNA",NA, "VAN"),
Goal =c("published", "pending", "not designed",NA, "pending", "pending", "not designed"),
Target_1 = c(3734, 2639, 2604, NA, 2793, 2688, 2403),
Target_2 = c(3322, 2016, 2310, NA, 3236, 3898, 2309),
Target_3 = c(3785, 2585, 3750, NA, 2781, 3589, 2830))
df_summary <- df %>% select(contains("Target")) %>% summarise(
q25 = round(quantile(., type=6, probs = seq(0, 1, 0.25), na.rm=TRUE)[2],digits = 0),
Median = round(quantile(., type=6, probs = seq(0, 1, 0.25), na.rm=TRUE)[3],digits = 0),
Mean = round( mean(., na.rm=TRUE),digits = 0),
q75 = round(quantile(., type=6, probs = seq(0, 1, 0.25), na.rm=TRUE)[4],digits = 0),
N = sum(!is.na(.)))
Use across to apply a function to multiple columns.
library(dplyr)
library(tidyr)
df %>%
summarise(across(contains("Target"), list(
q25 = ~round(quantile(., type=6, probs = 0.25, na.rm=TRUE),digits = 0),
Median = ~round(quantile(., type=6, probs = 0.5, na.rm=TRUE),digits = 0),
Mean = ~round( mean(., na.rm=TRUE),digits = 0),
q75 = ~round(quantile(., type=6, probs = 0.75, na.rm=TRUE),digits = 0),
N = ~sum(!is.na(.)))))
# Target_1_q25 Target_1_Median Target_1_Mean Target_1_q75 Target_1_N Target_2_q25
#1 2554 2664 2810 3028 6 2236
# Target_2_Median Target_2_Mean Target_2_q75 Target_2_N Target_3_q25 Target_3_Median
#1 2773 2848 3466 6 2732 3210
# Target_3_Mean Target_3_q75 Target_3_N
#1 3220 3759 6
Or maybe long format is a better way to display the values.
df %>%
pivot_longer(cols = contains("Target")) %>%
group_by(name) %>%
summarise( q25 = round(quantile(value, type=6, probs = 0.25, na.rm=TRUE),digits = 0),
Median = round(quantile(value, type=6, probs = 0.5, na.rm=TRUE),digits = 0),
Mean = round( mean(value, na.rm=TRUE),digits = 0),
q75 = round(quantile(value, type=6, probs = 0.75, na.rm=TRUE),digits = 0),
N = sum(!is.na(value)))
# name q25 Median Mean q75 N
# <chr> <dbl> <dbl> <dbl> <dbl> <int>
#1 Target_1 2554 2664 2810 3028 6
#2 Target_2 2236 2773 2848 3466 6
#3 Target_3 2732 3210 3220 3759 6
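For anyone wondering what type = 6 does, the q25 of Target_1 can be rebuilt by hand: with n non-missing values, the quantile for probability p sits at position p * (n + 1) in the sorted data, interpolating linearly between neighbours. A sketch in base R; the variable names are mine.

```r
x <- c(3734, 2639, 2604, NA, 2793, 2688, 2403)   # Target_1 from the question

s <- sort(x)            # sort() drops NA by default
n <- length(s)          # 6 non-missing values
h <- 0.25 * (n + 1)     # position 1.75 in the sorted data
j <- floor(h)

# linear interpolation between the j-th and (j + 1)-th sorted values
manual  <- s[j] + (h - j) * (s[j + 1] - s[j])
builtin <- quantile(x, probs = 0.25, type = 6, na.rm = TRUE)

manual            # 2553.75, which rounds to the 2554 shown above
unname(builtin)
```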
Using map from purrr:
library(purrr)
df %>%
select(contains('Target'))%>%
map_dfr(~c(quantile(.x, type=6, probs = c(.25, .5,.75), na.rm = TRUE),
mean = mean(.x, na.rm = TRUE),
N = length(na.omit(.x))), .id = 'grp')
grp `25%` `50%` `75%` mean N
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Target_1 2554. 2664. 3028. 2810. 6
2 Target_2 2236. 2773 3466 2848. 6
3 Target_3 2732 3210. 3759. 3220 6
Whatever you are doing seems like a summary:
df %>%
select(contains('Target'))%>%
summary()
Another way could be:
df %>%
summarise(across(contains('Target'),
~list(quantile(.x, type=6, probs = c(.25, .5,.75), na.rm = TRUE),
mean(.x, na.rm = TRUE),
length(na.omit(.x))))
)%>%
unnest(everything())
# A tibble: 5 x 3
Target_1 Target_2 Target_3
<dbl> <dbl> <dbl>
1 2554. 2236. 2732
2 2664. 2773 3210.
3 3028. 3466 3759.
4 2810. 2848. 3220
5 6 6 6
If you were to include pivoting:
df %>%
pivot_longer(contains('Target')) %>%
group_by(name) %>%
summarise(a = list(quantile(value, type=6, probs = c(.25, .5,.75), na.rm = TRUE)),
mean = mean(value, na.rm = TRUE), N = length(na.omit(value)))%>%
unnest_wider(a)
# A tibble: 3 x 6
name `25%` `50%` `75%` mean N
<chr> <dbl> <dbl> <dbl> <dbl> <int>
1 Target_1 2554. 2664. 3028. 2810. 6
2 Target_2 2236. 2773 3466 2848. 6
3 Target_3 2732 3210. 3759. 3220 6
Assume a company for which we have the total sales and the sales in three states: CA, TX and WI.
How can I calculate the average contribution of each of the three states to total company sales?
I also need the same average percentages for each year, month of the year and day of the week.
Edited to add sample data:
structure(list(CA = c(11047, 9925, 11322, 12251, 16610, 14696
), TX = c(7381, 5912, 9006, 6226, 9440, 9376), WI = c(6984, 3309,
8883, 9533, 11882, 8664), Total = c(25412, 19146, 29211, 28010,
37932, 32736), date = structure(c(1296518400, 1296604800, 1296691200,
1296777600, 1296864000, 1296950400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), event_type = c("NA", "NA", "NA", "NA", "NA", "Sporting"
), snap_CA = c(1, 1, 1, 1, 1, 1), snap_TX = c(1, 0, 1, 0, 1,
1), snap_WI = c(0, 1, 1, 0, 1, 1)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
If I understood your problem correctly, a possible solution would be this:
library(dplyr)
library(lubridate)
df1 <- df %>%
dplyr::mutate(YEAR = lubridate::year(date),
MONTH = lubridate::month(date),
WEEKDAY = lubridate::wday(date),
P_CA = CA / Total,
P_TX = TX / Total,
P_WI = WI / Total)
# Average per Year
df1 %>%
dplyr::group_by(YEAR) %>%
dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
AV_TX = mean(P_TX, na.rm = TRUE),
AV_WI = mean(P_WI, na.rm = TRUE))
YEAR AV_CA AV_TX AV_WI
<dbl> <dbl> <dbl> <dbl>
1 2011 0.444 0.278 0.278
# Average per Month
df1 %>%
dplyr::group_by(MONTH) %>%
dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
AV_TX = mean(P_TX, na.rm = TRUE),
AV_WI = mean(P_WI, na.rm = TRUE))
MONTH AV_CA AV_TX AV_WI
<dbl> <dbl> <dbl> <dbl>
1 2 0.444 0.278 0.278
# Average per Weekday
df1 %>%
dplyr::group_by(WEEKDAY) %>%
dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
AV_TX = mean(P_TX, na.rm = TRUE),
AV_WI = mean(P_WI, na.rm = TRUE))
WEEKDAY AV_CA AV_TX AV_WI
<dbl> <dbl> <dbl> <dbl>
1 1 0.449 0.286 0.265
2 3 0.435 0.290 0.275
3 4 0.518 0.309 0.173
4 5 0.388 0.308 0.304
5 6 0.437 0.222 0.340
6 7 0.438 0.249 0.313
For this dummy data the three shares add up to 100%, but with a larger dataset this might not be true.
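The overall average share (without any grouping) and the add-up-to-100% check can be sketched as below. This is one possible reading of "average contribution", namely the mean of the daily shares; the object names sales, shares, row_sums and overall are my own.

```r
library(dplyr)

# the sales columns of the sample data
sales <- data.frame(
  CA    = c(11047, 9925, 11322, 12251, 16610, 14696),
  TX    = c(7381, 5912, 9006, 6226, 9440, 9376),
  WI    = c(6984, 3309, 8883, 9533, 11882, 8664),
  Total = c(25412, 19146, 29211, 28010, 37932, 32736)
)

# daily share of each state in total sales, stored as P_CA, P_TX, P_WI
shares <- sales %>%
  mutate(across(c(CA, TX, WI), ~ .x / Total, .names = "P_{.col}"))

# per-row check: do the three shares add up to 1?
row_sums <- shares$P_CA + shares$P_TX + shares$P_WI

# overall average share per state
overall <- shares %>%
  summarise(across(starts_with("P_"), ~ mean(.x, na.rm = TRUE)))
overall
```

If some values in row_sums differ from 1 on the full dataset, Total does not equal CA + TX + WI for those days and the average shares will not sum to 100% either.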
My data concerns a company and includes the total sales and the sales in three states: CA, TX and WI.
Data:
> dput(head(WalData))
structure(list(CA = c(11047, 9925, 11322, 12251, 16610, 14696
), TX = c(7381, 5912, 9006, 6226, 9440, 9376), WI = c(6984, 3309,
8883, 9533, 11882, 8664), Total = c(25412, 19146, 29211, 28010,
37932, 32736), date = structure(c(1296518400, 1296604800, 1296691200,
1296777600, 1296864000, 1296950400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), event_type = c("NA", "NA", "NA", "NA", "NA", "Sporting"
), snap_CA = c(1, 1, 1, 1, 1, 1), snap_TX = c(1, 0, 1, 0, 1,
1), snap_WI = c(0, 1, 1, 0, 1, 1)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
With the following code I am trying to calculate the average share of each of the three states in the company's total sales.
In addition, I need the same average percentages for each year, month of the year and day of the week.
install.packages("dplyr")
install.packages("lubridate")
library(dplyr)
library(lubridate)
df1 <- df %>%
dplyr::mutate(YEAR = lubridate::year(date),
MONTH = lubridate::month(date),
WEEKDAY = lubridate::wday(date),
P_CA = CA / Total,
P_TX = TX / Total,
P_WI = WI / Total)
# Average per Year
df1 %>%
dplyr::group_by(YEAR) %>%
dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
AV_TX = mean(P_TX, na.rm = TRUE),
AV_WI = mean(P_WI, na.rm = TRUE))
# Average per Month
df1 %>%
dplyr::group_by(MONTH) %>%
dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
AV_TX = mean(P_TX, na.rm = TRUE),
AV_WI = mean(P_WI, na.rm = TRUE))
# Average per Weekday
df1 %>%
dplyr::group_by(WEEKDAY) %>%
dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
AV_TX = mean(P_TX, na.rm = TRUE),
AV_WI = mean(P_WI, na.rm = TRUE))
Output :
> df1 <- df %>%
+ dplyr::mutate(YEAR = lubridate::year(date),
+ MONTH = lubridate::month(date),
+ WEEKDAY = lubridate::wday(date),
+ P_CA = CA / Total,
+ P_TX = TX / Total,
+ P_WI = WI / Total)
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "function"
> # Average per Year
> df1 %>%
+ dplyr::group_by(YEAR) %>%
+ dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
+ AV_TX = mean(P_TX, na.rm = TRUE),
+ AV_WI = mean(P_WI, na.rm = TRUE))
Error in eval(lhs, parent, parent) : object 'df1' not found
It fails with the error:
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "function"
I can't figure out what's wrong; I double-checked the code and the data.
Please suggest a solution.
The issue is that df was never created as an object in the global environment, so R finds the function named df instead (the density of the F distribution from the stats package); see ?df:
df(x, df1, df2, ncp, log = FALSE)
Basically, the error comes from applying mutate to the function df rather than to a data frame.
Checking in a fresh R session with no objects created:
df %>%
dplyr::mutate(YEAR = lubridate::year(date),
MONTH = lubridate::month(date),
WEEKDAY = lubridate::wday(date),
P_CA = CA / Total,
P_TX = TX / Total,
P_WI = WI / Total)
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "function"
Now we define 'df' as the data and rerun:
df <- WalData
df %>%
dplyr::mutate(YEAR = lubridate::year(date),
MONTH = lubridate::month(date),
WEEKDAY = lubridate::wday(date),
P_CA = CA / Total,
P_TX = TX / Total,
P_WI = WI / Total)
# A tibble: 6 x 15
# CA TX WI Total date event_type snap_CA snap_TX snap_WI YEAR MONTH WEEKDAY P_CA P_TX P_WI
# <dbl> <dbl> <dbl> <dbl> <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 11047 7381 6984 25412 2011-02-01 00:00:00 NA 1 1 0 2011 2 3 0.435 0.290 0.275
#2 9925 5912 3309 19146 2011-02-02 00:00:00 NA 1 0 1 2011 2 4 0.518 0.309 0.173
#3 11322 9006 8883 29211 2011-02-03 00:00:00 NA 1 1 1 2011 2 5 0.388 0.308 0.304
#4 12251 6226 9533 28010 2011-02-04 00:00:00 NA 1 0 0 2011 2 6 0.437 0.222 0.340
#5 16610 9440 11882 37932 2011-02-05 00:00:00 NA 1 1 1 2011 2 7 0.438 0.249 0.313
#6 14696 9376 8664 32736 2011-02-06 00:00:00 Sporting 1 1 1 2011 2 1 0.449 0.286 0.265
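A defensive habit that would have surfaced this earlier: check what the name refers to before piping it, and prefer a name that does not collide with a base or stats function. A small sketch; the two-row WalData stand-in and the name sales are assumptions for illustration.

```r
# with no data frame assigned, `df` resolves to stats::df, which is a function
is.function(stats::df)            # TRUE

# two rows standing in for the real WalData
WalData <- data.frame(CA = c(11047, 9925), Total = c(25412, 19146))

sales <- WalData                  # assumed name, avoids the clash with df()
stopifnot(is.data.frame(sales))   # fails early if the object is not a data frame

sales$P_CA <- sales$CA / sales$Total
sales
```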
I found a few solutions on here for adding a summary row to dplyr output, but none of them seem to work for me.
# mock-up data
df <- data.frame("Market" = sample(c("East", "North", "West"), 100, replace = TRUE, prob = c(0.33, 0.33, 0.34)),
"var1" = sample(c("Y", "N"), 100, replace = TRUE, prob = c(0.4, 0.6)),
"var2" = sample(c("Y", "N"), 100, replace = TRUE, prob = c(0.7, 0.3)),
"var3" = sample(c("Y", "N"), 100, replace = TRUE, prob = c(0.5, 0.5)))
Here is the code:
df_report <- df %>%
group_by(Market) %>%
filter(Market == "East" | Market == "West") %>%
summarise(n = n(),
var1_y = sum(var1 == "Y"),
var1_n = sum(var1 == "N")) %>%
mutate(total = var1_y + var1_n,
var1_y_pct = (var1_y/total),
var1_n_pct = (var1_n/total),
pct_total = total/sum(total))
Here is the output:
# A tibble: 2 x 8
Market n var1_y var1_n total var1_y_pct var1_n_pct pct_total
<fct> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 East 29 13 16 29 0.448 0.552 0.453
2 West 35 16 19 35 0.457 0.543 0.547
Here are the two solutions I tried:
Option 1
df_report %>%
add_row(Market = "Total", n = sum(n), var1_y = sum(var1_y), var1_n = sum(var1_n),
total = sum(total), var1_y_pct = sum(var1_y_pct), var1_n_pct = sum(varn_y_pct), pct_total = sum(pct_total))
Option 2
df_report %>%
rbind(c("Total", sum(n), sum(var1_y), sum(var1_n), sum(total), sum(var1_y_pct), sum(varn_y_pct), sum(pct_total)))
Both give me the same error: Error in sum(n) : invalid 'type' (closure) of argument
I'm unable to determine why these solutions, which work for others and seem very reasonable, are not working for me.
You should try
df_report %>% janitor::adorn_totals("row")
Which produces
Market n var1_y var1_n total var1_y_pct var1_n_pct pct_total
East 30 11 19 30 0.3666667 0.6333333 0.4285714
West 40 19 21 40 0.4750000 0.5250000 0.5714286
Total 70 30 40 70 0.8416667 1.1583333 1.0000000
The long way of doing this is to use summarise and then bind the rows (watch out: your code has a typo, varn_y_pct instead of var1_n_pct).
row_to_add <- df_report %>%
summarise(Market = "Total",
n = sum(n),
var1_y = sum(var1_y),
var1_n = sum(var1_n),
total = sum(total),
var1_y_pct = sum(var1_y_pct),
var1_n_pct = sum(var1_n_pct),
pct_total = sum(pct_total))
df_report %>% bind_rows(row_to_add)
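The "invalid 'type' (closure)" message is consistent with n being looked up outside the data frame: in add_row() and rbind() the arguments are not evaluated against df_report's columns, so n most likely resolves to the function dplyr::n(). Inside summarise() the column is found. Note also that summing the percentage columns gives values like 0.84 and 1.16 in the Total row; if the Total row should show overall percentages instead, they can be recomputed from the summed counts. A sketch of that variant, which is my assumption about the desired output, using the counts from the adorn_totals() output above:

```r
library(dplyr)

# counts taken from the adorn_totals() output above
df_report <- data.frame(Market = c("East", "West"),
                        n = c(30L, 40L),
                        var1_y = c(11L, 19L),
                        var1_n = c(19L, 21L),
                        stringsAsFactors = FALSE) %>%
  mutate(total = var1_y + var1_n,
         var1_y_pct = var1_y / total,
         var1_n_pct = var1_n / total,
         pct_total = total / sum(total))

# sum the counts, then recompute the percentages from those sums
total_row <- df_report %>%
  summarise(Market = "Total",
            n = sum(n),
            var1_y = sum(var1_y),
            var1_n = sum(var1_n),
            total = sum(total)) %>%
  mutate(var1_y_pct = var1_y / total,
         var1_n_pct = var1_n / total,
         pct_total = 1)

report_with_total <- bind_rows(df_report, total_row)
report_with_total
```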