I am trying to perform a dplyr summarize iteratively, using concatenated strings as column names:
Category=c("a","a","b","b","b","c","c","c")
A1=c(1,2,3,4,3,2,1,2)
A2=c(10,11,12,13,14,15,16,17)
tt=cbind(Category,A1,A2)
tdat=data.frame(tt)
colnames(tdat)=c("Category","M1","M2")
ll=matrix(1:2,nrow=2)
for(i in 1:nrow(ll)) {
Aone=tdat %>% group_by(Category) %>%
summarize(Msum=sum(paste("M",i,sep="")))
}
I end up with the following error:
x invalid 'type' (character) of argument
ℹ Input Msum is sum(paste("M", i, sep = "")).
ℹ The error occurred in group 1: Category = "A".
Run rlang::last_error() to see where the error occurred.
The goal is to apply arithmetic functions iteratively within dplyr's summarize function, but the concatenated string is not recognized as a column name.
If we want to pass a string as a column name, convert it to a symbol with rlang::sym() and unquote it with !! so that it is evaluated as a column:
library(dplyr)
Aone <- vector('list', nrow(ll))
for(i in seq_len(nrow(ll))) {
Aone[[i]] <- tdat %>%
group_by(Category) %>%
summarize(Msum = sum(!! rlang::sym(paste("M", i, sep=""))))
}
Or, assuming the column names are 'M-1', 'M-2', etc., the same approach works as well:
Aone <- vector('list', 2)
for(i in seq_along(Aone)) {
Aone[[i]] <- tdat %>%
group_by(Category) %>%
summarise(Msum = sum(!! rlang::sym(paste("M-", i, sep=""))),
.groups = 'drop')
}
NOTE: The purpose of ll was not clear in the original post. Here, we create a list whose length equals the number of 'M-' columns and assign each result to the corresponding list element while looping over the sequence of that list.
data
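# data.frame() keeps A1 and A2 numeric; cbind() with Category would coerce them to character
tdat <- data.frame(Category, M1 = A1, M2 = A2)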
tdat <- structure(list(Category = c("A", "A", "A", "A", "B", "B", "B",
"B"), `M-1` = c(1, 2, 3, 4, 3, 2, 1, 2), `M-2` = c(10, 11, 12,
13, 14, 15, 16, 17)), class = "data.frame", row.names = c(NA,
-8L))
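As an alternative to !! rlang::sym(), the .data pronoun accepts a column name given as a string. The following is a minimal sketch, assuming numeric columns named M1 and M2; the sample data and the names cols and res are made up for illustration and are not from the original post.

```r
library(dplyr)

# Hypothetical sample data with numeric columns M1 and M2
tdat <- data.frame(
  Category = c("a", "a", "b", "b", "b", "c", "c", "c"),
  M1 = c(1, 2, 3, 4, 3, 2, 1, 2),
  M2 = c(10, 11, 12, 13, 14, 15, 16, 17)
)

# Build the column names as strings, then summarise each one in turn
cols <- paste0("M", 1:2)
res <- lapply(cols, function(col) {
  tdat %>%
    group_by(Category) %>%
    summarise(Msum = sum(.data[[col]]), .groups = "drop")
})
names(res) <- cols
```

Each element of res is then a summary for one column, for example res$M1.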
Related
I have a dataframe and am trying to create a function that calculates the number of records by TRT01AN and by another variable chosen by the user (I am only showing a reduced data frame with one extra variable to keep it simple):
dataframe <- as.data.frame(cbind(ID=c(1,2,3,4,5,6),TRT01AN = c(1, 1, 3, 2, 2, 2),
AGEGR1 =c("Adult","Child","Adolescent","Adolescent","Adolescent","Child")))
sub1 <- function(SUB1) {
# Calculate number of subjects in each treatment arm
bigN1 <- dataframe %>%
group_by_(SUB1,TRT01AN) %>%
summarise(N = n_distinct(ID))
return(bigN1)
}
bigN1<-sub1(SUB1="AGEGR1")
With group_by_ I get an error that TRT01AN doesn't exist, and with group_by, SUB1 can't be found. Any idea how I can group by both variables, a "permanent" one and one defined as the argument of the function?
Thank you!
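Try the curly-curly operator, {{ }}, and pass the column name unquoted. (A quoted name also runs without error, but it then groups by the literal string rather than by the column.)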
library(dplyr)
dataframe <-
as.data.frame(cbind(
ID = c(1, 2, 3, 4, 5, 6),
TRT01AN = c(1, 1, 3, 2, 2, 2),
AGEGR1 = c(
"Adult",
"Child",
"Adolescent",
"Adolescent",
"Adolescent",
"Child"
)
))
sub1 <- function(SUB1) {
# Calculate number of subjects in each treatment arm
bigN1 <- dataframe %>%
group_by({{SUB1}}, TRT01AN) %>%
summarise(N = n_distinct(ID))
return(bigN1)
}
bigN1 <- sub1(AGEGR1)
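If the column name has to arrive as a string, as in the original call sub1(SUB1="AGEGR1"), one option is across(all_of()) inside group_by(). This is a sketch under assumptions: it needs dplyr 1.0 or later, the function name sub1_chr is hypothetical, and the data frame is rebuilt with data.frame() so that ID and TRT01AN stay numeric.

```r
library(dplyr)

dataframe <- data.frame(
  ID = 1:6,
  TRT01AN = c(1, 1, 3, 2, 2, 2),
  AGEGR1 = c("Adult", "Child", "Adolescent", "Adolescent", "Adolescent", "Child")
)

# sub1_chr is a hypothetical name; SUB1 is a column name passed as a string
sub1_chr <- function(SUB1) {
  dataframe %>%
    group_by(across(all_of(SUB1)), TRT01AN) %>%
    summarise(N = n_distinct(ID), .groups = "drop")
}

bigN1 <- sub1_chr("AGEGR1")
```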
Here is a tidy dataframe
df_tidy <- tibble(
company = c("A", "B", "A", "B", "A", "B"),
line_data = c(1, 2, 2, 2, 1, 1)
)
The format required is:
df_ll <- structure(list(company = c("A", "B"), line_data = list(list(c(1, 2, 1)), list(c(2, 2, 1)))), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))
How do I transform df_tidy into df_ll?
Group by 'company', then summarise 'line_data' into a nested list:
df_ll2 <- df_tidy %>%
group_by(company) %>%
summarise(line_data = list(list(line_data)))
Checking against the expected output:
all.equal(df_ll, df_ll2)
[1] TRUE
Or another option is nest or nest_by and then convert the tibble to a list
df_tidy %>%
nest_by(company, .key = "line_data") %>%
mutate(line_data = list(list(unlist(line_data)))) %>%
ungroup
You can also use the plyr package, which returns a plain list with one element per company:
library(plyr)
df_ll <- dlply(df_tidy,.(company),c)
library(tidyverse)
# Create a tibble comprised of: df_ll2 => tibble
df_ll2 <- tibble(
# Uniquify the company vector: company => character vector
company = unique(df_tidy$company),
# Split the data into a list by the company vector, coerce each
# element to an unnamed list:
line_data = unname(
lapply(
with(df_tidy, split(line_data, company)),
list
)
)
)
The block below runs and produces df_all as intended. But when I uncomment the single function at the top (I don't even apply it here, but I need it for other things) and rerun the same block, I get: Error in bind_rows_(x, .id): Argument 1 must be a data frame or a named atomic vector, not a function
library(data.table)
library(dplyr)  # provides %>% and bind_rows()
# addxtoy_newy_csv <- function(df) {
# zdf1 <- df %>% filter(Variable == "s44")
# setDT(df)
# setDT(zdf1)
# df[zdf1, Value := Value + i.Value, on=.(tstep, variable, Scenario)]
# setDF(df)
#}
tstep <- rep(c("a", "b", "c", "d", "e"), 5)
Variable <- c(rep(c("v"), 5), rep(c("w"), 5), rep(c("x"), 5), rep(c("y"), 5), rep(c("x"), 5))
Value <- c(1,2,3,4,5,10,11,12,13,14,33,22,44,57,5,3,2,1,2,3,34,24,11,11,7)
Scenario <- c(rep(c("i"), 20), rep(c("j"), 5) )
df1 <- data.frame(tstep, Variable, Value, Scenario)
tstep <- c("a", "b", "c", "d", "e")
Variable <- rep(c("x"), 5)
Value <- c(100, 34, 100,22, 100)
Scenario <- c(rep(c("i"), 5))
df2<- data.frame(tstep, Variable, Value, Scenario)
setDT(df1)
setDT(df2)
df1[df2, Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
setDF(df1)
df_all <- mget(ls(pattern="df*")) %>% bind_rows()
The pattern you use in ls() will match any object with a "d" in its name, so addxtoy_newy_csv gets included in the list of object names. The f* in your pattern means you currently search for "d, followed by zero or more f's". I think a safer pattern to use would be ^df.*, to match objects that start with "df":
df1 = data.frame(x = 1:3)
df2 = data.frame(x = 4:6)
adder = function(x) x + 1
ls(pattern = "df*")
ls(pattern = "^df.*")
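Even with an anchored pattern, a non-data-frame object whose name starts with "df" would still be picked up. As a hedged sketch, the result of mget() can be filtered with is.data.frame before binding; the object dfun and the names dfs and df_all below are illustrative only.

```r
library(dplyr)

df1 <- data.frame(x = 1:3)
df2 <- data.frame(x = 4:6)
dfun <- function(x) x + 1  # hypothetical function whose name also starts with "df"

# Collect objects by name, then keep only the actual data frames
dfs <- Filter(is.data.frame, mget(ls(pattern = "^df")))
df_all <- bind_rows(dfs, .id = "source")
```

Note that df_all itself matches "^df", so rerunning the block would also pick up the previous result unless it is given a different name or removed first.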
I am working on a report for which I have to export a large number of similar data frames into nice looking tables in Word. My goal is to achieve this in one go, using flextable to generate the tables and purrr / tidyverse to apply all the formatting procedures to all rows in a nested data frame. This is what my data frame looks like:
df <- data.frame(school = c("A", "B", "A", "B", "A", "B"),
students = c(round(runif(6, 1, 10), 0)),
grade = c(1, 1, 2, 2, 3, 3))
I want to generate separate tables for all groups in column 'school' and started by using the nest() function within tidyr.
list <- df %>%
group_by(school) %>%
nest()
This gives me a nested data frame to which I can apply the functions in flextable using purrr:
list <- list %>%
mutate(ftables = map(data, flextable)) %>%
mutate(ftables = purrr::map(ftables, ~ set_header_labels(.,
students = "No of students",
grade = "Grade")))
The first mutate generates a new column with flextable objects for each school, and the second mutate applies header labels to the table, based on the column names that are saved in the object.
My goal is now to add another header that is based on the name of the school. This value resides in the list column entitled school, which corresponds row-wise to the tables generated in the list column ftables. How can I pass the name of the school to the add_header function within ftables, using purrr or any other procedure?
Expected output
I have been able to achieve what I want for individual schools with this procedure (identical header cells will later be merged):
school.name <- "A"
ftable.a <- df %>%
filter(school == "A") %>%
select(-school) %>%
flextable() %>%
set_header_labels(students = "No of students",
grade = "Grade") %>%
add_header(students = school.name,
grade = school.name)
ftable.a
The purrr package provides map2(), which iterates over two inputs in parallel, so each table can be paired with its school name:
library(flextable)
library(magrittr)
library(dplyr)
library(tidyr)
library(purrr)
df <- data.frame(school = c("A", "B", "A", "B", "A", "B"),
students = c(round(runif(6, 1, 10), 0)),
grade = c(1, 1, 2, 2, 3, 3))
byschool <- df %>%
group_by(school) %>%
nest()
byschool <- byschool %>%
mutate(ftables = map(data, flextable)) %>%
mutate(ftables = purrr::map(
ftables, ~ set_header_labels(.,
students = "No of students",
grade = "Grade"))) %>%
mutate(ftables = purrr::map2(ftables, school, function(ft, h){
add_header(ft, students = h, grade = h)
} ))
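To see how map2() pairs the two columns without involving flextable, here is a minimal sketch that builds a caption string per school instead of a table. The student counts are fixed, made-up values and the caption column is hypothetical; only the pairing of data with school mirrors the answer above.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Fixed, made-up student counts so the result is reproducible
df <- data.frame(school = c("A", "B", "A", "B", "A", "B"),
                 students = c(3, 5, 4, 6, 2, 7),
                 grade = c(1, 1, 2, 2, 3, 3))

byschool <- df %>%
  group_by(school) %>%
  nest() %>%
  # d is the nested data frame for one school, s is that school's name
  mutate(caption = map2_chr(data, school, function(d, s) {
    paste0("School ", s, ": ", nrow(d), " rows")
  }))
```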
I'm new to R, and I'm trying to write a function that adds the entries of two data frame columns row by row and returns the data frame with a new, named column containing the row sums.
Here's a sample df of my data:
Ethnicity <- c('A', 'B', 'H', 'N', 'O', 'W', 'Unknown')
Texas <- c(2,41,56,1,3,89,7)
Tenn <- c(1,9,2,NA,1,32,3)
df <- data.frame(Ethnicity, Texas, Tenn)
When I directly try the following code, the columns are summed by row as desired:
new_df <- df %>% rowwise() %>%
mutate(TN_TX = sum(Tenn, Texas, na.rm = TRUE))
new_df
But when I try to use my function code, rowwise() seems not to work. My function code is:
df.sum.col <- function(df.in, col.1, col.2) {
if(is.data.frame(df.in) != TRUE){ #warning if first arg not df
warning('df.in is not a dataframe')}
if(is.numeric(col.1) != TRUE){
warning('col.1 is not a numeric vector')}
if(is.numeric(col.2) != TRUE){
warning('col.2 is not a numeric vector')} #warning if col not numeric
df.out <- rowwise(df.in) %>%
mutate(name = sum(col.1, col.2, na.rm = TRUE))
df.out
}
bad_df <- df.sum.col(df,Texas, Tenn)
This results in
bad_df
.
I don't understand why the core of the function works outside it but not within. I also tried piping df.in to rowwise() like this:
df.out <- df.in %>% rowwise() %>%
mutate(name = sum(col.1, col.2, na.rm = TRUE))
But that doesn't resolve the problem.
As far as naming the new column, I tried doing so by adding the name as an argument, but didn't have any success. Thoughts on this?
Any help appreciated!
As suggested by @thelatemail, this comes down to non-standard evaluation; rowwise() has nothing to do with it. You need to rewrite your function to use mutate_. It can be tricky to understand, but here's one version of what you're trying to do:
library(dplyr)
df <- tibble::tribble(
~Ethnicity, ~Texas, ~Tenn,
"A", 2, 1,
"B", 41, 9,
"H", 56, 2,
"N", 1, NA,
"O", 3, 1,
"W", 89, 32,
"Unknown", 7, 3
)
df.sum.col <- function(df.in, col.1, col.2, name) {
if(is.data.frame(df.in) != TRUE){ #warning if first arg not df
warning('df.in is not a dataframe')}
if(is.numeric(lazyeval::lazy_eval(substitute(col.1), df.in)) != TRUE){
warning('col.1 is not a numeric vector')}
if(is.numeric(lazyeval::lazy_eval(substitute(col.2), df.in)) != TRUE){
warning('col.2 is not a numeric vector')} #warning if col not numeric
dots <- setNames(list(lazyeval::interp(~sum(x, y, na.rm = TRUE),
x = substitute(col.1), y = substitute(col.2))),
name)
df.out <- rowwise(df.in) %>%
mutate_(.dots = dots)
df.out
}
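mutate_() and the lazyeval helpers have since been deprecated in dplyr in favour of tidy evaluation. A sketch of the same idea with {{ }} and := might look like the following; the function name df_sum_col2 is hypothetical, and the numeric checks from the original function are left out for brevity.

```r
library(dplyr)

df <- data.frame(
  Ethnicity = c("A", "B", "H", "N", "O", "W", "Unknown"),
  Texas = c(2, 41, 56, 1, 3, 89, 7),
  Tenn = c(1, 9, 2, NA, 1, 32, 3)
)

# df_sum_col2 is a hypothetical name for this sketch
df_sum_col2 <- function(df.in, col.1, col.2, name) {
  stopifnot(is.data.frame(df.in))
  df.in %>%
    rowwise() %>%
    # {{ }} forwards the unquoted column names; !!name := sets the new column's name
    mutate(!!name := sum({{ col.1 }}, {{ col.2 }}, na.rm = TRUE)) %>%
    ungroup()
}

new_df <- df_sum_col2(df, Texas, Tenn, "TN_TX")
```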
In practice, you shouldn't need to use rowwise at all here, but can use rowSums, after selecting only the columns you need to sum.
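The rowSums approach can be sketched in base R as follows, assuming the columns to add are passed as character names; the function name sum_cols is made up for illustration.

```r
df <- data.frame(
  Ethnicity = c("A", "B", "H", "N", "O", "W", "Unknown"),
  Texas = c(2, 41, 56, 1, 3, 89, 7),
  Tenn = c(1, 9, 2, NA, 1, 32, 3)
)

# sum_cols is a hypothetical helper: cols is a character vector of column names
sum_cols <- function(df.in, cols, name) {
  stopifnot(is.data.frame(df.in))
  # drop = FALSE keeps a data frame even when a single column is selected
  df.in[[name]] <- rowSums(df.in[, cols, drop = FALSE], na.rm = TRUE)
  df.in
}

new_df <- sum_cols(df, c("Texas", "Tenn"), "TN_TX")
```

Because rowSums() is vectorised over rows, no rowwise() grouping is needed, and NA values are skipped through na.rm = TRUE.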