dynamic dplyr column name calculation - r

I have the following code.
colName is passed in. I've been trying to get it to be evaluated as the value of colName but have not had much success. I've tried "eval", "setNames", etc. Using the "_", still has not provided success.
Essentially, if my colName = "MyCol", I want the dplyr chain to execute as if the last line read:
mutate(MyCol = ifelse(is.na(MyCol), "BLANK", MyCol))
makeSummaryTable <- function(colName,originalData){
result <- originalData %>%
group_by_(colName) %>%
summarise(numObs = n()) %>%
ungroup() %>%
arrange(desc(numObs)) %>%
rowwise() %>%
mutate_(colName = ifelse(is.na(colName), "BLANK",colName))
return(result)
}

Here's how to do it with dplyr 0.6.0 using the new tidyeval approach to non-standard evaluation. (I'm not sure if it's even possible to do with standard evaluation, at least in a straightforward manner):
library(dplyr)
makeSummaryTable <- function(colName, originalData){
colName <- enquo(colName)
originalData %>%
count(!!colName) %>%
arrange(desc(n)) %>%
mutate(
old_col = !!colName,
!!quo_name(colName) := if_else(is.na(!!colName), "BLANK",!!colName)
)
}
makeSummaryTable(hair_color, starwars)
#> # A tibble: 13 x 3
#> hair_color n old_col
#> <chr> <int> <chr>
#> 1 none 37 none
#> 2 brown 18 brown
#> 3 black 13 black
#> 4 BLANK 5 <NA>
#> 5 white 4 white
#> 6 blond 3 blond
#> 7 auburn 1 auburn
#> 8 auburn, grey 1 auburn, grey
#> 9 auburn, white 1 auburn, white
#> 10 blonde 1 blonde
#> 11 brown, grey 1 brown, grey
#> 12 grey 1 grey
#> 13 unknown 1 unknown
enquo turns the unquoted column name into some fancy object called a quosure. !! then unquotes the quosure so that it can get evaluated as if it would be typed directly in the function. For a more in-depth and accurate explanation, see Hadley's "Programming with dplyr".
EDIT: I realized that the original question was to name the new column with the user-supplied value of colName and not just colName so I updated my answer. To accomplish that, the quosure needs to be turned into a string (or label) using quo_name. Then, it can be "unquoted" using !! just as a regular quosure would be. The only caveat is that since R can't make head or tails of the expression mutate(!!foo = bar), tidyeval introduces the new definition operator := (which might be familiar to users from data.table where it has a somewhat different use). Unlike the traditional assignment operator =, the := operator allows unquoting on both the right-hand and left-hand side.
(updated the answer to use a dataframe that has NA in one of its rows, to illustrate that the last mutate works. I also used count instead of group by + summarize, and I dropped the unnecessary rowwise.)

Related

How to split one column whith multiples delimiters in multiple columns in R?

I have values ​​with the following structure: string OR string_string.interger
EX:
df<-data.frame(Objs=c("Windows","Door_XYZ.1", "Door_XYY.1", "Chair_XYYU.2" ))
Objs
Windows
Door_XYZ.1
Door_XYY.1
Chair_XYYU.2
Using the command split(), separate() or something similar I need to generate a dataframe similar to this one:
Obs: The split must be performed for the characters "_" and "."
Objs
IND
TAG
Control
Windows
NA
NA
NA
Door_XYZ.1
Door
XYZ
1
Door_XYY.1
Door
XYY
1
Chair_XYYU.2
Chair
XYYU
2
The closest solution was suggested by #Tommy, in similar context.
df %>% data.frame(.,do.call(rbind,str_split(.$Objs,"_")))
The default value of the sep argument in separate() will nearly get the result you need. A conditional mutate was also needed to remove the Windows entry from the IND column.
library(tidyverse)
df <- data.frame(Objs=c("Windows","Door_XYZ.1", "Door_XYY.1", "Chair_XYYU.2" ))
df %>%
separate(Objs, into = c("IND", "TAG", "Control"), remove = FALSE, fill = "right") %>%
mutate(IND = if_else(Objs == IND, NA_character_, IND))
#> Objs IND TAG Control
#> 1 Windows <NA> <NA> <NA>
#> 2 Door_XYZ.1 Door XYZ 1
#> 3 Door_XYY.1 Door XYY 1
#> 4 Chair_XYYU.2 Chair XYYU 2
Created on 2022-05-05 by the reprex package (v1.0.0)

How to decode base64 strings in a vectorized way within dplyr::mutate?

I have a tibble which contains a column of base64-encoded strings like so:
mytib <- tibble(encoded_var = c("VGVzdGluZ3Rlc3Rpbmc=", "QW5vdGhlcnRlc3Q="))
When I try to decode it with base64::base64decode
mytib %>%
mutate(decoded_var = base64decode(encoded_var))
I receive an error:
Error in `mutate()`:
! Problem while computing `decoded_var = base64decode(encoded_var)`.
x `decoded_var` must be size 2 or 1, not 25.
I'm looking to have a tibble with a column of decoded, human-readable base64 strings. I'd also like to do that using the mutate tidyverse syntax. How can I achieve that?
Update: The tibble should look like this
# A tibble: 2 × 2
encoded_var decoded_var
<chr> <chr>
1 VGVzdGluZ3Rlc3Rpbmc= Testingtesting
2 QW5vdGhlcnRlc3Q= Anothertest
base64enc::base64decode produces a raw vector, so you need to carry out the conversion rowwise and wrap the result with rawToChar:
mytib %>%
rowwise() %>%
mutate(decoded_var = rawToChar(base64decode(encoded_var)))
#> # A tibble: 2 x 2
#> # Rowwise:
#> encoded_var decoded_var
#> <chr> <chr>
#> 1 VGVzdGluZ3Rlc3Rpbmc= Testingtesting
#> 2 QW5vdGhlcnRlc3Q= Anothertest
The problem is that the caTools::base64decode function only works on one string at a time, because a single string could contain several values. If you always have a single character value in your variable, then you can vectorize it:
library(tidyverse)
mytib <- tibble(encoded_var = c("VGVzdGluZ3Rlc3Rpbmc=", "QW5vdGhlcnRlc3Q="))
mytib %>%
mutate(decoded_var = Vectorize(caTools::base64decode)(encoded_var, "character"))
#> # A tibble: 2 × 2
#> encoded_var decoded_var
#> <chr> <chr>
#> 1 VGVzdGluZ3Rlc3Rpbmc= Testingtesting
#> 2 QW5vdGhlcnRlc3Q= Anothertest
Created on 2022-03-14 by the reprex package (v2.0.1)
EDITED TO ADD: Actually, there are (at least) four different packages that provide base64decode functions. I used caTools. There are also versions in the processx, xfun and base64enc packages. (The one in xfun is actually named base64_decode.) This is why it's important to show reproducible code here on StackOverflow. The reprex package makes this very easy.

Iteratively create global environment objects from tibble

I'm trying to make objects directly from information listed in a tibble that can be called on by later functions/tibbles in my environment. I can make the objects manually but I'm working to do this iteratively.
library(tidyverse)
##determine mean from 2x OD Negatives in experimental plates, then save summary for use in appending table
ELISA_negatives = "my_file.csv"
neg_tibble <- as_tibble(read_csv(ELISA_negatives, col_names = TRUE)) %>%
group_by(Species_ab, Antibody, Protein) %>%
filter(str_detect(Animal_ID, "2x.*")) %>%
summarize(ave_neg_U_mL = mean(U_mL, na.rm = TRUE), n=sum(!is.na(U_mL)))
neg_tibble
# A tibble: 4 x 5
# Groups: Species_ab, Antibody [2]
Species_ab Antibody Protein ave_neg_U_mL n
<chr> <chr> <chr> <dbl> <int>
1 Mouse IgG GP 28.2 6
2 Mouse IgG NP 45.9 6
3 Rat IgG GP 5.24 4
4 Rat IgG NP 1.41 1
I can write the object manually based off the above tibble:
Mouse_IgG_GP_cutoff <- as.numeric(neg_tibble[1,4])
Mouse_IgG_GP_cutoff
[1] 28.20336
In my attempt to do this iteratively, I can make a new tibble neg_tibble_string with the information I need. All I would need to do now is make a global object from the Name in the first column Test_Name, and assign it to the numeric value in the second column ave_neg_U_mL (which is where I'm getting stuck).
neg_tibble_string <- neg_tibble %>%
select(Species_ab:Protein) %>%
unite(col='Test_Name', c('Species_ab', 'Antibody', 'Protein'), sep = "_") %>%
mutate(Test_Name = str_c(Test_Name, "_cutoff")) %>%
bind_cols(neg_tibble[4])
neg_tibble_string
# A tibble: 4 x 2
Test_Name ave_neg_U_mL
<chr> <dbl>
1 Mouse_IgG_GP_cutoff 28.2
2 Mouse_IgG_NP_cutoff 45.9
3 Rat_IgG_GP_cutoff 5.24
4 Rat_IgG_NP_cutoff 1.41
I feel like there has to be a way to do this to get this from the above tibble neg_tibble_string, and make this for all four of the rows. I've tried a variant of this and this, but can't get anywhere.
> list_df <- mget(ls(pattern = "neg_tibble_string"))
> list_output <- map(list_df, ~neg_tibble_string$ave_neg_U_mL)
Warning message:
Unknown or uninitialised column: `ave_neg_U_mL`.
> list_output
$neg_tibble_string
NULL
As always, any insight is appreciated! I'm making progress on my R journey but I know I am missing large gaps in knowledge.
As we already returned the object value in a list, we need only to specify the lambda function i.e. .x returns the value of the list element which is a tibble and extract the column
library(purrr)
list_output <- map(list_df, ~.x$ave_neg_U_ml)
If the intention is to create global objects, deframe, convert to a list and then use list2env
library(tibble)
list2env(as.list(deframe(neg_tibble_string)), .GlobalEnv)

How can I write a tidyverse-friendly function that respects group_by() earlier in the pipe?

I've started working on writing functions to make table generation quicker, but want to make the function respect earlier grouping choices made by the user in the pipe.
Example data:
df<-data.frame(ID=c("A","B","C","A","C","D","A","C","E","B","C","A"),
Year=c(1,1,1,2,2,2,3,3,3,4,4,4),
Credits=c(1,3,4,5,6,7,2,1,1,6,1,2),
Major=c("GS","GS","LA","GS","GS","LA","GS","LA","LA","GS","LA","LA"),
Status=c("green","blue","green","blue","green","blue","green","blue","green","blue","green","blue"),
Group=c("Art","Music","Science","Art","Music","Science","Art","Music","Science","Art","Music","Science"))
The following is the function I'm working on, and it requires/accepts a variable to define cohorts, a credit variable, and a term variable.
table_headsfte_cohorts<-function(.data,cohortvar,credits,term){
cohortvar<-rlang::ensym(cohortvar)
credits<-rlang::ensym(credits)
term<-rlang::ensym(term)
.data%>%
group_by(!!term,Pidm)%>%
group_by(!!term,!!cohortvar,group_cols())%>%
mutate(on3=1)%>%
mutate(`Headcount`=sum(on3),
`FTE`=round(sum(na.omit(!!credits))/15,1))%>%
mutate(Variable=paste0(cohortvar))%>%
mutate(Category=!!cohortvar)%>%
select(-!!cohortvar)%>%
select(Variable,Category,Headcount,FTE,group_cols())
}
For a user that may be interested in using additional grouping variables beyond the cohort variable they choose, I am hoping that the end result function would allow usage as follows:
df2<-df%>%
group_by(Status,Group)%>%
table_headsfte_cohorts(Major,Credits,Year)
The desired end result would be a table that respects and preserves the levels of the two grouping variables in the group_by statement above in addition to the cohortvar and term columns coming from the table_headsfte_cohorts() arguments.
I need to generate this same table, but for a wide range of grouping variables, and varying numbers of grouping variables, so flexibility would be very helpful.
Edit:
The following seems to get close, by at least allowing multiple grouping variables. This isn't quite what I'm hoping for, as I'd prefer that the additional grouping arguments are read from up the pipe:
table_headsfte_cohorts<-function(.data,cohortvar,credits,term,...){
grps<-enquos(...)
cohortvar<-rlang::ensym(cohortvar)
credits<-rlang::ensym(credits)
term<-rlang::ensym(term)
.data%>%
group_by(!!term,!!cohortvar,!!! grps)%>%
mutate(on3=1)%>%
mutate(`Headcount`=sum(on3),
`FTE`=round(sum(na.omit(!!credits))/15,1))%>%
mutate(Variable=paste0(cohortvar))%>%
mutate(Category=!!cohortvar)%>%
select(-!!cohortvar)%>%
select(Variable,Category,Headcount,FTE,!!!grps)
}
Using the above, I can successfully run:
fdfout<-fdf%>%
table_headsfte_cohorts(Major, Credits, Year), getting:
and I can also pass the other variables to the function to serve as additional grouping variables:
fdfout_alt<-fdf%>%
table_headsfte_cohorts(Major,Credits,Year,Status,Group)
yielding the desired result:
Unfortunately, when I use
fdf_no<-fdf%>%
group_by(Status, Group)%>%
table_headsfte_cohorts(Major, Credits, Year)
I get:
This output would likely confuse someone using my function, as their group_by() line seems to do nothing.
I added some lines that will merge both, the existing grouping variables and the new grouping variables inside the dots into one character vector. We can get the existing grouping variablers with group_vars. To merge old and new together we have to get the expression get_expr of the quoted grouping variables and turn them into strings. We can use !!! syms to evaluate and all_of to select the grouping variables.
Is this what you had in mind?
table_headsfte_cohorts <- function(.data, cohortvar, credits, term, ...){
new_grps <- enquos(...)
new_grps <- purrr::map_chr(new_grps, ~ as.character(rlang::get_expr(.x)))
ex_grps <- group_vars(.data)
grp_vars <- c(ex_grps, new_grps)
cohortvar<-rlang::ensym(cohortvar)
credits<-rlang::ensym(credits)
term<-rlang::ensym(term)
.data%>%
group_by(!! term,
!! cohortvar,
!!! syms(grp_vars))%>%
mutate(on3 = 1) %>%
mutate(`Headcount`= sum(on3),
`FTE`= round(sum(na.omit(!!credits))/15,1))%>%
mutate(Variable=paste0(cohortvar))%>%
mutate(Category=!!cohortvar)%>%
select(-!!cohortvar)%>%
select(Variable,Category,Headcount,FTE, all_of(grp_vars))
}
df %>%
group_by(Status, Group) %>%
table_headsfte_cohorts(Major, Credits, Year)
#> Adding missing grouping variables: `Major`
#> Adding missing grouping variables: `Year`, `Major`
#> # A tibble: 12 x 8
#> # Groups: Year, Major, Status, Group [12]
#> Year Major Variable Category Headcount FTE Status Group
#> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 1 GS Major GS 1 0.1 green Art
#> 2 1 GS Major GS 1 0.2 blue Music
#> 3 1 LA Major LA 1 0.3 green Science
#> 4 2 GS Major GS 1 0.3 blue Art
#> 5 2 GS Major GS 1 0.4 green Music
#> 6 2 LA Major LA 1 0.5 blue Science
#> 7 3 GS Major GS 1 0.1 green Art
#> 8 3 LA Major LA 1 0.1 blue Music
#> 9 3 LA Major LA 1 0.1 green Science
#> 10 4 GS Major GS 1 0.4 blue Art
#> 11 4 LA Major LA 1 0.1 green Music
#> 12 4 LA Major LA 1 0.1 blue Science

Unnest group of string across columns but keep them in the original row in R

I'm trying to find a way to unnest the group of string across columns but keep all the string in the original row. Take the example dataset starwars from dplyr as it has similar strcuture of my dataset.
The starwars dataset has 3 nested columns of films, vehicles, starships. The common approach is to do a unnest_longer so we unnest the group of string into multiple rows - each row contains one piece of the string. However, I'd prefer to keep all the ungrouped string in the original row.
The alternative approach is to use rowwise() and mutate with paste. This works but my dataset has 15 nested columns so I'd have to type 15 line of mutate with paste. That's a bit tedious.
df <- dplyr::starwars %>%
rowwise() %>%
mutate(films = paste(films, collapse=', '),
vehicles = paste(vehicles, collapse=', '),
starships = paste(starships, collapse=', '))
My current thinking is to come up with a wrap function and perhaps I can do it through purrr at scale. But my poor function writing is not working - perhaps I am not too familiar with the dplyr hood.
ungroup_string <- function(data, x){
a <- rowwise(data)
a %>% mutate(x = paste(x, collapse=','))
}
Any way I can do this ungroup string across a number of columns?
You can use across :
library(dplyr)
starwars %>%
select(name, films, vehicles, starships) %>%
rowwise() %>%
mutate(across(c(films,vehicles, starships), toString))
# name films vehicles starships
# <chr> <chr> <chr> <chr>
# 1 Luke Skyw… The Empire Strikes Back, Revenge of… "Snowspeeder, I… "X-wing, Imperial shuttle"
# 2 C-3PO The Empire Strikes Back, Attack of … "" ""
# 3 R2-D2 The Empire Strikes Back, Attack of … "" ""
# 4 Darth Vad… The Empire Strikes Back, Revenge of… "" "TIE Advanced x1"
# 5 Leia Orga… The Empire Strikes Back, Revenge of… "Imperial Speed… ""
# 6 Owen Lars Attack of the Clones, Revenge of th… "" ""
# 7 Beru Whit… Attack of the Clones, Revenge of th… "" ""
# 8 R5-D4 A New Hope "" ""
# 9 Biggs Dar… A New Hope "" "X-wing"
#10 Obi-Wan K… The Empire Strikes Back, Attack of … "Tribubble bong… "Jedi starfighter, Trade Federation c…
# … with 77 more rows
across accepts tidy-select variables. So you don't have to specify each of your 15 columns one by one. You can select column names by position 1:15, by range col1:col15 or by some pattern in their name starts_with('col').

Resources