Combining pipes and the magrittr dot (.) placeholder - r

I am fairly new to R and I am trying to understand the %>% operator and the usage of the "." (dot) placeholder. As a simple example, the following code works:
library(magrittr)
library(ensurer)
ensure_data.frame <- ensures_that(is.data.frame(.))
data.frame(x = 5) %>% ensure_data.frame
However the following code fails
ensure_data.frame <- ensures_that(. %>% is.data.frame)
data.frame(x = 5) %>% ensure_data.frame
where I am now piping the placeholder into the is.data.frame method.
I am guessing that it is my understanding of the limitations/interpretation of the dot placeholder that is lagging, but can anyone clarify this?

The "problem" is that magrittr has a short-hand notation for anonymous functions:
. %>% is.data.frame
is roughly the same as
function(.) is.data.frame(.)
In other words, when the dot is the (left-most) left-hand side, the pipe has special behaviour.
You can escape the behaviour in a few ways, e.g.
(.) %>% is.data.frame
or any other way where the LHS is not identical to .
In this particular example, this may seem like undesirable behaviour, but commonly in examples like this there's really no need to pipe the first expression, so is.data.frame(.) is as expressive as . %>% is.data.frame, and
examples like
data %>%
some_action %>%
lapply(. %>% some_other_action %>% final_action)
can be argued to be cleaner than
data %>%
some_action %>%
lapply(function(.) final_action(some_other_action(.)))
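To make the special case concrete, here is a minimal sketch (the names is_df and check are made up for illustration) showing that a pipeline starting with a bare dot yields a function, while wrapping the dot in parentheses pipes the value as usual:

```r
library(magrittr)

# A pipeline whose left-most LHS is a bare dot is a function, not a value.
is_df <- . %>% is.data.frame
inherits(is_df, "fseq")        # the result is a magrittr functional sequence
is_df(data.frame(x = 5))       # call it like any other function
is_df(1:3)

# Hypothetical wrapper: inside it, `.` is an ordinary argument.
# `(.)` is not identical to `.`, so the value itself is piped.
check <- function(.) (.) %>% is.data.frame
check(data.frame(x = 5))
```

This is presumably why the second ensures_that() call in the question fails: it receives a function where it expects a condition written in terms of the dot.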

This is the problem:
. = data.frame(x = 5)
a = data.frame(x = 5)
a %>% is.data.frame
#[1] TRUE
. %>% is.data.frame
#Functional sequence with the following components:
#
# 1. is.data.frame(.)
#
#Use 'functions' to extract the individual functions.
Looks like a bug to me, but dplyr experts can chime in.
A simple workaround in your expression is to do .[] %>% is.data.frame.

Related

How to pass a user-defined variable to the dplyr filter function in R? It seems that select works fine but filter gives wrong results

Here is the sample data:
sample,fit_result,Site,Dx_Bin,dx,Hx_Prev,Hx_of_Polyps,Age,Gender,Smoke,Diabetic,Hx_Fam_CRC,Height,Weight,NSAID,Diabetes_Med,stage
2003650,0,U Michigan,High Risk Normal,normal,0,1,64,m,,0,1,182,120,0,0,0
2005650,0,U Michigan,High Risk Normal,normal,0,1,61,m,0,0,0,167,78,0,0,0
2007660,26,U Michigan,High Risk Normal,normal,0,1,47,f,0,0,1,170,63,0,0,0
2009650,10,Toronto,Adenoma,adenoma,0,1,81,f,1,0,0,168,65,1,0,0
2013660,0,U Michigan,Normal,normal,0,0,44,f,0,0,0,170,72,1,0,0
2015650,0,Dana Farber,High Risk Normal,normal,0,1,51,f,1,0,0,160,67,0,0,0
2017660,7,Dana Farber,Cancer,cancer,1,1,78,m,1,1,0,172,78,0,1,3
2019651,19,U Michigan,Normal,normal,0,0,59,m,0,0,0,177,65,0,0,0
2023680,0,Dana Farber,High Risk Normal,normal,1,1,63,f,1,0,0,154,54,0,0,0
2025653,1509,U Michigan,Cancer.,cancer,1,1,67,m,1,0,0,167,58,0,0,4
2027653,0,Toronto,Normal,normal,0,0,65,f,0,0,0,167,60,0,0,0
below is the R code
library(tidyverse)
h <- 'Height'
w <- 'Weight'
data %>% select(h) %>% filter(h > 180)
I can see only the Height column in the output, but the filter is not applied. I don't get any error when I run the code. Similarly, the code below does not work:
s <- 'Site'
data %>% select(s) %>% mutate(s = str_replace(s," ","_"))
Output:
Site s
1 U Michigan Site
2 U Michigan Site
3 U Michigan Site
4 Toronto Site
I want to replace the space in the Site column, but it is obviously not recognizing s and is creating a new column named s instead.
I tried running below code and still face the same issue.
exp <- substitute(s <- 'Site')
r <- eval(exp,data)
data %>% select(r) %>% mutate(r = str_replace(s," ","_"))
I searched everywhere and could not find a solution. Any help would be great. Thanks in advance. (I know the normal way to do it; I just want to be able to pass variables to the function.)
We can convert the string to a symbol with rlang::sym() and unquote it with !!. Also, to assign to a name stored in a variable on the left-hand side, use := instead of = and unquote the name with !!:
library(dplyr)
library(stringr)
data %>%
select(all_of(s)) %>%
mutate(!!s := str_replace(!! rlang::sym(s)," ","_"))
Similarly for the filter
data %>%
select(all_of(h)) %>%
filter(!! rlang::sym(h) > 180)
Yet another option is to pass the variable to across() (for filter, if_any()/if_all() can be used instead), which applies a function to one or more columns:
data %>%
select(all_of(s)) %>%
mutate(across(all_of(s), ~ str_replace(.x, " ", "_")))
Or use .data
data %>%
select(all_of(s)) %>%
mutate(!!s := str_replace(.data[[s]]," ","_"))
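Since the goal is to pass variables around, the .data approach can be wrapped in a function. The following is only a sketch: the function name clean_and_filter, its arguments, and the three-row demo data frame are made up for illustration and are not part of the question's data.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: column names arrive as strings.
clean_and_filter <- function(df, text_col, num_col, threshold) {
  df %>%
    # .data[[...]] looks the column up by its name stored in a string
    filter(.data[[num_col]] > threshold) %>%
    # !!text_col := ... assigns to the column whose name is stored in text_col
    mutate(!!text_col := str_replace(.data[[text_col]], " ", "_"))
}

# Small made-up data frame with the same column names as in the question
demo <- data.frame(
  Site   = c("U Michigan", "Toronto", "Dana Farber"),
  Height = c(182, 168, 172)
)

clean_and_filter(demo, "Site", "Height", 170)
```

Note that str_replace() only replaces the first space in each value; str_replace_all() would be needed for names containing several spaces.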

how to unquote (!!) inside `map` inside `mutate`

I'm modifying nested data frames inside of foo with map2 and mutate, and I'd like to name a variable in each nested data frame according to foo$name. I'm not sure what the proper syntax for nse/tidyeval unquotation would be here.
My attempt:
library(tidyverse)
foo <- mtcars %>%
group_by(gear) %>%
nest %>%
mutate(name = c("one", "two", "three")) %>%
mutate(data = map2(data, name, ~
mutate(.x, !!(.y) := "anything")))
#> Error in quos(...): object '.y' not found
I want the name of the newly created variable inside the nested data frames to be "one", "two", and "three", respectively.
I'm basing my attempt off the normal syntax I'd use if I was doing a normal mutate on a normal df, and where name is a string:
name <- "test"
mtcars %>% mutate(!!name := "anything") # works fine
If successful, the following line should return TRUE:
foo[1,2] %>% unnest %>% names %>% .[11] == "one"
This seems to be a feature/bug (not sure, see linked GitHub issue below) of how !! works within mutate and map. The solution is to define a custom function, in which case the unquoting works as expected.
library(tidyverse)
custom_mutate <- function(df, name, string = "anything")
mutate(df, !!name := string)
foo <- mtcars %>%
group_by(gear) %>%
nest %>%
mutate(name = c("one", "two", "three")) %>%
mutate(data = map2(data, name, ~
custom_mutate(.x, .y)))
foo[1,2] %>% unnest %>% names %>% .[11] == "one"
#[1] TRUE
You can find more details on GitHub under issue #541: map2() call in dplyr::mutate() error while standalone map2() call works; note that the issue was closed in September 2018, so I am assuming this is intended behaviour.
An alternative might be to use group_split instead of nest, in which case we
avoid the unquoting issue
nms <- c("one", "two", "three")
mtcars %>%
group_split(gear) %>%
map2(nms, ~.x %>% mutate(!!.y := "anything"))
This is because of the timing of unquoting. Nesting tidy eval functions can be a bit tricky because it is the very first tidy eval function that processes the unquoting operators.
Let's rewrite this:
mutate(data = map2(data, name, ~ mutate(.x, !!.y := "anything")))
to
mutate(data = map2(data, name, function(x, y) mutate(x, !!y := "anything")))
The x and y bindings are only created when the function is called by map2(). So when the first mutate() runs, these bindings don't exist yet and you get an object not found error. With the formula it's a bit harder to see but the formula expands to a function taking .x and .y arguments so we have the same problem.
In general, it's better to avoid complex nested logic in your code because it makes it harder to read. With tidy eval that's even more complexity, so best do things in steps. As an added bonus, doing things in steps requires creating intermediate variables which, if well named, help understand what the function is doing.
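As a rough illustration of doing things in steps, the sketch below moves the unquoting into a small named function and keeps map2() outside of any enclosing mutate(), so !! is only processed once name has a value. The helper name add_named_col and the intermediate variable nested are made up for this example.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical helper: `name` is a string that already exists when mutate() runs.
add_named_col <- function(df, name, value = "anything") {
  mutate(df, !!name := value)
}

# Step 1: build the nested data frame and attach the names.
nested <- mtcars %>%
  group_by(gear) %>%
  nest() %>%
  ungroup()
nested$name <- c("one", "two", "three")

# Step 2: modify each nested data frame outside of mutate().
nested$data <- map2(nested$data, nested$name, add_named_col)

names(nested$data[[1]])
```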

Use multiple command chains with piping

EDIT: I reworked the question to make it clearer and integrate what I found by myself
Pipes are a great way to make code more readable when using a single command chain.
In some cases, however, I feel one is forced to be inconsistent with their philosophy, either by creating unnecessary temp variables, mixing piping and embedded parentheses, or defining custom functions.
See this SO question for example, where OP wants to know how to convert colnames to lower case with pipes: Dplyr or Magrittr - tolower?
I'll forget about the existence of names<- to make my point
There's basically 3 ways to do it:
Use a temp variable
temp <- df %>% names %>% tolower
df %>% setNames(temp)
Use embedded parenthesis
df %>% setNames(tolower(names(.)))
Define custom function
lowercase_names <- function(df) {names(df) <- tolower(names(df)); df}
df %>% lowercase_names
I think it would be more consistent to be able to do something like this:
df %T>% # create new branch with %T>%
{names(.) %>% tolower %as% n} %>% # parallel branch assigned to alias n, then going back to main branch with %>%
setNames(n) # combine branches
For more complex cases, it is in my opinion more readable than the 3 examples above and I'm not polluting my workspace.
So far I've been able to come quite close, I can type:
df %T>%
{names(.) %>% tolower %as% n} %>%
setNames(A(n));fp()
OR (a little tribute to old school calculators)
df %1% # puts lhs in first memory slot (notice "%1%", I define these up to "%9%")
names %>%
tolower %>%
setNames(M(1),.);fp() # call the first stored value
(see code at bottom)
My issues are the following:
I create a new environment in my global environment, and I have to flush it manually with fp(), it's quite ugly
I'd like to get rid of this A function, but I don't understand well enough the environment structure of pipe chains to do so
Here's my code :
It creates an environment named PipeAliasEnv for aliases
%as% creates an alias in an isolated environment
%to% creates a variable in the calling environment
A calls an alias
fp removes all objects from PipeAliasEnv
This is the code that I used and a reproducible example solved in 4 different ways:
library(magrittr)
alias_init <- function(){
assign("PipeAliasEnv",new.env(),envir=.GlobalEnv)
assign("%as%" ,function(value,variable) {assign(as.character(substitute(variable)),value,envir=PipeAliasEnv)},envir=.GlobalEnv)
assign("%to%" ,function(value,variable) {assign(as.character(substitute(variable)),value,envir=parent.frame())},envir=.GlobalEnv)
assign("A" ,function(variable) { get(as.character(substitute(variable)), envir=PipeAliasEnv)},envir=.GlobalEnv)
assign("fp" ,function(remove_envir=FALSE){if(remove_envir) rm(PipeAliasEnv,envir=.GlobalEnv) else rm(list=ls(envir=PipeAliasEnv),envir=PipeAliasEnv)},envir=.GlobalEnv) # flush environment
# to handle `%i%` and M(i) notation, 9 should be enough :
sapply(1:9,function(i){assign(paste0("%",i,"%"),eval(parse(text=paste0('function(lhs,rhs){lhs <- eval(lhs)
rhs <- as.character(substitute(rhs))
str <- paste("lhs %>%",rhs[1],"(",paste(rhs[-1],collapse=","),")")
assign("x',i,'",lhs,envir=PipeAliasEnv)
eval(parse(text= str))}'))),envir=.GlobalEnv)})
assign("M" ,function(i) { get(paste0("x",as.character(substitute(i))), envir=PipeAliasEnv)},envir=.GlobalEnv)
}
alias_init()
# using %as%
df <- iris %T>%
{names(.) %>% toupper %as% n} %>%
setNames(A(n)) %T>%
{. %>% head %>% print}(.) ;fp()
# still using %as%, choosing another main chain
df <- iris %as% dataset %>%
names %>%
toupper %>%
setNames(A(dataset),.) %T>%
{. %>% head %>% print}(.);fp()
# using %to% (notice no assignment on 1st line)
iris %T>%
{names(.) %>% toupper %as% n} %>%
{setNames(.,A(n))} %to% df %>% # no need for '%T>%' and '{}' here
head %>% print;fp()
# or using the old school calculator fashion (probably the clearest for this precise task)
df <- iris %1%
names %>%
toupper %>%
setNames(M(1),.) %T>%
{. %>% head %>% print}(.);fp()
My question in short:
How do I get rid of A and fp ?
Bonus: %to% doesn't work when inside {}, how can I solve this ?

Order by column using infix operator

It's possibly a very simple question, but I couldn't find an answer. I'm trying to apply abs to my matrix and then order it by the first column (descending).
In separate rows it looks like:
pcaRotaMat <- abs(pcaImportance$rotation)
temp <- pcaRotaMat[order(-pcaRotaMat[,1]),]
However, when I'm trying to use the infix operator (%>%), I'm getting the following error:
t <- pcaImportance$rotation %>% abs() %>% order(-[,1],)
Error: unexpected '[' in "t <- pcaImportance$rotation %>% abs() %>% order(["
Your help will be appreciated.
If you are comfortable with something more verbose:
sort_fn = function(x) {
x[order(-x[ ,1]), ]
}
t <- pcaImportance$rotation %>% abs() %>% sort_fn
Option 2:
If you don't want to create a function to sort:
t <- pcaImportance$rotation %>% abs %>% .[order(-.[, 1]), ]
"." is the placeholder here for the matrix. I would also not recommend naming a variable "t", as t is the function that transposes matrices.
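Since pcaImportance is not available here, the following sketch tries Option 2 on a small made-up matrix (m and its dimnames are purely illustrative):

```r
library(magrittr)

# Made-up stand-in for pcaImportance$rotation
m <- matrix(
  c(-3, 1, -2,
    10, 20, 30),
  ncol = 2,
  dimnames = list(c("a", "b", "c"), c("PC1", "PC2"))
)

# abs() first, then reorder the rows by the first column, descending.
# Both dots refer to the matrix returned by abs().
sorted <- m %>% abs %>% .[order(-.[, 1]), ]
sorted
```

The rows come back in the order a, c, b, because the absolute values in the first column are 3, 1 and 2.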

gather_ does not work. Shouldn't quoting and ~ing have the same effect in standard evaluation mode?

I have issues getting tidyr's gather to work in its standard evaluation version gather_:
require(tidyr)
require(dplyr)
require(lazyeval)
df = data.frame(varName=c(1,2))
gather works:
df %>% gather(variable,value,varName)
but I'd like to be able to take the name varName from a variable in standard evaluation mode, and can't seem to get it right:
name='varName'
df %>% gather_("variable","value",interp(~v,v=name))
Error in match(x, y, 0L) : 'match' requires vector arguments
I'm also confused by the following.
This works as expected:
df %>% gather_("variable","value","varName")
The next line should be equivalent to last line (from my understanding of http://cran.r-project.org/web/packages/dplyr/vignettes/nse.html ), but doesn't work:
df %>% gather_(~variable,~value,~varName)
Error in match(x, y, 0L) : 'match' requires vector arguments
Looking at the source of tidyr:::gather_.data.frame, you can see that it is just a wrapper for reshape2::melt. As such, it only works for character or numeric arguments. Actually, the following (which I would consider a bug) works:
df %>% gather_("variable", "value", 1)
As far as I can tell the nse vignette only refers to dplyr and not to tidyr.
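For readers on a more recent tidyr, where the underscore verbs are deprecated, the same result can be sketched with pivot_longer() and all_of(). This assumes tidyr 1.0 or later and is not what the question's version of the package offered:

```r
library(tidyr)

df <- data.frame(varName = c(1, 2))
name <- "varName"

# all_of() selects the column whose name is stored in the character vector
long <- pivot_longer(
  df,
  cols      = all_of(name),
  names_to  = "variable",
  values_to = "value"
)
long
```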
Although this question has been answered, the following code could be used for defining keys and values for gathering purposes more generally in a function, using a vector of inputs for key and value:
data <- data.frame(a = runif(10), b = runif(10), c = runif(10))
Key <- "ColId"
Value <- "ColValue"
data %>% gather(key = KeyTmp, value = ValTmp) %>%
rename_(.dots = setNames("KeyTmp", Key) ) %>%
rename_(.dots = setNames("ValTmp", Value) )

Resources