Here is the sample data:
sample,fit_result,Site,Dx_Bin,dx,Hx_Prev,Hx_of_Polyps,Age,Gender,Smoke,Diabetic,Hx_Fam_CRC,Height,Weight,NSAID,Diabetes_Med,stage
2003650,0,U Michigan,High Risk Normal,normal,0,1,64,m,,0,1,182,120,0,0,0
2005650,0,U Michigan,High Risk Normal,normal,0,1,61,m,0,0,0,167,78,0,0,0
2007660,26,U Michigan,High Risk Normal,normal,0,1,47,f,0,0,1,170,63,0,0,0
2009650,10,Toronto,Adenoma,adenoma,0,1,81,f,1,0,0,168,65,1,0,0
2013660,0,U Michigan,Normal,normal,0,0,44,f,0,0,0,170,72,1,0,0
2015650,0,Dana Farber,High Risk Normal,normal,0,1,51,f,1,0,0,160,67,0,0,0
2017660,7,Dana Farber,Cancer,cancer,1,1,78,m,1,1,0,172,78,0,1,3
2019651,19,U Michigan,Normal,normal,0,0,59,m,0,0,0,177,65,0,0,0
2023680,0,Dana Farber,High Risk Normal,normal,1,1,63,f,1,0,0,154,54,0,0,0
2025653,1509,U Michigan,Cancer.,cancer,1,1,67,m,1,0,0,167,58,0,0,4
2027653,0,Toronto,Normal,normal,0,0,65,f,0,0,0,167,60,0,0,0
Below is the R code:
library(tidyverse)
h <- 'Height'
w <- 'Weight'
data %>% select(h) %>% filter(h > 180)
I can see only the Height column in the output, but the filter is not applied. I don't get any error when I run the code. Similarly, the code below does not work:
s <- 'Site'
data %>% select(s) %>% mutate(s = str_replace(s," ","_"))
Output:
Site s
1 U Michigan Site
2 U Michigan Site
3 U Michigan Site
4 Toronto Site
I want to replace the space in the Site column, but it obviously does not recognize s and instead creates a new column named s.
I tried running the code below and still face the same issue.
exp <- substitute(s <- 'Site')
r <- eval(exp,data)
data %>% select(r) %>% mutate(r = str_replace(s," ","_"))
I searched everywhere and could not find a solution. Any help would be great. Thanks in advance. (I know the normal way to do it; I just want to be able to pass variables to the function.)
We can convert the string to a symbol with rlang::sym() and unquote it with !!. To assign to a name stored in a variable on the left-hand side, use := instead of = and unquote the name with !!.
library(dplyr)
library(stringr)
data %>%
select(all_of(s)) %>%
mutate(!!s := str_replace(!! rlang::sym(s)," ","_"))
Similarly for the filter
data %>%
select(all_of(h)) %>%
filter(!! rlang::sym(h) > 180)
Yet another option is to pass the variable to across() (for filter(), if_any()/if_all() can be used instead), which accepts one or more column names and applies the function to each of those columns:
data %>%
select(all_of(s)) %>%
mutate(across(all_of(s), ~ str_replace(.x, " ", "_")))
Or use the .data pronoun:
data %>%
select(all_of(s)) %>%
mutate(!!s := str_replace(.data[[s]]," ","_"))
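To make the contrast concrete, here is a minimal sketch of why the original filter(h > 180) has no effect and how the .data and across() forms can be wrapped in functions that take column names as strings. In the original call, h is not a column of the data, so it resolves to the string 'Height' and the condition is the same constant for every row. The three-row data frame and the helper names tall_rows and underscore_col are made up for illustration; they are not part of the asker's data or of dplyr.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the asker's `data` (only two columns kept)
data <- data.frame(
  Site   = c("U Michigan", "Toronto", "Dana Farber"),
  Height = c(182, 167, 170)
)
h <- "Height"
s <- "Site"

# Hypothetical helper: keep rows where the column named by `col` exceeds `cutoff`.
# .data[[col]] looks the column up by its string name inside the data frame.
tall_rows <- function(df, col, cutoff) {
  df %>% filter(.data[[col]] > cutoff)
}

# Hypothetical helper: replace the first space with "_" in the column(s) named by `col`.
underscore_col <- function(df, col) {
  df %>% mutate(across(all_of(col), ~ str_replace(.x, " ", "_")))
}

tall  <- tall_rows(data, h, 180)
fixed <- underscore_col(data, s)
```

Because both helpers take plain strings, they can be called with h, s, or any other character variable without further quoting or unquoting.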
I'm modifying nested data frames inside of foo with map2 and mutate, and I'd like to name a variable in each nested data frame according to foo$name. I'm not sure what the proper syntax for nse/tidyeval unquotation would be here.
My attempt:
library(tidyverse)
foo <- mtcars %>%
group_by(gear) %>%
nest %>%
mutate(name = c("one", "two", "three")) %>%
mutate(data = map2(data, name, ~
mutate(.x, !!(.y) := "anything")))
#> Error in quos(...): object '.y' not found
I want the name of the newly created variable inside the nested data frames to be "one", "two", and "three", respectively.
I'm basing my attempt off the normal syntax I'd use if I was doing a normal mutate on a normal df, and where name is a string:
name <- "test"
mtcars %>% mutate(!!name := "anything") # works fine
If successful, the following line should return TRUE:
foo[1,2] %>% unnest %>% names %>% .[11] == "one"
This seems to be a feature/bug (not sure, see linked GitHub issue below) of how !! works within mutate and map. The solution is to define a custom function, in which case the unquoting works as expected.
library(tidyverse)
custom_mutate <- function(df, name, string = "anything")
mutate(df, !!name := string)
foo <- mtcars %>%
group_by(gear) %>%
nest %>%
mutate(name = c("one", "two", "three")) %>%
mutate(data = map2(data, name, ~
custom_mutate(.x, .y)))
foo[1,2] %>% unnest %>% names %>% .[11] == "one"
#[1] TRUE
You can find more details on GitHub under issue #541: map2() call in dplyr::mutate() error while standalone map2() call works; note that the issue was closed in September 2018, so I am assuming this is intended behaviour.
An alternative might be to use group_split instead of nest, in which case we avoid the unquoting issue:
nms <- c("one", "two", "three")
mtcars %>%
group_split(gear) %>%
map2(nms, ~.x %>% mutate(!!.y := "anything"))
This is because of the timing of unquoting. Nesting tidy eval functions can be a bit tricky because it is the very first tidy eval function that processes the unquoting operators.
Let's rewrite this:
mutate(data = map2(data, name, ~ mutate(.x, !!.y := "anything")))
to
mutate(data = map2(data, name, function(x, y) mutate(x, !!y := "anything")))
The x and y bindings are only created when the function is called by map2(). So when the first mutate() runs, these bindings don't exist yet and you get an object not found error. With the formula it's a bit harder to see but the formula expands to a function taking .x and .y arguments so we have the same problem.
In general, it's better to avoid complex nested logic in your code because it makes it harder to read. With tidy eval that's even more complexity, so best do things in steps. As an added bonus, doing things in steps requires creating intermediate variables which, if well named, help understand what the function is doing.
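One way to read "do things in steps" is to run map2() on its own, outside the outer mutate(), so that the only tidy eval function that sees !! is the inner mutate(), which runs after .x and .y are bound. The sketch below assumes tidyr 1.0 or later for the nest(data = -gear) syntax; the object names nested and renamed are illustrative.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Step 1: build the nested data frame, one row per value of gear.
nested <- mtcars %>% nest(data = -gear)
nested$name <- c("one", "two", "three")

# Step 2: call map2() outside of mutate(). The formula becomes a function,
# and !!.y is only processed when the inner mutate() runs with .y bound.
renamed <- map2(nested$data, nested$name, ~ mutate(.x, !!.y := "anything"))

# Step 3: put the modified data frames back into the list-column.
nested$data <- renamed
```

Each nested data frame now has its own extra column, named after the matching entry in nested$name.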
EDIT: I reworked the question to make it clearer and integrate what I found by myself
Pipes are a great way to make the code more readable when using a single command chain
In some cases, however, I feel one is forced to be inconsistent with its philosophy, either by creating unnecessary temp variables, mixing piping and embedded parentheses, or defining custom functions.
See this SO question for example, where OP wants to know how to convert colnames to lower case with pipes: Dplyr or Magrittr - tolower?
I'll forget about the existence of names<- to make my point
There's basically 3 ways to do it:
Use a temp variable
temp <- df %>% names %>% tolower
df %>% setNames(temp)
Use embedded parentheses
df %>% setNames(tolower(names(.)))
Define custom function
lowercase_names <- function(df) {names(df) <- tolower(names(df)); df}
df %>% lowercase_names
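As a quick check that the three approaches are interchangeable, the sketch below runs each of them on the built-in iris data set (used here only as a stand-in for df) and compares the results. The names via_temp, via_nested, via_function, and lowercase_names are illustrative.

```r
library(magrittr)

df <- iris

# 1. Temp variable
temp <- df %>% names %>% tolower
via_temp <- df %>% setNames(temp)

# 2. Embedded parentheses: the dot only appears in a nested call,
#    so magrittr still passes df as the first argument of setNames().
via_nested <- df %>% setNames(tolower(names(.)))

# 3. Custom function (hypothetical name)
lowercase_names <- function(df) {names(df) <- tolower(names(df)); df}
via_function <- df %>% lowercase_names

same_result <- identical(via_temp, via_nested) && identical(via_nested, via_function)
```

All three produce the same data frame, so the choice between them is about readability and workspace clutter rather than behaviour.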
I think it would be more consistent to be able to do something like this:
df %T>% # create new branch with %T>%
{names(.) %>% tolower %as% n} %>% # parallel branch assigned to alias n, then going back to main branch with %>%
setNames(n) # combine branches
For more complex cases, it is in my opinion more readable than the 3 examples above and I'm not polluting my workspace.
So far I've been able to come quite close, I can type:
df %T>%
{names(.) %>% tolower %as% n} %>%
setNames(A(n));fp()
OR (a little tribute to old school calculators)
df %1% # puts lhs in first memory slot (notice "%1%", I define these up to "%9%")
names %>%
tolower %>%
setNames(M(1),.);fp() # call the first stored value
(see code at bottom)
My issues are the following:
I create a new environment in my global environment and have to flush it manually with fp(), which is quite ugly
I'd like to get rid of this A function, but I don't understand the environment structure of pipe chains well enough to do so
Here's my code :
It creates an environment named PipeAliasEnv for aliases
%as% creates an alias in an isolated environment
%to% creates a variable in the calling environment
A calls an alias
fp removes all objects from PipeAliasEnv
This is the code that I used and a reproducible example solved in 4 different ways:
library(magrittr)
alias_init <- function(){
assign("PipeAliasEnv",new.env(),envir=.GlobalEnv)
assign("%as%" ,function(value,variable) {assign(as.character(substitute(variable)),value,envir=PipeAliasEnv)},envir=.GlobalEnv)
assign("%to%" ,function(value,variable) {assign(as.character(substitute(variable)),value,envir=parent.frame())},envir=.GlobalEnv)
assign("A" ,function(variable) { get(as.character(substitute(variable)), envir=PipeAliasEnv)},envir=.GlobalEnv)
assign("fp" ,function(remove_envir=FALSE){if(remove_envir) rm(PipeAliasEnv,envir=.GlobalEnv) else rm(list=ls(envir=PipeAliasEnv),envir=PipeAliasEnv)},envir=.GlobalEnv) # flush environment
# to handle `%i%` and M(i) notation, 9 should be enough :
sapply(1:9,function(i){assign(paste0("%",i,"%"),eval(parse(text=paste0('function(lhs,rhs){lhs <- eval(lhs)
rhs <- as.character(substitute(rhs))
str <- paste("lhs %>%",rhs[1],"(",paste(rhs[-1],collapse=","),")")
assign("x',i,'",lhs,envir=PipeAliasEnv)
eval(parse(text= str))}'))),envir=.GlobalEnv)})
assign("M" ,function(i) { get(paste0("x",as.character(substitute(i))), envir=PipeAliasEnv)},envir=.GlobalEnv)
}
alias_init()
# using %as%
df <- iris %T>%
{names(.) %>% toupper %as% n} %>%
setNames(A(n)) %T>%
{. %>% head %>% print}(.) ;fp()
# still using %as%, choosing another main chain
df <- iris %as% dataset %>%
names %>%
toupper %>%
setNames(A(dataset),.) %T>%
{. %>% head %>% print}(.);fp()
# using %to% (notice no assignment on 1st line)
iris %T>%
{names(.) %>% toupper %as% n} %>%
{setNames(.,A(n))} %to% df %>% # no need for '%T>%' and '{}' here
head %>% print;fp()
# or using the old school calculator fashion (probably the clearest for this precise task)
df <- iris %1%
names %>%
toupper %>%
setNames(M(1),.) %T>%
{. %>% head %>% print}(.);fp()
My question in short:
How do I get rid of A and fp ?
Bonus: %to% doesn't work when inside {}, how can I solve this ?
This is possibly a very simple question, but I couldn't find an answer. I'm trying to apply abs to my matrix and then order the rows by the first column (descending).
In separate rows it looks like:
pcaRotaMat <- abs(pcaImportance$rotation)
temp <- pcaRotaMat[order(-pcaRotaMat[,1]),]
However, when I'm trying to use the infix operator (%>%), I'm getting the following error:
t <- pcaImportance$rotation %>% abs() %>% order(-[,1],)
Error: unexpected '[' in "t <- pcaImportance$rotation %>% abs() %>% order(["
Your help will be appreciated.
If you are comfortable with something more verbose:
sort_fn = function(x) {
x[order(-x[ ,1]), ]
}
t <- pcaImportance$rotation %>% abs() %>% sort_fn
Option 2:
If you don't want to create a function to sort:
t <- pcaImportance$rotation %>% abs %>% .[order(-.[, 1]), ]
Here "." is the placeholder for the matrix. I would also recommend against naming a variable "t", since t() is the function that transposes matrices.
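For a self-contained check, the sketch below applies the placeholder form to a PCA of the built-in USArrests data set, which stands in for the asker's pcaImportance object (an assumption, since the original data is not shown), and compares it with the two-step base R version.

```r
library(magrittr)

# Stand-in for the asker's PCA object (assumed to come from prcomp())
pca <- prcomp(USArrests, scale. = TRUE)

# Two-step base R version from the question
rota_abs    <- abs(pca$rotation)
sorted_base <- rota_abs[order(-rota_abs[, 1]), ]

# Piped version: "." is the matrix coming out of abs()
sorted_pipe <- pca$rotation %>% abs() %>% .[order(-.[, 1]), ]

same_result <- identical(sorted_base, sorted_pipe)
```

The result is stored in sorted_pipe rather than t, so base::t() stays unmasked.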
I'm having trouble getting tidyr's gather to work in its standard evaluation version, gather_:
require(tidyr)
require(dplyr)
require(lazyeval)
df = data.frame(varName=c(1,2))
gather works:
df %>% gather(variable,value,varName)
but I'd like to be able to take the name varName from a variable in standard evaluation mode, and can't seem to get it right:
name='varName'
df %>% gather_("variable","value",interp(~v,v=name))
Error in match(x, y, 0L) : 'match' requires vector arguments
I'm also confused by the following.
This works as expected:
df %>% gather_("variable","value","varName")
The next line should be equivalent to last line (from my understanding of http://cran.r-project.org/web/packages/dplyr/vignettes/nse.html ), but doesn't work:
df %>% gather_(~variable,~value,~varName)
Error in match(x, y, 0L) : 'match' requires vector arguments
Looking at the source of tidyr:::gather_.data.frame, you can see that it is just a wrapper for reshape2::melt. As such, it only works for character or numeric arguments. Actually, the following (which I would consider a bug) works:
df %>% gather_("variable", "value", 1)
As far as I can tell the nse vignette only refers to dplyr and not to tidyr.
Although this question has been answered, the following code could be used for defining keys and values for gathering purposes more generally in a function, using a vector of inputs for key and value:
data <- data.frame(a = runif(10), b = runif(10), c = runif(10))
Key <- "ColId"
Value <- "ColValue"
data %>% gather(key = KeyTmp, value = ValTmp) %>%
rename_(.dots = setNames("KeyTmp", Key) ) %>%
rename_(.dots = setNames("ValTmp", Value) )
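In current releases of tidyr and dplyr, the underscore verbs (gather_, rename_) are deprecated, so the same idea can be sketched with pivot_longer(), which accepts the output column names as plain strings and the input columns through tidyselect::all_of(). This assumes tidyr 1.0 or later; the two-column data frame and the object name long are made up for illustration.

```r
library(tidyr)

# Hypothetical input: gather only the column whose name is stored in `name`
df    <- data.frame(varName = c(1, 2), other = c(3, 4))
name  <- "varName"
Key   <- "ColId"
Value <- "ColValue"

long <- pivot_longer(
  df,
  cols      = tidyselect::all_of(name),  # columns to gather, given as strings
  names_to  = Key,                       # name of the new key column
  values_to = Value                      # name of the new value column
)
```

No intermediate KeyTmp/ValTmp columns or renaming step are needed, because names_to and values_to already take character values.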