Subsetting and then binding works as expected
var <- c("wt", "mpg")
mtcars %>% select(!!!var) -> df1
mtcars %>% select(!!!var) -> df2
bind_rows(df1, df2)
But if we skip intermediate steps
bind_rows(
mtcars %>% select(!!!var),
mtcars %>% select(!!!var)
)
it fails with Error: only lists can be spliced
This is a bug in rlang that has to do with value splicing. All functions taking dots support splicing, even if they are not quoting their input. This is handy because you don't have to use do.call() with these functions when you have a list of arguments, you can just splice the list.
The mechanism is a bit different for technical reasons. There's currently a bug and value-splicing instead of call-splicing is used within the select() call. This should be fixed shortly.
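As a rough illustration of the splicing point above, here is a minimal sketch, assuming a recent dplyr. The list name dfs is made up for the example, and all_of() is a tidyselect helper that the original post does not use; it is shown only as one way to select columns named in a character vector without !!!.

```r
library(dplyr)

var <- c("wt", "mpg")

# Select the columns named in `var`. all_of() is an assumption of this sketch,
# not what the original post used.
df1 <- mtcars %>% select(all_of(var))
df2 <- mtcars %>% select(all_of(var))

# `dfs` is a hypothetical list of data frames to bind.
dfs <- list(df1, df2)

# Splicing the list into the dots ...
spliced <- bind_rows(!!!dfs)

# ... should give the same result as the do.call() idiom.
via_do_call <- do.call(bind_rows, dfs)
```

Both calls hand bind_rows() the two data frames as separate arguments; the spliced form just avoids the do.call() wrapper.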
I never use !! or !!! because there is often something that goes wrong.
Instead, I use UQ. I don't know if it's good practice, but it works.
bind_rows(
UQ(mtcars %>% select(var)),
UQ(mtcars %>% select(var))
)
How to avoid assigning a value to a variable, then calling it separately right after to manipulate it? (Like so:)
df.3 <- filter(df.1, !(Patient_ID %in% df.2))
df.3 %>% count(MIPSGroup)
The only way I know how to do it is:
(df.3 <- filter(df.1, !(Patient_ID %in% df.2))) %>%
  count(MIPSGroup)
But there's gotta be a better way...
Thanks!
If you were willing to modify the original input, you could use magrittr's %<>% (compound assignment) operator:
(mtcars %<>% filter(cyl==6)) %>% count(mpg)
This modifies the value of mtcars according to the filter and prints the results of the count operation on the result.
There may be some way to use magrittr's other operators (e.g. %T>%) to get this done, but I haven't figured it out yet. I tried
((mtcars -> tmpcars) %<>% filter(cyl==6)) %>% count(mpg)
but R's parsing magic can't quite handle it.
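One way the tee operator might be used for this, as a sketch rather than a recommendation: %T>% runs its right-hand side only for the side effect and passes the left-hand side on, so assign() can save the intermediate result. The name six_cyl and the choice to assign into the global environment are assumptions made for the example.

```r
library(dplyr)
library(magrittr)

# %T>% evaluates its right-hand side for the side effect only and passes the
# left-hand side on unchanged, so the filtered data can be saved mid-pipeline.
# `six_cyl` is a hypothetical name; writing to the global environment is a
# choice made for this sketch.
counts <- mtcars %>%
  filter(cyl == 6) %T>%
  assign("six_cyl", ., envir = globalenv()) %>%
  count(gear)
```

Unlike %<>%, this leaves mtcars itself untouched. Whether it reads better than a plain assignment followed by a second statement is a matter of taste.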
I'm trying, as per "dplyr mutate using variable columns" and "dplyr - mutate: use dynamic variable names", to use dynamic names in mutate. What I am trying to do is normalize column data by group, subject to a minimum standard deviation. Each column has a different minimum standard deviation.
e.g. (I omitted loops & map statements for convenience)
require(dplyr)
require(magrittr)
data(iris)
iris <- tbl_df(iris)
minsd <- c('Sepal.Length' = 0.8)
varname <- 'Sepal.Length'
iris %>% group_by(Species) %>% mutate(!!varname := mean(pluck(iris,varname),na.rm=T)/max(sd(pluck(iris,varname)),minsd[varname]))
I got the dynamic assignment and variable selection to work as suggested by the referenced answers, but group_by() is not respected, which for me is the main benefit of using dplyr here.
The desired answer is given by:
iris %>% group_by(Species) %>% mutate(!!varname := mean(Sepal.Length,na.rm=T)/max(sd(Sepal.Length),minsd[varname]))
Is there a way around this?
I actually did not know much about pluck, so I don't know what went wrong, but I would go for this and this works:
iris %>%
  group_by(Species) %>%
  mutate(
    !!varname :=
      mean(!!as.name(varname), na.rm = TRUE) /
      max(sd(!!as.name(varname)),
          minsd[varname])
  )
Let me know if this isn't what you were looking for.
The other answer is obviously the best, and it also solved a similar problem that I have encountered. For example, with !!as.name() there is no need to use group_by_() (or group_by_at()) or arrange_() (or arrange_at()).
However, another way is to replace pluck(iris, varname) in your code with .data[[varname]]. I suppose pluck(iris, varname) does not work because iris there is the original, ungrouped data frame, so the whole column is used for every group. .data, however, refers to the tibble that mutate() is operating on, and so respects the grouping.
An alternative to as.name() is rlang::sym() from the rlang package.
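A minimal sketch of the two alternatives mentioned above, .data[[varname]] and rlang::sym(), assuming a recent dplyr and rlang. The result names with_pronoun and with_symbol, and the helper variable col, are made up for the example.

```r
library(dplyr)
library(rlang)

minsd <- c(Sepal.Length = 0.8)
varname <- "Sepal.Length"

# .data is the pronoun for the current (grouped) data, so the statistics are
# computed within each Species.
with_pronoun <- iris %>%
  group_by(Species) %>%
  mutate(!!varname := mean(.data[[varname]], na.rm = TRUE) /
           max(sd(.data[[varname]]), minsd[[varname]])) %>%
  ungroup()

# The same computation with sym(), which turns the string into a symbol
# that is then unquoted with !!.
col <- sym(varname)
with_symbol <- iris %>%
  group_by(Species) %>%
  mutate(!!varname := mean(!!col, na.rm = TRUE) /
           max(sd(!!col), minsd[[varname]])) %>%
  ungroup()
```

Both versions should yield one distinct value per species, which is a quick way to check that the grouping was respected.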
The article on dplyr here says "[]" (square brackets) can be used to subset filtered Tibbles like this:
filter(mammals, adult_body_mass_g > 1e7)[ , 3]
But I am getting an "object not found" error.
Here is the replication of the error on a more known dataset "iris"
library(dplyr)
iris %>% filter(Sepal.Length>6) [,c(1:3)]
Error in filter_(.data, .dots = lazyeval::lazy_dots(...)) :
object 'Sepal.Length' not found
I also want to mention that I am deliberately not preferring to use the native subsetting in dplyr using select() as I need a vector output and not a data frame on a single column. Unfortunately, dplyr always forces a data frame output (for good reasons).
You need an extra pipe:
iris %>% filter(Sepal.Length>6) %>% .[,1:3]
Sorry, forgot the . before the brackets.
Note: Your code will probably be more readable if you stick to the tidyverse syntax and use select as the last operation.
iris %>%
filter(Sepal.Length > 6) %>%
select(1:3)
The dplyr-native way of doing this is to use select:
iris %>% filter(Sepal.Length > 6) %>% select(1:3)
You could also use {} so that the filtering is done before [ is applied:
{iris %>% filter(Sepal.Length>6)}[,c(1:3)]
Or, as suggested in another answer, use the . notation to indicate where the data should go in relation to [:
iris %>% filter(Sepal.Length>6) %>% .[,1:3]
You can also load magrittr explicitly and use extract, which is a "pipe-able" version of [:
library(magrittr)
iris %>% filter(Sepal.Length>6) %>% extract( ,1:3)
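Since the question asks for a vector rather than a data frame, here is a small sketch of two of the options side by side. pull() is not mentioned in the thread; it is added here on the assumption that a plain vector from a single column is what is wanted. The result names are made up.

```r
library(dplyr)

# Braces force the whole pipeline to be evaluated before `[` is applied.
first_three <- {iris %>% filter(Sepal.Length > 6)}[, 1:3]

# pull() returns a single column as a plain vector. It is not part of the
# original thread and is shown as an assumption about what the asker needs.
sepal_length <- iris %>%
  filter(Sepal.Length > 6) %>%
  pull(Sepal.Length)
```

The original attempt fails because [ binds more tightly than %>%, so R tries to subset the result of filter(Sepal.Length>6) before any data has been piped in.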
The blog entry you reference is old in dplyr time - about 3 years old. dplyr has been changing a lot. I don't know whether the blog's suggestion worked at the time it was written or not, but I'd recommend finding more recent sources to learn about this frequently changing package.
I recently discovered the pipe operator %>%, which can make code more readable. Here is my MWE.
library(dplyr) # for the pipe operator
library(lsr) # for the cohensD function
set.seed(4) # make it reproducible
dat <- data.frame( # create data frame
subj = c(1:6),
pre = sample(1:6, replace = TRUE),
post = sample(1:6, replace = TRUE)
)
dat %>% select(pre, post) %>% sapply(., mean) # works as expected
However, I struggle using the pipe operator in this particular case
dat %>% select(pre, post) %>% cohensD(.$pre, .$post) # piping returns an error
cohensD(dat$pre, dat$post) # classical way works fine
Why is it not possible to subset columns using the placeholder . in combination with $? Is it worthwhile to write this line using the pipe operator %>%, or does it complicate the syntax? The classical way of writing this seems more concise.
This would work:
dat %>% select(pre, post) %>% {cohensD(.$pre, .$post)}
Wrapping the last call into curly braces makes it be treated like an expression and not a function call. When you pipe something into an expression, the . gets replaced as expected. I often use this trick to call a function which does not interface well with piping.
What is inside the braces happens to be a function call here, but it could be any expression involving the . placeholder.
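The brace trick can be tried without installing lsr. In this sketch base R's cor() stands in for cohensD(), and the data values are made up; with() is shown as a related option that the answer above does not mention.

```r
library(dplyr)

# Made-up data for the sketch.
dat <- data.frame(pre = c(1, 3, 2, 5, 4, 6), post = c(2, 3, 4, 4, 6, 5))

# Without braces the pipe would also insert `dat` as the first argument,
# i.e. cor(dat, dat$pre, dat$post). The braces prevent that.
r_braces <- dat %>% {cor(.$pre, .$post)}

# with() evaluates the expression with the columns in scope.
r_with <- dat %>% with(cor(pre, post))
```

Both results should match cor(dat$pre, dat$post) computed the classical way.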
Since you're going from a bunch of data to one (row of) value(s), you're summarizing. In a dplyr pipeline you can then use the summarize function; within summarize you don't need to subset and can just refer to pre and post.
Like so:
dat %>% select(pre, post) %>% summarize(CD = cohensD(pre, post))
(The select statement isn't actually necessary in this case, but I left it in to show how this works in a pipeline)
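It doesn't work because magrittr only leaves out the automatic first argument when . is used directly as an argument. Here . appears only inside nested calls (.$pre is really `$`(., pre)), so the data frame is still passed as the first argument and the call becomes cohensD(., .$pre, .$post).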
If you really want to use piping, you can do it with the formula interface, but with a little reshaping before (melt is from reshape2 package):
dat %>% select(pre, post) %>% melt %>% cohensD(value~variable, .)
#### [1] 0.8115027
Is it possible to set all column names to upper or lower within a dplyr or magrittr chain?
In the example below I load the data and then, using a magrittr pipe, chain it through to my dplyr mutations. In the 4th line I use the tolower function, but this is for a different purpose: to create a new variable with lowercase observations.
mydata <- read.csv('myfile.csv') %>%
  mutate(Year = mdy_hms(DATE),    # mdy_hms() comes from lubridate
         Reference = (REFNUM),
         Event = tolower(EVENT))  # lowercases the values, not the column names
I'm obviously looking for something like colnames = tolower but know this doesn't work/exist.
I note the dplyr rename function but this isn't really helpful.
In magrittr the colname options are:
set_colnames instead of base R's colnames<-
set_names instead of base R's names<-
I've tried numerous permutations with these but no dice.
Obviously this is very simple in base R.
names(mydata) <- tolower(names(mydata))
However it seems incongruous with the dplyr/magrittr philosophies that you'd have to do that as a clunky one liner, before moving on to an elegant chain of dplyr/magrittr code.
With {dplyr} (1.0.0 or later) we can do:
mydata %>% rename_with(tolower)
or, with the older (now superseded) scoped variant:
mydata %>% rename_all(tolower)
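A short runnable sketch on iris, assuming dplyr 1.0.0 or later, where rename_with() is available. The result names lower and partly_upper are made up; the .cols argument is shown only to illustrate renaming a subset of columns.

```r
library(dplyr)

# Lower-case every column name inside a pipeline.
lower <- iris %>% rename_with(tolower)

# Upper-case only some columns via the .cols argument.
partly_upper <- iris %>% rename_with(toupper, .cols = starts_with("Sepal"))
```

Because rename_with() takes the data as its first argument and returns the renamed data frame, it slots into a longer chain before mutate() or select() without a separate assignment.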
iris %>% setNames(tolower(names(.))) %>% head
Or equivalently use replacement function in non-replacement form:
iris %>% `names<-`(tolower(names(.))) %>% head
iris %>% `colnames<-`(tolower(names(.))) %>% head # if you really want to use `colnames<-`
Using magrittr's "compound assignment pipe-operator" %<>% might be, if I understand your question correctly, an even more succinct option.
library("magrittr")
names(iris) %<>% tolower
?`%<>%` # for more
mtcars %>%
set_colnames(value = casefold(colnames(.), upper = FALSE)) %>%
head
casefold is available in base R and can convert in both directions, i.e. to either all upper case or all lower case, depending on the upper flag.
Also, colnames() returns only the column headers, so only they are case-converted.
You could also define a function:
upcase <- function(df) {
names(df) <- toupper(names(df))
df
}
library(dplyr)
mtcars %>% upcase %>% select(MPG)