dplyr mutate using character vector of column names - r

data is a data.frame containing the columns date, a, b, c and d. The last four are numeric.
Y.columns <- c("a")
X.columns <- c("b","c","d")
What I need:
data.mutated <- data %>%
mutate(Y = a, X = b+c+d) %>%
select(date,Y,X)
But I would like to pass the mutate arguments from a character vector.
I tried the following:
Y.string <- paste(Y.columns, collapse='+')
X.string <- paste(X.columns, collapse='+')
data.mutated <- data %>%
mutate(Y = UQ(Y.string), X = UQ(X.string)) %>%
select(date,Y,X)
But it didn't work. Any help is appreciated.

To use tidyeval with UQ, you first need to parse your expressions into quosures with parse_quosure from rlang (using mtcars as an example, since the OP's question is not reproducible):
Y.columns <- c("cyl")
X.columns <- c("disp","hp","drat")
Y.string <- paste(Y.columns, collapse='+')
X.string <- paste(X.columns, collapse='+')
library(dplyr)
library(rlang)
mtcars %>%
mutate(Y = UQ(parse_quosure(Y.string)),
X = UQ(parse_quosure(X.string))) %>%
select(Y,X)
or with !!:
mtcars %>%
mutate(Y = !!parse_quosure(Y.string),
X = !!parse_quosure(X.string)) %>%
select(Y,X)
Result:
   Y      X
1  6 273.90
2  6 273.90
3  4 204.85
4  6 371.08
5  8 538.15
6  6 332.76
7  8 608.21
8  4 212.39
9  4 239.72
10 6 294.52
...
Note:
mutate_ is now deprecated, so I think tidyeval with quosures and UQ is the new way to go.
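For readers on a recent rlang: as far as I know, parse_quosure() was later renamed and UQ() retired, so the calls above may fail there. A minimal sketch of the same idea, assuming a current dplyr and rlang, parses each string with rlang::parse_expr() and unquotes it with !!. The variable name res is only for illustration.

```r
library(dplyr)
library(rlang)

Y.columns <- c("cyl")
X.columns <- c("disp", "hp", "drat")

# Build the strings "cyl" and "disp+hp+drat", parse each into an
# expression, then unquote it inside mutate() with !!
res <- mtcars %>%
  mutate(Y = !!parse_expr(paste(Y.columns, collapse = "+")),
         X = !!parse_expr(paste(X.columns, collapse = "+"))) %>%
  select(Y, X)

head(res)
```

Parsing strings evaluates whatever text it is given, so this is only appropriate when the column names come from a trusted source.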

Related

dplyr arrange data frame based on column position with new R pipe |>

I want to sort my data frame based on a column that I pass to dplyr's arrange function with its position. This works as long as I'm using the "old" tidyverse/magrittr pipe operator. However, changing it to the new R pipe returns an error:
df <- data.frame(x = c(3, 4, 1, 5),
y = 1:4)
# Works
df %>%
arrange(.[1])
x y
1 1 3
2 3 1
3 4 2
4 5 4
# Throws error
df |>
arrange(.[1])
Error:
! arrange() failed at implicit mutate() step.
Problem with `mutate()` column `..1`.
i `..1 = .[1]`.
x object '.' not found
Run `rlang::last_error()` to see where the error occurred.
How can I still arrange by column position when using the new R pipe?
I realize that the |> operator does not accept the "." as an argument, but I still don't know how else I could address the data then.
Update:
This seems to work, but wondering if there is something more straightforward:
df |>
arrange(cur_data() |> select(1))
You can pass a lambda function (suggestion by @Martin Morgan in the comments, to specify the column's position instead of its name):
df <- data.frame(x = c(3, 4, 1, 5),
y = 1:4)
df |>
(\(z) arrange(z, z[[1]]))()
# x y
# 1 1 3
# 2 3 1
# 3 4 2
# 4 5 4
With order, this looks okay:
df |>
(\(z) z[order(z[,1]), ])()
x y
3 1 3
1 3 1
2 4 2
4 5 4
|> does not support dot but tidyverse functions do support cur_data().
# 1
df |> arrange(cur_data()[1])
Another possibility is the Bizarro pipe which is not really a pipe but does look like one and uses only base R.
# 2
df ->.; arrange(., .[1])
or any of these work-arounds
# 3
arrange1 <- function(.) arrange(., .[1])
df |> arrange1()
# 4
df |> (function(.) arrange(., .[1]))()
# 5
df |> list() |> setNames(".") |> with(arrange(., .[1]))
# 6
with. <- function(data, expr, ...) {
eval(substitute(expr), list(. = data), enclos = parent.frame())
}
df |> with.(arrange(., .[1]))
# these hard code variable names so are not directly comparable
# but can be used if that is ok
# 7
df |> arrange(x)
# 8
df |> with(arrange(data.frame(x, y), x))
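One more variant, as a sketch only: in dplyr 1.1.0 and later, cur_data() is deprecated in favour of pick(), which accepts tidyselect positions. Assuming that dplyr version and R 4.1 or later for the native pipe, arranging by the first column could look like this (sorted is an illustrative name):

```r
library(dplyr)

df <- data.frame(x = c(3, 4, 1, 5),
                 y = 1:4)

# pick(1) selects the first column by position inside the data mask,
# so no "." placeholder is needed with the native pipe
sorted <- df |>
  arrange(pick(1))

print(sorted)
```

On older dplyr versions pick() does not exist, and the cur_data() form shown above is the equivalent.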

R functions: use argument as name within the function

I have created a user function in R to multiply two columns to create a third (within a series), so this function creates 4 new columns.
create_mult_var <- function(.data){
.data <-.data%>%
mutate(Q4_1_4 = Q4_1_2_TEXT*Q4_1_3_TEXT) %>%
mutate(Q4_2_4 = Q4_2_2_TEXT*Q4_2_3_TEXT) %>%
mutate(Q4_3_4 = Q4_3_2_TEXT*Q4_3_3_TEXT) %>%
mutate(Q4_4_4 = Q4_4_2_TEXT*Q4_4_3_TEXT)
.data
}
I am trying to modify this function so that I can apply it to a different set of columns that match the same type. For instance, if I want to repeat this on the series of columns that start with "Q8", I know I can do the following:
create_mult_var_2 <- function(.data){
.data <-.data%>%
mutate(Q8_1_4 = Q8_1_2_TEXT*Q8_1_3_TEXT) %>%
mutate(Q8_2_4 = Q8_2_2_TEXT*Q8_2_3_TEXT) %>%
mutate(Q8_3_4 = Q8_3_2_TEXT*Q8_3_3_TEXT) %>%
mutate(Q8_4_4 = Q8_4_2_TEXT*Q8_4_3_TEXT)
.data
}
Instead of creating a different function for each of the Q4 and Q8 series, I would like to add the "Q4" or "Q8" as an argument. I tried this below, but R would not accept this as an argument this way. Is there a way to achieve my desired outcome?
This does not work:
create_mult_var <- function(.data,question){
.data <-.data%>%
mutate(question_1_4 = question_1_2_TEXT*question_1_3_TEXT) %>%
mutate(question_2_4 = question_2_2_TEXT*question_2_3_TEXT) %>%
mutate(question_3_4 = question_3_2_TEXT*question_3_3_TEXT) %>%
mutate(question_4_4 = question_4_2_TEXT*question_4_3_TEXT)
.data
}
I would like to modify the function, such as that I can use the following:
data_in %>% create_mult_var("Q4") %>% create_mult_var("Q8")
Or something similar to create these new columns? Any suggestions are appreciated! Thank you! If this is a bad idea, any suggestions for how I should approach this?
We could build the column names with str_c (or paste) and evaluate them with !!
create_mult_var_2 <- function(.data, pat){
.data <-.data%>%
mutate(!! str_c(pat, '_1_4') :=
!! rlang::sym(str_c(pat, '_1_2_TEXT')) *
!! rlang::sym(str_c(pat, '_1_3_TEXT')))
.data
}
create_mult_var_2(data_in, "Q4")
# Q4_1_2_TEXT Q4_1_3_TEXT Q4_1_4
#1 1 5 5
#2 2 6 12
#3 3 7 21
#4 4 8 32
Also, based on the pattern shown, this can be automated as well
library(dplyr)
library(stringr)
create_mult_var_3 <- function(.data, pat) {
.data %>%
mutate(across(matches(str_c("^", pat, "_\\d+_2")), ~
.* get(str_replace(cur_column(), '_2_TEXT', '_3_TEXT')),
.names = '{.col}_new')) %>%
rename_at(vars(ends_with('_new')),
~ str_replace(., '\\d+_TEXT_new', '4'))
}
Testing:
create_mult_var_3(data_in, "Q4")
# Q4_1_2_TEXT Q4_1_3_TEXT Q4_1_4
#1 1 5 5
#2 2 6 12
#3 3 7 21
#4 4 8 32
data
data_in <- data.frame(Q4_1_2_TEXT = 1:4, Q4_1_3_TEXT = 5:8)
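If the goal is to cover every item in the series (the _1_ to _4_ columns from the question) with one call, a plain loop over the item numbers may be easier to read. This is a rough sketch, not the answer's method: the function name create_mult_var_loop and the items argument are made up for illustration, and the data argument is called df so that it does not clash with dplyr's .data pronoun.

```r
library(dplyr)

create_mult_var_loop <- function(df, pat, items = 1:4) {
  for (i in items) {
    new_col <- paste0(pat, "_", i, "_4")       # e.g. "Q4_1_4"
    lhs     <- paste0(pat, "_", i, "_2_TEXT")  # e.g. "Q4_1_2_TEXT"
    rhs     <- paste0(pat, "_", i, "_3_TEXT")  # e.g. "Q4_1_3_TEXT"
    # .data[[...]] looks a column up by its name given as a string;
    # !!new_col := sets the name of the new column from a string
    df <- df %>% mutate(!!new_col := .data[[lhs]] * .data[[rhs]])
  }
  df
}

data_in <- data.frame(Q4_1_2_TEXT = 1:4, Q4_1_3_TEXT = 5:8)

# The sample data only has item 1, so restrict the loop to it
out <- create_mult_var_loop(data_in, "Q4", items = 1)
print(out)
```

With the full survey data the default items = 1:4 would be used, and calls can be chained as in the question: data_in %>% create_mult_var_loop("Q4") %>% create_mult_var_loop("Q8").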

Transform data to data.frame with the pipe operator

Let's say I have the following data: x <- 1:2.
My desired output is a data.frame() like the following:
a b
1 2
With base R I would do something along these lines:
df <- data.frame(t(x))
colnames(df) <- c("a", "b")
Question: How would I do this with the pipe operator?
What I tried so far:
library(magrittr)
x %>% data.frame(a = .[1], b = .[2])
After the transpose, convert to a tibble with as_tibble and change the column names with setNames
library(dplyr)
library(tibble)
x %>%
t %>%
as_tibble(.name_repair = "unique") %>%
setNames(c("a", "b"))
# A tibble: 1 x 2
# a b
# <int> <int>
#1 1 2
Or another option if we want to use the OP's syntax would be to wrap the code with {}
x %>%
{data.frame(a = .[1], b = .[2])}
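For context on why the braces matter: when the dot only appears inside nested calls such as .[1], magrittr still passes the left-hand side as the first argument, so the original attempt effectively ran data.frame(x, a = x[1], b = x[2]). A base-R-only sketch of the same transformation, assuming R 4.1 or later for the native pipe (out is an illustrative name), might be:

```r
x <- 1:2

# Transpose to a 1 x 2 matrix, convert to a data frame,
# then replace the default names V1, V2 with a, b
out <- x |>
  t() |>
  as.data.frame() |>
  setNames(c("a", "b"))

print(out)
```

This mirrors the base R version from the question and avoids the placeholder altogether.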

dplyr pipeline in a function

I'm trying to put a dplyr pipeline in a function, but even after reading the vignette multiple times as well as the tidy evaluation book (https://tidyeval.tidyverse.org/dplyr.html), I still can't get it to work...
#Sample data:
dat <- read.table(text = "A ID B
1 X 83
2 X NA
3 X NA
4 Y NA
5 X 2
6 Y 2
12 Y 10
7 Y 18
8 Y 85", header = TRUE)
# What I'm trying to do:
x <- dat %>% filter(!is.na(B)) %>% count('ID') %>% filter(freq>3)
x$ID
# Now in a function:
n_occurences <- function(df, n, column){
# Group by ID and return IDs with number of non-na > n in column
column <- enquo(column)
x <- df %>%
filter(!is.na(!!column)) %>%
count('ID') %>% filter(freq>n)
x$ID
}
# Let's try:
col <- 'B'
n_occurences(dat, n=3, column = col)
There is no error, but the output is wrong. This has something to do with tidy evaluation, but I just can't get my head around it.
With rlang 0.4.0, we can do this much more easily by using the {{...}} or curly-curly operator
library(rlang)
library(dplyr)
n_occurences <- function(df, n1, column){
df %>%
filter(!is.na({{column}})) %>%
count(ID) %>%
filter(n > n1) %>%
pull(ID)
}
n_occurences(dat, n1 = 3, column = B)
#[1] Y
#Levels: X Y
If we intend to pass a quoted string, convert it to symbol (sym) and then do the evaluation (!!)
n_occurences <- function(df, n1, column){
column <- rlang::sym(column)
df %>%
filter(!is.na(!!column)) %>%
count(ID) %>%
filter(n > n1) %>%
pull(ID)
}
col <- 'B'
n_occurences(dat, n1=3, column = col)
#[1] Y
#Levels: X Y
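When the column arrives as a string, another option is the .data pronoun, which looks a column up by name without converting it to a symbol first. A small sketch, assuming dplyr is loaded; the function name n_occurrences_str is hypothetical and the sample data is rebuilt with data.frame() so the block is self-contained:

```r
library(dplyr)

# Same values as the sample data in the question
dat <- data.frame(
  A  = c(1, 2, 3, 4, 5, 6, 12, 7, 8),
  ID = c("X", "X", "X", "Y", "X", "Y", "Y", "Y", "Y"),
  B  = c(83, NA, NA, NA, 2, 2, 10, 18, 85)
)

n_occurrences_str <- function(df, n1, column) {
  df %>%
    filter(!is.na(.data[[column]])) %>%  # column is a string such as "B"
    count(ID) %>%                        # count() stores the tally in n
    filter(n > n1) %>%
    pull(ID)
}

res <- n_occurrences_str(dat, n1 = 3, column = "B")
print(res)
```

Here ID has four non-missing B values for Y and two for X, so only Y passes the n > 3 filter.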

Renaming columns according to vector inside pipe

I have a data.frame df with columns A and B:
df <- data.frame(A = 1:5, B = 11:15)
There's another data.frame, df2, which I'm building by various calculations that ends up having generic column names X1 and X2, which I cannot control directly (because it passes through being a matrix at one point). So it ends up being something like:
mtrx <- matrix(1:10, ncol = 2)
mtrx %>% data.frame()
I would like to rename the columns in df2 to be the same as df. I could, of course, do it after I finish building df2 with a simple assigning:
names(df2)<-names(df)
My question is - is there a way to do this directly within the pipe? I can't seem to use dplyr::rename, because these have to be in the form of newname=oldname, and I can't seem to vectorize it. Same goes to the data.frame call itself - I can't just give it a vector of column names, as far as I can tell. Is there another option I'm missing? What I'm hoping for is something like
mtrx %>% data.frame() %>% rename(names(df))
but this doesn't work - gives error Error: All arguments must be named.
Cheers!
You can use setNames
mtrx %>%
data.frame() %>%
setNames(., nm = names(df))
# A B
#1 1 6
#2 2 7
#3 3 8
#4 4 9
#5 5 10
Or use purrr's equivalent set_names
mtrx %>%
data.frame() %>%
purrr::set_names(., nm = names(df))
A third option is "names<-"
mtrx %>%
data.frame() %>%
"names<-"(names(df))
We can use rename_all from tidyverse
library(tidyverse)
mtrx %>%
as.data.frame %>%
rename_all(~ names(df))
# A B
# 1 1 6
# 2 2 7
# 3 3 8
# 4 4 9
# 5 5 10
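In current dplyr, rename_all is marked as superseded, and rename_with is the suggested replacement. A sketch of the same rename with it, assuming dplyr 1.0.0 or later (out is an illustrative name):

```r
library(dplyr)

df   <- data.frame(A = 1:5, B = 11:15)
mtrx <- matrix(1:10, ncol = 2)

# rename_with() applies the function to the selected column names
# (all columns by default); here it returns the names of df
out <- mtrx %>%
  as.data.frame() %>%
  rename_with(~ names(df))

print(out)
```

This only works when df and the new data frame have the same number of columns, since the function must return one name per selected column.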
