Mutating columns with reduce2 and rlang

I am trying the following:
library(tidyverse)
library(rlang)
df <- data.frame(a = 1:2)
reduce2(list(df, df, df), letters[2:3], ~ mutate(.x, !!(.y) := 2:3))
#> Error in local_error_context(dots = dots, .index = i, mask = mask): promise already under evaluation: recursive default argument reference or earlier problems?
I know many other ways of adding columns to a data frame, but I am trying to learn rlang.
The expected output:
a b c
1 1 2 2
2 2 3 3

A method to combine purrr::reduce() and rlang is:
library(dplyr)
library(purrr)
reduce(letters[2:3], ~ .x %>% mutate(!!.y := 2:3), .init = df)
# a b c
# 1 1 2 2
# 2 2 3 3
where the trick is to assign df to the argument .init.
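To see why .init matters: reduce() folds from the left, starting from .init, so the call above unrolls (a sketch of the assumed expansion, shown only for illustration) into one mutate() per name:
# reduce(letters[2:3], ~ .x %>% mutate(!!.y := 2:3), .init = df) is equivalent to:
df %>%
  mutate(!!"b" := 2:3) %>%
  mutate(!!"c" := 2:3)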

I don't think reduce2 is the correct function here, since you aren't actually using any items in the list of data frames after the first iteration. The function passed to reduce2 takes three arguments: the first is the object being reduced, the second is the next item in .x, and the third is the next item in .y.
That means you can still use reduce2 if you want, by doing:
reduce2(.x = list(df, df, df), .y = letters[2:3],
        .f = function(A, B, C) mutate(A, {{C}} := 2:3))
#> a b c
#> 1 1 2 2
#> 2 2 3 3
But note that you are not using the second argument in the function body. You could do it just with reduce:
reduce(list(df, 'b', 'c'), ~ mutate(.x, !!.y := 2:3))
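This works because, when .init is omitted, reduce() takes the first element of the list as the starting accumulator, so df seeds the fold and only 'b' and 'c' are spliced in as column names; the result should match the expected output shown above:
#   a b c
# 1 1 2 2
# 2 2 3 3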

I am sure you are aware that you can do df[letters[2:3]] <- 2:3 to achieve the same output but I don't think this is what you are looking for.
To use purrr and rlang you may use -
library(dplyr)
library(purrr)
bind_cols(df, map_dfc(letters[2:3], ~df %>% transmute(!!.x := 2:3)))
# a b c
#1 1 2 2
#2 2 3 3
And another way would be -
map(letters[2:3], ~df %>% mutate(!!.x := 2:3)) %>% reduce(inner_join, by = 'a')

Related

dplyr arrange data frame based on column position with new R pipe |>

I want to sort my data frame based on a column that I pass to dplyr's arrange function by its position. This works as long as I'm using the "old" tidyverse/magrittr pipe operator. However, changing it to the new R pipe returns an error:
df <- data.frame(x = c(3, 4, 1, 5),
                 y = 1:4)
# Works
df %>%
  arrange(.[1])
x y
1 1 3
2 3 1
3 4 2
4 5 4
# Throws error
df |>
  arrange(.[1])
Error:
! arrange() failed at implicit mutate() step.
Problem with `mutate()` column `..1`.
i `..1 = .[1]`.
x object '.' not found
Run `rlang::last_error()` to see where the error occurred.
How can I still arrange by column position when using the new R pipe?
I realize that the |> operator does not accept "." as a placeholder, but I still don't know how else I could refer to the data then.
Update:
This seems to work, but wondering if there is something more straightforward:
df |>
  arrange(cur_data() |> select(1))
You can pass a lambda function (a suggestion by @Martin Morgan in the comments) to specify the column's position instead of its name:
df <- data.frame(x = c(3, 4, 1, 5),
                 y = 1:4)
df |>
  (\(z) arrange(z, z[[1]]))()
# x y
# 1 1 3
# 2 3 1
# 3 4 2
# 4 5 4
With order, this looks okay:
df |>
  (\(z) z[order(z[, 1]), ])()
x y
3 1 3
1 3 1
2 4 2
4 5 4
|> does not support the dot placeholder, but tidyverse functions do support cur_data().
# 1
df |> arrange(cur_data()[1])
Another possibility is the Bizarro pipe which is not really a pipe but does look like one and uses only base R.
# 2
df ->.; arrange(., .[1])
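# (Added note.) The Bizarro pipe is just right-assignment: `df ->.` writes df into a
# variable literally named `.` in the calling environment, and the call after the
# semicolon then refers to that variable by name, so no pipe placeholder is needed.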
or any of these work-arounds
# 3
arrange1 <- function(.) arrange(., .[1])
df |> arrange1()
# 4
df |> (function(.) arrange(., .[1]))()
# 5
df |> list() |> setNames(".") |> with(arrange(., .[1]))
# 6
with. <- function(data, expr, ...) {
  eval(substitute(expr), list(. = data), enclos = parent.frame())
}
df |> with.(arrange(., .[1]))
# these hard code variable names so are not directly comparable
# but can be used if that is ok
# 7
df |> arrange(x)
# 8
df |> with(arrange(data.frame(x, y), x))
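An additional, hedged option (my addition, not from the original answer): on dplyr >= 1.1.0, pick() is the documented successor to cur_data() and accepts column positions, so this should also work without hard-coding the variable names:
# 9 (assumes dplyr >= 1.1.0)
df |> arrange(pick(1))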

R: rowwise nth element ordered_by row values

I have this input:
t <- data.frame(x=c(1,2,8,4), y=c(2,3,4,5), k=c(3,4,5,1))
I want the row-wise nth-lowest element of the data frame, i.e. for each row the nth-smallest value across its columns, so that the output is something like this (example for nth_element = 2):
[1] 2 3 5 4
I tried a function like this:
apply(t, 1, nth, n=1, order_by = .)
But this does not work. Two questions:
What should I type in the order_by argument to make this function work?
Which is the best way to summarise rows with an own summary function if I don't want to mention the column names in the rowwise summary function?
Sidenote:
I don't want to mention the column names specifically, I want the function to use all rows in the dataset.
I tried the rownth function from the Rfast package but it only provides one result. Does anybody know what I do wrong?
We can use apply and sort to do this.
d <- data.frame(x=c(1,2,8,4), y=c(2,3,4,5), k=c(3,4,5,1))
nth_lowest <- 2
apply(d, 1, FUN = function(x) sort(x)[nth_lowest])
# [1] 2 3 5 4
Note that I am calling the data d instead of t: t() is already the name of a base R function (matrix transpose), so it's best not to mask it.
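One added caveat (not from the original answer): apply() coerces its input to a matrix first, so this only sorts numerically when every column is numeric.
# With a mixed-type data frame the coercion is to character, and sort() then
# compares strings lexicographically rather than numerically:
d_chr <- cbind(d, label = letters[1:4])
apply(d_chr, 1, function(x) sort(x)[2])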
Not as elegant as @bouncyball's answer, but using dplyr (and tidyr), one possibility is to do:
library(dplyr)
library(tidyr)
t %>%
  mutate(Row = row_number()) %>%
  pivot_longer(-Row, names_to = "Col", values_to = "Val") %>%
  group_by(Row) %>%
  arrange(Val) %>%
  slice(2) %>%
  select(Val)
Adding missing grouping variables: `Row`
# A tibble: 4 x 2
# Groups: Row [4]
Row Val
<int> <dbl>
1 1 2
2 2 3
3 3 5
4 4 4
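A further sketch (my addition, assuming dplyr >= 1.0.0): rowwise() with c_across() also avoids naming the columns, though it is likely slower than the apply() approach on large data:
library(dplyr)
t %>%
  rowwise() %>%
  mutate(nth_low = sort(c_across(everything()))[2]) %>%
  ungroup() %>%
  pull(nth_low)
# [1] 2 3 5 4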
Using Rfast you could reduce the run time for big inputs, although it works on matrices only.
d <- data.frame(x=c(1,2,8,4), y=c(2,3,4,5), k=c(3,4,5,1))
d<- Rfast::data.frame.to_matrix(d)
nth_lowests <- rep(2,ncol(d))
Rfast::rownth(d,nth_lowests)
# [1] 2 3 5 4
You could also use the parallel version of Rfast::rownth

Dplyr _if verbs with predicate function referring to the column names & multiple conditions?

I'm trying to use mutate_if or select_if, etc, verbs with column names within the predicate function.
See example below:
> btest <- data.frame(
+ sjr_first = c('1','2','3',NA, NA, '6'),
+ jcr_first = c('1','2','3',NA, NA, '6'),
+ sjr_second = LETTERS[1:6],
+ jcr_second = LETTERS[1:6],
+ sjr_third = as.character(seq(6)),
+ jcr_fourth = seq(6) + 5,
+ stringsAsFactors = FALSE)
>
> btest %>% select_if(.predicate = ~ str_match(names(.), 'jcr'))
Error in selected[[i]] <- eval_tidy(.p(column, ...)) :
replacement has length zero
I'm aware I could use btest %>% select_at(vars(dplyr::matches('jcr'))) but my goal here is actually to combine the column name condition with another condition (e.g. is.numeric) using mutate_if() to operate on a subset of my columns. However I'm not sure how to get the first part with the name matching to work...
You can do:
btest %>%
  select_if(str_detect(names(.), "jcr") & sapply(., is.numeric))
jcr_fourth
1 6
2 7
3 8
4 9
5 10
6 11
Tidyverse solution:
require(dplyr)
# Return (get):
btest %>%
  select_if(grepl("jcr", names(.)) & sapply(., is.numeric))
# Mutate (set):
btest %>%
  mutate_if(grepl("jcr", names(.)) & sapply(., is.numeric), funs(paste0("whatever", .)))
Base R solution:
# Return (get):
btest[,grepl("jcr", names(btest)) & sapply(btest, is.numeric), drop = FALSE]
# Mutate (set):
btest[,grepl("jcr", names(btest)) & sapply(btest, is.numeric)] <- paste0("whatever", unlist(btest[,grepl("jcr", names(btest)) & sapply(btest, is.numeric)]))
You could separate two select_if calls
library(dplyr)
library(stringr)
btest %>% select_if(str_detect(names(.), 'jcr')) %>% select_if(is.numeric)
# jcr_fourth
#1 6
#2 7
#3 8
#4 9
#5 10
#6 11
We cannot combine the two conditions into a single predicate function, because the first condition needs the whole data frame (to inspect its names), whereas a select_if predicate is applied to each column in turn.
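As an added aside (not part of the original answers, and assuming dplyr >= 1.0.0): with current tidyselect semantics the two conditions can be combined in a single call, since name-based helpers and where() can be joined with &:
library(dplyr)
# select columns whose name contains "jcr" AND that are numeric
btest %>% select(matches("jcr") & where(is.numeric))
# the mutate counterpart, via across()
btest %>% mutate(across(matches("jcr") & where(is.numeric), ~ paste0("whatever", .x)))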

Renaming columns according to vector inside pipe

I have a data.frame df with columns A and B:
df <- data.frame(A = 1:5, B = 11:15)
There's another data.frame, df2, which I'm building by various calculations that ends up having generic column names X1 and X2, which I cannot control directly (because it passes through being a matrix at one point). So it ends up being something like:
mtrx <- matrix(1:10, ncol = 2)
mtrx %>% data.frame()
I would like to rename the columns in df2 to be the same as df. I could, of course, do it after I finish building df2 with a simple assigning:
names(df2)<-names(df)
My question is: is there a way to do this directly within the pipe? I can't seem to use dplyr::rename, because its arguments have to be in the form newname = oldname, and I can't seem to vectorize it. The same goes for the data.frame call itself: I can't just give it a vector of column names, as far as I can tell. Is there another option I'm missing? What I'm hoping for is something like
mtrx %>% data.frame() %>% rename(names(df))
but this doesn't work - gives error Error: All arguments must be named.
Cheers!
You can use setNames
mtrx %>%
  data.frame() %>%
  setNames(., nm = names(df))
# A B
#1 1 6
#2 2 7
#3 3 8
#4 4 9
#5 5 10
Or use purrr's equivalent set_names
mtrx %>%
  data.frame() %>%
  purrr::set_names(., nm = names(df))
A third option is "names<-"
mtrx %>%
  data.frame() %>%
  "names<-"(names(df))
We can use rename_all from tidyverse
library(tidyverse)
mtrx %>%
  as.data.frame %>%
  rename_all(~ names(df))
# A B
# 1 1 6
# 2 2 7
# 3 3 8
# 4 4 9
# 5 5 10
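One more hedged variant (my addition): in dplyr >= 1.0.0, rename_all() is superseded by rename_with(), which takes a function returning the new names:
mtrx %>%
  data.frame() %>%
  rename_with(~ names(df))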

dplyr summarise when function return is vector-valued?

The dplyr::summarize() function can apply arbitrary functions over the data, but it seems that the function must return a scalar value. I'm curious whether there is a reasonable way to handle functions that return a vector without making multiple calls to the function.
Here's a somewhat silly minimal example. Consider a function that gives multiple values, such as:
f <- function(x, y) {
  coef(lm(x ~ y, data.frame(x = x, y = y)))
}
and data that looks like:
df <- data.frame(group=c('A','A','A','A','B','B','B','B','C','C','C','C'), x=rnorm(12,1,1), y=rnorm(12,1,1))
I'd like to do something like:
df %>%
  group_by(group) %>%
  summarise(f(x, y))
and get back a table that has 2 columns added for each of the returned values instead of the usual 1 column. Instead, this errors with: Expecting single value
Of course we can get multiple values from dlpyr::summarise() by giving the function argument multiple times:
f1 <- function(x,y) coef(lm(x ~ y, data.frame(x=x,y=y)))[[1]]
f2 <- function(x,y) coef(lm(x ~ y, data.frame(x=x,y=y)))[[2]]
df %>%
  group_by(group) %>%
  summarise(a = f1(x, y), b = f2(x, y))
This gives the desired output:
group a b
1 A 1.7957245 -0.339992915
2 B 0.5283379 -0.004325209
3 C 1.0797647 -0.074393457
but coding in this way is ridiculously crude and ugly.
data.table handles this case more succinctly:
dt <- as.data.table(df)
dt[, f(x,y), by="group"]
but creates an output that extends the table with additional rows instead of additional columns, resulting in output that is both confusing and harder to work with:
group V1
1: A 1.795724536
2: A -0.339992915
3: B 0.528337890
4: B -0.004325209
5: C 1.079764710
6: C -0.074393457
Of course there are more classic apply strategies we could use here,
sapply(levels(df$group), function(x) coef(lm(x~y, df[df$group == x, ])))
A B C
(Intercept) 1.7957245 0.528337890 1.07976471
y -0.3399929 -0.004325209 -0.07439346
but this sacrifices both the elegance and I suspect the speed of the grouping. In particular, note that we cannot use our pre-defined function f in this case, but have to hard code the grouping into the function definition.
Is there a dplyr function for handling this case? If not, is there a more elegant way to handle this process of evaluating vector-valued functions over a data.frame by group?
You could try do
library(dplyr)
df %>%
  group_by(group) %>%
  do(setNames(data.frame(t(f(.$x, .$y))), letters[1:2]))
# group a b
#1 A 0.8983217 -0.04108092
#2 B 0.8945354 0.44905220
#3 C 1.2244023 -1.00715248
The output based on f1 and f2 are
df %>%
  group_by(group) %>%
  summarise(a = f1(x, y), b = f2(x, y))
# group a b
#1 A 0.8983217 -0.04108092
#2 B 0.8945354 0.44905220
#3 C 1.2244023 -1.00715248
Update
If you are using data.table, the option to get similar result is
library(data.table)
setnames(setDT(df)[, as.list(f(x,y)) , group], 2:3, c('a', 'b'))[]
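(Added note, not part of the original answer.) do() has since been superseded; in dplyr >= 1.0.0 an unnamed expression in summarise() that returns a data frame has its columns unpacked, so a small wrapper around the same idea (f_df is a name introduced here) gives the two-column result directly:
library(dplyr)
library(tibble)

f_df <- function(x, y) {
  cf <- coef(lm(x ~ y))
  tibble(a = cf[[1]], b = cf[[2]])
}

df %>%
  group_by(group) %>%
  summarise(f_df(x, y), .groups = "drop")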
This is why I still love plyr::ddply():
library(plyr)
f <- function(z) setNames(coef(lm(x ~ y, z)), c("a", "b"))
ddply(df, ~ group, f)
# group a b
# 1 A 0.5213133 0.04624656
# 2 B 0.3020656 0.01450137
# 3 C 0.2189537 0.22998823
