dplyr arrange data frame based on column position with new R pipe |> - r

I want to sort my data frame based on a column that I pass to dplyr's arrange function with its position. This works as long as I'm using the "old" tidyverse/magrittr pipe operator. However, changing it to the new R pipe returns an error:
df <- data.frame(x = c(3, 4, 1, 5),
y = 1:4)
# Works
df %>%
arrange(.[1])
x y
1 1 3
2 3 1
3 4 2
4 5 4
# Throws error
df |>
arrange(.[1])
Error:
! arrange() failed at implicit mutate() step.
Problem with `mutate()` column `..1`.
i `..1 = .[1]`.
x object '.' not found
Run `rlang::last_error()` to see where the error occurred.
How can I still arrange by column position when using the new R pipe?
I realize that the |> operator does not accept the "." as an argument, but I still don't know how else I could address the data then.
Update:
This seems to work, but I'm wondering whether there is something more straightforward:
df |>
arrange(cur_data() |> select(1))

You can pass a lambda function (following @Martin Morgan's suggestion in the comments to specify the column's position instead of its name):
df <- data.frame(x = c(3, 4, 1, 5),
y = 1:4)
df |>
(\(z) arrange(z, z[[1]]))()
# x y
# 1 1 3
# 2 3 1
# 3 4 2
# 4 5 4
With order, this looks okay:
df |>
(\(z) z[order(z[,1]), ])()
x y
3 1 3
1 3 1
2 4 2
4 5 4

|> does not support the dot placeholder, but tidyverse functions do support cur_data().
# 1
df |> arrange(cur_data()[1])
Another possibility is the Bizarro pipe which is not really a pipe but does look like one and uses only base R.
# 2
df ->.; arrange(., .[1])
Or use any of these workarounds:
# 3
arrange1 <- function(.) arrange(., .[1])
df |> arrange1()
# 4
df |> (function(.) arrange(., .[1]))()
# 5
df |> list() |> setNames(".") |> with(arrange(., .[1]))
# 6
with. <- function(data, expr, ...) {
eval(substitute(expr), list(. = data), enclos = parent.frame())
}
df |> with.(arrange(., .[1]))
# these hard code variable names so are not directly comparable
# but can be used if that is ok
# 7
df |> arrange(x)
# 8
df |> with(arrange(data.frame(x, y), x))
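In newer dplyr releases the same idea can be written with pick(), which selects columns with select() semantics inside a data-masking verb. The following is a minimal sketch, assuming dplyr 1.1.0 or later is installed (where, to my knowledge, cur_data() is deprecated in favour of pick()); the name sorted is only illustrative.

```r
library(dplyr)

df <- data.frame(x = c(3, 4, 1, 5),
                 y = 1:4)

# Assumption: dplyr >= 1.1.0, which provides pick().
# pick(1) returns a one-column data frame holding the first column,
# and arrange() sorts by it, so no "." placeholder is needed.
sorted <- df |>
  arrange(pick(1))

sorted
```

Because pick() accepts anything select() does, the same call also works with helpers such as starts_with() when the position is not known in advance.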

Related

Mutating columns with reduce2 and rlang

I am trying the following:
library(tidyverse)
library(rlang)
df <- data.frame(a = 1:2)
reduce2(list(df, df, df), letters[2:3], ~ mutate(.x, !!(.y) := 2:3))
#> Error in local_error_context(dots = dots, .index = i, mask = mask): promise already under evaluation: recursive default argument reference or earlier problems?
I do know many ways of mutating columns to a dataframe, but I am trying to learn rlang.
The expected output:
a b c
1 1 2 2
2 2 3 3
A method to combine purrr::reduce() and rlang is:
library(dplyr)
library(purrr)
reduce(letters[2:3], ~ .x %>% mutate(!!.y := 2:3), .init = df)
# a b c
# 1 1 2 2
# 2 2 3 3
where the trick is to assign df to the argument .init.
I don't think reduce2 is the correct function here, since you aren't actually using any items in the list of data frames after the first iteration. The function that is passed to reduce2 takes three arguments: the first is the object being reduced, the second is the next item in .x, and the third is the next item in .y.
That means you can still use reduce2 if you want, by doing:
reduce2(.x = list(df, df, df), .y = letters[2:3],
.f = function(A, B, C) mutate(A, {{C}} := 2:3))
#> a b c
#> 1 1 2 2
#> 2 2 3 3
But note that you are not using the second argument in the function body. You could do it just with reduce:
reduce(list(df, 'b', 'c'), ~ mutate(.x, !!.y := 2:3))
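To make the three-argument signature described above concrete, here is a small numeric sketch (the names step and total are illustrative, not from the question). Without .init, .y must be one element shorter than .x, because the first element of .x seeds the accumulator.

```r
library(purrr)

# acc: the value being reduced; x: next item of .x; y: next item of .y
step <- function(acc, x, y) acc + x * y

# Iteration 1: step(1, 2, 10)   = 1 + 2 * 10   = 21
# Iteration 2: step(21, 3, 100) = 21 + 3 * 100 = 321
total <- reduce2(.x = c(1, 2, 3), .y = c(10, 100), .f = step)
total
```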
I am sure you are aware that you can do df[letters[2:3]] <- 2:3 to achieve the same output but I don't think this is what you are looking for.
To use purrr and rlang you may use -
library(dplyr)
library(purrr)
bind_cols(df, map_dfc(letters[2:3], ~df %>% transmute(!!.x := 2:3)))
# a b c
#1 1 2 2
#2 2 3 3
And another way would be -
map(letters[2:3], ~df %>% mutate(!!.x := 2:3)) %>% reduce(inner_join, by = 'a')

how to create for loop in r

I'd like to create a for loop through norm1~norm31.
norm1 = norm1 %>% group_by(ID_Pair) %>% summarize(Norm_avg_PCK=mean(ID_avg_PCK,na.rm=T)
, Norm_avg_BT=mean(ID_avg_BT, na.rm=T))
norm2 = norm2 %>% group_by(ID_Pair) %>% summarize(Norm_avg_PCK=mean(ID_avg_PCK,na.rm=T)
, Norm_avg_BT=mean(ID_avg_BT, na.rm=T))
I'd like to create norm1~norm31 as above.
I've tried this code but keep getting an error message.
for (i in 1:31){
nam=paste("norm",i,sep="")
assign(nam,nam %>% group_by(ID_Pair) %>% summarize(Norm_avg_PCK=mean(ID_avg_PCK,na.rm=T)
, Norm_avg_BT=mean(ID_avg_BT, na.rm=T)))}
[Error]
Error in UseMethod("group_by_"): no applicable method for "group_by_" applied to an object of class "character"
There may be better ways to organize your data. But if you like, using get() may solve your problem.
for (i in 1:31){
# nam holds the object's name; get(nam) fetches the data frame itself
nam=paste("norm",i,sep="")
assign(nam,get(nam) %>% group_by(ID_Pair) %>% summarize(Norm_avg_PCK=mean(ID_avg_PCK,na.rm=T)
, Norm_avg_BT=mean(ID_avg_BT, na.rm=T)))
}
First, let's put all your norms into a single list:
normlist<-lapply(paste0("norm",1:31),get)
Now, we can use lapply to do your thing for each of the norms:
thing<-function(x) {x %>% group_by(ID_Pair) %>%
summarize(Norm_avg_PCK=mean(ID_avg_PCK,na.rm=T),
Norm_avg_BT=mean(ID_avg_BT, na.rm=T))}
lapply(normlist,thing)
Example with some fake data:
a1<-data.frame(id=rep(letters[1:5],3),nums=1:15)
a2<-data.frame(id=rep(letters[6:10],3),nums=16:30)
alist<-lapply(paste0("a",1:2),get)
thing<-function(x) {x %>% group_by(id) %>% summarize(means=mean(nums))}
lapply(alist,thing)
[[1]]
# A tibble: 5 x 2
id means
<fct> <dbl>
1 a 6.
2 b 7.
3 c 8.
4 d 9.
5 e 10.
[[2]]
# A tibble: 5 x 2
id means
<fct> <dbl>
1 f 21.
2 g 22.
3 h 23.
4 i 24.
5 j 25.
If you want to preserve the names, you can use sapply with simplify=FALSE instead of lapply.
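As a dependency-free sketch of the same pattern, mget() fetches several objects by name and returns a named list, so the names survive without further work. This swaps dplyr's group_by()/summarize() for base aggregate() purely to keep the example self-contained; the fake data mirror the a1/a2 example above, and means is an illustrative name.

```r
a1 <- data.frame(id = rep(letters[1:5], 3), nums = 1:15)
a2 <- data.frame(id = rep(letters[6:10], 3), nums = 16:30)

# mget() looks the objects up by name and keeps those names on the list
alist <- mget(paste0("a", 1:2))

# One summary per data frame: mean of nums within each id
means <- lapply(alist, function(d) aggregate(nums ~ id, data = d, FUN = mean))

names(means)
means$a1
```

Keeping the results in one named list also avoids writing 31 separate objects back with assign().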

dplyr mutate using character vector of column names

data is a data.frame containing the columns date, a, b, c, d. The last four are numeric.
Y.columns <- c("a")
X.columns <- c("b","c","d")
What I need:
data.mutated <- data %>%
mutate(Y = a, X = b+c+d) %>%
select(date,Y,X)
But I would like to pass the mutate arguments from a character vector.
I tried the following:
Y.string <- paste(Y.columns, collapse='+')
X.string <- paste(X.columns, collapse='+')
data.mutated <- data %>%
mutate(Y = UQ(Y.string), X = UQ(X.string)) %>%
select(date,Y,X)
But it didn't work. Any help is appreciated.
To use tidyeval with UQ, you need to first parse your expressions to a quosure with parse_quosure from rlang (Using mtcars as example, since OP's question is not reproducible):
Y.columns <- c("cyl")
X.columns <- c("disp","hp","drat")
Y.string <- paste(Y.columns, collapse='+')
X.string <- paste(X.columns, collapse='+')
library(dplyr)
library(rlang)
mtcars %>%
mutate(Y = UQ(parse_quosure(Y.string)),
X = UQ(parse_quosure(X.string))) %>%
select(Y,X)
or with !!:
mtcars %>%
mutate(Y = !!parse_quosure(Y.string),
X = !!parse_quosure(X.string)) %>%
select(Y,X)
Result:
Y X
1 6 273.90
2 6 273.90
3 4 204.85
4 6 371.08
5 8 538.15
6 6 332.76
7 8 608.21
8 4 212.39
9 4 239.72
10 6 294.52
...
Note:
mutate_ is now deprecated, so I think tidyeval with quosures and UQ is the new way to go.
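For readers on current package versions, a hedged alternative: parse_quosure() and UQ() have, to my knowledge, since been deprecated in rlang, and the same result can be sketched with rlang::parse_expr() and the !! operator. This assumes reasonably recent dplyr and rlang; res is an illustrative name.

```r
library(dplyr)
library(rlang)

Y.columns <- c("cyl")
X.columns <- c("disp", "hp", "drat")

# Build the expression text, e.g. "disp+hp+drat"
Y.string <- paste(Y.columns, collapse = "+")
X.string <- paste(X.columns, collapse = "+")

# parse_expr() turns the string into an unevaluated expression;
# !! splices it into mutate(), where it is evaluated against the data
res <- mtcars %>%
  mutate(Y = !!parse_expr(Y.string),
         X = !!parse_expr(X.string)) %>%
  select(Y, X)

head(res, 3)
```

Parsing strings into code is only reasonable when the column names come from a trusted source, since any valid R expression in the string would be evaluated.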

Can you use a data.frame twice in a dplyr chain? dplyr says " Error: cannot handle "

I am trying to use a data.frame twice in a dplyr chain. Here is a simple example that gives an error
df <- data.frame(Value=1:10,Type=rep(c("A","B"),5))
df %>%
group_by(Type) %>%
summarize(X=n()) %>%
mutate(df %>%filter(Value>2) %>%
group_by(Type) %>%
summarize(Y=sum(Value)))
Error: cannot handle
So the idea is that first a data.frame is created with two columns Value which is just some data and Type which indicates which group the value is from.
I then try to use summarize to get the number of objects in each group, and then mutate, using the object again to get the sum of the values, after the data has been filtered. However I get the Error: cannot handle. Any ideas what is happening here?
Desired Output:
Type X Y
A 5 24
B 5 28
You could try the following
df %>%
group_by(Type) %>%
summarise(X = n(), Y = sum(Value[Value > 2]))
# Source: local data frame [2 x 3]
#
# Type X Y
# 1 A 5 24
# 2 B 5 28
The idea is to filter only Value by the desired condition, instead of the whole data set.
And a bonus solution
library(data.table)
setDT(df)[, .(X = .N, Y = sum(Value[Value > 2])), by = Type]
# Type X Y
# 1: A 5 24
# 2: B 5 28
I was going to suggest that to @nongkrong, but he deleted his answer. With base R we could also do
aggregate(Value ~ Type, df, function(x) c(length(x), sum(x[x>2])))
# Type Value.1 Value.2
# 1 A 5 24
# 2 B 5 28
This is also pretty easy to do with ifelse()
df %>% group_by(Type) %>% summarize(X=n(),y=sum( ifelse(Value>2, Value, 0 )))
outputs:
Source: local data frame [2 x 3]
Type X y
1 A 5 24
2 B 5 28
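To address the title question directly: a data frame can be used twice in one chain if the second summary is joined on rather than passed to mutate(). A minimal sketch, assuming dplyr is installed; out is an illustrative name.

```r
library(dplyr)

df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

out <- df %>%
  group_by(Type) %>%
  summarize(X = n()) %>%            # first use of df: rows per Type
  left_join(
    df %>%                          # second use of df: filtered sums
      filter(Value > 2) %>%
      group_by(Type) %>%
      summarize(Y = sum(Value)),
    by = "Type"                     # combine on the shared key
  )

out
```

The single-summarise answers above are shorter for this case; the join form is mainly useful when the second summary cannot be expressed as a subset of one column.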

Using select_if with variable name and type conditions

There are plenty of posts on using dplyr's select_if for multiple conditions. However, no matter what I try, selecting on both is.factor and variable names has not worked for me so far.
Ultimately, I would like to select all factors in a df/tibble and exclude certain variables by name.
Example:
df <- tibble(A = factor(c(0,1,0,1)),
B = factor(c("Yes","No","Yes","No")),
C = c(1,2,3,4))
Various attempts:
Attempt 1
df %>%
select_if(function(col) is.factor(col) & !str_detect(names(col), "A"))
Error in selected[[i]] <- .p(.tbl[[tibble_vars[[i]]]], ...) : replacement has length zero
Attempt 2
df %>%
select_if(function(col) is.factor(col) & negate(str_detect(names(col)), "A"))
Error: Can't convert a logical vector to function Call `rlang::last_error()` to see a backtrace
Attempt 3
df %>%
select_if(function(col) is.factor(col) && !str_detect(names(col), "A"))
Error: Only strings can be converted to symbols Call `rlang::last_error()` to see a backtrace
Attempt 4
df %>%
select_if(is.factor(.) && !str_detect(names(.), "A"))
Error in tbl_if_vars(.tbl, .predicate, caller_env(), .include_group_vars = TRUE) : length(.p) == length(tibble_vars) is not TRUE
Meanwhile, each condition works fine on its own:
> df %>%
+ select_if(is.factor)
# A tibble: 4 x 2
A B
<fct> <fct>
1 0 Yes
2 1 No
3 0 Yes
4 1 No
> df %>%
+ select_if(!str_detect(names(.), "A"))
# A tibble: 4 x 2
B C
<fct> <dbl>
1 Yes 1
2 No 2
3 Yes 3
4 No 4
The problem probably lies here:
df %>%
select_if(function(col) !str_detect(names(col), "A"))
Error in selected[[i]] <- .p(.tbl[[tibble_vars[[i]]]], ...) : replacement has length zero
However, I have little clue how to fix this.
Perhaps I'm missing something, but is there any reason you couldn't do the following:
df <- tibble(A = factor(c(0,1,0,1)),
B = factor(c("Yes","No","Yes","No")),
C = c(1,2,3,4))
df %>% select_if(function(col) is.factor(col)) %>% select_if(!str_detect(names(.), "A"))
# A tibble: 4 x 1
B
<fct>
1 Yes
2 No
3 Yes
4 No
Just for completeness, not sure if it is acceptable for you, but base R may save you some pain here (a first, very quick shot):
df[, sapply(names(df),
function(coln, df) !grepl("A", coln) && is.factor(df[[coln]]), df = df),
drop = FALSE]
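The attempts in the question fail because select_if() passes each column's values to the predicate, not its name, so names(col) is NULL for these columns and the combined condition has length zero. In dplyr 1.0.0 and later, select() can mix a type predicate with name helpers directly; a sketch under that version assumption (out is an illustrative name):

```r
library(dplyr)

df <- tibble(A = factor(c(0, 1, 0, 1)),
             B = factor(c("Yes", "No", "Yes", "No")),
             C = c(1, 2, 3, 4))

# where() tests column contents, matches() tests column names;
# the two selections are combined with & and !
out <- df %>%
  select(where(is.factor) & !matches("A"))

names(out)
```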
