use pipe operator in mutate - r

I am wondering why the following code does not work. Is it because the pipe is not compatible with mutate?
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = {. %>% (function(tb) {tb$x + tb$y})})
I know a workaround is
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = map_depth(., .depth = 0, function(tb) {tb$x + tb$y}))
or
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = exec(function(tb) {tb$x + tb$y}, .))

This works as you are expecting:
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = {(.) %>% (function(tb) {tb$x + tb$y})})
# # A tibble: 2 x 3
# x y z
# <dbl> <dbl> <dbl>
# 1 1 3 4
# 2 2 4 6
The problem isn't the pipe itself, but that a bare . on the left-hand side of %>% makes magrittr build a function instead of evaluating the expression (which throws off the mutate).
Edit:
@Aramis7d provided a link to the documentation for magrittr in a comment. The relevant passage is:
Using the dot-place holder as lhs
When the dot is used as lhs, the result will be a functional sequence, i.e. a function which applies the entire chain of right-hand sides in turn to its input. See the examples.
So in your example, you were trying to assign an entire function to z within the mutate. You can see this based on the error message returned. By using (.), we force evaluation of the . and get results as expected.
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = {. %>% (function(tb) {tb$x + tb$y})})
# Error: Column `z` is of unsupported type function
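The difference between . and (.) on the left-hand side can also be seen outside of mutate. The following is a minimal sketch, assuming only that magrittr is installed; the names f, g and h are made up for illustration.

```r
library(magrittr)

# A bare dot on the left-hand side builds a function (a functional sequence)
f <- . %>% sum() %>% sqrt()
is.function(f)   # TRUE
f(c(8, 8))       # sqrt(8 + 8) = 4

x <- c(8, 8)

# Inside braces, `. %>% sum()` again builds a function instead of a value
g <- x %>% { . %>% sum() }

# Wrapping the dot in parentheses forces it to be evaluated first
h <- x %>% { (.) %>% sum() }

is.function(g)   # TRUE
h                # 16
```

Here g is the kind of object that mutate refuses to store in a column, while h is an ordinary number.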

Interesting scenario indeed.
Without a more specific use case, it seems the %>% operator is not required at all inside mutate(), even if you want to use an anonymous function there: you can simply call the function on . directly.
tibble(x = c(1,2), y = c(3,4)) %>%
mutate(z = {(function(tb){tb$x + tb$y})(.)})
returns:
# A tibble: 2 x 3
x y z
<dbl> <dbl> <dbl>
1 1 3 4
2 2 4 6
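Since mutate() already exposes the columns by name, the anonymous function can also take them as arguments, with no reference to . at all. A minimal sketch, assuming dplyr is installed (the argument names a and b and the object res are only illustrative):

```r
library(dplyr)

res <- tibble(x = c(1, 2), y = c(3, 4)) %>%
  # the columns x and y are passed straight to the anonymous function
  mutate(z = (function(a, b) a + b)(x, y))

res$z   # 4 6
```

This avoids depending on how the dot is resolved inside nested pipes.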

Related

How to write a function in R where one of the inputs is meant to go in quotation marks? (" ")

Let's take this hypothetical code for instance:
```{r}
dataset_custom <- function(top, dataset, variable) {
{{dataset}} %>%
count({{variable}}) %>%
top_n(top, n) %>%
arrange(-n) %>%
left_join({{dataset}}, by = "{{variable}}")
}
```
I know this will return an error when I try to run (say) dataset_custom(5, dataset, variable) because of the by = "{{variable}}" in left_join. How do I get around this issue?
I know that when you left join by a particular variable, you write by = "variable", with the variable name in quotation marks. But how do I do that inside a function, where the name in the quotation marks should change depending on the input to the function?
Thank you!
It is useful if you provide some toy data, like the one found in the example of ?left_join. Note that left_join(df1, df1) is just df1. Instead, we can use a 2nd data argument.
df1 <- tibble(x = 1:3, y = c("a", "a", "b"))
df2 <- tibble(x = c(1, 1, 2), z = c("first", "second", "third"))
df1 %>% left_join(df2, by = "x")
f <- function(data, data2, variable) {
var <- deparse(substitute(variable))
data %>%
count({{ variable }}) %>%
arrange(-n) %>%
left_join(data2, by = var)
}
f(df1, df2, x)
x n z
<dbl> <int> <chr>
1 1 1 first
2 1 1 second
3 2 1 third
4 3 1 NA
# and
f(df2, df1, x)
x n y
<dbl> <int> <chr>
1 1 2 a
2 2 1 a
For this to work we need a defusing operation: deparse(substitute(variable)) captures the unevaluated argument and converts it to the string that by expects. Figuratively speaking, using {{ }} as the by argument is like using a hammer instead of sandpaper for polishing things: it is a forcing operation where none should happen.
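The same idea can be written with rlang helpers instead of deparse(substitute()). This is only a sketch of an alternative, not the approach used above; it assumes dplyr and rlang are installed, and the function name count_and_join is hypothetical.

```r
library(dplyr)

df1 <- tibble(x = 1:3, y = c("a", "a", "b"))
df2 <- tibble(x = c(1, 1, 2), z = c("first", "second", "third"))

# count_and_join is a made-up name for illustration
count_and_join <- function(data, data2, variable) {
  # capture the bare column name and turn it into a string for `by`
  var_name <- rlang::as_name(rlang::ensym(variable))
  data %>%
    count({{ variable }}) %>%
    arrange(desc(n)) %>%
    left_join(data2, by = var_name)
}

res <- count_and_join(df2, df1, x)
res
```

ensym() fails early with a clear error if something other than a single column name is passed, which deparse(substitute()) would silently turn into a string.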

decimals not being read properly in R

I am trying to get the full number 193525.0768, but its decimals seem to be removed. Can someone explain why?
df <- tibble(
x = "193525.0768"
) %>%
mutate(x = as.numeric(x))
print(df, digits = 10) # decimals removed. I expect it to maintain the decimals numbers
# A tibble: 1 x 1
x
<dbl>
1 193525.
df[1,1][[1]] # decimals removed
# 193525
x <- "193525.0768"
print(as.numeric(x), digits = 10) # decimals not removed
# 193525.0768
You have a printing issue, not a reading-in issue. The tibble print method doesn't take a digits argument - see ?print.tbl for details. You can use print.data.frame explicitly to bypass the tibble print method and use the data.frame print method instead, which does take a digits argument:
tibble(x = "193525.0768") %>%
mutate(x = as.numeric(x)) %>%
print.data.frame(digits = 10)
# x
# 1 193525.0768
Or you can change the default with the pillar.sigfig option (which is mentioned in ?print.tbl). The default is 3, which looks confusing at first: taken literally, 193525.0768 would print as 194000. In practice pillar always shows the digits before the decimal point in full, so the option only decides how many further digits appear, and the value prints as 193525. with a trailing dot to signal the hidden decimals.
options(pillar.sigfig = 10)
tibble(x = "193525.0768") %>%
mutate(x = as.numeric(x))
# x
# 1 193525.0768
Alternately, use a data frame instead of a tibble:
data.frame(x = "193525.0768") %>%
mutate(x = as.numeric(x)) %>%
print(digits = 10)
# x
# 1 193525.0768
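To convince yourself that the stored value is intact regardless of how the tibble prints, you can pull the column out and format it yourself. A small sketch, assuming the tibble package is installed (the names df, shown and fixed are illustrative):

```r
library(tibble)

df <- tibble(x = as.numeric("193525.0768"))

# format the underlying double directly, bypassing the tibble print method
shown <- format(df$x, digits = 10)
fixed <- sprintf("%.4f", df$x)

shown   # "193525.0768"
fixed   # "193525.0768"
```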

Parse and Evaluate Column of String Expressions in R?

How can I parse and evaluate a column of string expressions in R as part of a pipeline?
In the example below, I produce my desired column, evaluated. But I know this isn't the right approach. I tried taking a tidyverse approach. But I'm just very confused.
library(tidyverse)
df <- tibble(name = LETTERS[1:3],
to_evaluate = c("1-1+1", "iter+iter", "4*iter-1"),
evaluated = NA)
iter = 1
for (i in 1:nrow(df)) {
df[i,"evaluated"] <- eval(parse(text=df$to_evaluate[[i]]))
}
print(df)
# # A tibble: 3 x 3
# name to_evaluate evaluated
# <chr> <chr> <dbl>
# 1 A 1-1+1 1
# 2 B iter+iter 2
# 3 C 4*iter-1 3
As part of a pipeline, I tried:
df %>% mutate(evaluated = eval(parse(text=to_evaluate)))
df %>% mutate(evaluated = !!parse_exprs(to_evaluate))
df %>% mutate(evaluated = parse_exprs(to_evaluate))
df %>% mutate(evaluated = eval(parse_expr(to_evaluate)))
df %>% mutate(evaluated = parse_exprs(to_evaluate))
df %>% mutate(evaluated = eval(parse_exprs(to_evaluate)))
df %>% mutate(evaluated = eval_tidy(parse_exprs(to_evaluate)))
None of these work.
You can try:
df %>%
rowwise() %>%
mutate(iter = 1,
evaluated = eval(parse(text = to_evaluate))) %>%
select(-iter)
name to_evaluate evaluated
<chr> <chr> <dbl>
1 A 1-1+1 1
2 B iter+iter 2
3 C 4*iter-1 3
Following the same logic, other approaches also work. For example, using rlang::parse_expr():
df %>%
rowwise() %>%
mutate(iter = 1,
evaluated = eval(rlang::parse_expr(to_evaluate))) %>%
select(-iter)
On the other hand, I think it is important to quote @Martin Mächler:
The (possibly) only connection is via parse(text = ....) and all good
R programmers should know that this is rarely an efficient or safe
means to construct expressions (or calls). Rather learn more about
substitute(), quote(), and possibly the power of using
do.call(substitute, ......).
Here's a slightly different way that does everything within mutate.
df %>% mutate(
evaluated = pmap_dbl(., function(name, to_evaluate, evaluated)
eval(parse(text=to_evaluate)))
)
# A tibble: 3 x 3
name to_evaluate evaluated
<chr> <chr> <dbl>
1 A 1-1+1 1
2 B iter+iter 2
3 C 4*iter-1 3
Note that values of additional variables (such as iter=1 in your case) can be passed directly to eval():
df %>%
mutate( evaluated = map_dbl(to_evaluate, ~eval(parse(text=.x), list(iter=1))) )
One advantage is that it automatically restricts the scope of the variable, keeping its value right next to where it is used.
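The scoping idea can be taken one step further by also limiting which functions the strings may call. The following base R sketch is an assumption-laden illustration, not a security guarantee: the helper name eval_strings is made up, and using baseenv() as the enclosure still leaves every base function reachable.

```r
# eval_strings is a hypothetical helper for illustration
eval_strings <- function(exprs, vars = list()) {
  vapply(
    exprs,
    # variables are looked up in `vars` first, then in the base environment
    function(e) eval(parse(text = e), vars, baseenv()),
    numeric(1),
    USE.NAMES = FALSE
  )
}

res <- eval_strings(c("1-1+1", "iter+iter", "4*iter-1"), list(iter = 1))
res   # 1 2 3
```

Inside a pipeline this could be used as mutate(evaluated = eval_strings(to_evaluate, list(iter = 1))).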

Is there a helper function to make this code cleaner on a tibble?

I need to sum the sequences generated from each value of a column. I have done it this way:
test <- tibble::tibble(
x = c(1,2,3)
)
test %>% dplyr::mutate(., s = plyr::aaply(x, .margins = 1, .fun = function(x_i){sum(seq(x_i))}))
Is there a cleaner way to do this? Is there a helper function or construct that would let me shorten this:
plyr::aaply(x, .margins = 1, .fun = function(x_i){sum(seq(x_i))})
I am looking for a generic solution; sum and seq are only an example here. Maybe the real problem is that I want to execute a function on each element, not on the whole vector.
This is my real case:
test <- tibble::tibble(
x = c(1,2,3),
y = c(0.5,1,1.5)
)
d <- c(1.23, 0.99, 2.18)
test %>% mutate(., s = (function(x, y) {
dn <- dnorm(x = d, mean = x, sd = y)
s <- sum(dn)
s
})(x,y))
test %>% plyr::ddply(., c("x","y"), .fun = function(row) {
dn <- dnorm(x = d, mean = row$x, sd = row$y)
s <- sum(dn)
s
})
I would like to do that with mutate, row by row rather than in a vectorized way.
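For this specific example, it is a direct application of cumsum (this works only because x is the sequence 1, 2, 3, so sum(seq(x_i)) coincides with the cumulative sum)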
test %>%
mutate(s = cumsum(x))
For the general case, we can apply the function to each element of x with map_dbl
test %>%
mutate(s = map_dbl(x, ~ sum(seq(.x))))
# A tibble: 3 x 2
# x s
# <dbl> <dbl>
#1 1 1
#2 2 3
#3 3 6
Update
For the updated dataset, use map2, as we are using corresponding arguments in dnorm from the 'x' and 'y' columns of the dataset
test %>%
mutate(V1 = map2_dbl(x, y, ~ dnorm(d, mean = .x, sd = .y) %>%
sum))
# A tibble: 3 x 3
# x y V1
# <dbl> <dbl> <dbl>
#1 1 0.5 1.56
#2 2 1 0.929
#3 3 1.5 0.470
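If you prefer to stay within dplyr, rowwise() is another way to get row-by-row evaluation inside mutate. This is a sketch of an alternative to map2_dbl, assuming dplyr is installed; the column name s and the object res are illustrative.

```r
library(dplyr)

test <- tibble(x = c(1, 2, 3), y = c(0.5, 1, 1.5))
d <- c(1.23, 0.99, 2.18)

res <- test %>%
  rowwise() %>%                                   # treat each row as its own group
  mutate(s = sum(dnorm(d, mean = x, sd = y))) %>% # x and y are scalars here
  ungroup()

res
```

The values in s should match the V1 column produced by the map2_dbl version above.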

What is the right way to use the R function `tally()` of `tidyverse` by different category?

I just read the textbook Modern Data Science with R by Benjamin. On page 180, I found the useful function tally(), which is similar to table() or a cross-tabulation function. But I can't reproduce it in R.
The author uses the function like this: tally(income_dtree ~ income, data = train, format = "count").
I simulated an example, but it fails.
library(dplyr)
data_frame(
x = rnorm(100),
y = c(rep("A",50),rep("B",50))
) %>%
tally(~y)
The error message is Error in summarise_impl(.data, dots) : Evaluation error: invalid 'type' (language) of argument.
Does anyone know how to use it?
Thanks to @ycw. The answer is here.
library(tidyverse)
library(mosaic)
data_frame(
x = rnorm(100),
y = c(rep("A",50),rep("B",50)),
z = c(rep("C",70),rep("D",30)),
) %>%
tally(~ y + z, data = .)
z
y C D
A 50 0
B 20 30
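Note that the formula interface comes from the tally() in mosaic, not the one in dplyr. Users also have to add data = . in tally() even when they use pipes: with the dot passed explicitly, the pipe no longer inserts the data as the first argument, so the formula stays in first position.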
This is probably what you want:
library(dplyr)
data_frame(
x = rnorm(100),
y = c(rep("A",50),rep("B",50))) %>%
group_by(y) %>%
tally()
# A tibble: 2 x 2
y n
<chr> <int>
1 A 50
2 B 50
Which is the same as the following:
data_frame(
x = rnorm(100),
y = c(rep("A",50),rep("B",50))) %>%
count(y)
# A tibble: 2 x 2
y n
<chr> <int>
1 A 50
2 B 50
Or this
data_frame(
x = rnorm(100),
y = c(rep("A",50),rep("B",50))) %>%
group_by(y) %>%
summarise(n = n())
# A tibble: 2 x 2
y n
<chr> <int>
1 A 50
2 B 50
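For the two-way breakdown by y and z shown with mosaic above, loading mosaic is not strictly necessary. A minimal sketch, assuming only dplyr is installed (the object names df, counts and tab are illustrative, and the data is built deterministically so the counts are fixed):

```r
library(dplyr)

df <- tibble(
  y = rep(c("A", "B"), each = 50),
  z = rep(c("C", "D"), times = c(70, 30))
)

# long format: one row per observed combination of y and z
counts <- df %>% count(y, z)
counts

# wide format with a formula interface, using base R's xtabs()
tab <- xtabs(~ y + z, data = df)
tab
```

count() drops combinations that never occur (here A with D), while xtabs() keeps them as zeros, matching the layout of the mosaic table.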
