How to replace vector tidier - r

I am looking for another function that can replace broom::tidy() after it gets removed. Here is what the broom package warning says:
Tidy Atomic Vectors
Vector tidiers are deprecated and will be removed from an upcoming release of broom.
Here is a description of the function:
tidy() produces a tibble() where each row contains information about an important component of the model. For regression models, this often corresponds to regression coefficients. This can be useful if you want to inspect a model or create custom visualizations.
Thank you,
John

As I understand the warning, there is no general deprecation of broom::tidy(); the warning only occurs when it is called with an atomic vector. In that case tibble() seems to be a drop-in replacement:
No deprecation warning for tidy() when called for a linear model:
library(broom)
fit <- lm(Volume ~ Girth + Height, trees)
tidy(fit)
## A tibble: 3 x 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 (Intercept) -58.0 8.64 -6.71 2.75e- 7
#2 Girth 4.71 0.264 17.8 8.22e-17
#3 Height 0.339 0.130 2.61 1.45e- 2
#Deprecation warning:
tidy(1:5)
## A tibble: 5 x 1
# x
# <int>
#1 1
#2 2
#3 3
#4 4
#5 5
#Warning messages:
#1: 'tidy.numeric' is deprecated.
#See help("Deprecated")
#2: `data_frame()` is deprecated as of tibble 1.1.0.
#Please use `tibble()` instead.
No warning for tibble(), and the same output apart from the column name:
tibble(1:5)
## A tibble: 5 x 1
# `1:5`
# <int>
#1 1
#2 2
#3 3
#4 4
#5 5

The deprecation warning is letting you know that the method tidy.numeric is being removed.
broom:::tidy.numeric
function (x, ...)
{
.Deprecated()
if (!is.null(names(x))) {
dplyr::data_frame(names = names(x), x = unname(x))
}
else {
dplyr::data_frame(x = x)
}
}
You can see the call to .Deprecated() there; the rest of the function just calls data_frame(). As that function is also deprecated, tibble() is the replacement. tibble() does not turn the names of a vector into a column, so if you want to keep the names, you could write something similar to the above:
tidy.numeric <- function (x, ...)
{
if (!is.null(names(x))) {
tibble::tibble(names = names(x), x = unname(x))
}
else {
tibble::tibble(x = x)
}
}

If you are converting a named vector, as mentioned by @Miff, you can also use enframe() from tibble. It creates a tibble with two columns, one holding the names of the vector and one holding the values.
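As a minimal sketch of the enframe() route, assuming the tibble package is installed (the vector `v` is a made-up example, not from the question):

```r
library(tibble)

# A small named vector, invented for illustration
v <- c(a = 1, b = 2, c = 3)

# enframe() puts the names in one column and the values in another;
# by default the columns are called "name" and "value"
named <- enframe(v)
named
```

If other column names are preferred, enframe() takes `name` and `value` arguments for that.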

Related

using quasiquotation in functions with formula interface

I want to write a custom function that can take bare and "string" inputs, and can handle both functions with and without the formula interface.
custom function example
# setup
set.seed(123)
library(tidyverse)
# custom function
foo <- function(data, x, y) {
# function without formula
print(table(data %>% dplyr::pull({{ x }}), data %>% dplyr::pull({{ y }})))
# function with formula
print(
broom::tidy(stats::t.test(
formula = rlang::new_formula({{ rlang::ensym(y) }}, {{ rlang::ensym(x) }}),
data = data
))
)
}
bare
works for both functions with and without formula interface
foo(mtcars, am, cyl)
#>
#> 4 6 8
#> 0 3 4 12
#> 1 8 3 2
#> # A tibble: 1 x 10
#> estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1.87 6.95 5.08 3.35 0.00246 25.9 0.724 3.02
#> # ... with 2 more variables: method <chr>, alternative <chr>
string
works for both functions with and without formula interface
foo(mtcars, "am", "cyl")
#>
#> 4 6 8
#> 0 3 4 12
#> 1 8 3 2
#> # A tibble: 1 x 10
#> estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1.87 6.95 5.08 3.35 0.00246 25.9 0.724 3.02
#> # ... with 2 more variables: method <chr>, alternative <chr>
as colnames
works only for functions without the formula interface
foo(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
#>
#> 4 6 8
#> 0 3 4 12
#> 1 8 3 2
#> Error: Only strings can be converted to symbols
#> Backtrace:
#> x
#> 1. \-global::foo(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
#> 2. +-base::print(...)
#> 3. +-broom::tidy(...)
#> 4. +-stats::t.test(...)
#> 5. +-rlang::new_formula(...)
#> 6. \-rlang::ensym(y)
How can I modify the original function so that it will work with all the above-mentioned ways of entering the inputs and for both kinds of functions used?
The nice philosophy of rlang is that you get to control when you want values to be evaluated via the !! and {{}} operators. You seem to want to make a function that takes strings, symbols, and (possibly evaluated) expressions all in the same parameter. Using symbols or bare strings is actually easy with ensym; the problem is also wanting to allow code like colnames(mtcars)[9], which has to be evaluated before it returns a string. This potentially can be quite confusing. For example, what's the behavior you expect when you run the following?
am <- 'disp'
cyl <- 'gear'
foo(mtcars, am, cyl)
You could write a helper function if you want to assume all "calls" should be evaluated but symbols and literals should not. Here's a "cleaner" function
clean_quo <- function(x) {
if (rlang::quo_is_call(x)) {
x <- rlang::eval_tidy(x)
} else if (!rlang::quo_is_symbolic(x)) {
x <- rlang::quo_get_expr(x)
}
if (is.character(x)) x <- rlang::sym(x)
if (!rlang::is_quosure(x)) x <- rlang::new_quosure(x)
x
}
and you could use that in your function with
foo <- function(data, x, y) {
x <- clean_quo(rlang::enquo(x))
y <- clean_quo(rlang::enquo(y))
# function without formula
print(table(data %>% dplyr::pull(!!x), data %>% dplyr::pull(!!y)))
# function with formula
print(
broom::tidy(stats::t.test(
formula = rlang::new_formula(rlang::quo_get_expr(y), rlang::quo_get_expr(x)),
data = data
))
)
}
Doing so will allow all these to return the same values
foo(mtcars, am, cyl)
foo(mtcars, "am", "cyl")
foo(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
But you are probably just delaying possible other problems. I would not recommend over-interpreting user intentions with this kind of code. That's why it's better to explicitly allow users to un-escape their inputs themselves. Perhaps provide two different versions of the function: one for parameters that require evaluation and one for those that do not.
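One way the two-version idea could look, as a rough sketch only: the names `ttest_chr` and `ttest_sym` are hypothetical, and the broom::tidy() step is left out to keep the example small. The string version evaluates its arguments normally, so anything that returns a column name works; the symbol version only quotes.

```r
library(rlang)

# Hypothetical string-only variant: x and y are ordinary character values,
# so colnames(mtcars)[9] is evaluated by R before the function sees it.
ttest_chr <- function(data, x, y) {
  stats::t.test(new_formula(sym(y), sym(x)), data = data)
}

# Hypothetical quoting variant: x and y are captured with ensym(), which
# accepts bare names (or literal strings), then handed to the string version.
ttest_sym <- function(data, x, y) {
  ttest_chr(data, as_string(ensym(x)), as_string(ensym(y)))
}

ttest_chr(mtcars, "am", "cyl")
ttest_chr(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
ttest_sym(mtcars, am, cyl)
```

With this split, the caller decides explicitly whether an argument is a name or a value, so the function never has to guess.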
I have to agree with @MrFlick and others about inherent ambiguity when mixing standard and non-standard evaluation. (I also pointed this out in your similar question from a while ago.)
However, one can argue that dplyr::select() works with symbols, strings and expressions of the form colnames(.)[.]. If you absolutely must have the same interface, then you can leverage tidyselect to resolve your inputs:
library( rlang )
library( tidyselect )
ttest <- function(data, x, y) {
## Identify locations of x and y in data, get column names as symbols
s <- eval_select( expr(c({{x}},{{y}})), data ) %>% names %>% syms
## Use the corresponding symbols to build the formula by hand
broom::tidy(stats::t.test(
formula = new_formula( s[[2]], s[[1]] ),
data = data
))
}
## All three now work
ttest( mtcars, am, cyl )
ttest( mtcars, "am", "cyl" )
ttest( mtcars, colnames(mtcars)[9], colnames(mtcars)[2] )

R: `Error in f(x): could not find function "f"` when trying to use column of functions as argument in a tibble

I'm experimenting with using functions in dataframes (tidyverse tibbles) in R and I ran into some difficulties. The following is a minimal (trivial) example of my problem.
Suppose I have a function that takes in three arguments: x and y are numbers, and f is a function. It performs f(x) + y and returns the output:
func_then_add = function(x, y, f) {
result = f(x) + y
return(result)
}
And I have some simple functions it might use as f:
squarer = function(x) {
result = x^2
return(result)
}
cuber = function(x) {
result = x^3
return(result)
}
Done on its own, func_then_add works as advertised:
> func_then_add(5, 2, squarer)
[1] 27
> func_then_add(6, 11, cuber)
[1] 227
But let's say I have a dataframe (tidyverse tibble) with two columns for the numeric arguments, and one column for which function I want:
library(tidyverse)
library(magrittr)
test_frame = tribble(
~arg_1, ~arg_2, ~func,
5, 2, squarer,
6, 11, cuber
)
> test_frame
# A tibble: 2 x 3
arg_1 arg_2 func
<dbl> <dbl> <list>
1 5 2 <fn>
2 6 11 <fn>
I then want to make another column result that is equal to func_then_add applied to those three columns. It should be 27 and 227 like before. But when I try this, I get an error:
> test_frame %>% mutate(result=func_then_add(.$arg_1, .$arg_2, .$func))
Error in f(x) : could not find function "f"
Why does this happen, and how do I get what I want properly? I confess that I'm new to "functional programming", so maybe I'm just making an obvious syntax error ...
Not the most elegant but we can do:
test_frame %>%
mutate(Res= map(seq_along(.$func), function(x)
func_then_add(.$arg_1, .$arg_2, .$func[[x]])))
EDIT: The above passes the entire arg_1 and arg_2 columns on every iteration, which isn't really what OP desires. As suggested by @January, this can be better applied as:
Result <- test_frame %>%
mutate(Res= map(seq_along(.$func), function(x)
func_then_add(.$arg_1[x], .$arg_2[x], .$func[[x]])))
Result$Res
The above is still not ideal since it returns a list. A better alternative (again as suggested by @January) is to use map_dbl, which returns a double vector instead of a list:
test_frame %>%
mutate(Res= map_dbl(seq_along(.$func), function(x)
func_then_add(.$arg_1[x], .$arg_2[x], .$func[[x]])))
# A tibble: 2 x 4
arg_1 arg_2 func Res
<dbl> <dbl> <list> <dbl>
1 5 2 <fn> 27
2 6 11 <fn> 227
This is because you should map instead of mutating. Mutate calls the function once, and supplies the whole columns as arguments.
The second problem is that test_frame$func[1] is not a function, but a list with one element. You can't have "function" columns, only list columns.
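The difference between single and double brackets on a list can be checked in base R. This is only an illustration; the list `funcs` is a made-up stand-in for the func column:

```r
# A plain list of functions, standing in for the list column
funcs <- list(function(x) x^2, function(x) x^3)

is.function(funcs[1])    # single brackets keep the list wrapper
is.function(funcs[[1]])  # double brackets extract the function itself

# Only the extracted element can be called
funcs[[1]](5)
```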
Try this:
test_frame$result <- with(test_frame,
map_dbl(1:2, ~ func_then_add(arg_1[.], arg_2[.], func[[.]])))
Result:
# A tibble: 2 x 4
arg_1 arg_2 func result
<dbl> <dbl> <list> <dbl>
1 5 2 <fn> 27
2 6 11 <fn> 227
EDIT: a simpler solution using dplyr, mutate and rowwise:
test_frame %>% rowwise %>% mutate(res=func_then_add(arg_1, arg_2, func))
Quite frankly, I am slightly puzzled by this last one. Why func and not func[[1]]? func should be a list, not a function. mutate and rowwise are doing something sinister here, like automatically unwrapping the list element.
Edit 2: actually, this is written explicitly in the rowwise manual:
Its main impact is to allow you to work with list-variables in
‘summarise()’ and ‘mutate()’ without having to use ‘[[1]]’.
Final edit: I became so fixated on tidyverse recently that I did not think of the simplest option – using base R:
apply(test_frame, 1, function(x) func_then_add(x$arg_1, x$arg_2, x$func))
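Another option, not mentioned in the answers above and offered only as a sketch, is purrr::pmap_dbl(), which walks over the rows of the tibble and passes each column's element as a named argument. The wrapper's argument names are assumed to match the column names:

```r
library(tibble)
library(purrr)

func_then_add <- function(x, y, f) f(x) + y
squarer <- function(x) x^2
cuber <- function(x) x^3

test_frame <- tribble(
  ~arg_1, ~arg_2, ~func,
  5,      2,      squarer,
  6,      11,     cuber
)

# For each row, pmap_dbl() calls the wrapper with arg_1, arg_2 and func;
# elements of the list column arrive already extracted, so func is a function.
res <- pmap_dbl(
  test_frame,
  function(arg_1, arg_2, func) func_then_add(arg_1, arg_2, func)
)
res
```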

How does ddply split the data?

I have this data frame.
mydf<- data.frame(c("a","a","b","b","c","c"),c("e","e","e","e","e","e")
,c(1,2,3,10,20,30),
c(5,10,20,20,15,10))
colnames(mydf)<-c("Model", "Class","Length", "Speed")
I'm trying to get a better understanding on how ddply works.
I'd like to get the average length and speed for each pairing of model and class.
I know this is one way to do it: ddply(mydf, .(Model, Class), .fun = summarize, mSpeed = mean(Speed), mLength = mean(Length)).
I wonder if I can get the mean using ddply and without specifying it one at a time.
I tried ddply(mydf, .(Model, Class), .fun = mean) but I get the error
Warning messages: 1: In mean.default(piece, ...) : argument is not
numeric or logical: returning NA
What does ddply pass on to the function argument? Is there a way to apply one function to every column using ddply?
My goal is to learn more about ddply. I will only accept answers with ddply.
Here's a solution using dplyr and the summarize function.
library(dplyr)
mydf<- data.frame(c("a","a","b","b","c","c"),c("e","e","e","e","e","e")
,c(1,2,3,10,20,30),
c(5,10,20,20,15,10))
colnames(mydf)<-c("Model", "Class","Length", "Speed")
#summarize data by Model & Class
mydf %>% group_by(Model, Class) %>% summarize_if(is.numeric, mean)
#> # A tibble: 3 x 4
#> # Groups: Model [3]
#> Model Class Length Speed
#> <fct> <fct> <dbl> <dbl>
#> 1 a e 1.5 7.5
#> 2 b e 6.5 20
#> 3 c e 25 12.5
Created on 2019-04-16 by the reprex package (v0.2.1)
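For the ddply-only route the question asks about, a possible sketch using plyr's numcolwise() is shown below. ddply() hands each Model/Class subset to the function as a whole data frame, which is why passing mean directly produces the "argument is not numeric or logical" warning.

```r
library(plyr)

mydf <- data.frame(
  Model  = c("a", "a", "b", "b", "c", "c"),
  Class  = c("e", "e", "e", "e", "e", "e"),
  Length = c(1, 2, 3, 10, 20, 30),
  Speed  = c(5, 10, 20, 20, 15, 10)
)

# numcolwise(mean) turns mean() into a function that takes a data frame
# and applies mean() to each of its numeric columns
means <- ddply(mydf, .(Model, Class), numcolwise(mean))
means
```

This avoids naming each column by hand, at the cost of silently skipping non-numeric columns.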

Renaming doesn't work for column names starting with two dots

I updated my tidyverse and my read_excel() function (from readxl) has also changed. Columns without titles are now called ..1, ..2 and so on, when they used to be called X__1, X__2.
I'm trying to rename() these columns starting with two dots, but I'm getting an error message.
Here's an example:
library(tidyverse)
df <- tibble(a = 1:3,
..1 = 4:6)
df <- df %>%
rename(b = ..1)
Throws the error:
Error in .f(.x[[i]], ...) :
..1 used in an incorrect context, no ... to look in
I get the same error if I use backticks around the name: rename(b = `..1`).
..1 is a reserved word in R. See help("reserved") and help("..1"). Try quoting it:
df %>% rename(b = "..1")
giving:
# A tibble: 3 x 2
a b
<int> <int>
1 1 4
2 2 5
3 3 6
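To illustrate why ..1 is special, here is a small base R sketch. The function `first_dot` and the base-R renaming step are illustrative additions, not part of the answer above:

```r
# Inside a function, ..1 refers to the first argument captured by `...`
first_dot <- function(...) ..1
first_dot(10, 20)

# A data frame with a column literally named "..1"
df <- setNames(data.frame(1:3, 4:6), c("a", "..1"))

# Renaming by string comparison never evaluates ..1 as code
names(df)[names(df) == "..1"] <- "b"
names(df)
```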
The janitor package has a very handy function clean_names for tasks like this. In this case, it replaces any .. that come from readxl with x. I added another .. column to show how the replacement works.
library(tidyverse)
df <- tibble(a = 1:3,
..1 = 4:6,
..5 = 10:12)
df %>%
janitor::clean_names()
#> # A tibble: 3 x 3
#> a x1 x5
#> <int> <int> <int>
#> 1 1 4 10
#> 2 2 5 11
#> 3 3 6 12
It seems like the naming setup in readxl is a topic of debate: see this issue, among others on the best way to convert unusable names from Excel sheets. There's also a vignette on it. To be honest, the last couple times I've needed to mess with readxl names, I just passed the data frame to janitor.

Tidyeval: pass list of columns as quosure to select()

I want to pass a bunch of columns to pmap() inside mutate(). Later, I want to select those same columns.
At the moment, I'm passing a list of column names to pmap() as a quosure, which works fine, although I have no idea whether this is the "right" way to do it. But I can't figure out how to use the same quosure/list for select().
I've got almost no experience with tidyeval, I've only got this far by playing around. I imagine there must be a way to use the same thing both for pmap() and select(), preferably without having to put each of my column names in quotation marks, but I haven't found it yet.
library(dplyr)
library(rlang)
library(purrr)
df <- tibble(a = 1:3,
b = 101:103) %>%
print
#> # A tibble: 3 x 2
#> a b
#> <int> <int>
#> 1 1 101
#> 2 2 102
#> 3 3 103
cols_quo <- quo(list(a, b))
df2 <- df %>%
mutate(outcome = !!cols_quo %>%
pmap_int(function(..., word) {
args <- list(...)
# just to be clear this isn't what I actually want to do inside pmap
return(args[[1]] + args[[2]])
})) %>%
print()
#> # A tibble: 3 x 3
#> a b outcome
#> <int> <int> <int>
#> 1 1 101 102
#> 2 2 102 104
#> 3 3 103 106
# I get why this doesn't work, but I don't know how to do something like this that does
df2 %>%
select(!!cols_quo)
#> Error in .f(.x[[i]], ...): object 'a' not found
This is a bit tricky because of the mix of semantics involved in this problem. pmap() takes a list and passes each element as its own argument to a function (it's kind of equivalent to !!! in that sense). Your quoting function thus needs to quote its arguments and somehow pass a list of columns to pmap().
Our quoting function can go one of two ways. Either quote (i.e., delay) the list creation, or create an actual list of quoted expressions right away:
quoting_fn1 <- function(...) {
exprs <- enquos(...)
# For illustration purposes, return the quoted inputs instead of
# doing something with them. Normally you'd call `mutate()` here:
exprs
}
quoting_fn2 <- function(...) {
expr <- quo(list(!!!enquos(...)))
expr
}
Since our first variant does nothing but return a list of quoted inputs, it's actually equivalent to quos():
quoting_fn1(a, b)
#> <list_of<quosure>>
#>
#> [[1]]
#> <quosure>
#> expr: ^a
#> env: global
#>
#> [[2]]
#> <quosure>
#> expr: ^b
#> env: global
The second version returns a quoted expression that instructs R to create a list with quoted inputs:
quoting_fn2(a, b)
#> <quosure>
#> expr: ^list(^a, ^b)
#> env: 0x7fdb69d9bd20
There is a subtle but important difference between the two. The first version creates an actual list object:
exprs <- quoting_fn1(a, b)
typeof(exprs)
#> [1] "list"
On the other hand, the second version does not return a list, it returns an expression for creating a list:
expr <- quoting_fn2(a, b)
typeof(expr)
#> [1] "language"
Let's find out which version is more appropriate for interfacing with pmap(). But first we'll give a name to the pmapped function to make the code clearer and easier to experiment with:
myfunction <- function(..., word) {
args <- list(...)
# just to be clear this isn't what I actually want to do inside pmap
args[[1]] + args[[2]]
}
Understanding how tidy eval works is hard in part because we usually don't get to observe the unquoting step. We'll use rlang::qq_show() to reveal the result of unquoting expr (the delayed list) and exprs (the actual list) with !!:
rlang::qq_show(
mutate(df, outcome = pmap_int(!!expr, myfunction))
)
#> mutate(df, outcome = pmap_int(^list(^a, ^b), myfunction))
rlang::qq_show(
mutate(df, outcome = pmap_int(!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))
When we unquote the delayed list, mutate() calls pmap_int() with list(a, b), evaluated in the data frame, which is exactly what we need:
mutate(df, outcome = pmap_int(!!expr, myfunction))
#> # A tibble: 3 x 3
#> a b outcome
#> <int> <int> <int>
#> 1 1 101 102
#> 2 2 102 104
#> 3 3 103 106
On the other hand, if we unquote an actual list of quoted expressions, we get an error:
mutate(df, outcome = pmap_int(!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: Element 1 is not a vector (language).
That's because the quoted expressions inside the list are not evaluated in the data frame. In fact, they are not evaluated at all. pmap() gets the quoted expressions as is, which it doesn't understand. Recall what qq_show() has shown us:
#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))
Anything inside angular brackets is passed as is. This is a sign that we should somehow have used !!! instead, to inline each element of the list of quosures in the surrounding expression. Let's try it:
rlang::qq_show(
mutate(df, outcome = pmap_int(!!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(^a, ^b, myfunction))
Hmm... Doesn't look right. We're supposed to pass a list to pmap_int(), and here it gets each quoted input as a separate argument. Indeed we get a type error:
mutate(df, outcome = pmap_int(!!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: `.x` is not a list (integer).
That's easy to fix, just splice into a call to list():
rlang::qq_show(
mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
)
#> mutate(df, outcome = pmap_int(list(^a, ^b), myfunction))
And voilà!
mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
#> # A tibble: 3 x 3
#> a b outcome
#> <int> <int> <int>
#> 1 1 101 102
#> 2 2 102 104
#> 3 3 103 106
We can use quos when there is more than one element and unquote-splice with !!!
cols_quo <- quos(a, b)
df2 %>%
select(!!!cols_quo)
The object 'df2' can be created with
df %>%
mutate(output = list(!!! cols_quo) %>%
reduce(`+`))
If we want to use the quosure as in the OP's post
cols_quo <- quo(list(a, b))
df2 %>%
select(!!! as.list(quo_expr(cols_quo))[-1])
# A tibble: 3 x 2
# a b
# <int> <int>
#1 1 101
#2 2 102
#3 3 103
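The as.list(...)[-1] step can be looked at in isolation with base R. This sketch uses quote() instead of a quosure, which is an assumption made to keep it dependency-free:

```r
# A quoted call to list(), similar to the expression stored in cols_quo
e <- quote(list(a, b))

# A call converts to a list whose first element is the function name
parts <- as.list(e)
parts[[1]]

# Dropping the first element leaves just the argument symbols a and b
args <- parts[-1]
args
```

Those remaining symbols are what !!! splices into select().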
