How to write custom pipe-friendly functions in R?

I'm trying to create pipe-friendly functions using magrittr. For example, I tried to write a custom function that calculates the mean of a column:
library(magrittr)

custom_function <- function(.data, x) {
  mean(.data$x)
}

mtcars %>%
  custom_function(mpg)
But I'm getting this error:
Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, :
object 'mpg' not found
It seems my reference to the column is not working. How do I fix the .data$x part?

.data$x does not refer to the column whose name is held in the variable x; it refers to a column literally called "x". Use .data[[x]] to refer to the column whose name is the character string held in x, and call your function with the character string "mpg".
library(magrittr)
custom_function <- function(.data, x) mean(.data[[x]])
mtcars %>% custom_function("mpg")
## [1] 20.09062

In base R, we can change the $ to [[ and convert the unquoted column name to a character string with deparse(substitute()):
custom_function <- function(.data, x) {
  mean(.data[[deparse(substitute(x))]])
}
Now we apply the function:
mtcars %>%
  custom_function(mpg)
#[1] 20.09062
The issue with $ is that it looks for a column literally named 'x' rather than using the value stored in the variable x, so it returns NULL and the call fails.
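A minimal illustration of the difference, using a hypothetical variable nm that holds a column name:

nm <- "mpg"
mtcars$nm          # NULL: $ looks for a column literally named "nm"
mtcars[[nm]]       # the mpg values: [[ uses the string stored in nm
mean(mtcars[[nm]])
## [1] 20.09062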
With the tidyverse, we can use the curly-curly operator ({{ }}) to do the evaluation inside summarise. Since we only need a single summarised value, summarise is enough; if we instead needed to create a new column in the original dataset, we would use mutate. After creating the summarised column, we just pull it out as a vector:
library(dplyr)

custom_function <- function(.data, x) {
  .data %>%
    summarise(out = mean({{ x }})) %>%
    pull(out)
}
mtcars %>%
  custom_function(mpg)
[1] 20.09062
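If only the single value is needed, the same curly-curly pattern also works with pull() directly; a shorter sketch of the same idea:

library(dplyr)

custom_function <- function(.data, x) {
  mean(pull(.data, {{ x }}))
}

mtcars %>%
  custom_function(mpg)
## [1] 20.09062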

Related

What is causing 'object not found' error in filter() with the across() function?

This function filters/selects one or more variables from my dataset and writes it to a new CSV file. I'm getting an 'object not found' error when I call the function. Here is the function:
extract_ids <- function(filename, opp, ...) {
  # Read in data
  df <- read_csv(filename)
  # Remove rows 2,3
  df <- df[-c(1,2),]
  # Filter and select
  df_id <- filter(df, across(..., ~ !is.na(.x)) & gc == 1) %>%
    select(...) # not sure if my use of ... here is correct
  # String together variables for export file path
  path <- c("/Users/stephenpoole/Downloads/", opp, "_", ..., ".csv") # not sure if ... here is correct
  # Export the file
  write_csv(df_id, paste(path, collapse = ''))
}
And here is the function call. I'm trying to get the columns "rid" and "cintid".
extract_ids(filename = "farmers.csv",
opp = "farmers",
rid, cintid)
When I run this, I get the below error:
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `across(..., ~!is.na(.x)) & gc == 1`.
x object 'cintid' not found
The column cintid is correct and appears in the data. I've also tried running it with just one column, rid, and get the same 'object not found' error.
If you are passing multiple values to across(), you need to collect them in its first parameter; otherwise they will spread into the other parameters of across(). Try
filter(df, across(c(...), ~ !is.na(.x)))
Otherwise every value other than the first one will be passed along as a parameter to the function you've specified in across().
Sorry for omitting this in my previous suggestion to you. Unfortunately, your original question was closed before I could post it as an answer:
If you want your function to resemble dplyr, here are a few modifications you can make. Write your function header as function(filename, opp, ...) verbatim. Then, replace !is.na(ID) with across(..., ~ !is.na(.x)) verbatim. Now, you can call extract_ids() and, just as you would with any dplyr verb, specify any selection of columns you want to filter out NAs:
extract_ids(filename = "farmers.csv", opp = "farmers", rid, another_column_you_want_without_NAs)
Object Not Found
As MrFlick rightly suggests in their comment, you should wrap ... with c(), so everything you pass in ... is interpreted as the first argument to across(): a single tidy-selection of columns from df:
extract_ids <- function(filename, opp, ...) {
  # ...

  # Filter and select
  df_id <- df %>%
    # This format is preferred for dplyr workflows with pipes (%>%).
    filter(across(c(...), ~ !is.na(.x)) & gc == 1) %>%
    select(...)

  # ...
}
Without this precaution, R interprets rid and cintid as multiple arguments to across(), rather than as simply columns named by the first argument (the tidy-selection).
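To see why the wrapping matters, here is a sketch of how across() matches its arguments in the two calls, given its signature across(.cols, .fns, ...):

# Dots spread out: rid fills .cols, cintid fills .fns, the lambda goes to ...
# across(rid, cintid, ~ !is.na(.x))    # cintid is evaluated as a function -> "object 'cintid' not found"

# Dots wrapped in c(): both columns form one tidy-selection for .cols
# across(c(rid, cintid), ~ !is.na(.x)) # .fns is the lambda, as intended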
Variable Names in the Filepath
To get those variable names within your filepath, use
extract_ids <- function(filename, opp, ...) {
  # ...

  # Expand the '...' into a list of the given variable names, which will get pasted.
  path <- c("/Users/stephenpoole/Downloads/", opp, "_", match.call(expand.dots = FALSE)$`...`, ".csv")

  # ...
}
though you might want to consider replacing match.call(expand.dots = FALSE)$`...`, which currently mushes together the variable names:
"/Users/stephenpoole/Downloads/farmers_ridcintid.csv"
In exactly the same place, you might use the expression paste(match.call(expand.dots = FALSE)$`...`, collapse = "-"), which will separate those variable names using -
"/Users/stephenpoole/Downloads/farmers_rid-cintid.csv"
or any other separator of your choice that gives a valid filename.
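Putting those pieces together, the export step of the function might look like this (a sketch; it keeps the hard-coded download directory from the question and assumes readr is loaded for write_csv):

extract_ids <- function(filename, opp, ...) {
  # ... read, filter, and select as above, producing df_id ...

  # Collapse the bare column names passed through '...' into one string, e.g. "rid-cintid"
  col_names <- paste(match.call(expand.dots = FALSE)$`...`, collapse = "-")
  path <- paste0("/Users/stephenpoole/Downloads/", opp, "_", col_names, ".csv")

  write_csv(df_id, path)
}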

Get all combinations of a character vector

I am trying to write a function to dynamically group_by every combination of a character vector.
This is how I set up my list:
stuff <- c("type", "country", "color")
stuff_ListStr <- do.call("c", lapply(seq_along(stuff), function(i) combn(stuff, i, FUN = list)))
stuff_ListChar <- sapply(stuff_ListStr, paste, collapse = ", ")
stuff_ListSym <- lapply(stuff_ListChar, as.symbol)
Then I threw it into a loop.
b <- list()
for (each in stuff_ListSym) {
  a <- answers_wfh %>%
    group_by(!!each) %>%
    summarize(n = n())
  b <- append(b, a)
}
So essentially I want to replicate this
... group_by(type),
... group_by(country),
... group_by(type, country),
... and the rest of the combinations. Then I want to put all the summaries into one list (a list of tibbles/lists).
It's totally failing. This is my error message:
Error: Column `type, country` is unknown.
Not only that, b is not giving me what I want. It's already a list of length 12 when I expected only 2 before it failed: one tibble grouped by 'type' and a second grouped by 'country'.
I'm new to R in general but thought tidy eval was really cool and wanted to try. Any tips here?
I think you have a problem of standard evaluation. !! is sometimes not enough to unquote variables and get dplyr to work. Use !!! and rlang::syms for multiple unquoting, and loop over the character vectors in stuff_ListStr rather than the collapsed symbols in stuff_ListSym (a symbol like "type, country" is treated as a single column name, which is why that column is reported as unknown):
b <- list()
for (each in stuff_ListStr) {
  a <- answers_wfh %>%
    group_by(!!!rlang::syms(each)) %>%
    summarize(n = n())
  b <- append(b, list(a))  # wrap in list() so b becomes a list of tibbles
}
I think lapply would be better in your situation than a for loop, since you want to end up with a list.
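A sketch of that lapply version, looping over the character vectors in stuff_ListStr (assuming the same answers_wfh data as in the question):

b <- lapply(stuff_ListStr, function(vars) {
  answers_wfh %>%
    group_by(!!!rlang::syms(vars)) %>%
    summarize(n = n())
})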
Since you use variable names as arguments of functions, you might be more comfortable with data.table than dplyr. If you want the equivalent data.table implementation:
library(data.table)
setDT(answers_wfh)
lapply(stuff_ListStr, function(g) answers_wfh[, .(n = .N), by = g])
You can have a look at this blog post I wrote on the subject of SE vs NSE in dplyr and data.table
I think stuff_ListStr is enough to get what you want. You could use group_by_at, which accepts a character vector.
library(dplyr)
library(rlang)
purrr::map(stuff_ListStr, ~answers_wfh %>% group_by_at(.x) %>% summarize(n=n()))
A better option is to use count, but count does not accept character vectors, so we use some non-standard evaluation:
purrr::map(stuff_ListStr, ~answers_wfh %>% count(!!!syms(.x)))
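In dplyr 1.0 and later, the scoped verbs like group_by_at are superseded, so the same idea can also be written with across() and all_of() (a sketch, same data assumed):

purrr::map(stuff_ListStr, ~ answers_wfh %>%
             group_by(across(all_of(.x))) %>%
             summarize(n = n()))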

In R how to pass a column as parameter to strsplit?

What is the proper way to pass a column as a parameter to strsplit and have it recognized as a column?
library(tidyverse)
library(lazyeval)

df = data.frame("x" = c("apple/pear", "pear/banana/kiwi", "orange/pear"))

function(col) {
  df %>%
    select(col) %>%
    transform(col = interp(strsplit(~v, "/"), v = as.name(col)))
}
Currently this is returning the error 'Error in strsplit(~v, "-") : non-character argument'.
We can use tidyverse options instead of mixing base R with the tidyverse. separate_rows from tidyr splits the column and reshapes it to 'long' format. Inside the function, we can make use of the curly-curly operator ({{ }}), which evaluates the unquoted argument passed to the function:
library(dplyr)
library(tidyr)

f1 <- function(data, col) {
  data %>%
    separate_rows({{ col }}, sep = "/")
}
f1(df, x)
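If you specifically want to keep strsplit(), here is a sketch of a variant that splits into a list-column and then unnests it (parts is a hypothetical intermediate column name; as.character() guards against the column being a factor):

library(dplyr)
library(tidyr)

f2 <- function(data, col) {
  data %>%
    mutate(parts = strsplit(as.character({{ col }}), "/")) %>%
    unnest(parts)
}

f2(df, x)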

Order by column using infix operator

It's possibly a very simple question, but I couldn't find an answer. I'm trying to apply abs() to my matrix and then order it by the first column (descending).
Written as separate statements, it looks like:
pcaRotaMat <- abs(pcaImportance$rotation)
temp <- pcaRotaMat[order(-pcaRotaMat[,1]),]
However, when I'm trying to use the infix operator (%>%), I'm getting the following error:
t <- pcaImportance$rotation %>% abs() %>% order(-[,1],)
Error: unexpected '[' in "t <- pcaImportance$rotation %>% abs() %>% order(["
Your help will be appreciated.
If you are comfortable with something more verbose:
sort_fn <- function(x) {
  x[order(-x[, 1]), ]
}

t <- pcaImportance$rotation %>% abs() %>% sort_fn()
Option 2:
If you don't want to create a function to sort:
t <- pcaImportance$rotation %>% abs() %>% .[order(-.[, 1]), ]
"." is the placeholder here for the matrix. I would also not recommend assigning variables to "t", as this is the function that transposes matrices.

How to write a function that accepts col name and a string in R?

I have a tbl_df data set called ds, and I want to write a function so that I can filter a column based on some string value. Here is my attempt:
myCol <- as.name(names(ds)[5]) # define which column to pass to the function

myFunction <- function(ds, myCol, myString = "XXXX") {
  myQuant <- ds %>%
    filter(myCol %in% myString) %>%
    group_by(x) %>%
    summarize(count = n())
  return(myQuant)
}
This produces the following error:
Error during wrapup: 'match' requires vector arguments
If I extract the filtering block outside the function and pass the arguments manually, it works fine.
I guess all the filter function requires is a column name, so why doesn't it like it this way?
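This is the same problem as in the first question above. A minimal sketch of a fix using the curly-curly operator, assuming myCol is passed as an unquoted column name and ds really has a column x to group by:

library(dplyr)

myFunction <- function(ds, myCol, myString = "XXXX") {
  ds %>%
    filter({{ myCol }} %in% myString) %>%
    group_by(x) %>%
    summarize(count = n())
}

If myCol is instead built with as.name() as in the attempt above, filter(!!myCol %in% myString) works; if it is a character string, use filter(.data[[myCol]] %in% myString).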
