My problem is that I don't understand how to pass multiple variable names as a "comma-separated list" to my own function. My function is written as follows:
tsf <- function(df, ...){
df<- arrange_(df, .dots = substitute(...))
other stuff
return(df)
}
but if I use it as
t <- tsf(df, var1, var2)
the data frame is arranged by var1 only, whereas arrange(df, var1, var2) arranges it by both variables.
How should I correct my code?
You can use alist to capture the unevaluated dots arguments in a list.
tsf <- function(df, ...){
dots <- eval(substitute(alist(...)))
dots <- vapply(dots,
deparse,
character(1))
df <- dplyr::arrange_(df, .dots = dots)
return(df)
}
tsf(mtcars, am, carb, gear)
See the section on "Capturing unevaluated …" at http://adv-r.had.co.nz/Computing-on-the-language.html
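To see the capture step in isolation, here is a minimal base-R sketch. The name tsf_base is made up for this illustration, and it sorts with base order() instead of arrange_() so that it runs without any packages; it is not how dplyr does it internally.

```r
# Hypothetical base-R variant of tsf(): capture the bare names passed via `...`
# and sort the data frame by all of them.
tsf_base <- function(df, ...) {
  dots <- eval(substitute(alist(...)))          # list of unevaluated symbols
  vars <- vapply(dots, deparse, character(1))   # e.g. c("am", "carb", "gear")
  keys <- unname(as.list(df[vars]))             # one sort key per captured column
  df[do.call(order, keys), , drop = FALSE]
}

sorted <- tsf_base(mtcars, am, carb, gear)
head(sorted[, c("am", "carb", "gear")])
```

The point of the sketch is that every name in the dots ends up in vars, so all of them take part in the sort, not just the first one.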
See the vignette:
library(dplyr)
packageVersion("dplyr")
# [1] ‘0.5.0.9004’
tsf <- function(df, ...){
df<- arrange(df, !!!quos(...))
return(df)
}
tsf(mtcars, vs, cyl)
Also note that the underscored version of each main verb (filter_(), select_(), etc.) is no longer needed, and so these functions have been deprecated (but remain around for backward compatibility).
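In dplyr 0.7.0 and later, arrange() takes unquoted column names through its own `...`, so a wrapper can usually forward its dots unchanged. A minimal sketch, assuming a reasonably recent dplyr is installed:

```r
library(dplyr)

# Forward the dots straight to arrange(); no quoting helpers needed
# as long as the columns are only passed along, not inspected.
tsf <- function(df, ...) {
  arrange(df, ...)
}

result <- tsf(mtcars, vs, cyl)
head(result[, c("vs", "cyl")])
```

quos() and !!! are still useful when the function needs to inspect or modify the captured expressions before passing them on.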
I want to write a helper function that summarizes the percentage change for column A, B and C in one shot. I want to pass a string to the "mutate" argument of dplyr with the help of rlang. Unfortunately, I get an error saying that I have an unexpected ",". Could you please take a look? Thanks in advance!
library(rlang) #read text inputs and return vars
library(dplyr)
set.seed(10)
dat <- data.frame(A=rnorm(10,0,1),
B=rnorm(10,0,1),
C=rnorm(10,0,1),
D=2001:2010)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
#create new variable names
mutate_varNames <- paste0(target_Var_list,rep("_pct_chg = ",length(target_Var_list)))
#generate text for formula
mutate_formula <- lapply(target_Var_list,function(x){output <- paste0("(",x,"-lag(",x,"))/lag(",x,")");return(output)})
mutate_formula <- unlist(mutate_formula) #convert list to a vector
#generate arguments for mutate
mutate_args <<- paste0(mutate_varNames,collapse=",",mutate_formula)
#data manipulation
output <- input_data %>%
arrange(!!parse_quo(year_Var_name,env=caller_env())) %>%
mutate(!!parse_quo(mutate_args,env=caller_env()))
#output data frame
return(output)
}
# error: unexpected ','
calc_perct_chg(input_data =dat,
target_Var_list=list("A","B","C"),
year_Var_name="D")
I don't think it's a good idea to evaluate strings as code, and I think you are over-complicating this. Using across() should be easier.
library(dplyr)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
input_data %>%
arrange(across(all_of(year_Var_name))) %>%
mutate(across(all_of(target_Var_list), ~(.x - lag(.x))/lag(.x)))
}
calc_perct_chg(input_data = dat,
target_Var_list = c("A","B","C"),
year_Var_name = "D")
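Note that the across() call above overwrites A, B and C with their percentage changes. If the original columns should be kept and new columns such as A_pct_chg added (which is what mutate_varNames in the question was aiming for), across() has a .names argument for that. A sketch, assuming dplyr 1.0.0 or later; the function and parameter names here are made up for the illustration:

```r
library(dplyr)

set.seed(10)
dat <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = 2001:2010)

# Hypothetical variant that keeps the inputs and adds <col>_pct_chg columns.
calc_perct_chg_named <- function(input_data, target_vars, year_var) {
  input_data %>%
    arrange(across(all_of(year_var))) %>%
    mutate(across(all_of(target_vars),
                  ~ (.x - lag(.x)) / lag(.x),
                  .names = "{.col}_pct_chg"))
}

res <- calc_perct_chg_named(dat, c("A", "B", "C"), "D")
names(res)
```

The first row of each new column is NA because there is no previous year to compare against.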
I would like to create a function that updates a data frame from a different environment. Specifically, I would like to update the labels of a data frame using the Hmisc::label() function.
assign_label <- function(df, col) {
col <- rlang::as_name(rlang::ensym(col))
Hmisc::label(df[,col]) <- fetch_label(col)
}
fetch_label <- function(col) {
val <- c("mpg" = "MPG",
"hp" = "HP")
unname(val[col])
}
The following code executes without issue: assign_label(mtcars, hp)
However, it does not actually alter the data frame in the calling environment. I just can't figure out how to make it do what I imagine.
Ideally, I would like to be able to pipe a dataframe to this function as such:
mtcars %>% assign_label(mpg)
1) Return modified object. Modifying objects in place is discouraged in R. The usual approach is to return the modified data frame and then assign it to a new name, or back to the original name, overwriting or shadowing it.
assign_label <- function(df, col) {
col <- deparse(substitute(col))
Hmisc::label(df[[col]]) <- fetch_label(col)
df
}
mtcars_labelled <- mtcars %>% assign_label(mpg)
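The reason the original function appeared to do nothing is that R functions work on a copy of their arguments. A small base-R sketch of this behaviour, using a plain attribute instead of Hmisc::label() so that it runs without extra packages (add_note is a made-up name):

```r
# Hypothetical helper: attach an attribute and return the modified copy.
add_note <- function(df) {
  attr(df, "note") <- "labelled"
  df
}

cars_copy <- mtcars

# Calling without assigning: the caller's object is untouched.
invisible(add_note(cars_copy))
is.null(attr(cars_copy, "note"))

# Assigning the result back: now the change is visible.
cars_copy <- add_note(cars_copy)
attr(cars_copy, "note")
```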
2) magrittr. Despite what was said above, there are some facilities for modifying in place in R and in some R packages. The magrittr package provides a syntax for overwriting or shadowing the input. Using the definition in (1) we can write:
library(magrittr)
mtcars %<>% assign_label(mpg)
If mtcars were in the global environment, this would overwrite it with the new value. In this case mtcars lives in the datasets package, so a new mtcars is written to the caller's environment and the original in datasets is unchanged.
3) replacement function. Although not widely used, R does provide replacement functions, which are defined and used as shown below. This also overwrites or shadows the input.
`assign_label<-` <- function(df, value) {
Hmisc::label(df[[value]]) <- fetch_label(value)
df
}
assign_label(mtcars) <- "mpg"
Note
As an aside, if the aim is for an interface that is consistent with tidyverse then use tidyselect to retrieve the column name(s) so that examples like the following work:
assign_labels <- function(df, col) {
nms <- names(select(df, {{col}}))
for(nm in nms) Hmisc::label(df[[nm]]) <- fetch_label(nm)
df
}
mtcars_labelled <- mtcars %>% assign_labels(starts_with("mp"))
str(mtcars_labelled)
mtcars_labelled <- mtcars %>% assign_labels(mpg|hp)
str(mtcars_labelled)
In regards to the comments about not modifying outside of the scope of a function, I created two functions that assign new dataframes with labels.
fetch_label <- function(col) {
val <- c("mpg" = "MPG",
"hp" = "HP")
unname(val[col])
}
assign_label <- function(df, col) {
col <- rlang::as_name(rlang::ensym(col))
Hmisc::label(df[[col]]) <- fetch_label(col)
return(df)
}
assign_labels <- function(df) {
purrr::iwalk(df, function(.x, .y) {
lab <- fetch_label(.y)
Hmisc::label(df[[.y]]) <<- lab
})
return(df)
}
mtcars <- mtcars %>% assign_label(hp)
mtcars <- mtcars %>% assign_labels()
I have four functions, clean, clean2, cleanFun, and trim. Currently I apply the functions to one column, like so.
library(tidyverse)
library(data.table)
py17$CE.Finding.Description <- clean(py17$CE.Finding.Description)
py17$CE.Finding.Description <- clean2(py17$CE.Finding.Description)
py17$CE.Finding.Description <- cleanFun(py17$CE.Finding.Description)
py17$CE.Finding.Description <- trim(py17$CE.Finding.Description)
This process does the trick but I have to copy and paste this multiple times, and I'd eventually like to expand this to multiple columns.
For now, I'd like to save time and add an apply function but I'm not sure how to create that apply function. I've tried creating this.
maxclean <- function(cleaner) {
c(clean(cleaner), clean2(cleaner), cleanFun(cleaner), trim(cleaner))
}
py17$CE.Finding.Description <- sapply(py17$CE.Finding.Description, maxclean)
After trying this I just get
Error in `$<-.data.frame`(`*tmp*`, CE.Finding.Description, value = c(NA, :
replacement has 4 rows, data has 4318
I do not get any errors doing this the long way. Where am I going wrong on this?
Your maxclean function should take the same argument as the separate functions (in your case, a vector) and then call each function in turn, feeding each result into the next. Like this:
maxclean <- function(x) {
x <- clean(x)
x <- clean2(x)
x <- cleanFun(x)
x <- trim(x)
return(x)
}
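To illustrate the difference, here is a sketch with stand-in cleaning functions. The bodies of clean, clean2, cleanFun and trim below are invented for the example, since the question does not show the real ones. Concatenating the four results with c() makes the output four times as long as the input, while chaining keeps the length unchanged.

```r
# Stand-in definitions (assumptions, not the OP's real functions).
clean    <- function(x) gsub("[[:punct:]]", "", x)
clean2   <- function(x) tolower(x)
cleanFun <- function(x) gsub("\\s+", " ", x)
trim     <- function(x) trimws(x)

x <- c("  Hello,   World! ", "Foo--Bar ")

# The attempt from the question: four separate results glued together.
broken <- function(x) c(clean(x), clean2(x), cleanFun(x), trim(x))
length(broken(x))       # 8, not 2
dim(sapply(x, broken))  # 4 rows, one column per input element

# Chained version: each step works on the previous step's output.
maxclean <- function(x) {
  x <- clean(x)
  x <- clean2(x)
  x <- cleanFun(x)
  trim(x)
}
maxclean(x)
```

The 4-row matrix returned by sapply() is most likely where the "replacement has 4 rows" message in the question comes from.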
Apparently, the OP has created a cleaning pipeline where the output of one step is fed into the next step and the final result of the pipeline overwrites the original input.
The magrittr package has the freduce() function which applies one function after the other in the described way. Thus,
py17$CE.Finding.Description <- clean(py17$CE.Finding.Description)
py17$CE.Finding.Description <- clean2(py17$CE.Finding.Description)
py17$CE.Finding.Description <- cleanFun(py17$CE.Finding.Description)
py17$CE.Finding.Description <- trim(py17$CE.Finding.Description)
can be written as:
library(magrittr)
fcts <- list(clean, clean2, cleanFun, trim)
py17$CE.Finding.Description %<>% freduce(fcts)
which is a shortcut for
py17$CE.Finding.Description <- py17$CE.Finding.Description %>%
clean() %>%
clean2() %>%
cleanFun() %>%
trim()
Here, %>% is the magrittr forward-pipe operator and %<>% is the magrittr compound assignment pipe-operator which updates the left-hand side object with the resulting value.
Reproducible example
Using the mtcars dataset:
data(mtcars)
mycars <- mtcars
mycars$mpg %<>%
{. - mean(.)} %>%
abs() %>%
sqrt()
mycars
or
mycars <- mtcars
mycars$mpg %<>% freduce(list(function(.) {. - mean(.)}, abs, sqrt))
mycars
Applying on multiple columns
The OP has mentioned wanting to eventually expand this to multiple columns.
This can be achieved by, e.g.,
mycars <- mtcars
fcts <- list(function(.) {. - mean(.)}, abs, sqrt)
mycars$mpg %<>% freduce(fcts)
mycars$disp %<>% freduce(fcts)
mycars
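When the list of columns grows, repeating one line per column gets tedious. A possible alternative, sketched here in base R with Reduce() standing in for freduce() so that it needs no packages (apply_all and cols are names made up for the example):

```r
mycars <- mtcars
fcts <- list(function(x) x - mean(x), abs, sqrt)
cols <- c("mpg", "disp")

# Hypothetical helper: run a vector through every function in turn.
apply_all <- function(x, fs) Reduce(function(acc, f) f(acc), fs, x)

# Apply the same pipeline to every selected column in one assignment.
mycars[cols] <- lapply(mycars[cols], apply_all, fs = fcts)
head(mycars[cols])
```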
I am using the map function of the purrr package in R which gives as output a list. Now I would like the output to be a named list based on the input. An example is given below.
input <- c("a", "b", "c")
output <- purrr::map(input, function(x) {paste0("test-", x)})
From this I would like to access elements of the list using:
output$a
Or
output$b
We just need to name the list
names(output) <- input
and then extract the elements based on the name
output$a
#[1] "test-a"
If this needs to be done using tidyverse
library(tidyverse)
output <- map(input, ~paste0('test-', .)) %>%
setNames(input)
Update
As of 2020, the answer from @mihagazvoda describes the correct approach: simply call set_names() before applying map()
c("a", "b", "c") %>%
purrr::set_names() %>%
purrr::map(~paste0('test-', .))
Outdated answer
The accepted solution works, but suffers from a repeated argument (input) which may cause errors and interrupts the flow when using piping with %>%.
An alternative solution would be to use a bit more power of the %>% operator
1:5 %>% { set_names(map(., ~ .x + 3), .) } %>% print # ... or something else
This takes the argument from the pipe but still lacks some beauty. An alternative could be a small helper method such as
map_named = function(x, ...) map(x, ...) %>% set_names(x)
1:5 %>% map_named(~ .x + 1)
This already looks cleaner and would be my preferred solution.
Finally, we could even overwrite purrr::map in case the argument is a character or integer vector and produce a named list in such a case.
map = function(x, ...){
if (is.integer(x) || is.character(x)) {
purrr::map(x, ...) %>% set_names(x)
}else {
purrr::map(x, ...)
}
}
1 : 5 %>% map(~ .x + 1)
However, the optimal solution would be for purrr to implement such behaviour out of the box.
The recommended solution:
c("a", "b", "c") %>%
purrr::set_names() %>%
purrr::map(~paste0('test-', .))
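For comparison, the same named list can be built without purrr. A small base-R sketch: sapply() with simplify = FALSE keeps the result as a list and, for character input, uses the input values as names.

```r
input <- c("a", "b", "c")

# Option 1: sapply() names the result after a character input.
output <- sapply(input, function(x) paste0("test-", x),
                 simplify = FALSE, USE.NAMES = TRUE)
output$a

# Option 2: lapply() plus setNames(), mirroring the first answer.
output2 <- setNames(lapply(input, function(x) paste0("test-", x)), input)
output2$b
```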
I have the following function with a single parameter, df, which is a data frame:
test_function <- function(df) {
df_name <- df #get name of dataframe (does not work)
df_name
}
test_function(mtcars)
How do I return the name of the dataset from this function? For test_function(mtcars) I need to assign the string "mtcars" to df_name.
You can use the combo substitute + deparse
test_function <- function(df)
deparse(substitute(df))
test_function(mtcars)
##[1] "mtcars"
Another option is to use match.call (see ?match.call), which returns a call in which all of the specified arguments are specified by their full names.
test_function <- function(df){
as.list(match.call())[-1]
}
test_function(mtcars)
##$df
##mtcars
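One caveat with both approaches: they return whatever expression was typed in the call, which is only a dataset name if a bare name was passed. A short sketch of that behaviour, plus a stricter variant (df_name is a made-up helper) that refuses anything other than a bare name:

```r
test_function <- function(df) deparse(substitute(df))

test_function(mtcars)        # "mtcars"
test_function(head(mtcars))  # "head(mtcars)", the expression, not a name

# Hypothetical stricter helper: only accept a bare symbol.
df_name <- function(df) {
  expr <- substitute(df)
  if (!is.name(expr)) {
    stop("please pass a bare data frame name")
  }
  as.character(expr)
}

df_name(mtcars)
```

For very long expressions deparse() can also return more than one string, so the strict version may be the safer choice when the result is used as a label or file name.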