Is it possible in R to create argument names in a function call dynamically?
For example, if we start with
name <- "variable"
I would like to create a new data frame like this
a.new.data.frame <- data.frame(name = c(1, 2))
which of course does not work.
The only solution I could invent was
arg <- list(c(1, 2))
names(arg) <- name
a.new.data.frame <- do.call(data.frame, arg)
a.new.data.frame
# variable
#1 1
#2 2
I don't like this code, since it seems not to be elegant.
Is there a better way to do it?
PS Important! This is a more general problem I have when writing R-programmes (e.g. when I use ggplot, and many other cases). So, I expect general solutions to this (creation of data.frame is only an example).
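More compact code for dynamic argument names could look like this (note that list(name = c(1, 2)) would create a column literally called "name", so the list is named with setNames() instead):
df <- do.call(data.frame, setNames(list(c(1, 2)), name))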
You could use the ... argument (see ?dots) to encapsulate the do.call in a generic function like this, which saves the noisy list() part of the call:
call.with.dyn.args <- function(f, ...) {
args <- list(...)
do.call(f, args)
}
df1 <- call.with.dyn.args(data.frame, a = 1:2, b = letters[1:2])
df1
# a b
# 1 1 a
# 2 2 b
But you also have other options for passing arguments dynamically to functions without do.call, e.g.:
dyn.values <- c(1:2)
name = "dyn.values"
df2 <- data.frame(dyn.values, # values from a variable
name = dyn.values, # values from a variable + new name
static.arg = letters[1:2], # usual direct passing of an arg
name.from.variable = get(name)) # get the values from a variable whose name is stored in another variable
df2
# dyn.values name static.arg name.from.variable
# 1 1 1 a 1
# 2 2 2 b 2
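For the general case the question asks about, one base R pattern is to build the argument list first, name it with setNames(), and then hand it to do.call(). The sketch below is only an illustration: arg_names and arg_values are hypothetical names, and seq() is used just as a second example of a function that takes named arguments.

```r
# Dynamic argument names in base R: name a list, then pass it to do.call().
name <- "variable"

# One dynamically named argument for data.frame().
df <- do.call(data.frame, setNames(list(c(1, 2)), name))
names(df)
# [1] "variable"

# Several dynamically named arguments for another function (here seq()).
arg_names  <- c("from", "to", "length.out")  # hypothetical example names
arg_values <- list(0, 1, 5)
s <- do.call(seq, setNames(arg_values, arg_names))
s
# [1] 0.00 0.25 0.50 0.75 1.00
```

Because the names live in an ordinary character vector, they can be computed at runtime before the call is made.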
An option using tidyverse
library(tibble)
library(dplyr)
tibble(!! name := c(1, 2))
# A tibble: 2 x 1
# variable
# <dbl>
#1 1
#2 2
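The !! name := syntax is not limited to tibble(). As a sketch, assuming the rlang package is installed, rlang::exec() accepts dynamic dots and can forward a dynamically named argument to an ordinary function such as data.frame():

```r
library(rlang)

name <- "variable"

# exec() calls the function with the given arguments; the argument name
# is taken from the value of `name` via `!!name :=`.
df <- exec(data.frame, !!name := c(1, 2))
names(df)
# [1] "variable"

# A named list of arguments can be spliced in with `!!!`.
args <- setNames(list(c(1, 2)), name)
df2 <- exec(data.frame, !!!args)
names(df2)
# [1] "variable"
```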
Related
I want to create a dataframe with a column whose value depends on another object's value.
Here's an example, I want my column to be called "conditional_colname":
x = "conditional_colname"
df <- data.frame(x = c(1, 2, 3))
df
> x
1 1
2 2
3 3
I could try the following indirection syntax in tidy evaluation, but it returns an error:
data.frame({{x}} := c(1, 2, 3))
> Error in `:=`({ : could not find function ":="
I can sort out the problem through the use of the rename function and indirection in tidy evaluation syntax, as in:
df %>% rename({{x}} := x)
> conditional_colname
1 1
2 2
3 3
but that involves creating the data frame with the wrong name and then renaming it. Is there any way to set the name when the data frame is created?
{{ }} can be used with tibbles:
library(tibble)
library(rlang)
df <- tibble({{x}} := c(1, 2, 3))
df
# A tibble: 3 × 1
# conditional_colname
# <dbl>
#1 1
#2 2
#3 3
A solution with data.frame would be with setNames.
df <- setNames(data.frame(c(1, 2, 3)), x)
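If you prefer the name to be in place before the data frame is built, one base R variant (a sketch, not the only way) is to name a list first and then convert it. The variable cols is a hypothetical name used for illustration.

```r
x <- "conditional_colname"

# Name the list first, then turn it into a data frame.
cols <- setNames(list(c(1, 2, 3)), x)
df <- as.data.frame(cols)
names(df)
# [1] "conditional_colname"
```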
I have a vector containing "potential" column names:
col_vector <- c("A", "B", "C")
I also have a data frame, e.g.
library(tidyverse)
df <- tibble(A = 1:2,
B = 1:2)
My goal now is to create all columns mentioned in col_vector that don't yet exist in df.
For the above example, my code below works:
df %>%
mutate(!!sym(setdiff(col_vector, colnames(.))) := NA)
# A tibble: 2 x 3
A B C
<int> <int> <lgl>
1 1 1 NA
2 2 2 NA
The problem is that this code fails as soon as a) more than one column from col_vector is missing or b) no column from col_vector is missing. I thought about some sort of if_else, but I don't know how to make the column creation conditional in such a way, preferably in a tidyverse way. I know I can just loop through all the missing columns, but I'm wondering if there is a more direct approach.
Example data where code above fails:
df2 <- tibble(A = 1:2)
df3 <- tibble(A = 1:2,
B = 1:2,
C = 1:2)
This should work.
df[,setdiff(col_vector, colnames(df))] <- NA
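To cover both edge cases from the question (several columns missing, or none), the assignment can be wrapped in a small helper. This is a sketch using plain data frames; add_missing_cols is a hypothetical name, and the length check simply skips the assignment when nothing is missing.

```r
# Hypothetical helper: add every column named in `cols` that `df` lacks.
add_missing_cols <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    df[missing_cols] <- NA  # one NA-filled column per missing name
  }
  df
}

col_vector <- c("A", "B", "C")

df2 <- data.frame(A = 1:2)                    # two columns missing
df3 <- data.frame(A = 1:2, B = 1:2, C = 1:2)  # nothing missing

names(add_missing_cols(df2, col_vector))
# [1] "A" "B" "C"
names(add_missing_cols(df3, col_vector))
# [1] "A" "B" "C"
```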
Solution
This base operation might be simpler than a full-fledged dplyr workflow:
library(tidyverse) # For tibble(); setdiff() itself is also available in base R.
# ...
# Code to generate 'df'.
# ...
# Find the subset of missing names, and create them as columns filled with 'NA'.
df[, setdiff(col_vector, names(df))] <- NA
# View results
df
Results
Given your sample col_vector and df here
col_vector <- c("A", "B", "C")
df <- tibble(A = 1:2, B = 1:2)
this solution should yield the following results:
# A tibble: 2 x 3
A B C
<int> <int> <lgl>
1 1 1 NA
2 2 2 NA
Advantages
An advantage of my solution over the alternative linked above by @geoff is that you need not hand-code the set of column names, as symbols and strings, within the dplyr workflow:
df %>% mutate(
#####################################
A = ifelse("A" %in% names(.), A, NA),
B = ifelse("B" %in% names(.), B, NA),
C = ifelse("C" %in% names(.), C, NA)
# ...
# etc.
#####################################
)
My solution is by contrast more dynamic
##############################
df[, setdiff(col_vector, names(df))] <- NA
##############################
if you ever decide to change (or even dynamically calculate!) your variable names midstream, since it determines the setdiff() at runtime.
Note
Incredibly, @AustinGraves posted their answer at precisely the same time (2021-10-25 21:03:05Z) as I posted mine, so both answers qualify as original solutions.
I have a list of functions, for example:
myFunctions = list(
calculateMean = function(x) {mean(x)},
calculateMedian = function(x) {median(x)}
)
I need to call functions stored in myFunctions based on some criteria. For example, I have a table (myTable) with prices, and I need to calculate means and medians (I also need to do other things, such as standardizing names and joining specific values with a table of codes).
If a value in a column of myTable is "a", I want to use calculateMean; if it is "b", calculateMedian; if it is "c", calculateMean again.
What is the best way to do this? I am storing the functions in a list because I will have a lot of them. How can I call a function in myFunctions based on specific criteria?
Thanks!
Maybe the following does what the question asks for.
Depending on ID, function priceStat determines which function from myFunctions to apply to column price.
priceStat <- function(x, funlist) {
type <- unique(as.character(x[["ID"]]))
f <- switch(type,
pear = funlist[[1]],
orange = funlist[[2]])
f(x[["price"]])
}
myFunctions = list(
calculateMean = function(x) {mean(x)},
calculateMedian = function(x) {median(x)}
)
set.seed(1234)
df1 <- data.frame(ID = sample(c("pear", "orange"), 20, TRUE),
price = runif(20),
stringsAsFactors = FALSE)
sapply(split(df1, df1$ID), priceStat, myFunctions)
# orange pear
#0.3036828 0.5427695
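A variation on this idea, closer to the a/b/c criteria in the question, is to keep the mapping from key to function name in a named character vector instead of a switch(). This is only a sketch: dispatch and the small myTable below are hypothetical, made up for illustration.

```r
myFunctions <- list(
  calculateMean   = function(x) mean(x),
  calculateMedian = function(x) median(x)
)

# Hypothetical lookup: which function name to use for each key.
dispatch <- c(a = "calculateMean", b = "calculateMedian", c = "calculateMean")

# Hypothetical example table.
myTable <- data.frame(
  key   = c("a", "a", "b", "b", "b", "c"),
  price = c(1, 3, 1, 2, 10, 4)
)

# Split prices by key, then apply the function chosen for each key.
groups <- split(myTable$price, myTable$key)
results <- vapply(
  names(groups),
  function(k) myFunctions[[dispatch[[k]]]](groups[[k]]),
  numeric(1)
)
results
# a b c 
# 2 2 4 
```

Adding a new criterion then only means adding an entry to dispatch (and, if needed, a function to myFunctions).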
Here is something that I think does what you are suggesting.
library(dplyr)
Create some data.
set.seed(1234)
data <- tibble(id = rep(letters[1:2], each = 3), price = rnorm(6, 100, 5))
data
# # A tibble: 6 x 2
# id price
# <chr> <dbl>
# 1 a 94.0
# 2 a 101.
# 3 a 105.
# 4 b 88.3
# 5 b 102.
# 6 b 103.
Create a list of functions. Note we named the list item for the id we want to apply it to.
myFunctions <- list(
a = mean,
b = median
)
Group the data by id. group_modify() then iterates over the groups, calling summarize() on each one. Within each call, .x is the subset of the data for that id and .y is a one-row tibble holding the id, which is used to look up the function in the myFunctions list.
data %>%
group_by(id) %>%
group_modify(~ summarize(.x, calc = myFunctions[[pull(.y[1])]](.x$price)))
# # A tibble: 2 x 2
# id calc
# <chr> <dbl>
# 1 a 100.
# 2 b 102.
Testing it out.
> mean(data$price[data$id == "a"])
[1] 100.258
> median(data$price[data$id == "b"])
[1] 102.1456
I would like to write a function which uses dplyr::filter() within the function. When writing the function I ran into an issue with using a parameter name in the function that is also a name of one of the columns of the data frame I am filtering.
Suppose I call the data frame to be filtered dat:
library(dplyr)
dat <- data.frame(
a = c(1:10),
b = c(2,2,2,2,2,3,1,1,4,4)
)
and name the function test.filter(),
test.filter <- function(b, test.data = dat){
dat.t <- filter(test.data,
b == b)
return(dat.t)
}
Here I am passing a value b to the function and asking it to filter the column b based on the value b. I believe the function
test.filter(b = 4,
test.data = dat)
should produce the same result as
filter(dat,
b == 4)
However this is not the case. I am wondering if there is something I am not considering in terms of the scope of a function. Any help is appreciated!
This is a case where the function argument b has the same name as a column. One option is to unquote the argument with !! inside filter():
test.filter <- function(b, test.data = dat){
filter(test.data,
b == !!b)
}
test.filter(b = 4,
test.data = dat)
If the argument passed to the function has the same name as one of the columns in the data frame, we can also use the curly-curly ({{ }}) operator from rlang. Here {{ b }} injects the function argument, while the bare b refers to the column:
library(rlang)
test.filter <- function(b, test.data = dat) {
  dplyr::filter(test.data, b == {{ b }})
}
test.filter(b = 4, test.data = dat)
# a b
#1 9 4
#2 10 4
test.filter(b = 2,test.data = dat)
# a b
#1 1 2
#2 2 2
#3 3 2
#4 4 2
#5 5 2
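Thanks for the helpful answers. A friend let me know the underlying reason for the issue: dplyr looks names up in the data frame first (data masking), so inside filter() both sides of b == b refer to the column b and the comparison is TRUE for every row.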
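Another way to make the intent explicit, assuming a dplyr version that supports the .data and .env pronouns, is to state which b is the column and which is the function argument. A minimal sketch:

```r
library(dplyr)

dat <- data.frame(
  a = c(1:10),
  b = c(2, 2, 2, 2, 2, 3, 1, 1, 4, 4)
)

test.filter <- function(b, test.data = dat) {
  # .data$b is the column; .env$b is the function argument.
  filter(test.data, .data$b == .env$b)
}

res <- test.filter(b = 4)
res$a
# [1]  9 10
```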
dplyr's rename functions require the new column name to be passed in as unquoted variable names. However I have a function where the column name is constructed by pasting a string onto an argument passed in and so is a character string.
For example say I had this function
myFunc <- function(df, col){
new <- paste0(col, '_1')
out <- dplyr::rename(df, new = old)
return(out)
}
If I run this
df <- data.frame(a = 1:3, old = 4:6)
myFunc(df, 'x')
I get
a new
1 1 4
2 2 5
3 3 6
Whereas I want the 'new' column to be the name of the string I constructed ('x_1'), i.e.
a x_1
1 1 4
2 2 5
3 3 6
Is there any way of doing this?
I think this is what you were looking for. It uses rename_, as @Henrik suggested, but the argument has a, let's say, interesting name:
> myFunc <- function(df, col){
+ new <- paste0(col, '_1')
+ out <- dplyr::rename_(df, .dots=setNames(list(col), new))
+ return(out)
+ }
> myFunc(data.frame(x=c(1,2,3)), "x")
x_1
1 1
2 2
3 3
>
Note the use of setNames to use the value of new as name in the list.
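Note that rename_() has been deprecated since dplyr 0.7. With current dplyr, a sketch of the function from the question (renaming the column old to the constructed name) could use !! and := instead:

```r
library(dplyr)

myFunc <- function(df, col) {
  new <- paste0(col, "_1")
  # `!!new :=` uses the value of `new` as the new column name.
  rename(df, !!new := old)
}

df <- data.frame(a = 1:3, old = 4:6)
out <- myFunc(df, "x")
names(out)
# [1] "a"   "x_1"
```

In base R, the same rename can be done with names(df)[names(df) == "old"] <- new.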
dplyr 1.0.0 and later provide the rename_with function.
Say you have a data frame:
library(tidyverse)
df <- tibble(V0 = runif(10), V1 = runif(10), V2 = runif(10), key=letters[1:10])
And you want to rename all of the "V" columns. Usually, my reference for columns like this comes from a JSON file, which in R becomes a named list (here, a named character vector), e.g.,
colmapping <- c("newcol1", "newcol2", "newcol3")
names(colmapping) <- paste0("V",0:2)
You can then use the following to change the names of df to the strings in the colmapping vector:
df <- rename_with(.data = df, .cols = starts_with("V"), .fn = function(x){colmapping[x]})
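A self-contained sketch of the same idea, with small made-up data so the result is easy to check. The unname() call is an optional extra assumed here so that .fn returns a plain character vector.

```r
library(dplyr)

df <- data.frame(V0 = 1:2, V1 = 3:4, V2 = 5:6, key = c("a", "b"))

# Lookup from old names to new names.
colmapping <- c(V0 = "newcol1", V1 = "newcol2", V2 = "newcol3")

# Look up each selected column name in the mapping.
df <- rename_with(df, function(x) unname(colmapping[x]), .cols = starts_with("V"))
names(df)
# [1] "newcol1" "newcol2" "newcol3" "key"
```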