How to assign function argument to variable name - r

Say I have these data:
data <- data.frame(a = c(1,2,3))
And a simple function that creates a data frame. It also creates a new variable based on a simple function; this new variable takes the name that is passed in with varname. Here is my attempt (the assign line is wrong):
fun <- function(varname) {
data <- data.frame(a = c(1,2,3))
assign(paste0("data$", varname), sqrt(data$a))
data
}
fun("newvar")
Base R or tidyverse solutions are both great.

You were close! There are multiple ways of subsetting data frames, including the [[ ]] notation (e.g., data[["var"]]). Assigning a value to a column that does not exist yet creates that column.
fun <- function(varname) {
  data <- data.frame(a = c(1,2,3))
  data[[varname]] <- sqrt(data$a)
  data
}
fun("newvar")
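To see why the original assign() line does not work, it may help to look at what it actually creates. The sketch below uses only base R and reuses the names from the question. assign() treats its first argument as a literal object name, so the string is never parsed as "column newvar of data".

```r
data <- data.frame(a = c(1, 2, 3))

# assign() does not parse "data$newvar"; it creates an object with that literal name
assign("data$newvar", sqrt(data$a))

exists("data$newvar")  # the oddly named object now exists
names(data)            # the data frame itself still only has column "a"

# `[[<-` takes the column name as a string, so a variable works here
varname <- "newvar"
data[[varname]] <- sqrt(data$a)
names(data)
```

The stray object can be retrieved with get("data$newvar") or with backticks, but it has no connection to the data frame.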

tidyverse
If you want to pass the variable name as a string then a tidyverse method would be:
library(dplyr)
fun <- function(varname) {
  data.frame(a = c(1,2,3)) %>%
    mutate(!! varname := sqrt(a))
}
fun("newvar")
Alternatively, you could use tidyeval so you don't have to quote the variable name:
library(dplyr)
fun <- function(varname) {
  varname <- rlang::enquo(varname)
  data.frame(a = c(1,2,3)) %>%
    mutate(!! varname := sqrt(a))
}
fun(newvar)
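For the unquoted version, more recent releases of rlang and dplyr also offer the embrace operator {{ }}, which combines the enquo() and !! steps. This is a sketch that assumes a reasonably current dplyr (1.0 or later); the function name fun_embrace is made up for illustration.

```r
library(dplyr)

# fun_embrace is a hypothetical name; {{ varname }} defuses and injects in one step
fun_embrace <- function(varname) {
  data.frame(a = c(1, 2, 3)) %>%
    mutate({{ varname }} := sqrt(a))
}

fun_embrace(newvar)
```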
base R
If you want to use base R, I would recommend the solution posted by @Noah, but another base R option, albeit a fairly obscure one, would be:
fun <- function(varname) {
data.frame(a = c(1,2,3)) |>
within(eval(substitute(x <- sqrt(a), list(x = as.name(varname)))))
}
fun("newvar")

Related

Pass a function input as column name to data.frame function

I have a function taking a character input. Within the function, I want to use the data.frame() function. Within the data.frame() function, one column name should be the function's character input.
I tried it like this and it didn't work:
frame_create <- function(data, character_input){
  ...
  some_vector <- c(1:50)
  temp_frame <- data.frame(character_input = some_vector, ...)
  return(temp_frame)
}
Either assign the name afterwards with names(), or use setNames(), because = does not evaluate its left-hand side. In tidyverse functions such as tibble() or lst(), the name can instead be set with := and !!.
frame_create <- function(data, character_input){
some_vector <- 1:50
temp_frame <- data.frame(some_vector)
names(temp_frame) <- character_input
return(temp_frame)
}
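The setNames() route mentioned above can be sketched like this. The function name frame_create_sn and the column name "my_column" are made up for illustration, and the unused data argument from the question is dropped.

```r
# frame_create_sn is a hypothetical variant of frame_create using setNames()
frame_create_sn <- function(character_input) {
  some_vector <- 1:50
  # build the one-column data frame, then rename that column in a single expression
  setNames(data.frame(some_vector), character_input)
}

temp_frame <- frame_create_sn("my_column")
names(temp_frame)
```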
Can you explain your requirement for using a function to create a new dataframe column? If you have a dataframe df and you want to make a copy with a new column appended then the trivial solution is:
df2 <- df
df2$new_col <- 1:50
Example of combining multiple dataframes in R by stacking their rows:
cars1 <- mtcars
cars2 <- cars1
cars3 <- cars2
list1 <- list(cars1, cars2, cars3)
all_cars <- Reduce(rbind, list1)

Define the Output Name in a Function

I would like to manually define the name of the object/output of a function. A very simple example of what I have is:
x <- data.frame(name = c("A", "B", "C"),
value = c(50, 20, 100))
statistics <- function(data, name){
total <- data %>% mutate(New = value +50)
assign(paste0(name), data)
}
statistics(x, "NewName")
I would like to run this function and define what data to use and the name of the output. The idea is to create a uniquely named output for each dataset used.
Thanks!
One way is to assign the function's result with <- instead of calling assign() inside the function. Alternatively, you can use assign(), but you have to specify the envir in which the object should be created. If envir is left unspecified, the object is created in the function's own environment and is gone once the function returns.
A word of caution: assign() with envir = .GlobalEnv will overwrite any object with the same name in the global environment, so be careful with your object names.
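library(dplyr)  # provides %>% and mutate()
x <- data.frame(name = c("A", "B", "C"),
                value = c(50, 20, 100))
# Option 1: return the result and name it with <- at the call site
statistics <- function(data){
  data %>% mutate(New = value +50)
}
Newname <- statistics(x)
# Option 2: create the object in the global environment from inside the function
statistics2 <- function(data, name){
  total <- data %>% mutate(New = value +50)
  assign(paste0(name), total, envir = .GlobalEnv)
}
statistics2(x, "NewName2")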
As an aside, the assign() call in your code should use total, not data.
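If the goal is one named result per dataset, a named list is a common alternative to creating global objects with assign(). The following is a minimal base R sketch; the list names and the second dataset are invented for illustration.

```r
x <- data.frame(name = c("A", "B", "C"),
                value = c(50, 20, 100))

# same calculation as in the question, written in base R
statistics <- function(data) {
  data$New <- data$value + 50
  data
}

# hypothetical collection of datasets; the list names become the output names
datasets <- list(NewName = x, OtherName = x[1:2, ])
results <- lapply(datasets, statistics)

names(results)
results$NewName
```

Nothing in the global environment is overwritten this way, and each result is still reachable by name through results[["NewName"]].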

Create variable based on other variable outside function

I'm trying to make my code general, I'd only want to change the YEAR variable without having to change everything in the code
YEAR = 1970
y <- data.frame(col1 = c(1:5))
function (y){
summarize(column_YEAR = sum(col1))
}
#Right now this gives
column_YEAR
1 15
#I would like this function to output this (so col1 is changed to column_1970)
column_1970
1 15
or for example this
df <- list("a_YEAR" = anotherdf)
#I would like to have a list with a df with the name a_1970
I tried things like
df <- list(assign(paste0(a_, YEAR), anotherdf))
But it does not work, does somebody have any advice? Thanks in advance :)
rlang provides a flexible way to defuse R expressions, and you can use it to create dynamic column names in a dplyr pipeline. In this example, the column name is built from a suffix argument passed to a wrapper function around dplyr's summarise().
library("tidyverse")
YEAR = 1970
y <- data.frame(col1 = c(1:5))
my_summarise <- function(.data, suffix, sum_col) {
var_name <- paste0("column_", suffix)
summarise(.data,
{{var_name}} := sum({{sum_col}}))
}
my_summarise(.data = y, suffix = YEAR, sum_col = col1)
Results
my_summarise(.data = y, suffix = YEAR, sum_col = col1)
# column_1970
# 1 15
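With dplyr 1.0 or later, the name on the left of := can also be a glue-style string, which avoids the intermediate var_name variable. This is a sketch under that version assumption; my_summarise_glue is a made-up name.

```r
library(dplyr)

YEAR <- 1970
y <- data.frame(col1 = 1:5)

# my_summarise_glue is hypothetical; "{suffix}" is interpolated into the column name
my_summarise_glue <- function(.data, suffix, sum_col) {
  summarise(.data, "column_{suffix}" := sum({{ sum_col }}))
}

my_summarise_glue(y, suffix = YEAR, sum_col = col1)
```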
You can also read the value directly from the global environment, but this is less readable because it is not immediately clear where the function gets its suffix.
my_summarise_two <- function(.data, sum_col) {
var_name <- paste0("column_", YEAR)
summarise(.data,
{{var_name}} := sum({{sum_col}}))
}
my_summarise_two(.data = y, sum_col = col1)
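For the list part of the question, base R is enough. In the attempt, paste0(a_, YEAR) is missing quotes around a_, and list(assign(...)) names a new object rather than the list element. A minimal sketch, where anotherdf is a stand-in data frame because the question does not define it:

```r
YEAR <- 1970
anotherdf <- data.frame(col1 = 1:5)  # stand-in for the question's anotherdf

# build the element name as a string, then attach it with setNames()
df <- setNames(list(anotherdf), paste0("a_", YEAR))
names(df)

# the same result using [[<- on an empty list
df2 <- list()
df2[[paste0("a_", YEAR)]] <- anotherdf
names(df2)
```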

I give three arguments: the input df, the column I want to clean, and the new column to add with the cleaned names. Where am I going wrong?

library(dplyr)
clean_name <- function(df,col_name,new_col_name){
#remove whitespace and common titles.
df$new_col_name <- mutate_all(df,
trimws(gsub("MR.?|MRS.?|MS.?|MISS.?|MASTER.?","",df$col_name)))
#remove any chunks of text where a number is present
df$new_col_name<- transmute_all(df,
gsub("[^\\s]*[\\d]+[^\\s]*","",df$col_name,perl = TRUE))
}
I get the following error
"Error: Column new_col_name must be a 1d atomic #vector or a list"
What you want to do is make sure that the output of the functions you're using is either a vector or a list with only one dimension, so that you can add it as a new column in the desired data frame. You can check the class of an object with the class() function from base R.
The mutate function by itself should do what you want, it returns the same data frame but with the new column:
library(dplyr)
clean_name <- function(df, col_name, new_col_name) {
# first_cleaning_to_colname = The first change you want to make to the col_name column. This should be a vector.
# second_cleaning_to_colname = The change you're going to make to the col_name column after the first one. This should be a vector too.
first_change <- mutate(df, col_name = first_cleaning_to_colname)
second_change <- mutate(first_change, new_col_name = second_cleaning_to_colname)
return(second_change)
}
You can make both of these changes at the same time, but I thought this way would be easier to read.
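One concrete thing that goes wrong in the question's code is that df$col_name and df$new_col_name look for columns literally named col_name and new_col_name instead of using the function arguments. A base R sketch with [[ ]] and string arguments could look like this. The name clean_name_chr is made up, the regular expressions are the ones from the question, and dat1 is a small example data frame.

```r
# clean_name_chr is a hypothetical variant that takes column names as strings
clean_name_chr <- function(df, col_name, new_col_name) {
  # remove common titles
  cleaned <- gsub("MR.?|MRS.?|MS.?|MISS.?|MASTER.?", "", df[[col_name]])
  # remove any chunk of text that contains a digit
  cleaned <- gsub("[^\\s]*[\\d]+[^\\s]*", "", cleaned, perl = TRUE)
  # trim whitespace and store under the requested column name
  df[[new_col_name]] <- trimws(cleaned)
  df
}

dat1 <- data.frame(col1 = c("MR. one", "MS. two 24"), stringsAsFactors = FALSE)
clean_name_chr(dat1, "col1", "colN")
```

Unlike the transmute() version, this keeps the original column alongside the cleaned one.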
If we are passing unquoted column names, then use
library(tidyverse)
clean_name <- function(df,col_name, new_col_name){
col_name <- enquo(col_name)
new_col_name <- enquo(new_col_name)
df %>%
mutate(!! new_col_name :=
trimws(str_replace_all(!!col_name, "MR.?|MRS.?|MS.?|MISS.?|MASTER.?","")) ) %>%
transmute(!! new_col_name := trimws(str_replace_all(!! new_col_name,
"[^\\s]*[\\d]+[^\\s]*","")))
}
clean_name(dat1, col1, colN)
# colN
#1 one
#2 two
data
dat1 <- data.frame(col1 = c("MR. one", "MS. two 24"), stringsAsFactors = FALSE)

How do I pass names for new summary columns to data.table in a function?

Say I want to create a function that calculates a summary dataset from a data.table in R, and I want to be able to pass the name of the new calculated variable in programmatically.
For example:
library(data.table)
# generate some fake data
set.seed(919)
dt <- data.table(x = rnorm(50), by.var = rep(c("a", "b"), 25))
dt[, list(group.means = mean(x)), by = "by.var"] # This is what I want
# But I want to do in a function, so I can do it repeatedly:
groupMeans <- function(out.var, by.var, dat = dt) {
return(dat[, list(out.var = mean(x)), by = by.var]) # doesn't work
}
groupMeans("group.means", "by.var") # out.var should be "group.means"
How do I do this?
Courtesy of docendo discimus, you can use a named list created with setNames, like this:
groupMeans <- function(out.var, by.var, dat = dt) {
return(dat[, setNames(list(mean(x)), out.var), by = by.var])
}
groupMeans("group.means", "by.var")
# by.var group.means
# 1: a -0.1159832
# 2: b 0.2910531
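The reason the original list(out.var = mean(x)) fails can be seen without data.table: the name on the left of = inside list() is taken literally and never evaluated. A small base R sketch:

```r
out.var <- "group.means"

# the element is named "out.var", not "group.means"
names(list(out.var = 1))

# setNames() evaluates its second argument, so the variable's value is used
names(setNames(list(1), out.var))
```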
You could consider changing the column names inside your function:
groupMeans <- function(out.var, by.var, dat = dt) {
res <- dat[, list(mean(x)), by=by.var]
setnames(res, "V1", out.var)
res
}
We could use setnames to name the summarised column with the 'out.var' vector.
groupMeans <- function(out.var, by.var, dat = dt) {
setnames(dat[, list(mean(x)), by = by.var],
length(by.var)+1L, out.var)
}
groupMeans("group.var","by.var", dt)[]
# by.var group.var
#1: a -0.1159832
#2: b 0.2910531
EDIT: Based on @Frank's suggestion.
