I have been researching this and found this link helpful for renaming columns passed to a function (the [,column_name] code is what finally made my_function1 work). Is there a way to use the pipe operator to rename columns in a data frame within a function?
My attempt is shown in my_function2, but it gives me either Error: All arguments to rename must be named or Error: Unknown variables: col2. I am guessing this is because I have not specified what col2 belongs to.
Also, is there a way to pass paired arguments into the function, like col1 and new_col1, so that you can associate the column name to be replaced with the column name that replaces it? Thanks in advance!
library(dplyr)
my_df = data.frame(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
my_function1 = function(input_df, col1, new_col1) {
df_new = input_df
df_new[,new_col1] = df_new[,col1]
return(df_new)
}
temp1 = my_function1(my_df, "a", "new_a")
my_function2 = function(input_df, col2, new_col2) {
df_new = input_df %>%
rename(new_col2 = col2)
return(df_new)
}
temp2 = my_function2(my_df, "b", "new_b")
rename_ (alongside other dplyr verbs suffixed with an underscore) has been deprecated.
Instead, try:
my_function3 = function(input_df, cols, new_cols) {
input_df %>%
rename({{ new_cols }} := {{ cols }})
}
See this vignette for more information about embracing arguments with double braces and programming with dplyr.
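When the old and new names arrive as character strings, as in the question, another option is to pass rename() a named character vector wrapped in all_of(). The following is a minimal sketch, assuming a dplyr version recent enough to support all_of() (1.0.0 or later); rename_by_name is a hypothetical helper name, not part of dplyr.

```r
library(dplyr)

my_df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Hypothetical helper: `old` and `new` are character vectors of equal length.
rename_by_name <- function(input_df, old, new) {
  stopifnot(length(old) == length(new))
  # rename() expects new_name = old_name pairs; setNames() builds that lookup
  # and all_of() tells tidyselect to treat the strings as column names.
  lookup <- setNames(old, new)
  input_df %>% rename(all_of(lookup))
}

temp3 <- rename_by_name(my_df, c("a", "b"), c("new_a", "new_b"))
names(temp3)
# [1] "new_a" "new_b" "c"
```

Because old and new are vectors, the same call handles one pair or several, which covers the "associated arguments" part of the question.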
Following @MatthewPlourde's answer to a similar question, we can do:
my_function3 = function(input_df, cols, new_cols) {
rename_(input_df, .dots = setNames(cols, new_cols))
}
# example
my_function3(my_df, "b", "new_b")
# a new_b c
# 1 1 4 7
# 2 2 5 8
# 3 3 6 9
Many dplyr functions have lesser-known variants with names ending in _ that allow you to work with the package more programmatically. One pattern is...
DF %>% dplyr_fun(arg1 = val1, arg2 = val2, ...)
# becomes
DF %>% dplyr_fun_(.dots = list(arg1 = "val1", arg2 = "val2", ...))
This has worked for me in a few cases, where the val* are just column names. There are more complicated patterns and techniques, covered in the document that pops up when you type vignette("nse"), but I do not know them well.
I'm trying to make my code general: I want to change only the YEAR variable without having to change anything else in the code.
YEAR = 1970
y <- data.frame(col1 = c(1:5))
function (y){
summarize(column_YEAR = sum(col1))
}
#Right now this gives
column_YEAR
1 15
#I would like this function to output this (so col1 is changed to column_1970)
column_1970
1 15
or for example this
df <- list("a_YEAR" = anotherdf)
#I would like to have a list with a df with the name a_1970
I tried things like
df <- list(assign(paste0(a_, YEAR), anotherdf))
But it does not work. Does somebody have any advice? Thanks in advance :)
rlang provides a flexible way to defuse R expressions. You can use that functionality to create dynamic column names within a dplyr flow. In this example, the dynamic column name is built from the suffix argument passed to a wrapper function around dplyr's summarise.
library("tidyverse")
YEAR = 1970
y <- data.frame(col1 = c(1:5))
function (y) {
summarize(column_YEAR = sum(col1))
}
my_summarise <- function(.data, suffix, sum_col) {
var_name <- paste0("column_", suffix)
summarise(.data,
{{var_name}} := sum({{sum_col}}))
}
my_summarise(.data = y, suffix = YEAR, sum_col = col1)
Results
my_summarise(.data = y, suffix = YEAR, sum_col = col1)
# column_1970
# 1 15
You can also source arguments directly from the global environment, but this is a poorer solution from a readability perspective, as it's not immediately clear how the function creates the suffix.
my_summarise_two <- function(.data, sum_col) {
var_name <- paste0("column_", YEAR)
summarise(.data,
{{var_name}} := sum({{sum_col}}))
}
my_summarise_two(.data = y, sum_col = col1)
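The second half of the question, a list whose element name contains YEAR, is not covered by the summarise wrappers above. A base R sketch follows; anotherdf is a stand-in data frame here, since the question does not show its contents.

```r
YEAR <- 1970
anotherdf <- data.frame(col1 = 1:5)  # stand-in; not shown in the question

# Build the name as a string first, then attach it with setNames().
df <- setNames(list(anotherdf), paste0("a_", YEAR))
names(df)
# [1] "a_1970"

# Equivalent: assign into an empty list under a computed name.
df2 <- list()
df2[[paste0("a_", YEAR)]] <- anotherdf
```

Note that the prefix has to be the string "a_"; a bare a_ is looked up as a variable.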
alply(df1 %>% as.matrix, 2, foo, keyword.count)
I have the above line of code, which applies function 'foo' to each column of 'df1'. I want to add an additional parameter (df2) to foo that has the same number of columns as df1, something like:
alply(df1 %>% as.matrix, 2, foo, df2 %>% as.matrix, keyword.count)
I want a function that uses same iterator for df1 and df2. In terms of loops, df1[1] and df2[1] in 1st iteration, df1[2] and df2[2] in 2nd iteration and so on.
In the current implementation using alply, df1[1] receives the whole df2 matrix as a parameter, not a column of df2.
In terms of a loop, it would look something like this:
for (i in seq_len(ncol(df1))) {
foo(df1[i], df2[i], keyword.count)
}
Is there an apply-family function that allows me to do this, or some way to get the iteration number so that it can be accessed in "foo"?
Any help would be appreciated.
example:
df1 <- data.frame(
col1 = sample(LETTERS[1:5]),
col2 = sample(LETTERS[6:10])
)
df2 <- data.frame(
col1 = sample(LETTERS[11:13]),
col2 = sample(LETTERS[14:16])
)
foo <- function(terms, fixed_terms , collocated_words ) {
terms <- terms[terms != ""]
fixed_terms <- fixed_terms[fixed_terms != ""]
##use terms and fixed_terms in another function
}
mlply(.data = as.matrix(df1), .fun = foo, fixed_terms = as.matrix(df2), collocated_word=2)
##error:
##Error in (function (terms, fixed_terms, collocated_words) :
## unused arguments (col1 = "B", col2 = "H")
You can use mlply:
mlply(as.matrix(df1), foo, argument2 = as.matrix(df2), 2)
You may need to specify which argument of foo each matrix is passed to.
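If the goal is to pair column i of df1 with column i of df2, base R's Map() (a wrapper around mapply()) iterates over both in step without plyr. This is a sketch rather than a drop-in answer: the body of foo is a placeholder that only counts terms, since the question only says the vectors are handed to another function.

```r
set.seed(1)  # only so the sampled letters are reproducible
df1 <- data.frame(col1 = sample(LETTERS[1:5]), col2 = sample(LETTERS[6:10]))
df2 <- data.frame(col1 = sample(LETTERS[11:13]), col2 = sample(LETTERS[14:16]))

# Placeholder body: the original foo passes these vectors to another function.
foo <- function(terms, fixed_terms, collocated_words) {
  terms <- terms[terms != ""]
  fixed_terms <- fixed_terms[fixed_terms != ""]
  list(n_terms = length(terms), n_fixed = length(fixed_terms),
       collocated_words = collocated_words)
}

# A data frame is a list of columns, so Map() walks df1 and df2 column by
# column in parallel; MoreArgs holds the arguments that stay constant.
res <- Map(foo, df1, df2, MoreArgs = list(collocated_words = 2))
names(res)
# [1] "col1" "col2"

# If foo needs the iteration number, iterate over the index instead.
res_idx <- lapply(seq_along(df1), function(i) foo(df1[[i]], df2[[i]], 2))
```

Map() takes the result names from its first data argument, so res$col1 holds the result for the first pair of columns.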
I want to substitute parts of the transform function with variable inputs.
I have created a df using subset with col1 from an existing table:
col1 = c('A','B','C')
The df looks something like this:
A = c(1, 3)
B = c(3, 1)
C = c(5, 2)
df = data.frame(A, B, C)
I now want to automate calculations which manually would look like this:
df <- transform(df, 'ABC' = (A + B + C))
where (A + B + C) refers to the columns of the df. Because I have hundreds of 'col1's, I can't do it by hand. I was trying to use something similar to %s (as available in Python 2.x), yet so far nothing has really worked, and I understand too little of R (is this related to eval()?) to get things working (I tried paste, as.formula, sprintf, substitute, etc.).
Using cv(col1), I'm trying to paste the output inside the transform function, yet the furthest I got was transform trying to grab values from the environment (not the columns) when using as.formula.
cv = function(var){
output = paste('(', paste(var, collapse = ' + '), ')', sep = '')
return(output)
}
Would appreciate any hints or ideas!
You have maneuvered yourself into a strange corner. This is easy with R:
cols <- c("A", "B", "C")
df[, paste(cols, collapse = "")] <- rowSums(df[, cols])
#alternatively for other binary functions:
#Reduce("+", df[, cols])
# A B C ABC
#1 1 3 5 9
#2 3 1 2 6
You can get a similar effect using mutate from dplyr:
library(dplyr)
cols <- c("A", "B", "C")
df %>% mutate_(.dots = setNames(paste(cols, collapse = '+'),
'new_column_name'))
Here we tell mutate_ (spot the _) what to do via paste() which yields "A+B+C", and use setNames to name the new column.
I acknowledge the syntax is somewhat convoluted, but this is related to non-standard evaluation in dplyr. But if you want to do this in the dplyr ecosystem, this is the way to do it.
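mutate_() and the other underscore verbs were deprecated in dplyr 0.7.0, so the call above may warn or fail on current versions. The same idea can be sketched with plain mutate() by parsing the pasted string into an expression and injecting it with !!. This assumes the rlang package is installed (it is a dependency of dplyr); new_name and sum_expr are names chosen here for illustration.

```r
library(dplyr)

df <- data.frame(A = c(1, 3), B = c(3, 1), C = c(5, 2))
cols <- c("A", "B", "C")
new_name <- paste(cols, collapse = "")  # "ABC"

# Turn the string "A + B + C" into an unevaluated R expression.
sum_expr <- rlang::parse_expr(paste(cols, collapse = " + "))

# !! injects the expression; `!!new_name :=` names the column from a string.
out <- df %>% mutate(!!new_name := !!sum_expr)
out
#   A B C ABC
# 1 1 3 5   9
# 2 3 1 2   6
```

Parsing strings into code is only reasonable here because cols comes from the script itself; with untrusted input, the rowSums approach shown earlier is the safer choice.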
dplyr's rename functions require the new column name to be passed in as unquoted variable names. However I have a function where the column name is constructed by pasting a string onto an argument passed in and so is a character string.
For example say I had this function
myFunc <- function(df, col){
new <- paste0(col, '_1')
out <- dplyr::rename(df, new = old)
return(out)
}
If I run this
df <- data.frame(a = 1:3, old = 4:6)
myFunc(df, 'x')
I get
a new
1 1 4
2 2 5
3 3 6
Whereas I want the 'new' column to be the name of the string I constructed ('x_1'), i.e.
a x_1
1 1 4
2 2 5
3 3 6
Is there any way of doing this?
I think this is what you were looking for. It uses rename_ as @Henrik suggested, but the argument has a, let's say, interesting name:
> myFunc <- function(df, col){
+ new <- paste0(col, '_1')
+ out <- dplyr::rename_(df, .dots=setNames(list(col), new))
+ return(out)
+ }
> myFunc(data.frame(x=c(1,2,3)), "x")
x_1
1 1
2 2
3 3
>
Note the use of setNames to use the value of new as name in the list.
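Since rename_() was deprecated in dplyr 0.7.0, the same function can be sketched with plain rename(). This assumes a dplyr version that supports all_of() (1.0.0 or later) and keeps the behaviour of the answer above, renaming the column named by col to col with "_1" appended.

```r
library(dplyr)

myFunc <- function(df, col) {
  new <- paste0(col, "_1")
  # `!!new :=` takes the new name from the string stored in `new`;
  # all_of(col) selects the existing column named by the string in `col`.
  dplyr::rename(df, !!new := all_of(col))
}

names(myFunc(data.frame(x = c(1, 2, 3)), "x"))
# [1] "x_1"
```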
Recent versions of dplyr (1.0.0 and later) provide the rename_with function.
Say you have a data frame:
library(tidyverse)
df <- tibble(V0 = runif(10), V1 = runif(10), V2 = runif(10), key=letters[1:10])
And you want to change all of the "V" columns. Usually, my reference for columns like this comes from a JSON file, which in R becomes a named list. Here a named character vector plays that role, e.g.:
colmapping <- c("newcol1", "newcol2", "newcol3")
names(colmapping) <- paste0("V",0:2)
You can then use the following to change the names of df to the strings in the colmapping list:
df <- rename_with(.data = df, .cols = starts_with("V"), .fn = function(x){colmapping[x]})
I am trying to create a function in R that takes four arguments, namely:
data frame, number, character 1 and character 2.
What I am trying to have as an output is this:
test_df <- data.frame(col1 = c("matt", "baby"), col2 = c("john", "luck"))
my_function(test_df, 1, "a", "u")
col1 col2
mutt john
buby luck
I was just wondering how I should define the function so that it operates on the column number the user enters. For the renaming, I guess the function rename() would be fine. Do I need to substitute with [x,x]?
Thank you!
If you have to create a function that takes a column as an argument you need to split out the data frame and column specification (using gsub() to do the actual replacement):
my_function <- function(df, column, pattern, replacement) {
gsub(pattern, replacement, df[[column]])
}
Which would work like:
my_function(df = test_df, column = 1, pattern = "a", replacement = "u")
## [1] "mutt" "buby"
But, this has the downside that if you want to loop over multiple columns, for example with lapply(), the list specification becomes more complicated:
test_df[] <- lapply(colnames(test_df), my_function, df = test_df, pattern = "a", replacement = "u")
test_df
# col1 col2
# 1 mutt john
# 2 buby luck
Which is much more complicated than:
test_df <- data.frame(test_df, stringsAsFactors = FALSE)
test_df[] <- lapply(test_df, gsub, pattern = "a", replacement = "u")
test_df
# col1 col2
# 1 mutt john
# 2 buby luck
(Note: ensure stringsAsFactors = FALSE for this to work. It's a good idea to use this as the default unless you explicitly want factors anyway; since R 4.0.0 it is the default.)
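To get the whole data frame back with only the chosen column altered, as in the expected output in the question, the function can assign the gsub() result into the column and return the data frame. A minimal sketch; replace_in_column is a hypothetical name used to avoid clashing with my_function above.

```r
test_df <- data.frame(col1 = c("matt", "baby"), col2 = c("john", "luck"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: `column` may be a position or a column name.
replace_in_column <- function(df, column, pattern, replacement) {
  df[[column]] <- gsub(pattern, replacement, df[[column]])
  df  # return the whole data frame, not just the modified vector
}

replace_in_column(test_df, 1, "a", "u")
#   col1 col2
# 1 mutt john
# 2 buby luck
```

Because [[ accepts either an index or a name, replace_in_column(test_df, "col1", "a", "u") gives the same result.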