Changing column names within a function - r

When writing a function, how do I get the new name for baseline to change depending on what the name of my dataset is? With this function the column names become dataset_baseline and dataset_adverse instead of for example Inflation_baseline and Inflation_adverse.
renaming <- function(dataset) {
dataset <- dataset %>%
rename(dataset_baseline = baseline, dataset_adverse = adverse)
return(dataset)
}

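Try this:
library(dplyr)
renaming <- function(dataset, columns) {
  # Capture the name of the object passed as `dataset`, e.g. "Inflation"
  call <- as.list(match.call())
  dataset.name <- toString(call$dataset)
  # Prefix each selected column with the dataset name and an underscore
  dataset %>% rename_at(columns, ~ paste0(dataset.name, "_", .))
}
Inflation <- renaming(Inflation, c("baseline", "adverse"))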
NOTE: Assigning to dataset inside the function does not change the caller's object, because dataset there is a local variable of the function. Return the modified data frame and assign the result at the call site instead.
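For comparison, here is a minimal base R sketch of the same idea that does not depend on dplyr. The function name prefix_columns and the sample Inflation data are made up for illustration, and the approach assumes the function is called directly with a bare object name:

```r
# Hypothetical helper: prefix the chosen columns with the name of the
# object that was passed in as `dataset`.
prefix_columns <- function(dataset, columns) {
  # deparse(substitute(x)) turns the unevaluated argument into a string
  prefix <- deparse(substitute(dataset))
  idx <- names(dataset) %in% columns
  names(dataset)[idx] <- paste0(prefix, "_", names(dataset)[idx])
  dataset
}

# Made-up example data
Inflation <- data.frame(year = 2020:2021,
                        baseline = c(1.5, 2.0),
                        adverse = c(3.0, 4.0))

# The result must be assigned back, since the function works on a copy
Inflation <- prefix_columns(Inflation, c("baseline", "adverse"))
names(Inflation)
```

Columns that are not listed in columns (here year) keep their original names.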

Related

How to rename an output dataset within a function with R?

I wrote a function that writes three different datasets to .csv files, and I would like the name of the original dataset to appear in the name of each output file.
For example:
If the original dataset is named "microbial_mat1", I would like the output file to be "microbial_mat1_output1.csv"; at the moment I only get "_output1.csv".
Is there a way to do this?
My function looks like the following code:
myFunction <- function(original_dataset,
                       parameter1,
                       parameter2 = TRUE) {
  # ... a long bunch of code ...
  if (parameter2) {
    write.csv(dataset_temporal, "_output.csv")
  } else {
    print("No parameter2")
  }
}
Thanks in advance for your help.
We need to extract the object name. One option is to call deparse(substitute(original_dataset)) at the top of the function and use the result (nm1) with paste0 to build the file name:
myFunction <- function(original_dataset,
                       parameter1,
                       parameter2 = TRUE) {
  # Name of the object supplied by the caller, as a string
  nm1 <- deparse(substitute(original_dataset))
  ...
  ...
  if (parameter2) {
    write.csv(dataset_temporal, paste0(nm1, "_output.csv"))
  } else {
    print("No parameter2")
  }
}
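A small self-contained sketch of the deparse/substitute approach follows. The function name write_named_output, the out_dir argument, and the sample data are assumptions made for illustration; the file is written to a temporary directory so the example has no side effects on the working directory:

```r
# Hypothetical helper: write a data frame to "<object name>_output1.csv"
write_named_output <- function(original_dataset, out_dir = tempdir()) {
  # Capture the caller's object name before the argument is used
  nm1 <- deparse(substitute(original_dataset))
  path <- file.path(out_dir, paste0(nm1, "_output1.csv"))
  write.csv(original_dataset, path, row.names = FALSE)
  path  # return the path so the caller can check it
}

# Made-up example data
microbial_mat1 <- data.frame(sample = c("a", "b"), count = c(10, 20))
out_path <- write_named_output(microbial_mat1)
basename(out_path)
```

This only gives a useful name when the function is called directly with a bare object name. When it is called through lapply, for instance, substitute sees an expression such as X[[i]] rather than the original name.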

R function used to rename columns of a data frame

I have a data frame, say acs10, and I need to relabel its columns. To do so, I created another data frame, labelName, with two columns: the first contains the old column names and the second contains the names I want to use, as in the table below:
column_1    column_2
oldLabel1   newLabel1
oldLabel2   newLabel2
Then, I wrote a for loop to change the column names:
for (i in seq_len(nrow(labelName))) {
  names(acs10)[names(acs10) == labelName[i, 1]] <- labelName[i, 2]
}
This works.
However, when I tried to put the for loop into a function, because I need to rename column names for other data frames as well, the function failed. The function I wrote looks like below:
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
print(varName[i,1])
print(varName[i,2])
print(names(dataF))
}
}
renameDF(acs10, labelName)
where dataF is the data frame whose names I need to change, and varName is another data frame in which old and new variable names are paired. I used print(names(dataF)) to debug, and the printout suggests that the function works. However, calling the function does not actually change the column names of acs10. I suspect it has something to do with scope, but I want to know how to make it work.
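R works on a local copy of dataF inside the function, so the function needs to return the changed data frame and the caller needs to assign the result:
renameDF <- function(dataF, varName) {
  for (i in seq_len(nrow(varName))) {
    names(dataF)[names(dataF) == varName[i, 1]] <- varName[i, 2]
  }
  return(dataF)
}
acs10 <- renameDF(acs10, labelName)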
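You can also simplify this and avoid the for loop by using match. Note that this version assumes every column of dataF appears in varName; any column that does not is renamed to NA: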
renameDF <- function(dataF,varName){
names(dataF) <- varName[[2]][match(names(dataF), varName[[1]])]
return(dataF)
}
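If some columns of dataF may be missing from varName, a variant of the match approach can leave those names untouched. This is a minimal sketch, assuming the same two-column lookup layout as in the question; the function name rename_from_dict and the sample data are made up for illustration:

```r
# Hypothetical variant: rename only the columns found in the lookup table
rename_from_dict <- function(dataF, varName) {
  # Position of each current name in the "old names" column (NA if absent)
  idx <- match(names(dataF), varName[[1]])
  hit <- !is.na(idx)
  # Replace matched names only; unmatched names are kept as they are
  names(dataF)[hit] <- varName[[2]][idx[hit]]
  dataF
}

# Made-up example data
labelName <- data.frame(column_1 = c("oldLabel1", "oldLabel2"),
                        column_2 = c("newLabel1", "newLabel2"),
                        stringsAsFactors = FALSE)
acs10 <- data.frame(oldLabel1 = 1:2, oldLabel2 = 3:4, other = 5:6)

acs10 <- rename_from_dict(acs10, labelName)
names(acs10)
```

Setting stringsAsFactors = FALSE keeps the lookup columns as character vectors on older versions of R, where data.frame would otherwise create factors.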
This should do the whole thing in one line.
colnames(acs10)[colnames(acs10) %in% labelName$column_1] <- labelName$column_2[match(colnames(acs10)[colnames(acs10) %in% labelName$column_1], labelName$column_1)]
This will also work when a column name isn't in the data dictionary (such columns keep their original names), but it's a bit more convoluted:
library(tibble)
df <- tribble(~column_1,~column_2,
"oldLabel1", "newLabel1",
"oldLabel2", "newLabel2")
d <- tibble(oldLabel1 = NA, oldLabel2 = NA, oldLabel3 = NA)
fun <- function(dat, dict) {
names(dat) <- sapply(names(dat), function(x) ifelse(x %in% dict$column_1, dict[dict$column_1 == x,]$column_2, x))
dat
}
fun(d, df)
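You can create a function containing just one line of code. Note that pmatch also accepts unique partial matches, and columns with no match in varName are renamed to NA.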
renameDF <- function(df, varName){
setNames(df,varName[[2]][pmatch(names(df),varName[[1]])])
}

Parsing colnames text string as expression in R

I am trying to create a large number of data frames in a for loop using the "assign" function in R. I want to use the colnames function to set the column names in the data frame. The code I am trying to emulate is the following:
county_tmax_min_df <- data.frame(array(NA,c(length(days),67)))
colnames(county_tmax_min_df) <- c('Date',sd_counties$NAME)
county_tmax_min_df$Date <- days
The code I have so far in the loop looks like this:
file_vars = c('file1','file2')
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), "days")
f = 1
for (f in 1:2){
assign(paste0('county_',file_vars[f]),data.frame(array(NA,c(length(days),67))))
}
I need to be able to set the column names similar to how I did in the above statement. How do I do this? I think it needs to be something like this, but I am unsure what goes in the text portion. The end result I need is just a bunch of data frames. Any help would be wonderful. Thank you.
expression(parse(text = ))
You can set the names within assign, like this:
file_vars = c('file1', 'file2')
days <- seq.Date(from = as.Date("1979-01-01"), to = as.Date("1979-01-02"), by = "days")
for (f in seq_along(file_vars)) {
assign(x = paste0('county_', file_vars[f]),
value = {
df <- data.frame(array(NA, c(length(days), 67)))
colnames(df) <- paste0("fancy_column_",
sample(LETTERS, size = ncol(df), replace = TRUE))
df
})
}
Inside the braces you can use colnames(df) or setNames to assign column names in any way you like. Your first piece of code refers to an sd_counties object that is not available here, but the general idea should work for you.
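Applied to the column layout described in the question, the same idea might look like the sketch below. The vector county_names is a made-up stand-in for sd_counties$NAME, which is not available here, so the number of columns is taken from its length rather than hard-coded as 67:

```r
file_vars <- c("file1", "file2")
days <- seq(as.Date("1979-01-01"), as.Date("1979-01-02"), by = "day")

# Hypothetical stand-in for sd_counties$NAME
county_names <- paste0("county_", 1:3)

for (f in seq_along(file_vars)) {
  # One column for the date plus one per county, all NA to start with
  df <- data.frame(array(NA, c(length(days), length(county_names) + 1)))
  colnames(df) <- c("Date", county_names)
  df$Date <- days
  # Create county_file1, county_file2, ... in the current environment
  assign(paste0("county_", file_vars[f]), df)
}

colnames(county_file1)
```

Building the data frame first and passing the finished object to assign keeps the loop body readable. Storing the data frames in a named list would avoid assign altogether, but that is a different design from the one asked about.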

Mutate a column selected by a string converted to symbol

I'm trying to make lower case a column of my dataset
I wrote a basic stupid function
library(dplyr)
cleaning_tags<-function(data,col)
{
data<-data%>%mutate(!!sym(col)=tolower(!!sym(col)))
return (data)
}
where data is a data.frame and col is the column name as a string.
I don't understand the error I'm getting:
Error: unexpected '=' in "data%>%dplyr::mutate(!!sym("GROUPDSC") ="
The sym function seems to work correctly, because if I run
data%>%select(!!sym(col))
it selects the desired column.
Thanks.
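Try using := instead of = when assigning to the column. R's parser does not accept an expression such as !!sym(col) on the left-hand side of = inside a function call, which is why you get a syntax error; := exists for exactly this case.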
library(dplyr)
library(rlang)
cleaning_tags<-function(data,col) {
data %>% mutate(!!sym(col) := tolower(!!sym(col)))
}
df <- data.frame(a = c("ABC", "DEF"))
cleaning_tags(df, "a")
# a
#1 abc
#2 def
There are a couple of problems with your code: you can't assign the new column name with = like this, and the code is hard to read. Since col is already a string, mutate_at accepts it directly:
library(tidyverse)
cleaning_tags <- function(data, col) {
  data %>%
    mutate_at(col, tolower)
}
ir <- cleaning_tags(iris, "Species")
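When the column name is already a string, plain base R indexing with [[ ]] is another option that avoids tidy evaluation entirely. This is a minimal sketch of that alternative, not the dplyr approach from the answers above; the function name cleaning_tags_base and the sample data are made up for illustration:

```r
# Hypothetical base R variant: lower-case one column chosen by name
cleaning_tags_base <- function(data, col) {
  # [[col]] selects the column whose name is stored in the string `col`
  data[[col]] <- tolower(data[[col]])
  data
}

# Made-up example data, using the column name from the error message
df <- data.frame(GROUPDSC = c("ABC", "DEF"), stringsAsFactors = FALSE)
cleaned <- cleaning_tags_base(df, "GROUPDSC")
cleaned$GROUPDSC
```

As with the dplyr versions, the function returns a modified copy, so the result has to be assigned to a variable to be kept.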

r: lapply but with dynamic naming

Let's say I have 5 datasets in a list (each named df_1, df_2, and so on), each with a variable called cons. I'd like to execute a function over cons in each dataset in the list, and create a new variable whose name has the suffix of the corresponding dataset.
So in the end df_1 will have a variable called something like cons_1 and df_2 will have a variable called cons_2. The problem I run into is the variable looping and trying to create dynamic names.
Any suggestions?
This is actually pretty straightforward:
df_names <- paste("df", 1:5, sep = "_")
cons_names <- paste("cons", 1:5, sep = "_")
for (i in 1:5) {
  # get the df from the current env by name
  df_i <- get(df_names[i])
  # do whatever you need to do and assign the result
  df_i[[cons_names[i]]] <- some_operation(df_i)
  # write the modified copy back under its original name
  assign(df_names[i], df_i)
}
But it would make more sense to keep your data frames in a list to avoid using get, which can be sketchy:
for (i in 1:5) {
  df_list[[i]][[cons_names[i]]] <- some_operation(df_list[[i]])
}
Using the purrr package, this would be an alternative solution:
library(purrr)
lst <- list(mtcars_1 = mtcars,
mtcars_2 = mtcars,
mtcars_3 = mtcars,
mtcars_4 = mtcars,
mtcars_5 = mtcars)
map(seq_along(lst), function(x) {
lst[[x]][paste0("mpg_", x)] <- some_operation(lst[[x]]['mpg']); lst[[x]]
})
Take each data frame from the list, create the new mpg variable named with the index of the current data frame, and perform whatever operation you want on the mpg variable. The result is a list containing all the previous data frames, each with its new variable.
Since this new list doesn't have the data frame names, you can always just add them with setNames(newlist, names(lst))
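A base R sketch of the same list-based idea uses Map, which keeps the names of the first list it iterates over, so no separate setNames step is needed. The function some_operation is not defined in the answers above, so a made-up placeholder is used here, along with made-up data:

```r
# Made-up example data: a named list of data frames, each with `cons`
df_list <- list(df_1 = data.frame(cons = 1:3),
                df_2 = data.frame(cons = 4:6))

# Hypothetical placeholder for whatever needs to be computed from `cons`
some_operation <- function(x) x * 2

# Walk over the data frames and their positions in parallel
df_list <- Map(function(df, i) {
  # Build the new column name from the position: cons_1, cons_2, ...
  df[[paste0("cons_", i)]] <- some_operation(df$cons)
  df
}, df_list, seq_along(df_list))

names(df_list$df_2)
```

The suffix here comes from the position in the list. If the suffix should instead be taken from the list names (for example when they are not numbered consecutively), it would have to be extracted from names(df_list).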
