I have the following function, which takes a single parameter, df, a data frame:
test_function <- function(df) {
df_name <- df #get name of dataframe (does not work)
df_name
}
test_function(mtcars)
How do I return the name of the dataset from this function? For test_function(mtcars), I need to assign the string "mtcars" to df_name.
You can use the combo substitute + deparse
test_function <- function(df)
deparse(substitute(df))
test_function(mtcars)
##[1] "mtcars"
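One caveat worth sketching: substitute() captures the expression exactly as it was written at the call site, so the trick only gives the dataset name when the function is called directly. The names get_name and wrapper below are hypothetical, made up for this illustration.

```r
# substitute() sees the expression supplied by the immediate caller
get_name <- function(df) deparse(substitute(df))

# A wrapper passes its own argument name along, not the original object name
wrapper <- function(data) get_name(data)

direct <- get_name(mtcars)   # called directly: the dataset name
nested <- wrapper(mtcars)    # called through a wrapper: the wrapper's argument name
print(direct)
print(nested)
```

If the function has to be called from other functions, passing the name explicitly as a string is probably more robust.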
Another option is to use match.call, which, according to ?match.call,
returns a call in which all of the specified arguments are specified by their full names.
test_function <- function(df){
as.list(match.call())[-1]
}
test_function(mtcars)
$df
mtcars
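Note that the output above is a list holding a symbol, not a string. If a character string is needed, a minimal sketch (the function name name_via_match_call is hypothetical) is to deparse the matched argument:

```r
# Pull the df argument out of the matched call and turn the symbol into a string
name_via_match_call <- function(df) {
  cl <- match.call()   # the call with arguments matched by full name
  deparse(cl$df)       # cl$df is the unevaluated argument expression
}

result <- name_via_match_call(mtcars)
print(result)
```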
How does one create a named list of all dataframes/tibbles in the global environment in R? Is there a way to do this without manually hardcoding all dataframes/tibbles?
I.e. if the global environment contains the dataframes/tibbles df_1, my_data_1, science_1, all_data, how does one create an output that looks like:
files_list <- list(
df_1 = df_1,
my_data_1 = my_data_1,
science_1 = science_1,
all_data = all_data
)
We can Filter the elements that are data frames or tibbles in the environment we are working in. For the global environment, for example:
Filter(length, eapply(.GlobalEnv,
function(x) if(is.data.frame(x)||tibble::is_tibble(x)) x))
We can get all objects first, then keep only the data.frames
library(purrr)
mget(ls()) %>% keep(is.data.frame)
A base way, combining the methods of @GuedesBF and @akrun, could be to use ls, mget and Filter.
Filter(is.data.frame, mget(ls()))
#Filter(is.data.frame, mget(ls(.GlobalEnv), envir = .GlobalEnv)) #More explicit, naming the global environment
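To see what the base approach returns without depending on whatever is in your global environment, here is a small self-contained sketch. It uses a throwaway environment (env) and made-up objects, which are assumptions for illustration only; with .GlobalEnv the call has the same shape.

```r
# Build a scratch environment with two data frames and one non-data-frame object
env <- new.env()
assign("df_1", data.frame(x = 1:3), envir = env)
assign("my_data_1", data.frame(y = letters[1:2]), envir = env)
assign("not_a_df", 1:10, envir = env)

# mget() returns a named list of all objects; Filter() keeps only the data frames
files_list <- Filter(is.data.frame, mget(ls(env), envir = env))
print(names(files_list))
```

Because mget() names the list elements after the objects, the result already has the df_1 = df_1 shape asked for in the question.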
Please try the code below, which builds a data frame of the data frame names and their classes:
library(dplyr)   # %>%
library(tibble)  # rownames_to_column()
library(tidyr)   # pivot_wider()
naml <- list()
for (j in ls(envir = .GlobalEnv)) {
obj <- get(j, envir = .GlobalEnv)
# keep only data frames (tibbles included) and record their first class
if (inherits(obj, "data.frame")) {
naml[[j]] <- data.frame(namex = j, classx = class(obj)[1])
}
}
df2 <- do.call(rbind, naml) %>% rownames_to_column('name') %>%
pivot_wider(names_from = name, values_from = namex)
I am trying to figure out how to write a function in R that can select specific columns from a dataframe (df) for subsetting:
Essentially I have df with columns or colnames : count_A.x, count_B.x, count_C.x, count_A.y, count_B.y, count_C.y.
I would ideally like a function where I can select both "count_A.x" and "count_A.y" columns by simply specifying count_A in function argument.
I tried the following, e.g.:
pull_columns2 <- function(df,count_char){
df_subset<- df%>% select(,c(count_char.x, count_char.y))
}
Unfortunately, when I run the following call, it rightfully says that column count_char.x does not exist; count_char.x is not "converted" to count_A:
pull_columns2(df, count_A)
We can use
pull_columns2 <- function(df,count_char){
df_subset<- df %>% select(contains(count_char))
df_subset
}
#> then use it as follows
df %>% pull_columns2("count_A")
Try
select_func = function(df, pattern){
return(df[colnames(df)[which(grepl(pattern, colnames(df)))]])
}
df = data.frame("aaa" = 1:10, "aab" = 1:10, "bb" = 1:10, "ca" = 1:10)
select_func(df,"b")
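Both answers above match on a substring, so "count_A" would also pick up a column such as count_AB.x. If only the exact .x/.y pair is wanted, one possible base R sketch is to build the names explicitly. The function name pull_columns_exact, the suffixes argument, and the toy data are all hypothetical.

```r
# Select exactly prefix + suffix columns, failing loudly if any are absent
pull_columns_exact <- function(df, prefix, suffixes = c(".x", ".y")) {
  wanted <- paste0(prefix, suffixes)
  absent <- setdiff(wanted, names(df))
  if (length(absent) > 0) {
    stop("Columns not found: ", paste(absent, collapse = ", "))
  }
  df[, wanted, drop = FALSE]  # drop = FALSE keeps a data frame even for one column
}

toy <- data.frame(count_A.x = 1:2, count_B.x = 3:4,
                  count_A.y = 5:6, count_AB.x = 7:8)
picked <- pull_columns_exact(toy, "count_A")
print(names(picked))
```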
I want to manipulate the names of all the columns in a dataframe with this function that I wrote:
clean_names <- function(df) {
names(df) <- tolower(names(df))
names(df) <- gsub('\\s', '\\_', names(df))
names(df) <- gsub('\\(|\\)|\\/|,|\\.', '\\_', names(df))
names(df) <- gsub('(\\_)\\_', '\\1', names(df))
names(df) <- gsub('\\_$', '', names(df))
}
That said, when actually called, it doesn't do anything (no error just nothing).
What's the problem here?
I suspect the problem is that I'm only assigning things and not returning anything. But in this case I don't want to return a value just change the column names.
The only parameter here is df and I'm calling the names() function multiple times. Shouldn't this work? Any help is appreciated!
Two things here:
R generally does not operate by side effect: while you may pass a data.frame into a function, the first time you change anything about it, the df in the function is copied into a new object that goes away when the function is done. The original frame is untouched. Some functions in R do operate by side effect, but most do not. Because of this, you cannot make changes inside the function and assume they will have an effect outside of it. Instead, you need to assign the result back to the frame, as in:
mydata <- clean_names(mydata)
When there is no literal return(.) statement in a function, R returns the value of the last expression evaluated; if that expression is an assignment, the value is returned invisibly. You will often see functions end with the desired object (df here) without using the literal return function; that function is useful in some circumstances but usually not needed.
Because the value of an assignment is invisible, nothing is printed when you call the function. You can see what is really happening by capturing the return value in a new variable or, as a shortcut, by wrapping the call in parentheses: (clean_names(mydata)). My gut feeling is that the output from that function is a vector of strings.
Why? Because the last expression is a reassignment of names. The RHS of that assignment is producing a character vector, and that is passed to the `names<-` function on the LHS, and that value (the vector of strings) is then used as the return value of the function.
The resolution here is to add df (or return(df) if you must) to the end of your function, as in:
clean_names <- function(df) {
names(df) <- tolower(names(df))
names(df) <- gsub('\\s', '\\_', names(df))
names(df) <- gsub('\\(|\\)|\\/|,|\\.', '\\_', names(df))
names(df) <- gsub('(\\_)\\_', '\\1', names(df))
names(df) <- gsub('\\_$', '', names(df))
df
}
After doing both of those steps, you should get the data frame back with the cleaned names.
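Both points can be checked on a toy example. The sketch below uses a cut-down, hypothetical pair of functions (only the tolower step) rather than the full clean_names, just to keep it short.

```r
# Version without a final df: the last expression is the names<- assignment
lower_names_no_return <- function(df) {
  names(df) <- tolower(names(df))
}

# Version that ends with df, so the modified copy is returned
lower_names <- function(df) {
  names(df) <- tolower(names(df))
  df
}

toy <- data.frame(A = 1, B = 2)

res_bad  <- lower_names_no_return(toy)  # value of the assignment: a character vector
res_good <- lower_names(toy)            # the modified data frame

print(res_bad)
print(names(res_good))
print(names(toy))  # the original is untouched until the result is assigned back
```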
From the names documentation:
For names<-, the updated object. (Note that the value of names(x) <- value is that of the assignment, value, not the return value from the left-hand side.)
Therefore you should try:
clean_names <- function(df) {
names(df) <- tolower(names(df))
names(df) <- gsub('\\s', '\\_', names(df))
names(df) <- gsub('\\(|\\)|\\/|,|\\.', '\\_', names(df))
names(df) <- gsub('(\\_)\\_', '\\1', names(df))
names(df) <- gsub('\\_$', '', names(df))
return(df)
}
I want to write a helper function that summarizes the percentage change for column A, B and C in one shot. I want to pass a string to the "mutate" argument of dplyr with the help of rlang. Unfortunately, I get an error saying that I have an unexpected ",". Could you please take a look? Thanks in advance!
library(rlang) #read text inputs and return vars
library(dplyr)
set.seed(10)
dat <- data.frame(A=rnorm(10,0,1),
B=rnorm(10,0,1),
C=rnorm(10,0,1),
D=2001:2010)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
#create new variable names
mutate_varNames <- paste0(target_Var_list,rep("_pct_chg = ",length(target_Var_list)))
#generate text for formula
mutate_formula <- lapply(target_Var_list,function(x){output <- paste0("(",x,"-lag(",x,"))/lag(",x,")");return(output)})
mutate_formula <- unlist(mutate_formula) #convert list to a vector
#generate arguments for mutate
mutate_args <<- paste0(mutate_varNames,collapse=",",mutate_formula)
#data manipulation
output <- input_data %>%
arrange(!!parse_quo(year_Var_name,env=caller_env())) %>%
mutate(!!parse_quo(mutate_args,env=caller_env()))
#output data frame
return(output)
}
# error: unexpected ','
calc_perct_chg(input_data =dat,
target_Var_list=list("A","B","C"),
year_Var_name="D")
I don't think it's a good idea to evaluate strings as code, and I think you are over-complicating it. Using across(), this should be easier.
library(dplyr)
calc_perct_chg <- function(input_data,
target_Var_list,
year_Var_name){
input_data %>%
arrange(across(all_of(year_Var_name))) %>%
mutate(across(all_of(target_Var_list), ~(.x - lag(.x))/lag(.x)))
}
calc_perct_chg(input_data = dat,
target_Var_list = c("A","B","C"),
year_Var_name = "D")
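The version above overwrites A, B and C. Since the question wanted new columns with a _pct_chg suffix, here is a sketch of one way to get them using the .names argument of across(). The function name pct_chg_cols and the toy data are hypothetical, and .data[[...]] is used for the sort column as an alternative to across() inside arrange().

```r
library(dplyr)

# Add <col>_pct_chg columns instead of overwriting the originals
pct_chg_cols <- function(input_data, target_vars, year_var) {
  input_data %>%
    arrange(.data[[year_var]]) %>%
    mutate(across(all_of(target_vars),
                  ~ (.x - dplyr::lag(.x)) / dplyr::lag(.x),
                  .names = "{.col}_pct_chg"))
}

# Toy data, deliberately out of year order
toy <- data.frame(A = c(4, 2, 1), B = c(40, 20, 10), D = c(2003, 2002, 2001))
res <- pct_chg_cols(toy, c("A", "B"), "D")
print(res)
```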
I would like to create a function that updates a data frame from a different environment. Specifically, I would like to update the labels of a data frame using the Hmisc::label() function.
assign_label <- function(df, col) {
col <- rlang::as_name(rlang::ensym(col))
Hmisc::label(df[,col]) <- fetch_label(col)
}
fetch_label <- function(col) {
val <- c("mpg" = "MPG",
"hp" = "HP")
unname(val[col])
}
The following code executes without issue: assign_label(mtcars, hp)
However, it does not actually alter the data frame in the calling environment. I just can't figure out how to make it do what I imagine.
Ideally, I would like to be able to pipe a dataframe to this function as such:
mtcars %>% assign_label(mpg)
1) Return modified object. Modifying objects in place is discouraged in R. The usual way to do this is to return the data frame and then assign it to a new name, or back to the original name, clobbering or shadowing it.
assign_label <- function(df, col) {
col <- deparse(substitute(col))
Hmisc::label(df[[col]]) <- fetch_label(col)
df
}
mtcars_labelled <- mtcars %>% assign_label(mpg)
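To illustrate why the assignment back is needed, here is a dependency-free sketch. It sets a "label" attribute with base attr() as a stand-in for Hmisc::label (that substitution is an assumption made to avoid the package dependency), and set_label and toy are hypothetical names.

```r
# Sets a "label" attribute on one column and returns the modified copy
set_label <- function(df, col, label) {
  attr(df[[col]], "label") <- label
  df
}

toy <- data.frame(mpg = c(21, 22.8))

# Result discarded: the caller's toy is not changed
invisible(set_label(toy, "mpg", "MPG"))
before <- attr(toy$mpg, "label")

# Result assigned back: now the label is there
toy <- set_label(toy, "mpg", "MPG")
after <- attr(toy$mpg, "label")

print(before)
print(after)
```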
2) magrittr. Despite what we have said above, there are some facilities for modifying in place in R and in some R packages. The magrittr package provides a syntax for overwriting or shadowing the input. Using the definition in (1) we can write:
library(magrittr)
mtcars %<>% assign_label(mpg)
If mtcars were in the global environment, it would overwrite it with the new value, but in this case mtcars is in datasets, so a new mtcars is written to the caller and the original in datasets is unchanged.
3) replacement function. Although not widely used, R does provide replacement functions, which are defined and used like this. This does overwrite or shadow the input.
`assign_label<-` <- function(df, value) {
Hmisc::label(df[[value]]) <- fetch_label(value)
df
}
assign_label(mtcars) <- "mpg"
Note
As an aside, if the aim is for an interface that is consistent with tidyverse then use tidyselect to retrieve the column name(s) so that examples like the following work:
assign_labels <- function(df, col) {
nms <- names(select(df, {{col}}))
for(nm in nms) Hmisc::label(df[[nm]]) <- fetch_label(nm)
df
}
mtcars_labelled <- mtcars %>% assign_labels(starts_with("mp"))
str(mtcars_labelled)
mtcars_labelled <- mtcars %>% assign_labels(mpg|hp)
str(mtcars_labelled)
In regards to the comments about not modifying outside of the scope of a function, I created two functions that assign new dataframes with labels.
fetch_label <- function(col) {
val <- c("mpg" = "MPG",
"hp" = "HP")
unname(val[col])
}
assign_label <- function(df, col) {
col <- rlang::as_name(rlang::ensym(col))
Hmisc::label(df[[col]]) <- fetch_label(col)
return(df)
}
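assign_labels <- function(df) {
# iwalk passes each column as .x and its name as .y
purrr::iwalk(df, function(.x, .y) {
lab <- fetch_label(.y)
# skip columns that have no entry in fetch_label()
if (!is.na(lab)) Hmisc::label(df[[.y]]) <<- lab
})
return(df)
}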
mtcars <- mtcars %>% assign_label(hp)
mtcars <- mtcars %>% assign_labels()