Retrieve the name of a list element in R

How can you access the name of a list element within a function if you pass not the whole list but only the list element (dataframe)?
I have a named list of dataframes, e.g.
files <- list(BRX = -0.72, BRY = -0.72, BRZ = -0.156, BTX = -0.002, BTY = -0.002,
BTZ = -0.0034)
Later in the code, I will use a single list element as input for a plot function. This plot function shall also print the list element's name. How can I access it?
I have the following solution - it works but is a bit cumbersome:
map2(files, names(files),
function(file, filename) {
data.table::setattr(file, "filename", filename)
})
Later, I can retrieve the filename as attribute within the plot function by:
plotfunction <- function(list_element, ...) {
...
filename <- attr(list_element, "filename")
...
+ ggtitle(filename)
...
}
Is there a more elegant alternative solution, either by a different way to access the list element name, or by setting the filename attribute differently?

One straightforward approach might be to pass the data to the plot function plot_fun as a list (using single brackets [), instead of as a list element (using double brackets [[). In this way, the list element's name will directly be available inside the plot function:
## dummy list of datasets
data_ls <- list(`Dataset 1` = data.frame(x = 1:10, y = 1:10), `Dataset 2` = data.frame(x = 1:10, y = 2 * (1:10)))
## dummy plot function
plot_fun <- function(data_el, ...) {
## base graphics calls are separate statements, not joined with +
plot(data_el[[1]], ...)
title(names(data_el))
}
plot_fun(data_ls["Dataset 1"], type = "l")
plot_fun(data_ls[2], type = "l")
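To see why the single-bracket call carries the name along, it may help to compare what [ and [[ return. This is only a minimal sketch with made-up data; sub_list and element are illustrative names that do not appear in the question.

```r
## illustrative data, not the asker's real list
data_ls <- list(`Dataset 1` = data.frame(x = 1:3),
                `Dataset 2` = data.frame(x = 4:6))

sub_list <- data_ls["Dataset 1"]    # still a list of length 1, name kept
element  <- data_ls[["Dataset 1"]]  # the bare data frame, list name dropped

names(sub_list)  # the list element's name
names(element)   # only the column names of the data frame
```

Inside a function that receives element, the list name is no longer recoverable, which is why it has to be passed separately or stored as an attribute.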
Edit: to call plot_fun for each list element in data_ls, we could modify plot_fun to accept a data and name argument, and then call lapply, Map, mapply or purrr's walk2 or map2 (walk2 is preferable, since plot_fun is called for its side-effects).
## modified dummy plot function
plot_fun <- function(data, name, ...) {
## base graphics calls are separate statements, not joined with +
plot(data, ...)
title(name)
}
## using lapply
lapply(seq_along(data_ls), function(i) plot_fun(data_ls[[i]], names(data_ls)[i], type = "l"))
## or with Map
Map(plot_fun, data = data_ls, name = names(data_ls), type = "l")
## or with purrr
purrr::walk2(.x = data_ls, .y = names(data_ls), .f = plot_fun, type = "l")
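If the attribute approach from the question is preferred, the same thing can probably be done in base R without data.table or purrr. The sketch below assumes the list holds data frames (the values here are invented), and it is worth keeping in mind that custom attributes may not survive later subsetting or dplyr operations.

```r
## illustrative list of data frames (values are invented)
files <- list(BRX = data.frame(x = 1:3),
              BRY = data.frame(x = 4:6))

## attach each element's own name as a "filename" attribute;
## Map() keeps the names of its first list argument
files <- Map(function(file, filename) {
  attr(file, "filename") <- filename
  file
}, files, names(files))

attr(files[["BRY"]], "filename")
```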

Related

Error on final object when generating ggplot objects in for loop with dplyr select()

I want to make many plots using multiple pairs of variables in a dataframe, all with the same x. I store the plots in a named list. For simplicity, below is an example with only 1 variable in each plot.
Key to this function is a select() call that is clearly not necessary here but is with my actual data.
The body of the function works fine on each variable, but when I loop through a list of variables, the last one in the list always produces
Error in get(ll): object 'd' not found.
(or whatever the last variable, if not 'd'). Replacing data <- df %>% select(x,ll) with data <- df avoids the error.
## make data
df2 <- data.frame(x = 1:10,
a = 1:10,
b = 2:11,
c = 101:110,
d = 10*(1:10))
## make function
testfun <- function(df = df2, vars = letters[1:4]){
## initialize list to store plots
plotlist <- list()
for (ll in vars){
## subset data
data <- df %>% select(x, ll) ## comment out select() to get working function
# print(data) ## uncomment to check that dataframe subset works correctly
## plot variable vs. x
p <- ggplot(data,
aes(x = x, y = get(ll))) +
geom_point() +
ylab(ll)
## add plot to named list
plotlist[[ll]] <- p
# print(p) ## uncomment to see that each plot is being made
}
return(plotlist) ## unnecessary, being explicit for troubleshooting
}
## use function
pl <- testfun(df2)
## error ?
pl
I have a work-around that avoids select() by renaming variables in my actual dataframe, but I am curious why this does not work? Any ideas?
get() could work, but not with ll directly. Try y = get(!!ll) or y = .data[[ll]].
ggplot (or maybe aes, it's hard to tell) waits to run this code until its plot object is referenced, as the error in the provided code demonstrates. By the time each ggplot evaluates get(ll), the for loop has already finished. So ll evaluates to the last value of the loop variable, "d", for all four ggplots. ll being "d" in the error makes it seem like it's the final ggplot object that fails, but it's actually evaluating the first one that causes this error.
In the body of the loop we'd like a way to evaluate the ll variable and stick that resulting string ("a", "b", "c", or "d") into this code, the rest of which won't run until later. Changing y = get(ll) to y = get(!!ll) is one way to do this: !! performs "surgery" on the unevaluated expression (called a "blueprint for code" in Tidyverse docs) so that the expression passed into ggplot contains a literal string like "a" instead of the variable reference ll.
testfun <- function(df = df2, vars = letters[1:4]){
plotlist <- list()
for (ll in vars){
data <- df %>% select(x, ll)
p <- ggplot(data,
aes(x = x, y = get(!!ll))) +
geom_point() +
ylab(ll)
plotlist[[ll]] <- p
}
return(plotlist)
}
Read on for explanation and an alternate solution.
The loop problem: late binding
In a given function or in the global scope in R, there's just one variable of any given name. A for (x in xs) loop repeatedly rebinds that variable to a new value. That means that after a for loop has finished, that variable still exists and retains the last value it was assigned. Here's a way this can trip you up:
vars <- c("a", "b", "c", "d")
results <- list()
for (ll in vars){
message("in for loop, ll: ", ll)
func <- function () { ll }
results[[ll]] <- c(ll, func)
}
message("after for loop, ll: ", ll)
# after for loop, now ll is "d"
for (vec in results) {
message(vec[[1]], " ", vec[[2]]())
}
This outputs
in for loop, ll: a
in for loop, ll: b
in for loop, ll: c
in for loop, ll: d
after for loop, ll: d
a d
b d
c d
d d
Each of the four functions constructed here uses the same outer-scope variable ll which, by the time the functions are actually called after the for loop, is "d". The late binding part is that the value of the variable at function call time (late) is used when looking up its value, not the value of the variable when the function is defined (early).
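For contrast, here is a small sketch of one way around late binding: build the closures inside a function call so that each one gets its own variable. The names loop_funs, apply_funs and v are made up for this illustration.

```r
vars <- c("a", "b", "c", "d")

## closures made in a for loop all share the single variable `ll`
loop_funs <- list()
for (ll in vars) {
  loop_funs[[ll]] <- function() ll
}

## closures made inside lapply() each get their own `v`;
## force() makes the evaluation of the argument explicit
apply_funs <- lapply(vars, function(v) {
  force(v)
  function() v
})

loop_out  <- unname(vapply(loop_funs, function(f) f(), character(1)))
apply_out <- vapply(apply_funs, function(f) f(), character(1))

loop_out   # every closure sees the last value of ll
apply_out  # every closure sees its own value
```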
The NSE problem
The OP isn't creating functions in a loop though, they're calling ggplot. ggplot does something similar to creating a function: it takes some code as an argument that it doesn't evaluate until later. ggplot (or maybe aes) "captures" code from some of its arguments instead of running it. In OP's case, get(ll) isn't evaluated until later.
When this code is evaluated it's in a new context with a "data mask" that allows names of a data frame to be referenced directly. This part is great, it's what we want — this is what makes get("a") work at all. But the fact that the evaluation happens later is a problem for the OP: ll in get(ll) evaluated to "d", like get("d"), because the code is evaluated after the for-loop iteration where ll had the expected value.
Ignoring the data mask part, here's a function called run.later that, like ggplot, doesn't run one of its arguments. When we run that code later, we again find that ll evaluates to "d" for all four of the saved expressions.
vars <- c("a", "b", "c", "d")
unevaluated.exprs <- list();
run.later <- function(name, something) {
expr <- substitute(something)
unevaluated.exprs[[name]] <<- c(name, expr)
}
for (ll in vars){
run.later(ll, ll)
}
for (vec in unevaluated.exprs) {
message(c(vec[[1]], " ", eval(vec[[2]])))
}
prints
a d
b d
c d
d d
That's the ll part of the problem. The rule of thumb from languages like Python of "Don't define functions in a loop (if they reference loop variables)" could be generalized for R to "don't define functions or otherwise write code that won't be immediately evaluated in a loop (if that code references loop variables)."
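The inlining that !! performs can be imitated in base R with bquote(), which may make the idea easier to see without ggplot. This is only a sketch: dat, late and early are invented names, and bquote() stands in for the tidyverse machinery rather than reproducing it.

```r
vars <- c("a", "b", "c", "d")
dat  <- data.frame(a = 1, b = 2, c = 3, d = 4)

late  <- list()
early <- list()
for (ll in vars) {
  late[[ll]]  <- quote(dat[[ll]])       # keeps the symbol `ll` in the expression
  early[[ll]] <- bquote(dat[[.(ll)]])   # inlines the current string, e.g. dat[["a"]]
}

## evaluate everything after the loop has finished
late_vals  <- vapply(late,  function(e) eval(e), numeric(1))
early_vals <- vapply(early, function(e) eval(e), numeric(1))

late_vals   # all four look up the final value of ll
early_vals  # each one looks up its own column
```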
Fixing the scope problem instead of metaprogramming
The !! solution provided at the top uses metaprogramming to evaluate the ll variable in the loop instead of evaluating it later.
Theoretically, one could instead dynamically create variables in each iteration of a loop, then carefully reference that dynamically created variable name with metaprogramming. But a more elegant way would be to use the same variable name but in different scopes. This is what Nithin's answer does with a function: every function creates a new scope and tada, you can use the same variable name in each. Here's another version of that, closer to OP's code:
testfun <- function(df = df2, vars = letters[1:4]){
plotlist <- list()
plot.fn <- function(var) {
data <- df %>% select(x, var)
p <- ggplot(data,
aes(x = x, y = get(var))) +
geom_point() +
ylab(var)
plotlist[[var]] <<- p
}
for (ll in vars){
plot.fn(ll)
}
return(plotlist)
}
pl <- testfun(df2)
pl
There are 4 distinct variables called var in this code, and each iteration of the loop references a different one.
Prettier metaprogramming
I think (haven't tested) that get(!!ll) could also be written as !!sym(ll) (with rlang's sym()) or as .data[[ll]] here: get() looks up a string as a variable, and that is also what injecting the symbol named by the string in ll into the expression does. Double curlies ({{ll}}) are more common in tidyverse code, but they are meant for function arguments that hold an unevaluated column name. As far as I understand, applied to a plain string variable like ll they would inject the string itself rather than the column, so they are probably not a drop-in replacement in this loop.
Write a custom function like this:
plot_fn <- function(df, y){
df %>% ggplot(aes(x=x,
y=get(y)))+
geom_point()+
ylab(y)
}
Iterate over plots with purrr::map:
map(letters[1:4],~plot_fn(df=df2,y=.x))
The issue is that we cannot use get to access dplyr/tidyverse data in a "programming" paradigm. Instead, we should use non standard evaluation to access the data. I offer a simplified function below (originally I thought it was a function masking issue as I quickly skimmed the question).
testfun <- function(df = df2, vars = letters[1:4]){
lapply(vars, function(y) {
ggplot(df,
aes(x = x, y = .data[[y]] )) +
geom_point() +
ylab(y)
})
}
Calling
plots <- testfun(df2)
plots[[1]]
EDIT
Since OP would like to know what the issue is, I have used a traditional loop as requested
testfun2 <- function(df = df2, vars = letters[1:4]){
## initialize list to store plots
plotlist <- list()
for (ll in vars){
## subset data
d_t <- df %>% select(x, ll) ## comment out select() to get working function
# print(d_t) ## uncomment to check that dataframe subset works correctly
## plot variable vs. x
p <- ggplot(d_t,
aes(x = x, y = .data[[ll]])) +
geom_point() +
ylab(ll)
## add plot to named list
plotlist[[ll]] <- p
# print(p) ## uncomment to see that each plot is being made
}
plotlist
}
pl <- testfun2(df2)
pl[[1]]
The reason get does not work is that we need to use non-standard evaluation as the docs state. Related questions on using get may be useful.

Trouble correctly returning object name in for loop

I'm trying to automate making a series of the same plot using different objects; I'm working with S4 class phyloseq objects. When I use a for loop to iterate over a list of objects and try to use the object name as a title for each plot and in a filename for ggsave I can't quite get it to recognize the correct name, though it's making the correct plots for a given object in the list.
I've tried using variations of deparse(substitute(object)) with get() and quote() and end up getting slightly different, but still off-target results.
object_list <- c(object1, object2, object3)
automate_graphs <- function(x){
for(object in x){
name <- deparse(substitute(object))
ordination <- ordinate(object, "NMDS", "bray")
plot <- plot_ordination(object, ordination) + ggtitle(label = name)
ggsave(plot, filename=sprintf("NMDS_bray_%s.pdf", name), height=4, width=7)}}
automate_graphs(object_list)
I'm expecting to save 3 pdfs named NMDS_bray_object1, NMDS_bray_object2, NMDS_bray_object3.
Instead I get NMDS_bray_S4 object of class structure("phyloseq", package = "phyloseq") (so it's saving the deparse of the object to the variable name rather than the substitution) or with quote I get NMDS_bray_object which I suppose is to be expected haha. Thanks in advance for any help!
Just make it a named vector (list) of objects and iterate over the names:
object_list <- c(object1 = object1,object2 = object2,object3 = object3)
automate_graphs <- function(x){
for(nm in names(x)){
object <- x[[nm]] # pick out the one named nm ([[ returns the object itself, [ would return a one-element list)
ordination <- ordinate(object, "NMDS", "bray")
plot <- plot_ordination(object, ordination) + ggtitle(label = nm)
ggsave(plot, filename=sprintf("NMDS_bray_%s.pdf", nm), height=4, width=7)
}}
automate_graphs(object_list)
@Joran thanks again for the help. It pushed me in the right direction to figuring out a solution, even if it's not the most elegant. I took the idea of generating a vector of names and then just created an extra variable to cycle through that vector. This way it maintains the class of the object and uses a separate vector of corresponding names:
object_list <- c("object1" = object1, "object2" = object2, "object3" = object3)
automate_graphs <- function(x){
names = names(x)
obj_num = 1
for(object in x){
name <- names[obj_num]
ordination <- ordinate(object, "NMDS", "bray")
plot <- plot_ordination(object, ordination) + ggtitle(label = name)
ggsave(plot, filename=sprintf("NMDS_bray_%s.pdf", name), height=4, width=7)
obj_num = obj_num + 1
}
}
automate_graphs(object_list)
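The name-based loop can be tried out without phyloseq. The sketch below uses plain integer vectors as stand-ins for the S4 objects and only builds the file names instead of calling ordinate() or ggsave(); file_names is an invented helper variable.

```r
## stand-ins for the phyloseq objects (assumption: any named list behaves the same)
object_list <- list(object1 = 1:3, object2 = 4:6, object3 = 7:9)

file_names <- character(0)
for (nm in names(object_list)) {
  object <- object_list[[nm]]   # the object itself, class preserved
  ## the real code would compute the ordination and plot here
  file_names <- c(file_names, sprintf("NMDS_bray_%s.pdf", nm))
}

file_names
```

deparse(substitute(object)) cannot recover these names inside the loop, because by then the loop variable only holds a value and no longer knows which symbol it came from.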

How to pass list of arguments to method in R?

I'm trying to pass arguments as a list to a method. I'm creating methods of stuff to pass to a data.frame. Example:
dfApply <- function(df, ...) {
UseMethod("dfApply", df)
}
dfApply.sample <- function(df, size, ...) {
# Stuff
df <- sample_frac(df, size = size)
return(df)
}
Now, if I call the function:
args <- list(size = 0.5)
class(df) <- c("sample", class(df))
df <- dfApply(df, args)
The method still receives it as a list().
Is there a way to pass arguments like this?
EDIT:
As mentioned in the comments, do.call() solves the problem (for now), but I have to define every argument in args:
args <- list(df = df, size = 0.5)
class(df) <- c("sample", class(df))
df <- do.call(dfApply, args)
Is this a wise way to implement methods? Doesn't seem right.
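One possible way to avoid putting df into args by hand is to prepend it at call time with c(list(df), args). The following is a hedged sketch, not an established answer to the question: it replaces dplyr's sample_frac with a deterministic base R stand-in (taking the first rows) so that it runs without extra packages, and dat and half are invented names.

```r
dfApply <- function(df, ...) {
  UseMethod("dfApply")
}

## stand-in method: keeps the first `size` fraction of rows
## (the original used sample_frac(); this is a simplification)
dfApply.sample <- function(df, size, ...) {
  n_keep <- ceiling(nrow(df) * size)
  df[seq_len(n_keep), , drop = FALSE]
}

dat <- data.frame(x = 1:10)
class(dat) <- c("sample", class(dat))

args <- list(size = 0.5)

## the data goes first (matched to df), the remaining arguments come from the list
half <- do.call(dfApply, c(list(dat), args))
nrow(half)
```

Calling dfApply(dat, args) directly would instead pass the whole list as the size argument, which is the behaviour described in the question.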

R dplyr 0.7.2 - functional programming. Resolving dataframe name

I am writing a R function using dplyr 0.7.2 syntax to pass input and output data frame names and a column name to sort on. The following is the code I have.
#test data frame creation
lb<- data.frame(study = replicate(25,"ABC"),
subjid = c("x1","x2","x3","x4","x5"),
visit = c("SCREENING","VISIT1","VISIT2","VISIT3","EOT"),
visitn = c(-1,1,2,3,4),
param = c("ALB","AST","HGB","HCT","LDL"),
aval = replicate(5, sample(c(20:100), 1, rep = TRUE)))
#sort function- user to provide input/output df names and column name to sort on
sortdf <- function(ind,outd,col){
col <- enquo(col)
outd <- ind %>% arrange(!!col)
outd <<- outd # return dataframe to workspace
}
sortdf(lb,lb_sort, visitn)
The above code works, but the output data frame name is not resolved to lb_sort. The output data frame is instead named after the associated parameter (outd). Need some help!
Thanks,
Prasanna
You do not need to make use of <<- in this context. In effect, your function is a wrapper for arrange:
my_sort <- function(df, col) {
col <- enquo(col)
df %>%
arrange(!!col)
}
my_sort(df = lb, col = visitn)
Then you could create your objects as usual:
my_sort(df = lb, col = visitn) -> sorted_stuff
Edit
As per request, forcing creation of a named object in the parent environment.
my_sort <- function(df, col, some_name) {
col <- enquo(col)
df %>%
arrange(!!col) -> dta_a
# Gather env. inf
e <- environment() # current environment
p <- parent.env(e)
# Create object in parent env.
assign(x = some_name,
value = dta_a,
envir = p)
# If desired return another object
# return(some_other_data)
}
my_sort(df = lb, col = visitn, some_name ="created_data")
Explanation
The e and p objects gather the function's current and parent (enclosing) environments.
assign takes a string and creates a named object in the function's parent environment, which is the global environment when the function is called as in the example.
Remarks
This is odd behaviour, when called as shown:
> ls()
[1] "lb" "my_sort"
> my_sort(df = lb, col = visitn, some_name ="created_data")
> ls()
[1] "created_data" "lb" "my_sort"
The function leaves "created_data" object in global environment. This is inconsistent with expected behaviour where the user would usually create objects:
my_sort(df = lb, col = visitn) -> created_data
and I wouldn't encourage using it. If the actual problem is concerned with returning multiple objects a potentially better approach may involve packing all the results into a list and returning one list:
list(result_1 = mtcars,
result_2 = airquality)
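As a sketch of the list-returning pattern, the function below sorts a data frame and returns the result together with a second value in one named list. It uses base R only; sort_and_count, the column passed as a string, and the tiny lb data frame are all assumptions made for the illustration.

```r
## hypothetical helper: returns several results in one named list
sort_and_count <- function(df, col) {
  sorted <- df[order(df[[col]]), , drop = FALSE]
  list(sorted = sorted, n_rows = nrow(sorted))
}

## tiny illustrative data frame, not the asker's lb
lb <- data.frame(visitn = c(3, -1, 2),
                 param  = c("HGB", "ALB", "AST"))

res <- sort_and_count(lb, "visitn")
res$sorted$visitn
res$n_rows
```

The caller then chooses the object name with an ordinary assignment, and nothing is written into the global environment as a side effect.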

Pass arguments in nested function to update default arguments

I have nested functions and wish to pass arguments to the deepest function. That deepest function will already have default arguments, so I will be updating those argument values.
My mwe is using plot(), but in reality I'm working with png(), with default height and width arguments.
Any suggestions?
f1 <- function(...){ f2(...)}
f2 <- function(...){ f3(...)}
f3 <- function(...){ plot(xlab="hello1", ...)}
#this works
f1(x=1:10,y=rnorm(10),type='b')
# I want to update the default xlab value, but it fails:
f1(x=1:10,y=rnorm(10),type='b', xlab='hello2')
In your f3(), "hello1" is not a default value for xlab in the list of function's formal arguments. It is instead the supplied value in the function body, so there's no way to override it:
f3 <- function(...){ plot(xlab="hello1", ...)}
I suspect you meant instead to do something like this.
f1 <- function(...){ f2(...)}
f2 <- function(...){ f3(...)}
f3 <- function(..., xlab="hello1") plot(..., xlab=xlab)
## Then check that it works
par(mfcol=c(1,2))
f1(x=1:10,y=rnorm(10),type='b')
f1(x=1:10,y=rnorm(10),type='b', xlab='hello2')
(Do notice that the formal argument xlab must follow the ... argument here, so that it can only be matched exactly (and not by partial matching). Otherwise, in the absence of an argument named xlab, it'll get matched by an argument named x, potentially (and actually here) causing you a lot of grief.)
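The partial-matching effect can be checked without drawing anything. In this sketch, before_dots and after_dots are invented functions that simply report what ended up in xlab and in the dots.

```r
## xlab placed before ... can be matched partially
before_dots <- function(xlab = "hello1", ...) {
  list(xlab = xlab, dots = list(...))
}

## xlab placed after ... can only be matched by its full name
after_dots <- function(..., xlab = "hello1") {
  list(xlab = xlab, dots = list(...))
}

r1 <- before_dots(x = 1:3)  # x is taken as a partial match for xlab
r2 <- after_dots(x = 1:3)   # x stays in ..., xlab keeps its default

r1$xlab
r2$xlab
```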
My usual approach for modifying arguments in ... is as follows:
f1 = function(...) {
dots = list(...)
if (!('ylab' %in% names(dots))) {
dots$ylab = 'hello'
}
do.call(plot, dots)
}
# check results
f1(x = 1:10, y = rnorm(10))
f1(x = 1:10, y = rnorm(10), ylab = 'hi')
What happens here is that ... is captured in a list called dots. Next, R checks if this list dots contains any information about ylab. If there is no information, we set it to a specified value. If there is information, we do nothing. Last, do.call(a, b) is a function that basically stands for "execute function a with arguments b".
edit
This works better with multiple default arguments (and probably also better in general).
f1 = function(...) {
# capture ... in a list
dots = list(...)
# default arguments with their values
def.vals = list(bty = 'n', xlab = 'hello', las = 1)
# find elements in dots by names of def.vals. store those that are NULL
ind = unlist(lapply(dots[names(def.vals)], is.null))
# fill empty elements with default values
dots[names(def.vals)[ind]] = def.vals[ind]
# do plot
do.call(plot, dots)
}
f1(x = 1:10, y = rnorm(10), ylab = 'hi', bty = 'l')
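A shorter variant of the same idea might use utils::modifyList(), which overlays one named list on another. This is a sketch under one stated assumption: every argument in ... is named, since modifyList() ignores unnamed elements. with_defaults is an invented name, and the function returns the merged arguments instead of plotting so that the result can be inspected.

```r
## hypothetical helper: merge user arguments over a set of defaults
with_defaults <- function(...) {
  defaults <- list(bty = "n", xlab = "hello", las = 1)
  ## user-supplied values win; arguments not in defaults are appended
  utils::modifyList(defaults, list(...))
}

merged <- with_defaults(ylab = "hi", bty = "l")
merged$bty   # overridden by the caller
merged$xlab  # default kept
```

The merged list could then be handed to do.call(plot, ...) in the same way as dots in the answer above.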
