How to find object name passed to function - r

I have a function which takes a dataframe and its columns and processes it in various ways (left out for simplicity). We can put in column names as arguments or transform columns directly inside function arguments (like here). I need to find out what object(s) are passed in the function.
Reproducible example:
df <- data.frame(x= 1:10, y=1:10)
myfun <- function(data, col){
col_new <- eval(substitute(col), data)
# magic part
object_name <- ...
# magic part
plot(col_new, main= object_name)
}
For instance, the expected output for myfun(data= df, x*x) is the plot plot(df$x*df$x, main= "x"). So the title is x, not x*x. What I have got so far is this:
myfun <- function(data, col){
colname <- tryCatch({eval(substitute(col))}, error= function(e) {geterrmessage()})
colname <- gsub("' not found", "", gsub("object '", "", colname))
plot(eval(substitute(col), data), main= colname)
}
This function gives the expected output but there must be some more elegant way to find out to which object the input refers to. The answer must be with base R.

Use substitute to get the expression passed as col and then use eval and all.vars to get the values and name.
myfun <- function(data, col){
s <- substitute(col)
plot(eval(s, data), main = all.vars(s), type = "o", ylab = "")
}
myfun(df, x * x)
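To see what each piece contributes, here is a minimal sketch that returns the intermediate values instead of plotting. The helper name inspect_col is hypothetical and only used for illustration.

```r
df <- data.frame(x = 1:10, y = 1:10)

# Hypothetical helper: returns what myfun would use, without plotting.
inspect_col <- function(data, col) {
  s <- substitute(col)        # unevaluated expression, e.g. x * x
  list(
    expr   = deparse(s),      # the expression as text
    vars   = all.vars(s),     # variable names only; function names are dropped
    values = eval(s, data)    # expression evaluated using the columns of data
  )
}

res <- inspect_col(df, x * x)
res$expr   # "x * x"
res$vars   # "x"
```

If the expression mentions several columns, all.vars(s) returns all of them, so you may want to paste them together before using them as a title.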
Another possibility is to pass a one-sided formula.
myfun2 <- function(formula, data){
plot(eval(formula[[2]], data), main = all.vars(formula), type = "o", ylab = "")
}
myfun2(~ x * x, df)
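The formula[[2]] indexing works because a one-sided formula is a call with two elements: the `~` operator and the right-hand side. A small sketch of that structure, using a toy data frame made up for the example:

```r
f <- ~ x * x

n_parts  <- length(f)          # 2: the `~` operator and the right-hand side
rhs      <- f[[2]]             # the unevaluated expression x * x
rhs_vars <- all.vars(f)        # variable names used in the formula

# Evaluate the right-hand side against a toy data frame (illustrative only).
rhs_values <- eval(rhs, data.frame(x = 1:3))
```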

The rlang package can be very powerful once you get the hang of it. Does something like this do what you want?
library(rlang)
myfun <- function (data, col){
.col <- enexpr(col)
unname(sapply(call_args(.col), as_string))
}
This gives you back the "wt" column.
myfun(mtcars, as.factor(wt))
# [1] "wt"
I am not sure about your use case, but this would work for multiple inputs.
myfun(mtcars, sum(x, y))
# [1] "x" "y"
And finally, it is possible you might not even need to do this, but rather store the expression and operate directly on the data. The tidyeval framework can help with that as well.

Related

Error on final object when generating ggplot objects in for loop with dplyr select()

I want to make many plots using multiple pairs of variables in a dataframe, all with the same x. I store the plots in a named list. For simplicity, below is an example with only 1 variable in each plot.
Key to this function is a select() call that is clearly not necessary here but is with my actual data.
The body of the function works fine on each variable, but when I loop through a list of variables, the last one in the list always produces
Error in get(ll): object 'd' not found.
(or whatever the last variable, if not 'd'). Replacing data <- df %>% select(x,ll) with data <- df avoids the error.
## make data
df2 <- data.frame(x = 1:10,
a = 1:10,
b = 2:11,
c = 101:110,
d = 10*(1:10))
## make function
testfun <- function(df = df2, vars = letters[1:4]){
## initialize list to store plots
plotlist <- list()
for (ll in vars){
## subset data
data <- df %>% select(x, ll) ## comment out select() to get working function
# print(data) ## uncomment to check that dataframe subset works correctly
## plot variable vs. x
p <- ggplot(data,
aes(x = x, y = get(ll))) +
geom_point() +
ylab(ll)
## add plot to named list
plotlist[[ll]] <- p
# print(p) ## uncomment to see that each plot is being made
}
return(plotlist) ## unnecessary, being explicit for troubleshooting
}
## use function
pl <- testfun(df2)
## error ?
pl
I have a work-around that avoids select() by renaming variables in my actual dataframe, but I am curious why this does not work? Any ideas?
get() could work, but not with ll directly. Try y = get(!!ll) or y = {{ll}}.
ggplot (or maybe aes, it's hard to tell) waits to run this code until its plot object is referenced, as the error in the provided code demonstrates. By the time each ggplot evaluates get(ll), the for loop has already finished. So ll evaluates to the last value of the loop variable, "d", for all four ggplots. ll being "d" in the error makes it seem like it's the final ggplot object that fails, but it's actually evaluating the first one that causes this error.
In the body of the loop we'd like a way to evaluate the ll variable and stick that resulting string ("a", "b", "c", or "d") into this code, the rest of which won't run until later. Changing y = get(ll) to y = get(!!ll) is one way to do this: !! performs "surgery" on the unevaluated expression (called a "blueprint for code" in Tidyverse docs) so that the expression passed into ggplot contains a literal string like "a" instead of the variable reference ll.
testfun <- function(df = df2, vars = letters[1:4]){
plotlist <- list()
for (ll in vars){
data <- df %>% select(x, ll)
p <- ggplot(data,
aes(x = x, y = get(!!ll))) +
geom_point() +
ylab(ll)
plotlist[[ll]] <- p
}
return(plotlist)
}
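The difference between a captured reference and an injected value can be sketched in base R with quote() and bquote(). This is only an analogy for what !! does, assuming a plain data frame as a stand-in for the data mask; it is not how ggplot2 is implemented.

```r
df <- data.frame(a = 1, b = 2)

lazy <- list()
injected <- list()
for (ll in names(df)) {
  lazy[[ll]]     <- quote(get(ll))      # keeps a reference to the variable ll
  injected[[ll]] <- bquote(get(.(ll)))  # splices the current string into the call
}

# Evaluate later, after the loop, with the data frame acting as a simple data mask.
lazy_vals     <- vapply(lazy, function(e) eval(e, df), numeric(1))
injected_vals <- vapply(injected, function(e) eval(e, df), numeric(1))

lazy_vals      # both entries come from column b, because ll is "b" by now
injected_vals  # one entry per column, as intended
```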
Read on for explanation and an alternate solution.
The loop problem: late binding
In a given function or in the global scope in R, there's just one variable of any given name. A for (x in xs) loop repeatedly rebinds that variable to a new value. That means that after a for loop has finished, that variable still exists and retains the last value it was assigned. Here's a way this can trip you up:
vars <- c("a", "b", "c", "d")
results <- list()
for (ll in vars){
message("in for loop, ll: ", ll)
func <- function () { ll }
results[[ll]] <- c(ll, func)
}
message("after for loop, ll: ", ll)
# after for loop, now ll is "d"
for (vec in results) {
message(vec[[1]], " ", vec[[2]]())
}
This outputs
in for loop, ll: a
in for loop, ll: b
in for loop, ll: c
in for loop, ll: d
after for loop, ll: d
a d
b d
c d
d d
Each of the four functions constructed here uses the same outer scope variable ll which, by the time the functions are actually called after the for loop, is "d". The late binding part is that the value of the variable at function call time (late) is used when looking up its value, not the value of the variable when the function is defined (early).
The NSE problem
The OP isn't creating functions in a loop though, they're calling ggplot. ggplot does something similar to creating a function: it takes some code as an argument that it doesn't evaluate until later. ggplot (or maybe aes) "captures" code from some of its arguments instead of running it. In OP's case, get(ll) isn't evaluated until later.
When this code is evaluated it's in a new context with a "data mask" that allows names of a data frame to be referenced directly. This part is great, it's what we want — this is what makes get("a") work at all. But the fact that the evaluation happens later is a problem for the OP: ll in get(ll) evaluated to "d", like get("d"), because the code is evaluated after the for-loop iteration where ll had the expected value.
Ignoring the data mask part, here's a function called run.later that, like ggplot, doesn't run one of its arguments. When we run that code later, we again find that ll evaluates to "d" for all four of the saved expressions.
vars <- c("a", "b", "c", "d")
unevaluated.exprs <- list();
run.later <- function(name, something) {
expr <- substitute(something)
unevaluated.exprs[[name]] <<- c(name, expr)
}
for (ll in vars){
run.later(ll, ll)
}
for (vec in unevaluated.exprs) {
message(c(vec[[1]], " ", eval(vec[[2]])))
}
prints
a d
b d
c d
d d
That's the ll part of the problem. The rule of thumb from languages like Python of "Don't define functions in a loop (if they reference loop variables)" could be generalized for R to "don't define functions or otherwise write code that won't be immediately evaluated in a loop (if that code references loop variables)."
Fixing the scope problem instead of metaprogramming
The !! solution provided at the top uses metaprogramming to evaluate the ll variable in the loop instead of evaluating it later.
Theoretically, one could instead dynamically create variables in each iteration of a loop, then carefully reference that dynamically created variable name with metaprogramming. But a more elegant way would be to use the same variable name but in different scopes. This is what Nithin's answer does with a function: every function creates a new scope and tada, you can use the same variable name in each. Here's another version of that, closer to OP's code:
testfun <- function(df = df2, vars = letters[1:4]){
plotlist <- list()
plot.fn <- function(var) {
data <- df %>% select(x, var)
p <- ggplot(data,
aes(x = x, y = get(var))) +
geom_point() +
ylab(var)
plotlist[[var]] <<- p
}
for (ll in vars){
plot.fn(ll)
}
return(plotlist)
}
pl <- testfun(df2)
pl
There are 4 distinct variables called var in this code, and each iteration of the loop references a different one.
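The same idea in isolation, without ggplot: a function call gives each iteration its own scope, so closures created inside it no longer share the loop variable. The name make_getter is hypothetical.

```r
# Each call to make_getter creates a fresh scope holding its own `value`.
make_getter <- function(value) {
  force(value)          # evaluate the argument now rather than lazily
  function() value
}

getters <- list()
for (ll in c("a", "b", "c", "d")) {
  getters[[ll]] <- make_getter(ll)
}

# Called after the loop has finished, each getter still returns its own value.
seen <- vapply(getters, function(f) f(), character(1))
seen
```

The force() call matters because R evaluates function arguments lazily; without it, the argument might not be evaluated until the getter is first called.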
Prettier metaprogramming
I think (haven't tested) that get(!!ll) is equivalent to {{ll}} here — get() looks up a string as a variable, but that's also what sticking the symbol of the string that ll evaluates to into the expression does. Double curlies seem more common and can roughly be understood as "evaluate the result of this expression as a variable in the other context," or as "template this string into the expression."
Write a custom function like this:
plot_fn <- function(df, y){
df %>% ggplot(aes(x = x,
y = get(y))) +
geom_point() +
ylab(y)
}
Then iterate over the plots with purrr::map:
map(letters[1:4], ~plot_fn(df = df2, y = .x))
The issue is that we cannot use get to access dplyr/tidyverse data in a "programming" paradigm. Instead, we should use non standard evaluation to access the data. I offer a simplified function below (originally I thought it was a function masking issue as I quickly skimmed the question).
testfun <- function(df = df2, vars = letters[1:4]){
lapply(vars, function(y) {
ggplot(df,
aes(x = x, y = .data[[y]] )) +
geom_point() +
ylab(y)
})
}
Calling
plots <- testfun(df2)
plots[[1]]
EDIT
Since OP would like to know what the issue is, I have used a traditional loop as requested
testfun2 <- function(df = df2, vars = letters[1:4]){
## initialize list to store plots
plotlist <- list()
for (ll in vars){
## subset data
d_t <- df %>% select(x, ll) ## comment out select() to get working function
# print(d_t) ## uncomment to check that dataframe subset works correctly
## plot variable vs. x
p <- ggplot(d_t,
aes(x = x, y = .data[[ll]])) +
geom_point() +
ylab(ll)
## add plot to named list
plotlist[[ll]] <- p
# print(p) ## uncomment to see that each plot is being made
}
plotlist
}
pl <- testfun2(df2)
pl[[1]]
The reason get does not work is that we need to use non-standard evaluation as the docs state. Related questions on using get may be useful.

Convert list of symbols to character string in own function

I have the following data frame:
dat <- data.frame(time = runif(20),
group1 = rep(1:2, times = 10),
group2 = rep(1:2, each = 10),
group3 = rep(3:4, each = 10))
I'm now writing a function my_function that takes the following form:
my_function(data, time_var = time, group_vars = c(group1, group2))
If I'm not mistaken, I'm passing the group_vars as symbols to my function, right?
However, within my function I want to first do some error checks if the variables passed to the function exist in the data. For the time variable I was successful, but I don't know how I can turn my group_vars list into a vector of strings so that it looks like c("group1", "group2").
My current function looks like:
my_function <- function (data, time_var = NULL, group_vars = NULL)
{
time_var <- enquo(time_var)
time_var_string <- as_label(time_var)
group_vars <- enquos(group_vars)
# is "time" variable part of the dataset?
if (!time_var_string %in% colnames(data))
{
stop(paste0("The variable '", time_var_string, "' doesn't exist in your data set. Please check for typos."))
}
}
And I want to extend the latter part so that I can also do some checks in the form of !group_vars %in% colnames(data). I know I could pass the group_var variables already as a vector of strings to the function, but I don't want to do that for other reasons.
enquos is the wrong function here: it operates on multiple arguments, but you’re only passing a single argument. Just use enquo. However, either way the result isn’t directly usable, because you don’t get a vector of unevaluated names — you get an unevaluated c call.
Working with this is a bit more convoluted, I’m afraid:
group_vars_expr = quo_squash(group_vars)
group_var_names = if (is_symbol(group_vars_expr)) {
as_name(group_vars_expr)
} else {
stopifnot(is_call(group_vars_expr))
stopifnot(identical(group_vars_expr[[1L]], sym('c')))
stopifnot(all(purrr::map_lgl(group_vars_expr[-1L], is_symbol)))
purrr::map_chr(group_vars_expr[-1L], as_name)
}
stopifnot(all(group_var_names %in% colnames(data)))
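For comparison, a base R sketch of the same check using substitute and all.vars instead of rlang. This is a swapped-in technique, not the answer's method, and check_group_vars is a hypothetical name. It is less strict than the code above: all.vars accepts any expression, not only a symbol or a c() call of symbols.

```r
dat <- data.frame(time   = runif(20),
                  group1 = rep(1:2, times = 10),
                  group2 = rep(1:2, each = 10))

# Hypothetical helper: turn unquoted names into strings and validate them.
check_group_vars <- function(data, group_vars) {
  expr <- substitute(group_vars)      # e.g. c(group1, group2), unevaluated
  nms  <- all.vars(expr)              # "group1" "group2"; the function name c is dropped
  unknown <- setdiff(nms, colnames(data))
  if (length(unknown) > 0) {
    stop("Not found in data: ", paste(unknown, collapse = ", "))
  }
  nms
}

found <- check_group_vars(dat, c(group1, group2))
found
```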
If you'd like to use c() in this way, chances are you need selections. One easy way to take selections in an argument is to interface with dplyr::select():
my_function <- function(data, group_vars = NULL) {
group_vars <- names(dplyr::select(data, {{ group_vars }}))
group_vars
}
mtcars %>% my_function(c(cyl, mpg))
#> [1] "cyl" "mpg"
mtcars %>% my_function(starts_with("d"))
#> [1] "disp" "drat"

Evaluate listed strings to create function object in r

I need a function created by a list of commands to fully evaluate so that it is identical to the "manual" version of the function.
Background: I am using ScaleR functions in Microsoft R Server and need to apply a set of transformations as a function. ScaleR is very picky about needing to be passed a function that is phrased exactly as specified below:
functionThatWorks <- function(data) {
data$marital_status_p1_ismarried <- impute(data$marital_status_p1_ismarried)
return(data)
}
I have a function that creates this list of transformations (and hundreds more, hence the need to functionalize its writing).
transformList <- list ("data$ismarried <- impute(data$ismarried)",
"data$issingle <- impute(data$issingle)")
This line outputs the evaluated string that I want to the console, but I am unaware of a way to move it from console output to being used in a function:
cat(noquote(unlist(bquote( .(noquote(transformList[1]))))))
I need to evaluate functionIWant so that it is identical to functionThatWorks.
functionIWant <- function(data){
eval( cat(noquote(unlist(bquote( .(noquote(transformList[1])))))) )
return(data)
}
identical(functionThatWorks, functionIWant)
EDIT: Adding in the answer based on @dww's code. It works well in ScaleR. It is identical, minus meaningless spacing.
functionIWant <- function(){}
formals(functionIWant) <- alist(data=NULL)
functionIWant.text <- parse(text = c(
paste( bquote( .(noquote(transformList[1]))), ";", "return(data)\n")
))
body(functionIWant) <- as.call(c(as.name("{"), functionIWant.text))
Maybe something like this?
# 1st define a 'hard-coded' function
f1 <- function (x = 2)
{
y <- x + 1
y^2
}
f1(3)
# [1] 16
# now create a similar function from a character vector
f2 <- function(){}
formals(f2) <- alist(x=2)
f2.text <- parse(text = c('y <- x + 1', 'y^2'))
body(f2) <- as.call(c(as.name("{"), f2.text))
f2(3)
# [1] 16
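Applied to the question's transformList, the same technique might look like the sketch below. The impute function here is a stand-in made up so the example runs on its own (it replaces NA with 0); the real one would come from the asker's environment.

```r
transformList <- list("data$ismarried <- impute(data$ismarried)",
                      "data$issingle <- impute(data$issingle)")

# Stand-in for the asker's impute(); an assumption for illustration only.
impute <- function(x) { x[is.na(x)] <- 0; x }

# Parse every transformation plus the final return() into expressions.
exprs <- parse(text = c(unlist(transformList), "return(data)"))

# Start from an empty function of `data`, then install the generated body.
functionIWant <- function(data) NULL
body(functionIWant) <- as.call(c(list(as.name("{")), as.list(exprs)))

out <- functionIWant(data.frame(ismarried = c(1, NA), issingle = c(NA, 0)))
out
```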

How do I convert this for loop into something cooler like by in R

uniq <- unique(file[,12])
pdf("SKAT.pdf")
for(i in 1:length(uniq)) {
dat <- subset(file, file[,12] == uniq[i])
names <- paste("Sample_filtered_on_", uniq[i], sep="")
qq.chisq(-2*log(as.numeric(dat[,10])), df = 2, main = names, pvals = T,
sub=subtitle)
}
dev.off()
file[,12] is an integer so I convert it to a factor when I'm trying to run it with by instead of a for loop as follows:
pdf("SKAT.pdf")
by(file, as.factor(file[,12]), function(x) { qq.chisq(-2*log(as.numeric(x[,10])), df = 2, main = paste("Sample_filtered_on_", file[1,12], sep=""), pvals = T, sub=subtitle) } )
dev.off()
It works fine to sort the data frame by this (now a factor) column. My problem is that for the plot title, I want to label it with the correct index from that column. This is easy to do in the for loop by uniq[i]. How do I do this in a by function?
Hope this makes sense.
A more vectorized (== cooler?) version would pull the common operations out of the loop and let R do the book-keeping about unique factor levels.
dat <- split(-2 * log(as.numeric(file[,10])), file[,12])
names(dat) <- paste0("Sample_filtered_on_", names(dat))
(paste0 is a convenience function for the common use case where normally one would use paste with the argument sep=""). The for loop is entirely appropriate when you're running it for its side effects (plotting pretty pictures) rather than trying to capture values for further computation; it's definitely un-cool to use T instead of TRUE, while seq_along(dat) means that your code won't produce unexpected results when length(dat) == 0.
pdf("SKAT.pdf")
for(i in seq_along(dat)) {
vals <- dat[[i]]
nm <- names(dat)[[i]]
qq.chisq(vals, main = nm, df = 2, pvals = TRUE, sub=subtitle)
}
dev.off()
If you did want to capture values, the basic observation is that your function takes 2 arguments that vary. So by or tapply or sapply or ... are not appropriate; each of these assumes that just a single argument is varying. Instead, use mapply or the comparable Map:
Map(qq.chisq, dat, main=names(dat),
MoreArgs=list(df=2, pvals=TRUE, sub=subtitle))
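A runnable sketch of the split-then-Map pattern on toy data. The data frame toy and the summarising function are made up for illustration and stand in for file and qq.chisq.

```r
# Toy stand-in for the question's data: p-values and a grouping column.
toy <- data.frame(p = c(0.1, 0.5, 0.9, 0.2), grp = c(1, 2, 1, 2))

# One vector of transformed values per group, named after the group.
dat <- split(-2 * log(toy$p), toy$grp)
names(dat) <- paste0("Sample_filtered_on_", names(dat))

# Two arguments vary together (values and title); digits is fixed via MoreArgs.
labels <- Map(function(vals, main, digits) {
                sprintf("%s: mean %s", main, round(mean(vals), digits))
              },
              dat, main = names(dat),
              MoreArgs = list(digits = 2))
labels
```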

Entering variables into regression function

I have this feature_list that contains several possible values, say "A", "B", "C" etc. And there is time in time_list.
So I will have a loop where I will want to go through each of these different values and put it in a formula.
something like for(i in ...) and then my_feature <- feature_list[i] and my_time <- time_list[i].
Then I put the time and the chosen feature to a dataframe to be used for regression
feature_list<- c("GPRS")
time_list<-c("time")
calc<-0
feature_dim <- length(feature_list)
time_dim <- length(time_list)
data <- read.csv("data.csv", header = TRUE, sep = ";")
result <- matrix(nrow=0, ncol=5)
errors<-matrix(nrow=0, ncol=3)
for(i in 1:feature_dim) {
my_feature <- feature_list[i]
my_time <- time_list[i]
fitdata <- data.frame(data[my_feature], data[my_time])
for(j in 1:60) {
my_b <- 0.0001 * (2^j)
for(k in 1:60) {
my_c <- 0.0001 * (2^k)
cat("Feature: ", my_feature, "\t")
cat("b: ", my_b, "\t")
cat("c: ", my_c, "\n")
err <- try(nlsfit <- nls(GPRS ~ 53E5*exp(-1*b*exp(-1*c*time)), data=fitdata, start=list(b=my_b, c=my_c)), silent=TRUE)
calc<-calc+1
if(class(err) == "try-error") {
next
}
else {
coefs<-coef(nlsfit)
ess<-deviance(nlsfit)
result<-rbind(result, c(coefs[1], coefs[2], ess, my_b, my_c))
}
}
}
}
Now in the nls() call I want to be able to call my_feature instead of just "A" or "B" or something and then to the next one on the list. But I get an error there. What am I doing wrong?
You can use paste to create a string version of your formula including the variable name you want, then use either as.formula or formula functions to convert this to a formula to pass to nls.
as.formula(paste(my_feature, "~ 53E5*exp(-1*b*exp(-1*c*time))"))
Another option is to use the bquote function to insert the variable names into a function call, then eval the function call.
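Both options can be sketched side by side. Here only the formula is built, since fitting needs the asker's data; my_feature is taken from the question.

```r
my_feature <- "GPRS"

# Option 1: build the formula from a string.
f_paste <- as.formula(paste(my_feature, "~ 53E5*exp(-1*b*exp(-1*c*time))"))

# Option 2: splice the column name in as a symbol with bquote(), then
# evaluate the resulting `~` call to obtain a formula object.
f_bquote <- eval(bquote(.(as.name(my_feature)) ~ 53E5*exp(-1*b*exp(-1*c*time))))

all.vars(f_bquote)   # the response name comes first
```

Either formula can then be passed as the first argument of nls(), for example nls(f_bquote, data = fitdata, start = list(b = my_b, c = my_c)), assuming fitdata contains a column with that name.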
It has been a while since I worked with R, but maybe you can give this a try.
What you want is to create a formula from a list of variables, right?
If the response variable is the first element of your vector and the others are the explanatory variables, you could create the formula this way (note that R indexes from 1):
reformulate(my_feature[-1], response = my_feature[1])
This way you can create formulas that depend on the variables in my_feature.
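In base R, a formula with a response and several explanatory variables held in a character vector can be built with reformulate(). A small sketch using the built-in mtcars data, chosen here only for illustration:

```r
vars <- c("mpg", "wt", "hp")    # first element is the response

# Build mpg ~ wt + hp from the character vector.
f <- reformulate(vars[-1], response = vars[1])

# The result is an ordinary formula and can be passed to a model function.
fit <- lm(f, data = mtcars)
names(coef(fit))
```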
