I am using R and trying to debug a function that I call with do.call() for convenience.
Combining do.call() and browser() is problematic. Basically, all the elements of the list of arguments passed to do.call() are printed, which, if the list contains for example a very large data table, is not sustainable.
Here is a reprex. I create a simple getsum() function that sums elements of a vector. I create a func() function that calls getsum() for a list of vectors.
#getsum returns the sum of a vector's elements
#func returns the vector of the sum of a list of vectors
func <- function(vec_list){
browser()
sums = lapply(FUN=getsum, X=vec_list)
sum = unlist(sums)
return(sum)
}
getsum <- function(vec){
sum = sum(vec)
return(sum)
}
args = list(vec_list=list(rnorm(5), rnorm(5)))
do.call(func, args)
This is the output I get:
Called from: (function(vec_list){
browser()
sums = lapply(FUN=getsum, X=vec_list)
sum = unlist(sums)
return(sum)
})(vec_list = list(c(-0.0801864335418185, 0.448324209935905,
-2.86518616779484, -0.359284963520417, -0.620062639582574), c(1.74835180362954,
-0.904288222154223, 0.746007117029027, 0.625889703799832, -0.908748727897187
)))
Browse[1]>
One might tell me "why are you using do.call()?". Indeed, if I simply call the function myself, the problem does not arise (see below). In this example I don't need to use do.call(), but sometimes it's extremely convenient.
#instead of do.call() use :
func(vec_list=args$vec_list)
The output is then :
Called from: func(vec_list = args$vec_list)
Browse[1]>
EDIT :
I have tried browser(skipCalls=TRUE), which solves the printing problem but defeats the purpose of browser(): it makes R execute all of the function's commands at once. Other suggestions welcome.
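One possible workaround, sketched here under assumptions rather than taken from the thread: do.call() embeds the already-evaluated arguments in the call it builds, and that call is what browser() prints. Passing the function by name and the arguments as unevaluated expressions keeps the call short. The helper show_call() and the name arg_list are hypothetical; show_call() returns the deparsed call instead of opening the browser, so the effect can be checked non-interactively.

```r
# Hypothetical helper: reports its own call (the text browser() would print)
# and touches vec_list to show the argument still resolves.
show_call <- function(vec_list) {
  list(call = paste(deparse(sys.call()), collapse = ""),
       n    = length(vec_list))
}

arg_list <- list(vec_list = list(rnorm(5), rnorm(5)))

# Usual form: the values are embedded in the call, so the deparsed
# call contains the whole function body and every number.
long <- do.call(show_call, arg_list)

# Workaround sketch: function given by name, argument given as an
# unevaluated expression that is only resolved when the call runs.
lazy_args <- list(vec_list = quote(arg_list$vec_list))
short <- do.call("show_call", lazy_args)

short$call
nchar(long$call) > nchar(short$call)
```

R also documents a deparse.max.lines option (see ?options) that limits how many lines are printed when deparsing in browser(); setting it, for instance with options(deparse.max.lines = 2), may be enough on its own.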
Related
I have written a function that takes another function as an argument. The argument function in turn takes as its argument some object (in the example, a vector) that is supplied by the original function. It has been challenging to make the function call in the right way. Below are three approaches I have tried after reading Programming with dplyr.
Only option three works.
I would like to know whether this is in fact the best way to evaluate a function within a function.
library(dplyr);library(rlang)
#Function that will be passed as an argument
EvaluateThis1 <- quo(mean(vector))
EvaluateThis2 <- ~mean(vector)
EvaluateThis3 <- quo(mean)
#First function that will receive a function as an argument
MyFunc <- function(vector, TheFunction){
print(TheFunction)
eval_tidy(TheFunction)
}
#Second function that will receive a function as an argument
MyFunc2 <- function(vector, TheFunction){
print(TheFunction)
quo(UQ(TheFunction)(vector)) %>%
eval_tidy
}
#Option 1
#This is evaluating vector in the global environment where
#EvaluateThis1 was captured
MyFunc(1:4, EvaluateThis1)
#Option 2
#I don't know what is going on here
MyFunc(1:4, EvaluateThis2)
MyFunc2(1:4, EvaluateThis2)
#Option 3
#I think this Unquotes the function splices in the argument then
#requotes before evaluating.
MyFunc2(1:4, EvaluateThis3)
My questions are:
1. Is option 3 the best/simplest way to perform this evaluation?
2. An explanation of what is happening.
Edit
After reading @Rui Barradas' very clear and concise answer, I realised that I am actually trying to do something similar to the code below. I didn't manage to make it work using Rui's method, but I solved it by setting the environment:
OtherStuff <-c(10, NA)
EvaluateThis4 <-quo(mean(c(vector,OtherStuff), na.rm = TRUE))
MyFunc3 <- function(vector, TheFunction){
#uses the capture environment which doesn't contain the object vector
print(get_env(TheFunction))
#Reset the environment of TheFunction to the current environment where vector exists
TheFunction<- set_env(TheFunction, get_env())
print(get_env(TheFunction))
print(TheFunction)
TheFunction %>%
eval_tidy
}
MyFunc3(1:4, EvaluateThis4)
The function is evaluated within the current environment not the capture environment. Because there is no object "OtherStuff" within that environment, the parent environments are searched finding "OtherStuff" in the Global environment.
I will try to answer question 1.
I believe that the best and simplest way to perform this kind of evaluation is to do without any fancy evaluation techniques. Calling the function directly usually works. Using your example, try the following.
EvaluateThis4 <- mean # simple
MyFunc4 <- function(vector, TheFunction){
print(TheFunction)
TheFunction(vector) # just call it with the appropriate argument(s)
}
MyFunc4(1:4, EvaluateThis4)
function (x, ...)
UseMethod("mean")
<bytecode: 0x000000000489efb0>
<environment: namespace:base>
[1] 2.5
There are examples of this in base R. For instance approxfun and ecdf both return functions that you can use directly in your code to perform subsequent calculations. That's why I've defined EvaluateThis4 like that.
As for functions that take functions as arguments, there are the optimization ones and, of course, *apply, by and ave.
As for question 2, I must admit to my complete ignorance.
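The direct-call approach can also cover the case from the question's edit, where the computation needs an extra object besides the vector. This is a minimal sketch, assuming OtherStuff lives where the function is defined; EvaluateThis5 and MyFunc5 are hypothetical names that are not in the original posts.

```r
OtherStuff <- c(10, NA)

# A plain function: vector arrives as an argument, while OtherStuff is
# found through ordinary lexical scoping in the defining environment.
EvaluateThis5 <- function(vector) {
  mean(c(vector, OtherStuff), na.rm = TRUE)
}

# The receiving function just calls what it was given.
MyFunc5 <- function(vector, TheFunction) {
  TheFunction(vector)
}

result5 <- MyFunc5(1:4, EvaluateThis5)
result5  # mean of 1, 2, 3, 4, 10 (the NA is dropped)
```

Because the vector is an explicit parameter, no environment has to be reset before evaluation.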
I'm currently writing a utility to run a series of tests on a set of data. I have the data in a data.frame and would like to run N tests on each row of data. (Apologies if my terminology isn't all there: I've been using R for all of five hours.)
In my utility, I would like to split the tests into different files and in the main program, load all those tests and run them once for each data.frame row. Here's what I'm doing to source the relevant files:
file.sources = list.files(pattern="validator-.*\\.R$")
sapply(file.sources,source,verbose = TRUE)
This works well, and if I do this in each matched file:
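b <- function(a) {
  # grepl() returns TRUE/FALSE; grep() returns integer(0) when nothing
  # matches, which makes if() fail
  if (grepl("^[[:blank:]]*$", a)) {
    return(FALSE)
  } else {
    return(TRUE)
  }
}
test.functions <- append(test.functions, b)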
Then I end up with a test.functions list that contains all the test functions to run, but this is where I get stuck. I've tried variations of sapply(), and I think do.call() is also relevant here. This is my current attempt:
process.entry <- function(a) {
lapply(test.functions,do.call,a)
}
sapply(all.data,process.entry)
My attempt here was to create a function which takes one row of data as its argument, iterates over test.functions, and calls do.call() with each function and the row of data as arguments. This doesn't quite work, and the error thrown is:
Error in FUN(X[[i]], ...) : second argument must be a list
However, I'm not entirely sure where this error occurs, and quite possibly: there are other, cleaner, ways of doing what I intend!
# I would write it like this:
process.entry <- function(a) {
  # call each test function on a;
  # I think an anonymous function is easier here
  lapply(test.functions, function(f) f(a))
}
# sapply iterates over the columns of a data.frame by default;
# if you want to iterate over rows, use a for loop or apply
apply(all.data, 1, process.entry)
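To make the pattern above concrete, here is a small self-contained sketch. The two validators and the sample all.data are hypothetical stand-ins (the original test files and data are not shown), and vapply() is swapped in for lapply() so that each row yields a plain logical vector.

```r
# Hypothetical validators; each takes one row (a character vector)
# and returns a single TRUE/FALSE.
is.not.blank <- function(a) !all(grepl("^[[:blank:]]*$", a))
has.no.na    <- function(a) !anyNA(a)
test.functions <- list(not.blank = is.not.blank, no.na = has.no.na)

# Hypothetical data: the second row contains only blank fields.
all.data <- data.frame(id = c("a", " "), value = c("x", ""),
                       stringsAsFactors = FALSE)

process.entry <- function(a) {
  # run every test on this row; vapply() insists on one logical per test
  vapply(test.functions, function(f) f(a), logical(1))
}

# MARGIN = 1 iterates over rows; the result has one column per row of
# all.data and one row per test.
results <- apply(all.data, 1, process.entry)
results
```

Note that apply() converts each row to a single atomic vector, so with mixed column types the tests receive character data.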
I have created a test function, called testFunc which expects two arguments.
testFunc<-function(x,y){
length(x)
nrow(y)
}
Now I want to use lapply to apply this function to a list, keeping the y argument fixed.
Consider a test list, testList:
testList<-list(a=c(1,2,3,4,5,5,6),b=c(1,2,4,5,6,7,8))
Can we use lapply to run testFunc on testList$a and testList$b with same value of y?
I tried this call:
lapply(X = testList, FUN = testFunc, someDataFrame)
But I always get the length of someDataFrame as the output. Am I missing something obvious?
Change your function to
testFunc<-function(x,y){
return(c(length(x), nrow(y)))
}
By default, an R function returns the value of the last evaluated expression.
The simplest way is to pass y as a named argument:
lapply(X = testList, FUN=testFunc, y=someDataFrame)
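Putting the two answers together gives the following sketch. The contents of someDataFrame are an assumption made only so the example runs; the original data frame is not shown.

```r
# Return both values explicitly; otherwise only the last expression
# (nrow(y)) is returned.
testFunc <- function(x, y) {
  c(length = length(x), rows = nrow(y))
}

testList <- list(a = c(1, 2, 3, 4, 5, 5, 6), b = c(1, 2, 4, 5, 6, 7, 8))
someDataFrame <- data.frame(v = 1:3)  # hypothetical stand-in

# Extra arguments after FUN are passed to every call; naming y makes
# the intent explicit.
res <- lapply(X = testList, FUN = testFunc, y = someDataFrame)
res$a
```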
I have two lists of lists, humanSplit and ratSplit. humanSplit has elements of the form:
> humanSplit[1]
$Fetal_Brain_408_AGTCAA_L001_R1_report.txt
humanGene humanReplicate alignment RNAtype
66 DGKI Fetal_Brain_408_AGTCAA_L001_R1_report.txt 6 reg
68 ARFGEF2 Fetal_Brain_408_AGTCAA_L001_R1_report.txt 5 reg
If you type humanSplit[[1]], it gives the data without name $Fetal_Brain_408_AGTCAA_L001_R1_report.txt
ratSplit is essentially similar to humanSplit, differing only in column order. I want to apply Fisher's test to every possible pairing of replicates from humanSplit and ratSplit. I defined the following empty vectors, which I will use to store the results of my Fisher's tests:
humanReplicate <- vector(mode = 'character', length = 0)
ratReplicate <- vector(mode = 'character', length = 0)
pvalue <- vector(mode = 'numeric', length = 0)
For Fisher's test between two replicates of humanSplit and ratSplit, I define the following function. In the function I use geneList, which is a data.frame made by reading a file and has the form:
> head(geneList)
human rat
1 5S_rRNA 5S_rRNA
2 5S_rRNA 5S_rRNA
Now here is the main function, where I use a function getGenetype which I already defined in other part of the code. Also x and y are integers :
fishertest <-function(x,y) {
ratReplicateName <- names(ratSplit[x])
humanReplicateName <- names(humanSplit[y])
## merging above two based on the one-to-one gene mapping as in geneList
## defined above.
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
mergedRatData <- merge(geneList, ratSplit[[x]], by.x = "rat", by.y = "ratGene")
## [here i do other manipulation with using already defined function
## getGenetype that is defined outside of this function and make things
## necessary to define following contingency table]
contingencyTable <- matrix(c(HnRn,HnRy,HyRn,HyRy), nrow = 2)
fisherTest <- fisher.test(contingencyTable)
humanReplicate <- c(humanReplicate,humanReplicateName )
ratReplicate <- c(ratReplicate,ratReplicateName )
pvalue <- c(pvalue , fisherTest$p)
}
After all this, I build the matrix eg to use in apply. Here I am basically trying to do the equivalent of a double for loop, running Fisher's test for each pair:
eg <- expand.grid(i = 1:length(ratSplit),j = 1:length(humanSplit))
junk = apply(eg, 1, fishertest(eg$i,eg$j))
Now the problem is, when I try to run, it gives the following error when it tries to use function fishertest in apply
Error in humanSplit[[y]] : recursive indexing failed at level 3
Rstudio points out problem in following line:
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
Ultimately, I want to do the following:
result <- data.frame(humanReplicate,ratReplicate, pvalue ,alternative, Conf.int1, Conf.int2, oddratio)
I am struggling with these questions:
In defining fishertest function, how should I pass ratSplit and humanSplit and already defined function getGenetype?
And how I should use apply here?
Any help would be much appreciated.
Up front: read ?apply. Additionally, the first three hits on google when searching for "R apply tutorial" are helpful snippets: one, two, and three.
Errors in fishertest()
The error message itself has nothing to do with apply. The reason it got as far as it did is that the arguments you provided actually resolved. Try evaluating eg$i by itself, and you'll see that it returns a vector: the corresponding column of the eg data.frame. You are passing this whole vector as the index in the x argument. The primary reason your function errored out is that double-bracket indexing ([[) selects a single element: given a vector of length greater than 1, it indexes recursively into nested lists, hence the "recursive indexing failed" message. This is a great example of where production/deployed functions would need type-checking to ensure that each argument is a numeric of length 1; it is often not required for quick code, but it would have caught this mistake. Had it not been for the [[ limit, your function might have returned incorrect results. (I've been bitten by that many times!)
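The behaviour of [[ with a longer index, and the kind of argument check mentioned above, can be seen in a small sketch. The list nested and the helper pick_one() are hypothetical and only serve the illustration.

```r
nested <- list(first = list(inner = 42), second = "x")

# A length-2 index is applied recursively: same as nested[[1]][[1]].
deep <- nested[[c(1, 1)]]

# Hypothetical helper with the kind of check that would have caught
# the mistake early.
pick_one <- function(lst, i) {
  stopifnot(is.numeric(i), length(i) == 1)
  lst[[i]]
}

ok  <- pick_one(nested, 2)
bad <- tryCatch(pick_one(nested, c(1, 2)),
                error = function(e) "rejected")
```

With the check in place, a vector index is rejected immediately instead of silently digging into nested elements.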
BTW: your code is also incorrect in its scoped access to pvalue, et al. If you make your function return just the numbers you need and then aggregate them outside of the function, your life will be simpler. (pvalue <- c(pvalue, ...) will find the pvalue assigned outside the function but will not update it as you want.) As written, you are defeating one purpose of writing this as a function. When thinking about writing this function, try to answer only this question: "how do I compare a single rat record with a single human record?" Only after that works correctly and simply, without having to overwrite variables in the parent environment, should you try to answer the question "how do I apply this function to all pairs and aggregate it?" Try very hard to have your function not change anything outside of its own environment.
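A minimal sketch of that separation, assuming a placeholder contingency table because the real counts (HnRn and friends) are elided in the question. compare_pair() is a hypothetical name, and Map() with do.call(rbind, ...) is used here as one way to aggregate; it is not the method from the original posts.

```r
# Handles exactly one pair and returns a one-row data.frame;
# nothing outside the function is modified.
compare_pair <- function(x, y) {
  # placeholder 2x2 table, NOT real gene counts
  tab <- matrix(c(x, y, y, x) + 1, nrow = 2)
  data.frame(i = x, j = y, pvalue = fisher.test(tab)$p.value)
}

eg <- expand.grid(i = 1:2, j = 1:3)

# Apply to all pairs, then aggregate outside the function.
rows   <- Map(compare_pair, eg$i, eg$j)
result <- do.call(rbind, rows)
result
```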
Errors in apply()
Had your function worked properly despite these errors, you would have received the following error from apply:
apply(eg, 1, fishertest(eg$i, eg$j))
## Error in match.fun(FUN) :
## 'fishertest(eg$i, eg$j)' is not a function, character or symbol
When you call apply this way, it parses the third argument and, in this example, evaluates it. Since it is simply a call to fishertest(eg$i, eg$j), which is intended to return a data.frame row (inferred from your previous question), it resolves to such, and apply then sees something akin to:
apply(eg, 1, data.frame(...))
Now you can see that apply is being handed a data.frame and not a function.
The third argument (FUN) needs to be a function itself that takes as its first argument a vector containing the elements of the row (1) or column (2) of the matrix/data.frame. As an example, consider the following contrived example:
eg <- data.frame(aa = 1:5, bb = 11:15)
apply(eg, 1, mean)
## [1] 6 7 8 9 10
# similar to your use, will not work; this error comes from mean not getting
# any arguments, whereas your error above came from fishertest receiving whole vectors
apply(eg, 1, mean())
## Error in mean.default() : argument "x" is missing, with no default
Realize that mean is a function itself, not the return value from a function (there is more to it, but this definition works). Because we're iterating over the rows of eg (because of the 1), the first iteration takes the first row and calls mean(c(1, 11)), which returns 6. The equivalent of your code here is mean()(c(1, 11)), which will fail for a couple of reasons: (1) mean requires an argument and is not getting one, and (2) regardless, it does not return a function itself (returning functions belongs to the "functional programming" paradigm, which is easy in R but uncommon for most programmers).
In the example here, mean accepts a single argument, which is typically a numeric vector. In your case, your function fishertest requires two arguments (templated by my previous answer to your question), which does not work with apply. You have two options here:
1. Change your fishertest function to accept a single vector as an argument and parse the index numbers from it. Both of the following versions do this:
fishertest <- function(v) {
x <- v[1]
y <- v[2]
ratReplicateName <- names(ratSplit[x])
## ...
}
or
fishertest <- function(x, y) {
if (missing(y)) {
y <- x[2]
x <- x[1]
}
ratReplicateName <- names(ratSplit[x])
## ...
}
The second version allows you to continue using the manual form of fishertest(1, 57) while also allowing you to do apply(eg, 1, fishertest) verbatim. Very readable, IMHO. (Better error checking and reporting can be used here, I'm just providing a MWE.)
2. Write an anonymous function to take the vector and split it up appropriately. This anonymous function could look something like function(ii) fishertest(ii[1], ii[2]). This is typically how it is done for functions that either do not transform as easily as in #1 above, or for functions you cannot or do not want to modify. You can either assign this intermediary function to a variable (which makes it no longer anonymous, figure that) and pass that intermediary to apply, or just pass it directly to apply, à la:
.func <- function(ii) fishertest(ii[1], ii[2])
apply(eg, 1, .func)
## equivalently
apply(eg, 1, function(ii) fishertest(ii[1], ii[2]))
There are two reasons why many people opt to name the function: (1) if the function is used multiple times, better to define once and reuse; (2) it makes the apply line easier to read than if it contained a complex multi-line function definition.
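Both options can be tried on a toy function. In this sketch pair_sum() is a hypothetical stand-in for fishertest(), chosen only because its results are easy to verify by hand.

```r
# Stand-in for fishertest(): accepts either two scalars or one
# length-2 vector (as handed over by apply()).
pair_sum <- function(x, y) {
  if (missing(y)) {
    y <- x[2]
    x <- x[1]
  }
  unname(x + y)
}

eg <- expand.grid(i = 1:2, j = 1:2)

direct   <- pair_sum(1, 2)                                   # manual form
via_row  <- apply(eg, 1, pair_sum)                           # option 1
via_anon <- apply(eg, 1, function(ii) pair_sum(ii[1], ii[2]))  # option 2
```

Both apply() calls visit the rows of eg in the same order, so they give the same vector of sums.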
As a side note, there are some gotchas with apply and family that will be confusing if you don't understand them. Not the least of these is that when your function returns vectors, the matrix returned from apply will need to be transposed (with t()), after which you'll still need to rbind or otherwise aggregate.
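The transposition gotcha is easy to see in a small sketch. The per-row function here is a hypothetical placeholder that returns three named values.

```r
eg <- expand.grid(i = 1:2, j = 1:3)

# Each call returns a named vector of length 3 ...
raw <- apply(eg, 1, function(ii) {
  c(i = ii[[1]], j = ii[[2]], total = ii[[1]] + ii[[2]])
})

# ... so apply() stacks the results as COLUMNS: 3 rows, 6 columns.
dim(raw)

# Transpose to get one row per input row, then convert.
tidy <- as.data.frame(t(raw))
tidy
```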
This is one area where using ddply may provide a more readable solution. There are several tutorials showing it off. For a quick intro, read this; for a more in depth discussion on the bigger picture in which ddply plays a part, read Hadley's Split, Apply, Combine Strategy for Data Analysis paper from JSS.
In R, the idiomatic way to call another function without evaluating the parameters you give it is apparently as follows:
Call <- match.call(expand.dots = TRUE)
# Modify parameters here as needed and set unneeded ones to NULL.
Call[[1L]] <- as.name("name.of.function.to.be.called.here")
eval.parent(Call)
However, when I put a namespaced name (e.g. utils::write.csv) in the as.name() call, I get an error:
"could not find function "utils::write.csv"
What is the proper way of using this R idiom to call a namespaced function?
Here is a solution using do.call(), which both constructs and evaluates the function call.
Like the approach you started with, this one uses the fact that R calls are lists in which: (a) the first element is the name of a function; and (b) all following elements are arguments to that function.
j <- function(x, file) {
Call <- match.call(expand.dots = TRUE)
arglist <- as.list(Call)[-1]
do.call(utils::write.csv, arglist)
}
dat <- data.frame(x=1:10, y=rnorm(10))
j(dat, file="outfilename.csv")
EDIT: FWIW, here's an example from plot.formula in base R, which uses a construct similar to the one above:
{
m <- match.call(expand.dots = FALSE)
eframe <- parent.frame()
. . .
. . .
m <- as.list(m)
m[[1L]] <- stats::model.frame.default
m <- as.call(c(m, list(na.action = NULL)))
mf <- eval(m, eframe)
. . .
. . .
}
The function uses the do.call() construct later on. Going a bit deeper into the weeds, my reading is that in the snippet shown here, it instead uses several steps mostly because of the need to add na.action=NULL to the list of arguments.
In any case, it looks like the do.call() option is as close to canonical as could be desired.
As @Josh O'Brien answered, do.call is much more straightforward to use.
The first argument to do.call can be either a function name or an actual function.
A function name given as a string can NOT contain the namespace qualifier. The :: operator is actually a function that takes the names on both sides and finds the corresponding function, so it must be evaluated separately to work.
So, with do.call, you need something like:
# ...Stuff from Josh's answer goes here
# And then:
do.call(utils::write.csv, arglist)
And with eval:
Call <- match.call(expand.dots = TRUE)
# Modify parameters here as needed and set unneeded ones to NULL.
Call[[1L]] <- utils::write.csv
eval.parent(Call)
Note the lack of quotes around the function name. That evaluates to the function closure.
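A variation that may be worth knowing, offered as a sketch rather than as part of the original answer: the head of the call can also be the unevaluated expression utils::head, built with quote(), which keeps the modified call readable when printed. The function wrapper() is hypothetical, and utils::head stands in for utils::write.csv so the example writes no files.

```r
wrapper <- function(x, ...) {
  Call <- match.call(expand.dots = TRUE)
  # Replace the function position with a namespaced call expression;
  # :: is evaluated when the whole call is evaluated.
  Call[[1L]] <- quote(utils::head)
  eval.parent(Call)
}

first_three <- wrapper(letters, n = 3)
first_three
```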
Another way of getting the function from a namespace-qualified name:
eval(parse(text="utils::write.csv"))
Again, this calls the :: function, which correctly finds the function.
Another more manual way is to extract the namespace name & function name and then do the lookup yourself:
x <- strsplit("utils::write.csv", "::")[[1]]
get(x[2], asNamespace(x[1]))
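As a quick check, the lookups described above can be compared directly. This sketch also includes getExportedValue() from base R, which is not mentioned in the original answer; all of them should resolve to the same function object.

```r
f_direct <- utils::write.csv
f_parsed <- eval(parse(text = "utils::write.csv"))

# Manual lookup: split the qualified name, then search the namespace.
x <- strsplit("utils::write.csv", "::", fixed = TRUE)[[1]]
f_manual <- get(x[2], envir = asNamespace(x[1]))

# Base R helper for exported objects (an addition to the answer above).
f_export <- getExportedValue("utils", "write.csv")

identical(f_direct, f_parsed)
identical(f_direct, f_manual)
identical(f_direct, f_export)
```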