anova.cca cannot find object within user defined function - r

I want to execute the same rda analysis sequence (fitting a model, testing the significance of the model, the axes, and the terms, and plotting the data) on subsets of the same datasets, so I wrote a function. The problem is that the call to anova.cca does not work within a function when I want to test the axes: it cannot find the RV.sub dataset.
Error in eval(expr, envir, enclos) : object 'RV.sub' not found
Minimal working example:
library(vegan)
data(dune)
data(dune.env)
rda.subsetfunc <- function(RV, Y){
  # RV.sub <- subset(RV, !Y$Use %in% c("BF"))
  # Y.sub  <- subset(Y, !Y$Use %in% c("BF"))
  RV.sub <- RV; Y.sub <- Y
  rda.mod <- rda(RV.sub ~ Manure, Y.sub)
  axis.test <- anova(rda.mod, by = "axis")
  return(list(rda.mod, axis.test))
}
rda.subsetfunc(RV = dune, Y = dune.env)
I found some other related questions, like here, but those seem to be a lot more complicated than what I am doing. I tried to implement the do.call approach that is mentioned here, but I couldn't get it to work. If it is really not possible to do this without digging deep into the internals of functions, I will find a way to program around it. But to me it feels like I'm trying to do something that makes total sense, so it is probably more likely that I am doing something wrong than that I am trying to do something impossible.

This is a scoping issue in anova.cca(..., by = "axis"), which has to find items from several different environments when it updates the formula (I won't go into the technical details). You really cannot call it for testing the significance of axes from within an embedding function; this is a known issue. We have solved it in the development version of vegan: the re-designed function in https://github.com/vegandevs/vegan seemed to work with this example. All ordination and significance functions are radically changed there, but they are not yet completely finished; we plan to release them in vegan 2.5-0 in the last quarter of 2017.
The problem is that anova.cca(..., by = "axis") must find items that it builds within the function; in addition, it can find items that were available when the original model was built, but it cannot find items that you generate in functions that wrap the call. You must circumvent this by making your wrapper function write its objects somewhere they can be found. The easiest (but dirty) solution is to write them into the parent environment with <<-. The following version of your function adds this <<- and seems to work in vegan 2.4-3:
rda.subsetfunc <- function(RV, Y){
  RV.sub <<- RV; Y.sub <- Y
  rda.mod <- rda(RV.sub ~ Manure, Y.sub)
  axis.test <- anova(rda.mod, by = "axis")
  list(rda.mod, axis.test)
}
rda.subsetfunc(RV = dune, Y = dune.env)
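If you prefer not to rely on <<-, an equivalent (and equally dirty) variant of the same trick is to assign the objects explicitly into the global environment and remove them when the function exits. The sketch below is my own illustration of that idea, not part of the original answer, and assumes the same vegan 2.4-x behaviour; note that calling anova() again later on the returned model will fail once the globals have been cleaned up.
rda.subsetfunc2 <- function(RV, Y){
  # put the data where anova.cca(..., by = "axis") can find them when it
  # re-evaluates the model, then tidy up when the function exits
  assign("RV.sub", RV, envir = globalenv())
  assign("Y.sub",  Y,  envir = globalenv())
  on.exit(rm(RV.sub, Y.sub, envir = globalenv()))
  rda.mod <- rda(RV.sub ~ Manure, Y.sub)
  axis.test <- anova(rda.mod, by = "axis")
  list(rda.mod, axis.test)
}
rda.subsetfunc2(RV = dune, Y = dune.env)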

Related

To find the valid arguments for a function in R's help document (meaning of ...)

This question may seem basic, but it has bothered me for quite a while. The help document for many functions has ... as one of its arguments, but somehow I can never get my head around this ... thing.
For example, suppose I have created a model say model_xgboost and want to make a prediction based on a dataset say data_tbl using the predict() function, and I want to know the syntax. So I look at its help document which says:
?predict
**Usage**
predict (object, ...)
**Arguments**
object a model object for which prediction is desired.
... additional arguments affecting the predictions produced.
The syntax and the examples didn't really enlighten me, as I still had no idea what the valid syntax/arguments for the function are. An online course uses something like the code below, which works:
data_tbl %>%
  predict(model_xgboost, new_data = .)
However, looking through the help doc I cannot find the new_data argument. Instead it mentions a newdata argument in its Details section, which actually doesn't work if I replace new_data = . with newdata = .:
Error in `check_pred_type_dots()`:
! Did you mean to use `new_data` instead of `newdata`?
My questions are:
How do I know exactly what argument(s) / syntax can be used for a function like this?
Why new_data but not newdata in this example?
I might be missing something here, but is there any reference/resource, in plain English, on how to read/interpret a help document? (A lot of documentation, including R help files, seems to give just a brief sentence like "additional arguments affecting the predictions produced".)
@CarlWitthoft's answer is good; I want to add a little nuance about this particular function. The reason the help page for ?predict is so vague is an unfortunate consequence of the fact that predict() is a generic method in R: that is, it's a function that can be applied to a variety of different object types, using slightly different (but appropriate) methods in each case. As such, the ?predict help page only lists object (which is required as the first argument in all methods) and ..., because different predict methods can take very different arguments/options.
If you call methods("predict") in a clean R session (before loading any additional packages) you'll see a list of 16 methods that base R knows about. After loading library("tidymodels"), the list expands to 69 methods. I don't know what class your object is (check class(model_xgboost)), but assuming it's of class model_fit, we look at ?predict.model_fit to see
predict(object, new_data, type = NULL, opts = list(), ...)
This tells us that we need to call the new data new_data (and, reading a bit farther down, that it needs to be "A rectangular data object, such as a data frame")
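Putting that together, a hedged sketch of the call (model_xgboost and data_tbl are the names from the question, assumed to be a parsnip model_fit object and a data frame respectively):
# direct (non-pipe) equivalent of the course example
predict(model_xgboost, new_data = data_tbl)
# predict.model_fit also takes a type argument, e.g. class probabilities
# for a classification fit
predict(model_xgboost, new_data = data_tbl, type = "prob")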
The help page for predict says
Most prediction methods which are similar to those for linear
models have an argument ‘newdata’ specifying the first place to
look for explanatory variables to be used for prediction
(emphasis added). I don't know why the parsnip authors (the predict.model_fit method comes from the parsnip package) decided to use new_data rather than newdata; presumably it was to be in line with the tidyverse style guide, which says
Use underscores (_) (so called snake case) to separate words within a name.
In my opinion this might have been a mistake, but you can see that the parsnip/tidymodels authors realized that people are likely to make this mistake and added an informative error message, as shown in your example and noted e.g. here.
Among other things, the existence of ... in a function definition means you can pass in any additional arguments (values, functions, etc.) you want. In some cases the main function does not even use the ... itself, but passes them on to functions called inside the main function. Simple example:
foo <- function(x, ...){
  y <- x^2
  plot(x, y, ...)
}
I also know of functions that accept a function as an input argument, in which case the items to include via ... are specific to the selected input function.
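To illustrate both points (my own toy example, not from the original answer): col and pch below fall straight through foo's ... into plot(), and the hypothetical apply_fun() never inspects ... itself, so the valid names depend entirely on which function you pass in.
foo(1:10, col = "red", pch = 19)                # col and pch are plot() arguments

apply_fun <- function(x, FUN, ...) FUN(x, ...)  # forwards ... untouched to FUN
apply_fun(c(1, NA, 3), mean, na.rm = TRUE)      # na.rm is an argument of mean()
apply_fun(c(1, NA, 3), round, digits = 1)       # digits is an argument of round()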

Adapt mle2 to use an unnamed parameter vector & additional arguments

Good evening,
I have a quick question about mle2() syntax. I have an optim() routine that optimizes a log-likelihood function of the following form (and it runs thousands of times, so I don't want to change much):
ObjFun <- function(p, X, y, ModelFunction, CostFunction)
where p is a vector of 1-8 parameters, X is the input matrix, y is the response (dependent) variable vector, ModelFunction is a function specifying the shape of the model, and CostFunction specifies the cost/loss function the model likelihood should incorporate during the optimization. The code works fine with optim() or maxLik() (from the maxLik package) using the following calls:
maxLik(ObjFun, method = "NM", start = c(1,2,3,4,5),
       X = conc, y = y, ModelFunction = Model1, CostFunction = GCost)

constrOptim(init.par, ObjFun, ui = Ui, ci = Ci, method = "Nelder-Mead",
            control = control1, X = X, y = y, ModelFunction = get(Model1),
            CostFunction = get(GCost))
## I'm obviously using constrained optimization in my actual problem.
But I can't get it to work easily with mle() or mle2(). I just want to run it through mle2() to compare the likelihood profile with my own profiling function. Before I go digging through the mle2() code, does anyone know whether it's my unnamed parameter vector or the additional arguments that make the function crash? I thought it was the former, but I am confused because the error it gives me is:
mle2(ObjFun, method = "Nelder-Mead", start = c(1,2,3,4,5),
     X = X, y = y, ModelFunction = Model1, CostFunction = GCost)
"minuslogl() : argument "ModelFunction" is missing, with no default"
and that argument is clearly specified. I couldn't really find any examples with additional parameters in the vignettes.
PS:
I would have just commented on this post as it's obviously related:
Creating function arguments from a named list (with an application to stats4::mle)
But I don't have enough points to comment.
UPDATE:
mle2() has vecpar and parnames options that should allow one to specify the parameters "for compatibility with optim", according to Ben's vignette. I simplified the objective function (the log-likelihood) so that the loss and model examples are hard-coded. The result looks like this:
mod2 <- mle2(ObjFun2, method = "Nelder-Mead", start = list(1,2,3,4,5),
             vecpar = TRUE, parnames = c("A", "B", "C", "D", "E"))
However, this still doesn't work. I have a hard time troubleshooting it because I don't know how to refer to the parameters inside the objective function after the call from mle2(). If I include debugging commands such as print(p[2]) inside ObjFun2, it returns NULL, so the parameters are no longer in vector form; print(A), however, makes the function crash. Again, I can't find any working examples of this online, so maybe I'm missing something very simple.
I can't use the parameters argument as Ben suggested in the above link because my models are not linear.
I attempted to look inside the mle2() but got stuck on a call to calc_mle2_function().
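For reference, a minimal sketch of the vecpar/parnames pattern from the bbmle vignette, with a made-up least-squares objective standing in for the real negative log-likelihood (this is my illustration of the mechanism, not the poster's actual model):
library(bbmle)

# toy objective: p is an unnamed parameter vector; X and y arrive via mle2's data argument
ObjFun2 <- function(p, X, y){
  mu <- X %*% p
  sum((y - mu)^2)                                 # stand-in for a real negative log-likelihood
}
parnames(ObjFun2) <- c("A", "B", "C", "D", "E")   # tell mle2 the names of the vector elements

set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- drop(X %*% (1:5)) + rnorm(100)

mod2 <- mle2(ObjFun2,
             start  = c(A = 1, B = 2, C = 3, D = 4, E = 5),  # named start values
             vecpar = TRUE,
             method = "Nelder-Mead",
             data   = list(X = X, y = y))
coef(mod2)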

For Loop with MCMCglmm Regression

I've looked at some of the answers to this question already; only two of them were helpful, and I still cannot get my loop to execute. I am struggling to pass a fixed-effects formula to the MCMCglmm package. I have a lot of models to test with this package, and I would like to write a loop to make the work easier. My intention is to run MCMCglmm each time with a "fixed" formula, and in each iteration of the loop to change one of the variables and pass in the modified version of that formula. Here is my code so far:
for (i in 5:10){
  fixed <- as.formula(paste(as$area_pva ~ as$apva_1yr + as$year + as.numeric(unlist(as[i]))))
  print(fixed)
  model <- MCMCglmm(fixed = fixed,
                    rcov = ~units, family = "gaussian",
                    data = as, start = NULL, prior = NULL, random = NULL, tune = NULL,
                    pedigree = NULL, nodes = NULL, scale = FALSE, nitt = 30000,
                    thin = 30, burnin = 1000, pr = TRUE, pl = TRUE, verbose = TRUE,
                    DIC = TRUE, singular.ok = FALSE, saveX = TRUE, saveZ = TRUE,
                    saveXL = TRUE, slice = FALSE, ginverse = NULL)
  summary(model)
}
Please, if you can help me make this loop execute properly I would appreciate it.
Never mind, I've got the answer. I needed to build the whole formula from strings, like this:
fixed <- as.formula(paste("as$area_pva~as$apva_1yr+as$year+", colnames(as)[i], sep=""))
It works perfectly now.
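For reference, the same formula can be built without string pasting using reformulate(), which also lets you drop the as$ prefixes and hand the columns to MCMCglmm through its data argument. This variant is my suggestion rather than the poster's code, and assumes area_pva, apva_1yr and year are syntactically valid column names of the data frame as:
for (i in 5:10) {
  fixed <- reformulate(c("apva_1yr", "year", colnames(as)[i]),
                       response = "area_pva")
  print(fixed)
  model <- MCMCglmm(fixed = fixed, rcov = ~units, family = "gaussian",
                    data = as, nitt = 30000, thin = 30, burnin = 1000,
                    pr = TRUE, verbose = FALSE)
  print(summary(model))   # summary() must be printed explicitly inside a loop
}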

returning functions in R - when does the binding occur?

As in other functional languages, returning a function is a common pattern in R. For example, after training a model you might want to return a "predictor" object, which is essentially a function that, given new data, returns predictions. There are other cases where this is useful, of course.
My question is: when does the binding (i.e. evaluation) of values within the returned function occur?
As a simple example, suppose I want to have a list of three functions, each is slightly different based on a parameter whose value I set at the time of the creation of the function. Here is a simple code for this:
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) x+i
So now I have three functions. Ideally, the first one returns x+1, the second computes x+2 and the third computes x+3
so I would expect:
function.list[[1]] (3) = 4
function.list[[2]] (3) = 5
etc.
Unfortunately, this doesn't happen, and all the functions in the list above compute the same x+3. My question is: why is the binding of the value of i so late, and hence the same for all the functions in the list? How can I work around this?
EDIT:
rawr's link to a similar question was insightful, and I thought it solved the problem. Here is the link:
Explain a lazy evaluation quirk
However, I checked the code I gave above with the fix suggested there, and it still doesn't work. Certainly I am missing something very basic here; can anyone tell me what it is? Here is the "fixed" code (which still doesn't work):
function.list = list()
for (i in 1:3) { force(i); function.list[[i]] = function(x) x+i}
Still, function.list[[1]](3) gives 6 and not 4 as expected.
I also tried the following (i.e. putting the force() inside the function):
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) {force(i);x+i}
what's going on?
Here's a solution with a for loop, using R 3.1:
> makeadd=function(i){force(i);function(x){x+i}}
> for (i in 1:3) { function.list[[i]] = makeadd(i)}
> rm(i) # not necessary but causes errors if we're actually using that `i`
> function.list[[1]](1)
[1] 2
> function.list[[2]](1)
[1] 3
The makeadd function creates the adding function in a context with a local i, which is why this works. It would be interesting to know whether this works without the force in R 3.2. I always use the force, Luke....
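To spell out why the for-loop attempts in the question fail (my explanation, not part of the original answer): every function created in the loop closes over the one shared i in the calling environment, so by the time any of them runs, i has already reached its final value; force(i) at the top level just evaluates i immediately without capturing it, and force(i) inside the stored function only runs when that function is eventually called. Wrapping the creation in a function, as makeadd does, gives each closure its own local i. Since R 3.2.0 the apply-family functions force their arguments, so an lapply() version works without an explicit force():
function.list <- lapply(1:3, function(i) function(x) x + i)
function.list[[1]](3)   # 4
function.list[[2]](3)   # 5
function.list[[3]](3)   # 6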

Scoping and functions in R 2.11.1 : What's going wrong?

This question stems from a range of other questions that all deal with essentially the same problem. For some strange reason, using a function within another function sometimes fails, in the sense that variables defined in the local environment of the calling function are not found by the function it calls.
The classical pattern, in pseudo-code:
ff <- function(x){
  y <- some_value
  some_function(y)
}
ff(x)
Error in eval(expr, envir, enclos) :
object 'y' not found
At first I thought it had something to do with S4 methods and the scoping there, but it also happens with other functions. I've had some interaction with the R development team, but all they did was direct me to the bug report site (which is not the most inviting one, I have to say), and I never got any feedback.
As the problem keeps arising, I wonder if there is a logical explanation for it. Is there a common mistake made in all these cases, and if so, which one? Or is it really a bug?
Some of those questions:
Using functions and environments
R (statistical) scoping error using transformBy(), part of the doBy package.
How to use acast (reshape2) within a function in R?
Why can't I pass a dataset to a function?
Values not being copied to the next local environment
PS: I know about the R-devel list, in case you wondered...
R has both lexical and dynamic scope. Lexical scope works automatically, but dynamic scope must be implemented manually, and requires careful book-keeping. Only functions used interactively for data analysis need dynamic scope, so most authors (like me!) don't learn how to do it correctly.
See also: the standard non-standard evaluation rules.
There are undoubtedly bugs in R, but a lot of the issues that people have been having are quite often errors in the implementation of some_function, not in R itself. R has scoping rules (see http://cran.r-project.org/doc/manuals/R-intro.html#Scope) which, combined with lazy evaluation of function arguments and the ability to eval arguments in other scopes, are extremely powerful, but which also often lead to subtle errors.
As Dirk mentioned in his answer, there isn't actually a problem with the code that you posted. In the links you posted in the question, however, there is a common theme: some_function contains code that messes about with environments in some way. This messing is either explicit, using new.env and with, or implicit, using a data argument, and it probably involves a line like
y <- eval(substitute(y), data)
The moral of the story is twofold. Firstly, avoid explicitly manipulating environments unless you are really sure that you know what you are doing. Secondly, if a function has a data argument, put all the variables that you need the function to use inside that data frame.
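To make that concrete, here is a toy sketch of the failure mode and of the data-frame fix (my own illustration, not from the original answers): some_function below is a deliberately naive stand-in for the kind of data-argument function described above, and the example assumes a clean session with no y lying around in the global environment.
some_function <- function(e, data) eval(substitute(e), data)

ff <- function(d){
  y <- 2
  some_function(x * y, d)   # fails: 'y' lives in ff's frame, which eval() never searches
}
# ff(data.frame(x = 1:3))   # fails with: object 'y' not found

ff2 <- function(d){
  d$y <- 2                  # the fix: put everything the function needs into the data
  some_function(x * y, d)
}
ff2(data.frame(x = 1:3))    # [1] 2 4 6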
Well, there is no problem in what you posted:
/tmp$ cat joris.r
#!/usr/bin/r -t

some_function <- function(y) y^2

ff <- function(x){
  y <- 4
  some_function(y)  # so we expect 16
}
print(ff(3))        # 3 is ignored
/tmp$ ./joris.r
[1] 16
/tmp$
Could you restate the question and post an actual bug or misfeature?
