Good evening,
I have a quick question about mle2() syntax. I have an optim() routine that optimizes a log-Likelihood function of the following form (and this runs thousands of times, so i don't want to change much):
ObjFun <- function(p, X, y, ModelFunction, CostFunction)
where p is a vector of 1-8 parameters, X is the input matrix, y is the response/independent variable vector, ModelFunction is a function specifying the shape of a model, and CostFunction specifies the cost/loss function the model likelihood should incorporate during the optimization. The code works fine with optim() or maxLik [maxLik] wit the following code:
maxLik(ObjFun, method="NM", start=c(1,2,3,4,5),
X=conc, y=y, ModelFunction=Model1, CostFunction=GCost)
constrOptim(init.par, ObjFun, ui = Ui, ci = Ci, method = "Nelder-Mead",
control = control1, X=X, y=y, ModelFunction= get(Model1),
CostFunction= get(GCost))
##i'm obviously using constrained optimization in my actual problem.
But I can't get it to work easily with mle() or mle2(). I just want to run it in mle2 to compare the likelihood profile with my own profiling function. Before i go digging through the mle2() code, does anyone know if it's my unnamed parameter vector or the additional arguments that make the function crash? I thought it was the former, but i am confused because the error it's giving me is:
mle2(ObjFun, method="Nelder-Mead"", start=c(1,2,3,4,5),
X=X, y=y, ModelFunction=Model1, CostFunction=GCost)
"minuslogl() : argument "ModelFunction" is missing, with no default"
and that argument is clearly specified. I couldn't really find any examples with additional parameters in the vignettes.
PS:
I would have just commented on this post as it's obviously related:
Creating function arguments from a named list (with an application to stats4::mle)
But I don't have enough points to comment.
UPDATE:
mle2() has options vecpar and parnames options that should allow one to specify "for compatibility with Optim", according to Ben's vignette. I simplified the objective function (the log-likelihood) to include hard-coded loss and model examples. The result looks like this:
mod2 <- mle2(ObjFun2, method="Nelder-Mead", start=list(1,2,3,4,5),
vecpar=T, parnames=c("A", "B", "C", "D", "E"))
However this still doesn't work. I have a hard time troubleshooting it because i don't know how to refer to the parameters inside the objective function after the call from mle2. If i include debugging commands such as print(p[2]) inside the ObjFun2, it returns NULL. So the parameter is no longer in a vector form. However print(A) forces the function to crush. Again, I can't find any working examples of this online, so maybe I'm missing something very simple.
I can't use the parameters argument as Ben suggested in the above link because my models are not linear.
I attempted to look inside the mle2() but got stuck on a call to calc_mle2_function().
Related
This question may seem basic but this has bothered me quite a while. The help document for many functions has ... as one of its argument, but somehow I can never get my head around this ... thing.
For example, suppose I have created a model say model_xgboost and want to make a prediction based on a dataset say data_tbl using the predict() function, and I want to know the syntax. So I look at its help document which says:
?predict
**Usage**
predict (object, ...)
**Arguments**
object a model object for which prediction is desired.
... additional arguments affecting the predictions produced.
To me the syntax and its examples didn't really enlighten me as I still have no idea what the valid syntax/arguments are for the function. In an online course it uses something like below, which works:
data_tbl %>%
predict(model_xgboost, new_data = .)
However, looking across the help doc I cannot find the new_data argument. Instead it mentioned newdata argument in its Details section, which actually didn't work if I displace the new_data = . with newdata = .:
Error in `check_pred_type_dots()`:
! Did you mean to use `new_data` instead of `newdata`?
My questions are:
How do I know exactly what argument(s) / syntax can be used for a function like this?
Why new_data but not newdata in this example?
I might be missing something here, but is there any reference/resource about how to use/interpret a help document, in plain English? (a lot of document, including R help file seem just give a brief sentence like "additional arguments affecting the predictions produced" etc)
#CarlWitthoft's answer is good, I want to add a little bit of nuance about this particular function. The reason the help page for ?predict is so vague is an unfortunate consequence of the fact that predict() is a generic method in R: that is, it's a function that can be applied to a variety of different object types, using slightly different (but appropriate) methods in each case. As such, the ?predict help page only lists object (which is required as the first argument in all methods) and ..., because different predict methods could take very different arguments/options.
If you call methods("predict") in a clean R session (before loading any additional packages) you'll see a list of 16 methods that base R knows about. After loading library("tidymodels"), the list expands to 69 methods. I don't know what class your object is (class("model_xgboost")), but assuming that it's of class model_fit, we look at ?predict.model_fit to see
predict(object, new_data, type = NULL, opts = list(), ...)
This tells us that we need to call the new data new_data (and, reading a bit farther down, that it needs to be "A rectangular data object, such as a data frame")
The help page for predict says
Most prediction methods which are similar to those for linear
models have an argument ‘newdata’ specifying the first place to
look for explanatory variables to be used for prediction
(emphasis added). I don't know why the parsnip authors (the predict.model_fit method comes from the parsnip package) decided to use new_data rather than newdata, presumably in line with the tidyverse style guide, which says
Use underscores (_) (so called snake case) to separate words within a name.
In my opinion this might have been a mistake, but you can see that the parsnip/tidymodels authors have realized that people are likely to make this mistake and added an informative warning, as shown in your example and noted e.g. here
Among other things, the existence of ... in a function definition means you can enter any arguments (values, functions, etc) you want to. There are some cases where the main function does not even use the ... but passes them to functions called inside the main function. Simple example:
foo <- function(x,...){
y <- x^2
plot(x,y,...)
}
I know of functions which accept a function as an input argument, at which point the items to include via ... are specific to the selected input function name.
I'm running some Surv() functions, and one thing I do not like, or understand, is why this function does not take a "data=" argument. This is annoying because I want to perform the same Surv() function on the same data frame but filtered by different criteria each time.
So for example, my data frame is called "ikt" and I want to filter by "donor_type2=='LD'" and also use a strata variable "plan 2". I tried the following but it didn't work:
library(survival)
library(dplyr)
ikt<-data.frame(organ_yrs=(seq(1,20)),
organ_status=rep(c(0,0,1,1),each=5),
plan2=rep(c('A','B','A','B'),each=5),
donor_type2=rep(c('LD','DD'),each=10) )
organ_surv_func<-function(data,criteria,strata) {
data2<-filter(data,criteria)
Surv(data2$organ_yrs,data2$organ_status)~data2$strata
}
organ_surv_func(ikt,donor_type2=='LD',plan2)
Error in filter_impl(.data, quo) : object 'donor_type2' not found
I'm coming from a SAS background so that's probably why I'm thinking this should work and it doesn't...
I looked up something about sapply(), but I don't think that works when the function doesn't have the data= option.
Also the reason I need the Surv() object and not just survfit(Surv()) (which would let me use data=) is because I'm also using survdiff() for log-rank tests, which takes in the Surv() object as it's main argument:
lr<-function (surv) {
round(1-pchisq(survdiff(surv)$chisq,length(survfit(surv)$strata)-1),3)
}
Thanks for any help you can provide.
I'm writing this "answer" to caution you against proceeding down the path you seem to be following. The Surv function is really intended to be used as the LHS of a formula defined within one of the survival package functions. You should avoid using constructions like:
Surv(data2$organ_yrs,data2$organ_status)~data2$strata
For one thing it's needlessly verbose, but more importantly, it will prevent the use of predict when it comes time to match up names to formals. The survdiff and the other survival functions all have both a "data" argument as well as a "subset" argument. The subset function should allow you to avoid using filter.
organ_surv_func<-function(data, covar) {
form = as.formula(substitute( Surv(organ_yrs, organ_status) ~ covar, list(covar=covar) ) )
survdiff(form, data=data)
}
# although I think running surdiff in a for-loop might be easier,
# as it would involve fewer tricky language constructs
organ_surv_func( subset(ikt, (donor_type2=='LD')), covar=quote(plan2))
If you assign the output of survfit to a named variable, you will be able to more economically access chisq and strata:
myfit <- organ_surv_func( subset(ikt, (donor_type2=='LD')), covar=quote(plan2))
my.lr.test<-function (myfit) {
round(1-pchisq(myfit$chisq, length(myfit$strata)-1), 3)
}
my.lr.test(myfit) # not going to be useful with that dataset.
I want to execute the same rda analysis sequence (fitting a model, testing significance of the model, the axis, and the term, plotting the data) on subsets of the same datasets. Therefore I wrote a function. The problem now is that the call to the anova.cca function does not work well within a function when I want to test the axis. It cannot find the Y.sub dataset
Error in eval(expr, envir, enclos) : object 'RV.sub' not found
Minimal working example:
library(vegan)
data(dune)
data(dune.env)
rda.subsetfunc <- function(RV, Y){
#RV.sub <- subset(RV, !Y$Use%in%c("BF"))
#Y.sub <- subset(Y, !Y$Use%in%c("BF"))
RV.sub <- RV; Y.sub <- Y
rda.mod <- rda(RV.sub ~ Manure, Y.sub)
axis.test <- anova(rda.mod, by = "axis")
return(list(rda.mod, axis.test))
}
rda.subsetfunc(RV = dune, Y = dune.env)
I found some other related questions like here but that seems be a lot more complicated than what I am doing. I tried to implement the do.call approach as is mentioned here but I couldn't get it to work. If it is really not possible to do this without really digging deep into functions I will find a way of programming around it. But to me, it feels like I'm trying to do something that makes total sense. So it is probably more likely that I am doing something wrong than that I am doing something impossible.
This a scoping issue in anova.cca(..., by="axis") which should find items from several different environments when it is updating the formula (I won't go into technical details). You really cannot embed the function to analyse the significances of axes. This is a known issue. We have solved that in the development version of vegan. The re-designed function in https://github.com/vegandevs/vegan seemed to work with this example. All ordination and significance functions are radically changed there, and they are not yet completely finished. We plan to release them in vegan 2.5-0 in the last quarter of 2017, but they are not finished yet.
The problem is that anova.cca(..., by = "axis") must find items that it builds within the function, and in addition it can find items that were available when the original model was built, but it cannot find items that you generate in functions that embed the function. You must circumvent this by making your embedding function to write its objects somewhere that these can be found. The easiest (but dirty) solution is to write them into parent environment with <<-. The following version of your function adds this <<- and seems to work in vegan 2.4-3
rda.subsetfunc <- function(RV, Y){
RV.sub <<- RV; Y.sub <- Y
rda.mod <- rda(RV.sub ~ Manure, Y.sub)
axis.test <- anova(rda.mod, by = "axis")
list(rda.mod, axis.test)
}
rda.subsetfunc(RV = dune, Y = dune.env)
I get the following error when I try to improve the intensity estimate of a kppm object, if I include the argument vcov = TRUE in the function improve.kppm:
Error in improve.kppm(object, type = type, rmax = rmax, dimyx = dimyx, :
object 'gminus1' not found
If I don't include the argument, the function runs but I cannot use the summary() function on the improved kppm object. I get the same error message as above. The same thing happens when I use vcov().
The call that I used to create my kppm object was (number of covariates has been reduced for clarity):
a05 = kppm(a2005nests ~ nest + nest2, cluster = "Thomas", covariates = fitcov(2))
where fitcov(2) is a function that returns a list of im objects. Could this be the issue? I've noticed that some spatstat functions on kppm objects throw errors if I used this function in the original kppm call. Usually it says something along the lines of Error: Covariates ‘nest’ and ‘nest2’ were not found.
There is a bug in the logical flow of improve.kppm: if vcov=TRUE and type != "quasi", the variable gminus1 is not defined. We will fix this in the development version of spatstat as soon as possible.
Did you perhaps select type="clik1" or type="wclik1" in the original call to kppm?
For the moment, you should be able to avoid the bug by omitting the argument type, or explicitly selecting type="quasi", in calls to kppm and improve.kppm.
The second problem, in which kppm fails to find the covariates, appears to be a scoping problem, but I can't reproduce it here. It would help if you can supply a minimal working example.
Here is a snippet of R script doing beta regression on data "GasolineYield":
library("betareg")
data("GasolineYield", package = "betareg")
gy_logit <- betareg(yield ~ batch + temp, data = GasolineYield)
gy_logit4 <- update(gy_logit, subset = -4)
The 4th line magically deletes the 4th observation and update the fit automatically, but I don't quite understand the why this parameter works in the update function here, because I tried to look up the documentation by ?update, but couldn't find there's such parameter.
I'm curious about how to find right documentation in this case, because maybe I want to add some new observation instead of removing it. Any help?
subset in betareg works the same as subset in lm, therefore you can read lm documentation.
From the help file you can find:
subset an optional vector specifying a subset of observations to be used in the fitting process.
Hence by setting select=-4 you are lefting out the fourth row in the estimation.
update() contains the ... parameter, which means any parameters that are not matched in your call to update() are passed on to the function that does the estimation. In this case, that is betareg(), which does have the subset argument.
This type of thing is very common in R. Many higher-level function that call other user-visible functions will have the three dot parameter and pass any unmatched parameters on, so you have to search all the user-visible functions that get called in order to know all possible options.
You can check out the help file for the top level function (update() in this case) to get an idea of which functions get the leftover parameters.