I'm using summary() on the output of the mle() function from stats4; the output belongs to class mle. I would like to find out how summary() estimates the standard deviation of the coefficients returned by mle(), but I do not see summary.mle in the list printed by methods(summary). Why can't I find a summary.mle() function?
(I guess the proper function is summary.mlm(), but I'm not sure of that and don't know why it would be mlm instead of mle.)
The code below is what summary.mle would be if it were an S3 method. S3 methods are created and then dispatched through the generic_function_name.class_of_first_argument mechanism, whereas S4 methods are dispatched on the basis of their argument "signature", which allows second and later arguments to be considered. This is how to get showMethods to display the code that runs when an S4 method is called. In this instance only the first argument is used in the signature. You can pass any of the object signatures that appear in the abbreviated output as the classes argument, and it is the includeDefs flag that prompts display of the code:
showMethods("summary",classes="mle", includeDefs=TRUE)
#---(output to console)----
Function: summary (package base)
object="mle"
function (object, ...)
{
cmat <- cbind(Estimate = object@coef, `Std. Error` = sqrt(diag(object@vcov)))
m2logL <- 2 * object@min
new("summary.mle", call = object@call, coef = cmat, m2logL = m2logL)
}
As shown in
>library(stats4)
>showMethods("summary")
Function: summary (package base)
object="ANY"
object="mle"
The summary call is dispatched the S4 way. I don't know how to view the code from within R, so I searched the stats4 source directly.
In stats4/R/mle.R, there is:
setMethod("summary", "mle", function(object, ...){
cmat <- cbind(Estimate = object@coef,
`Std. Error` = sqrt(diag(object@vcov)))
m2logL <- 2*object@min
new("summary.mle", call = object@call, coef = cmat, m2logL = m2logL)
})
So it creates an S4 object of class summary.mle. You should be able to trace the code yourself from here.
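To check that summary() does nothing more than the method body above suggests, a minimal sketch can compare its standard errors with the square root of the diagonal of vcov(). The Poisson data and the nll function here are made up for illustration; getMethod() is another way, besides showMethods(), to retrieve the method definition.

```r
library(stats4)

# Illustrative data and negative log-likelihood (not from the original question)
set.seed(1)
y <- rpois(50, lambda = 4)
nll <- function(lambda = 1) -sum(dpois(y, lambda, log = TRUE))

# Bounded optimiser so lambda stays positive
fit <- mle(nll, start = list(lambda = 3), method = "L-BFGS-B", lower = 0.01)

# Standard errors computed by hand from the variance-covariance matrix
se_manual <- sqrt(diag(vcov(fit)))

# Standard errors as stored in the "summary.mle" object
se_summary <- summary(fit)@coef[, "Std. Error"]

all.equal(unname(se_manual), unname(se_summary))

# Retrieve the S4 method definition itself
summary_mle_method <- getMethod("summary", "mle")
summary_mle_method
```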
I am trying to create a new model for the parsnip package from an existing modeling function foo.
I have followed the tutorial in building new models in parsnip and followed the README on Github, but I still cannot figure out some things.
How does the fit function in parsnip know how to assign its input data (e.g. a matrix) to my idiosyncratic function call?
Imagine if there was an idiosyncratic model function foo where the conventional roles of x and y arguments were reversed: i.e. foo(x,y) where x should be an outcome vector and y should be a predictor matrix, bizarrely.
For example: suppose a is a matrix of predictors and b is a vector of outcomes. Then I call fit_xy(object=my_model, x=a, y=b). Internally, how does fit_xy() know to call foo(x=y,y=x) ?
The function that validates the input is check_final_param, which requires, among other things, that every argument be named. That is why order is not important.
https://github.com/tidymodels/parsnip/blob/f7ba069671684f61af0ca1eadb1927fedec8a9c6/R/misc.R#L235
The README file you linked points out:
"To create the model fit call, the protect arguments are populated with the appropriate objects (usually from the data set), and rlang::call2 is used to create a call that can be executed. "
An example is randomForest, which uses ntree instead of the default trees argument.
They created a translation of the calls, which is used during evaluation.
https://github.com/tidymodels/parsnip/blob/228a6dc6975fc91562b63d191e43d2164cc78e3d/R/rand_forest_data.R#L339
If we use call2 and unpack the named args, the order does not matter, and we know the args will be properly named because of the additional translation step.
args <- list(na.rm = TRUE, trim = 0)
rlang::call2("mean", 1:10, !!!args)
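To make the point about named arguments concrete, here is a small sketch using a hypothetical foo() with the reversed roles described in the question. The function and its return value are invented for illustration; only rlang::call2() and the !!! operator come from the answer above.

```r
library(rlang)

# Hypothetical modelling function: x is the outcome, y is the predictor matrix
foo <- function(x, y) list(outcome = x, predictors = y)

# Named arguments supplied in the "wrong" positional order
args <- list(y = matrix(1:4, nrow = 2), x = c(10, 20))

# call2() splices the named list into a call; matching is by name, not position
cl <- call2("foo", !!!args)
res <- eval(cl)

res$outcome
res$predictors
```

Because every argument carries its name into the call, the list can be built in any order.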
The way we do this is through the set_fit() function. Most models are pretty sensible and we can use default mappings (for example, from data argument to data argument or x to x), but you are right that some models use different norms. An example of this is the Spark models, which use x to mean what we might normally call data with a formula method.
The random forest set_fit() function for Spark looks like this:
set_fit(
model = "rand_forest",
eng = "spark",
mode = "classification",
value = list(
interface = "formula",
data = c(formula = "formula", data = "x"),
protect = c("x", "formula", "type"),
func = c(pkg = "sparklyr", fun = "ml_random_forest"),
defaults = list(seed = expr(sample.int(10 ^ 5, 1)))
)
)
Notice especially the data element of the value argument. You can read a bit more here.
I'd like to define a wrapper class that will encapsulate an actual model, and let the user call predict() with either new data frames or a model matrix:
raw_model <- ...
model <- Model(raw_model)
X <- matrix(...)
predict(model, X)
df <- data.frame(...)
predict(model, df)
I thought it would simply be a matter of defining two methods for predict(), dispatching on the types of the first two arguments:
library(methods)
Model <- setClass("Model", slots = "model")
setMethod("predict", signature("Model", "matrix"),
function(object, newdata, ...) {
stats::predict(object@model, newdata)
})
setMethod("predict", signature("Model", "data.frame"),
function(object, newdata, ...) {
matrix <- model.matrix(newdata) # or something like that
stats::predict(object@model, matrix)
})
However, both calls to setMethod fail with
Error in matchSignature(signature, fdef) :
more elements in the method signature (2) than in the generic signature (1) for function ‘predict’
I understand that an S4 generic is created from the S3 generic predict, whose signature takes just one named argument object, but is there a way to have the S4 methods dispatch on more than just that first argument?
You can make S4 generics dispatch on more than one argument, but you cannot (currently) dispatch on a named argument and .... This is the problem with predict — the only named argument is object.
You can still achieve what you want though, by defining your own generic "one level down" along the lines of
predict2 <- function(model,newdata){stats::predict(model,newdata)}
setGeneric("predict2",signature=c("model","newdata"))
setMethod(
"predict2",
signature=c("Model","data.frame"),
definition=function(model,newdata){
matrix <- model.matrix(newdata) # or something like that
stats::predict(model@model, matrix)
}
)
Now you modify predict.Model (and the S4 method for predict with the signature object="Model") to call predict2(model,newdata).
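Putting the pieces together, a minimal end-to-end sketch might look like the following. It assumes the wrapped model is an lm fit on the built-in cars data and that a matrix with column names can simply be converted with as.data.frame(); both choices are illustrative and not part of the original answer.

```r
library(methods)

# Wrapper class holding an arbitrary fitted model
setClass("Model", slots = c(model = "ANY"))

# "One level down" generic that dispatches on both arguments
setGeneric("predict2", function(model, newdata) standardGeneric("predict2"))

setMethod("predict2", signature("Model", "data.frame"),
          function(model, newdata) stats::predict(model@model, newdata = newdata))

setMethod("predict2", signature("Model", "matrix"),
          function(model, newdata) {
            # Assumption: the matrix carries column names matching the predictors
            stats::predict(model@model, newdata = as.data.frame(newdata))
          })

# S4 method for predict dispatches on the first argument only, then delegates
setMethod("predict", "Model",
          function(object, newdata, ...) predict2(object, newdata))

raw_model <- lm(dist ~ speed, data = cars)
model <- new("Model", model = raw_model)

p_df  <- predict(model, data.frame(speed = c(10, 20)))
p_mat <- predict(model, cbind(speed = c(10, 20)))
```

Both calls go through predict(), while the choice between the data.frame and matrix code paths is made by predict2().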
I have a wrapper function around two functions. Each function has its own parameter vector. The main idea is to pass the parameters (as one vector or as two vectors) to optim so that I can maximize the sum of the two functions.
Since my function is so complex, I have provided a simple example that is similar to my original function. Here is my code:
set.seed(123)
x <- rnorm(10,2,0.5)
ff <- function(x, parOpt){
out <- -sum(log(dnorm(x, parOpt[[1]][1], parOpt[[1]][2]))+log(dnorm(x,parOpt[[2]][1],parOpt[[2]][2])))
return(out)
}
# parameters in mu,sd vectors arranged in list
params <- c(set1 = c(2, 0.2), set2 = c(0.5, 0.3))
xy <- optim(par = params, fn=ff ,x=x)
Which returns this error:
Error in optim(par = params, fn = ff, x = x) :
function cannot be evaluated at initial parameters
As I understand it, I got this error because optim cannot pass the parameters to each part of my function. So how can I tell optim that the first vector holds the parameters of the first part of my function and the second vector those of the second part?
You should change the method argument so that the initial parameters can be used.
You can read the detailed documentation for optim with ?optim.
For example, you can use the "L-BFGS-B" method to set upper and lower constraints.
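One point worth adding: optim() hands par to the function as a flat numeric vector, so parOpt[[1]][2] in the question returns NA and the objective cannot be evaluated. A minimal sketch, assuming the parameters are laid out as (mu1, sd1, mu2, sd2) and indexed by position, with L-BFGS-B lower bounds keeping the standard deviations positive. The name ff_flat and the bound 1e-6 are illustrative choices.

```r
set.seed(123)
x <- rnorm(10, 2, 0.5)

# par arrives as a plain numeric vector: (mu1, sd1, mu2, sd2)
ff_flat <- function(par, x) {
  -sum(dnorm(x, par[1], par[2], log = TRUE) +
       dnorm(x, par[3], par[4], log = TRUE))
}

start <- c(mu1 = 2, sd1 = 0.2, mu2 = 0.5, sd2 = 0.3)

# Lower bounds keep both standard deviations strictly positive
fit <- optim(par = start, fn = ff_flat, x = x,
             method = "L-BFGS-B",
             lower = c(-Inf, 1e-6, -Inf, 1e-6))

fit$par
fit$value
```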
I have a function that works just fine when asked to calculate the -logLik given parameters. However, if I try to optimize the function it returns an error message. I'm familiar with debug() to work through problems with a function, but how would I go about debugging optimization for a function that otherwise works?
Lik <- function(params, data) {
....
return(-log( **likelihood equation** ))
}
These work!
Lik(params=c(3,10,2,9,rowMeans(data[1,])[1]), data = data1)
Lik(params=c(3,10,2,9.5,rowMeans(data[1,])[1]), data = data1)
GENE1 32.60705
GENE1 32.31657
This doesn't work!
optim(params=c(3,10,2,9,rowMeans(data[1,])[1]), data = data1, Lik, method = "BFGS")
Error in optim(params = c(3, 10, 2, 9, rowMeans(data[1, ])[1]), data = data1, :
cannot coerce type 'closure' to vector of type 'double'
The optim parameter name for the parameters to optimize over is par, not params. You don't need to change your Lik function, it just needs to have the parameters to optimize over as the first argument, the name doesn't matter.
This should work. Here I name the fn argument too, although positional matching would also find it because the other arguments are named.
optim(par=c(3, 10, 2, 9, rowMeans(data[1, ])[1]),
data=data1, fn=Lik, method="BFGS")
So what was happening in your code was that it was saving both params and data to send to the function, and then the first unnamed parameter was Lik so it was getting matched to the first parameter of optim, which is par, the parameters to optimize over. That parameter should be a numeric (a double, technically) but you were sending it a function (a closure, technically), hence the error message.
To debug, you could have turned on debugging for optim with debug(optim) and then, at the first browser prompt, explored the arguments it was using. You would have found exactly this; simply by inspecting the arguments you would have discovered that you had named them incorrectly.
Browse[2]> print(par)
function(params, data) {... return(-log( **likelihood equation** ))}
Browse[2]> print(fn)
Error in print(fn) : argument "fn" is missing, with no default
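The argument matching described above can be reproduced with a small self-contained sketch. The likelihood function and data are invented for illustration, and the exact wording of the error may differ between R versions, so the message is captured rather than assumed.

```r
# Illustrative negative log-likelihood; its first argument is named params
nll <- function(params, data) {
  -sum(dnorm(data, mean = params[1], sd = params[2], log = TRUE))
}

set.seed(1)
d <- rnorm(50, mean = 5, sd = 2)

# Misnamed call: params and data fall into "...", so nll is matched to par
bad <- tryCatch(
  optim(params = c(4, 1), data = d, nll),
  error = function(e) conditionMessage(e)
)
bad

# Correct call: par and fn are named, data is passed through "..." to nll
good <- optim(par = c(4, 1), fn = nll, data = d)
good$par
```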
It is bad practice to use built-in function names as names for objects created (or to be created) by the user.
When no "data" object (a matrix or a data frame) has yet been created by the user, the R interpreter scans the environments and finds that the only object named "data" is the built-in "data" function:
> class(data)
[1] "function"
> str(data)
function (..., list = character(), package = NULL, lib.loc = NULL, verbose = getOption("verbose"),
envir = .GlobalEnv)
Hence R treats the "data" object as a closure (a function declaration) that cannot be subsetted:
> data[1]
Error in data[1] : object of type 'closure' is not subsettable
So you should change the name of the parameter to something other than data.
And a second point, the syntax of optim is:
optim(par, fn, gr = NULL, ...,
method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
"Brent"),
lower = -Inf, upper = Inf,
control = list(), hessian = FALSE)
So in your example, the second argument supplied to optim should be the function Lik, not the data. Instead, the interpreter tries to interpret data1 as a closure. You can try swapping the positions of data1 and Lik.
More importantly, as @李哲源ZheyuanLi also points out, optim has no parameter named "data". You should just pass "data1" in the place of the additional function arguments "...".
And last, as @Aaron also pointed out, the first parameter is named "par", not "params".
I'm building a function for many model types which needs to extract the formula used to make the model. Is there a flexible way to do this? For example:
x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
equation <- z ~ x + y
model <- lm(equation)
What I need to do is extract the formula object "equation" once the model has been passed in.
You could get what you wanted by:
model$call
# lm(formula = equation)
And if you want to see how I found that out, use:
str(model)
Since you passed the formula as a named object from the calling environment, you might then need to extract it from the object you passed:
eval(model$call[[2]])
# z ~ x + y
@JPMac offered a more compact method: formula(model). It's also worth looking at the mechanism used by the formula.lm function. The function named formula is generic, and you can use methods(formula) to see what S3 methods have been defined. Since the formula.lm method has an asterisk at its end, you need to wrap it in getAnywhere:
> getAnywhere(formula.lm)
A single object matching ‘formula.lm’ was found
It was found in the following places
registered S3 method for formula from namespace stats
namespace:stats
with value
function (x, ...)
{
form <- x$formula
if (!is.null(form)) {
form <- formula(x$terms)
environment(form) <- environment(x$formula)
form
}
else formula(x$terms)
}
<bytecode: 0x36ff26158>
<environment: namespace:stats>
So it is using "$" to extract the list item named "formula" rather than pulling it from the call. If the $formula item is missing (which it is in your case), it falls back to formula(x$terms), which I suspect calls formula.default; judging by the operation of that function, it appears only to adjust the environment of the object.
As noted, model$call will get you the call that created the lm object, but if that call contains an object itself as the model formula, you get the object name, not the formula.
The evaluated object, i.e. the formula itself, can be accessed in model$terms (along with a bunch of auxiliary information on how it was treated). This should work regardless of the details of the call to lm.
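The difference between the three approaches can be seen side by side in a short sketch built on the question's own example. The seed is added only to make the run reproducible.

```r
set.seed(42)
x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
equation <- z ~ x + y
model <- lm(equation)

# The stored call keeps the *name* of the object, not the formula itself
call_arg <- model$call$formula
call_arg

# Evaluating that name recovers the formula, but only while `equation` exists
f_eval <- eval(call_arg)

# formula() works from the fitted object and does not need `equation`
f_generic <- formula(model)

# The terms component also carries the evaluated formula
f_terms <- formula(model$terms)

f_generic
```

In a function that must handle many model types, formula(model) is usually the more flexible choice, since it dispatches to whatever method the model class provides.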