R - as.formula() not working with ctree {party}? - r

I get Error: $ operator not defined for this S4 class when I try to run ctree from the party package, but only when the formula is written as a string that I transform using as.formula().
Here is the example:
#This works fine :
y <- ctree(formula = quotation ~ minute + temp, data=test[[1]], controls = ctree_control(mincriterion = 0.99))
#While this doesn't :
x <- "ctree(formula = quotation ~ minute + temp, data=test[[1]], controls = ctree_control(mincriterion = 0.99))"
y <- as.formula(x)
Error: $ operator not defined for this S4 class
My ultimate purpose is to create a function that iterates through the list test to create multiple trees.
Any ideas?

ctree is a function and not a formula. formula is the class of the object resulting from the function '~' (tilde). You can learn more about formulas from help('~') and help('formula').
The most common way to use as.formula is to convert a string that represents the formula syntax to an object of class formula. Something like as.formula('y ~ x'). Also, check class(as.formula(y~x)).
In your case, you saved a string representing a call to the function ctree in the variable x. That string contains formula syntax (quotation ~ minute + temp), but the string as a whole is a function call rather than a formula, so it cannot be coerced to a formula.
If you want to execute a function call stored as text, you need to use eval(parse(text = x)), although this technique is not encouraged.
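For the stated goal of fitting one model per list element, a minimal sketch might keep only the formula as text and leave the model call as ordinary code. The data (mtcars split by cyl) and the use of lm() are stand-ins I am assuming so the example runs with base R alone; with party loaded, ctree(f, data = d, controls = ...) would take the place of lm().

```r
# Stand-in for the list `test` from the question: a list of data frames.
test <- split(mtcars, mtcars$cyl)

# Only the formula is stored as a string and converted with as.formula().
f <- as.formula("mpg ~ hp + wt")

# lm() is a stand-in for ctree(); the model call itself is never stored as text.
fits <- lapply(test, function(d) lm(f, data = d))

class(f)      # the converted object is a formula
length(fits)  # one fitted model per list element
```

This avoids eval(parse()) entirely, because the only thing that ever needs converting is the formula string.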

Related

lme within a user defined function in r

I need to use the mixed-model lme function many times in my code, but I do not know how to use it within a function. Used on its own, lme works just fine, but when used within the function, it throws errors:
myfunc<- function(cc, x, y, z)
{
model <- lme(fixed = x ~1 , random = ~ 1|y/z,
data=cc,
method="REML")
}
on calling this function:
myfunc (dbcon2, birthweight, sire, dam)
I get the error :
Error in model.frame.default(formula = ~x + y + z, data = list(animal
= c("29601/9C1", : invalid type (list) for variable 'x'
I think there is a different procedure for this that I am unaware of. Any help would be greatly appreciated.
Thanks in advance
Not sure if this is what you are looking for, but as @akrun correctly pointed out, you can use paste to build the formula. I am using paste0 here, which is a special case of paste that concatenates strings with no separator.
The idea is to concatenate the variable names into the formula text. Because paste returns a character string, which cannot be used directly as a formula to build the model, you need to convert that string to a formula with as.formula, wrapped around the paste0 call.
To understand the above, try writing a formula with paste0 as below:
formula <-paste0("mpg~", paste0("hp","+", "am"))
print(formula)
[1] "mpg~hp+am"
class(formula)
[1] "character" ##This should ideally be a formula rather than character
formula <- as.formula(formula) ##conversion of character string to formula
class(formula)
[1] "formula"
To work inside a model, you need a formula object. Also try to learn about the collapse and sep options of paste; they are very handy.
I don't have your data, so I have used the mtcars data instead.
library("nlme")
myfunc<- function(cc, x, y, z)
{
model <- lme(fixed = as.formula(paste0(x," ~1")) , random = as.formula(paste0("~", "1|",y,"/",z)),
data=cc,
method="REML")
}
models <- myfunc(cc=mtcars, x="hp", y="mpg", z="am")
summary(models)
You can read more about paste by typing ?paste in your console.
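As a possible alternative to paste0 plus as.formula, base R's reformulate() builds a formula from character vectors directly. This is a small sketch using the same mtcars column names as above; the variable names rhs and f are just illustrative.

```r
# Term labels for the right-hand side, supplied as plain strings.
rhs <- c("hp", "am")

# reformulate() joins the terms with "+" and attaches the response.
f <- reformulate(rhs, response = "mpg")

class(f)     # a formula, no as.formula() step needed
all.vars(f)  # the variables the formula refers to
```

It covers the common response ~ term1 + term2 case; for something like the random part of lme (~ 1 | y/z), pasting the string and calling as.formula is still the straightforward route.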

How is formula specified in R's lm function

I have a function in R, let's say
myfunction <- function(formula,data)
Among other things, the function contains a call to lm(). Formula should include the covariates, and should be specified as
formula = x1 + x2 + ... + x_n
Data contains columns Z and W, where the response
y=data$Z/data$W
I only want to have formula including the covariates, since the function modifies the response variable for each iteration.
The call for lm() should then work with
lm(y~formula,data=data)
Why would you do that? It is cleaner to pass the whole formula to myfunction, for instance:
myfunction <- function(formula,data) {
data = data*2 # this is an example of data manipulation
lm(formula=formula, data=data)
}
Then use myfunction as you would use lm.
If you REALLY want to create complexity (for nothing?), you can also use the fact that lm will coerce a string passed as its formula argument into a proper formula object:
myfunction2 <- function(formula2,data) {
data = data*2 # this is an example of data manipulation
lm(formula=paste0("y~",formula2), data=data)
}
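If the function really has to receive only the covariates, one hedged option is to pass them as a one-sided formula and attach the response with update(). The sketch below assumes the response is Z/W as in the question; myfunction3, covariates, and the simulated data frame d are hypothetical names made up for illustration.

```r
myfunction3 <- function(covariates, data) {
  # Build the response inside the function, as the question describes.
  data$y <- data$Z / data$W
  # Turn the one-sided formula ~ x1 + x2 into y ~ x1 + x2.
  f <- update(covariates, y ~ .)
  lm(f, data = data)
}

# Simulated data with the columns the question mentions (Z, W) plus two covariates.
set.seed(1)
d <- data.frame(Z = rnorm(20, mean = 10), W = runif(20, 1, 2),
                x1 = rnorm(20), x2 = rnorm(20))

fit <- myfunction3(~ x1 + x2, d)
names(coef(fit))
```

Passing ~ x1 + x2 keeps the argument a real formula object, so no string pasting is involved.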

Why does order matter when using "data" and "formula" keyword arguments?

In R, why is it that the order of the data and formula keywords matters when plotting? I thought that with named arguments order isn't supposed to matter...
For an example of what I mean, check out this code:
library(MASS)
data(menarche)
# Correct formulation (apparently):
plot(formula=Menarche/Total ~ Age, data=menarche)
# In contrast, note how the following returns an error:
plot(data=menarche, formula=Menarche/Total ~ Age)
Is this just a quirk of the plot function or is this behavior exhibited in other functions as well?
It is related to S3 methods for the S3 generic plot(). S3 dispatches methods based on the first argument; however, the exact behaviour is complicated because formula is allowed as a special exception to the usual generic arguments of plot(), which are x and y plus ...:
> args(plot)
function (x, y, ...)
NULL
Hence what happens in the first case is that the plot.formula() method is run because the first argument supplied is a formula and this matches the arguments of plot.formula()
> args(graphics:::plot.formula)
function (formula, data = parent.frame(), ..., subset, ylab = varnames[response],
ask = dev.interactive())
NULL
for example:
> debugonce(graphics:::plot.formula)
> plot(formula=Menarche/Total ~ Age, data=menarche)
debugging in: plot.formula(formula = Menarche/Total ~ Age, data = menarche)
debug: {
m <- match.call(expand.dots = FALSE)
[...omitted...]
In contrast, when you call plot(data=menarche, formula=Menarche/Total ~ Age), the first argument is a data frame and hence the graphics:::plot.data.frame method is called:
> plot(data=menarche, formula=Menarche/Total ~ Age)
Error in is.data.frame(x) : argument "x" is missing, with no default
> traceback()
3: is.data.frame(x)
2: plot.data.frame(data = menarche, formula = Menarche/Total ~ Age)
1: plot(data = menarche, formula = Menarche/Total ~ Age)
but because that method expects an argument x, which you didn't supply, you get the error about missing x.
So in a sense, the ordering of named arguments doesn't and shouldn't matter, but when S3 generics are in play, method dispatch kicks in first to decide which method to pass the arguments on to. It is then the arguments supplied, not their ordering, that will often catch you out, especially when mixing formula methods with non-formula methods.
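The dispatch behaviour described above can be reproduced with a toy generic. This is only an illustrative sketch: describe and its two methods are hypothetical names, written to mirror the argument lists of plot(), plot.formula() and plot.data.frame().

```r
# A toy generic with the same formals as plot(): dispatch is on x.
describe <- function(x, ...) UseMethod("describe")

# Mirrors plot.formula(): its first formal is `formula`, not `x`.
describe.formula <- function(formula, data = NULL, ...) "formula method"

# Mirrors plot.data.frame(): its first formal is `x`.
describe.data.frame <- function(x, ...) "data.frame method"

# No argument is named x, so dispatch uses the first argument supplied.
a <- describe(formula = mpg ~ hp, data = mtcars)
b <- describe(data = mtcars, formula = mpg ~ hp)
a
b
```

The toy data.frame method never touches x, so it returns quietly; the real plot.data.frame() does use x, which is where the "argument x is missing" error comes from.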

Subsetting data breaks GLM

I have a GLM Logit regression that works correctly, but when I add a subset argument to the GLM command, I get the following error:
invalid type (list) for variable '(weights)'.
So, the following command works:
glm(formula = A ~ B + C,family = "binomial",data = Data)
But the following command yields the error:
glm(formula = A ~ B + C,family = "binomial",data = Data,subset(Data,D<10))
(I realize that it may be difficult to answer this without seeing my data, but any general help on what may be causing my problem would be greatly appreciated)
Try subset=D<10 instead (you don't need to specify Data again, it is implicitly used as the environment for the subset argument). Because you haven't named the argument, R is interpreting it as the weights argument (which is the next argument after data).
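A quick way to check the named subset argument is to compare it with subsetting the data beforehand. The sketch below uses mtcars as a stand-in, since the original Data is not available; the model vs ~ mpg and the condition hp < 200 are arbitrary choices for illustration.

```r
# Named subset argument: the condition is evaluated inside `data`.
fit_subset <- glm(vs ~ mpg, family = binomial, data = mtcars,
                  subset = hp < 200)

# Equivalent fit: subset the data frame first, then fit.
fit_manual <- glm(vs ~ mpg, family = binomial,
                  data = subset(mtcars, hp < 200))

all.equal(coef(fit_subset), coef(fit_manual))  # same coefficients
nobs(fit_subset)                               # rows actually used
```

Either form works; the error in the question comes only from passing the subsetted data frame as an unnamed fourth argument.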

Extract formula from model in R

I'm building a function for many model types which needs to extract the formula used to make the model. Is there a flexible way to do this? For example:
x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
equation <- z ~ x + y
model <- lm(equation)
What I need to do is extract the formula object "equation" when passed only the model.
You could get what you wanted by:
model$call
# lm(formula = formula)
And if you want to see how I found that out, use:
str(model)
Since you passed 'formula' (bad choice of names by the way) from the calling environment you might then need to extract from the object you passed:
eval(model$call[[2]])
# z ~ x + y
@JPMac offered a more compact method: formula(model). It's also worth looking at the mechanism used by the formula.lm function. The function named formula is generic, and you can use methods(formula) to see what S3 methods have been defined. Since the formula.lm method has an asterisk at its end in that listing, you need to wrap it in getAnywhere:
> getAnywhere(formula.lm)
A single object matching ‘formula.lm’ was found
It was found in the following places
registered S3 method for formula from namespace stats
namespace:stats
with value
function (x, ...)
{
form <- x$formula
if (!is.null(form)) {
form <- formula(x$terms)
environment(form) <- environment(x$formula)
form
}
else formula(x$terms)
}
<bytecode: 0x36ff26158>
<environment: namespace:stats>
So it is using "$" to extract the list item named "formula" rather than pulling it from the call. If the $formula item is missing (which it is in your case), it falls back to formula(x$terms), which I suspect is calling formula.default, and looking at the operation of that function, it appears to only be adjusting the environment of the object.
As noted, model$call will get you the call that created the lm object, but if that call contains an object itself as the model formula, you get the object name, not the formula.
The evaluated object, i.e. the formula itself, can be accessed in model$terms (along with a bunch of auxiliary information on how it was treated). This should work regardless of the details of the call to lm.
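To make the difference between the call and the evaluated formula concrete, here is a small sketch using the same setup as the question (the seed is my addition, only for reproducibility):

```r
set.seed(42)
x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
equation <- z ~ x + y
model <- lm(equation)

# The call stores the unevaluated argument, i.e. the name `equation`.
model$call$formula

# formula() returns the formula itself, whatever the argument was called.
formula(model)

# The terms object carries the same formula plus extra attributes.
formula(terms(model))
```

For a function that has to handle many model types, formula(model) is probably the most portable choice, since it relies on each class providing its own formula method.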
