I want to run a multiple comparisons analysis for the different variables of a model. My idea is as follows:
library(multcomp)
set.seed(123)
x1 <- gl(4,10)
x2 <- gl(5,2,40)
y <- rnorm(40)
fm1 <- lm(y ~ x1 + x2)
for (var in c('x1', 'x2')) {
  mc1 <- glht(fm1, linfct = mcp(var = 'Tukey'))
  print(summary(mc1))
}
When I run it, I get the following error:
Error in mcp2matrix(model, linfct = linfct) :
Variable(s) ‘var’ have been specified in ‘linfct’ but cannot be found in ‘model’!
That is, mcp() takes the argument name literally, so a character variable cannot be used to supply it.
Does anyone know a solution?
It's generally better to avoid working with strings that represent code wherever possible: it prevents errors that are hard to debug, and it is much more elegant. This problem turns out to be fairly easy to solve if you use do.call and the setNames function:
var <- "x1"
cmp <- do.call(mcp, setNames(list("Tukey"), var))
glht(fm1, linfct = cmp)
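To see what do.call and setNames are doing here, a minimal base R sketch may help. The helper arg_names is hypothetical (it is not part of multcomp); it only reports the argument names it receives, which is the part mcp cares about.

```r
# Hypothetical helper: returns the names of the arguments it was called with.
arg_names <- function(...) names(list(...))

var <- "x1"

# Written literally, the argument is named "var", not "x1".
literal <- arg_names(var = "Tukey")

# setNames() builds list(x1 = "Tukey"); do.call() then calls
# arg_names(x1 = "Tukey"), so the name comes from the value of `var`.
programmatic <- do.call(arg_names, setNames(list("Tukey"), var))

print(literal)       # "var"
print(programmatic)  # "x1"
```

Inside the original loop, the same pattern would be applied once per value of var.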
You can't use substitute here because it does not allow you to modify the names of function arguments. I have some intuition for why this is reasonable, but not enough to explain it.
If you're a package author, it's a good idea to provide an alternative version of functions that use unusual syntax so they can be accessed programmatically without jumping through hoops.
(Update: Make sure to see Hadley's answer for the better way of doing this, without resorting to string-pasting. My answer will still be useful for explaining why that is harder-than-usual in this case.)
The peculiarities of mcp() require you to use the relatively brute force approach of pasting together the expression you'd like to evaluate and then passing it through eval(parse()).
The tricky bit is that mcp() interprets its first argument in a nonstandard way. Within mcp(), x1 = 'Tukey' does not (as it normally would) mean "assign a value of 'Tukey' to the argument x1". Instead, the whole thing is interpreted as a symbolic description of the intended contrasts. (In this, it is much like more familiar formula objects such as the y ~ x1 + x2 in your lm() call).
for (var in c('x1', 'x2')) {
  # Construct a character string with the expression you'd type at the command
  # line. For example: "mcp(x1 = 'Tukey')"
  exprString <- paste0("mcp(", var, " = 'Tukey')")
  # eval(parse()) it to get an 'mcp' object.
  LINFCT <- eval(parse(text = exprString))
  mc1 <- glht(fm1, linfct = LINFCT)
  print(summary(mc1))
}
Have you tried eval(parse(text='variable')), or assign?
I need to use the mixed-model function lme many times in my code, but I do not know how to use it within a function. Called directly, lme works fine, but when used within a function it throws errors:
myfunc <- function(cc, x, y, z) {
  model <- lme(fixed = x ~ 1, random = ~ 1 | y/z,
               data = cc,
               method = "REML")
}
On calling this function:
myfunc(dbcon2, birthweight, sire, dam)
I get the error:
Error in model.frame.default(formula = ~x + y + z, data = list(animal
= c("29601/9C1", : invalid type (list) for variable 'x'
I think there is a different procedure for this that I am unaware of. Any help would be greatly appreciated.
Thanks in advance
Not sure if this is what you are looking for, but as correctly pointed out by @akrun, you can use paste. I am using paste0 here (a special case of paste with no separator); both concatenate strings.
The idea is to build the formula from the variable names. Because paste returns a character string, which cannot be used as a formula to build a model, you need to convert that string to a formula with as.formula, wrapped around the paste0 call.
To understand the above, try writing a formula using paste0:
formula <-paste0("mpg~", paste0("hp","+", "am"))
print(formula)
[1] "mpg~hp+am"
class(formula)
[1] "character" ##This should ideally be a formula rather than character
formula <- as.formula(formula) ##conversion of character string to formula
class(formula)
[1] "formula"
Model-fitting functions such as lme expect a formula object, not a string. Also look at the collapse and sep arguments of paste; they are very handy.
I don't have your data, so I have used the mtcars data instead.
library("nlme")
myfunc <- function(cc, x, y, z) {
  # x, y and z are column names given as strings
  lme(fixed = as.formula(paste0(x, " ~ 1")),
      random = as.formula(paste0("~ 1 | ", y, "/", z)),
      data = cc,
      method = "REML")
}
models <- myfunc(cc=mtcars, x="hp", y="mpg", z="am")
summary(models)
You can read more about paste by typing ?paste in your console.
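As an alternative to pasting the whole formula text, base R's reformulate() builds a formula from a character vector of term names. This is only a sketch on mtcars, not on the asker's data, and the column choices are for illustration.

```r
# Build mpg ~ hp + am from strings without pasting the full formula text.
terms_chr <- c("hp", "am")
fml <- reformulate(terms_chr, response = "mpg")

print(class(fml))  # "formula"

# The result can be passed to a model function like any other formula.
fit <- lm(fml, data = mtcars)
print(names(coef(fit)))
```

Note that reformulate() covers ordinary term lists; a random-effects specification such as ~ 1 | y/z is still easier to assemble with paste0 and as.formula as shown above.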
I'm running a bunch of logit models, some of them with perfect separation, which makes glm return a warning. Here is a dataset that shows the problem:
DT <- iris
str(DT)
DT$binary <- as.numeric(DT$Petal.Width>1)
DT$dummy <- as.numeric(as.numeric(DT$Species)>2)
mylogit <- glm(binary~Sepal.Length+dummy,data = DT, family=binomial(link='logit'))
I'm collecting estimates, model fit, etc. from mylogit inside an apply function and would like to add a dummy showing whether this warning was returned. However, I don't understand the tryCatch() syntax well enough, and the examples I find are mostly aimed at returning the warnings themselves. I'm looking for something like:
if(warning is returned){x <- 1}
Is tryCatch() the wrong approach?
tryCatch is the right function to use:
x <- 0
tryCatch(
  mylogit <- glm(binary ~ Sepal.Length + dummy, data = DT, family = binomial(link = 'logit')),
  warning = function(w) { x <<- x + 1 }
)
The <<- is necessary, as you are assigning to a variable that is outside the scope of the handler function. (Usually that is a bad idea, but here it is necessary.) Be aware that once tryCatch catches the warning, evaluation of the glm call is abandoned, so mylogit is not assigned in that case.
If you want to do something with the warning text, use conditionMessage(w).
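If the fitted model is needed as well as the flag, a calling handler can record the warning and then let evaluation continue. The sketch below uses withCallingHandlers() with the "muffleWarning" restart; the helper name with_warning_flag is made up for this example, and it is a swapped-in technique rather than the tryCatch approach above.

```r
# Hypothetical helper: evaluate `expr`, note whether it warned, keep its value.
with_warning_flag <- function(expr) {
  warned <- FALSE
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      warned <<- TRUE                 # record that a warning was signalled
      invokeRestart("muffleWarning")  # resume evaluation where it left off
    }
  )
  list(value = value, warned = warned)
}

# Same data as in the question.
DT <- iris
DT$binary <- as.numeric(DT$Petal.Width > 1)
DT$dummy <- as.numeric(as.numeric(DT$Species) > 2)

res <- with_warning_flag(
  glm(binary ~ Sepal.Length + dummy, data = DT, family = binomial(link = "logit"))
)
print(res$warned)        # the flag asked for in the question
print(class(res$value))  # the fitted model is still available
```

Inside an apply-style loop, res$warned can be stored next to the estimates taken from res$value.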
tryCatch would be the correct approach. I agree that some examples are not very clear, and I have had trouble with tryCatch myself in the past. I always find the following SO answer a helpful reference: How to write trycatch in R
I am using the randomForest package (v. 4.6-7) in R 2.15.2. I cannot find the source code for the partialPlot function and am trying to figure out exactly what it does (the help file seems to be incomplete.) It is supposed to take the name of a variable x.var as an argument:
library(randomForest)
data(iris)
rf <- randomForest(Species ~., data=iris)
x1 <- "Sepal.Length"
partialPlot(x=rf, pred.data=iris, x.var=x1)
# Error in `[.data.frame`(pred.data, , xname) : undefined columns selected
partialPlot(x=rf, pred.data=iris, x.var=as.character(x1))
# works!
typeof(x1)
# [1] "character"
x1 == as.character(x1)
# TRUE
# Now if I try to wrap it in a function...
f <- function(w) {
  partialPlot(x=rf, pred.data=iris, x.var=as.character(w))
}
f(x1)
# Error in as.character(w) : 'w' is missing
Questions:
1) Where can I find the source code for partialPlot?
2) How is it possible to write a function which takes a string x1 as an argument where x1 == as.character(x1), but the function throws an error when as.character is not applied to x1?
3) Why does it fail when I wrap it inside a function? Is partialPlot messing with environments somehow?
Tips/ things to try that might be helpful for solving such questions by myself in future would also be very welcome!
The source code for partialPlot() is found by entering
randomForest:::partialPlot.randomForest
into the console. I found this by first running
methods(partialPlot)
because entering partialPlot only tells me that it uses a method. From the methods call we see that there is one method, and the asterisk next to it tells us that it is a non-exported function. To view the source code of a non-exported function, we use the triple-colon operator :::. So it goes
package:::generic.method
Where package is the package, generic is the generic function (here it's partialPlot), and method is the method (here it's the randomForest method).
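The same lookup can be done without typing the ::: path by hand. A small sketch using a base R generic (print and its lm method are chosen only because they need no extra packages):

```r
# List the methods registered for a generic.
m <- methods(print)
print(length(m) > 0)

# Fetch one method by generic and class, whether or not it is exported.
print_lm <- getS3method("print", "lm")
print(is.function(print_lm))  # TRUE

# getAnywhere() also reports where the object was found.
found <- getAnywhere("print.lm")
print(found$where)
```

For the case above, the equivalent call would be getS3method("partialPlot", "randomForest") with the randomForest package loaded.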
Now, as for the other questions, the function can be written with do.call(), and you can pass w without wrapping it in as.character().
f <- function(w) {
  do.call("partialPlot", list(x = rf, pred.data = iris, x.var = w))
}
f(x1)
This works on my machine. It's not so much environments as it is evaluation. Many plotting functions use some non-standard evaluation, which can be handled most of the time with this do.call() construct.
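To make the evaluation point concrete, here is a toy function (not randomForest's code) that captures x.var with substitute() and treats a bare symbol as a column name. The name toy_plot and its body are assumptions made for illustration only.

```r
# Toy stand-in for a function that uses non-standard evaluation on x.var.
toy_plot <- function(data, x.var) {
  x.var <- substitute(x.var)          # capture the argument unevaluated
  xname <- if (is.character(x.var)) {
    x.var                             # a literal string is used as is
  } else if (is.name(x.var)) {
    deparse(x.var)                    # a bare symbol becomes its own name
  } else {
    eval(x.var)                       # any other call is evaluated here
  }
  if (!xname %in% names(data)) stop("undefined column: ", xname)
  mean(data[[xname]])
}

x1 <- "Sepal.Length"

# Passing the variable directly: the symbol x1 is taken as the column name.
direct <- tryCatch(toy_plot(iris, x1), error = function(e) conditionMessage(e))
print(direct)  # "undefined column: x1"

# do.call() evaluates its argument list first, so toy_plot sees the string.
f <- function(w) do.call(toy_plot, list(data = iris, x.var = w))
print(f(x1))
```

Because do.call() builds the call from already-evaluated values, the captured argument is the string itself rather than the symbol w or x1.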
But note that outside the function you can also use eval() on x1.
partialPlot(x = rf, pred.data = iris, x.var = eval(x1))
I don't really see a reason to check for the presence of as.character() inside the function. If you can leave a comment we can go from there if you need more info. I'm not familiar enough with this package yet to go any further.
I want to write a function that evaluates an expression in a data frame, but one that does so using expressions that may or may not contain user-defined objects.
I think the magic word is "non-standard evaluation", but I cannot quite figure it out just yet.
One simple example (yet realistic for my purposes): Say, I want to evaluate an lm() call for variables found in a data frame.
mydf <- data.frame(x=1:10, y=1:10)
A function that does so can be written as follows:
f <- function(df, expr) {
  expr <- substitute(expr)
  pf <- parent.frame()
  eval(expr, df, pf)
}
With this, I get what I want using the following command:
f(mydf, lm(y~x))
# Call:
# lm(formula = y ~ x)
#
# Coefficients:
# (Intercept) x
# 1.12e-15 1.00e+00
Nice. However, there are cases in which it is more convenient to save the model equation in an object before calling lm(). Unfortunately, the above function no longer works in that case.
fml <- y~x
f(mydf, lm(fml))
# Error in eval(expr, envir, enclos): object 'y' not found
Can someone explain why the second call doesn't work? How could the function be altered, such that both calls would lead to the desired results? (desired=fitted model)
Cheers!
From ?lm, re data argument:
If not found in data, the variables are taken from environment(formula)
In your first case, the formula is created in your eval(expr, df, pf) call, so the environment of the formula is an environment based on df. In the second case, the formula is created in the global environment, which is why it doesn't work.
Because formulas come with their own environment, they can be tricky to handle in NSE.
You could try:
with(mydf, {
  print(lm(y~x))
  fml <- y~x
  print(lm(fml))
})
but that probably isn't ideal for you. Short of checking whether any names in the captured parameter resolve to formulas, and re-assigning their environments, you'll have some trouble. Worse, it isn't even necessarily obvious that re-assigning the environment is the right thing to do. In many cases, you do want to look in the formula environment.
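A short sketch of what this means in practice. Passing the data frame to lm() explicitly is the simplest route; the second function illustrates the environment re-assignment idea, with the caveat already noted that it is not always the right thing to do. The name fit_in_df is hypothetical.

```r
mydf <- data.frame(x = 1:10, y = 2 * (1:10) + c(0.1, -0.1))
fml <- y ~ x

# Simplest route: hand the data frame to lm() directly.
fit1 <- lm(fml, data = mydf)

# Sketch of re-assigning the formula's environment: the columns of df are
# copied into a new environment whose parent is the formula's original one,
# so anything not found in df is still looked up where the formula was made.
fit_in_df <- function(df, fml) {
  environment(fml) <- list2env(df, parent = environment(fml))
  lm(fml)
}
fit2 <- fit_in_df(mydf, fml)

print(all.equal(coef(fit1), coef(fit2)))  # TRUE
```

The assignment inside fit_in_df changes only the function's local copy of the formula, so the caller's fml keeps its original environment.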
There was a loosely related discussion on this issue on R Chat:
Ben Bolker outlines an issue
Josh O'Brien points to some old references
I am writing a function that takes two variables and separately regresses each of them on a set of controls expressed as a one-sided formula. Right now I'm using the following to make the formula for one of the regressions, but it feels a bit hacked-up:
foo <- function(x, y, controls) {
  cl <- match.call()
  xn <- cl[["x"]]
  xf <- as.formula(paste(xn, deparse(controls)))
}
I'd prefer to do this using update.formula(), but of course update.formula(controls, x ~ .) and update.formula(controls, as.name(x) ~ .) don't work. What should I be doing?
Here's one approach:
right <- ~ a + b + c
left <- ~ y
left_2 <- substitute(left ~ ., list(left = left[[2]]))
update(right, left_2)
But I think you'll have to either paste text strings together, or use substitute. To the best of my knowledge, there are no functions to create one two sided formula from two one-sided formulas (or similar equivalents).
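Wrapped into a function, the substitute approach might look like the sketch below. It assumes the left-hand variable arrives as a character string; the function name add_lhs is invented for this example.

```r
add_lhs <- function(lhs, rhs) {
  # Build `lhs ~ .` with the string converted to a symbol, then let
  # update() fill in the right-hand side from the one-sided formula.
  new <- eval(substitute(v ~ ., list(v = as.name(lhs))))
  update(rhs, new)
}

controls <- ~ a + b + c
res <- add_lhs("y", controls)
print(deparse(res))  # "y ~ a + b + c"
```

The same helper can be called once per response variable, which matches the two-regressions use case in the question.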
I am not sure about update.formula(), but I have used the approach you take here of pasting text and converting it via as.formula in the past with success. My reading of help(update.formula) does not make me think you can substitute the left-hand side as you desire.
Lastly, trust the dispatching mechanism. If your object is of class formula, just call update, which is preferred over calling update.formula explicitly.