What causes this weird behaviour in the randomForest.partialPlot function? - r

I am using the randomForest package (v. 4.6-7) in R 2.15.2. I cannot find the source code for the partialPlot function and am trying to figure out exactly what it does (the help file seems to be incomplete). It is supposed to take the name of a variable x.var as an argument:
library(randomForest)
data(iris)
rf <- randomForest(Species ~., data=iris)
x1 <- "Sepal.Length"
partialPlot(x=rf, pred.data=iris, x.var=x1)
# Error in `[.data.frame`(pred.data, , xname) : undefined columns selected
partialPlot(x=rf, pred.data=iris, x.var=as.character(x1))
# works!
typeof(x1)
# [1] "character"
x1 == as.character(x1)
# TRUE
# Now if I try to wrap it in a function...
f <- function(w){
partialPlot(x=rf, pred.data=iris, x.var=as.character(w))
}
f(x1)
# Error in as.character(w) : 'w' is missing
Questions:
1) Where can I find the source code for partialPlot?
2) How is it possible to write a function which takes a string x1 as an argument where x1 == as.character(x1), but the function throws an error when as.character is not applied to x1?
3) Why does it fail when I wrap it inside a function? Is partialPlot messing with environments somehow?
Tips/ things to try that might be helpful for solving such questions by myself in future would also be very welcome!

The source code for partialPlot() is found by entering
randomForest:::partialPlot.randomForest
into the console. I found this by first running
methods(partialPlot)
because entering partialPlot only tells me that it uses a method. From the methods call we see that there is one method, and the asterisk next to it tells us that it is a non-exported function. To view the source code of a non-exported function, we use the triple-colon operator :::. So it goes
package:::generic.method
Where package is the package, generic is the generic function (here it's partialPlot), and method is the method (here it's the randomForest method).
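As a quick sketch of the same lookup with a method that ships with base R (so it runs without randomForest installed), the snippet below uses predict.loess from the stats package as a stand-in; that choice is only an illustration. getS3method() and getAnywhere() are alternatives to ::: when you do not know which package holds the method.

```r
# List the methods of a generic; printing this object shows which
# methods are not exported from their namespace.
predict_methods <- methods(predict)

# Fetch a registered S3 method by generic and class.
m1 <- getS3method("predict", "loess")

# The same function via the triple-colon operator: package:::generic.method
m2 <- stats:::predict.loess

# getAnywhere() searches attached packages and loaded namespaces by name.
found <- getAnywhere("predict.loess")

identical(m1, m2)
```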
Now, as for the other questions, the function can be written with do.call() and you can pass w without a wrapper.
f <- function(w) {
do.call("partialPlot", list(x = rf, pred.data = iris, x.var = w))
}
f(x1)
This works on my machine. It's not so much environments as it is evaluation. Many plotting functions use some non-standard evaluation, which can be handled most of the time with this do.call() construct.
But note that outside the function you can also use eval() on x1.
partialPlot(x = rf, pred.data = iris, x.var = eval(x1))
I don't really see a reason to check for the presence of as.character() inside the function. If you can leave a comment we can go from there if you need more info. I'm not familiar enough with this package yet to go any further.
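To make the evaluation issue concrete, here is a toy function (toy_xname, a hypothetical name, not the package's code) that treats its x.var argument the way the source printed by randomForest:::partialPlot.randomForest appears to: it captures the argument with substitute() and only evaluates it when it is a call. Treat the resemblance as an assumption and compare it against the source on your own version.

```r
# Toy stand-in: `w` mimics an unrelated formal argument that happens to
# share its name with the wrapper's argument.
toy_xname <- function(x.var, w) {
  expr <- substitute(x.var)   # the unevaluated argument expression
  if (is.character(expr)) {
    expr                      # a literal string is used as is
  } else if (is.name(expr)) {
    deparse(expr)             # a bare symbol is turned into its own name
  } else {
    eval(expr)                # a call is evaluated in this function's frame
  }
}

x1 <- "Sepal.Length"
toy_xname(x1)                 # returns "x1": the symbol's name, not its value
toy_xname(as.character(x1))   # returns "Sepal.Length": the call is evaluated

# Inside a wrapper, the call as.character(w) is evaluated in toy_xname's
# frame, where `w` is toy_xname's own (missing) argument, so it fails.
f_bad <- function(w) toy_xname(as.character(w))
bad <- tryCatch(f_bad(x1), error = function(e) "error")

# do.call() evaluates its argument list first, so the toy function
# receives the string itself.
f_ok <- function(w) do.call(toy_xname, list(x.var = w))
f_ok(x1)                      # returns "Sepal.Length"
```

Under that assumption, all three observations in the question follow: the bare symbol is deparsed to "x1", the as.character() call is evaluated and works at top level, and the wrapper fails because its argument name collides with a formal argument of the called function.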

Related

ksmooth function doesn't work with parameters via ellipsis

I am currently working with R due to a course at university, so I am still quite inexperienced.
We use R for exploratory data analysis. In one analysis we are supposed to apply different regression models to the data and generate the same plots for each. Additionally, we are supposed to play a bit with the parameters for learning purposes. To avoid copy-pasting the same code 10-20 times, I wrote a function that takes the regression function and, via an ellipsis (...), the parameters for it. Inside this function I call the passed function with the ellipsis as its arguments.
library("astsa")
data_glob <- globtemp
plot.data.and.reg <- function(data, reg.func, ...){
model <- reg.func(...)
par(mfrow = c(1, 2))
plot(data)
abline(model, col = "orange", lwd = 3)
qqnorm(data)
}
This works for the simple lm function, but unfortunately not for the ksmooth function.
When I pass this function I get the error message: "numeric y must be supplied. For density estimation use density()".
plot.data.and.reg(
data_glob,
lm,
list(
formula = as.formula("data_glob ~ time(data_glob)"),
data = data_glob
)
)
plot.data.and.reg(
data_glob,
ksmooth,
list(
x = as.numeric(time(data_glob)),
y = as.numeric(data_glob),
kernel = "box",
bandwidth = 0.25
)
)
Thereupon I looked at the source code of ksmooth. It shows that this error message occurs because the check "missing(y)" fails. Apparently a problem occurs because I passed the parameters as an ellipsis and it doesn't seem to "unpack".
For simplicity, I wrote a dummy function to test if I can add this "unpack" myself.
test.wrapper <- function(func, ...){
func(...)
}
test <- function(x, y){
match.call()
if(missing(y))
print("Unfortunately I was right")
print(x)
print(y)
}
test.wrapper(test, list(x = 10, y = 20))
Unfortunately I have not found a solution yet.
From Python I know that a dictionary of keyword arguments (kwargs) can be unpacked with the ** operator. Is there an equivalent in R? Or how can I make sure in R that the parameters from the ellipsis are used correctly?
Since it worked with the lm function without errors, I also looked at its source code again. Unfortunately, with my little experience in R, I can't see exactly where the essential difference is.
Overall, I would attribute the error to the fact that the ksmooth function is not yet designed for use with an ellipsis, but I am not sure. How would I need to adjust the ksmooth code to make it work with ...?
(For my Uni task, I will resort to the copy-paste (anti) pattern if in doubt. After searching for so long, I would still be interested in the solution and it may be useful in the future).
Thanks a lot for your help!
The closest equivalent of the */** splat in Python is the do.call function.
However, you don’t need this here. The actual issue is that you’re passing the extra arguments as a list rather than individually. Once you flatten the list, it works [1]:
plot.data.and.reg(
data_glob,
ksmooth,
x = as.numeric(time(data_glob)),
y = as.numeric(data_glob),
kernel = "box",
bandwidth = 0.25
)
I’m actually surprised that it works with a list for lm; that’s not intentional, it’s essentially an accident caused by how lm is currently implemented.
[1] I say it “works” because there’s no error and it plots something, but with your example data there’s no visible regression line (abline is inappropriate for the output of ksmooth), and the smoothing parameters do nothing: the result is identical to the unsmoothed input.
To get this to work, use lines instead of abline. And as for the smoothing, for your example data a bandwidth of 10 works fine.
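For completeness, a minimal sketch of both points, using the built-in Nile series instead of globtemp so that it runs without astsa (that substitution, and applying the bandwidth of 10 to it, are assumptions for illustration): do.call() unpacks a named list into arguments much like ** does in Python, and lines() draws the smoother that abline() cannot.

```r
# Arguments collected in a named list, as in the question.
smooth_args <- list(
  x = as.numeric(time(Nile)),
  y = as.numeric(Nile),
  kernel = "box",
  bandwidth = 10
)

# do.call() spreads the list elements out as individual named arguments,
# so ksmooth() sees x, y, kernel and bandwidth separately.
fit <- do.call(ksmooth, smooth_args)

# ksmooth() returns a list with components x and y, which lines() can draw.
plot(Nile)
lines(fit, col = "orange", lwd = 3)
```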

optimParallel in Package of the same name cannot find C_dnorm function

I want to optimize a function from a package in R using optimParallel. Until now I have only optimized functions that I wrote in my own environment, and that worked. But functions from any package don't work and I get an error. I checked with .libPaths() whether the paths are the same on each node, and I used Sys.info() to check for any differences. Here is an example (which is not meaningful, but it should show my problem):
library(optimParallel)
.libPaths()
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
cl <- makeCluster(2) #also tried to set "master" to my IP
clusterEvalQ(cl, .libPaths())
[[1]]
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
[[2]]
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
setDefaultCluster(cl)
optimParallel(par=0, dnorm, mean=1, method = "L-BFGS-B")$par
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: object 'C_dnorm' not found
#for comparison
optim(par=0, dnorm, mean=1, method = "L-BFGS-B")$par
[1] -5.263924
What am I doing wrong?
Edit: The problem is solved in optimParallel version 0.7-4
The version is available on CRAN: https://CRAN.R-project.org/package=optimParallel
For older versions:
A workaround is to wrap dnorm() into a function defined in the .GlobalEnv.
library("optimParallel")
cl <- makeCluster(2)
setDefaultCluster(cl)
f <- function(x, mean) dnorm(x, mean=mean)
optimParallel(par=0, f, mean=1, method="L-BFGS-B")$par
[1] -5.263924
A more difficult task is to explain why the problem occurs:
optimParallel() uses parallel::parLapply() to evaluate f.
parLapply() has the arguments cl, X, fun.
If we used parLapply() without pre-processing the arguments passed via ... of optimParallel(), f could not have arguments named cl, X, or fun, because this would cause errors like:
Error in lapply(X = x, FUN = f, ...) (from #2) :
formal argument "X" matched by multiple actual arguments
Simply speaking, optimParallel() avoids this error by removing all arguments from f, putting them into an environment and evaluating f in that environment.
One problem of that approach occurs when f is defined in another R package and links to compiled code. That case is illustrated in the question above.
Suggestions for better approaches to handle the issue are welcome. I opened a corresponding question here. As long as there is no better solution, one can use the workaround illustrated above.
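The name clash described above can be reproduced without a cluster. The sketch below uses base lapply() as a stand-in for parLapply() (an assumption made so that it runs anywhere; both take the data as X), with a hypothetical function g that has an argument named X. Wrapping the call in a closure, as in the workaround, sidesteps the clash.

```r
g <- function(x, X) x + X   # hypothetical function with an argument named X

# Passing X = 10 through ... collides with lapply's own X argument.
msg <- tryCatch(
  lapply(X = 1:3, FUN = g, X = 10),
  error = function(e) conditionMessage(e)
)
msg

# A wrapper hides g's argument names from lapply.
ok <- lapply(1:3, function(x) g(x, X = 10))
unlist(ok)   # 11 12 13
```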
Reasoning that your error message indicated that the parallel processes were not getting adequate information, I looked at the examples in the documentation of the optimParallel package. The first one defines a helper function which will carry an environment with it, but it otherwise resembles yours in some respects.
library(optimParallel)
set.seed(123); x <- rnorm(n=1000, mean=1, sd=2)
negll <- function(par, x) -sum(dnorm(x=x, mean=par[1], sd=par[2], log=TRUE))
o1 <- optimParallel(par=c(0, 1), fn=negll, x=x, method="L-BFGS-B", lower=c(-Inf, 0.0001))
o1$par
#[1] 1.032256 1.982398
That example also differs from yours in that it uses data to estimate the parameters. I'm not sure what your result means, whereas I do understand the values returned by the modification of that example that I posted here. The minimum of the negative log-likelihood for that particular data (not completely reproducible since I forgot to set a seed) is at a mean of 1.126 and an sd of 2.007.
For an example of how to create a situation where the environment of a non-base package gets carried to the workers, see this prior answer: parallel::clusterExport how to pass nested functions from global environment?

R do something after a warning (like tryCatch a warning, then edit an object)

I'm running a bunch of logit models, some of them with perfect separation, which makes glm return a warning. Here is a dataset that shows the problem:
DT <- iris
str(DT)
DT$binary <- as.numeric(DT$Petal.Width>1)
DT$dummy <- as.numeric(as.numeric(DT$Species)>2)
mylogit <- glm(binary~Sepal.Length+dummy,data = DT, family=binomial(link='logit'))
I'm collecting estimates, model fit, etc from mylogit inside an apply function and would like to add a dummy showing if this warning was returned. However, I don't understand the tryCatch() syntax enough and the examples I find are mostly aimed at returning warnings etc. I'm looking for something like:
if(warning is returned){x <- 1}
Is tryCatch() the wrong approach?
Yes, tryCatch is the right function to use:
x <- 0
tryCatch(
mylogit <- glm(binary~Sepal.Length+dummy,data = DT, family=binomial(link='logit')),
warning = function(w) { x <<- x + 1 }
)
The <<- is necessary, as you are assigning to a variable that is outside the scope of the function. (Usually that is a bad idea but here it is necessary.)
If you want to do something with the warning text, use conditionMessage(w).
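One caveat, in case the fitted model is still needed after the warning: a warning handler in tryCatch() exits the expression at the point where the warning is signalled, so the assignment to mylogit never completes. A minimal sketch of an alternative is withCallingHandlers() with the muffleWarning restart, which records the warning and lets glm() finish. The helper name fit_with_flag and the small, perfectly separated data set are assumptions for illustration.

```r
# Hypothetical helper: fit a logit model and report whether glm() warned.
fit_with_flag <- function(formula, data) {
  warned <- FALSE
  fit <- withCallingHandlers(
    glm(formula, data = data, family = binomial(link = "logit")),
    warning = function(w) {
      warned <<- TRUE                  # remember that a warning occurred
      invokeRestart("muffleWarning")   # resume glm() where it left off
    }
  )
  list(fit = fit, warned = warned)
}

# Perfectly separated toy data: y is 0 for x <= 5 and 1 otherwise.
d <- data.frame(x = 1:10, y = rep(c(0, 1), each = 5))
res <- fit_with_flag(y ~ x, d)
res$warned
coef(res$fit)
```

Inside an apply loop, the warned element can then be stored next to the estimates taken from the fit element.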
tryCatch would be the correct approach. I agree with you that some examples are not as clear and had some trouble with tryCatch in the past myself as well. I always find the following SO answer a helpful reference: How to write trycatch in R

Weird behaviour of the car::boxCox() function when wrapped in a homemade function

I'm trying to wrap the car::boxCox function into a homemade function so I can mapply it to a list of datasets. I'm using the boxCox function from the car package and not the MASS package because I want to use family="yjPower". My problem is weird, and it's either something fundamental I don't understand or some kind of bug. Here is a reproducible example:
library(car)
le.mod <- function(val.gold,val.bad){
donn <- data.frame(val.gold,val.bad)
res.lm <- lm(val.gold ~ val.bad, data=donn)
bcres <- boxCox(res.lm, family="yjPower", plotit=F)
lambda <- bcres$x[which.max(bcres$y)]
donn$val.bad.t <- donn$val.bad^lambda
res.lm <- lm(val.gold ~ val.bad.t, data=donn)
list(res.lm=res.lm, lambda = lambda)
}
xx <- runif(1000,1,100)
xxt1 <- xx^0.6 + runif(1000,1,10)
yy <- 2*xx + 10 + rnorm(1000,0,2)
le.mod(yy,xxt1)
This gives me the error message:
## Error in is.data.frame(data) : object 'donn' not found
I pin-pointed the problem to the line:
bcres <- boxCox(res.lm, family="yjPower", plotit=F)
boxCox is supposed to be able to take an lm class object; it just doesn't find the associated data that was created two lines before.
It works well outside of the function le.mod(). It's probably a problem related to environment management: the boxCox function looks for "donn" in the global environment, does not find it, and for a reason I don't know does not look for it in the function's own environment.
Does anybody have an idea how to fix this, or can explain what I don't understand here? I've been racking my brain over this problem for days and I can't get it working.
Thanks
I've found the answer (!), however I can't understand the reason for the behaviour, so if somebody has an explanation, don't hesitate to post it.
The solution is to add y=TRUE in the second line of the function:
res.lm <- lm(val.gold ~ val.bad, data=donn,y=TRUE)
For some reason, this allows it to get through.
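A plausible explanation, offered as an assumption rather than something verified against car's source: when the lm object lacks its y component, the function refits the model with update(), and update() re-evaluates the stored call from a frame that cannot see variables local to le.mod(). The sketch below reproduces that mechanism with base R only; refit_if_needed and make_model are hypothetical helpers.

```r
# Hypothetical stand-in for a function that refits a model lacking `y`.
refit_if_needed <- function(model) {
  if (is.null(model$y)) update(model, y = TRUE) else model
}

# Builds an lm whose data frame exists only inside this function.
make_model <- function(keep_y) {
  local_df <- data.frame(a = 1:10, b = (1:10)^2)
  lm(b ~ a, data = local_df, y = keep_y)
}

# Without y stored, update() re-runs lm(..., data = local_df) from
# refit_if_needed's frame, where local_df does not exist.
res_fail <- tryCatch(
  refit_if_needed(make_model(FALSE)),
  error = function(e) conditionMessage(e)
)
res_fail

# With y = TRUE no refit is needed, so the missing data frame never matters.
res_ok <- refit_if_needed(make_model(TRUE))
```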

How to use a character as attribute of a function

I want to run a multiple comparisons analysis for the different variables of a model. My idea is as follows:
library(multcomp)
set.seed(123)
x1 <- gl(4,10)
x2 <- gl(5,2,40)
y <- rnorm(40)
fm1 <- lm(y ~ x1 + x2)
for(var in c('x1', 'x2'))
{
mc1 <- glht(fm1, linfct=mcp(var='Tukey'))
print(summary(mc1))
}
When I run, I get the following error:
Error in mcp2matrix(model, linfct = linfct) :
Variable(s) ‘var’ have been specified in ‘linfct’ but cannot be found in ‘model’!
That is, it is not possible to use a character to specify an attribute of the mcp function.
Does anyone know a solution?
It's generally better to avoid working with strings representing code wherever possible - it prevents errors that are hard to debug, and aesthetically is much more elegant. This problem turns out to be fairly easy to solve if you use do.call and the setNames function:
var <- "x1"
cmp <- do.call(mcp, setNames(list("Tukey"), var))
glht(fm1, linfct = cmp)
You can't use substitute here because it does not allow you to modify the names of function parameters. I have some intuition for why this is reasonable, but not enough to explain it :/
If you're a package author, it's a good idea to provide an alternative version of functions that use unusual syntax so they can be accessed programmatically without jumping through hoops.
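The same pattern covers the loop from the question. As a sketch that runs without multcomp, the stand-in function show_args below (hypothetical) just returns its ... arguments; with multcomp loaded, mcp would take its place in do.call().

```r
show_args <- function(...) list(...)   # hypothetical stand-in for mcp()

for (var in c("x1", "x2")) {
  # setNames() builds list(x1 = "Tukey") or list(x2 = "Tukey") from the
  # string, and do.call() turns the list names into argument names.
  args <- setNames(list("Tukey"), var)
  res <- do.call(show_args, args)
  print(names(res))
}
```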
(Update: Make sure to see Hadley's answer for the better way of doing this, without resorting to string-pasting. My answer will still be useful for explaining why that is harder-than-usual in this case.)
The peculiarities of mcp() require you to use the relatively brute force approach of pasting together the expression you'd like to evaluate and then passing it through eval(parse()).
The tricky bit is that mcp() interprets its first argument in a nonstandard way. Within mcp(), x1 = 'Tukey' does not (as it normally would) mean "assign a value of 'Tukey' to the argument x1". Instead, the whole thing is interpreted as a symbolic description of the intended contrasts. (In this, it is much like more familiar formula objects such as the y ~ x1 + x2 in your lm() call).
for(var in c('x1', 'x2')) {
# Construct a character string with the expression you'd type at the command
# line. For example : "mcp(x1 = 'Tukey')"
exprString <- paste("mcp(", var, "='Tukey')")
# eval(parse()) it to get an 'mcp' object.
LINFCT <- eval(parse(text = exprString))
mc1 <- glht(fm1, linfct = LINFCT)
print(summary(mc1))
}
Have you tried: eval(parse(text='variable'))
or assign ?
