Is there an elegant way to force the execution of a function and return a NA consistent with its normal output type if any error is encountered?
For example, to have lm batch-process data in R and pick out just a single estimate, avoiding stops on errors by using tryCatch:
lmCoeff <- function(beta, ...) {
  tryCatch(ifelse(is.numeric(a <- lm(...)$coefficients[beta]), a, as.numeric(NA)),
           error = function(e) {
             return(as.numeric(NA))
           })
}
# test with 3 cases
(good <- lmCoeff(beta="cyl", mpg ~ cyl, mtcars))
(bad <- lmCoeff(beta="bad", mpg ~ cyl, mtcars))
(ugly <- lmCoeff(beta="cyl", ugly))
# output should always be of the same type:
str(good)
str(bad)
str(ugly)
The question is whether there is a more elegant and less idiosyncratic way to do this.
As stated, tryCatch can be used to catch all errors and output an NA of numeric type if any error is found. In the case where you know which errors to expect, you could check for them directly using an if statement.
For example, I can think of two possible errors: the user enters an invalid data object, or the formula is not valid. So a neater (and, in my opinion, more responsible) solution would be:
lmCoeff <- function(beta, ...) {
  if (<data does not exist>) {
    a <- NA
    warning("data not found")
  } else if (<formula invalid>) {
    a <- NA
    warning("formula invalid")
  } else {
    # lapply(list(...), str) # optional: inspect the inputs while debugging
    a <- lm(...)$coefficients[beta]
  }
  as.numeric(a)
}
Of course, in practice, if it is too complicated to write conditions that check whether the data exists or the formula is valid, then for simplicity you can use tryCatch.
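For instance, here is a minimal runnable version of the sketch above; the specific checks and the explicit data/formula arguments are my own assumptions, so adapt them to the errors you actually expect:
lmCoeff <- function(beta, data, formula) {
  if (!is.data.frame(data)) {                 # assumed check for "data does not exist"
    a <- NA
    warning("data not found")
  } else if (!inherits(formula, "formula")) { # assumed check for "formula invalid"
    a <- NA
    warning("formula invalid")
  } else {
    a <- lm(formula, data)$coefficients[beta]
  }
  as.numeric(a)
}
lmCoeff("cyl", mtcars, mpg ~ cyl) # numeric estimate
lmCoeff("cyl", "ugly", mpg ~ cyl) # numeric NA plus a warning, instead of an error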
I am relying on the compareGroups package to do some comparisons after a pipe-chain. When subsetting the final results, the call to [ triggers a call to update (both in their bespoke compareGroups-versions) which leads to a scoping problem.
Try this:
library(tidyverse)
# install.packages("compareGroups")
library(compareGroups)
get_data <- function() return(mtcars)
assign_group <- function(df) {
  n <- nrow(df)
  df$group <- rbinom(n, 1, 0.5)
  return(df)
}
get_results <- function() {
  get_data() %>% assign_group() %>% compareGroups(group ~ ., data = .)
}
res <- get_results()
# all the above works, but the following triggers the error:
res["mpg"]
This leads to the following error:
Error in compareGroups(formula = group ~ mpg, data = .) :
object '.' not found
The relevant (abbreviated) traceback is this:
compareGroups(formula = group ~ mpg, data = .)
eval(call, parent.frame())
update.compareGroups(x, formula = group ~ mpg)
update(x, formula = group ~ mpg) at <text>#1
eval(parse(text = cmd))
`[.compareGroups`(res, "mpg")
res["mpg"]
So my understanding is that the dot notation in the dplyr pipe-chain prevents the update call from finding the dataframe, which is stored as . in the call. The error makes sense: . is neither the name of the dataframe nor available outside the scope of the function get_results (though the main issue is the .). One obvious way of avoiding this error is to fix the update.compareGroups function - I don't think we need another call to the package to redo all the calculations when I simply want to retrieve individual results (which have already been calculated).
However, this is a more general issue with the . notation of dplyr and the fact that it is stored in the call. This problem seems general enough that I would imagine someone has encountered it before and found a more general solution?
Firstly, I don't think piping your data into compareGroups makes sense - remember that piping makes the dataframe the first argument to compareGroups(), even though the function signature is:
compareGroups(formula, data, ...)
Secondly, this dplyr vignette shows you can use .data instead of just . to access the piped data. However, in this case the following crashes with the message "data argument will be ignored since formula is already a data set" (because the data is piped into the first argument).
get_results <- function() {
  get_data() %>% assign_group() %>% compareGroups(group ~ ., data = .data) # does NOT work
}
Making a separate call to compareGroups without piping then gets me into an unholy mess of environments: res does not have access to the data when res['mpg'] is requested outside the function get_results(), as you already alluded to with the scoping problem. I think this is a compareGroups problem, because the same architecture with glm shows no such issue. So the best I can do is take the dataframe out of the function environment, which I think doesn't properly answer your question:
get_data <- function() return(mtcars)
assign_group <- function(df) {
  n <- nrow(df)
  df$group <- rbinom(n, 1, 0.5)
  return(df)
}
df = get_data() %>% assign_group()
res = compareGroups(group ~ ., data = df)
print(res['mpg'])
But I hope the first two points I made get you closer to an answer.
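One further idea, offered as an untested sketch rather than a definitive fix: building the call with do.call() splices the evaluated data frame itself (rather than the symbol .) into the stored call, so the update() that [ triggers later can re-evaluate it in any environment:
get_results <- function() {
  df <- get_data() %>% assign_group()
  do.call(compareGroups, list(group ~ ., data = df))
}
res <- get_results()
res["mpg"] # the stored call now contains the data itself, not an unresolvable .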
I am currently working on user-defined functions aimed at modelling empirical data, and I have problems with objects/parameters passed to the function:
bestModel <- function(k = 4L, R2 = 0.994) {
  print(k) # here, everything is still fine
  lmX <- mixlm::lm(getLinearModelFunction(k), data)
  best <- mixlm::best.subsets(lmX, nbest = 1)
  # ...
}
At first, everything works as expected, but as soon as I want to pass the parameter k to another user-defined function, getLinearModelFunction(), an error is thrown:
Error in getLinearModelFunction(k) : object 'k' not found
It doesn't help if I assign a new parameter, e.g. l <- k, and try to pass that on. The parameter doesn't seem to be available to the other function. I ran into this problem not only with primitive data types but also with complex structures. On the command line everything works, as long as the objects are in my workspace.
To sum it up: passing parameters works only within that function; calls to other functions from there onwards result in an error. Why? And what can I do about it?
EDIT:
While trying to resolve the problem, things got really weird. I stripped all the functions down:
functionA <- function(data, k) {
  lmX <- mixlm::lm(functionB(k), data)
  summary(lmX)
  # best <- mixlm::best.subsets(lmX, nbest = 1)
}
functionB <- function(k = 4) {
  if (k == 1) {
    return(formula("raw ~ L1"))
  } else if (k == 2) {
    return(formula("raw ~ L1 + L2"))
  } else if (k == 3) {
    return(formula("raw ~ L1 + L2 + L3"))
  } else if (k == 4) {
    return(formula("raw ~ L1 + L2 + L3 + L4"))
  }
}
Let's say we have a data.frame d with the variables raw, L1, L2, L3, L4. As long as the # comment before best is in place, it works. As soon as it is removed, calling functionA(d, 3) results in
Error in functionB(k) : object 'k' not found
even though k plays no role in that line, and everything worked before.
OK, indeed, this was an environment issue. The solution is to get the current environment and take the objects from there:
functionA <- function(data, k) {
  e <- environment()
  lmX <- mixlm::lm(functionB(e$k), e$data)
  summary(lmX)
  best <- mixlm::best.subsets(lmX, nbest = 1)
}
This is usually not a problem when working directly in R, since the objects are then in the global environment. When working with functions, each function has its own environment. I managed to solve this while starting to learn about packaging code: http://adv-r.had.co.nz/Environments.html
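An alternative sketch (I have not verified this against mixlm specifically): a formula carries the environment in which it was created, and modelling functions use that environment to resolve variables, so re-pointing the formula's environment at functionA can fix this kind of "object not found" error:
functionA <- function(data, k) {
  f <- functionB(k)
  environment(f) <- environment() # make data (and k) visible where the formula is evaluated
  lmX <- mixlm::lm(f, data)
  summary(lmX)
  best <- mixlm::best.subsets(lmX, nbest = 1)
}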
This question originates from curiosity; I have nothing to deliver based on it.
Mimicking pass-by-reference (question here) I noticed that both approaches described in the answers obviously fail when the variable does not exist and one tries to use/reference it.
Regardless of its actual usefulness, I would be curious to know whether there is a way to initialize the parameter x in the code below (and hence the "actual" parameter myVar) to a default value of the type passed as the string xtype. Passing the type, especially in such a basic form, is not a requirement; it is simply the first thing that came to the mind of a non-advanced R programmer.
The question whose solution generated this one shows better code in its chosen answer; I use my own code here, as I understand it better.
myF <- function(x, xtype) {
  varName <- deparse(substitute(x))
  if (!exists(varName)) {
    # here x should be initialized to a default value
    # of the type passed in xtype,
    # so that x <- x + 1 below does not fail
    # this may not have any practical usefulness, just curious
  }
  x <- x + 1
  assign(varName, x, envir = parent.frame(n = 1))
  NA # sorry, this is not a function
  # in real life sometimes you also need procedures
}
if (exists(deparse(substitute(myVar)))) {
  rm(myVar)
}
myF(myVar, "numeric")
print(myVar)
Error in myF(myVar, "numeric") : object 'myVar' not found
# as expected
Maybe this is what you are looking for (even though it's a terrible idea to write a function like this in R).
myF <- function(x, xtype) {
  varName <- deparse(substitute(x))
  if (!exists(varName)) {
    x <- vector(xtype, 1)
  } else {
    x <- get(varName)
  }
  x <- x + 1
  assign(varName, x, envir = parent.frame(n = 1))
}
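A quick check of this version, as a hypothetical session (assuming myVar does not exist beforehand):
if (exists("myVar")) rm(myVar)
myF(myVar, "numeric")
print(myVar) # 1 - vector("numeric", 1) is 0, which myF then increments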
I am trying to write an R function that takes a data set and returns a plotting function with the data set bound in its environment. This means you no longer have to use attach(), which is good practice. Here's my example:
mydata <- data.frame(a = rnorm(100), b = rnorm(100,0,.2))
plot(mydata$a, mydata$b) # works just fine
scatter_plot <- function(ds) { # function I'm trying to create
  ifelse(exists(deparse(quote(ds))),
         function(x, y) plot(ds$x, ds$y),
         sprintf("The dataset %s does not exist.", ds))
}
scatter_plot(mydata)(a, b) # not working
Here's the error I'm getting:
Error in rep(yes, length.out = length(ans)) :
attempt to replicate an object of type 'closure'
I tried several other versions, but they all give me the same error. What am I doing wrong?
EDIT: I realize the code is not too practical. My goal is to understand functional programming better. I wrote a similar macro in SAS, and I was just trying to write its counterpart in R, but I'm failing. I just picked this as an example. I think it's a pretty simple example and yet it's not working.
There are a few small issues. ifelse is a vectorized function, but you just need a simple if. In fact, you don't really need an else - you could just throw an error immediately if the data set does not exist. Note that your error message is not using the name of the object, so it will create its own error.
You are passing a and b instead of "a" and "b". Instead of the ds$x syntax, you should use the ds[[x]] syntax when you are programming (fortunes::fortune(312)). If that's the way you want to call the function, you'll have to deparse those arguments as well. Finally, I think you want deparse(substitute()) instead of deparse(quote()).
scatter_plot <- function(ds) {
  ds.name <- deparse(substitute(ds))
  if (!exists(ds.name))
    stop(sprintf("The dataset %s does not exist.", ds.name))
  function(x, y) {
    x <- deparse(substitute(x))
    y <- deparse(substitute(y))
    plot(ds[[x]], ds[[y]])
  }
}
scatter_plot(mydata)(a, b)
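As a note on the design choice: if you are happy to pass the column names as strings instead, the inner deparse(substitute()) can be dropped entirely. A minimal variant (scatter_plot2 is a name I made up):
scatter_plot2 <- function(ds) {
  ds.name <- deparse(substitute(ds))
  if (!exists(ds.name))
    stop(sprintf("The dataset %s does not exist.", ds.name))
  function(x, y) plot(ds[[x]], ds[[y]], xlab = x, ylab = y)
}
scatter_plot2(mydata)("a", "b")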
I'm using boot to bootstrap an optimization function in order to estimate standard errors. Unfortunately, on rare occasions the optimization function returns an error, which stops the boot function. The errors are not critical to the estimation, and I would like to skip that iteration and continue to the next.
I have tried to find a solution with try and tryCatch but haven't been able to use either correctly. By wrapping the optimization function within statistic I have managed to skip the errors. However, this results in the number of estimations within boot being smaller than the initial number of iterations, and boot returns an error.
A basic example of my code is below.
Any help is appreciated,
Thanks
bootfun <- function(bootdata, i, d, C1) {
  C1 <- cov(bootdata[i])
  ans <- constrOptim(...) # this function returns an error
  return(ans$par[d])
}
bootres <- boot(bootdata, statistic = bootfun, R = 500)
EDIT: I have managed to find an acceptable solution to my problem. However, if a function errors often this may not be acceptable, as each error replaces a bootstrap replication with NAs.
bootfun <- function(bootdata, i, d, C1) {
  C1 <- cov(bootdata[i])
  tryCatch({
    ans <- constrOptim(...)
    return(ans$par[1:18][d])
  },
  error = function(err) rep(NA, 18))
}
This is not an answer with your specific code, but a more general demonstration of tryCatch for the situation you describe. If you want to simply remove entries that cause errors, have the function return nothing on error and then remove NULL values from the results:
testfun <- function(i) {
  tryCatch({
    d <- rbinom(1, 1, .3) # generate an error 30% of the time
    if (d == 1)
      stop("test stop")
    else
      return(1:10) # return your actual values
  },
  error = function(err) NULL # return NULL on error
  )
}
x <- lapply(1:20, FUN = testfun) # run demo 20 times; lapply always returns a list
x <- x[!vapply(x, is.null, logical(1))] # drop the NULL entries, safe even when there are none
# when errors happen, x is shorter than 20
The NULL-removal line is adapted from this answer: https://stackoverflow.com/a/3336726/2338862.
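To connect this back to boot: boot() expects a value from every replicate, so inside statistic you return a placeholder on error (such as the rep(NA, 18) in your edit) and drop the failed replicates from the replicate matrix afterwards. A sketch, reusing the (assumed) names from the question:
library(boot)
bootres <- boot(bootdata, statistic = bootfun, R = 500)
ok <- complete.cases(bootres$t) # rows without NA are the successful replicates
t_ok <- bootres$t[ok, , drop = FALSE]
apply(t_ok, 2, sd) # bootstrap standard errors from the clean rows only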