What I'm trying to achieve
I'm trying to write my own 'impute' function in R with a tryCatch statement which:
1. outputs a warning/error message containing the function name so I can debug easier.
2. Raises a warning if the function runs ok but doesn't impute all the missing values.
ImputeVariables <- function(impute.var, impute.values,
filter.var){
# function to impute values.
# impute.var = variables with NAs
# impute.values = the missing value(s) to replace NAs, value labesl are levels
# filter.var = the variables to filter on.
# filter.levels = the categories of filter.var
tryCatch({
filter.levels <- names(impute.values)
# Validation
stopifnot(class(impute.var) == class(impute.values),
length(impute.values) > 0,
sum(is.na(impute.values)) == 0)
# Impute values
for(level in filter.levels){
impute.var[which(filter.var == level & is.na(impute.var))] <-
impute.values[level]
}
# Check if all NAs removed. Throw warning if not.
if(sum(is.na(impute.var)) > 0){
warning("Not all NAs removed")
}
# Return values
return(impute.var)
},
error = function(err) print(paste0("ImputeValues: ",err)),
warning = function(war) print(paste0("ImputeValues: ",war))
)
}
impute.var and filter.var are vectors taken from a data.frame (they are vectors of Ages and Titles (e.g. 'Mr', 'Mrs')
impute.values is a vector of the same type as impute.var but with labels taken from filter.var (i.e. is of the form c('Mr' = 30, 'Mrs' = 25...))
The problem
To check if my validation was working I supplied the function with a named vector of NAs, thusly:
ages <- c(34, 22, NA, 17, 38, NA)
titles <- c("Mr", "Mr", "Mr", "Mrs", "Mrs", "Mrs")
ages.values <- c("Mr" = NA, "Mrs" = NA)
ages.new <- ImputeVariables(ages, ages.values, titles)
print(ages.new)
But it outputs the following:
"ImputeValues: Error: class(impute.var) == class(impute.values) is not TRUE\n"
"ImputeValues: Error: class(impute.var) == class(impute.values) is not TRUE\n"
The two lines are due to the function printing the ages.new vector and the following print statement printing ages.new (why?)
If I comment out the validation (the stopifnot function) then I just get:
"ImputeValues: simpleWarning in doTryCatch(return(expr), name, parentenv, handler): Not all NAs removed\n"
What I'm asking
Why does the tryCatch block behave this way?
Is my validation and error handling strategy optimal (obviously without the bug)?
Many thanks for your time.
Rob
Thanks Oliver.
The working code is now:
ImputeVariables <- function(impute.var, impute.values,
filter.var){
# function to impute values.
# impute.var = variables with NAs
# impute.values = the missing value(s) to replace NAs, value labesl are levels
# filter.var = the variables to filter on.
# filter.levels = the categories of filter.var
tryCatch({
filter.levels <- names(impute.values)
# Validation
stopifnot(class(impute.var) == class(impute.values),
length(impute.values) > 0,
sum(is.na(impute.values)) == 0)
# Impute values
for(level in filter.levels){
impute.var[which(filter.var == level & is.na(impute.var))] <-
impute.values[level]
}
# Check if all NAs removed. Throw warning if not.
if(sum(is.na(impute.var)) > 0){
warning("Not all NAs removed")
}
# Return values
return(impute.var)
},
error = function(err) stop(paste0("ImputeValues: ",err)),
warning = function(war) {
message(paste0("ImputeValues: ",war))
return(impute.var)}
)
}
This is essentially two different problems. The first problem is that print statements within a function do not print to the terminal, they print to the scope of the function. As an example:
> foo <- function(){
print("bar")
}
> foo()
[1] "bar"
It didn't print "bar" to your screen, it printed it to the function scope and then returned it. The reason it returned it was that it was the last value printed to the function scope, and so (lacking an explicit return() call) is the best candidate for what to return.
So, your code is (in sequence):
Throwing an error;
Not treating that error normally, but instead passing it into tryCatch's error handler, where it is printed;
Because it is the last thing printed within the function scope, since the return() statement is never hit due to the error, treating it as the return value from the function.
If you really want to continue processing the input values even if the stopifnot() conditions are met, you don't want a stopifnot(): however you structure that it's likely to prevent the return() call from running and cause weirdness. What I'd suggest is instead moving the conditional checks currently in stopifnot() outside the tryCatch, and sticking them in a series of if() statements that throw warnings (not errors) if they don't match up. tryCatch isn't really necessary in this situation.
Related
I try to fix dates (years) using a function
change_century <- function(x){
a <- year(x)
ifelse(test = a >2020,yes = year(x) <- (year(x)-100),no = year(x) <- a)
return(x)
}
The function works for specific row or using a loop for one column (here date of birth)
for (i in c(1:nrow(Df))){
Df_recode$DOB[i] <- change_century(Df$DOB[i])
}
Then I try to use mutate/across
Df_recode <- Df %>% mutate(across(list_variable_date,~change_century(.)))
It does not work. Is there something I am getting wrong? thank you !
Try:
change_century <- function(x){
a <- year(x)
newx <- ifelse(test = a > 2020, yes = a - 100, no = a)
return(newx)
}
(Frankly, the use of newx as a temporary storage and then returning it was done that way solely to introduce minimal changes in your code. In general, in this case one does not need return, in fact theoretically it adds an unnecessary function to the evaluation stack. I would tend to have two lines in that function: a <- year(x) and ifelse(..), without assignment. The default behavior in R is to return the value of the last expression, which in my case would be the results of ifelse, which is what we want. Assigning it to newx and then return(newx) or even just newx as the last expression has exactly the same effect.)
Rationale
ifelse cannot have variable assignment within it. That's not to say that is is a syntax error (it is not), but that it is counter to its intent. You are asking the function to go through each condition found in test=, and return a value based on it. Regardless of the condition, both yes= and no= are evaluated completely, and then ifelse joins them together as needed.
For demonstration,
ifelse(test = c(TRUE, FALSE, TRUE), yes = 1:3, no = 11:13)
The return value is something like:
c(
if (test[1]) yes[1] else no[1],
if (test[2]) yes[2] else no[2],
if (test[3]) yes[3] else no[3]
)
# c(1, 12, 3)
To capture the results of the zipped-together yeses and nos c(1, 12, 3), one must capture the return value from ifelse itself, not inside of the call to ifelse.
Another point that may be relevant: ifelse(cond, yes, now) is not at all a shortcut for if (cond) { yes } else { no }. Some key differences:
in if, the cond must always be exactly length 1, no more, no less.
In R < 4.2, length 0 returns an error argument is of length zero (see ref), while length 2 or more produces a warning the condition has length > 1 and only the first element will be used (see ref1, ref2).
In R >= 4.2, both conditions (should) produce an error (no warnings).
ifelse is intended to be vectorized, so the cond can be any length. yes= and no= should either be the same length or length 1 (recycling is in effect here); cond= should really be the same length as the longer of yes= and no=.
if does short-circuiting, meaning that if (TRUE || stop("quux")) 1 will never attempt to evaluate stop. This can be very useful when one condition will fail (logically or with a literal error) if attempted on a NULL object, such as if (!is.null(quux) && quux > 5) ....
Conversely, ifelse always evaluates all three of cond=, yes=, and no=, and all values in each, there is no short-circuiting.
Disclaimer: I'm trying to make someone else's code more user friendly and I'm pretty new to R, so if you see me using mismatched coding conventions, that's why.
I'm trying to write my script's status to the terminal as it goes through a list of files, checking to make sure they are valid before using them as input to a model. Therefore, I need to pass a variable (filename) and status ("looks good") to a function that will concatenate them and write them to the terminal in green. When I test the function like so, it works:
say <- function(words){
cat(green(words))
}
hi <- "Hello"
say(c(hi, "World!"))
# Hello World!
But when I call say() from within the ifelse() that I need to call it from, I get an error I cannot decipher:
FileList = as.data.frame(list.files(path = "./R_ModelInputs_SecondaryData",
pattern = ".tif$", all.files = FALSE,
full.names = FALSE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE))
names(FileList)=c("FileName")
for(NAME in FileList$FilName){
data=raster(paste("./R_ModelInputs_SecondaryData/",NAME,sep=""))
ifelse(nrow(data)!=1737,
say(c(NAME, "has a problem"),
ifelse(ncol(data)!=4008,
say(c(NAME, "has a problem")),
say(c(NAME, "looks good"))
))
}
# Goode_FireBrightness_80_10kMax_20002015.tif looks goodError in ans[!test & ok] <- rep(no, length.out = length(ans))[!test &
# :
# replacement has length zero
# Calls: ifelse -> ifelse
# In addition: Warning message:
# In rep(no, length.out = length(ans)) :
# 'x' is NULL so the result will be NULL
# Execution halted
I've tried googling this error but all I've come up with is that it seems to be from the ifelse() call. This doesn't make sense to me because the fact that it's writing the "looks good" part means that it's successfully navigated both ifelse()'s. I inserted a print() statement at the top of the for loop to ensure that it wasn't throwing the error when it tried to evaluate the ifelse() in the second iteration of my for loop, but that's not what it is, as that print() statement only printed once.
ifelse() should be used when you want to return a vector. It expects to return a vector the same length as your first parameter. Your say() function is returning the value from cat() which just returns NULL. There's no way to make NULL the same length as your test condition. This is throwing the ifelse off.
ifelse should not be used for control flow logic. You should be using a standard if/else here for conditional code execution.
Use
if(nrow(data)!=1737) {
say(c(NAME, "has a problem")
} else if (ncol(data)!=4008) {
say(c(NAME, "has a problem"))
} else {
say(c(NAME, "looks good"))
}
Or do this in a more R-y way like this
ff <- list.files(path = ".", pattern = "\\.tif$", full=TRUE)
r <- lapply(ff, raster)
x <- t(sapply(r, dim))
good <- x[,1] == 1737 & x[,2] == 4008
# good
basename(ff)[good]
# problem
basename(ff)[!good]
Is there an option in R that prevents it from returning values from field names with the same beginning if the one you asked for does not exist? This is causing me a fair amount of problems as my fields may or may not be present, and they have similar root names.
d <- data.frame(areallylongname = -99, y = 2, z = 0)
# How do I stop this returning a value
d$a
#[1] -99
# it should return NULL like this
d$jjj
# NULL
You can switch to bracket notation, which requires exact column names:
> d['a']
Error in `[.data.frame`(d, "a") : undefined columns selected
> d['y']
y
1 2
If you want to avoid partial matching and return an error, the following could work.
However, this will make all other warnings to errors as well.
options(warnPartialMatchDollar = TRUE, warn = 2)
# test
d$a
Error in $.data.frame(d, a) :
(converted from warning) Partial match of 'a' to 'areallylongname' in data frame
Using penalizedSVM R package, I am trying to do feature selection. There is a list of several data.frames called trainingdata.
trainingdata <-lapply(trainingdata, function(data)
{
levels(data$label) <- c(-1, 1)
train_x<-data[, -1]
train_x<-data.matrix(train_x)
trainy<-data[, 1]
print(which(!is.finite(train_x)))
scad.fix<-svm.fs(train_x, y=trainy, fs.method="scad",
cross.outer=0, grid.search="discrete",
lambda1.set=lambda1.scad, parms.coding="none",
show="none", maxIter=1000, inner.val.method="cv",
cross.inner=5, seed=seed, verbose=FALSE)
data <- data[c(1, scad.fix$model$xind)]
data
})
Some iterations go well but then on one data.frame I am getting the following error message.
[1] "feature selection method is scad"
Error in svd(m, nv = 0, nu = 0) : infinite or missing values in 'x'
Calls: lapply ... scadsvc -> .calc.mult.inv_Q_mat2 -> rank.condition -> svd
Using the following call, I am also checking whether x is really infinite but the call returns 0 for all preceding and the current data.frame where the error has occurred.
print(which(!is.finite(train_x)))
Is there any other way to check for infinite values? What else could be done to rectify this error? Is there any way that one can determine the index of the current data.frame being processed within lapply?
For the first question , infinite or missing values in 'x' suggests that you change your condition to something like .
idx <- is.na(train_x) | is.infinite(train_x)
You can assign 0 for example to theses values.
train_x[idx] <- 0
For the second question , concerning how to get the names of current data.frame within lapply you can loop over the names of data.farmes, and do something like this :
lapply(names(trainingdata), function(data){ data <- trainingdata[data]....}
For example:
ll <- list(f=1,c=2)
> lapply(names(list(f=1,c=2)), function(x) data <- ll[x])
[[1]]
[[1]]$f
[1] 1
[[2]]
[[2]]$c
[1] 2
EDIT
You can use tryCatch before this line scad.fix<-svm.fs
tryCatch(
scad.fix<-svm.fs(....)
, error = function(e) e)
})
for example, here I test it on this list, the code continues to be executing to the end of list ,even there is a NA in the list.
lapply(list(1,NA,2), function(x){
tryCatch(
if (any(!is.finite(x)))
stop("infinite or missing values in 'x'")
, error = function(e) e)
})
I am using stopifnot and I understand it just returns the first value that was not TRUE. I f that is some freaky dynamic expression someone who is not into the custom function cannot really make something out of that. So I would love to add a custom error message. Any suggestions?
Error: length(unique(nchar(check))) == 1 is not TRUE
Basically states that the elements of the vector check do not have the same length.
Is there a way of saying: Error: Elements of your input vector do not have the same length!?
Use stop and an if statement:
if(length(unique(nchar(check))) != 1)
stop("Error: Elements of your input vector do not have the same length!")
Just remember that stopifnot has the convenience of stating the negative, so your condition in the if needs to be the negation of your stop condition.
This is what the error message looks like:
> check = c("x", "xx", "xxx")
> if(length(unique(nchar(check))) != 1)
+ stop("Error: Elements of your input vector do not have the same length!")
Error in eval(expr, envir, enclos) :
Error: Elements of your input vector do not have the same length!
A custom message can be added as a label to your expression:
stopifnot("Elements of your input vector do not have the same length!" =
length(unique(nchar(check))) == 1)
# Error: Elements of your input vector do not have the same length!
The assertive and assertthat packages have more readable check functions.
library(assertthat)
assert_that(length(unique(nchar(check))) == 1)
## Error: length(unique(nchar(check))) == 1 are not all true.
library(assertive)
assert_is_scalar(unique(nchar(check)))
## Error: unique(nchar(check)) does not have length one.
if(!is_scalar(unique(nchar(check))))
{
stop("Elements of check have different numbers of characters.")
}
## Error: Elements of check have different numbers of characters.
Or you could package it up.
assert <- function (expr, error) {
if (! expr) stop(error, call. = FALSE)
}
So you have:
> check = c("x", "xx", "xxx")
> assert(length(unique(nchar(check))) == 1, "Elements of your input vector do not have the same length!")
Error: Elements of your input vector do not have the same length!
What about embedding the stopifnot into tryCatch and then recasting the exception with stop using customized message?
Something like:
tryCatch(stopifnot(...,error=stop("Your customized error message"))
Unlike some other solutions this does not require additional packages. Compared to using if statement combined with stop you retain the performance advantages of stopifnot, when you use new R versions. Since R version 3.5.0 stopifnot evaluates expressions sequentially and stops on first failure.
I would recommend you check out Hadley's testthat package. It allows for intuitive testing: the names of the functions are great and the way you write them is like a sentence -- "I expect that length(unique(nchar(check))) is [exactly|approximately] 1". The errors produced are informative.
See here:
http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf
In your case,
> library(testthat)
> check = c("x", "xx", "xxx")
> expect_that(length(unique(nchar(check))), equals(1))
Error: length(unique(nchar(check))) not equal to 1
Mean relative difference: 2
Also note that you don't have the problem that #Andrie referenced with sometimes having to think about double negatives with stopifnot. I know it seems simple, but it caused me many headaches!
The answers already provided are quite good, and mine is just an addition to that collection. For some people it could be more convenient to use one-liner in form of the following function:
stopifnotinform <- function(..., message = "err") {
args <- list(...)
if (length(args) == 0) {
return(NULL)
}
for (i in 1:length(args)) {
result <- args[[i]]
if (is.atomic(result) && result == FALSE) {
stop(message)
}
}
}
# throws an error
stopifnotinform(is.integer(1L), is.integer(2), message = "Some number(s) provided is not an integer")
# continues with execution
stopifnotinform(is.integer(1L), is.integer(2L), message = "Some number(s) provided is not an integer")
Bear in mind that this solution provides you with only one (common) error message for all parameters in ....
Try this:
same_length <- FALSE
stopifnot("Elements of your input vector do not have the same length!" = same_length)
#> Error : Elements of your input vector do not have the same length!