I'm running a complicated function (multiple imputation with Amelia) over a list of datasets. Every so often, a dataset will trigger a long list of warnings that eventually result in an error. I would like R to give up as soon as the first warning is issued and move on to the next dataset. Here is a minimal working example:
df.list <- list(
data.frame(1:4),
data.frame(-1, -2, -4),
data.frame(10:15)
)
for(df in df.list){
ans <- sum(sapply(df, sqrt))
print(ans)
}
The script issues three warnings about NaNs and then prints:
[1] 6.146264
[1] NaN
[1] 21.1632
I would like it to produce one message, "input 2 failed", and then output only the valid results:
[1] 6.146264
[1] 21.1632
(The function I'm actually running, amelia(), issues warnings for 10 minutes before finally throwing an error, so I would like to cut it off at the first warning.)
What about this: the sqrt function cannot return -1 so I make tryCatch return -1 when a warning occurs. The nested lapply is required to loop through the list elements to calculate the square root, returned as a list, and then to loop through those list elements to sum. The -1 value in the result indicates a failed calculation and I can test that.
result <- unlist(
lapply(
lapply(df.list, function(x) tryCatch(sqrt(x), warning = function(w) -1)), sum))
failed <- which(result == -1)
if (length(failed) > 0) result <- result[-failed] # result[-integer(0)] would drop every element
print(paste0("input ", failed, " failed"))
result
> print(paste0("input ", failed, " failed"))
[1] "input 2 failed"
> result
[1] 6.146264 21.163196
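For comparison, here is a minimal sketch of the loop form asked for in the question, without a sentinel value. It assumes that abandoning a dataset at its first warning is acceptable; the names results and ans are only illustrative, and NULL is used as the marker for a failed input.

```r
df.list <- list(
  data.frame(1:4),
  data.frame(-1, -2, -4),
  data.frame(10:15)
)

results <- list()
for (i in seq_along(df.list)) {
  # tryCatch unwinds at the first warning, so the rest of the
  # computation for this dataset is never run.
  ans <- tryCatch(
    sum(sapply(df.list[[i]], sqrt)),
    warning = function(w) {
      message("input ", i, " failed")
      NULL  # marker for a failed input
    }
  )
  if (is.null(ans)) next  # move on to the next dataset
  results[[length(results) + 1]] <- ans
  print(ans)
}
```

Because the handler is an exiting handler, this should also cut off a long-running call such as amelia() as soon as it signals its first warning.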
In R, how can I determine whether a function call results in a warning?
That is, after calling the function I would like to know whether that instance of the call yielded a warning.
If you want to use the try construct, you can set options(warn = 2) so that warnings are turned into errors (see ?options). A better approach is to use tryCatch():
x <- function(i){
if (i < 10) warning("A warning")
i
}
tt <- tryCatch(x(5),error=function(e) e, warning=function(w) w)
tt2 <- tryCatch(x(15),error=function(e) e, warning=function(w) w)
tt
## <simpleWarning in x(5): A warning>
tt2
## [1] 15
if(is(tt,"warning")) print("KOOKOO")
## [1] "KOOKOO"
if(is(tt2,"warning")) print("KOOKOO")
To get both the result and the warning :
tryCatch(x(5),warning=function(w) return(list(x(5),w)))
## [[1]]
## [1] 5
##
## [[2]]
## <simpleWarning in x(5): A warning>
Using try
op <- options(warn=2)
tt <- try(x(5)) # with warn = 2 the warning raised by x(5) becomes an error
ifelse(is(tt,"try-error"),"There was a warning or an error","OK")
options(op)
On the R-help mailing list (see http://tolstoy.newcastle.edu.au/R/help/04/06/0217.html), Luke Tierney wrote:
"If you want to write a function that computes a value and collects all
warning you could do it like this:
withWarnings <- function(expr) {
myWarnings <- NULL
wHandler <- function(w) {
myWarnings <<- c(myWarnings, list(w))
invokeRestart("muffleWarning")
}
val <- withCallingHandlers(expr, warning = wHandler)
list(value = val, warnings = myWarnings)
}
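A short usage sketch for the function above. The definition is repeated so the block runs on its own; the expression passed in and the names res and msgs are made up for illustration.

```r
withWarnings <- function(expr) {
  myWarnings <- NULL
  wHandler <- function(w) {
    myWarnings <<- c(myWarnings, list(w))
    invokeRestart("muffleWarning")
  }
  val <- withCallingHandlers(expr, warning = wHandler)
  list(value = val, warnings = myWarnings)
}

res <- withWarnings({
  a <- as.numeric("x")  # signals a coercion warning
  b <- sqrt(-1)         # signals a NaN warning
  "done"
})

res$value                    # the value of the expression
length(res$warnings) > 0     # TRUE if the call produced any warning
# Extract the warning texts as a character vector
msgs <- vapply(res$warnings, conditionMessage, character(1))
print(msgs)
```

Unlike tryCatch, the calling handler lets the expression run to completion, so both the value and every warning are available afterwards.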
2019 update
You can use 'quietly' from the purrr package, which returns a list with the components result, output, warnings and messages. You can then extract each element by name. For instance, if you had a list, which you want to map a function over, and find the elements which returned a warning you could do
library(purrr)
library(lubridate)
datelist <- list(a = "12/12/2002", b = "12-12-2003", c = "24-03-2005")
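# capture result, output, warnings and messages for every element
quiet_list <- map(datelist, quietly(mdy))
# find the elements which produced warnings
# (warnings is an empty character vector, not NULL, when there were none)
quiet_list %>% map("warnings") %>% keep(~ length(.) > 0)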
# or
quiet_list %>% keep(~ length(.$warnings) != 0)
For this example it's quite trivial, but for a long list of dataframes where the NAs might be hard to spot, this is quite useful.
Here is an example:
testit <- function() warning("testit") # function that generates warning.
assign("last.warning", NULL, envir = baseenv()) # clear the previous warning
testit() # run it
if(length(warnings())>0){ # or !is.null(warnings())
print("something happened")
}
Maybe this is somewhat indirect, but I don't know of a more straightforward way.
For a simple TRUE/FALSE return on whether a given operation results in a warning (or error), you could use the is.error function from the berryFunctions package, after first setting options(warn = 2) so that warnings are converted to errors.
E.g.,
options(warn = 2)
berryFunctions::is.error(as.numeric("x")) # TRUE
berryFunctions::is.error(as.numeric("3")) # FALSE
If you want to limit the option change to the use of this function, you could just create a new function as follows.
is.warningorerror <- function(x) {
op <- options()
on.exit(options(op))
options(warn = 2)
berryFunctions::is.error(x)
}
is.warningorerror(as.numeric("x")) # TRUE
options("warn") # still 0 (default)
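If you would rather not depend on a package or touch options() at all, a base R sketch along the same lines is possible with tryCatch. The function name warns_or_errors is hypothetical; it is not part of berryFunctions.

```r
# Hypothetical helper (base R only): TRUE if evaluating the argument
# signals a warning or an error, FALSE otherwise.
warns_or_errors <- function(expr) {
  tryCatch(
    {
      expr   # forces the lazily evaluated argument inside tryCatch
      FALSE
    },
    warning = function(w) TRUE,
    error = function(e) TRUE
  )
}

r1 <- warns_or_errors(as.numeric("x"))  # TRUE  (coercion warning)
r2 <- warns_or_errors(as.numeric("3"))  # FALSE
r3 <- warns_or_errors(stop("boom"))     # TRUE  (error)
print(c(r1, r2, r3))
```

This works because R evaluates function arguments lazily: the expression is only run once it is touched inside tryCatch, where the handlers are already in place.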
I personally use good old sink, redirected into a text connection:
# create a new text connection directed into a variable called 'messages'
con <- textConnection("messages","w")
# sink all messages (i.e. warnings and errors) into that connection
sink(con,type = "message")
# a sample warning-generating function
test.fun <- function() {
warning("Your warning.")
return("Regular output.")
}
output <- test.fun()
# close the sink
sink(type="message")
# close the connection
close(con)
# if the word 'Warning' appears in messages then there has been a warning
warns <- paste(messages,collapse=" ")
if(grepl("Warning",warns)) {
print(warns)
}
# [1] "Warning message: In test.fun() : Your warning."
print(output)
# [1] "Regular output."
Possibly more straightforward and cleaner than the other suggested solutions.
I am passing some data to a simple code block in R which counts the null values and then performs an ARIMA time series imputation. I have written a very simple 'if' statement which counts the null values in the time series, and if they are less than a certain amount, ignores that column and moves on to the next one (as the ARIMA imputation requires a certain amount of non-null data to work, otherwise it returns an error). Counting the nulls seems to work fine, but the if statement is behaving very strangely and not working. I included a print statement to count the nulls inside and outside the if statement, but the if statement is passing the code to the loop when the if statement is not fulfilled. Here is the code and the output:
stations <- c('BX1', 'BX2', 'BG3') # each station has a different data file
pollutants <- c('nox','no2','pm10','pm25') # each station contains data on a number of pollutants
for (s in stations) {
print(paste('starting imputation for station ', s, sep=" "))
s_result <- read.csv(paste("/path/to/file", s, "_rescaled.csv", sep=""))
for (p in pollutants) {
ts = c()
pcol = paste0(p,"_iqr",sep="") # find the right column
ts = s_result[[pcol]] # get the time series from the column
print(pcol) # check which pollutant we're working on
print(length(ts)) # test the length of the time series
print(sum(is.na(ts))) # test the number of nulls in the time series
if (sum(is.na(ts) != length(ts))) { # if the time series is not completely null
print(sum(is.na(ts))) # check the length of the time series again for testing
usermodel <- arima(ts, order = c(10, 1, 0))$model # calculate the arima
p_result <- na_kalman(ts, model = usermodel, maxgap = 24) # calculate the arima
s_result <- cbind(s_result,p_result) # add the computed column to the dataframe
names(s_result)[names(s_result) == "p_result"] <- paste0(p,"_imputed",sep ="")
} else { # otherwise add a null column
p_result <- c(NA, length=length(ts))
s_result <- cbind(s_result,p_result) # enter a null column
names(s_result)[names(s_result) == "p_result"] <- paste0(p,"_imputed",sep ="")
}
}
filename = paste0("/path/to/file", s, "_imputed_test.csv", sep="")
write.csv(s_result, filename, row.names = TRUE)
print(paste('completed imputation for station ', s, sep=" "))
}
The problem is, that this if statement is not working correctly as it is passing data to the arima imputation inside the if statement even when the number of nulls is equal to the length of the time series. Here's the output:
[1] "starting imputation for station BG1"
[1] "nox_iqr"
[1] 17520
[1] 4660
[1] 4660
[1] "no2_iqr"
[1] 17520
[1] 4664
[1] 4664
[1] "pm10_iqr"
[1] 17520
[1] 17520
[1] 17520
Error in arima(ts, order = c(10, 1, 0)) : 'x' must be numeric
Clearly something is wrong, as for the pm10 pollutant, there are 17520 nulls, the same as the length of the time series. Therefore the if statement should not run the line counting the number of nulls again inside the 'if' statement, as this line of code should be bypassed. ie. for the time series relating to column pm10_iqr, the number of nulls is 17520, the length of the time series is 17520, and this would cause the arima to fail - hence the if statement should skip this line. But it does not do this.
Where am I going wrong, please? This should be very simple but it does not make any sense! I don't write a lot of R code, usually Python. Thanks for your help!
sum(is.na(ts) != length(ts))
should probably be
sum(is.na(ts)) != length(ts)
Explanation of what went wrong: in R, any number other than 0 is treated as TRUE in a condition. For example:
if (0) {print("evaluated to TRUE")} else {print("evaluated to FALSE")}
Returns:
[1] "evaluated to FALSE"
and:
if (5) {print("evaluated to TRUE")} else {print("evaluated to FALSE")}
Returns:
[1] "evaluated to TRUE"
Additionally, R accepts booleans (TRUE, FALSE) as arguments to sum (and other arithmetic functions) and treats them in these cases as 1 (TRUE) and 0 (FALSE).
is.na(ts) != length(ts)
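Compares each element of is.na(ts) (treated as 0 or 1) with length(ts), so it evaluates to a vector of TRUEs and FALSEs (all TRUE whenever the series has more than one element)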
and
sum(is.na(ts) != length(ts))
happily sums them up ;) The result is a non-zero number, which if treats as TRUE.
That's why your code didn't raise any errors, because it was kind of working, just not doing what we meant it to do... these are my most feared errors ;)
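The difference between the two expressions can be seen on a tiny made-up vector that is entirely missing, like the pm10 column in the question (x, wrong and right are illustrative names):

```r
# Hypothetical series in which every value is missing
x <- rep(NA_real_, 5)

# Misplaced parenthesis: each TRUE/FALSE flag (1/0) is compared with
# the length (5), so every comparison is TRUE and the sum is non-zero.
wrong <- sum(is.na(x) != length(x))

# Intended test: compare the COUNT of missing values with the length.
right <- sum(is.na(x)) != length(x)

print(wrong)  # 5, which if() treats as TRUE
print(right)  # FALSE, so the imputation branch would be skipped
```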
When I type warnings() into the console, I get back
Warning message:
In fread(my_directory, ... :
C function strtod() returned ERANGE for one or more fields. The first was string input '4.40589099726375E-309'. It was read using (double)strtold() as numeric
However when I type as.character(warnings()), I get:
[1] "fread(my_directory)"
My objective is to get the actual message displayed by warnings() into a character string, so that I can pass it to the logwarn function in the logging package. Currently, I am doing logwarn(warnings(),logger="some_log_file.log") to record my warnings, but it gives the incorrect coercion to character that I displayed above.
Note that I could just use sink, but I want to stick with the logging package, so I need to be able to coerce the warning to character correctly.
This may not be the exact answer you're looking for, but I think it's worth a mention.
R has a global variable, last.warning, which holds just that, the last warning. Calling names on it will return the last warning as a character string. Here's a little example.
First, purposely trigger a warning (this particular condition is a warning in R versions before 4.2.0; from 4.2.0 onward it is an error):
x <- 1:5
if(x == 1) "yes" else "no"
# [1] "yes"
# Warning message:
# In if (x == 1) "yes" else "no" :
# the condition has length > 1 and only the first element will be used
Look at the variable last.warning:
last.warning
# $`the condition has length > 1 and only the first element will be used`
# if (x == 1) "yes" else "no"
Now look at the names(last.warning). This returns the warning as a character string:
names(last.warning)
# [1] "the condition has length > 1 and only the first element will be used"
warnings() returns a list.
The list values are the language elements which produced the warning; that is what you are seeing with as.character().
The names of the list values are the warning messages. You can get those with names(warnings()).
Use a calling handler along with the 'restart' (see ?warning and ?withCallingHandlers) that warning() creates
f = function() { warning("oops"); 1 }
withCallingHandlers({
f()
}, warning=function(cond) {
txt <- conditionMessage(cond)
## do with txt what you will, e.g.,
## logwarn(txt, logger="some_log_file.log")
message("captured warning: ", txt, "; now continuing")
## signal that the warning has been handled
invokeRestart("muffleWarning")
})
I am running a for loop over 1:1700 in R, loading different data in each iteration. Some iterations throw an error (maybe because the corresponding data is missing).
Is there a way to skip the iterations that produce an error, so that the for loop still runs through all 1700 iterations?
I have to run a for loop, there is no other option.
You can use tryCatch within your loop. Here is an example where I loop from 1 to 5 and raise an error for some counter values (created here with stop). I catch it and then continue with the remaining values of the counter.
for( i in 1:5) ## replace 5 by 1700
tryCatch({
if(i %in% c(2,5)) stop("simulated error") ## raise an error for some iterations
print(i) ## imagine you read a file here, or any more complicated process
}
,error = function(e) print(paste(i,'is error')))
[1] 1
[1] "2 is error"
[1] 3
[1] 4
[1] "5 is error"
I use try for such issues. It allows your loop to continue through the cycle of values without stopping at the error message.
Example
make data
set.seed(1)
dat <- vector(mode="list", 1800)
dat
tmp <- sample(1800, 900) # only some elements are filled with data
for(i in seq(tmp)){
dat[[tmp[i]]] <- rnorm(10)
}
dat
loop without try
#gives error
res <- vector(mode="list", length(dat))
for(i in seq(dat)){
res[[i]] <- log(dat[[i]]) # error when trying to take the log of a NULL element
}
loop with try
#cycles through
res <- vector(mode="list", length(dat))
for(i in seq(dat)){
res[[i]] <- try(log(dat[[i]]), TRUE) # cycles through
}
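After a loop like this, the failed iterations can be picked out by checking for the "try-error" class that try returns on failure. A small sketch on made-up data (dat here is a four-element stand-in, and failed and ok are illustrative names):

```r
# Hypothetical data: NULL elements make log() fail
dat <- list(c(1, 10), NULL, c(2, 20), NULL)

res <- vector(mode = "list", length(dat))
for (i in seq_along(dat)) {
  # silent = TRUE suppresses the printed error message
  res[[i]] <- try(log(dat[[i]]), silent = TRUE)
}

# try() returns an object of class "try-error" when the call failed
failed <- vapply(res, inherits, logical(1), what = "try-error")
print(which(failed))  # indices of the iterations that errored

# Keep only the successful results
ok <- res[!failed]
```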
Using penalizedSVM R package, I am trying to do feature selection. There is a list of several data.frames called trainingdata.
trainingdata <-lapply(trainingdata, function(data)
{
levels(data$label) <- c(-1, 1)
train_x<-data[, -1]
train_x<-data.matrix(train_x)
trainy<-data[, 1]
print(which(!is.finite(train_x)))
scad.fix<-svm.fs(train_x, y=trainy, fs.method="scad",
cross.outer=0, grid.search="discrete",
lambda1.set=lambda1.scad, parms.coding="none",
show="none", maxIter=1000, inner.val.method="cv",
cross.inner=5, seed=seed, verbose=FALSE)
data <- data[c(1, scad.fix$model$xind)]
data
})
Some iterations go well but then on one data.frame I am getting the following error message.
[1] "feature selection method is scad"
Error in svd(m, nv = 0, nu = 0) : infinite or missing values in 'x'
Calls: lapply ... scadsvc -> .calc.mult.inv_Q_mat2 -> rank.condition -> svd
Using the following call, I am also checking whether x is really infinite but the call returns 0 for all preceding and the current data.frame where the error has occurred.
print(which(!is.finite(train_x)))
Is there any other way to check for infinite values? What else could be done to rectify this error? Is there any way that one can determine the index of the current data.frame being processed within lapply?
For the first question, the message "infinite or missing values in 'x'" suggests that you change your condition to also check for missing values, something like:
idx <- is.na(train_x) | is.infinite(train_x)
You can, for example, assign 0 to these values.
train_x[idx] <- 0
For the second question, to identify the current data.frame within lapply, you can loop over the names of the data.frames and do something like this (use [[ ]] to extract the data.frame itself; [ ] returns a one-element list):
lapply(names(trainingdata), function(nm){ data <- trainingdata[[nm]]....})
For example:
ll <- list(f=1,c=2)
> lapply(names(list(f=1,c=2)), function(x) data <- ll[x])
[[1]]
[[1]]$f
[1] 1
[[2]]
[[2]]$c
[1] 2
EDIT
You can wrap the svm.fs call in tryCatch, so that an error is returned as a value instead of stopping the lapply:
scad.fix <- tryCatch(
svm.fs(....)
, error = function(e) e)
For example, here I test it on this list. The code keeps executing to the end of the list, even though there is an NA in the list.
lapply(list(1,NA,2), function(x){
tryCatch(
if (any(!is.finite(x)))
stop("infinite or missing values in 'x'")
, error = function(e) e)
})
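Both ideas (knowing which element is being processed, and continuing past a failure) can be combined by iterating over indices. This is only a sketch: inputs is a made-up stand-in for trainingdata, and x * 2 stands in for the real feature-selection call.

```r
# Hypothetical stand-in for the list of data.frames
inputs <- list(a = 1, b = NA, c = 2)

out <- lapply(seq_along(inputs), function(i) {
  tryCatch({
    x <- inputs[[i]]
    if (any(!is.finite(x)))
      stop("infinite or missing values in 'x'")
    x * 2  # stand-in for the real computation
  }, error = function(e) {
    # i and the element name identify the input that failed
    message("element ", i, " (", names(inputs)[i], ") failed: ",
            conditionMessage(e))
    NULL
  })
})

# Indices of the inputs that failed
failed <- which(vapply(out, is.null, logical(1)))
print(failed)
```

Returning NULL from the error handler keeps the output list the same length as the input, so the positions still line up with the original list.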