I am facing a problem when analysing errors in sapply in R.
Suppose I have a matrix as below,
B <- matrix(
c(2, 4, 3, 1, 5, 7),
nrow=3,
ncol=2)
Just to create some errors, I am indexing outside the bounds of the matrix (the i in 1:5 part).
for (i in 1:5) {
x <- B[1,i]^2
if(i==1) {
result <- x
}else{
result <- rbind(result,x)
}
}
Of course it gives an error like this.
Error in B[1, i] : subscript out of bounds
However, it is not hard to find the step at which the error occurs, since I can just print i:
> i
[1] 3
I can easily see at what step I hit the error. In this case it happens when i=3.
However, to take advantage of the speed of the sapply function in R (since loops are not recommended because of their lack of speed), I used it as below:
sapply(1:5 ,function(j) {
y <- B[1,j]^2
})
Not surprisingly it gives the same error.
Error in B[1, j] : subscript out of bounds
However, now I cannot see at what step it failed, since neither j nor y is recorded!
> j
Error: object 'j' not found
> y
Error: object 'y' not found
What can you suggest? I know it is a simple example, but the things I am dealing with in reality are more complex, and it becomes harder to find the step where the error occurs.
Thanks in advance!
If you use RStudio, the easiest way is to activate in the Menu: Debug > On Error > Break in code.
This will open a browser on error and you will be able to see the value of j.
If you don't use RStudio, you can set options(error = recover) which will also open a browser on error. (In your specific case choose frame 3 and you will be able to see the value of j)
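If an interactive browser is not convenient, another option is to catch the error inside the function passed to sapply and re-raise it with the index attached. This is only a minimal sketch reusing B from the question; the helper name squared_or_fail is made up for illustration.

```r
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)

# Hypothetical helper: squares B[1, j] and, on failure, re-raises the
# error with the offending index included in the message.
squared_or_fail <- function(j) {
  tryCatch(
    B[1, j]^2,
    error = function(e) {
      stop(sprintf("failed at j = %d: %s", j, conditionMessage(e)),
           call. = FALSE)
    }
  )
}

# The outer tryCatch is only here so the script keeps running and the
# message can be inspected; interactively you would simply see the error.
msg <- tryCatch(
  sapply(1:5, squared_or_fail),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The message now names the failing index, so there is no need to look for j after the fact.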
Related
In recent months I noticed a very annoying behaviour on both Windows and Unix R installations with R-Studio.
After an error, R automatically executes all the code that follows the line producing the error (here: "unexpected symbol"). Here is some example code:
vec1 <- c("Hallo", "World"
vec2 <- c(1,2,3)
print(vec2)
print(vec1)
In the first line:
vec1 <- c("Hallo", "World"
R is missing a closing ")". After erroneously executing it, this happens:
vec1 <- c("Hallo", "World"
+
+ vec2 <- c(1,2,3)
Error: unexpected symbol in:
"
vec2"
>
> print(vec2)
Error in print(vec2) : object 'vec2' not found
>
> print(vec1)
Error in print(vec1) : object 'vec1' not found
>
R apparently does try to look for a closing bracket, finds one, gives the expected "Unexpected symbol"-error, but instead of stopping it does try to execute the next line (and everything else following) as well.
Is this R- or R-Studio related, and how can I stop that?
edit:
I should clarify what the problem is, based on the comments. This behaviour is not intended, nor did I plan to include faulty lines in my code!
Sometimes one just forgets to add a bracket, or comma, or whatever, but still executes such a line. Then, at least for me, R has this very annoying behaviour of running through the rest of the code. Here is a real-life example:
Somewhat later in the same situation, model objects were overwritten, which was very annoying.
So again, I don't want you to correct the code; I would like to learn why R behaves as described and how to stop it.
It sounds like you're expecting R to stop when it finds an error. After all, that's what traditional compiled languages like C and Java do. But R isn't a compiled language. Each line of code is interpreted in order. This is an inherent part of R and doesn't have anything to do with RStudio. In your example, it's really hard for R to figure out where the call to c() ends because you're missing the close parenthesis.
One RStudio feature that I find useful for preventing this specific type of error is the auto-formatter (CTRL-SHIFT-A). When formatting the code sample you provide, it becomes obvious that something's not right when you look at the indentation.
The code changes from this...
vec1 <- c("Hallo", "World"
vec2 <- c(1, 2, 3)
print(vec2)
print(vec1)
To this...
vec1 <- c("Hallo", "World"
          vec2 <- c(1, 2, 3)
          print(vec2)
          print(vec1)
The fact that the bottom three lines are indented so far to the right gives me a warning that I might have missed a closing parenthesis.
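Another way to catch this kind of mistake before anything runs is to let R parse the code as a whole. The sketch below passes the sample to parse() as text purely for illustration; source() on a file behaves in a similar way, since it parses the entire file before evaluating any of it.

```r
code <- 'vec1 <- c("Hallo", "World"
vec2 <- c(1,2,3)
print(vec2)
print(vec1)'

# parse() reads the whole text before anything is evaluated, so the
# missing parenthesis is reported and none of the lines are run.
check <- tryCatch(
  {
    parse(text = code)
    "syntax ok"
  },
  error = function(e) conditionMessage(e)
)
cat(check, "\n")
```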
Generally
If your question is about broader error handling, you can often use a function to prevent R from continuing when it encounters an error. This won't work with your example since the parentheses are wrong, but it gives an answer to the broader question of when you can get R to stop upon encountering a problem.
Let's generate an error.
stop("This is an error")
print("The code keeps running!")
Notice how the second line runs after the error. Now let's wrap that code in a function.
demo_function <- function() {
stop("This is an error")
print("The code keeps running!")
}
demo_function()
The function throws an error and halts execution.
It's a good idea to put high-risk code inside of a function for exactly this reason. With the example you provided, R will throw an error as soon as you try to define the function, which might help you catch an error earlier in the development process.
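The difference can also be checked without reading the console output. A small sketch, where the flag variable reached_end is an assumption added for illustration, records whether the line after stop() ever ran:

```r
reached_end <- FALSE

demo_function <- function() {
  stop("This is an error")
  reached_end <<- TRUE  # never executed: stop() aborts the whole call
}

# try() keeps the script alive so the flag can be inspected afterwards.
outcome <- try(demo_function(), silent = TRUE)

print(inherits(outcome, "try-error"))
print(reached_end)
```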
As per the customer support of R-Studio, this behaviour is related to R-Studio and can be stopped by unticking "Execute all lines in a statement" under Global Options -> Editing -> Execution. Sorry for bothering.
You have to add the missing closing parenthesis to your syntax. Try with:
> vec1 <- c("Hello", "World")
> vec2 <- c(1,2,3)
> print(vec2)
> print(vec1)
It should work.
What does this error mean?
Error in `[<-`(`*tmp*`, i, 1, value = 0.0225315561703551) :
subscript out of bounds
This error means you are trying to index an object outside its range. For example, you would get it with a matrix m <- matrix(0, nrow = 3, ncol = 1) if you tried to assign to m[4, 1], or with a vector x <- c(1, 2, 3) if you asked for x[[4]]. (Note that x[4] with single brackets returns NA rather than raising this error.)
Check your code. To debug in RStudio, I put browser() statements where I want the code to stop and then step through the process.
Check how you are indexing in any loops. I noticed the error mentions i.
Update the package you are using for the calculation. You can sometimes run into gremlins that have been fixed in later versions.
This may be helpful: Subscript out of bounds - general definition and solution?
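For an error of this exact shape, a common cause is assigning into a matrix row that does not exist. The following is a minimal sketch with made-up names (m, n_iter), assuming the loop runs past the number of rows; it is not taken from the code that produced the error above.

```r
m <- matrix(0, nrow = 3, ncol = 1)
n_iter <- 5  # assumed: more iterations than m has rows

# Reproduce the error and capture its message instead of stopping.
msg <- tryCatch(
  {
    for (i in seq_len(n_iter)) m[i, 1] <- i / 10
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# One possible fix: size the matrix from the loop length up front.
m <- matrix(0, nrow = n_iter, ncol = 1)
for (i in seq_len(n_iter)) m[i, 1] <- i / 10
print(dim(m))
```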
I have three rasters, for which I am trying to find a cell-by-cell mean.
r1<-raster('a.tif')
r2<-raster('b.tif')
r3<-raster('c.tif')
However, doing this is giving me the following error
q<-mean(r1,r2,r3)
or
q<-(r1+r2+r3)/3
Error
Error in .local(.Object, ...) : options(warn) not set
Warning message:
closing unused connection 4 .....
That is a weird error message. Often this type of situation goes away if you restart R without loading an old workspace (which may be stale). If that is what is going on use unlink(".RData"), exit R without saving and start again.
To answer your aside question, yes it is much easier to stack them. E.g.
f <- list.files(pattern='tif$')
s <- stack(f)
x <- sum(s)
I often have the situation like this:
result <- lapply(1:length(mylist), function(x){
doSomething(mylist[[x]])
})
However, if it fails, I have no idea which element in the list failed on doSomething().
So then I end up recoding it as a for loop:
for(i in 1: length(mylist)){
doSomething(mylist[[i]])
}
I can then see the last value of i and what happened. There must be a better way to do this, right? Thanks!
Notice how the error includes 5L
> lapply(1:10, function(i) if (i == 5) stop("oops"))
Error in FUN(1:10[[5L]], ...) : oops
indicating that the 5th iteration failed.
One simple option is to run the code:
options( error=recover )
before running lapply (see ?recover for details).
Then when/if an error occurs you will instantly be put into the recover mode that will let you examine which function you are in, what arguments were passed to that function, etc. so you can see which step you are on and what the possible reason for the error is.
You can also use try or tryCatch as mentioned in the comments to either skip elements that produce an error or print out information on where they occur.
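As a rough sketch of the try approach: wrap each call in try() and then look for the elements that came back with class "try-error". Here mylist and doSomething are stand-ins invented for the example (this doSomething fails on negative input).

```r
mylist <- list(4, 9, -1, 16)

# Stand-in for the real work; fails on negative input.
doSomething <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

# Iterate over indices so a failure can be traced back to its element.
result <- lapply(seq_along(mylist), function(i) {
  try(doSomething(mylist[[i]]), silent = TRUE)
})

# Indices whose call failed.
failed <- which(vapply(result, inherits, logical(1), what = "try-error"))
print(failed)
```

The successful elements keep their normal values, so the rest of the results remain usable while the failing indices are investigated.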
I'm using the snow package in R to execute a function on a SOCK cluster with multiple machines (3) running Linux. I tried to run the code with both parLapply and clusterApply.
In case of any error at the worker level, the results of the worker nodes are not returned properly to master making it very hard to debug. I'm currently logging every heartbeat of the worker nodes independently using futile.logger. It seems as if the results are properly computed. But when I tried to print the result at the master node (After receiving the output from workers) I get an error which says, Error in checkForRemoteErrors(val): 8 nodes produced errors; first error: missing value where TRUE/FALSE needed.
Is there any way to debug the results of the workers more deeply?
The checkForRemoteErrors function is called by parLapply and clusterApply to check for task errors, and it will throw an error if any of the tasks failed. Unfortunately, although it displays the error message, it doesn't provide any information about what worker code caused the error. But if you modify your worker/task function to catch errors, you can retain some extra information that may be helpful in determining where the error occurred.
For example, here's a simple snow program that fails. Note that it uses outfile='' when creating the cluster so that output from the program is displayed, which by itself is a very useful debugging technique:
library(snow)
cl <- makeSOCKcluster(2, outfile='')
problem <- function(i) {
if (NA)
j <- 999
else
j <- i
2 * j
}
r <- parLapply(cl, 1:2, problem)
When you execute this, you see the error message from checkForRemoteErrors and some other messages, but nothing that tells you that the if statement caused the error. To catch errors when calling problem, we define workerfun:
workerfun <- function(i) {
tryCatch({
problem(i)
},
error=function(e) {
print(e)
stop(e)
})
}
Now we execute workerfun with parLapply instead of problem, first exporting problem to the workers:
clusterExport(cl, c('problem'))
r <- parLapply(cl, 1:2, workerfun)
Among the other messages, we now see
<simpleError in if (NA) j <- 999 else j <- i: missing value where TRUE/FALSE needed>
which includes the actual if statement that generated the error. Of course, it doesn't tell you the file name and line number of the expression, but it's often enough to let you solve the problem.
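A variation on the same idea is to return the error to the master as a value instead of re-raising it, so that every task's outcome can be inspected afterwards. The sketch below is only an illustration: safe_workerfun is a made-up name, and it is run with plain lapply so that it works without a cluster. With snow, the same function could presumably be passed to parLapply after exporting problem.

```r
problem <- function(i) {
  if (NA)
    j <- 999
  else
    j <- i
  2 * j
}

# Hypothetical wrapper: returns the condition object on failure rather
# than stopping, so the caller gets one result (or error) per task.
safe_workerfun <- function(i) {
  tryCatch(problem(i), error = function(e) e)
}

# Run locally here; on a cluster this would be
# parLapply(cl, 1:2, safe_workerfun).
r <- lapply(1:2, safe_workerfun)

# Which tasks failed, and with what message?
failed <- vapply(r, inherits, logical(1), what = "error")
print(which(failed))
print(vapply(r[failed], conditionMessage, character(1)))
```

Because the errors come back as ordinary list elements, the tasks that succeeded are not lost when one of them fails.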
Check the range of your observations and how they vary. I have noticed that when the values have many decimal places (4, 5 or 6), it throws glm.nb off. To solve this, I just round the observations to 2 decimal places.