I already read documentations and several other question about tryCatch(), however, I am not able to figure out the solution to my class of problem.
The task to solve is:
1)There is a for loop that goes from row 1 to nrow of the dataframe.
2)Execute some instruction
3)In the case of error that would stop the program, instead restart the loop cycle from the current iteration.
Example
for (i in 1:50000) {
...execute instructions...
}
What I am looking for is a solution that in the case of error at the iteration 30250, it restart the loop such that
for (i in 30250:50000) {
...execute instructions...
}
The practical example I am working on is the following:
library(RDSTK)
library(jsonlite)
DF <- (id = seq(1:400000), lat = rep(38.929840, 400000), long = rep( -77.062343, 400000)
for (i in 1 : nrow(DF) {
location <- NULL
bo <- 0
while (bo != 10) { #re-try the instruction max 10 times per row
location <- NULL
location <- try(location <- #try to gather the data from internet
coordinates2politics(DF$lat[i], DF$long[i]))
if (class(location) == "try-error") { #if not able to gather the data
Sys.sleep(runif(1,2,7)) #wait a random time before query again
print("reconntecting...")
bo <- bo+1
print(bo)
} else #if there is NO error
break #stop trying on this individual
}
location <- lapply(location, jsonlite::fromJSON)
location <- data.frame(location[[1]]$politics)
DF$start_country[i] <- location$name[1]
DF$start_region[i] <- location$name[2]
Sys.sleep(runif(1,2,7)) #sleep random seconds before starting the new row
}
N.B.: the try() is part of the "...execute instruction..."
What I am looking for is a tryCatch that when a critical error that stops the program happen, it instead restart the for loop from the current index "i".
This program would allow me to automatically iterate over 400000 rows and restart where it interrupted in the case of errors. This means that the program would be able to work totally independently by human.
I hope my question is clear, thank you very much.
This is better addressed using while rather than for:
i <- 1
while(i < 10000){
tryCatch(doStuff()) -> doStuffResults
if(class(doStuffResults) != "try-catch") i <- i + 1
{
Beware though if some cases will always fail doStuff: the loop will never terminate!
If the program will produce a fatal error and need a human restart, then the first line i <- 1 may be replaced (as long as you don't use i elsewhere in your code) with
if(!exists("i")) i <- 1
so that the value from the loop will persist and not be reset to one.
Related
I have a list of URLs (more than 4000) from a specific domain (pixilink.com) and what I want to do is to figure out if the provided domain is a picture or a video. To do this, I used the solutions provided here: How to write trycatch in R and Check whether a website provides photo or video based on a pattern in its URL and wrote the code shown below:
#Function to get the value of initial_mode from the URL
urlmode <- function(x){
mycontent <- readLines(x)
mypos <- grep("initial_mode = ", mycontent)
if(grepl("0", mycontent[mypos])){
return("picture")
} else if(grepl("tour", mycontent[mypos])){
return("video")
} else{
return(NA)
}
}
Also, in order to prevent having error for URLs that don't exist, I used the code below:
readUrl <- function(url) {
out <- tryCatch(
{
readLines(con=url, warn=FALSE)
return(1)
},
error=function(cond) {
return(NA)
},
warning=function(cond) {
return(NA)
},
finally={
message( url)
}
)
return(out)
}
Finally, I separated the list of URLs and pass it into the functions (here for instance, I used 1000 values from URL list) described above:
a <- subset(new_df, new_df$host=="www.pixilink.com")
vec <- a[['V']]
vec <- vec[1:1000] # only chose first 1000 rows
tt <- numeric(length(vec)) # checking validity of url
for (i in 1:length(vec)){
tt[i] <- readUrl(vec[i])
print(i)
}
g <- data.frame(vec,tt)
g2 <- g[which(!is.na(g$tt)),] #only valid url
dd <- numeric(nrow(g2))
for (j in 1:nrow(g2)){
dd[j] <- urlmode(g2[j,1])
}
Final <- cbind(g2,dd)
Final <- left_join(g, Final, by = c("vec" = "vec"))
I ran this code on a sample list of URLs with 100, URLs and it worked; however, after I ran it on whole list of URLs, it returned an error. Here is the error : Error in textConnection("rval", "w", local = TRUE) : all connections are in use Error in textConnection("rval", "w", local = TRUE) : all connections are in use
And after this even for sample URLs (100 samples that I tested before) I ran the code and got this error message : Error in file(con, "r") : all connections are in use
I also tried closeAllConnection after each recalling each function in the loop, but it didn't work.
Can anyone explain what this error is about? is it related to the number of requests we can have from the website? what's the solution?
So, my guess as to why this is happening is because you're not closing the connections that you're opening via tryCatch() and via urlmode() through the use of readLines(). I was unsure of how urlmode() was going to be used in your previous post so it had made it as simple as I could (and in hindsight, that was badly done, my apologies). So I took the liberty of rewriting urlmode() to try and make it a little bit more robust for what appears to be a more expansive task at hand.
I think the comments in the code should help, so take a look below:
#Updated URL mode function with better
#URL checking, connection handling,
#and "mode" investigation
urlmode <- function(x){
#Check if URL is good to go
if(!httr::http_error(x)){
#Test cases
#x <- "www.pixilink.com/3"
#x <- "https://www.pixilink.com/93320"
#x <- "https://www.pixilink.com/93313"
#Then since there are redirect shenanigans
#Get the actual URL the input points to
#It should just be the input URL if there is
#no redirection
#This is important as this also takes care of
#checking whether http or https need to be prefixed
#in case the input URL is supplied without those
#(this can cause problems for url() below)
myx <- httr::HEAD(x)$url
#Then check for what the default mode is
mycon <- url(myx)
open(mycon, "r")
mycontent <- readLines(mycon)
mypos <- grep("initial_mode = ", mycontent)
#Close the connection since it's no longer
#necessary
close(mycon)
#Some URLs with weird formats can return
#empty on this one since they don't
#follow the expected format.
#See for example: "https://www.pixilink.com/clients/899/#3"
#which is actually
#redirected from "https://www.pixilink.com/3"
#After that, evaluate what's at mypos, and always
#return the actual URL
#along with the result
if(!purrr::is_empty(mypos)){
#mystr<- stringr::str_extract(mycontent[mypos], "(?<=initial_mode\\s\\=).*")
mystr <- stringr::str_extract(mycontent[mypos], "(?<=\').*(?=\')")
return(c(myx, mystr))
#return(mystr)
#So once all that is done, check if the line at mypos
#contains a 0 (picture), tour (video)
#if(grepl("0", mycontent[mypos])){
# return(c(myx, "picture"))
#return("picture")
#} else if(grepl("tour", mycontent[mypos])){
# return(c(myx, "video"))
#return("video")
#}
} else{
#Valid URL but not interpretable
return(c(myx, "uninterpretable"))
#return("uninterpretable")
}
} else{
#Straight up invalid URL
#No myx variable to return here
#Just x
return(c(x, "invalid"))
#return("invalid")
}
}
#--------
#Sample code execution
library(purrr)
library(parallel)
library(future.apply)
library(httr)
library(stringr)
library(progressr)
library(progress)
#All future + progressr related stuff
#learned courtesy
#https://stackoverflow.com/a/62946400/9494044
#Setting up parallelized execution
no_cores <- parallel::detectCores()
#The above setup will ensure ALL cores
#are put to use
clust <- parallel::makeCluster(no_cores)
future::plan(cluster, workers = clust)
#Progress bar for sanity checking
progressr::handlers(progressr::handler_progress(format="[:bar] :percent :eta :message"))
#Website's base URL
baseurl <- "https://www.pixilink.com"
#Using future_lapply() to recursively apply urlmode()
#to a sequence of the URLs on pixilink in parallel
#and storing the results in sitetype
#Using a future chunk size of 10
#Everything is wrapped in with_progress() to enable the
#progress bar
#
range <- 93310:93350
#range <- 1:10000
progressr::with_progress({
myprog <- progressr::progressor(along = range)
sitetype <- do.call(rbind, future_lapply(range, function(b, x){
myprog() ##Progress bar signaller
myurl <- paste0(b, "/", x)
cat("\n", myurl, " ")
myret <- urlmode(myurl)
cat(myret, "\n")
return(c(myurl, myret))
}, b = baseurl, future.chunk.size = 10))
})
#Converting into a proper data.frame
#and assigning column names
sitetype <- data.frame(sitetype)
names(sitetype) <- c("given_url", "actual_url", "mode")
#A bit of wrangling to tidy up the mode column
sitetype$mode <- stringr::str_replace(sitetype$mode, "0", "picture")
head(sitetype)
# given_url actual_url mode
# 1 https://www.pixilink.com/93310 https://www.pixilink.com/93310 invalid
# 2 https://www.pixilink.com/93311 https://www.pixilink.com/93311 invalid
# 3 https://www.pixilink.com/93312 https://www.pixilink.com/93312 floorplan2d
# 4 https://www.pixilink.com/93313 https://www.pixilink.com/93313 picture
# 5 https://www.pixilink.com/93314 https://www.pixilink.com/93314 floorplan2d
# 6 https://www.pixilink.com/93315 https://www.pixilink.com/93315 tour
unique(sitetype$mode)
# [1] "invalid" "floorplan2d" "picture" "tour"
#--------
Basically, urlmode() now opens and closes connections only when necessary, checks for URL validity, URL redirection, and also "intelligently" extracts the value assigned to initial_mode. With the help of future.lapply(), and the progress bar from the progressr package, this can now be applied quite conveniently in parallel to as many pixilink.com/<integer> URLs as desired. With a bit of wrangling thereafter, the results can be presented very tidily as a data.frame as shown.
As an example, I've demonstrated this for a small range in the code above. Note the commented out 1:10000 range in the code in this context: I let this code run the last couple of hours over this (hopefully sufficiently) large range of URLs to check for errors and problems. I can attest that I encountered no errors (only the regular warnings In readLines(mycon) : incomplete final line found on 'https://www.pixilink.com/93334'). For proof, I have the data from all 10000 URLs written to a CSV file that I can provide upon request (I don't fancy uploading that to pastebin or elsewhere unnecessarily). Due to oversight on my part, I forgot to benchmark that run, but I suppose I could do that later if performance metrics are desired/would be considered interesting.
For your purposes, I believe you can simply take the entire code snippet below and run it verbatim (or with modifications) by just changing the range assignment right before the with_progress(do.call(...)) step to a range of your liking. I believe this approach is simpler and does away with having to deal with multiple functions and such (and no tryCatch() messes to deal with).
When working with R I frequently get the error message "subscript out of bounds". For example:
# Load necessary libraries and data
library(igraph)
library(NetData)
data(kracknets, package = "NetData")
# Reduce dataset to nonzero edges
krack_full_nonzero_edges <- subset(krack_full_data_frame, (advice_tie > 0 | friendship_tie > 0 | reports_to_tie > 0))
# convert to graph data farme
krack_full <- graph.data.frame(krack_full_nonzero_edges)
# Set vertex attributes
for (i in V(krack_full)) {
for (j in names(attributes)) {
krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
}
}
# Calculate reachability for each vertix
reachability <- function(g, m) {
reach_mat = matrix(nrow = vcount(g),
ncol = vcount(g))
for (i in 1:vcount(g)) {
reach_mat[i,] = 0
this_node_reach <- subcomponent(g, (i - 1), mode = m)
for (j in 1:(length(this_node_reach))) {
alter = this_node_reach[j] + 1
reach_mat[i, alter] = 1
}
}
return(reach_mat)
}
reach_full_in <- reachability(krack_full, 'in')
reach_full_in
This generates the following error Error in reach_mat[i, alter] = 1 : subscript out of bounds.
However, my question is not about this particular piece of code (even though it would be helpful to solve that too), but my question is more general:
What is the definition of a subscript-out-of-bounds error? What causes it?
Are there any generic ways of approaching this kind of error?
This is because you try to access an array out of its boundary.
I will show you how you can debug such errors.
I set options(error=recover)
I run reach_full_in <- reachability(krack_full, 'in')
I get :
reach_full_in <- reachability(krack_full, 'in')
Error in reach_mat[i, alter] = 1 : subscript out of bounds
Enter a frame number, or 0 to exit
1: reachability(krack_full, "in")
I enter 1 and I get
Called from: top level
I type ls() to see my current variables
1] "*tmp*" "alter" "g"
"i" "j" "m"
"reach_mat" "this_node_reach"
Now, I will see the dimensions of my variables :
Browse[1]> i
[1] 1
Browse[1]> j
[1] 21
Browse[1]> alter
[1] 22
Browse[1]> dim(reach_mat)
[1] 21 21
You see that alter is out of bounds. 22 > 21 . in the line :
reach_mat[i, alter] = 1
To avoid such error, personally I do this :
Try to use applyxx function. They are safer than for
I use seq_along and not 1:n (1:0)
Try to think in a vectorized solution if you can to avoid mat[i,j] index access.
EDIT vectorize the solution
For example, here I see that you don't use the fact that set.vertex.attribute is vectorized.
You can replace:
# Set vertex attributes
for (i in V(krack_full)) {
for (j in names(attributes)) {
krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
}
}
by this:
## set.vertex.attribute is vectorized!
## no need to loop over vertex!
for (attr in names(attributes))
krack_full <<- set.vertex.attribute(krack_full,
attr, value = attributes[,attr])
It just means that either alter > ncol( reach_mat ) or i > nrow( reach_mat ), in other words, your indices exceed the array boundary (i is greater than the number of rows, or alter is greater than the number of columns).
Just run the above tests to see what and when is happening.
Only an addition to the above responses: A possibility in such cases is that you are calling an object, that for some reason is not available to your query. For example you may subset by row names or column names, and you will receive this error message when your requested row or column is not part of the data matrix or data frame anymore.
Solution: As a short version of the responses above: you need to find the last working row name or column name, and the next called object should be the one that could not be found.
If you run parallel codes like "foreach", then you need to convert your code to a for loop to be able to troubleshoot it.
If this helps anybody, I encountered this while using purr::map() with a function I wrote which was something like this:
find_nearby_shops <- function(base_account) {
states_table %>%
filter(state == base_account$state) %>%
left_join(target_locations, by = c('border_states' = 'state')) %>%
mutate(x_latitude = base_account$latitude,
x_longitude = base_account$longitude) %>%
mutate(dist_miles = geosphere::distHaversine(p1 = cbind(longitude, latitude),
p2 = cbind(x_longitude, x_latitude))/1609.344)
}
nearby_shop_numbers <- base_locations %>%
split(f = base_locations$id) %>%
purrr::map_df(find_nearby_shops)
I would get this error sometimes with samples, but most times I wouldn't. The root of the problem is that some of the states in the base_locations table (PR) did not exist in the states_table, so essentially I had filtered out everything, and passed an empty table on to mutate. The moral of the story is that you may have a data issue and not (just) a code problem (so you may need to clean your data.)
Thanks for agstudy and zx8754's answers above for helping with the debug.
I sometimes encounter the same issue. I can only answer your second bullet, because I am not as expert in R as I am with other languages. I have found that the standard for loop has some unexpected results. Say x = 0
for (i in 1:x) {
print(i)
}
The output is
[1] 1
[1] 0
Whereas with python, for example
for i in range(x):
print i
does nothing. The loop is not entered.
I expected that if x = 0 that in R, the loop would not be entered. However, 1:0 is a valid range of numbers. I have not yet found a good workaround besides having an if statement wrapping the for loop
This came from standford's sna free tutorial
and it states that ...
# Reachability can only be computed on one vertex at a time. To
# get graph-wide statistics, change the value of "vertex"
# manually or write a for loop. (Remember that, unlike R objects,
# igraph objects are numbered from 0.)
ok, so when ever using igraph, the first roll/column is 0 other than 1, but matrix starts at 1, thus for any calculation under igraph, you would need x-1, shown at
this_node_reach <- subcomponent(g, (i - 1), mode = m)
but for the alter calculation, there is a typo here
alter = this_node_reach[j] + 1
delete +1 and it will work alright
What did it for me was going back in the code and check for errors or uncertain changes and focus on need-to-have over nice-to-have.
I'm writing the code to get the data from Uncomtrade- an UN's database. Because the database has a usage limit of 100 enquiries/hour so I need to put a time out there.
I want to write the code with tryCatch that will:
Automatically set programs to time out everytime the error for max limit appears
Rerun for the current level of i,j and k if a connection error orcurs
My current code still work though but I want to learn how to use tryCatch too
And also is there a way to get rid of the for loops. Can the apply family function be used here?
Thanks guys
n=0
a<-c()
for (i in (1996:2014)) {
for (j in c("0301","0302","0303","0304","0305","0306","0307","0308")) {
for (k in c("704","116","360","418","458","104","608","702","764")) {
s2<-paste(i,j,k,sep="")
a<-c(a,s2)
print (s2)
n<-n+1
if(n<=100) {
s1 <- get.Comtrade(r=k, ps=i, rg="2", cc=j, fmt="csv",px="H0")
Sys.sleep (1)
s1<-do.call(rbind.data.frame,s1)
library(foreign)
write.dta(s1,file=paste("D:/unTrade/",s2,".dta"))
}
else {
print(n)
print(s2)
print("reset here")
n=0
Sys.sleep(3610)
}
}
}
}
I can't really help you with the TryCatch(); I don't have the experience myself.
Regarding the for loops, this is one solution (although I think in these cases the for-loops are not that evil; vectorization really counts in all kinds of matrix operations etc).
dat <- expand.grid(i = 1996:1999, j = c("0301","0302","0303","0304","0305","0306","0307","0308"), k = c("704","116","360","418","458","104","608","702","764"))
library(dplyr)
dat %>% group_by(i, j, k) %>%
do({
cat('s1 <- get.Comtrade(r=', .$k, ', ps=', .$i, ', cc=', .$j, ', rg=\"2\", fmt=\"csv\",px=\"H0\")\n')
flush.console()
# return(s1)
})
From your own code s1 (also) appears to be a data.frame, so in this case, the dplyr do() nicely glues all these data frames together.
HTH
I would like to stop the loop when my if condition becomes true and continue to the next lines of my code. When I run this, I get an error that the subscript i in blocks[[i]] is out of bounds. Any ideas?
for(i in 1:length(blocks)){
if(length(blocks[[i]]) == 0){
path <- path[ -i , -i]
modes = rep("A", nrow(path))
blocks[[i]] = NULL
}
}
NOTE: I have read the help for loops, next, stop and break
If I understand you correctly, you want to loop while a condition is true and exit the loop when the condition become false. So you should be using the while control-flow operator, not for.
i <- 1
while(i<=length(blocks) & length(blocks[[i]])!=0){
i <- i+1
}
# check that we exited the loop because there was a zero
# and not because we went through all the blocks
if(i != length(blocks+1)) {
path <- path[ -i , -i]
modes = rep("A", nrow(path))
blocks[[i]] = NULL
}
# rest of your code
But there's really no need for loop here.
# apply 'length' to every element of the list blocks
# returns a vector containing all the lengths
all_length <- sapply(blocks, FUN=length)
# check that there is at least one zero
if(any(all_length==0)) {
# find the indexes of the zeros in 'all_length'
zero_length_ind <- which(all_length==0)
# this is the index of the first zero
i <- min(zero_length_ind)
}
I don't know what you want to do but if your plan is to treat all the 'i' sequentially, you may actually want to work with zero_length_ind and treat all your zeros at once.
For example if you want to remove all the values in path corresponding to zero length in blocks, you should directly do:
path <- path[-zero_length_ind,-zero_length_ind]
(Note that if there is no zero length element in blocks, then zero_length_ind will be integer(0) (that is, an empty integer vector) and you can't use it to index path. This might save you some debugging time.)
Use the STOP command, the code below should work
for(i in 1:length(blocks)){
if(length(blocks[[i]]) == 0){
path <- path[ -i , -i]
modes = rep("A", nrow(path))
blocks[[i]] = NULL
` stop("Outside bounds")
}
}
I have trouble parsing your title and your description. Seems to me you don't want to stop the loop, but continue for the whole length of your object blocks. This could help, it should skip to the next iteration when the length is different from 0:
for(i in 1:length(blocks)){
if(length(blocks[[i]]) != 0) {
next
} else if(length(blocks[[i]]) == 0){
path <- path[ -i , -i]
modes = rep("A", nrow(path))
blocks[[i]] = NULL
}
}
If i want to check the existence of a variable I use
exists("variable")
In a script I am working on I sometimes encounter the problem of a "subscript out of bounds" after running, and then my script stops. In an if statement I would like to be able to check if a subscript will be out of bounds or not. If the outcome is "yes", then execute an alternative peace of the script, and if "not", then just continue the script as it was intended.
In my imagination in case of a list it would look something like:
if {subscriptOutofBounds(listvariable[[number]]) == TRUE) {
## execute this part of the code
}
else {
## execute this part
}
Does something like that exist in R?
You can compare the length of your list with other number. As an illustration, say I have a list with 3 index and want to check by comparing them with a vector of number 1 to 100.
lol <- list(c(1:10),
c(100:200),
c(3:50))
lol
check_out <- function(x) {
maxi <- max(x)
if (maxi > length(lol)) {
#Excecute this part of code
print("Yes")
}
else {
#Excecute this part of code
print("No")
}
}
num <- 1:100
check_out(num)
The biggest number of vector num is 100 and your list only has 3 index (or length =3), so it will be out of bound from your list, then it will return Yes