getURL (from the RCurl package) doesn't work in a loop

I have a list of URLs named URLlist, and I loop over it to get the source code for each URL:
library(RCurl)
for (k in seq_along(URLlist)) {
temp = getURL(URLlist[k])
}
The problem is that for some random URLs, the code gets stuck and I get the error message:
Error in function (type, msg, asError = TRUE) :
transfer closed with outstanding read data remaining
But when I call getURL outside the loop on the URL that failed, it works perfectly.
Any help, please? Thank you very much.

It's hard to tell for sure without more information, but the requests may simply be sent too quickly, in which case pausing between requests could help:
for (k in seq_along(URLlist)) {
temp = getURL(URLlist[k])
Sys.sleep(0.2)  # pause 0.2 seconds between requests
}
I'm assuming that your actual code does something with 'temp' before writing over it in every iteration of the loop, and whatever it does is very fast.
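If you would rather keep every page than overwrite temp, here is a minimal sketch of the same pause-between-requests idea that stores each result in a list. fetch_all() is a hypothetical helper, not part of RCurl; it takes the fetching function as an argument (for example RCurl::getURL), which also makes it easy to test without a network.

```r
# Hypothetical helper (not part of RCurl): fetch every URL with a pause
# between requests and keep each result instead of overwriting it.
fetch_all <- function(urls, fetch, pause = 0.2) {
  results <- vector("list", length(urls))  # one slot per URL
  names(results) <- urls
  for (k in seq_along(urls)) {
    # list(...) keeps the slot even if fetch() returns NULL
    results[k] <- list(fetch(urls[k]))
    # throttle: wait between requests, but not after the last one
    if (pause > 0 && k < length(urls)) Sys.sleep(pause)
  }
  results
}
```

Usage, assuming RCurl is installed: pages <- fetch_all(URLlist, RCurl::getURL).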
You could also try building in some error handling so that one problem doesn't kill the whole thing. Here's a crude example that tries twice on each URL before giving up:
for (url in URLlist) {
temp = try(getURL(url))
if (inherits(temp, "try-error")) {
# first attempt failed: try once more
temp = try(getURL(url))
if (inherits(temp, "try-error"))
temp = paste("error accessing", url)
}
Sys.sleep(0.2)
}
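The try-twice loop can be generalised to any number of attempts. Below is a minimal sketch using a hypothetical helper, fetch_with_retries(), that takes the fetching function as an argument (RCurl::getURL in this case), pauses between attempts, and returns the same "error accessing" string when every attempt fails. It uses tryCatch() instead of try(), so the only output on failure is the message it emits.

```r
# Hypothetical helper: call fetch(url) up to `attempts` times, pausing
# between tries; return `fallback` if every attempt fails.
fetch_with_retries <- function(url, fetch, attempts = 2, pause = 0.2,
                               fallback = paste("error accessing", url)) {
  for (i in seq_len(attempts)) {
    # capture the error object instead of letting it stop the loop
    result <- tryCatch(fetch(url), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " for ", url, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(pause)
  }
  fallback
}
```

Usage, assuming RCurl is installed: pages <- lapply(URLlist, fetch_with_retries, fetch = RCurl::getURL, attempts = 3).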

Related

Access R code block in multiple instances

I have a code block that retries execution up to 3 times in case of a specific error. In the example below, if an HTTP 503 error occurs while downloading data from an ADLS container, I want the same operation to be retried a maximum of 3 times.
require(AzureStor)
require(stringr)
recheck <- 0
while (recheck < 3){
recheck <- recheck + 1
tryCatch({
storage_download(container, file, filename, overwrite=TRUE)
recheck <- 4
}, error = function(e){
if (str_detect(conditionMessage(e), '503')) {
print(e)
print(paste0('An infra-level failure occurred. Retry sequence number is: ', recheck))
} else{
recheck <<- 4
print(e)
}
}
)
}
This code works fine for me, but besides storage_download in the example above, I use other ADLS operations such as delete_blob, upload_blob, storage_upload and list_storage_files in multiple places, so I have to repeat this code for each of them. I want to turn it into a function that can wrap each of these ADLS operations. Any thoughts or suggestions would help me greatly.
The following should do the trick:
library(stringr)
with_retries_on_failure = function (expr, retries = 3L) {
expr = substitute(expr)
for (attempt in seq_len(retries)) {
tryCatch(
return(eval.parent(expr)),
error = \(e) {
# only HTTP 503 failures are retried; any other error is re-thrown
if (!str_detect(conditionMessage(e), '503')) stop(e)
message('An infra-level failure occurred. Retry sequence number is: ', attempt)
}
)
}
}
Used as follows:
with_retries_on_failure(storage_download(container, file, filename, overwrite=TRUE))
Note the return() call, which immediately returns from the surrounding function without the need to update the loop variable. Likewise, in the case of a failure we also don’t have to update the loop variable since we are using a for loop, and we use stop() to break out of the loop for any error that is not a 503 HTTP response.
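A variant that needs neither stringr nor non-standard evaluation takes the function and its arguments separately. The sketch below is a hypothetical helper, with_retries(), not part of AzureStor: it retries only errors whose message matches a pattern (503 by default), re-throws other errors immediately, and, unlike the loop above, re-throws the last 503 error once all retries are used up instead of silently returning NULL. Note that the retries, pattern and wait arguments must be named, and they will shadow any argument of f with the same name.

```r
# Hypothetical base-R retry wrapper: call f(...) up to `retries` times.
# Errors whose message matches `pattern` are retried; any other error is
# re-thrown at once, and the last matching error is re-thrown at the end.
with_retries <- function(f, ..., retries = 3L, pattern = "503", wait = 0) {
  stopifnot(retries >= 1)
  for (attempt in seq_len(retries)) {
    result <- tryCatch(list(ok = TRUE, value = f(...)),
                       error = function(e) list(ok = FALSE, error = e))
    if (result$ok) return(result$value)
    err <- result$error
    if (!grepl(pattern, conditionMessage(err))) stop(err)
    message("Retryable failure (attempt ", attempt, " of ", retries, "): ",
            conditionMessage(err))
    if (attempt < retries && wait > 0) Sys.sleep(wait)
  }
  stop(err)
}
```

Used the same way for each ADLS call, e.g. with_retries(storage_download, container, file, filename, overwrite = TRUE, wait = 5).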

How to speed up read_html runtime in R?

I have a character vector of 400 URLs called URLs.
I have a loop that had been working for a while, but now it takes far too long. It used to just report a URL as an error, which I would then omit, but now it is getting hung up.
dput(URLs)
c("http://www.chinadaily.com.cn/a/202102/04/WS601b5bd7a31024ad0baa736d.html",
"http://www.xinhuanet.com/english/2021-02/02/c_139716479.htm",
"http://www.china.org.cn/world/Off_the_Wire/2021-02/02/content_77181645.htm",
"http://english.sina.com/world/af/2021-02-02/detail-ikftssap2511288.shtml",
"https://www.beijingnews.net/news/267750643/fox-takes-clubhouse-lead-as-johnson-makes-move-in-saudi-arabia",
"https://www.beijingnews.net/news/267768819/johnson-excited-for-season-after-second-saudi-title",
"https://en.wtcf.org.cn/GlobalNews/2021020320227.html", "https://www.ladepeche.fr/2021/02/08/golf-un-top-4-royal-pour-victor-perez-9360378.php",
"https://sport24.lefigaro.fr/golf/tour-europeen/actualites/victor-perez-dans-les-pas-de-dustin-johnson-en-arabie-saoudite-1032163",
"https://sport24.lefigaro.fr/golf/tour-europeen/actualites/european-tour-victor-perez-a-longtemps-tenu-tete-a-dustin-johnson-en-arabie-saoudite-1032273",
"https://www.france24.com/en/live-news/20210206-johnson-seizes-two-shot-lead-in-saudi-international",
"https://www.france24.com/en/live-news/20210205-fox-takes-clubhouse-lead-as-johnson-makes-move-in-saudi-arabia",
"https://www.france24.com/en/live-news/20210203-big-hitting-dechambeau-happy-to-take-longer-clubs-out-of-rivals-hands",
"https://www.france24.com/en/live-news/20210203-as-bubble-life-drags-on-psychologists-say-cricketers-need-more-support",
"https://www.sports.fr/golf/circuit-europeen/golf-perez-gratin-arabie-saoudite-426859.html",
"https://www.sport.fr/golf/lopen-de-france-est-sauve-758291.shtm",
"https://www.ffgolf.org/Actus/Pro/European-Tour/Saudi-International-ET-Perez-n-est-pas-passe-loin",
"https://www.ffgolf.org/Actus/Pro/European-Tour/Saudi-International-ET-Perez-a-rendez-vous-avec-DJ-dimanche",
"https://www.ffgolf.org/Actus/Pro/European-Tour/Saudi-International-ET-Rozner-au-sec-a-6-Perez-a-7",
"https://www.ffgolf.org/Actus/Pro/European-Tour/Saudi-International-ET-Rozner-et-Perez-demarrent-bien",
"https://www.ffgolf.org/Actus/Pro/LPGA-Tour/Franck-Riboud-On-va-pouvoir-continuer-a-travailler-sereinement",
"https://www.ffgolf.org/Actus/Pro/Feuilletons/Paroles-de-coach/Paroles-de-coach-6-Gwladys-Nocera",
"https://franceracing.fr/other/porsche-et-tag-heuer-scellent-un-partenariat-strategique/",
"https://www.rfi.fr/en/sports/20210206-johnson-seizes-two-shot-lead-in-saudi-international",
"https://www.rfi.fr/en/sports/20210205-fox-takes-clubhouse-lead-as-johnson-makes-move-in-saudi-arabia",
"https://www.rfi.fr/en/sports/20210203-big-hitting-dechambeau-happy-to-take-longer-clubs-out-of-rivals-hands",
"https://www.rfi.fr/en/sports/20210203-as-bubble-life-drags-on-psychologists-say-cricketers-need-more-support",
"https://www.jeudegolf.org/EasyBlog/Agathe-sauzon.html", "http://topactu.net/2021/02/viktor-hovland-vaults-into-farmers-lead-at-wet-torrey-pines/",
"https://www.sueddeutsche.de/sport/golf-kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt-dpa.urn-newsml-dpa-com-20090101-210207-99-337940",
"https://www1.wdr.de/sport/golf-martin-kaymer-saudi-arabien-100.html",
"https://www.augsburger-allgemeine.de/sport/sonstige-sportarten/Kaymer-18-bei-Golf-Turnier-in-Saudi-Arabien-Johnson-siegt-id59059886.html",
"https://www.schwaebische.de/sport/ueberregionaler-sport_artikel,-kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt-_arid,11325827.html",
"https://www.sport.de/news/ne4341625/golf--kaymer-beendet-turnier-in-saudi-arabien-als-18/",
"https://www.mz-web.de/sport/golf/kaymer-18--bei-golf-turnier-in-saudi-arabien---johnson-siegt-38027428",
"https://www.nwzonline.de/sport-meldungen/european-tour-kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_a_50,12,475833623.html",
"https://www.volksstimme.de/golf/news/kaymer-18.-bei-golf-turnier-in-saudi-arabien---johnson-siegt/1612702615000",
"https://www.wn.de/Sport/Weltsport/Golf/4360897-European-Tour-Kaymer-18.-bei-Golf-Turnier-in-Saudi-Arabien-Johnson-siegt",
"https://www.mainpost.de/sport/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt-art-10562664",
"https://www.moz.de/nachrichten/sport/news/european-tour-kaymer-18.-bei-golf-turnier-in-saudi-arabien-johnson-siegt-54931493.html",
"https://www.svz.de/sport/weitere-sportarten/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt-id31187247.html?nojs=true",
"https://www.rhein-zeitung.de/sport/aus-aller-welt/aus-aller-welt-golf_artikel,-kaymer-18-bei-golfturnier-in-saudiarabien-johnson-siegt-_arid,2220135.html",
"https://www.rhein-zeitung.de/sport/aus-aller-welt/aus-aller-welt-golf_artikel,-martin-kaymer-sagt-olympiastart-in-tokio-ab-_arid,2274019.html",
"https://www.allgemeine-zeitung.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://www.echo-online.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://www.mittelhessen.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://www.muensterschezeitung.de/Sport/Sportarten/Golf/4360897-European-Tour-Kaymer-18.-bei-Golf-Turnier-in-Saudi-Arabien-Johnson-siegt",
"https://www.wiesbadener-kurier.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://www.giessener-anzeiger.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://newsroom.porsche.com/de/2021/unternehmen/porsche-sportwagenhersteller-tag-heuer-luxusuhren-schmiede-zusammenarbeit-videostream-23558.html",
"https://www.azonline.de/Sport/Weitere-Sportarten/Golf/4360897-European-Tour-Kaymer-18.-bei-Golf-Turnier-in-Saudi-Arabien-Johnson-siegt",
"https://www.borkenerzeitung.de/welt/sport/Kaymer-18-bei-Golf-Turnier-in-Saudi-Arabien-Johnson-siegt-327224.html",
"https://www.golfpost.de/european-tour-saudi-international-2021-ergebnisse-runde-2-7777396527/",
"https://www.golfpost.de/396354-7777396354/", "https://www.golfpost.de/german-challenge-powerd-by-vcg-golf-challenge-tour-kehrt-nach-deutschland-zurueck-7777396396/",
"https://www.golfpost.de/die-macht-der-moneten-saudi-arabien-auf-dem-weg-zum-big-player-im-golf-7777396387/",
"https://www.kreis-anzeiger.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://www.wormser-zeitung.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://m.azonline.de/Sport/Weitere-Sportarten/Golf/4361712-PGA-Turnier-US-Golfstar-Koepka-triumphiert-bei-Phoenix-Open",
"https://www.mv-online.de/sport/sportmix/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt-409658.html",
"https://www.golf.de/publish/dgv-sport/golf-team-germany/news/60228375/sophia-popov-nach-major-sieg-in-elite-team-germany",
"https://www.golf.de/publish/tournews/nachrichten-tour/60228372/einmal-saudi-einmal-etwas-gaudi",
"https://www.golf.de/publish/tournews/nachrichten-tour/60228387/koepka-comeback-und-eine-wuestenbilanz",
"https://www.ev-online.de/sport/sportmix/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt-409655.html",
"https://www.nach-welt.com/dustin-johnson-setzt-masstabe-aber-jordan-spieth-justin-rose-und-brooks-koepka-kehren-zur-form-zuruck/",
"https://www.nach-welt.com/ryan-fox-wird-sechster-wahrend-dustin-johnson-saudi-international-gewinnt/",
"https://www.usinger-anzeiger.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://www.gaeubote.de/Nachrichten/Golf-Turnier-in-Muenchen-Kaymer-faellt-zurueck-86604.html",
"https://www.gaeubote.de/Nachrichten/Kaymer-nach-Traumrunde-Zweiter-bei-Golf-Turnier-in-Muenchen-86664.html",
"https://www.main-spitze.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://www.lauterbacher-anzeiger.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://www.oberhessische-zeitung.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://de.advfn.com/p.php?pid=nmona&article=84265497", "https://www.buerstaedter-zeitung.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://www.golftime.de/golf-nachrichten/challenge-tour-in-deutschland-neues-profi-turnier/",
"https://www.golftime.de/golf-nachrichten/martin-kaymer-saudi-international-tour-news/",
"https://www.golftime.de/magazin/distanz-usga-ra-elite-spieler-regel-anpassung/",
"https://www.dmm.travel/nc/news/porsche-und-tag-heuer-arbeiten-zusammen/",
"https://www.lampertheimer-zeitung.de/sport/golf/kaymer-18-bei-golf-turnier-in-saudi-arabien-johnson-siegt_23109750",
"https://www.hongkongherald.com/news/267768819/johnson-excited-for-season-after-second-saudi-title",
"https://www.hongkongherald.com/news/267750643/fox-takes-clubhouse-lead-as-johnson-makes-move-in-saudi-arabia",
"http://hongkongcityportal.com/saudi-international-englands-david-horsey-leads-from-scotlands-stephen-gallacher/",
"http://hongkongcityportal.com/bryson-dechambeau-flattered-and-welcomes-proposed-rule-changes/",
"http://hongkongcityportal.com/paul-casey-englishman-defends-saudi-international-u-turn/",
"https://as.com/masdeporte/2021/02/03/golf/1612378989_020231.html",
"https://www.marca.com/golf/2021/02/07/601fd7c122601d860c8b45dc.html",
"https://www.marca.com/golf/2021/05/02/608ece1b22601d9d5d8b45f0.html",
"https://www.marca.com/golf/2021/02/03/601ad5d7268e3ef01e8b4670.html",
"https://www.republicworld.com/sports-news/other-sports/johnson-eases-to-another-victory-at-saudi-international.html",
"https://www.republicworld.com/sports-news/other-sports/dustin-johnson-within-1-shot-of-lead-at-saudi-international.html",
"https://timesofindia.indiatimes.com/sports/golf/top-stories/dustin-johnson-excited-for-season-after-second-saudi-title/articleshow/80737390.cms",
"https://timesofindia.indiatimes.com/sports/golf/top-stories/johnson-eases-to-another-victory-at-saudi-international/articleshow/80736264.cms",
"https://timesofindia.indiatimes.com/sports/golf/top-stories/ryan-fox-takes-surprise-lead-at-saudi-international/articleshow/80711869.cms",
"https://timesofindia.indiatimes.com/sports/golf/top-stories/horsey-goes-on-birdie-blitz-for-saudi-international-lead/articleshow/80691513.cms",
"https://timesofindia.indiatimes.com/sports/golf/top-stories/shubhankar-shoots-69-in-opening-round-at-saudi-international/articleshow/80691501.cms",
"https://timesofindia.indiatimes.com/sports/golf/top-stories/big-hitting-dechambeau-happy-to-take-longer-clubs-out-of-rivals-hands/articleshow/80672723.cms",
"https://timesofindia.indiatimes.com/sports/cricket/news/as-bubble-life-drags-on-psychologists-say-cricketers-need-more-support/articleshow/80662353.cms",
"https://www.abc.es/deportes/abci-sergio-garcia-apunta-ryder-202102070038_noticia.html",
"https://www.abc.es/deportes/abci-golfistas-golpe-gimnasio-202102050031_noticia.html",
"https://www.investing.com/news/general/golf-johnson-holds-on-to-clinch-second-saudi-international-title-2411514"
)
I have tried this:
html_reader <- function(x) tryCatch(xml2::read_html(x), error = function(e) NULL)
parsed_pages <- lapply(URLs, html_reader)  # read each URL exactly once
For some reason I hadn't run into runtime issues until now. The code will not complete even with the error handling.
My current working code is the following:
pp <- vector("list", length(ESPN))
for (k in seq_along(ESPN)) pp[[k]] <- try(xml2::read_html(ESPN[k]), silent = TRUE)
It used to just take a while, but now it never finishes.
I think the issue I am running into is due to open connections: the script would get progressively slower, and I suspect the old connections were the cause. Here is a simple call that closes all of the open connections. I will know whether this is the solution when I run that particular report again, but it seems to have helped so far.
# closeAllConnections() takes no arguments and closes every open user connection,
# so one call is enough (inside `function(i) ...` in a loop it was never actually called)
closeAllConnections()
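Whatever the cause, it can help to see which sites are slow or failing. Below is a minimal sketch of a hypothetical helper, read_pages(), that reads each URL once with a caller-supplied reader, stores NULL on failure, records how long each request took, and returns the failed URLs so they can be dropped or retried later. It cannot interrupt a request that truly hangs; for that the reader itself needs a timeout.

```r
# Hypothetical helper: read each URL once with `reader`, store NULL on
# failure, and record the elapsed time per URL so slow sites stand out.
read_pages <- function(urls, reader, log_slower_than = 5) {
  pages <- vector("list", length(urls))
  secs <- numeric(length(urls))
  for (k in seq_along(urls)) {
    t0 <- Sys.time()
    page <- tryCatch(reader(urls[k]), error = function(e) NULL)
    secs[k] <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
    if (secs[k] > log_slower_than) {
      message(urls[k], " took ", round(secs[k], 1), " s")
    }
    if (!is.null(page)) pages[[k]] <- page
  }
  list(pages = pages,
       seconds = secs,
       failed = urls[vapply(pages, is.null, logical(1))])
}
```

One possible reader, assuming the httr package is installed, caps each request with httr::timeout(): res <- read_pages(URLs, function(u) xml2::read_html(httr::GET(u, httr::timeout(10)))). res$failed then lists the URLs to omit.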

Error in open.connection(con, "rb") : Timeout was reached: Resolving timed out after 10000 milliseconds

I have a list of "Player" objects called players, each with an ID, and I'm trying to fetch a web JSON document with the information for each ID using jsonlite.
The URL stem is: 'https://fantasy.premierleague.com/drf/element-summary/'
I need to access each player's page.
I'm trying to do so as follows:
library(jsonlite)
playerDataURLStem = 'https://fantasy.premierleague.com/drf/element-summary/'
for (player in players) {
# `#` starts a comment in R, so access the id with `$` (or `@` if Player is an S4 class)
player_data_url <- paste0(playerDataURLStem, player$id)
player_data <- fromJSON(player_data_url)
# DO SOME STUFF #
}
When I run it, I get the error "Error in open.connection(con, "rb") : Timeout was reached: Resolving timed out after 10000 milliseconds". The error occurs at a different position in my list of players each time I run the code, and when I check the page that caused it, I can't see anything wrong with it. This leads me to believe that some pages just take longer than 10000 milliseconds to respond, but using
options(timeout = x)
for some x, doesn't seem to make it wait longer for a response.
For a minimum working example, try:
library(jsonlite)
playerDataURLStem = 'https://fantasy.premierleague.com/drf/element-summary/'
ids <- 1:540
for (id in ids) {
player_data_url <- paste0(playerDataURLStem, id)
player_data <- fromJSON(player_data_url)
print(player_data$history$id[1])
}
options(timeout = 4000000) works for me. Try increasing the timeout to a higher value.
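Since the failures described in the question are intermittent and land on a different player each run, another option is to fetch everything once and then retry only the requests that failed. The sketch below is a hypothetical helper, fetch_two_pass(); it takes the fetching function as an argument (for example jsonlite::fromJSON) and pauses before the second pass. It does not change any timeout setting.

```r
# Hypothetical two-pass fetch: try every URL once, then retry only the
# URLs that failed (e.g. with a "Resolving timed out" error) after a pause.
fetch_two_pass <- function(urls, fetch, pause = 5) {
  safe <- function(u) tryCatch(fetch(u), error = function(e) NULL)
  results <- lapply(urls, safe)
  failed <- which(vapply(results, is.null, logical(1)))
  if (length(failed) > 0) {
    Sys.sleep(pause)  # give the server / DNS resolver a moment
    results[failed] <- lapply(urls[failed], safe)
  }
  names(results) <- urls
  results  # entries that failed twice are still NULL
}
```

Usage: the URLs can be built in one vectorised call, urls <- paste0(playerDataURLStem, 1:540), and then player_data <- fetch_two_pass(urls, jsonlite::fromJSON).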

How can I check whether my .RDA file has been loaded into R?

I want to check whether R has already loaded an .RDA file (which contains a model).
I need this for prediction calls, because I don't want to load the file every time I ask for a prediction.
I tried the code below:
if(!is.na(T2I_Vendor_Eval1.rda)){
print("started")
bar<-load(file = "C:\\T2I_Vendor_Eval1.rda")
print("ended ")
}
The result I get is:
Error: object 'T2I_Vendor_Eval1.rda' not found
Instead of doing this
if(!is.na(T2I_Vendor_Eval1.rda)){
print("started")
bar<-load(file = "C:\\T2I_Vendor_Eval1.rda")
print("ended end")
}
I did this
if(!exists("T2I_Vendor_Eval1")){
print("started")
load(file = "C:\\T2I_Vendor_Eval1.rda")
print("ended end")
}
It worked for me.
Thanks for your help, #JonGrub.
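The exists() check relies on the object inside the .rda having the same name as the file. A sketch that avoids that assumption keys a cache on the file path instead: load() returns the names of the objects it restored, so the first one (or a named one) can be kept. model_cache and get_model() are hypothetical names for illustration.

```r
# Hypothetical load-once cache: load() the .rda into a private environment
# the first time a file is requested, then reuse the cached object.
model_cache <- new.env(parent = emptyenv())

get_model <- function(path, object = NULL) {
  if (!exists(path, envir = model_cache, inherits = FALSE)) {
    env <- new.env()
    loaded <- load(path, envir = env)  # names of the objects restored
    name <- if (is.null(object)) loaded[1] else object
    assign(path, get(name, envir = env, inherits = FALSE), envir = model_cache)
  }
  get(path, envir = model_cache, inherits = FALSE)
}
```

Usage: model <- get_model("C:\\T2I_Vendor_Eval1.rda") can be called inside every prediction request; only the first call reads the file.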

Getting a strange bug in jxBrowser

So this is a strange one. My code does a bunch of things that are hard to explain (but if necessary I'll try to explain), but the following works:
var res = data.delete_if (function(key, value) { return key == "a"; })
but the following crashes:
data.delete_if (function(key, value) { return key == "a"; })
So not saving the result of the delete_if call crashes the browser with the following stack trace:
Error: test: B environment should proxy a Ruby hash. (MDArraySolTest): Java::JavaLang::IllegalStateException: Channel stream was closed before response has been received.
java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498) org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:453)
Any ideas of why this happens? Any solutions? I can provide more information if needed.
EDIT1:
After some more testing, I found that the error occurs only if the call to data.delete_if is the last statement in the script. If I add, for example, console.log(""); after the call, everything works fine.
Thanks
