Error in open.connection(con, "rb") : Timeout was reached: Resolving timed out after 10000 milliseconds

So I've got a list of "Player" objects called players, each with an ID, and I'm trying to fetch a JSON page of information for each ID using jsonlite.
The URL stem is: 'https://fantasy.premierleague.com/drf/element-summary/'
I need to access every player's respective page.
I'm trying to do so as follows:
playerDataURLStem = 'https://fantasy.premierleague.com/drf/element-summary/'
for (player in players) {
  player_data_url <- paste(playerDataURLStem, player@id, sep = "")
  player_data <- fromJSON(player_data_url)
  # DO SOME STUFF #
}
When I run it, I get the error Error in open.connection(con, "rb") : Timeout was reached: Resolving timed out after 10000 milliseconds. The error is produced at a different position in my list of players each time I run the code, and when I check the webpage that caused it, I can't see anything erroneous about it. This leads me to believe that sometimes the pages just take longer than 10000 milliseconds to reply, but setting
options(timeout = x)
for some x doesn't seem to make it wait longer for a response.
For a minimum working example, try:
playerDataURLStem = 'https://fantasy.premierleague.com/drf/element-summary/'
ids <- c(1:540)
for (id in ids) {
  player_data_url <- paste(playerDataURLStem, id, sep = "")
  player_data <- fromJSON(player_data_url)
  print(player_data$history$id[1])
}

options(timeout = 4000000) is working for me. Try increasing the timeout value to a higher number.
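If raising the global timeout does not help (the "Resolving timed out" message points at name resolution, which options(timeout) may not control for curl-based downloads), wrapping the request in a small retry helper is another option. A minimal sketch, assuming jsonlite is loaded; fetch_player_json is a made-up name:
library(jsonlite)

# Hypothetical helper: retry fromJSON() a few times before giving up
fetch_player_json <- function(url, retries = 3, wait = 2) {
  for (attempt in seq_len(retries)) {
    result <- tryCatch(fromJSON(url), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    Sys.sleep(wait)  # short pause before the next attempt
  }
  stop("Failed to fetch ", url, " after ", retries, " attempts")
}

# player_data <- fetch_player_json(paste0(playerDataURLStem, id))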

Access R code block in multiple instances

I have a code block that retries the code execution up to 3 times in case of a specific error. In the example below, if an HTTP 503 error occurs while downloading data from an ADLS container, I want the same operation to be retried a maximum of 3 times.
require(AzureStor)
require(stringr)

recheck <- 0
while (recheck < 3) {
  recheck <- recheck + 1
  tryCatch({
    storage_download(container, file, filename, overwrite = TRUE)
    recheck <- 4   # success: leave the retry loop
  }, error = function(e) {
    if (sum(str_detect(e, '503') * 1) > 0) {
      print(e)
      print(paste0('An infra-level failure occurred. Retry sequence number is: ', recheck))
    } else {
      recheck <<- 4  # non-503 error: stop retrying
      print(e)
    }
  })
}
This code works fine for me, but in addition to storage_download in the example above, I have other ADLS operations such as delete_blob, upload_blob, storage_upload, and list_storage_files at multiple places in the code, and I would have to write the code above for each of these functions. I want to turn the code above into a function that can be called around each of these ADLS operations. Any thoughts or suggestions would help me greatly.
The following should do the trick:
with_retries_on_failure = function (expr, retries = 3L) {
  expr = substitute(expr)
  for (try in seq_len(retries)) {
    tryCatch(
      return(eval.parent(expr)),
      error = \(e) {
        # Re-throw anything that is not a 503; otherwise log and retry.
        if (! str_detect(conditionMessage(e), '503')) stop(e)
        message('An infra-level failure occurred. Retry sequence number is: ', try)
      }
    )
  }
}
Used as follows:
with_retries_on_failure(storage_download(container, file, filename, overwrite=TRUE))
Note the return() call, which immediately returns from the surrounding function without the need to update the loop variable. Likewise, in the case of a failure we also don’t have to update the loop variable since we are using a for loop, and we use stop() to break out of the loop for any error that is not a 503 HTTP response.
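The same wrapper can be reused around the other ADLS operations mentioned in the question, for example (the container object and blob/file names below are placeholders):
with_retries_on_failure(upload_blob(container, src = "local/data.csv", dest = "data.csv"))
with_retries_on_failure(delete_blob(container, "data.csv", confirm = FALSE))  # confirm = FALSE skips the prompt
with_retries_on_failure(list_storage_files(container))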

How do I fix the warning message "Closing open result set, cancelling previous query" when querying a PostgreSQL database in R?

Below is a snippet of my code that I use in R to extract IDs from a PostgreSQL database. When I run the function I get the following warning message from R:
In result_create(conn@ptr, statement) :
  Closing open result set, cancelling previous query
How do I avoid this warning message without making use of options(warn = -1) at the beginning of my code, which would merely suppress the warning instead of fixing the cause?
library(DBI)  # needed for dbConnect()/dbSendQuery()/dbFetch()

con <- dbConnect(RPostgres::Postgres(),
                 user = "postgres",
                 dbname = "DataBaseName",
                 password = "123456",
                 port = 5431)

get_id <- function(connection, table){
  query <- toString(paste("SELECT id FROM ", table, sep = ""))
  data_extract_query <- dbSendQuery(connection, query)
  data_extract <- dbFetch(data_extract_query)
  return(data_extract)
}
get_id(con, "users")
I found a method for solving the problem.
There is a thread on GitHub for RSQLite at https://github.com/r-dbi/RSQLite/issues/143 in which they explicitly set n = -1 in the dbFetch() function.
This solved my problem, and the warning message no longer appeared after editing the code as follows:
data_extract <- dbFetch(data_extract_query, n = -1)
The n argument is the maximum number of rows the fetch should return; setting it to -1 retrieves all rows. By default it is already n = -1, but for some reason, with this build of R (3.6.3), the warning is still shown unless n is passed explicitly.
Calling ?dbFetch in R you can see more information on this. I have included a snippet from the R-help page:
Usage

dbFetch(res, n = -1, ...)
fetch(res, n = -1, ...)

Arguments

res   An object inheriting from DBIResult, created by dbSendQuery().
n     Maximum number of records to retrieve per fetch. Use n = -1 or
      n = Inf to retrieve all pending records. Some implementations may
      recognize other special values.
...   Other arguments passed on to methods.
This issue comes up with other database implementations if the results are not cleared before submitting a new one. From the docs of DBI::dbSendQuery:

Usage

dbSendQuery(conn, statement, ...)

Value

dbSendQuery() returns an S4 object that inherits from DBIResult. The result set can be used with dbFetch() to extract records. Once you have finished using a result, make sure to clear it with dbClearResult(). An error is raised when issuing a query over a closed or invalid connection, or if the query is not a non-NA string. An error is also raised if the syntax of the query is invalid and all query parameters are given (by passing the params argument) or the immediate argument is set to TRUE.
To get rid of the warning, the get_id() function must be modified as follows:
get_id <- function(connection, table){
  query <- toString(paste("SELECT id FROM ", table, sep = ""))
  data_extract_query <- dbSendQuery(connection, query)
  data_extract <- dbFetch(data_extract_query)
  # Here we clear whatever remains on the server
  dbClearResult(data_extract_query)
  return(data_extract)
}
See the Examples section of the help page for more.
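An alternative, when the full result fits comfortably in memory, is dbGetQuery(), which is a convenience wrapper that sends the query, fetches all rows, and clears the result set in a single call, so no open result set is left behind:
get_id <- function(connection, table){
  # dbGetQuery() combines dbSendQuery(), dbFetch() and dbClearResult()
  dbGetQuery(connection, paste0("SELECT id FROM ", table))
}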

Failed to authenticate Google Translate in R

So I tried to use the gl_translate function on 500,000 characters in RStudio, which means I have to authenticate with the Google Translate API. The problem is that I set this up about two months ago with my old Google account, and now I'm using a new one.
When I tried to authenticate a new client_id with my new Google account, I got an error message that my API hadn't been enabled yet, even though I had enabled it. I restarted RStudio and now I get this error message:
2020-01-22 19:01:24 -- Translating html: 147 characters -
2020-01-22 19:01:24> Request Status Code: 403
Error: API returned: Request had insufficient authentication scopes.
It is very frustrating because I then tried to enable the old Google account, and it required me to enter my credit card number, which I did, and now they have asked me to wait several days.
Can anyone figure out what the problem is?
Here is my R code for authentication:
install.packages("googleAnalyticsR", dependencies = TRUE)
library(googleAnalyticsR)
install.packages("googleLanguageR")
library(googleLanguageR)
install.packages("dplyr")
library(dplyr)
library(tidyverse)
install.packages("googleAuthR")
library(googleAuthR)
client_id <- "107033903887214478396"
private_key <- "-----BEGIN PRIVATE KEY-----\nMIIEvAIBADANBgkqhkiG9w0BAQEFAASCBKYwggSiAgEAAoIBAQChPmvib1v9/CFA\nX7fG8b8iXyS370ivkufMmX30C6/rUNOttA+zMhamS/EO0uaYtPw44B4QdNzRsSGq\nm+0fQ5Sp1SHJaVPkoImuUXZdMlLO73pvY48nMmEFg7deoOZI+CNWZYgIvPY8whMo\nk4vKE2iyuG+pl9MT7dT6dwWNmXDWr8kgxAfryfVEUIeqaf+57Z9g3FfVPLARz4iH\nCaTu55hhbmo/XknUx0hPsrwBMPaNLGl2+o5MU1ZuIkl47EJvdL8CdUuEmb9qJDtv\nUyKqANlwFa7cVW8ij2tjFpjJ7bigRVJsI8odbsEbwmx1b5SLckDOQ7t4l8zERtmo\nUv5fxyNNAgMBAAECggEAApeLyWyL2IXcjPnc7OxG68kGwJQuoW/lnQLcpPcpIUm/\n1Vt/IxzLg2nWGqxmO48xPMLRiOcwA4jq5yCxi56c/avo6qFwUU0JWY2CrxXXge8U\nk0TQ8MrdB2cqI/HHMeYXP1TLfoR3GtvtzemtRhbQyIqxdNL1eC0LDump47BTQYg0\nVPuCxU3zXVIj+Qt0FZa+Pa/nAOGHf5b4ye56L7vxL2BCeRncuHdDcE6Ilkpz79Gv\nkXP1K5j22uEVCmobe1qRlq3BLx2Qimj4h8MI8CKiNS40tGR/oewJ5uMgmeCePYKv\nqSFOwCDvRkw9V2KdGu40WQFEq21mczlv9gMWhp2/EQKBgQDRmBZZM7ugIpo64wx6\nDFYhZo05LmMiwymIfWs2CibzKDeXPuy3OSytvTPVFCkG+RlcYthxAIAn1Z/qJ4UI\n+8c8Zwfg+toYtEa2gTYM2185vmnqQwqmAsaK+4xKZzgfqxie/CBuPzUOZO41q6P8\ni7A2KqXHcDb4SMqnkdGGLk/7+QKBgQDE8dBesgx4DsHFYg1sJyIqKO4d2pnLPkDS\nAzx5xvQuUcVCNTbugOC7e0vGxWmQ/Eqal5b3nitH590m8WHnU9UiE4HciVLe+JDe\nDe5CWzBslnncBjpgiDudeeEubhO7MHv/qZyZXMh73H2NBdO8j0uiGTNbBFoOSTYq\nsFACiCZu9QKBgE2KjMoXn5SQ+KpMkbMdmUfmHt1G0hpsRZNfgyiM/Pf8qwRjnUPz\n/RmR4/ky6jLQOZe6YgT8gG08VVtVn5xBOeaY34tWgxWcrIScrRh4mHROg/TNNMVS\nRY3pnm9wXI0qyYMYGA9xhvl6Ub69b3/hViHUCV0NoOieVYtFIVUZETJRAoGAW/Y2\nQCGPpPfvD0Xr0parY1hdZ99NdRQKnIYaVRrLpl1UaMgEcHYJekHmblh8JNFJ3Mnw\nGovm1dq075xDBQumOBU3zEzrP2Z97tI+cQm3oNza5hyaYbz7aVsiBNYtrHjFTepb\nT1l93ChnD9SqvB+FR5nQ2y07B/SzsFdH5QbCO4kCgYBEdRFzRLvjdnUcxoXRcUpf\nfVMZ6fnRYeV1+apRSiaEDHCO5dyQP8vnW4ewISnAKdjKv/AtaMdzJ5L3asGRWDKU\n1kP/KDBlJkOsOvTkmJ4TxbIhgcSI62/wqDBi5Xqw1ljR2mh8njzRwqDRKs12EtQ0\n9VaUDm7LCNTAskn2SR/o4Q==\n-----END PRIVATE KEY-----\n"
options(googleAuthR.client_id = client_id)
options(googleAuthR.client_secret = private_key)
devtools::reload(pkg = devtools::inst("googleAnalyticsR"))
ga_auth()
In case you need to see what my translation code looks like:
translate <- function(tibble) {
  tibble <- tibble
  count <- data.frame(nchar = 0, cumsum = 0) # create count file to stay within API limits
  for (i in 1:nrow(tibble)) {
    des <- pull(tibble[i,2]) # extract description as single character string
    if (count$cumsum[nrow(count)] >= 80000) { # API limit check
      print("nearing 100000 character per 100 seconds limit, pausing for 100 seconds")
      Sys.sleep(100)
      count <- count[1,] # reset count file
    }
    if (grepl("^\\s*$", des) == TRUE) { # if description is only whitespace then skip
      trns <- tibble(translatedText = "", detectedSourceLanguage = "", text = "")
    } else { # else request translation from API
      trns <- gl_translate(des, target='en', format='html') # request in html format to anticipate html descriptions
    }
    tibble[i,3:4] <- trns[,1:2] # add to tibble
    nchar = nchar(pull(tibble[i,2])) # count number of characters
    req <- data.frame(nchar = nchar, cumsum = nchar + sum(count$nchar))
    count <- rbind(count, req) # add to count file
    if (nchar > 20000) { # additional API request limit safeguard for large descriptions
      print("large description (>20,000), pausing to manage API limit")
      Sys.sleep(100)
      count <- count[1,] # reset count file
    }
  }
  return(tibble)
}
I figured it out after 24 hours.
Apparently it is really easy; I just followed the steps from this link.
Yesterday I made a mistake because the JSON file I downloaded was the one for a client ID, while I actually needed the JSON file for a service account.
Then I installed the googleLanguageR package with this code:
remotes::install_github("ropensci/googleLanguageR")
library(googleLanguageR)
and then just passed the file location of my downloaded Google project JSON file to gl_auth() like this:
gl_auth("G:/My Drive/0. Thesis/R-Script/ZakiServiceAccou***************kjadjib****.json")
and now I'm happy :)
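As a side note, googleLanguageR also documents a GL_AUTH environment variable that can point at the service-account JSON file so that the package authenticates automatically when it is loaded. A small sketch with a placeholder path:
# Placeholder path; set this before loading the package (or put GL_AUTH in .Renviron)
Sys.setenv(GL_AUTH = "path/to/service-account-key.json")
library(googleLanguageR)  # picks up GL_AUTH on load and authenticates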

Creating a persistent connection to the Twitter stream API using R

I am presently using the streamR package in R to stream tweets from the Twitter filter stream. I have a handshaken ROAuth object that I use for this. My code looks like:
# load the Twitter auth object
load("twitter_oAuth3.RData")
load("keywords3.RData")

streamTweet = function(){
  require(streamR)
  require(ROAuth)
  stack = filterStream(file.name="", track=keywords, timeout=500, oauth=twitter_oAuth)
  return(stack)
}
I wanted to create a real time application, which involves dumping these tweets into an activeMQ topic. My code for that is:
require(Rjms)

# Set logger properties
url = "tcp://localhost:61616"
type = "T"
name = "TwitterStream"

# initialize logger
topicWriter = initialize.logger(url, type, name)

topicWrite = function(input){
  # print("writing to topic")
  to.logger(topicWriter, input, asString=TRUE, propertyName='StreamerID', propertyValue='1')
  return()
}

logToTopic = function(streamedStack){
  # print("inside stack-writer")
  stacklength = length(streamedStack)
  print(c("Length: ", stacklength))
  for(i in 1:stacklength){
    print(c("calling for: ", i))
    topicWrite(streamedStack[i])
  }
  return()
}
Now my problem is the timeout that filterStream() requires. I looked under the hood and found this call that the function makes:
url <- "https://stream.twitter.com/1.1/statuses/filter.json"
output <- tryCatch(oauth$OAuthRequest(URL = url, params = params,
    method = "POST", customHeader = NULL,
    writefunction = topicWrite, cainfo = system.file("CurlSSL",
      "cacert.pem", package = "RCurl")), error = function(e) e)
I tried removing the timeout component, but it doesn't seem to work. Is there a way I can maintain a stream forever (until I kill it) that dumps each tweet into a topic as it arrives?
P.S. I know of a Java implementation that calls the twitter4j API. I, however, have no idea how to do it in R.
The documentation for the streamR package mentions that the default value of the timeout option in filterStream() is 0, which keeps the connection open permanently.
I quote:
"numeric, maximum length of time (in seconds) of connection to stream. The
connection will be automatically closed after this period. For example, setting
timeout to 10800 will keep the connection open for 3 hours. The default is 0,
which will keep the connection open permanently."
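Based on that, a minimal sketch of the question's streaming call with the timeout removed (relying on the default of 0); keywords and twitter_oAuth are the objects loaded from the question's .RData files, and streamForever is a made-up name:
require(streamR)
require(ROAuth)

streamForever = function(){
  # timeout = 0 keeps the connection open until the process is interrupted
  filterStream(file.name = "", track = keywords, timeout = 0, oauth = twitter_oAuth)
}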
Hope this helps.

getURL (from the RCurl package) doesn't work in a loop

I have a list of URLs named URLlist, and I loop over it to get the source code for each of those URLs:
for (k in 1:length(URLlist)){
  temp = getURL(URLlist[k])
}
The problem is that for some random URL, the code gets stuck and I get the error message:
Error in function (type, msg, asError = TRUE) :
transfer closed with outstanding read data remaining
But when I call the getURL function outside the loop, with the URL that had a problem, it works perfectly.
Any help please? Thank you very much.
Hard to tell for sure without more information, but it could just be the requests getting sent too quickly, in which case just pausing between requests could help:
for (k in 1:length(URLlist)) {
  temp = getURL(URLlist[k])
  Sys.sleep(0.2)
}
I'm assuming that your actual code does something with 'temp' before writing over it in every iteration of the loop, and whatever it does is very fast.
You could also try building in some error handling so that one problem doesn't kill the whole thing. Here's a crude example that tries twice on each URL before giving up:
for (url in URLlist) {
  temp = try(getURL(url))
  if (class(temp) == "try-error") {
    temp = try(getURL(url))
    if (class(temp) == "try-error")
      temp = paste("error accessing", url)
  }
  Sys.sleep(0.2)
}
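The same two-attempt idea can also be wrapped into a small reusable helper; a sketch where get_with_retry is a made-up name:
library(RCurl)

get_with_retry <- function(url, tries = 2, wait = 0.2) {
  for (i in seq_len(tries)) {
    temp <- try(getURL(url), silent = TRUE)
    if (!inherits(temp, "try-error")) return(temp)
    Sys.sleep(wait)  # pause before retrying
  }
  paste("error accessing", url)
}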
