I'm trying to understand internet connections in R a little better, and while there is some information scattered around, I find it hard to follow. Hopefully this question can bring that information together. My problem is the following: I'm working from my office computer, where a firewall is in place. I have tested how to scrape webpages in R and found the following.
This code works now because I have removed the curl package from my library.
library(rvest)
# fetch the IMDb page and extract the rating
lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")
lego_movie %>%
  html_node("strong span") %>%  # the rating element
  html_text() %>%
  as.numeric()
[1] 7.8
But I'm trying to follow this example, https://datascienceplus.com/scraping-javascript-rendered-web-content-using-r/, which needs the V8 package (which in turn needs curl). When I install the curl package, the following happens:
lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")
Error in open.connection(x, "rb") : Empty reply from server
So there is a proxy problem, but how do I work around it with curl installed, given that many packages use it? If I remove curl again, I can continue with rvest-only scraping. How can curl break my normal rvest scraping even when rvest is not using it (httr seems to use curl, but rvest still works)? I find this very confusing.
curl version 3.1
rvest version 0.3.2
R version 3.4.3
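One way around this is to tell curl-based packages about the office proxy explicitly. A minimal sketch, assuming the proxy is reachable at a known host and port (the address below is a placeholder to replace with your organisation's actual values):
library(httr)
# route all httr/curl traffic through the office proxy (placeholder address)
set_config(use_proxy(url = "proxy.example.com", port = 8080))
# alternatively, set the environment variables that libcurl reads
Sys.setenv(http_proxy  = "http://proxy.example.com:8080",
           https_proxy = "http://proxy.example.com:8080")
library(rvest)
lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")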
I cannot find out how to replicate the internal CRAN check that tests whether the URLs in a package are healthy.
Importantly, this check is run only on the Debian builder (yes, Debian, not Windows). Because it is not run on the Windows machine, we could NOT use the https://win-builder.r-project.org/upload.aspx web service to replicate it.
Here is an example error message from the CRAN server, produced because a website was moved. Such a message produces a NOTE, so the package is not processed automatically.
Found the following (possibly) invalid URLs:
URL: http://blog.obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/
From: inst/doc/tinyverse.html
Status: Error
Message: Could not resolve host: blog.obeautifulcode.com
Edit:
There is a useful source on the CRAN policy in this area: https://cran.r-project.org/web/packages/URL_checks.html
(Promoting comment to answer as suggested...)
The test code has been pulled out of R itself and made into a package you can install. Other than that, it is of course part of any recent enough R or R-devel build.
FWIW I also wrapped this into a convenience script I call all the time on my systems.
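The answer does not name the package; assuming it refers to urlchecker (https://github.com/r-lib/urlchecker), which contains the URL-checking code extracted from R itself, a minimal run might look like this:
# install.packages("urlchecker")
library(urlchecker)
# check every URL found in the package sources in the current directory,
# the same way the CRAN incoming checks do
url_check(".")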
I am trying to make a cURL API call from R and I am unable to retrieve data. More specifically, I am unable to figure out how to translate a multi-line curl call into an R command.
I am trying to get data from Twitch; the Twitch Developers API page offers the following curl code, though I am unsure about the syntax of the call.
curl -H 'Accept: application/vnd.twitchtv.v5+json' \
-H 'Client-ID: uo6dggojyb8d6soh92zknwmi5ej1q2' \
-X GET 'https://api.twitch.tv/kraken/games/top'
I have attempted variations of:
library(curl)
library(httr)
library(jsonlite)

df <- GET('https://api.twitch.tv/kraken/games/top',
          add_headers('Accept: application/vnd.twitchtv.v5+json',
                      'Client-ID: uo6dggojyb8d6soh92zknwmi5ej1q2'))
fromJSON(df)

df <- curl_download('https://api.twitch.tv/kraken/games/top',
                    destfile = 'C:\\....\\curldta.csv')
fromJSON(df)
Thanks for any help in advance.
I wrote a package that wraps the Twitch API for R (you can install packages from GitHub with the devtools package). The data frame you're trying to get can be obtained with:
library(rTwitchAPI)
twitch_auth("YOUR_CLIENT_ID")
df = get_top_games()$data
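If you prefer not to depend on a wrapper, a direct httr translation of the curl command is sketched below: each -H flag becomes a name = value pair in add_headers(), and fromJSON() must be given the response text rather than the response object.
library(httr)
library(jsonlite)

resp <- GET(
  "https://api.twitch.tv/kraken/games/top",
  add_headers(
    Accept      = "application/vnd.twitchtv.v5+json",
    `Client-ID` = "uo6dggojyb8d6soh92zknwmi5ej1q2"
  )
)
stop_for_status(resp)  # fail loudly on HTTP errors

# parse the JSON payload from the response text
df <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))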
I am trying to scrape some text from Facebook using the 'Rfacebook' package.
Even after installing Rfacebook and tm and loading the relevant libraries (Rfacebook, httr, tm, and httpuv), fbOAuth(appid, appsecret) is failing to get the required access token. Please see the code and ensuing error below:
install.packages("Rfacebook")
install.packages("tm")
library(devtools)
library(Rfacebook)
library(httr)
library(httpuv)
library(tm)
appid <- 123
appsecret <- 'mysecret123'
fboauth <- fbOAuth(appid, appsecret, extended_permissions = T)
Which returns
Copy and paste into Site URL on Facebook App Settings:
http://localhost:1410/
When done, press any key to continue...
Upon pasting the redirect URL into the Site URL field in the Facebook app settings, a browser opened. And although the browser displayed "Authentication complete. Please close this page and return to R.", an error message was returned in the RStudio console (attached below). I tried this with both Safari and Chrome as the default browser; no change.
Authentication complete.
Error in init_oauth2.0(self$endpoint, self$app, scope = self$params$scope, :
Bad Request (HTTP 400). Failed to get an access token.
Any help in resolving this is truly appreciated!
Best,
S
p.s. Using RStudio with R v3.3.2 on a Mac (macOS Sierra).
Try the following:
a) Remove the httr package.
b) Remove the RCurl package.
c) Remove Rfacebook.
d) Reinstall httr, then RCurl, and then Rfacebook.
e) Get the latest version of RStudio.
f) After you authenticate and you see the message "Copy and paste into Site URL on Facebook App Settings: http://localhost:1410/ When done, press any key to continue...", you may want to do save(user_fbOauth), where user_fbOauth holds your credentials.
g) Then pass them to any other request (a code sketch follows below).
I have tried this and don't see any issues.
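A sketch of steps (d), (f), and (g) in code; the credentials and file name are placeholders, and getUsers() is just one example of a follow-up request:
# d) reinstall the packages in this order
install.packages(c("httr", "RCurl", "Rfacebook"))
library(Rfacebook)

# f) authenticate (opens the browser step) and save the credentials
user_fbOauth <- fbOAuth(app_id = "123", app_secret = "mysecret123",
                        extended_permissions = TRUE)
save(user_fbOauth, file = "fb_oauth.RData")

# g) in a later session, reload the token and pass it to other requests
load("fb_oauth.RData")
me <- getUsers("me", token = user_fbOauth)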
I would like to build a simple web server using Rook; however, I am getting a strange error when trying it in RStudio.
The code
library(Rook)
s <- Rhttpd$new()  # create the server object
s$start()          # start the server
print(s)
returns the rather unhelpful error:
Error in listenPort > 0 :
  comparison (6) is possible only for atomic and list types
When trying the same code in a plain R console, everything works, so I would like to understand why that happens and how I can fix it.
RStudio is version 0.99.484 and R is 3.2.2.
I've experienced the same thing.
TLDR: This pull request solves the problem: https://github.com/jeffreyhorner/Rook/pull/31
RStudio is treated differently, and the Rook port is taken to be the tools:::httpdPort value. The problem is that in the current Rook master, tools:::httpdPort is assigned directly, but it is a function rather than a number, which is why it needs to be evaluated first; comparing the function itself is exactly what makes listenPort > 0 fail.
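The error can be reproduced directly; a minimal sketch, assuming an R version where tools:::httpdPort is a function, as described above:
listenPort <- tools:::httpdPort  # a function, not a port number
listenPort > 0
Error in listenPort > 0 :
  comparison (6) is possible only for atomic and list types
tools:::httpdPort() > 0  # calling it first yields the actual port, which compares fine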
If you want it solved right now, without waiting for the merge into master, install devtools and install the package from my fork on GitHub:
install.packages("devtools")
library(devtools)
install_github("filipstachura/Rook")
Related questions:
RCurl errors when fetching ssl endpoint
R: Specify SSL version in Rcurl getURL statement
I am looking at the following:
url <- "https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0&regio=39"
Then,
getURL(url)
gives the following error:
error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list
Adding the following curl option, suggested in one of the related questions,
getURL(url, ssl.verifypeer = TRUE, sslversion = 3L)
returns
Unknown SSL protocol error in connection to www.veilingbiljet.nl:443
Any help would be greatly appreciated.
The RCurl package is unmaintained and defunct. Use the curl package or httr instead:
library(curl)
con <- curl("https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0®io=39")
readLines(con)
Or:
library(httr)
req <- GET("https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0®io=39")
stop_for_status(req)
content(req)
error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list
This is a known bug in old OpenSSL versions and was fixed about four years ago, so upgrading your local OpenSSL should help.
If this is not possible, you might disable the use of the affected ciphers by setting the ssl.cipher.list option of RCurl to HIGH:!ECDH.
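A sketch of that workaround, assuming you must stay on RCurl (the option name comes straight from the suggestion above):
library(RCurl)
url <- "https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0&regio=39"
# exclude the ECDH ciphers that trigger the ecpointformat bug
getURL(url, ssl.cipher.list = "HIGH:!ECDH")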
Unknown SSL protocol error in connection to www.veilingbiljet.nl:443
The server does not support SSL 3.0, so trying to enforce this version will fail.