I am trying to read a webpage and am getting an error message saying the certificate cannot be authenticated. My code is:
qurl<-"https://www.chemspider.com/Chemical-Structure.1.html"
h <- try(read_html(qurl), silent = TRUE)
I can access the webpage directly in my browser with no problems, and I have tried
library(httr)
set_config(config(ssl_verifypeer = 0L))
(also ssl.verifypeer, which I read was the older option name), but I am still getting the error message:
Peer certificate cannot be authenticated with given CA certificates
I have also tried re-installing curl and even R, but without success. I am using R 3.4.0 (3.3.3 before re-installing). Any ideas how I can read this webpage?
I had the same problem with Amazon Linux on an EC2 instance.
Having tried every suggestion I could find, I eventually resorted to:
library(RCurl)
webpage <- getURL("https://sourceforge.net/", .opts=list(followlocation=TRUE, ssl.verifyhost=FALSE, ssl.verifypeer=FALSE))
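A sketch applying the same insecure options to the URL from the question, then parsing the returned string with rvest (read_html() also accepts a string of HTML; note that disabling verification is a security trade-off):
library(RCurl)
library(rvest)
qurl <- "https://www.chemspider.com/Chemical-Structure.1.html"
# fetch with certificate verification disabled (insecure!)
webpage <- getURL(qurl, .opts = list(followlocation = TRUE,
                                     ssl.verifyhost = FALSE,
                                     ssl.verifypeer = FALSE))
h <- read_html(webpage)  # read_html() parses the HTML string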
R version 3.3.3 produced the following:
install.packages("rvest")
library(rvest)
qurl<-"https://www.chemspider.com/Chemical-Structure.1.html"
h <- try(read_html(qurl), silent = TRUE)
h
{xml_document}
<html xmlns="http://www.w3.org/1999/xhtml">
[1] <head id="ctl00_ctl00_Head1">\n<meta http-equiv="Content-Type"
content="text/html; charset=UTF-8">\n<link rel="shortcut icon" href=" ...
[2] <body id="ctl00_ctl00_chemspider_body" class="rsc-ui">\r\n <form
name="aspnetForm" method="post" action="/Chemical-Structure.1.ht ...
Related
I'm just starting a web development course and need to install packages in Atom, but I get an error every time I try to install any package. I have followed all the troubleshooting tips in the course materials and read all the questions that might be related to my problem, but nothing has fixed it. Any help is appreciated. This is the error I get:
Request for package information failed: <html>
<head>
<title> Server Error </title>
</head>
<body>
<font color="#aa0000">
<h2>Server Error.</h2>
</font>
There was an unexpected error in the request processing.
</body>
I'm receiving an SSL certificate error when I try to use the function xml2::read_html() on a Brazilian government webpage.
When I try to access
page = xml2::read_html("https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos")
I receive the following error:
Error in open.connection(x, "rb") :
SSL certificate problem: unable to get local issuer certificate
I found another SO question with 3 possible solutions:
httr::set_config(config(ssl_verifypeer = 0L)) #1
httr::set_config(config(ssl_verifypeer = FALSE)) #2
Sys.setenv(LIBCURL_BUILD="winssl") #3
None of them solved my problem. I then tried running the code in a Kaggle notebook and received the same error message, so the problem isn't specific to my PC.
From https://curl.haxx.se/docs/sslcerts.html:
Certificate Verification
...
Tell libcurl to not verify the peer. With libcurl you disable this with curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, FALSE);
With the curl command line tool, you disable this with -k/--insecure.
So, from the command line (or terminal) the following does work:
curl -k https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos
the following, as a workaround, also works (using curl library):
url <- "https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos"
h <- curl::new_handle()  # create a handle to attach the insecure options to
curl::handle_setopt(h, ssl_verifyhost = 0, ssl_verifypeer = 0)
curl::curl_download(url = url, destfile = "file_test.html", handle = h)
I couldn't find a way to set the insecure option within the xml2 package options, which would be the right answer to this question.
Oddly, though, the following also "works", but only to download the HTML file; parsing it directly, no luck.
curl::handle_setopt(h, ssl_verifyhost = 0, ssl_verifypeer = 0)
xml2::download_html(url, handle = h)
xml2::read_xml(url, handle = h)   # doesn't work
xml2::read_html(url, handle = h)  # doesn't work
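A workaround that does let you parse directly (same insecure handle, fetching into memory with the curl package):
res <- curl::curl_fetch_memory(url, handle = h)
# rawToChar() converts the response body to a string that read_html() accepts
page <- xml2::read_html(rawToChar(res$content))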
Edit: actually, following the info here, option 181
#> 181 ssl_verifypeer CURLOPT_SSL_VERIFYPEER integer
should be what you've tried, and it didn't work. It might be a bug, since it is the same option that works from the command line.
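As a quick check of what the curl package exposes, you can search its option table:
opts <- curl::curl_options()
opts[grepl("verifypeer", names(opts))]  # should list ssl_verifypeer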
I am trying to scrape some text from Facebook using the 'Rfacebook' package.
Even after installing Rfacebook and tm and loading the relevant libraries (Rfacebook, httr, tm, and httpuv), fbOAuth(appid, appsecret) is failing to get the required access token. Please see the code and the ensuing error below:
install.packages("Rfacebook")
install.packages("tm")
library(devtools)
library(Rfacebook)
library(httr)
library(httpuv)
library(tm)
appid <- 123
appsecret <- 'mysecret123'
fboauth <- fbOAuth(appid, appsecret, extended_permissions = T)
Which returns
Copy and paste into Site URL on Facebook App Settings:
http://localhost:1410/
When done, press any key to continue...
Upon pasting the redirect URL as instructed and pressing a key, a browser opened.
And, although the browser displayed "Authentication complete. Please close this page and return to R.", an error message was returned in the RStudio console (attached below). I tried this with both Safari and Chrome as the default browser, with no change.
Authentication complete.
Error in init_oauth2.0(self$endpoint, self$app, scope = self$params$scope, :
Bad Request (HTTP 400). Failed to get an access token.
Any help in resolving this is truly appreciated!
Best,
S
p.s. Using RStudio with R 3.3.2 on a Mac (macOS Sierra).
Try the following:
a) remove the httr package
b) remove the RCurl package
c) remove Rfacebook
d) reinstall httr, then RCurl, and then Rfacebook
e) get the latest version of RStudio
f) after you authenticate, when you get the message "Copy and paste into Site URL on Facebook App Settings: http://localhost:1410/ When done, press any key to continue...", you may want to do save(user_fbOauth), where user_fbOauth is your credentials object
g) then pass it to any other request
I have tried this and don't see any issues.
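A sketch of steps f) and g), assuming fb_oauth is the name you give the credentials object:
fb_oauth <- fbOAuth(appid, appsecret, extended_permissions = TRUE)
save(fb_oauth, file = "fb_oauth.RData")  # keep the token for later sessions
load("fb_oauth.RData")                   # restore it in a new session
me <- getUsers("me", token = fb_oauth)   # pass it to any other request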
Trying to download information from a specific web page, and although it opens fine in any browser, RCurl says it does not exist:
url.exists("http://www.transfermarkt.es/liga-mx-apertura/startseite/wettbewerb/MEXA")
[1] FALSE
Same results when using ".de".
url.exists("http://www.transfermarkt.de/liga-mx-clausura/startseite/wettbewerb/MEX1")
[1] FALSE
Other functions also return errors (htmlParse and htmlTreeParse are from the XML package):
> htmlParse("http://www.transfermarkt.es/liga-mx-apertura/startseite/wettbewerb/MEXA")
Error: failed to load HTTP resource
> htmlTreeParse("http://www.transfermarkt.es/liga-mx-apertura/startseite/wettbewerb/MEXA")
Error: failed to load HTTP resource
> htmlParse(getURL("http://www.transfermarkt.es/liga-mx-apertura/startseite/wettbewerb/MEXA"))
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<center><h1>403 Forbidden</h1></center>
<hr>
<center>nginx</center>
</body>
</html>
Why is this happening?
How can I successfully use htmlParse with this webpage?
EDIT:
I'm getting familiar with the httr package, and this works just fine:
content(GET("http://www.transfermarkt.es/liga-mx-apertura/startseite/wettbewerb/MEXA"))
That webserver appears to return a 403 Forbidden error when your HTTP request does not include a user-agent string. RCurl by default does not pass a user-agent. You can set one with the useragent= parameter.
myurl<-"http://www.transfermarkt.es/liga-mx-apertura/startseite/wettbewerb/MEXA"
url.exists(myurl, useragent="curl/7.39.0 Rcurl/1.95.4.5")
# [1] TRUE
htmlTreeParse(getURL(myurl, useragent="curl/7.39.0 Rcurl/1.95.4.5"))
The httr package is a bit nicer than RCurl for making HTTP requests, in my opinion (and it sets a user-agent string by default). Here's the corresponding code:
library(httr)
GET(myurl)
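If you also want to parse the response, a sketch with httr and XML (user_agent() sets the header explicitly; the exact string is arbitrary, it just has to be present):
library(httr)
library(XML)
resp <- GET(myurl, user_agent("Mozilla/5.0"))
# content(..., as = "text") returns the raw HTML, which htmlParse() can read as text
doc <- htmlParse(content(resp, as = "text", encoding = "UTF-8"), asText = TRUE)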
I found a great project called r-google-analytics (http://code.google.com/p/r-google-analytics/) that I'd like to use so I can manipulate GA data in R.
I run this portion of the code:
library(RCurl)
library(XML)
# 1. Create a new Google Analytics API object
ga <- RGoogleAnalytics()
# 2. Authorize the object with your Google Analytics Account Credentials
ga$SetCredentials("INSERT_USER_NAME", "INSERT_PASSWORD")
And I get this error message:
Error in postForm("https://www.google.com/accounts/ClientLogin", Email = username, :
SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Any ideas as to what could be causing the error?
Thanks!
Kim
See http://www.omegahat.org/RCurl/FAQ.html for a thorough explanation, and in particular (depending on your preference for security):
If you don't have a certificate from an appropriate signing agent, you can suppress verifying the certificate with the ssl.verifypeer option:
x = getURLContent("https://www.google.com", ssl.verifypeer = FALSE)
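If you would rather set this once instead of per call, RCurl also honours a global RCurlOptions option (the same mechanism used in the answer below), which should reach the postForm() call inside SetCredentials(). A sketch, insecure by design:
library(RCurl)
# disable peer verification for all RCurl-based calls in this session (insecure!)
options(RCurlOptions = list(ssl.verifypeer = FALSE))
ga <- RGoogleAnalytics()
ga$SetCredentials("INSERT_USER_NAME", "INSERT_PASSWORD")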
I had a similar problem and this helped me out:
Use the alternative internet2.dll by starting R with the flag --internet2 (see How do I install R for Windows?) or calling setInternet2(TRUE). These cause R to use the Internet Explorer internals, which may already be configured for use with proxies. Note that this does not work with proxies that need authentication.
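For older Windows builds of R (setInternet2() was removed around R 3.3.0, so this only applies there), the call is simply:
setInternet2(TRUE)  # route R's internet access through the IE internals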
While I was researching the issue, I also discovered that other users reported this issue when they had non-alphanumeric characters (i.e. not A-Za-z0-9) in their password.
As good practice, including both your R sessionInfo() output and your OS details (uname -a on Unix-like systems) would be useful!
Some basic Googling could also guide you to a solution; see for example:
http://curl.haxx.se/docs/sslcerts.html
http://www.linuxquestions.org/questions/slackware-14/openssl-ssl-error-code-14090086-verify-the-ca-cert-is-ok-certificate-verify-failed-703523/
HTH!
Here is the shortcut: just copy, change the paths, and paste:
source("C:\\Users\\cloudstat\\Desktop\\Google analytics Plus\\RGoogleAnalytics.R")
source("C:\\Users\\cloudstat\\Desktop\\Google analytics Plus\\QueryBuilder.R")
install.packages("C:\\Users\\cloudstat\\Desktop\\Google analytics Plus\\RGoogleAnalytics_1.1.tar.gz",repos=NULL,type="source")
library(XML)
library(RCurl)
library(RGoogleAnalytics)
# download an up-to-date CA bundle and point RCurl at it
download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")
curl <- getCurlHandle()
options(RCurlOptions = list(cainfo = "cacert.pem", ssl.verifypeer = FALSE))
# optional: set a proxy if you are behind one
curlSetOpt(.opts = list(proxy = "proxyserver:port"), curl = curl)
ga <- RGoogleAnalytics()
ga$SetCredentials("USERNAME", "PASSWORD")
Good luck :)
Due to changes in Google's API system, this is temporarily unavailable. I have written a blog post on how to extract Google Analytics data in R, with a working R script.
Try running this. Enter the client ID and client secret from the Google API Console.
install.packages("RGoogleAnalytics")
install.packages("googleAuthR")
library(RGoogleAnalytics)
client.id <- "################.apps.googleusercontent.com"
client.secret <- "##############_TknUI"
token <- Auth(client.id, client.secret)
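Once Auth() returns a token, the usual RGoogleAnalytics flow is to validate it and build a query. A sketch, with table.id as a placeholder for your own view ID:
ValidateToken(token)
# build a simple sessions-by-date query (table.id is a placeholder)
query.list <- Init(start.date = "2017-01-01", end.date = "2017-01-31",
                   dimensions = "ga:date", metrics = "ga:sessions",
                   max.results = 10000, table.id = "ga:XXXXXXXX")
ga.query <- QueryBuilder(query.list)
ga.data <- GetReportData(ga.query, token)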