How to access a web service that requires authentication [duplicate] - r

I have a web service that provides data in CSV form, based on the URL you use to access it. For example, http://sever.com/parameter1 returns a CSV for parameter 1, http://sever.com/parameter2 returns a CSV for parameter 2, etc. When I first access the site in my browser, I type in a username and password and can then access any data I want.
The problem arises when I try to import that data into R. I tried this function:
readLines('http://sever.com/parameter1')
But got the following error:
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") : cannot open: HTTP status was '401 Unauthorized'
Of course, this is because R doesn't know to pass my username and password along with the request. How do I define these additional parameters in R? Is there any way to add a cookie to the request or something?
Thank you.
/edit: The answer here (different question wording wasn't picked up by SO) worked for me:
Reading information from a password protected site
If anyone else has any other advice, please let me know.

Why don't you use curl to grab the file? That way you can set http headers for username and password:
curl --user name:password http://www.example.com
There is a curl library for R
http://curl.haxx.se/libcurl/r/
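The same idea can be sketched directly in R. This is a minimal sketch, assuming the httr package is installed and that the service uses HTTP basic authentication (neither is confirmed in the question). The helper name fetch_csv is hypothetical, and the URL and credentials in the usage comment are placeholders.

```r
library(httr)

# Hypothetical helper: fetch a CSV from a URL protected by HTTP basic auth.
fetch_csv <- function(url, user, password) {
  # authenticate() uses basic authentication by default
  resp <- GET(url, authenticate(user, password))
  # Turn HTTP errors such as 401 Unauthorized into R errors
  stop_for_status(resp)
  # Parse the response body as CSV text
  read.csv(text = content(resp, as = "text", encoding = "UTF-8"))
}

# Usage (placeholders, not run here):
# dat <- fetch_csv("http://sever.com/parameter1", "name", "password")
```

If the site uses a login form and cookies rather than basic authentication, this sketch would not apply as written.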

Related

Unable to pull JSON data from stats.nba.com

I've been having some difficulty getting data from stats.nba.com. I've been able to pull info pretty easily in the past, so wanted to see if you guys noticed any issues in my code or if you're running into the same problems.
I'm using rjson.
library(rjson)
url <- "https://stats.nba.com/stats/boxscoresummaryv2?GameID=0041800406"
a <- fromJSON(file = url)
When I run this, I get:
Error in file(con, "r") :
cannot open the connection to 'https://stats.nba.com/stats/boxscoresummaryv2?GameID=0041800406'
In addition: Warning message:
In file(con, "r") :
URL 'https://stats.nba.com/stats/boxscoresummaryv2?GameID=0041800406': status was 'Failure when receiving data from the peer'
I can, however, see the data in JSON format by following the request url. Anybody notice any mistakes I'm making?
The following code can read the json file into a list object.
library(jsonlite)
read_json("https://stats.nba.com/stats/boxscoresummaryv2?GameID=0041800406")
Not really a specific answer; think it was due to some issue with my firewall. Was able to get everything to work on a different network.

Unable to connect R with Youtube API

I am trying to connect to the YouTube API using the yt_oauth function, but I get the following error:
Error in readRDS(token) : error reading from connection.
I have checked my application several times. First I got the API Key and then the Client ID and Client Secret. I am using the Client ID and Client Secret for app_id and app_secret below. I have enabled all 3 YouTube APIs (Data, Analytics, Reporting). So where can I be going wrong? Any help appreciated.
Below is the code I am using
library("tuber")
app_id <- "XYZ"
app_secret<-"abc"
yt_oauth(app_id,app_secret)
It seems that you called yt_oauth previously with some invalid parameter. A file called .httr-oauth may exist in your current working directory (in my case it was in the My Documents folder). You can remove that file or alternatively do this:
yt_oauth(app_id, app_secret, token = '')
It will force tuber to refresh the token state.
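The first option, removing the cached token file, can be sketched as follows. This assumes the stale token is the .httr-oauth file in the current working directory, as described above; the helper name remove_cached_token is hypothetical.

```r
# Hypothetical helper: delete a cached OAuth token file if it exists.
# Returns TRUE if a file was removed, FALSE if there was nothing to remove.
remove_cached_token <- function(path = ".httr-oauth") {
  if (file.exists(path)) {
    file.remove(path)
  } else {
    FALSE
  }
}

# Usage: run from the directory where yt_oauth() was first called
# remove_cached_token()
```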

RJDBC::dbConnect failing to connect to HiveServer2 (kerberos +sasl)

I am trying to connect to Hive2 using RJDBC, but it fails with "GSS initiate failed". However, the same thing works fine using the beeline client. Any idea what may cause the different behavior when both run on the same node with the same credentials?
drv <- RJDBC::JDBC("org.apache.hive.jdbc.HiveDriver", cp, "`")
The following is just for illustration, to show all the parameters I am using in the JDBC URL.
conn <- RJDBC::dbConnect(drv, "jdbc:hive2://node1:10000/default;principal=hive/hive_node@REALM;ssl=true;sslTrustStore=store_path;trustStorePassword=store_password", "user", "password")
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://:10000/default;principal=hive/hive_node@REALM;ssl=true;sslTrustStore=store_path;trustStorePassword=store_password: GSS initiate failed
A bit late for you, but... look at that post about the details of configuring Kerberos authentication for Hive/Impala JDBC (note also that "user" and "password" connection args are ignored by Kerberos auth)
The post assumes that you have the password stored in a "keytab" file, and use it to create a private Kerberos ticket. If you want to use the default, public ticket instead, then change the JAAS conf accordingly (i.e. useTicketCache=true useKeyTab=false and no keyTab entry)
And to pass the configuration to Java from your R code, the easiest way is to set the JAVA_TOOL_OPTIONS environment variable before anything else bootstraps the rJava initialization
Sys.setenv("JAVA_TOOL_OPTIONS"="-Djava.security.auth.login.config=/Path/To/jaas.conf -Djavax.security.auth.useSubjectCredsOnly=false")
PS: on Windows the path would look like C:/Path/To/jaas.conf (Java converts slashes to backslashes automatically; that's easier than escaping each and every backslash because of the way R strings interpret \)
Final note: if any jerk tags this with "answers should not rely on links", since the aforementioned link points to another post of mine in S.O., then he/she is really a jerk, and I will gladly tell him/her to his/her face, loudly and with exotic words.

Error while Uploading file for API in apigee

I have an existing API proxy in Apigee edge. When I am trying to edit and save any policy I am getting below error.
Error while Uploading file for API <api name>.
org.apache.xerces.dom.ElementNSImpl cannot be cast to com.apigee.messaging.config.beans.TargetConnection
I am not able to figure out how to resolve this.
Thanks in advance.
When you get an error message like 'cannot be cast to...' that suggests that the field is expecting a certain type (message type in your case, like a request object or response object).
Try ensuring that the variable you enter into that particular field is a message type. It may be that the variable you are using is only a subset of a whole message.

Reading information from a password protected site

I have been using readLines() to scrape information from a website in an R tutorial. I now wish to extract data from my own website (specifically the awstats data); however, the domain is password protected.
Is there a way that I can pass the URL for the specific awstats data I require together with a username and password?
the format of the url is:
http://domain.name:port/awstats.pl?month=02&year=2011&config=domain.name&lang=en&framename=mainright&output=alldomains
Thanks.
If it is indeed a http basic access authentication, the documentation on connections provides some help:
URLs
Note that https:// connections are only supported if --internet2 or setInternet2(TRUE) was used (to make use of Internet Explorer internals), and then only if the certificate is considered to be valid. With that option only, the http://user:pass@site notation for sites requiring authentication is also accepted.
So your URL string should look like this:
http://username:password@domain.name:port/awstats.pl?month=02&year=2011&config=domain.name&lang=en&framename=mainright&output=alldomains
This might be Windows-only though.
Hope this helps!
You can embed the username and password in the URL like this:
http://userid:passw@domain.name:port/...
You can try using this with readLines(). If that doesn't work, you can always try a workaround using url() to open the connection:
zz <- url("http://userid:passw@domain.name:port/...")
readLines(zz)
close(zz)
You can also download the file and save it somewhere using download.file()
download.file("theurl","/path/to/file/filename",method="wget")
This saves the file on the local path that is specified.
EDIT :
As csgillespie said, you shouldn't include your username and password in the script. If you run scripts with source() or interactively, you could add e.g.:
user <- readline("Give the username : ")
passw <- readline("Give the password : ")
Url <- paste0("http://", user, ":", passw, "@domain.name...")
readLines(Url,...)
When running from the commandline, you could pass the arguments after --args and access them using commandArgs (see ?commandArgs)
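The command-line variant can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper name build_auth_url and the host placeholder "domain.name/awstats.pl" are assumptions for illustration. It also percent-encodes the credentials, since characters such as @ or : in a password would otherwise break the URL.

```r
# Hypothetical helper: build a URL with embedded, percent-encoded credentials.
build_auth_url <- function(user, passw, host_and_path) {
  paste0("http://",
         utils::URLencode(user, reserved = TRUE), ":",
         utils::URLencode(passw, reserved = TRUE), "@",
         host_and_path)
}

# Arguments supplied after the script name (or after --args), e.g.
#   Rscript fetch.R myuser mypassword
args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 2) {
  # "domain.name/awstats.pl" is a placeholder host and path
  Url <- build_auth_url(args[1], args[2], "domain.name/awstats.pl")
  # readLines(Url)  # not run here: needs the real host
}
```

Note that arguments passed on the command line may be visible to other users of the machine (for example in the process list), so this keeps the password out of the script but not necessarily out of sight.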
If you have access to the box, you could always just read the awstats log files. If you can ssh into the box, then you could easily sync the latest file using rsync.
The slight snag with using
http://username:password@domain...
is that you are putting your password in an R script, which is best avoided. Of course you can secure the script, but it only takes one slip. For example:
Someone asks you a similar question and you publish your script
The URL http://username:password@domain... will(?) now show up in your server logs
...
Formatting the URL as http://username:password@domain... for use with download.file didn't work for me, but R.utils provides the function downloadFile that works perfectly:
require(R.utils)
downloadFile(myurl, myfile, username = "myusername", password ="mypassword")
See @joris-meys' answer for a way to avoid including your username and password in plain text in your script.
EDIT: Except it looks like downloadFile just reformats the URL to http://username:password@domain...? Hmm...
