Post to Tumblr API with R - Error "fromJSON" only for photo posts

I have been working with R to post to Tumblr through its API, using the R package tumblR. Everything works fine for text and link posts, but I keep getting errors when trying to post photos.
I'm not sure what the issue is; maybe the syntax I use to pass the photo link is wrong? I tried to debug it but could not solve it, and was hoping you could help.
I am also posting the code for text and link posts, in case some of you find it useful.
[R CODE]
require(tumblR)
require(httr) # provides oauth_app(), oauth_endpoint(), oauth1.0_token() and sign_oauth1.0()
require(httpuv)
### Authorize
consumer_key <-'key'
consumer_secret <- 'secret'
appname <- 'appname'
tokenURL <- 'http://www.tumblr.com/oauth/request_token'
accessTokenURL <- 'http://www.tumblr.com/oauth/access_token'
authorizeURL <- 'http://www.tumblr.com/oauth/authorize'
app <- oauth_app(appname, consumer_key, consumer_secret)
endpoint <- oauth_endpoint(tokenURL, authorizeURL, accessTokenURL)
token <- oauth1.0_token(endpoint, app)
sig <- sign_oauth1.0(app,
token = token$credentials$oauth_token,
token_secret = token$credentials$oauth_token_secret)
### Post Text
post(base_hostname = "blogname.tumblr.com", type = "text", state = "published", tags = 'tag',
body = 'this is the body', token = token, consumer_key = consumer_key, consumer_secret = consumer_secret)
# => Shows: "* Hostname was NOT found in DNS cache" but posts the textpost to tumblr
### Post Link
post(base_hostname = "blogname.tumblr.com", type = "link", state = "published", tags = 'tag', url_link= 'www.somelink.de',
title_link= 'linkTitle', description= 'this is the description', token = token, consumer_key = consumer_key, consumer_secret = consumer_secret)
# => Shows: "* Hostname was NOT found in DNS cache" but posts the linkpost to tumblr
### Post Photo
post(base_hostname = "blogname.tumblr.com", type = "photo", tags = "tag", caption_photo = 'photoTitle',
link = "http://bilder.bild.de/fotos/bde-logo-35166394/Bild/20.bild.png",
source_photo = "http://bilder.bild.de/fotos/bde-logo-35166394/Bild/20.bild.png", data_photo= NA,
token = token, consumer_key = consumer_key, consumer_secret = consumer_secret)
# => Shows the following error and does NOT (!) post the photo:
# * Hostname was NOT found in DNS cache
# * Trying 66.6.41.23...
# * Connected to api.tumblr.com (66.6.41.23) port 80 (#0)
# > POST /v2/blog/blogname.tumblr.com/post HTTP/1.1
# User-Agent: RCurl
# Host: api.tumblr.com
# Accept: application/json
# Authorization: ...
# Content-Length: 490
# Content-Type: application/x-www-form-urlencoded
#
# * upload completely sent off: 490 out of 490 bytes
# < HTTP/1.1 401 Not Authorized (!!!!!!)
# < Server: nginx
# < Date: Sun, 22 Mar 2015 12:14:43 GMT
# < Content-Type: application/json; charset=utf-8
# < Transfer-Encoding: chunked
# < Connection: close
# * Closing connection 0
# Warning message:
# In if (class(token) != "Token1.0") stop("token must be a Token1.0 type") :
# the condition has length > 1 and only the first element will be used
# Error in fromJSON(http.connection(url, token, bodyParams, consumer_key, :
# error in evaluating the argument 'content' in selecting a method for function 'fromJSON': Error: Not Authorized
From what I have read, the Tumblr API may return "401 Not Authorized" simply because the request is malformed, so the error does not have to be related to the authorization itself (key, secret, etc.). Since the same credentials work for text and link posts, I believe the problem lies elsewhere.
The tumblR documentation says to use either "data_photo" or "source_photo", so I guess NA for one of them is OK. I tried both anyway.
Any help is appreciated, thanks!!!

A new version (1.1) of tumblR has been released.
With this version:
the dependency on the ROAuth package has been eliminated;
the "post" function with "photo" type has been fixed;
the warning message about token has been removed.
Thanks for reporting!
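The warning in the question's log ("the condition has length > 1") is what R reports when `class(token) != "Token1.0"` is used as an `if` condition and the object carries more than one class; since R 4.2.0 the same situation is an error. A minimal sketch of a check that avoids it follows. The `token` object is a hypothetical stand-in built with `structure()`, and this is not necessarily how tumblR 1.1 implements its fix.

```r
# Hypothetical stand-in for an OAuth token object that carries several classes.
token <- structure(list(), class = c("Token1.0", "Token", "R6"))

# Comparing class() directly yields one logical value per class,
# which is not a valid single condition for if().
comparison <- class(token) != "Token1.0"
print(length(comparison))

# inherits() always returns a single TRUE or FALSE.
is_oauth1_token <- function(x) inherits(x, "Token1.0")

if (!is_oauth1_token(token)) stop("token must be a Token1.0 type")
print(is_oauth1_token(token))
```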

Related

In glassnode, how to use the API key to retrieve the data in R?

https://docs.glassnode.com/basic-api/api-key
I tried the following code in R:
url <- "http://api.glassnode.com"
path <- "v1/metrics/addresses/active_count"
raw.result <- GET(url = url, path = path, authenticate = "AUTHENTICATION KEY")
names(raw.result)
raw.result
But it returned the following:
Response [https://api.glassnode.com/v1/metrics/addresses/active_count]
Date: 2022-03-10 07:11
Status: 401
Content-Type: text/html
Size: 172 B
401 Authorization Required
401 Authorization Required
nginx
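One thing that stands out in the code above is that `authenticate = "..."` is passed as a named argument. httr appears to treat named arguments to `GET()` as URL components and not as credentials, so the key is probably never sent. The following is a minimal sketch, assuming (based on the linked API key page) that Glassnode accepts the key as an `api_key` query parameter and the asset as `a`. The environment variable name `GLASSNODE_API_KEY` and the fallback value are hypothetical.

```r
# Assumption: the key travels in the query string as `api_key`, the asset as `a`.
glassnode_query <- function(asset, api_key) {
  stopifnot(is.character(api_key), nzchar(api_key))
  list(a = asset, api_key = api_key)
}

# Read the key from an environment variable (hypothetical name) instead of hard-coding it.
api_key <- Sys.getenv("GLASSNODE_API_KEY", unset = "demo-key")
query <- glassnode_query("BTC", api_key)
print(names(query))

# Sending the request needs the httr package, network access and a valid key:
# resp <- httr::GET("https://api.glassnode.com",
#                   path = "v1/metrics/addresses/active_count",
#                   query = query)
# httr::stop_for_status(resp)
```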

how to get data from the WTO API in R

library(httr)
library(jsonlite)
headers = c(
# Request headers
'Ocp-Apim-Subscription-Key' = '{subscription key}'
)
params = list()
# Request parameters
params['countries[]'] = '{array}'
resp <- GET(paste0("https://api.wto.org/tfad/transparency/procedures_contacts_single_window?"
, paste0(names(params),'=',params,collapse = "&")),
add_headers(headers))
if(!http_error(resp)){
jsonRespText<-fromJSON(rawToChar(content(resp,encoding = 'UTF-8')))$Dataset
jsonRespText
}else{
stop('Error in Response')
}
I don't know how to get response from an API in R. I have executed this code but the server is not responding...
If you examine the value of the resp object after running your code you'll notice a status code:
> resp
Response [https://tfadatabase.org/api/transparency/procedures_contacts_single_window?countries[]=%7Barray%7D]
Date: 2020-04-17 19:25
Status: 422
Content-Type: application/json
Size: 77 B
So the server actually did respond, it just didn't give you what you were hoping for. In the API documentation we can look up this code:
422 Unprocessable Entity
If a member cannot be found, or the request parameters are poorly
formed.
So I just went to the Query Builder and looked for a valid request URL and updated the code. It ran fine - i.e. Status 200.
This was the URL I used in the code:
https://api.wto.org/timeseries/v1/data?i=TP_A_0100&r=000&fmt=json&mode=full&lang=1&meta=false
and the value of resp was
Date: 2020-04-17 19:30
Status: 200
Content-Type: application/json; charset=utf-8
Size: 88 B
I cut out the subscription key in my results above. You can find the Query Builder here. Incidentally, in the Query Builder it automatically includes the subscription key and other "header" info in the URL. You can either remove that first and re-add it in your code, or just change your code to run GET() directly on their version of the URL.
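The working URL above can also be assembled from a named parameter list, which makes it easier to spot a malformed parameter when the server answers 422. This is a minimal base-R sketch; `build_query_url` is a hypothetical helper, not part of httr or of the WTO API.

```r
# Hypothetical helper: join a base URL and a named list of query parameters.
build_query_url <- function(base, params) {
  encoded <- vapply(
    params,
    function(p) utils::URLencode(as.character(p), reserved = TRUE),
    character(1)
  )
  paste0(base, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

url <- build_query_url(
  "https://api.wto.org/timeseries/v1/data",
  list(i = "TP_A_0100", r = "000", fmt = "json", mode = "full", lang = 1, meta = "false")
)
print(url)
```

With httr, passing the same list as `query = list(...)` to `GET()` should have the same effect, with the subscription key supplied through `add_headers()`.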

R: fetching pdf documents from Companies House API

I'm trying to fetch documents from the API using R. Appreciate the clarification of the process in this post. I've been following the above steps with partial success, but still fail the last step to get access to documents' content:
Find the document filing you're interested in (e.g. make a filing history request for the company). Parse the response for the link to the document in the field "links" : { "document_metadata" : "link URI fragment here" }.
No problem:
library(httr)
library(jsonlite)
library(openssl)
### retrieving filing history ####
company_num = 'FC013908'
key = 'my_key'
fh_path = paste0('/company/', toupper(company_num), "/filing-history") # base R toupper(); str_to_upper() would need stringr
fh_url <- modify_url("https://api.companieshouse.gov.uk/", path = fh_path)
fh_test <- GET(fh_url, authenticate(key, "")) #status_code = 200
fh_parsed <- jsonlite::fromJSON(content(fh_test, "text",encoding = "utf-8"), flatten = TRUE)
docs <- fh_parsed$items
Done.
For a given document, request the document metadata via the CH Document API. Parse the response to get the document (mime) types available and the link to the actual document data (document URI fragment).
No problems here:
md_meta_url = docs$links.document_metadata[1]
key_pass <- paste0(key,":")
decoded_auth <- paste0('Basic ', base64_encode(key_pass))
md_test <- GET(md_meta_url,
add_headers(Authorization = decoded_auth)
)
md_test #status_code = 200!
md_parsed <- jsonlite::fromJSON(content(md_test, "text",encoding = "utf-8"), flatten = TRUE)
This way I can obtain the content URL:
cont_url = md_parsed$links$document
Request the actual document, specifying the mime type (e.g. "application/pdf").
I do it while NOT following the redirect and, as expected, I get the 302 status code with the location header:
accept = 'application/pdf'
cont_test <- GET(cont_url,
add_headers(Authorization = decoded_auth,
Accept = accept),
config(followlocation = FALSE)
)
final_url <- cont_test$headers$location
> final_url
[1] "https://s3-eu-west-1.amazonaws.com/document-api-images-prod/docs/LjBouRHeXXpIYAvqYIPWL06iXaliPz6Pucp1OXCXQhI/application-pdf?AWSAccessKeyId=ASIAJX7TVURFXZTY5DNQ&Expires=1529483765&Signature=uUQx6RTW7XBLqx4L6pYr5tOUySg%3D&x-amz-security-token=FQoDYXdzEP%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDGxe7meYGe3OYhNwcSK3AwcVYJUXaUMf19oVO9s4qNPWN8AHjNNd5rrZhgE9YTkF1OmzyZSL5xHbls664kDP%2Bxd7dz9PIU5O1D%2BVxoDyoYcFiS6acDnO28KpfFE56lUZNfedf1jys%2FP0SJ8f%2F50Cbn93bfOlm0MZA9%2BQ2DYQvPfkWSvrDjMyCXHbu57gpZHjQKPNRTgzGXzUUCvFwREytGMM4eThhn4Glvvx%2FA8IiLbnsvgmEKw9iAj7KWIenhoJq3cTRytUpVeipLnQoBVLau8dFYkKdAHZaYM2Tlx0z6ObRb%2BGdm7W7eOVA1bFXuUXmUmnAHruDIwwLlgOVN2IJ9CxmJU22lY8jrEm%2BUivtrdp2oofn32PryBEJ8jJOg9cIpLbBBx%2FeOkng9zJwnZbute7Nmh%2BnaY2btsId6JjraFNsTvR%2B1qEZX9uuznUdJdqgVfTMj2gGrAmntwk0JAkILlvamzjWC%2F9vAqK7Xvt8aC6hlIMB2vdzTCU9Jf%2FrIMTClTJkk0BzBuvJ86t1l%2BXb4rF5Pab%2FegFpJ6nvZKqde%2F77wMMiTyG35EndmYx4AWqTIh9EofYwKZa9uciNvRT0E2%2BYnT5jZMo%2BdWn2QU%3D"
However, when I try to
Request this URI from Amazon again passing the content type you want again.
I get a 400 error:
final_test <- GET(final_url,
add_headers(Authorization = decoded_auth,
Accept = accept
))
> final_test
Response [https://s3-eu-west-1.amazonaws.com/document-api-images-prod/docs/LjBouRHeXXpIYAvqYIPWL06iXaliPz6Pucp1OXCXQhI/application-pdf?AWSAccessKeyId=ASIAJX7TVURFXZTY5DNQ&Expires=1529483765&Signature=uUQx6RTW7XBLqx4L6pYr5tOUySg%3D&x-amz-security-token=FQoDYXdzEP%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDGxe7meYGe3OYhNwcSK3AwcVYJUXaUMf19oVO9s4qNPWN8AHjNNd5rrZhgE9YTkF1OmzyZSL5xHbls664kDP%2Bxd7dz9PIU5O1D%2BVxoDyoYcFiS6acDnO28KpfFE56lUZNfedf1jys%2FP0SJ8f%2F50Cbn93bfOlm0MZA9%2BQ2DYQvPfkWSvrDjMyCXHbu57gpZHjQKPNRTgzGXzUUCvFwREytGMM4eThhn4Glvvx%2FA8IiLbnsvgmEKw9iAj7KWIenhoJq3cTRytUpVeipLnQoBVLau8dFYkKdAHZaYM2Tlx0z6ObRb%2BGdm7W7eOVA1bFXuUXmUmnAHruDIwwLlgOVN2IJ9CxmJU22lY8jrEm%2BUivtrdp2oofn32PryBEJ8jJOg9cIpLbBBx%2FeOkng9zJwnZbute7Nmh%2BnaY2btsId6JjraFNsTvR%2B1qEZX9uuznUdJdqgVfTMj2gGrAmntwk0JAkILlvamzjWC%2F9vAqK7Xvt8aC6hlIMB2vdzTCU9Jf%2FrIMTClTJkk0BzBuvJ86t1l%2BXb4rF5Pab%2FegFpJ6nvZKqde%2F77wMMiTyG35EndmYx4AWqTIh9EofYwKZa9uciNvRT0E2%2BYnT5jZMo%2BdWn2QU%3D]
Date: 2018-06-20 08:37
Status: 400
Content-Type: application/xml
Size: 523 B
<BINARY BODY>
Needless to say, executing
browseURL(final_test$url)
returns Access Denied error. I suspect it may have something to do with Amazon authorization problems similar to those described here. Any ideas how to solve this final hurdle?
Thanks!
The answer was provided by #voracityemail in response to my question on Companies House Developers Hub. Basically, the final call doesn't require the Authorization header, so if you run the following code for final_test:
final_test <- GET(final_url, add_headers(Accept = accept))
it will return a 200 status code:
> final_test
Response [https://s3-eu-west-1.amazonaws.com/document-api-images-prod/docs/Rl1qKy2kNqdskHUIsqU9u0bGzH2goTfJfnCrNg4S0lg/application-pdf?AWSAccessKeyId=ASIAJMG7NTZHYC4NH3MA&Expires=1530093768&Signature=EteMSmwXS%2FqqdOFRmYY%2Fgf187Aw%3D&x-amz-security-token=FQoDYXdzELf%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDOMKrcNPR6jb5bnzGSK3A1yzaoVZWhgAeXYCN9WJnxx8b%2BTKCEEZyZui3aR5j0WoNWIQhW9GIQ8R4xTGVkRjwQIhzgDp%2BRCfXGQ0CfPCOfseaQri5m%2BWTEWBgjfToL7%2FMdcC1IINMTFRrih1APE%2FmmTcQaW7SvyZWv3Q4bVQB%2FtOsiX5k8rWVsT7%2FecfQmnJMljcKF0%2F3vDRTtLRURTCtrdegfnIFrSqXkelLxVVypKY9UeURBgxAgngOgoP7YhYt3wD%2BEz5rBdNfMvF1Zuv91hLGDyBaKuV4fRKMRXlymDHCwNgNZl3JeyuAmnX8pexK6PJzH7MerM8QX8LoPfge1yutvqEj0%2FjRSYEShOWUebecQ2tJqWIEOZly0Ji8fc%2BMtFDO1FWZBrMl6lXgkwTMpELnTH5%2BP4ULMdFfEz30bWSnAuTGXcAxsoFWsFTIE2uO35zgkOsAUT2un4UNGnL2S8XexWbgwq%2B%2Bhtxo9ruP9WA8mTpjBkup2Qe5EpvUiNwGX9APjThi7QFTllVWWvpKgzKTSBh%2Btua9xK8RgiNAYDgEa5k%2BH%2FmWIP56WglBE6r3HGsXgbi%2Bff8Rg8z2lVFLo8f9hVv%2BCYoptXM2QU%3D]
Date: 2018-06-27 10:02
Status: 200
Content-Type: application/pdf
Size: 21.7 kB
<BINARY BODY>
and then
browseURL(final_test$url)
will open the specified document in the browser. Victory!
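The pre-signed Amazon URL already carries its own credentials in the query string (`AWSAccessKeyId`, `Signature`), which is presumably why a second `Authorization` header gets rejected. Below is a minimal sketch of choosing headers per host. `headers_for` is a hypothetical helper, the host test is an assumption based on the URLs shown above, and the API host and the `Basic` value are placeholders.

```r
# Hypothetical helper: send the API key only to hosts that are not Amazon S3.
headers_for <- function(url, auth, accept = "application/pdf") {
  host <- sub("^https?://([^/?]+).*$", "\\1", url)
  headers <- c(Accept = accept)
  if (!grepl("amazonaws\\.com$", host)) {
    headers <- c(headers, Authorization = auth)
  }
  headers
}

# Placeholder URLs and credentials, for illustration only.
api_headers <- headers_for("https://document-api.example.org/document/abc/content", "Basic a2V5Og==")
s3_headers <- headers_for("https://s3-eu-west-1.amazonaws.com/document-api-images-prod/docs/abc", "Basic a2V5Og==")
print(names(api_headers))
print(names(s3_headers))

# With httr: GET(url, add_headers(.headers = headers_for(url, decoded_auth)))
```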

Refresh Token for Access Token Google API: R Code

I am attempting to retrieve an access token using my refresh token, client id and client secret for the youtube api using R Code.
This is google's example of how to POST a request.
POST /o/oauth2/token HTTP/1.1
Host: accounts.google.com
Content-Type: application/x-www-form-urlencoded
client_id=21302922996.apps.googleusercontent.com&client_secret=XTHhXh1SlUNgvyWGwDk1EjXB&refresh_token=1/6BMfW9j53gdGImsixUH6kU5RsR4zwI9lUVX-tqf8JXQ&grant_type=refresh_token
This was my r code:
library(httr)
url<- paste("https://accounts.google.com/o/oauth2/token?client_id=", client_id, "&client_secret=", client_secret, "&refresh_token=", refresh_token, "&grant_type=access_token", sep="")
POST(url)
And I keep getting this response:
Response [https://accounts.google.com/o/oauth2/token?client_id=xxxxxxxxxx&client_secret=xxxxxxxx&refresh_token=xxxxxxxxxxxxxxxxxxxxxx&grant_type=refresh_token]
Date: 2015-09-02 16:43
Status: 400
Content-Type: application/json
Size: 102 B
{
"error" : "invalid_request",
"error_description" : "Required parameter is missing: grant_type"
}
Is there a better way to do this? Maybe using RCurl? If so, what would the format of the request be? I would appreciate help on this!
The RAdwords package has a function that uses the refresh token to obtain a new access token. If you don't want to add the entire package, you can just add the following code to your script.
refreshToken = function(google_auth) {
# This function refreshes the access token.
# The access token expires after one hour and has to be updated
# with the refresh token.
#
# Args:
# access.token$refresh_token and credentials as input
# Returns:
# New access.token with corresponding time stamp
rt = rjson::fromJSON(RCurl::postForm('https://accounts.google.com/o/oauth2/token',
refresh_token=google_auth$access$refresh_token,
client_id=google_auth$credentials$c.id,
client_secret=google_auth$credentials$c.secret,
grant_type="refresh_token",
style="POST",
.opts = list(ssl.verifypeer = FALSE)))
access <- rt
access
}
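Since the question already uses httr, the same refresh request can be sketched with `httr::POST()`. The differences from the question's attempt are that the parameters go in a form-encoded body, not in the URL, and that `grant_type` is `refresh_token`. This is a minimal sketch; `refresh_request_body` is a hypothetical helper and the credential values are placeholders.

```r
# Hypothetical helper: collect the form fields of a token refresh request.
refresh_request_body <- function(client_id, client_secret, refresh_token) {
  list(
    client_id = client_id,
    client_secret = client_secret,
    refresh_token = refresh_token,
    grant_type = "refresh_token"
  )
}

# Placeholder credentials, for illustration only.
body <- refresh_request_body("my-client-id", "my-client-secret", "my-refresh-token")
str(body)

# Sending it needs the httr package, network access and real credentials:
# resp <- httr::POST("https://accounts.google.com/o/oauth2/token",
#                    body = body, encode = "form")
# httr::content(resp)$access_token
```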

Debugging RCurl-based authentication & form submission

SourceForge Research Data Archive (SRDA) is one of the data sources for my dissertation research. I'm having difficulty in debugging the following issue related to SRDA data collection.
Data collection from SRDA requires authenticating and then submitting a Web form with an SQL query. Upon successful processing of the query, the system generates a text file with the query results. While testing my R code for SRDA data collection, I changed the SQL request to make sure that the results file is regenerated. However, I discovered that the file contents stay the same (they correspond to the previous query). I think the file is not refreshed because either the authentication or the query form submission fails. The following is the debug output from the code (https://github.com/abnova/diss-floss/blob/master/import/getSourceForgeData.R):
make importSourceForge
Rscript --no-save --no-restore --verbose getSourceForgeData.R
running
'/usr/lib/R/bin/R --slave --no-restore --no-save --no-restore --file=getSourceForgeData.R'
Loading required package: RCurl
Loading required package: methods
Loading required package: bitops
Loading required package: digest
Retrieving SourceForge data...
Checking request "SELECT *
FROM sf1104.users a, sf1104.artifact b
WHERE a.user_id = b.submitted_by AND b.artifact_id = 304727"...
* About to connect() to zerlot.cse.nd.edu port 80 (#0)
* Trying 129.74.152.47... * connected
> POST /mediawiki/index.php?title=Special:Userlogin&action=submitlogin&type=login HTTP/1.1
Host: zerlot.cse.nd.edu
Accept: */*
Content-Length: 37
Content-Type: application/x-www-form-urlencoded
* upload completely sent off: 37out of 37 bytes
< HTTP/1.1 200 OK
< Date: Tue, 11 Mar 2014 03:49:04 GMT
< Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.25 with Suhosin-Patch
< X-Powered-By: PHP/5.2.4-2ubuntu5.25
* Added cookie wiki_db_session="c61...a3c" for domain zerlot.cse.nd.edu, path /, expire 0
< Set-Cookie: wiki_db_session=c61...a3c; path=/
< Content-language: en
< Vary: Accept-Encoding,Cookie
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Cache-Control: private, must-revalidate, max-age=0
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=UTF-8
<
* Connection #0 to host zerlot.cse.nd.edu left intact
[1] "Before second postForm()"
* Re-using existing connection! (#0) with host zerlot.cse.nd.edu
* Connected to zerlot.cse.nd.edu (129.74.152.47) port 80 (#0)
> POST /cgi-bin/form.pl HTTP/1.1
Host: zerlot.cse.nd.edu
Accept: */*
Cookie: wiki_db_session=c61...a3c
Content-Length: 129
Content-Type: application/x-www-form-urlencoded
* upload completely sent off: 129out of 129 bytes
< HTTP/1.1 500 Internal Server Error
< Date: Tue, 11 Mar 2014 03:49:04 GMT
< Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.25 with Suhosin-Patch
< Vary: Accept-Encoding
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html
<
* Closing connection #0
Error: Internal Server Error
Execution halted
make: *** [importSourceForge] Error 1
I've tried to figure this out using debug output as well as Network protocol analyzer from Firefox embedded Developer Tools, but so far without much success. Would appreciate any advice and help.
UPDATE:
if (!require(RCurl)) install.packages('RCurl')
if (!require(digest)) install.packages('digest')
library(RCurl)
library(digest)
# Users must authenticate to access Query Form
SRDA_HOST_URL <- "http://zerlot.cse.nd.edu"
SRDA_LOGIN_URL <- "/mediawiki/index.php?title=Special:Userlogin"
SRDA_LOGIN_REQ <- "&action=submitlogin&type=login"
# SRDA URL that Query Form sends POST requests to
SRDA_QUERY_URL <- "/cgi-bin/form.pl"
# URL of the text file that holds the query results
SRDA_QRESULT_URL <- "/qresult/blekh/blekh.txt"
# Parameters for result's format
DATA_SEP <- ":" # data separator
ADD_SQL <- "1" # add SQL to file
curl <- getCurlHandle()
srdaLogin <- function (loginURL, username, password) {
curlSetOpt(curl = curl, cookiejar = 'cookies.txt',
ssl.verifyhost = FALSE, ssl.verifypeer = FALSE,
followlocation = TRUE, verbose = TRUE)
params <- list('wpName1' = username, 'wpPassword1' = password)
if(url.exists(loginURL)) {
reply <- postForm(loginURL, .params = params, curl = curl,
style = "POST")
#if (DEBUG) print(reply)
info <- getCurlInfo(curl)
return (ifelse(info$response.code == 200, TRUE, FALSE))
}
else {
stop("Can't access login URL!")
}
}
srdaConvertRequest <- function (request) {
return (list(select = "*",
from = "sf1104.users a, sf1104.artifact b",
where = "b.artifact_id = 304727"))
}
srdaRequestData <- function (requestURL, select, from, where, sep, sql) {
params <- list('uitems' = select,
'utables' = from,
'uwhere' = where,
'useparator' = sep,
'append_query' = sql)
if(url.exists(requestURL)) {
reply <- postForm(requestURL, .params = params, #.opts = opts,
curl = curl, style = "POST")
}
}
srdaGetData <- function(request) {
resultsURL <- paste(SRDA_HOST_URL, SRDA_QRESULT_URL,
collapse="", sep="")
results.query <- readLines(resultsURL, n = 1)
return (ifelse(results.query == request, TRUE, FALSE))
}
getSourceForgeData <- function (request) {
# Construct SRDA login and query URLs
loginURL <- paste(SRDA_HOST_URL, SRDA_LOGIN_URL, SRDA_LOGIN_REQ,
collapse="", sep="")
queryURL <- paste(SRDA_HOST_URL, SRDA_QUERY_URL, collapse="", sep="")
# Log into the system
if (!srdaLogin(loginURL, USER, PASS))
stop("Login failed!")
rq <- srdaConvertRequest(request)
srdaRequestData(queryURL,
rq$select, rq$from, rq$where, DATA_SEP, ADD_SQL)
if (!srdaGetData(request))
stop("Data collection failed!")
}
message("\nTesting SourceForge data collection...\n")
getSourceForgeData("SELECT *
FROM sf1104.users a, sf1104.artifact b
WHERE a.user_id = b.submitted_by AND b.artifact_id = 304727")
# clean up
close(curl)
UPDATE 2 (no functions version):
if (!require(RCurl)) install.packages('RCurl')
library(RCurl)
# Users must authenticate to access Query Form
SRDA_HOST_URL <- "http://zerlot.cse.nd.edu"
SRDA_LOGIN_URL <- "/mediawiki/index.php?title=Special:Userlogin"
SRDA_LOGIN_REQ <- "&action=submitlogin&type=login"
# SRDA URL that Query Form sends POST requests to
SRDA_QUERY_URL <- "/cgi-bin/form.pl"
# URL of the text file that holds the query results
SRDA_QRESULT_URL <- "/qresult/blekh/blekh.txt"
# Parameters for result's format
DATA_SEP <- ":" # data separator
ADD_SQL <- "1" # add SQL to file
message("\nTesting SourceForge data collection...\n")
curl <- getCurlHandle()
curlSetOpt(curl = curl, cookiejar = 'cookies.txt',
ssl.verifyhost = FALSE, ssl.verifypeer = FALSE,
followlocation = TRUE, verbose = TRUE)
# === Authentication ===
loginParams <- list('wpName1' = USER, 'wpPassword1' = PASS)
loginURL <- paste(SRDA_HOST_URL, SRDA_LOGIN_URL, SRDA_LOGIN_REQ,
collapse="", sep="")
if (url.exists(loginURL)) {
postForm(loginURL, .params = loginParams, curl = curl, style = "POST")
info <- getCurlInfo(curl)
message("\nLogin results - HTTP status code: ", info$response.code, "\n\n")
} else {
stop("\nCan't access login URL!\n\n")
}
# === Data collection ===
# Previous query was: "SELECT * FROM sf0305.users WHERE user_id < 100"
query <- list(select = "*",
from = "sf1104.users a, sf1104.artifact b",
where = "b.artifact_id = 304727")
getDataParams <- list('uitems' = query$select,
'utables' = query$from,
'uwhere' = query$where,
'useparator' = DATA_SEP,
'append_query' = ADD_SQL)
queryURL <- paste(SRDA_HOST_URL, SRDA_QUERY_URL, collapse="", sep="")
if(url.exists(queryURL)) {
postForm(queryURL, .params = getDataParams, curl = curl, style = "POST")
resultsURL <- paste(SRDA_HOST_URL, SRDA_QRESULT_URL,
collapse="", sep="")
results.query <- readLines(resultsURL, n = 1)
request <- paste(query$select, query$from, query$where)
if (results.query == request)
message("\nData request is successful, SQL query: ", request, "\n\n")
else
message("\nData request failed, SQL query: ", request, "\n\n")
} else {
stop("\nCan't access data query URL!\n\n")
}
close(curl)
UPDATE 3 (server-side debugging)
Finally, I was able to get in touch with a person responsible for the system and he helped me to narrow down the issue to cookie management IMHO. Here's the error log record, corresponding to running my code:
[Fri Mar 21 15:33:14 2014] [error] [client 54.204.180.203] [Fri Mar 21
15:33:14 2014] form.pl: /tmp/sess_3e55593e436a013597cd320e4c6a2fac:
at /var/www/cgi-bin/form.pl line 43
The following is the snippet of the server-side script (Perl) that generated that error (line #1 in the script is bash interpreter directive, so reported line number 43 is most likely line number 44):
42 if (-e "/tmp/sess_$file") {
43 $session = PHP::Session->new($cgi->cookie("$session_name"));
44 $user_id = $session->get('wsUserID');
45 $user_name = $session->get('wsUserName');
The following is a session information (1) after authentication and (2) after submitting data request, obtained by tracing manual authentication and manual data request form submission:
(1) "wiki_dbUserID=449; expires=Sun, 20-Apr-2014 21:04:14 GMT;
path=/wiki_dbUserName=Blekh; expires=Sun, 20-Apr-2014 21:04:14 GMT;
path=/wiki_dbToken=deleted; expires=Thu, 21-Mar-2013 21:04:13 GMT"
(2) wiki_db_session=aaed058f97059174a59effe44b137cbc;
_ga=GA1.2.2065853334.1395410153; EDSSID=e24ff5ed891c28c61f2d1f8dec424274; wiki_dbUserName=Blekh;
wiki_dbLoggedOut=20140321210314; wiki_dbUserID=449
Would appreciate any help in figuring out the problem with my code!
Finally, finally, finally! I have figured out what was causing this problem, which gave me so much headache (figuratively and literally). It forced me to spend a lot of time reading various Internet resources (including many SO questions and answers), debugging my code and communicating with people. I spent a lot of time, but not in vain, as I learned a lot about RCurl, cookies, Web forms and HTTP protocol.
The cause turned out to be much simpler than I thought. While the direct cause of the form submission failure was related to cookie management, the underlying cause was using the wrong parameter names (IDs) for the authentication form fields. The two pairs of names were very similar, and it took only one extra character to trigger the whole problem.
Lesson learned: when facing issues, especially ones dealing with authentication, it's very important to check all names and IDs carefully, and more than once, to make sure they correspond to the ones that are supposed to be used. Thank you to everyone who helped or tried to help me with this issue!
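A check like this can be automated by comparing the names being sent against the names the form defines. This is a minimal sketch; `unexpected_fields` is a hypothetical helper, and the field names below are placeholders, since the thread does not say what the correct names were.

```r
# Hypothetical helper: which parameter names are not among the form's field names?
unexpected_fields <- function(params, form_fields) {
  setdiff(names(params), form_fields)
}

# Placeholder names only; read the real ones from the login form's HTML.
form_fields <- c("fieldUser", "fieldPassword")
params <- list(fieldUser = "me", fieldPasword = "secret")  # note the misspelling

bad <- unexpected_fields(params, form_fields)
if (length(bad) > 0) {
  message("Fields the form does not define: ", paste(bad, collapse = ", "))
}
```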
I've simplified the code still further:
library(httr)
base_url <- "http://srda.cse.nd.edu"
loginURL <- modify_url(
base_url,
path = "mediawiki/index.php",
query = list(
title = "Special:Userlogin",
action = "submitlogin",
type = "login",
wpName1 = USER,
wpPassword1 = PASS
)
)
r <- POST(loginURL)
stop_for_status(r)
queryURL <- modify_url(base_url, path = "cgi-bin/form.pl")
query <- list(
uitems = "user_name",
utables = "sf1104.users a, sf1104.artifact b",
uwhere = "a.user_id = b.submitted_by AND b.artifact_id = 304727",
useparator = ":",
append_query = "1"
)
r <- POST(queryURL, body = query, encode = "form")
stop_for_status(r)
But I'm still getting a 500. I tried:
setting extra cookies that I see in the browser (wiki_dbUserID, wiki_dbUserName)
setting header DNT to 1
setting referer to http://srda.cse.nd.edu/cgi-bin/form.pl
setting user-agent the same as chrome
setting accept "text/html"
The following provides clarification for the scenario (error situation).
From RFC 2616 (the HTTP/1.1 specification):
10.5 Server Error 5xx
Response status codes beginning with the digit "5" indicate cases in
which the server is aware that it has erred or is incapable of
performing the request. Except when responding to a HEAD request, the
server SHOULD include an entity containing an explanation of the error
situation, and whether it is a temporary or permanent condition. User
agents SHOULD display any included entity to the user. These response
codes are applicable to any request method.
10.5.1 500 Internal Server Error
The server encountered an unexpected condition which prevented it from
fulfilling the request.
My interpretation of paragraph 10.5 is that the server should provide a more detailed explanation of the error than the generic text of paragraph 10.5.1. However, I recognize that the message for status code 500 (paragraph 10.5.1) may well be considered sufficient. Confirmation of either interpretation is welcome!
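When deciding whether a failure is on the client or the server side, it can help to classify the status code before looking at the body. This is a minimal sketch using the class names from section 10 of RFC 2616; `status_class` is a hypothetical helper that handles one code at a time.

```r
# Hypothetical helper: map an HTTP status code to its RFC 2616 class name.
status_class <- function(code) {
  classes <- c(
    "1" = "Informational",
    "2" = "Successful",
    "3" = "Redirection",
    "4" = "Client Error",
    "5" = "Server Error"
  )
  key <- as.character(code %/% 100)
  if (key %in% names(classes)) unname(classes[key]) else "Unknown"
}

print(status_class(500))
print(status_class(401))
```

With httr, the code would come from `status_code(r)`, and `content(r, "text")` returns whatever explanatory entity the server chose to include.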
