Debugging RCurl-based authentication & form submission - r

SourceForge Research Data Archive (SRDA) is one of the data sources for my dissertation research. I'm having difficulty debugging the following issue related to SRDA data collection.
Data collection from SRDA requires authenticating and then submitting a Web form with an SQL query. When the query is processed successfully, the system generates a text file with the query results. While testing my R code for SRDA data collection, I changed the SQL request to make sure that the results file is being regenerated. However, I discovered that the file contents stay the same (they correspond to the previous query). I think the stale file contents could be caused by a failure of either the authentication or the query form submission. The following is the debug output from the code (https://github.com/abnova/diss-floss/blob/master/import/getSourceForgeData.R):
make importSourceForge
Rscript --no-save --no-restore --verbose getSourceForgeData.R
running
'/usr/lib/R/bin/R --slave --no-restore --no-save --no-restore --file=getSourceForgeData.R'
Loading required package: RCurl
Loading required package: methods
Loading required package: bitops
Loading required package: digest
Retrieving SourceForge data...
Checking request "SELECT *
FROM sf1104.users a, sf1104.artifact b
WHERE a.user_id = b.submitted_by AND b.artifact_id = 304727"...
* About to connect() to zerlot.cse.nd.edu port 80 (#0)
* Trying 129.74.152.47... * connected
> POST /mediawiki/index.php?title=Special:Userlogin&action=submitlogin&type=login HTTP/1.1
Host: zerlot.cse.nd.edu
Accept: */*
Content-Length: 37
Content-Type: application/x-www-form-urlencoded
* upload completely sent off: 37out of 37 bytes
< HTTP/1.1 200 OK
< Date: Tue, 11 Mar 2014 03:49:04 GMT
< Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.25 with Suhosin-Patch
< X-Powered-By: PHP/5.2.4-2ubuntu5.25
* Added cookie wiki_db_session="c61...a3c" for domain zerlot.cse.nd.edu, path /, expire 0
< Set-Cookie: wiki_db_session=c61...a3c; path=/
< Content-language: en
< Vary: Accept-Encoding,Cookie
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Cache-Control: private, must-revalidate, max-age=0
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=UTF-8
<
* Connection #0 to host zerlot.cse.nd.edu left intact
[1] "Before second postForm()"
* Re-using existing connection! (#0) with host zerlot.cse.nd.edu
* Connected to zerlot.cse.nd.edu (129.74.152.47) port 80 (#0)
> POST /cgi-bin/form.pl HTTP/1.1
Host: zerlot.cse.nd.edu
Accept: */*
Cookie: wiki_db_session=c61...a3c
Content-Length: 129
Content-Type: application/x-www-form-urlencoded
* upload completely sent off: 129out of 129 bytes
< HTTP/1.1 500 Internal Server Error
< Date: Tue, 11 Mar 2014 03:49:04 GMT
< Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.25 with Suhosin-Patch
< Vary: Accept-Encoding
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html
<
* Closing connection #0
Error: Internal Server Error
Execution halted
make: *** [importSourceForge] Error 1
I've tried to figure this out using the debug output as well as the network analyzer in Firefox's built-in Developer Tools, but so far without much success. I would appreciate any advice and help.
UPDATE:
if (!require(RCurl)) install.packages('RCurl')
if (!require(digest)) install.packages('digest')
library(RCurl)
library(digest)
# Users must authenticate to access Query Form
SRDA_HOST_URL <- "http://zerlot.cse.nd.edu"
SRDA_LOGIN_URL <- "/mediawiki/index.php?title=Special:Userlogin"
SRDA_LOGIN_REQ <- "&action=submitlogin&type=login"
# SRDA URL that Query Form sends POST requests to
SRDA_QUERY_URL <- "/cgi-bin/form.pl"
# SRDA URL of the text file with the query results
SRDA_QRESULT_URL <- "/qresult/blekh/blekh.txt"
# Parameters for result's format
DATA_SEP <- ":" # data separator
ADD_SQL <- "1" # add SQL to file
curl <- getCurlHandle()  # shared handle, so cookies persist between requests
srdaLogin <- function (loginURL, username, password) {
curlSetOpt(curl = curl, cookiejar = 'cookies.txt',
ssl.verifyhost = FALSE, ssl.verifypeer = FALSE,
followlocation = TRUE, verbose = TRUE)
params <- list('wpName1' = username, 'wpPassword1' = password)
if(url.exists(loginURL)) {
reply <- postForm(loginURL, .params = params, curl = curl,
style = "POST")
#if (DEBUG) print(reply)
info <- getCurlInfo(curl)
return(info$response.code == 200)
}
else {
stop("Can't access login URL!")
}
}
srdaConvertRequest <- function (request) {
return (list(select = "*",
from = "sf1104.users a, sf1104.artifact b",
where = "b.artifact_id = 304727"))
}
srdaRequestData <- function (requestURL, select, from, where, sep, sql) {
params <- list('uitems' = select,
'utables' = from,
'uwhere' = where,
'useparator' = sep,
'append_query' = sql)
if(url.exists(requestURL)) {
reply <- postForm(requestURL, .params = params, #.opts = opts,
curl = curl, style = "POST")
}
}
srdaGetData <- function(request) {
resultsURL <- paste(SRDA_HOST_URL, SRDA_QRESULT_URL,
collapse="", sep="")
results.query <- readLines(resultsURL, n = 1)
return (ifelse(results.query == request, TRUE, FALSE))
}
getSourceForgeData <- function (request) {
# Construct SRDA login and query URLs
loginURL <- paste(SRDA_HOST_URL, SRDA_LOGIN_URL, SRDA_LOGIN_REQ,
collapse="", sep="")
queryURL <- paste(SRDA_HOST_URL, SRDA_QUERY_URL, collapse="", sep="")
# Log into the system
if (!srdaLogin(loginURL, USER, PASS))
stop("Login failed!")
rq <- srdaConvertRequest(request)
srdaRequestData(queryURL,
rq$select, rq$from, rq$where, DATA_SEP, ADD_SQL)
if (!srdaGetData(request))
stop("Data collection failed!")
}
message("\nTesting SourceForge data collection...\n")
getSourceForgeData("SELECT *
FROM sf1104.users a, sf1104.artifact b
WHERE a.user_id = b.submitted_by AND b.artifact_id = 304727")
# clean up
close(curl)
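One thing worth checking in srdaGetData(): readLines(resultsURL, n = 1) returns only the first line of the file, while the request string passed to getSourceForgeData() spans three lines, so an exact == comparison can fail even when the query itself succeeded. A minimal sketch, assuming the SQL echoed into the results file (append_query = "1") has already been extracted as text, that collapses whitespace on both sides before comparing; normalize_sql() and same_query() are hypothetical helper names:

```r
# Hypothetical helpers: compare two SQL strings while ignoring differences in
# line breaks and runs of spaces (e.g. a multi-line request vs. a one-line echo).
normalize_sql <- function(x) {
  x <- paste(x, collapse = " ")      # join multiple lines into one string
  x <- gsub("[[:space:]]+", " ", x)  # collapse every whitespace run to one space
  trimws(x)                          # drop leading/trailing whitespace
}

same_query <- function(a, b) identical(normalize_sql(a), normalize_sql(b))

request <- "SELECT *
FROM sf1104.users a, sf1104.artifact b
WHERE a.user_id = b.submitted_by AND b.artifact_id = 304727"

# Assumed one-line form of the same query, as it might appear in the results file
echoed <- "SELECT * FROM sf1104.users a, sf1104.artifact b WHERE a.user_id = b.submitted_by AND b.artifact_id = 304727"

print(same_query(request, echoed))
```

Note that the query sent in the form (srdaConvertRequest()) also differs from the request string in its WHERE clause, so the two would not match even after normalization.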
UPDATE 2 (no functions version):
if (!require(RCurl)) install.packages('RCurl')
library(RCurl)
# Users must authenticate to access Query Form
SRDA_HOST_URL <- "http://zerlot.cse.nd.edu"
SRDA_LOGIN_URL <- "/mediawiki/index.php?title=Special:Userlogin"
SRDA_LOGIN_REQ <- "&action=submitlogin&type=login"
# SRDA URL that Query Form sends POST requests to
SRDA_QUERY_URL <- "/cgi-bin/form.pl"
# SRDA URL of the text file with the query results
SRDA_QRESULT_URL <- "/qresult/blekh/blekh.txt"
# Parameters for result's format
DATA_SEP <- ":" # data separator
ADD_SQL <- "1" # add SQL to file
message("\nTesting SourceForge data collection...\n")
curl <- getCurlHandle()
curlSetOpt(curl = curl, cookiejar = 'cookies.txt',
ssl.verifyhost = FALSE, ssl.verifypeer = FALSE,
followlocation = TRUE, verbose = TRUE)
# === Authentication ===
loginParams <- list('wpName1' = USER, 'wpPassword1' = PASS)
loginURL <- paste(SRDA_HOST_URL, SRDA_LOGIN_URL, SRDA_LOGIN_REQ,
collapse="", sep="")
if (url.exists(loginURL)) {
postForm(loginURL, .params = loginParams, curl = curl, style = "POST")
info <- getCurlInfo(curl)
message("\nLogin results - HTTP status code: ", info$response.code, "\n\n")
} else {
stop("Can't access login URL!")
}
# === Data collection ===
# Previous query was: "SELECT * FROM sf0305.users WHERE user_id < 100"
query <- list(select = "*",
from = "sf1104.users a, sf1104.artifact b",
where = "b.artifact_id = 304727")
getDataParams <- list('uitems' = query$select,
'utables' = query$from,
'uwhere' = query$where,
'useparator' = DATA_SEP,
'append_query' = ADD_SQL)
queryURL <- paste(SRDA_HOST_URL, SRDA_QUERY_URL, collapse="", sep="")
if(url.exists(queryURL)) {
postForm(queryURL, .params = getDataParams, curl = curl, style = "POST")
resultsURL <- paste(SRDA_HOST_URL, SRDA_QRESULT_URL,
collapse="", sep="")
results.query <- readLines(resultsURL, n = 1)
request <- paste(query$select, query$from, query$where)
if (results.query == request)
message("\nData request is successful, SQL query: ", request, "\n\n")
else
message("\nData request failed, SQL query: ", request, "\n\n")
} else {
stop("Can't access data query URL!")
}
close(curl)
UPDATE 3 (server-side debugging)
Finally, I was able to get in touch with a person responsible for the system and he helped me to narrow down the issue to cookie management IMHO. Here's the error log record, corresponding to running my code:
[Fri Mar 21 15:33:14 2014] [error] [client 54.204.180.203] [Fri Mar 21 15:33:14 2014] form.pl: /tmp/sess_3e55593e436a013597cd320e4c6a2fac: at /var/www/cgi-bin/form.pl line 43
The following is a snippet of the server-side Perl script that generated the error (line #1 of the script is the interpreter directive, so the reported line number 43 most likely corresponds to line 44 below):
42 if (-e "/tmp/sess_$file") {
43 $session = PHP::Session->new($cgi->cookie("$session_name"));
44 $user_id = $session->get('wsUserID');
45 $user_name = $session->get('wsUserName');
The following is the session information (1) after authentication and (2) after submitting the data request, obtained by tracing a manual authentication and a manual data request form submission:
(1) Set-Cookie values after authentication:
wiki_dbUserID=449; expires=Sun, 20-Apr-2014 21:04:14 GMT; path=/
wiki_dbUserName=Blekh; expires=Sun, 20-Apr-2014 21:04:14 GMT; path=/
wiki_dbToken=deleted; expires=Thu, 21-Mar-2013 21:04:13 GMT
(2) Cookies sent with the data request:
wiki_db_session=aaed058f97059174a59effe44b137cbc; _ga=GA1.2.2065853334.1395410153; EDSSID=e24ff5ed891c28c61f2d1f8dec424274; wiki_dbUserName=Blekh; wiki_dbLoggedOut=20140321210314; wiki_dbUserID=449
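Comparing the two traces is easier once the cookie strings are reduced to name=value pairs. The RCurl request in the debug output sent only wiki_db_session, whereas the browser also sent wiki_dbUserName and wiki_dbUserID after logging in, which suggests the scripted login did not actually succeed. A minimal base R sketch; parse_set_cookie() and cookie_header() are hypothetical helpers, and cookie attributes such as expires and path are simply dropped:

```r
# Hypothetical helper: reduce Set-Cookie header values to a named vector of
# cookie values, discarding attributes such as expires= and path=.
parse_set_cookie <- function(set_cookie) {
  pair <- trimws(sub(";.*$", "", set_cookie))  # keep the leading name=value part
  eq <- regexpr("=", pair, fixed = TRUE)       # position of the first "="
  stats::setNames(substring(pair, eq + 1L), substring(pair, 1L, eq - 1L))
}

# Hypothetical helper: build the value of a Cookie request header.
cookie_header <- function(cookies) {
  paste(names(cookies), cookies, sep = "=", collapse = "; ")
}

after_login <- c(
  "wiki_dbUserID=449; expires=Sun, 20-Apr-2014 21:04:14 GMT; path=/",
  "wiki_dbUserName=Blekh; expires=Sun, 20-Apr-2014 21:04:14 GMT; path=/"
)
cookies <- parse_set_cookie(after_login)
print(cookie_header(cookies))
```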
Would appreciate any help in figuring out the problem with my code!

Finally, finally, finally! I have figured out what was causing this problem, which gave me so much of a headache (figuratively and literally). It forced me to spend a lot of time reading various Internet resources (including many SO questions and answers), debugging my code, and communicating with people. The time was not spent in vain, as I learned a lot about RCurl, cookies, Web forms, and the HTTP protocol.
The reason turned out to be much simpler than I thought. While the immediate cause of the form submission failure was related to cookie management, the underlying cause was using the wrong parameter names (IDs) for the authentication form fields. The two pairs of names were very similar, and a single extra character was enough to trigger the whole problem.
Lesson learned: when facing issues, especially ones dealing with authentication, it's very important to check all names and IDs carefully, several times, to make sure they correspond to the ones the server expects. Thank you to everyone who helped or tried to help me with this issue!
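Browsers submit form fields under their name attribute, not their id, and the two can differ without it being obvious in the page. One way to apply this lesson is to compare the parameter names used in the R code against the name attributes in the form's HTML. A rough base R sketch; form_field_names(), unknown_params() and the sample form are hypothetical (the form is made up, not SRDA's real login page), and a regular expression is only a stand-in for a proper HTML parser:

```r
# Hypothetical helper: list the name="..." attributes found in an HTML snippet.
# Only double-quoted values are handled; this is a quick check, not a parser.
form_field_names <- function(html) {
  m <- regmatches(html, gregexpr('\\bname\\s*=\\s*"[^"]*"', html, perl = TRUE))[[1]]
  unique(sub('^name\\s*=\\s*"([^"]*)"$', "\\1", m, perl = TRUE))
}

# Hypothetical helper: request parameters that match no field name in the form.
unknown_params <- function(params, html) {
  setdiff(names(params), form_field_names(html))
}

# Made-up form in which each field's name and id differ by one character
login_form <- paste0(
  '<form action="/login">',
  '<input type="text" name="wpName" id="wpName1">',
  '<input type="password" name="wpPassword" id="wpPassword1">',
  '</form>'
)

print(unknown_params(list(wpName1 = "user", wpPassword1 = "pass"), login_form))
```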

I've simplified the code still further:
library(httr)
base_url <- "http://srda.cse.nd.edu"
loginURL <- modify_url(
base_url,
path = "mediawiki/index.php",
query = list(
title = "Special:Userlogin",
action = "submitlogin",
type = "login",
wpName1 = USER,
wpPassword1 = PASS
)
)
r <- POST(loginURL)
stop_for_status(r)
queryURL <- modify_url(base_url, path = "cgi-bin/form.pl")
query <- list(
uitems = "user_name",
utables = "sf1104.users a, sf1104.artifact b",
uwhere = "a.user_id = b.submitted_by AND b.artifact_id = 304727",
useparator = ":",
append_query = "1"
)
r <- POST(queryURL, body = query, multipart = FALSE)
stop_for_status(r)
But I'm still getting a 500. I tried:
setting extra cookies that I see in the browser (wiki_dbUserID, wiki_dbUserName)
setting header DNT to 1
setting referer to http://srda.cse.nd.edu/cgi-bin/form.pl
setting user-agent the same as chrome
setting accept "text/html"

The following clarifies the scenario (the error situation).
From RFC 2616 (the HTTP/1.1 specification):
10.5 Server Error 5xx
Response status codes beginning with the digit "5" indicate cases in
which the server is aware that it has erred or is incapable of
performing the request. Except when responding to a HEAD request, the
server SHOULD include an entity containing an explanation of the error
situation, and whether it is a temporary or permanent condition. User
agents SHOULD display any included entity to the user. These response
codes are applicable to any request method.
10.5.1 500 Internal Server Error
The server encountered an unexpected condition which prevented it from
fulfilling the request.
My interpretation of section 10.5 is that the server should provide a more detailed explanation of the error situation beyond the one given in section 10.5.1. However, I recognize that the message for status code 500 (section 10.5.1) may very well be considered sufficient. Confirmation of either interpretation is welcome!

Related

Posting a file to an outside API with a JSON web token

I am trying to upload a database file to an outside organization's API using R. I have a username and password, as well as a separate address for getting the token and another for uploading the file.
usr<-"username"
pw<-"passwood"
url <- "https://rooturl/api/"
Token='Token'
UploadFile='UploadFile'
#Get Token
r <- httr::POST(url = paste0(url,Token),
body = list(
UserName = usr,
Password = pw,
grant_type = "password"
), verbose())
tkn=jsonlite::prettify(httr::content(r, "text"))
This seems to work, as I can extract a token from the content.
> tkn
{
"result": {
"token": "eyJhbGciOiJIUzFAKEIsInR5cCI6IkpCJ9.eyJodHRwOi8vc2NoZW1hcy54bWxzb2FwLm9yZy93cy8yMDA1LzA1L2lkZW50aXR5L2NsYWltcy9uYW1lIjoiZ3JphzZSIsImp0aSI6IjUwNmIwN2MyLTTHISISFAKEIwMDUvMDUvaWRlbnRpdHkvY2xhaW1zL2VIVECHANGEDTHINGScyI6ImVtaWx5dGdyaWZmaXRoc0BiaW9zLmF1LmRrIiwiZXhwIjoxNTk4NzEwMTU3LCJpc3MiOiJ2bXNhcHAiLCJhdWQiOiJ2bXN1c2VycyJ9.z8sr-HT21u1bN7qCEXAMPLEONLY-TKAluO3k",
"expiration": "29 August 2020 16:09:17"
},
"id": 2,
"exception": null,
"status": 5,
"isCanceled": false,
"isCompleted": true,
"isCompletedSuccessfully": true,
"creationOptions": 0,
"asyncState": null,
"isFaulted": false
}
#re-formatting
tkn=jsonlite::fromJSON(content(r, "text"), simplifyVector = FALSE)
So this all seems OK. However, if I double-check the token in a JWT decoder, my correct information shows up in the payload, but at the bottom it claims the signature is invalid.
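On the "invalid signature" message: a JWT's header and payload are only base64url-encoded JSON, so any decoder can display them, but checking an HMAC signature (e.g. HS256) requires the secret key held by the server. A decoder that doesn't have that key will typically report the signature as invalid, which on its own doesn't mean the token is bad. A minimal base R sketch that decodes the first two parts without any verification; base64url_decode() and jwt_decode_unverified() are hypothetical helpers, and jsonlite::fromJSON() could then parse the resulting JSON text:

```r
# Hypothetical helper: decode a base64url string (RFC 4648, section 5) in base R.
base64url_decode <- function(x) {
  alphabet <- c(LETTERS, letters, 0:9, "-", "_")  # position = 6-bit value + 1
  chars <- strsplit(sub("=+$", "", x), "")[[1]]   # drop any "=" padding
  vals <- match(chars, alphabet) - 1L
  if (anyNA(vals)) stop("not a base64url string")
  # 6 bits per character, most significant bit first
  bits <- as.vector(vapply(vals, function(v) as.integer(intToBits(v))[6:1], integer(6)))
  bits <- bits[seq_len(length(bits) %/% 8L * 8L)] # drop leftover padding bits
  bytes <- colSums(matrix(bits, nrow = 8L) * 2^(7:0))
  rawToChar(as.raw(bytes))
}

# Hypothetical helper: decode header and payload of a JWT. No signature check!
jwt_decode_unverified <- function(token) {
  parts <- strsplit(token, ".", fixed = TRUE)[[1]]
  if (length(parts) != 3L) stop("expected header.payload.signature")
  list(header = base64url_decode(parts[1]), payload = base64url_decode(parts[2]))
}

print(base64url_decode("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"))
```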
Also, the auth_token variable is NULL in the request, and that doesn't seem right.
> r$request$auth_token
NULL
However, I can't test this because I cannot, for the life of me, figure out how to use this JWT to POST a file to rooturl/UploadFile. Every document I've read on POSTing to an API either doesn't explain how to include the JWT in the POST or isn't very clear about it. Is it in the header? Is it like this?
r2=POST(url=paste0(url,UploadFile), body = list(y = upload_file('O:/Igoturfilerighthere.h5')),
add_headers('Authorization' = paste("Bearer", tkn$result$token, sep = " ")), encode = "json", verbose())
Am I setting the headers incorrectly?
r3=POST(url=paste0(url,UploadFile), body = list(y = upload_file('O:/Igoturfilerighthere.h5')),
httr::add_headers("x-auth-token"=tkn$result$token), verbose())
For the r3 request I get a 401 error, which makes me think that I am on the correct path and that I am entering my token information incorrectly. If anyone could help guide me on the next step, I'd appreciate it. I just don't know where else to place that information.
UPDATE:
If I add encode = "json" to the initial request, it throws a 400 Bad Request error. This is how the website I am uploading to writes its own code. I've double-checked my username and password, and they are correct.
r <- httr::POST(url = paste0(url,Token),
body = list(
UserName = usr,
Password = pw,
grant_type = "password"
),encode = "json", verbose())
HTTP/1.1 400 Bad Request
Transfer-Encoding: chunked
Content-Type: application/problem+json; charset=utf-8
Server: Microsoft-IIS/10.0
Strict-Transport-Security: max-age=2592000
X-Powered-By: ASP.NET
So I reached out to the organization behind the API I was trying to access, and there were a few problems with my JWT request. This is the correct code:
r <- httr::POST(paste0(url,Token),
body = list(UserName = usr, password = pw),
encode = "form", verbose())
The big differences are that grant_type is removed and the body is sent with encode = "form", since I was logging in via a form on their site. With those changes, I was able to upload a file using the following:
r2=POST(url=paste0(url,UploadFile), body = list(fileToUpload = httr::upload_file('O:/IGotUrFileHere.h5')),
httr::add_headers('Authorization' = paste("Bearer", tkn$result$token, sep = " ")), verbose())
Again, verbose() isn't necessary; it just helps with troubleshooting. Good luck!

how to get data from the WTO API in R

library(httr)
library(jsonlite)
headers = c(
# Request headers
'Ocp-Apim-Subscription-Key' = '{subscription key}'
)
params = list()
# Request parameters
params['countries[]'] = '{array}'
resp <- GET(paste0("https://api.wto.org/tfad/transparency/procedures_contacts_single_window?"
, paste0(names(params),'=',params,collapse = "&")),
add_headers(headers))
if(!http_error(resp)){
jsonRespText <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))$Dataset
jsonRespText
}else{
stop('Error in Response')
}
I don't know how to get a response from an API in R. I have run this code, but the server is not responding...
If you examine the value of the resp object after running your code you'll notice a status code:
> resp
Response [https://tfadatabase.org/api/transparency/procedures_contacts_single_window?countries[]=%7Barray%7D]
Date: 2020-04-17 19:25
Status: 422
Content-Type: application/json
Size: 77 B
So the server actually did respond, it just didn't give you what you were hoping for. In the API documentation we can look up this code:
422 Unprocessable Entity
If a member cannot be found, or the request parameters are poorly
formed.
So I just went to the Query Builder and looked for a valid request URL and updated the code. It ran fine - i.e. Status 200.
This was the URL I used in the code:
https://api.wto.org/timeseries/v1/data?i=TP_A_0100&r=000&fmt=json&mode=full&lang=1&meta=false
and the value of resp was
Date: 2020-04-17 19:30
Status: 200
Content-Type: application/json; charset=utf-8
Size: 88 B
I cut the subscription key out of my results above. You can find the Query Builder here. Incidentally, the Query Builder automatically includes the subscription key and other "header" info in the URL. You can either remove that first and re-add it in your code, or just change your code to run GET() directly on their version of the URL.
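The question's code pastes parameter names and values into the URL without percent-encoding them, which is fragile for names such as countries[] and for placeholder values such as {array} (which must in any case be replaced by real values). A small base R sketch; build_query() is a hypothetical helper. With httr, passing a list to the query argument of GET() does the encoding for you and is probably the simpler route.

```r
# Hypothetical helper: build a percent-encoded query string from a named list,
# using base R's utils::URLencode() on every name and value.
build_query <- function(params) {
  enc <- function(x) vapply(as.character(x), utils::URLencode, character(1),
                            reserved = TRUE, USE.NAMES = FALSE)
  keys <- rep(names(params), lengths(params))  # repeat the key for multi-valued params
  vals <- unlist(params, use.names = FALSE)
  paste(enc(keys), enc(vals), sep = "=", collapse = "&")
}

print(build_query(list("countries[]" = "{array}")))
print(build_query(list(i = "TP_A_0100", r = "000", fmt = "json")))
```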

Calling a REST API in R

I recently discovered the DataForSEO API and tried to call it via R:
library(httr)
username <- 'mygmailadress#gmail.com'
password <- 'mypassword'
dataforseo_api <- POST('https://api.dataforseo.com/v2/op_tasks_post/$data',
authenticate(username,password),
body = list(grant_type = 'client_credentials'),
type = "basic",
verbose()
)
This is the message I have received:
<- HTTP/1.1 401 Unauthorized
<- Server: nginx/1.14.0 (Ubuntu)
<- Date: Sun, 08 Jul 2018 13:31:34 GMT
<- Content-Type: application/json
<- Transfer-Encoding: chunked
<- Connection: keep-alive
<- WWW-Authenticate: Basic realm="Rest Server"
<- Cache-Control: no-cache, must-revalidate
<- Expires: 0
<- Access-Control-Allow-Origin: *
<- Access-Control-Allow-Methods: POST, GET, OPTIONS
<- Access-Control-Allow-Headers: Content-Type, Access-Control-Allow-Headers, Authorization, X-Requested-With
Do you know where my issue comes from? Can you please help?
It looks like you're improperly configuring config. I don't see a config= in your code. The body is also not encoded correctly.
Also, in the API documentation I don't see anything about grant_type. It looks like an array of tasks should go there, e.g. something like:
{882394209: {'site': 'ranksonic.com', 'crawl_max_pages': 10}}
Response:
{'results_count': 1, 'results_time': '0.0629 sec.', 'results': {'2308949': {'post_id': 2308949, 'post_site': 'ranksonic.com',
'task_id': 882394209, 'status': 'ok'}}, 'status': 'ok'}
OK, so first off we need set_config or config=:
username <- 'Hack-R#stackoverflow.com' # fake email
password <- 'vxnyM9s7FAKESeIO' # fake password
set_config(authenticate(username,password), override = TRUE)
GET("https://api.dataforseo.com/v2/cmn_se")
Response [https://api.dataforseo.com/v2/cmn_se]
Date: 2018-07-08 16:20
Status: 200
Content-Type: application/json
Size: 551 kB
{
"status": "ok",
"results_time": "0.0564 sec.",
"results_count": 2187,
"results": [
{
"se_id": 37,
"se_name": "google.com.af",
"se_country_iso_code": "AF",
"se_country_name": "Afghanistan",
...
GET("https://api.dataforseo.com/v2/cmn_se/$country_iso_code")
Response [https://api.dataforseo.com/v2/cmn_se/$country_iso_code]
Date: 2018-07-08 15:48
Status: 200
Content-Type: application/json
Size: 100 B
{
"status": "ok",
"results_time": "0.0375 sec.",
"results_count": 0,
"results": []
GET("https://api.dataforseo.com/v2/cmn_se/$op_tasks_post")
Response [https://api.dataforseo.com/v2/cmn_se/$op_tasks_post]
Date: 2018-07-08 16:10
Status: 200
Content-Type: application/json
Size: 100 B
{
"status": "ok",
"results_time": "0.0475 sec.",
"results_count": 0,
"results": []
That was one thing. Also, to POST data they need you to send it as JSON, e.g. encode = "json". From their docs:
All POST data should be sent in the JSON format (UTF-8 encoding). The
keywords are sent by POST method passing tasks array. The data should
be specified in the data field of this POST array. We recommend to
send up to 100 tasks at a time.
Further:
The task setting is done using POST method when array of tasks is sent to
the data field. Each of the array elements has the following
structure:
It then goes on to list two required fields and many optional ones.
Note also that calling reset_config() afterwards is better practice. If you're going to run this a lot, share it, or use more than one computer, I would also suggest putting your credentials in environment variables instead of in your script, for security and convenience.
Another final word of advice: you may want to simply leverage their published Python client library and large compilation of examples. Since you'll be pioneering every new API request in R without their support, it may pay off to do the data collection in Python.
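A minimal sketch of the environment-variable suggestion; get_credentials() and the variable names DATAFORSEO_LOGIN and DATAFORSEO_PASSWORD are hypothetical placeholders. R reads ~/.Renviron at startup, so lines such as DATAFORSEO_LOGIN=... placed there are available through Sys.getenv():

```r
# Hypothetical helper: read API credentials from environment variables instead
# of hard-coding them in the script. The variable names are placeholders.
get_credentials <- function(user_var = "DATAFORSEO_LOGIN",
                            pass_var = "DATAFORSEO_PASSWORD") {
  user <- Sys.getenv(user_var)  # "" when the variable is not set
  pass <- Sys.getenv(pass_var)
  if (!nzchar(user) || !nzchar(pass)) {
    stop("Set ", user_var, " and ", pass_var, " (e.g. in ~/.Renviron)", call. = FALSE)
  }
  list(username = user, password = pass)
}

# Usage with httr (not run here):
# creds <- get_credentials()
# set_config(authenticate(creds$username, creds$password), override = TRUE)
# ... requests ...
# reset_config()
```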
This is an interesting API. If you get over to the Open Data Stack Exchange you should consider sharing it with that community.

R: fetching pdf documents from Companies House API

I'm trying to fetch documents from the API using R. I appreciated the clarification of the process in this post. I've been following the steps below with partial success, but I still fail at the last step, getting access to the documents' content:
1. Find the document filing you're interested in (e.g. make a filing history request for the company). Parse the response for the link to the document in the field "links" : { "document_metadata" : "link URI fragment here" }.
No problem:
library(httr)
library(jsonlite)
library(openssl)
### retrieving filing history ####
company_num = 'FC013908'
key = 'my_key'
fh_path = paste0('/company/', toupper(company_num), "/filing-history")  # base R; str_to_upper() would need stringr
fh_url <- modify_url("https://api.companieshouse.gov.uk/", path = fh_path)
fh_test <- GET(fh_url, authenticate(key, "")) #status_code = 200
fh_parsed <- jsonlite::fromJSON(content(fh_test, "text",encoding = "utf-8"), flatten = TRUE)
docs <- fh_parsed$items
Done.
2. For a given document, request the document metadata via the CH Document API. Parse the response to get the document (MIME) types available and the link to the actual document data (document URI fragment).
No problems here:
md_meta_url = docs$links.document_metadata[1]
key_pass <- paste0(key,":")
decoded_auth <- paste0('Basic ', base64_encode(key_pass))
md_test <- GET(md_meta_url,
add_headers(Authorization = decoded_auth)
)
md_test #status_code = 200!
md_parsed <- jsonlite::fromJSON(content(md_test, "text",encoding = "utf-8"), flatten = TRUE)
This way I can obtain the content URL:
cont_url = md_parsed$links$document
3. Request the actual document, specifying the MIME type (e.g. "application/pdf").
I do this without following the redirect and, as expected, I get a 302 status code with a location header:
accept = 'application/pdf'
cont_test <- GET(cont_url,
add_headers(Authorization = decoded_auth,
Accept = accept),
config(followlocation = FALSE)
)
final_url <- cont_test$headers$location
> final_url
[1] "https://s3-eu-west-1.amazonaws.com/document-api-images-prod/docs/LjBouRHeXXpIYAvqYIPWL06iXaliPz6Pucp1OXCXQhI/application-pdf?AWSAccessKeyId=ASIAJX7TVURFXZTY5DNQ&Expires=1529483765&Signature=uUQx6RTW7XBLqx4L6pYr5tOUySg%3D&x-amz-security-token=FQoDYXdzEP%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDGxe7meYGe3OYhNwcSK3AwcVYJUXaUMf19oVO9s4qNPWN8AHjNNd5rrZhgE9YTkF1OmzyZSL5xHbls664kDP%2Bxd7dz9PIU5O1D%2BVxoDyoYcFiS6acDnO28KpfFE56lUZNfedf1jys%2FP0SJ8f%2F50Cbn93bfOlm0MZA9%2BQ2DYQvPfkWSvrDjMyCXHbu57gpZHjQKPNRTgzGXzUUCvFwREytGMM4eThhn4Glvvx%2FA8IiLbnsvgmEKw9iAj7KWIenhoJq3cTRytUpVeipLnQoBVLau8dFYkKdAHZaYM2Tlx0z6ObRb%2BGdm7W7eOVA1bFXuUXmUmnAHruDIwwLlgOVN2IJ9CxmJU22lY8jrEm%2BUivtrdp2oofn32PryBEJ8jJOg9cIpLbBBx%2FeOkng9zJwnZbute7Nmh%2BnaY2btsId6JjraFNsTvR%2B1qEZX9uuznUdJdqgVfTMj2gGrAmntwk0JAkILlvamzjWC%2F9vAqK7Xvt8aC6hlIMB2vdzTCU9Jf%2FrIMTClTJkk0BzBuvJ86t1l%2BXb4rF5Pab%2FegFpJ6nvZKqde%2F77wMMiTyG35EndmYx4AWqTIh9EofYwKZa9uciNvRT0E2%2BYnT5jZMo%2BdWn2QU%3D"
However, when I try the last step (4. Request this URI from Amazon, again passing the content type you want), I get a 400 error:
final_test <- GET(final_url,
add_headers(Authorization = decoded_auth,
Accept = accept
))
> final_test
Response [https://s3-eu-west-1.amazonaws.com/document-api-images-prod/docs/LjBouRHeXXpIYAvqYIPWL06iXaliPz6Pucp1OXCXQhI/application-pdf?AWSAccessKeyId=ASIAJX7TVURFXZTY5DNQ&Expires=1529483765&Signature=uUQx6RTW7XBLqx4L6pYr5tOUySg%3D&x-amz-security-token=FQoDYXdzEP%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDGxe7meYGe3OYhNwcSK3AwcVYJUXaUMf19oVO9s4qNPWN8AHjNNd5rrZhgE9YTkF1OmzyZSL5xHbls664kDP%2Bxd7dz9PIU5O1D%2BVxoDyoYcFiS6acDnO28KpfFE56lUZNfedf1jys%2FP0SJ8f%2F50Cbn93bfOlm0MZA9%2BQ2DYQvPfkWSvrDjMyCXHbu57gpZHjQKPNRTgzGXzUUCvFwREytGMM4eThhn4Glvvx%2FA8IiLbnsvgmEKw9iAj7KWIenhoJq3cTRytUpVeipLnQoBVLau8dFYkKdAHZaYM2Tlx0z6ObRb%2BGdm7W7eOVA1bFXuUXmUmnAHruDIwwLlgOVN2IJ9CxmJU22lY8jrEm%2BUivtrdp2oofn32PryBEJ8jJOg9cIpLbBBx%2FeOkng9zJwnZbute7Nmh%2BnaY2btsId6JjraFNsTvR%2B1qEZX9uuznUdJdqgVfTMj2gGrAmntwk0JAkILlvamzjWC%2F9vAqK7Xvt8aC6hlIMB2vdzTCU9Jf%2FrIMTClTJkk0BzBuvJ86t1l%2BXb4rF5Pab%2FegFpJ6nvZKqde%2F77wMMiTyG35EndmYx4AWqTIh9EofYwKZa9uciNvRT0E2%2BYnT5jZMo%2BdWn2QU%3D]
Date: 2018-06-20 08:37
Status: 400
Content-Type: application/xml
Size: 523 B
<BINARY BODY>
Needless to say, executing
browseURL(final_test$url)
returns an Access Denied error. I suspect it may have something to do with Amazon authorization problems similar to those described here. Any ideas how to clear this final hurdle?
Thanks!
The answer was provided by #voracityemail in response to my question on Companies House Developers Hub. Basically, the final call doesn't require the Authorization header, so if you run the following code for final_test:
final_test <- GET(final_url, add_headers(Accept = accept))
it returns status 200:
> final_test
Response [https://s3-eu-west-1.amazonaws.com/document-api-images-prod/docs/Rl1qKy2kNqdskHUIsqU9u0bGzH2goTfJfnCrNg4S0lg/application-pdf?AWSAccessKeyId=ASIAJMG7NTZHYC4NH3MA&Expires=1530093768&Signature=EteMSmwXS%2FqqdOFRmYY%2Fgf187Aw%3D&x-amz-security-token=FQoDYXdzELf%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDOMKrcNPR6jb5bnzGSK3A1yzaoVZWhgAeXYCN9WJnxx8b%2BTKCEEZyZui3aR5j0WoNWIQhW9GIQ8R4xTGVkRjwQIhzgDp%2BRCfXGQ0CfPCOfseaQri5m%2BWTEWBgjfToL7%2FMdcC1IINMTFRrih1APE%2FmmTcQaW7SvyZWv3Q4bVQB%2FtOsiX5k8rWVsT7%2FecfQmnJMljcKF0%2F3vDRTtLRURTCtrdegfnIFrSqXkelLxVVypKY9UeURBgxAgngOgoP7YhYt3wD%2BEz5rBdNfMvF1Zuv91hLGDyBaKuV4fRKMRXlymDHCwNgNZl3JeyuAmnX8pexK6PJzH7MerM8QX8LoPfge1yutvqEj0%2FjRSYEShOWUebecQ2tJqWIEOZly0Ji8fc%2BMtFDO1FWZBrMl6lXgkwTMpELnTH5%2BP4ULMdFfEz30bWSnAuTGXcAxsoFWsFTIE2uO35zgkOsAUT2un4UNGnL2S8XexWbgwq%2B%2Bhtxo9ruP9WA8mTpjBkup2Qe5EpvUiNwGX9APjThi7QFTllVWWvpKgzKTSBh%2Btua9xK8RgiNAYDgEa5k%2BH%2FmWIP56WglBE6r3HGsXgbi%2Bff8Rg8z2lVFLo8f9hVv%2BCYoptXM2QU%3D]
Date: 2018-06-27 10:02
Status: 200
Content-Type: application/pdf
Size: 21.7 kB
<BINARY BODY>
and then
browseURL(final_test$url)
will open the specified document in the browser. Victory!
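A likely explanation for the 400 is that the Amazon URL is pre-signed: its Signature and x-amz-security-token query parameters already authenticate the request, and S3 generally refuses a request that also carries an Authorization header. A small base R sketch of the rule applied in the fix; url_host() and headers_for() are hypothetical helpers, and api.example.org in the test stands in for the real API host. With httr, the result could be passed as add_headers(.headers = headers_for(h, cont_url, final_url)).

```r
# Hypothetical helper: extract the (lower-cased) host part of an absolute URL.
url_host <- function(url) {
  tolower(sub("^[A-Za-z][A-Za-z0-9+.-]*://([^/?#:]+).*$", "\\1", url))
}

# Hypothetical helper: drop the Authorization header when the next URL is on a
# different host, e.g. a pre-signed S3 link that carries its own credentials.
headers_for <- function(headers, from_url, to_url) {
  if (url_host(from_url) != url_host(to_url)) {
    headers <- headers[tolower(names(headers)) != "authorization"]
  }
  headers
}

h <- c(Authorization = "Basic abc=", Accept = "application/pdf")
print(headers_for(h, "https://api.example.org/document/abc/content",
                  "https://s3-eu-west-1.amazonaws.com/bucket/doc?Signature=xyz"))
```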

Post to Tumblr API with R - Error "fromJSON" only for photo posts

I have been working with R to post to tumblr through their API. I use the R package tumblR. Everything works fine for text- and link posts but I keep getting errors when trying to post photos.
I'm not sure what the issue is here; maybe the syntax I use to insert the link for the photo is wrong? I tried to debug it but could not solve it. I was hoping you could help me.
I am also posting the code for text and link posts; maybe some of you will find it useful.
[R CODE]
require(tumblR)
require(httpuv)
### Authorize
consumer_key <-'key'
consumer_secret <- 'secret'
appname <- 'appname'
tokenURL <- 'http://www.tumblr.com/oauth/request_token'
accessTokenURL <- 'http://www.tumblr.com/oauth/access_token'
authorizeURL <- 'http://www.tumblr.com/oauth/authorize'
app <- oauth_app(appname, consumer_key, consumer_secret)
endpoint <- oauth_endpoint(tokenURL, authorizeURL, accessTokenURL)
token <- oauth1.0_token(endpoint, app)
sig <- sign_oauth1.0(app,
token = token$credentials$oauth_token,
token_secret = token$credentials$oauth_token_secret)
### Post Text
post(base_hostname = "blogname.tumblr.com", type = "text", state = "published", tags = 'tag',
body = 'this is the body', token = token, consumer_key = consumer_key, consumer_secret = consumer_secret)
# => Shows: "* Hostname was NOT found in DNS cache" but posts the textpost to tumblr
### Post Link
post(base_hostname = "blogname.tumblr.com", type = "link", state = "published", tags = 'tag', url_link= 'www.somelink.de',
title_link= 'linkTitle', description= 'this is the description', token = token, consumer_key = consumer_key, consumer_secret = consumer_secret)
# => Shows: "* Hostname was NOT found in DNS cache" but posts the linkpost to tumblr
### Post Photo
post(base_hostname = "blogname.tumblr.com", type = "photo", tags = "tag", caption_photo = 'photoTitle',
link = "http://bilder.bild.de/fotos/bde-logo-35166394/Bild/20.bild.png",
source_photo = "http://bilder.bild.de/fotos/bde-logo-35166394/Bild/20.bild.png", data_photo= NA,
token = token, consumer_key = consumer_key, consumer_secret = consumer_secret)
# => Shows the following error and does NOT (!) post the photo:
# * Hostname was NOT found in DNS cache
# * Trying 66.6.41.23...
# * Connected to api.tumblr.com (66.6.41.23) port 80 (#0)
# > POST /v2/blog/blogname.tumblr.com/post HTTP/1.1
# User-Agent: RCurl
# Host: api.tumblr.com
# Accept: application/json
# Authorization: ...
# Content-Length: 490
# Content-Type: application/x-www-form-urlencoded
#
# * upload completely sent off: 490 out of 490 bytes
# < HTTP/1.1 401 Not Authorized (!!!!!!)
# < Server: nginx
# < Date: Sun, 22 Mar 2015 12:14:43 GMT
# < Content-Type: application/json; charset=utf-8
# < Transfer-Encoding: chunked
# < Connection: close
# * Closing connection 0
# Warning message:
# In if (class(token) != "Token1.0") stop("token must be a Token1.0 type") :
# the condition has length > 1 and only the first element will be used
# Error in fromJSON(http.connection(url, token, bodyParams, consumer_key, :
# error in evaluating the argument 'content' in selecting a method for function 'fromJSON': Error: Not Authorized
From what I researched, the tumblr API may return "401 Not Authorized" simply because the syntax of the request is incorrect; it does not have to be related to the authorization (key, secret, etc.) itself. As the same credentials work for text and link posts, I believe it is a different problem.
The tumblR documentation says to use either "data_photo" or "source_photo", so I guess NA for one of them is OK. I tried both anyway.
Any help is appreciated, thanks!!!
A new version (1.1) of tumblR has been released.
With this version:
the dependency on the ROAuth package has been eliminated;
the "post" function with "photo" type has been fixed;
the warning message about token has been removed.
Thanks for reporting!
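For context on the warning in the question: R6 objects (httr's Token1.0 among them) have a class vector with several elements, so class(token) != "Token1.0" yields several logicals and if() complained that the condition has length > 1 (since R 4.2.0 this is an error rather than a warning). inherits() is the usual way to test class membership. A small illustration with a stand-in object; check_token() is hypothetical, not the actual tumblR code:

```r
# Stand-in object with a multi-element class vector, like an R6 Token1.0 object
tok <- structure(list(), class = c("Token1.0", "Token", "R6"))

print(class(tok) != "Token1.0")  # one logical per class element: FALSE TRUE TRUE

# Hypothetical check written with inherits(), which returns a single TRUE/FALSE
check_token <- function(token) {
  if (!inherits(token, "Token1.0")) stop("token must be a Token1.0 type")
  invisible(TRUE)
}

print(check_token(tok))
```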
