Error connecting to Azure Blob Storage API from R

I am attempting to work with Azure storage via the REST API in R. I'm using the package httr which overlays Curl.
Setup
You can use R-fiddle: http://www.r-fiddle.org/#/fiddle?id=vh8uqGmM
library(httr)
requestdate<-format(Sys.time(),"%a, %d %b %Y %H:%M:%S GMT")
url<-"https://preconstuff.blob.core.windows.net/pings?restype=container&comp=list"
sak<-"Q8HvUVJLBJK+wkrIEG6LlsfFo19iDjneTwJxX/KXSnUCtTjgyyhYnH/5azeqa1bluGD94EcPcSRyBy2W2A/fHQ=="
signaturestring<-paste0("GET",paste(rep("\n",12),collapse=""),
"x-ms-date:",requestdate,"
x-ms-version:2009-09-19
/preconstuff/pings
comp:list
restype:container")
headerstuff<-add_headers(Authorization=paste0("SharedKey preconstuff:",
RCurl::base64(digest::hmac(key=sak,
object=enc2utf8(signaturestring),
algo= "sha256"))),
`x-ms-date`=requestdate,
`x-ms-version`= "2009-09-19")
Trying to list blobs:
content(GET(url,config = headerstuff, verbose() ))
Error
Top level message
The MAC signature found in the HTTP request 'Q8HvUVJLBJK+wkrIEG6LlsfFo19iDjneTwJxX/KXSnUCtTjgyyhYnH/5azeqa1bluGD94EcPcSRyBy2W2A/fHQ==' is not the same as any computed signature.
Response content
[1] "<?xml version=\"1.0\" encoding=\"utf-8\"?><Error>
<Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:1ab26da5-0001-00dc-6ddb-15e35c000000\nTime:2015-03-26T17:51:42.7190620Z</Message>
<AuthenticationErrorDetail>The MAC signature found in the HTTP request 'NTM1ODZjMjhhZmMyZGM3NDM0YTFjZDgwNGE0ODVmMzVjNDhkNjBkNzk1ZjNkZjJjOTNlNjUxYTMwMjRhNzNlYw==' is not the same as any computed signature. Server used following string to sign:
'GET\n\n\n\n\n\n\n\n\n\n\n\nx-ms-date:Thu, 26 Mar 2015 17:52:37 GMT\nx-ms-version:2009-09-19\n/preconstuff/pings\ncomp:list\nrestype:container'.
</AuthenticationErrorDetail></Error>"
Verbose output
-> GET /pings?restype=container&comp=list HTTP/1.1
-> User-Agent: curl/7.39.0 Rcurl/1.95.4.5 httr/0.6.1
-> Host: preconstuff.blob.core.windows.net
-> Accept-Encoding: gzip
-> Accept: application/json, text/xml, application/xml, */*
-> Authorization: SharedKey preconstuff:OTRhNTgzYmY3OTY3M2UzNjk3ODdjMzk3OWM3ZmU0OTA4MWU5NTE2OGYyZGU3YzRjNjQ1M2NkNzY0ZTcyZDRhYQ==
-> x-ms-date: Thu, 26 Mar 2015 17:56:27 GMT
-> x-ms-version: 2009-09-19
->
<- HTTP/1.1 403 Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
<- Content-Length: 719
<- Content-Type: application/xml
<- Server: Microsoft-HTTPAPI/2.0
<- x-ms-request-id: 3d47770c-0001-0085-2313-6d466f000000
<- Date: Thu, 26 Mar 2015 17:56:27 GMT
<-
Resolving the error
Googling for this issue doesn't seem to yield a consistent cause, but it's likely due to bad formatting / request structure on my part. To that end I checked:
I've verified my key is correct (it's just c&p from portal)
I've made sure the date is correctly formatted
There was a recent DocumentDB SO question that suggested it could be a clock skew issue, and I do note that my x-ms-date is a second ahead of the Date in the response. I've tried sending a fixed value that was definitely in the past, but within the 15-minute tolerance. The message didn't change.
Added encoding="Base64" in headerstuff further to an MSDN forum question but the same error message was returned
Further to @Serdar's answer, I incorporated the construction of a signature string (I've verified that this matches the one provided in the error message), then base64-encoded an HMAC-SHA256 of the UTF-8 converted signaturestring (using the secondary access key (sak) as the key) as the value to be used in the SharedKey authorisation.
Further to @Serdar's comment, the date used in the signature string and for the main request must be the same, so it is defined once and reused
Is there anything obviously wrong? Are there other things to check? Does the code work for others?

Looks like your problem is with the key. The string of the key you have provided is actually base64 encoded. You need to decode that to the raw vector before you use it to sign the request. For example:
url<-"https://preconstuff.blob.core.windows.net/pings?restype=container&comp=list"
sak<-"Q8HvUVJLBJK+wkrIEG6LlsfFo19iDjneTwJxX/KXSnUCtTjgyyhYnH/5azeqa1bluGD94EcPcSRyBy2W2A/fHQ=="
requestdate<-format(Sys.time(),"%a, %d %b %Y %H:%M:%S %Z", tz="GMT")
signaturestring<-paste0("GET",paste(rep("\n",12),collapse=""),
"x-ms-date:",requestdate,"
x-ms-version:2009-09-19
/preconstuff/pings
comp:list
restype:container")
headerstuff<-add_headers(Authorization=paste0("SharedKey preconstuff:",
RCurl::base64(digest::hmac(key=RCurl::base64Decode(sak, mode="raw"),
object=enc2utf8(signaturestring),
algo= "sha256", raw=TRUE))),
`x-ms-date`=requestdate,
`x-ms-version`= "2009-09-19")
content(GET(url,config = headerstuff, verbose() ))
There are no more authentication errors this way, though no blobs are listed. Perhaps that's a different issue.
Also, I changed the way the date/time was created to more "safely" change the local time to GMT.
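To see why the decoding and raw=TRUE matter, here is a minimal sketch, assuming the digest and RCurl packages used above are installed. The key and string-to-sign are dummy values made up for illustration, not real credentials. Signing with the undecoded key and a hex digest produces an 88-character value like the long signatures in the question's error output, while the decoded key with a raw digest produces the 44-character form.

```r
library(digest)
library(RCurl)

# Dummy values for illustration only; not a real account key.
dummy_key_b64  <- "ZHVtbXkta2V5"   # base64 of the text "dummy-key"
string_to_sign <- "GET\n\nx-ms-date:Thu, 26 Mar 2015 17:52:37 GMT"

# Wrong: key used as text, hex digest (64 characters) then base64-encoded.
wrong <- as.character(RCurl::base64Encode(
  digest::hmac(key = dummy_key_b64, object = string_to_sign, algo = "sha256")))

# Right: key decoded to raw bytes, raw digest (32 bytes) then base64-encoded.
right <- as.character(RCurl::base64Encode(
  digest::hmac(key = RCurl::base64Decode(dummy_key_b64, mode = "raw"),
               object = enc2utf8(string_to_sign),
               algo = "sha256", raw = TRUE)))

nchar(wrong)  # 88
nchar(right)  # 44
```

A signature of the wrong length is a quick hint that either the key or the digest is still in text form.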

It looks like you are using the key of your account directly in the Authorization header. To authenticate a request, you must sign the request with the key for the account that is making the request and pass that signature as part of the request. Please see Authentication for the Azure Storage Services for more information on how to construct the Authorization header.
Please also note that the service returns StringToSign in the error response. So, what your code should have done is to apply the following formula to StringToSign="GET\n\n\n\n\n\n\n\n\n\n\n\nx-ms-date:Wed, 25 Mar 2015 22:24:12 GMT\nx-ms-version:2014-02-14\n/preconstuff/pings\ncomp:list\nrestype:container" (without quotes):
Signature=Base64(HMAC-SHA256(AccountKey, UTF8(StringToSign)))
How the service calculates StringToSign is explained in detail in the link shared above.
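Because the service echoes its own StringToSign, it is worth comparing it byte for byte with the locally built one before looking at the key. The following is a small sketch in base R; build_string_to_sign is a hypothetical helper name, and it only covers the simple case from this question where all 11 standard headers are empty.

```r
# Hypothetical helper: verb, 11 empty standard headers, the two x-ms headers,
# then the canonicalized resource lines, all joined with "\n".
build_string_to_sign <- function(verb, ms_date, ms_version, resource_lines) {
  standard_headers <- rep("", 11)  # Content-Encoding ... Range, all empty here
  paste(c(verb, standard_headers,
          paste0("x-ms-date:", ms_date),
          paste0("x-ms-version:", ms_version),
          resource_lines),
        collapse = "\n")
}

mine <- build_string_to_sign(
  "GET", "Wed, 25 Mar 2015 22:24:12 GMT", "2014-02-14",
  c("/preconstuff/pings", "comp:list", "restype:container"))

# The string reported by the service in the error response above.
from_service <- paste0(
  "GET", strrep("\n", 12),
  "x-ms-date:Wed, 25 Mar 2015 22:24:12 GMT\n",
  "x-ms-version:2014-02-14\n",
  "/preconstuff/pings\ncomp:list\nrestype:container")

identical(mine, from_service)  # TRUE
```

If the two strings are identical and authentication still fails, the remaining suspects are the key handling and the encoding of the digest.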

I worked through the example code by MrFlick above, and to get it to work I had to change a few things.
The date string has to be set in an English locale, for example:
lct <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "us")
requestdate <- format(Sys.time(),"%a, %d %b %Y %H:%M:%S %Z", tz="GMT")
Sys.setlocale("LC_TIME", lct)
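Locale names such as "us" are platform specific, so as an alternative here is a sketch that avoids touching the locale at all by looking up the English day and month names directly. The function name rfc1123_date is made up for this example; httr::http_date() is another option for producing the same format.

```r
# Hypothetical helper: format a time as an HTTP date in GMT,
# independent of the current LC_TIME locale.
rfc1123_date <- function(t = Sys.time()) {
  lt <- as.POSIXlt(t, tz = "GMT")
  days   <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
  months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
  sprintf("%s, %02d %s %04d %02d:%02d:%02d GMT",
          days[lt$wday + 1L],            # wday is 0-based, Sunday = 0
          as.integer(lt$mday),
          months[lt$mon + 1L],           # mon is 0-based
          as.integer(lt$year + 1900),    # year counts from 1900
          as.integer(lt$hour),
          as.integer(lt$min),
          as.integer(floor(lt$sec)))
}

rfc1123_date(as.POSIXct("2015-03-26 17:52:37", tz = "GMT"))
# "Thu, 26 Mar 2015 17:52:37 GMT"
```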
The 'signaturestring' should be formatted with \n between parameters:
signaturestring <- paste0("GET", paste(rep("\n", 12), collapse=""),
"x-ms-date:", requestdate,
"\nx-ms-version:2009-09-19\n/preconstuff/pings\ncomp:list\nrestype:container")
EDIT: The following procedure works for me. It is based on Steph Locke's example.
library(httr)
library(RCurl)
azureBlobCall <- function(url, verb, key, requestBody=NULL, headers=NULL, ifMatch="", md5="") {
urlcomponents <- httr::parse_url(url)
account <- gsub(".blob.core.windows.net", "", urlcomponents$hostname, fixed = TRUE)
container <- urlcomponents$path
# get timestamp in us locale
lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "us")
`x-ms-date` <- format(Sys.time(),"%a, %d %b %Y %H:%M:%S %Z", tz="GMT")
Sys.setlocale("LC_TIME", lct)
# if requestBody exist get content length in bytes and content type
`Content-Length` <- ""; `Content-Type` <- ""
if(!is.null(requestBody)) {
if(class(requestBody) == "form_file") {
`Content-Length` <- (file.info(requestBody$path))$size
`Content-Type` <- requestBody$type
} else {
requestBody <- enc2utf8(as.character(requestBody))
`Content-Length` <- nchar(requestBody, "bytes")
`Content-Type` <- "text/plain; charset=UTF-8"
}
}
# combine timestamp and version headers with any input headers, order and create the CanonicalizedHeaders
headers <- setNames(c(`x-ms-date`, "2015-04-05", unlist(headers)),
c("x-ms-date", "x-ms-version", unclass(names(unlist(headers)))))
headers <- headers[order(names(headers))]
CanonicalizedHeaders <- paste(names(headers), headers, sep=":", collapse = "\n")
# create CanonicalizedResource headers and add any queries to it
if(!is.null(urlcomponents$query)) {
components <- setNames(unlist(urlcomponents$query), unclass(names(unlist(urlcomponents$query))))
componentstring <- paste0("\n", paste(names(components[order(names(components))]),
components[order(names(components))], sep=":", collapse = "\n"))
} else componentstring <- ""
CanonicalizedResource <- paste0("/",account,"/",container, componentstring)
# create the authorizationtoken
signaturestring <- paste0(verb, "\n\n\n", `Content-Length`, "\n", md5, "\n", `Content-Type`, "\n\n\n",
ifMatch, "\n\n\n\n", CanonicalizedHeaders, "\n", CanonicalizedResource)
requestspecificencodedkey <- RCurl::base64(
digest::hmac(key=RCurl::base64Decode(key, mode="raw"),
object=enc2utf8(signaturestring),
algo= "sha256", raw=TRUE)
)
authorizationtoken <- paste0("SharedKey ", account, ":", requestspecificencodedkey)
# make the call
headers_final <- add_headers(Authorization=authorizationtoken, headers, `Content-Type` = `Content-Type`)
call <- httr::VERB(verb=verb, url=url, config=headers_final, body=requestBody, verbose())
print("signaturestring");print(signaturestring);
print(headers_final); print(call)
return(content(call))
}
## Tests. Replace 'key' and 'accountName' with yours
key <- "YowThr***********RDw=="
# Creates a container named 'test'
azureBlobCall("https://accountName.blob.core.windows.net/test?restype=container", "PUT", key)
# Creates a blob named 'blob' under container 'test' with the content of "Hej världen!"
azureBlobCall("https://accountName.blob.core.windows.net/test/blob", "PUT", key,
headers = c("x-ms-blob-type"="BlockBlob"), requestBody = "Hej världen!") #upload_file("blob.txt"))
# Lists all blobs in the container 'test'
azureBlobCall("https://accountName.blob.core.windows.net/test?comp=list&restype=container", "GET", key)
# Deletes the blob named 'blob'
azureBlobCall("https://accountName.blob.core.windows.net/test/blob", "DELETE", key)
# Creates a blob named 'blob' under container 'test' by uploading the file 'blob.txt'
azureBlobCall("https://accountName.blob.core.windows.net/test/blob", "PUT", key,
headers = c("x-ms-blob-type"="BlockBlob"), requestBody = upload_file("blob.txt"))
# deletes the container named 'test'
azureBlobCall("https://accountName.blob.core.windows.net/test?restype=container", "DELETE", key)
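The CanonicalizedResource step inside the function above can be looked at in isolation. The following is a simplified sketch in base R, not a replacement for that code: canonicalized_resource is a hypothetical name, the query string is parsed by hand, and repeated query parameters are not handled.

```r
# Hypothetical helper: "/account/path" followed by the query parameters
# as lowercase "name:value" lines sorted by name.
canonicalized_resource <- function(account, path, query = "") {
  resource <- paste0("/", account, path)
  if (!nzchar(query)) return(resource)
  pairs <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
  keys <- tolower(vapply(pairs, function(p) p[1], character(1)))
  vals <- vapply(pairs,
                 function(p) if (length(p) > 1) utils::URLdecode(p[2]) else "",
                 character(1))
  ord <- order(keys, method = "radix")  # radix sorting ignores locale collation
  paste0(resource, "\n", paste(keys[ord], vals[ord], sep = ":", collapse = "\n"))
}

canonicalized_resource("preconstuff", "/pings", "restype=container&comp=list")
# "/preconstuff/pings\ncomp:list\nrestype:container"
```

Sorting with method = "radix" is a deliberate choice here so that the order does not change with the user's locale.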

Related

httr POST request error: upstream connect error or disconnect/reset before headers. reset reason: connection termination

I am currently trying to upload a CSV file to a proprietary service via httr::POST(). Unfortunately the Admins are not experienced in R and can give only little support.
This is an example of how it should look on the command line:
curl -X POST --header 'Content-Type: multipart/form-data' \
-F file=@"member-1-test.csv.gz" 'https://some/api/endpoint'
So, in the following code I just try to stick with the example (and additionally provide a token).
> library(tidyverse)
> library(httr)
# Provide some test data with characters specifically quoted
> test_file <- tibble::tribble(
~keytype, ~key, ~action, ~segment,
6,"\"https://www.google.com\"", 0, 37372818,
6,"\"https://www.sport1.de\"" , 0, 37372818
)
> data.table::fwrite(test_file, "test.csv", quote = FALSE)
> file <- upload_file(path = "C:/R/projects/DefaultWorkingDirectory/test.csv")
> res <- POST(
url = "https://some/api/endpoint",
body = list(file = file),
add_headers(.headers = c('Content-Type' = "multipart/form-data", "Authorization" = token))
)
This gives me the following error:
> res
Response [https://some/api/endpoint]
Date: 2020-11-18 09:35
Status: 503
Content-Type: text/plain
Size: 95 B
> content(res, encoding = "UTF8")
"upstream connect error or disconnect/reset before headers. reset reason: connection termination"
Any help or guidance on how to move forward with this issue is very much appreciated. Thanks!
Directly after posting the question it was already solved by one of the Admins :)
The issue was that the encoding needs to be set to "multipart" and that a specific content type needs to be provided which is similar to JSON but needs to be added to the "Accept"-header field.
Here is the answer for future reference:
> res <- POST(
url = "https://some/api/endpoint",
body = list(
file = upload_file("test.csv")
),
add_headers(c(
"Authorization" = token,
"Accept" = "specific_format_for_the_application"
)),
encode = "multipart",
verbose()
)

Connecting to Azure Table Storage in R

I've been trying to connect to Azure Table Storage in R. Searching Google has returned nothing about people using R to connect to the REST APIs for Table Storage. The documentation is here. I've tried taking an existing question about blob storage (I couldn't connect even to a blob using it) and reworking it for Table Storage queries. Below:
library(httr)
url <- "https://rpoc.table.core.windows.net:443/dummytable(PartitionKey='0dfe725b-bd43-4d9d-b58a-90654d1d8741',RowKey='00b7595d-97c3-4f29-93de-c1146bcd3d33')?$select=<comma-separated-property-names>"
sak<-"u4RzASEJ3qbxSpf5VL1nY08MwRz4VKJXsyYKV2wSFlhf/1ZYV6eGkKD3UALSblXsloCs8k4lvCS6sDE9wfVIDg=="
requestdate<- http_date(Sys.time())
signaturestring<-paste0("GET",paste(rep("\n",12),collapse=""),
"x-ms-date:",requestdate,"
x-ms-version:2015-12-11")
headerstuff<-add_headers(Authorization=paste0("SharedKey rpoc:",
RCurl::base64(digest::hmac(key=RCurl::base64Decode(sak, mode="raw"),
object=enc2utf8(signaturestring),
algo= "sha256", raw=TRUE))),
`x-ms-date`=requestdate,
`x-ms-version`= "2015-12-11",
`DataServiceVersion` = "3.0;NetFx",
`MaxDataServiceVersion` = "3.0;NetFx" )
content(GET(url,config = headerstuff, verbose() ))
Console output:
-> GET /dummytable(PartitionKey='0dfe725b-bd43-4d9d-b58a-90654d1d8741',RowKey='00b7595d-97c3-4f29-93de-c1146bcd3d33')?$select=<comma-separated-property-names> HTTP/1.1
-> Host: rpoc.table.core.windows.net
-> User-Agent: libcurl/7.53.1 r-curl/2.6 httr/1.2.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Authorization: SharedKey rpoc:nQWNoPc1l/kXydUw4rNq8MBIf/arJXkI3jZv+NttqMs=
-> x-ms-date: Mon, 24 Jul 2017 18:49:52 GMT
-> x-ms-version: 2015-12-11
-> DataServiceVersion: 3.0;NetFx
-> MaxDataServiceVersion: 3.0;NetFx
->
<- HTTP/1.1 403 Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
<- Content-Length: 299
<- Content-Type: application/json
<- Server: Microsoft-HTTPAPI/2.0
<- x-ms-request-id: 2c74433e-0002-00b3-5aad-04d4db000000
<- Date: Mon, 24 Jul 2017 18:49:53 GMT
<-
$odata.error
$odata.error$code
[1] "AuthenticationFailed"
$odata.error$message
$odata.error$message$lang
[1] "en-US"
$odata.error$message$value
[1] "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:2c74433e-0002-00b3-5aad-04d4db000000\nTime:2017-07-24T18:49:54.3878127Z"
The issue looks to be the authentication headers. Any help on how I could resolve this would be appreciated. I'm really surprised more people don't use ATS with R since it's so versatile.
I based my solution on the PUT blob question (Azure PUT Blob authentication fails in R), then adapted it to use GET instead of PUT and table instead of blob.
library(httr)
account <- "account"
container <- "container"
key <- "u4RzASEJ..9wfVIDg=="
url <- paste0("https://", account, ".table.core.windows.net/", container)
requestdate <- format(Sys.time(),"%a, %d %b %Y %H:%M:%S %Z", tz="GMT")
content_length <- 0
signature_string <- paste0("GET", "\n", # HTTP Verb
"\n", # Content-MD5
"text/plain", "\n", # Content-Type
requestdate, "\n", # Date
# Here comes the Canonicalized Resource
"/",account, "/",container)
headerstuff <- add_headers(Authorization=paste0("SharedKey ",account,":",
RCurl::base64(digest::hmac(key =
RCurl::base64Decode(key, mode = "raw"),
object = enc2utf8(signature_string),
algo = "sha256", raw = TRUE))),
`x-ms-date`= requestdate,
`x-ms-version`= "2015-02-21",
`Content-Type`="text/plain")
xml_body = content(GET(url, config = headerstuff, verbose()))
According to the REST reference for Azure Storage authentication, and based on your error output and code, the AuthenticationFailed error is most likely caused by an incorrect signature string. The Table service string-to-sign does not use the 12 repeated \n characters; its format differs from that of the Blob, Queue and File services. See the reference Authentication for the Azure Storage Services for the Table service formats, shown below.
Table Service (Shared Key Authentication)
StringToSign = VERB + "\n" +
Content-MD5 + "\n" +
Content-Type + "\n" +
Date + "\n" +
CanonicalizedResource;
Table Service (Shared Key Lite Authentication)
StringToSign = Date + "\n" +
CanonicalizedResource;
Hope it helps.
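The two Table service formats above can be sketched in base R as follows. The function names are hypothetical, and the account, table and date are just the values from the question; the resulting string would still need to be signed with the decoded account key as in the blob examples.

```r
# Hypothetical helpers implementing the two Table service formats quoted above.
table_string_to_sign <- function(verb, date, account, resource_path,
                                 content_md5 = "", content_type = "") {
  canonicalized_resource <- paste0("/", account, "/", resource_path)
  paste(verb, content_md5, content_type, date, canonicalized_resource, sep = "\n")
}

table_string_to_sign_lite <- function(date, account, resource_path) {
  paste(date, paste0("/", account, "/", resource_path), sep = "\n")
}

table_string_to_sign("GET", "Mon, 24 Jul 2017 18:49:52 GMT", "rpoc", "dummytable")
# "GET\n\n\nMon, 24 Jul 2017 18:49:52 GMT\n/rpoc/dummytable"
```

Note that, unlike the Blob format, the date goes into the string itself rather than into an x-ms-date line, and there are no canonicalized headers.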
Somewhat late to the party, but: there is now an AzureTableStor package, which is also on CRAN.
library(AzureTableStor)
# storage account endpoint
endp <- table_endpoint("https://mystorageacct.table.core.windows.net", key="mykey")
# Cosmos DB w/table API endpoint
endp <- table_endpoint("https://mycosmosdb.table.cosmos.azure.com:443", key="mykey")
list_storage_tables(endp)
tab <- storage_table(endp, "mytable")
insert_table_entity(tab, list(
RowKey="row1",
PartitionKey="partition1",
firstname="Bill",
lastname="Gates"
))
get_table_entity(tab, "row1", "partition1")
Disclaimer: I'm the developer of this package.

Azure PUT Blob authentication fails in R

I would like to use R and the Azure Storage's Put Blob API to put files into my blob storage account but it fails to authenticate my request. Unfortunately, I couldn't find any documentation or sample code for R. General documentation of Put Blob API:
https://learn.microsoft.com/en-us/rest/api/storageservices/put-blob
Here is the code that I tried to use:
library(httr)
account <- "myAccount"
container <- "myContainer"
filename <- "test.txt"
key <- "primaryKey"
object <- "Hello World"
url <- paste0("https://", account, ".blob.core.windows.net/", container, "/", filename)
requestdate <- format(Sys.time(),"%a, %d %b %Y %H:%M:%S %Z", tz="GMT")
content_length <- nchar(object, type = "bytes")
signature_string <- paste0("PUT", "\n", "\n", "\n",
content_length, "\n",
"\n",
"x-ms-date:",requestdate, "\n",
"x-ms-version:2015-02-21", "\n",
"x-ms-blob-type:BlockBlob", "\n",
"Content-Type:text/plain", "\n",
"\n",
"x-ms-blob-content-dis filename=", filename, "\n",
"\n",
"/",account, "/",container,"/", filename)
headerstuff <- add_headers(Authorization=paste0("SharedKey ",account,":",
RCurl::base64(digest::hmac(key =
RCurl::base64Decode(key, mode = "raw"),
object = enc2utf8(signature_string),
algo = "sha256", raw = TRUE))),
`Content-Length` = content_length,
`x-ms-date`= requestdate,
`x-ms-version`= "2015-02-21",
`x-ms-blob-type`="BlockBlob",
`Content-Type`="text/plain")
content(PUT(url, config = headerstuff, body = object, verbose()), as = "text")
Request it sends:
-> PUT /myContainer/test.txt HTTP/1.1
-> Host: myAccount.blob.core.windows.net
-> User-Agent: libcurl/7.49.1 r-curl/2.3 httr/1.2.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Authorization: SharedKey myAccount:hashedSignatureString
-> Content-Length: 11
-> x-ms-date: Tue, 13 Jun 2017 08:50:38 GMT
-> x-ms-version: 2015-02-21
-> x-ms-blob-type: BlockBlob
-> Content-Type: text/plain
->
>> Hello World
Response:
<- HTTP/1.1 403 Server failed to authenticate the request. Make sure the
value of Authorization header is formed correctly including the signature.
<- Content-Length: 693
<- Content-Type: application/xml
<- Server: Microsoft-HTTPAPI/2.0
<- x-ms-request-id: efc2c8de-0001-00a9-3d21-e41b06000000
<- Date: Tue, 13 Jun 2017 08:48:56 GMT
I tried the same with the List Blobs API (with some minor changes in the formatting of the headers) and it works well, but I can't make it work with Put Blob.
List Blob solution from here: https://stackoverflow.com/a/29286040/8085694
Could you please provide some sample R code for Authentication header creation at Put Blob or help me resolve this issue?
Also, if I go further, is it possible somehow to upload R objects as blobs to the storage?
Thanks in advance,
Gábor
I managed to resolve this issue by putting the "\n" characters and everything in the right place.
Based on Gaurav Mantri's help, I used:
https://learn.microsoft.com/en-us/rest/api/storageservices/authentication-for-the-azure-storage-services
The following changes in the 'signature_string' worked:
signature_string <- paste0("PUT", "\n", # HTTP Verb
"\n", # Content-Encoding
"\n", # Content-Language
content_length, "\n", # Content-Length
"\n", # Content-MD5
"text/plain", "\n", # Content-Type
"\n", # Date
"\n", # If-Modified-Since
"\n", # If-Match
"\n", # If-None-Match
"\n", # If-Unmodified-Since
"\n", # Range
# Here comes the Canonicalized Headers
"x-ms-blob-type:BlockBlob","\n",
"x-ms-date:",requestdate,"\n",
"x-ms-version:2015-02-21","\n",
# Here comes the Canonicalized Resource
"/",account, "/",container,"/", filename)
There is an official Azure R package, Microsoft/AzureSMR on GitHub, which makes it easier to use Azure Blob Storage from R; refer to its tutorial for details.
If you only need a few Azure services such as Blob Storage, some of the project's source code is a useful model for your own, for example the createAzureStorageSignature method, which builds the signature directly and should help resolve your issue.
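One part of the working signature string in the accepted answer is that the x-ms-* headers appear in sorted order (blob-type, date, version) and that Content-Type is not among them. A minimal base R sketch of that step, with canonicalized_headers as a made-up helper name:

```r
# Hypothetical helper: keep only x-ms-* headers, lowercase the names,
# sort them by name and join them as "name:value" lines.
canonicalized_headers <- function(headers) {
  names(headers) <- tolower(names(headers))
  ms <- headers[startsWith(names(headers), "x-ms-")]
  if (length(ms) == 0) return("")
  ms <- ms[order(names(ms), method = "radix")]  # locale-independent ordering
  paste0(names(ms), ":", trimws(ms), collapse = "\n")
}

canonicalized_headers(c(
  "x-ms-version"   = "2015-02-21",
  "Content-Type"   = "text/plain",
  "x-ms-date"      = "Tue, 13 Jun 2017 08:50:38 GMT",
  "x-ms-blob-type" = "BlockBlob"))
# "x-ms-blob-type:BlockBlob\nx-ms-date:Tue, 13 Jun 2017 08:50:38 GMT\nx-ms-version:2015-02-21"
```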

authentication to github private repositories with httr

I am trying to access a private repository on Github using httr. I am able to do so with no problem if I add my github token (stored as an environment variable in GITHUB_TOKEN):
httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN"))))
However, if I try to specify another header, I get an error. In this case, I want to download the binary file associated with a release (the "asset", in github terminology):
httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN"))),
httr::add_headers(Accept = "application/octet-stream"))
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message>
That's only part of the message (the rest includes my token).
Apparently my authorization is being sent twice! How can I prevent this? Is it related to httr::handle_pool()?
EDIT -- connection info
It appears that the original request receives a reply, which contains a signature. This signature, along with my token is then sent back, causing an error. A similar thing happened to these people
-> GET /repos/aammd/miniature-meme/releases/assets/2859674 HTTP/1.1
-> Host: api.github.com
-> User-Agent: libcurl/7.43.0 r-curl/2.3 httr/1.2.1.9000
-> Accept-Encoding: gzip, deflate
-> Authorization: token tttttttt
-> Accept: application/octet-stream
->
<- HTTP/1.1 302 Found
<- Server: GitHub.com
<- Date: Tue, 17 Jan 2017 13:28:12 GMT
<- Content-Type: text/html;charset=utf-8
<- Content-Length: 0
<- Status: 302 Found
<- X-RateLimit-Limit: 5000
<- X-RateLimit-Remaining: 4984
<- X-RateLimit-Reset: 1484662101
<- location: https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170117%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170117T132812Z&X-Amz-Expires=300&X-Amz-Signature=ssssssssss&X-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream
<- Access-Control-Expose-Headers: ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval
<- Access-Control-Allow-Origin: *
<- Content-Security-Policy: default-src 'none'
<- Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
<- X-Content-Type-Options: nosniff
<- X-Frame-Options: deny
<- X-XSS-Protection: 1; mode=block
<- Vary: Accept-Encoding
<- X-Served-By: 3e3b9690823fb031da84658eb58aa83b
<- X-GitHub-Request-Id: 82782802:6E1B:E9F0BE:587E1BEC
<-
-> GET /releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170117%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170117T132812Z&X-Amz-Expires=300&X-Amz-Signature=sssssssssssssss&X-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream HTTP/1.1
-> Host: github-cloud.s3.amazonaws.com
-> User-Agent: libcurl/7.43.0 r-curl/2.3 httr/1.2.1.9000
-> Accept-Encoding: gzip, deflate
-> Authorization: token ttttttttttttt
-> Accept: application/octet-stream
->
<- HTTP/1.1 400 Bad Request
<- x-amz-request-id: FA56B3D23B468704
<- x-amz-id-2: 49X1mT5j5BrZ4HApeR/+wb7iVOWA8yn1obrgMoeOy44RH414bo/Ov8AAWSx2baEXO0H/WHX5jK0=
<- Content-Type: application/xml
<- Transfer-Encoding: chunked
<- Date: Tue, 17 Jan 2017 13:28:12 GMT
<- Connection: close
<- Server: AmazonS3
<-
gh doesn't work either
I created a public repo to test this idea out. the JSON can be returned from the API, but not the binary file:
# this works fine
gh::gh("https://api.github.com/repos/aammd/test_idea/releases/assets/2998763")
# this does not
gh::gh("https://api.github.com/repos/aammd/test_idea/releases/assets/2998763", .send_headers = c("Accept" = "application/octet-stream"))
wget might work, however
I've found a gist that shows how to do this with wget. The key component seems to be:
wget -q --auth-no-challenge --header='Accept:application/octet-stream' \
https://$TOKEN:@api.github.com/repos/$REPO/releases/assets/$asset_id \
-O $2
However if I try to replicate that in httr::GET I am not successful:
auth_url <- sprintf("https://%s:@api.github.com/repos/aammd/miniature-meme/releases/assets/2859674", Sys.getenv("GITHUB_TOKEN"))
httr::GET(auth_url,
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Accept = "application/octet-stream"))
Calling wget from R DOES work, but this solution is not totally satisfying because I can't guarantee that all my users have wget installed (unless there is a way to do that?).
system(sprintf("wget --auth-no-challenge --header='Accept:application/octet-stream' %s -O testwget.rds", auth_url))
output of wget (note the absence of -q above) included here (again, tokens and signatures redacted, hopefully):
--2017-01-18 13:21:55-- https://ttttt:*password*@api.github.com/repos/aammd/miniature-meme/releases/assets/2859674
Resolving api.github.com... 192.30.253.117, 192.30.253.116
Connecting to api.github.com|192.30.253.117|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170118T122156Z&X-Amz-Expires=300&X-Amz-Signature=SSSSSSSS-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream [following]
--2017-01-18 13:21:55-- https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170118T122156Z&X-Amz-Expires=300&X-Amz-Signature=SSSSSSSSSSSS-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream
Resolving github-cloud.s3.amazonaws.com... 52.216.226.120
Connecting to github-cloud.s3.amazonaws.com|52.216.226.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 682 [application/octet-stream]
Saving to: ‘testwget.rds’
0K 100% 15.5M=0s
2017-01-18 13:21:56 (15.5 MB/s) - ‘testwget.rds’ saved [682/682]
It turns out that there are two possible solutions to this problem!
solution the first: token as parameter
As suggested by @user7433058, we can indeed pass the token through as a parameter! Note, however, that we have to use paste0. This is the approach suggested by GitHub themselves in their API documentation.
## pass oauth in the url
httr::GET(paste0("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674?access_token=", Sys.getenv("GITHUB_TOKEN")),
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Accept = "application/octet-stream"))
tt <- readRDS("test.rds")
Solution the second: ask again
Another solution is to make the request the first time, then extract the URL and use it to make a second request. Since the problem is caused by sending Authorization information twice -- once in the URL, once in the header -- we can avoid the problem by only using the URL.
## alternatively, get the query url (containing signature) from the (failed) html request made the first time
firsttry <- httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN")),
Accept = "application/octet-stream"))
httr::GET(firsttry$url,
httr::write_disk("test2.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Accept = "application/octet-stream"))
tt2 <- readRDS("test2.rds")
This is, I suppose, a bit less efficient (making 3 requests total instead of 2). However, since only the first request is to the actual github API, it only counts for 1 towards your rate-limiting step.
a small refinement: no redirect from httr
We can make only 2, not 3, http requests if you tell httr not to follow redirects. To do this use httr::config(followlocation = FALSE) in the first of the two requests (i.e. to get firsttry)
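The two-request variant described here can be sketched as a single function. This is an illustration under assumptions rather than tested production code: download_release_asset is a hypothetical name, and it assumes the API answers the first request with a 302 whose Location header holds the signed URL, as in the verbose output above.

```r
# Hypothetical helper: ask the GitHub API for the asset without following the
# redirect, then fetch the signed URL without the Authorization header.
download_release_asset <- function(asset_url, dest,
                                   token = Sys.getenv("GITHUB_TOKEN")) {
  first <- httr::GET(
    asset_url,
    httr::add_headers(Authorization = paste("token", token),
                      Accept = "application/octet-stream"),
    httr::config(followlocation = FALSE))   # do not follow the 302 ourselves

  if (httr::status_code(first) != 302L) {
    stop("expected a 302 redirect, got status ", httr::status_code(first))
  }

  signed_url <- httr::headers(first)$location  # header lookup is case-insensitive
  second <- httr::GET(signed_url, httr::write_disk(dest, overwrite = TRUE))
  httr::stop_for_status(second)
  invisible(dest)
}

# Usage (requires network access and a valid token):
# download_release_asset(
#   "https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
#   "test.rds")
```

Only the first request carries the token, so the storage host never sees two authentication mechanisms at once.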
Try sending the auth token as a query param instead of an auth header. That way, when GitHub's OAuth redirects you, it'll strip the original token and the X-Amz-Algorithm param will be left to do its job.
httr::GET(paste0("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674?access_token=", Sys.getenv("GITHUB_TOKEN")),
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"))

How to POST multipart/related content with httr (for Google Drive API)

I got simple file uploads to Google Drive working using httr. The problem is that every document is uploaded as "untitled", and I have to PATCH the metadata to set the title. The PATCH request occasionally fails.
According to the API, I ought to be able to do a multipart upload, allowing me to specify the title as part of the same POST request that uploads the file.
res<-POST(
"https://www.googleapis.com/upload/drive/v2/files?convert=true",
config(token=google_token),
body=list(y=upload_file(file))
)
id<-fromJSON(rawToChar(res$content))$id
if(is.null(id)) stop("Upload failed")
url<-paste(
"https://www.googleapis.com/drive/v2/files/",
id,
sep=""
)
title<-strsplit(basename(file), "\\.")[[1]][1]
Sys.sleep(2)
res<-PATCH(url,
config(token=google_token),
body=paste('{"title": "',title,'"}', sep = ""),
add_headers("Content-Type" = "application/json; charset=UTF-8")
)
stopifnot(res$status_code==200)
cat(id)
What I'd like to do is something like this:
res<-POST(
"https://www.googleapis.com/upload/drive/v2/files?uploadType=multipart&convert=true",
config(token=google_token),
body=list(y=upload_file(file),
#add_headers("Content-Disposition" = "text/json"),
json=toJSON(data.frame(title))
),
encode="multipart",
add_headers("Content-Type" = "multipart/related"),
verbose()
)
The output I get shows that the content encoding of the individual parts is wrong, and it results in a 400 error:
-> POST /upload/drive/v2/files?uploadType=multipart&convert=true HTTP/1.1
-> User-Agent: curl/7.19.7 Rcurl/1.96.0 httr/0.6.1
-> Host: www.googleapis.com
-> Accept-Encoding: gzip
-> Accept: application/json, text/xml, application/xml, */*
-> Authorization: Bearer ya29.ngGLGA9iiOrEFt0ycMkPw7CZq23e6Dgx3Syjt3SXwJaQuH4B6dkDdFXyIC6roij2se7Fs-Ue_A9lfw
-> Content-Length: 371
-> Expect: 100-continue
-> Content-Type: multipart/related; boundary=----------------------------938934c053c6
->
<- HTTP/1.1 100 Continue
>> ------------------------------938934c053c6
>> Content-Disposition: form-data; name="y"; filename="db_biggest_tables.csv"
>> Content-Type: application/octet-stream
>>
>> table rows DATA idx total_size idxfrac
>>
>> ------------------------------938934c053c6
>> Content-Disposition: form-data; name="json"
>>
>> {"title":"db_biggest_tables"}
>> ------------------------------938934c053c6--
<- HTTP/1.1 400 Bad Request
<- Vary: Origin
<- Vary: X-Origin
<- Content-Type: application/json; charset=UTF-8
<- Content-Length: 259
<- Date: Fri, 26 Jun 2015 18:50:38 GMT
<- Server: UploadServer
<- Alternate-Protocol: 443:quic,p=1
<-
Is there any way to set the content encoding properly for individual parts? The second part should be "text/json", for example.
I have been through R documentation, Hadley's httr project pages at Github, this site and some general googling. I can't find any examples of how to do a multipart upload and set content-encoding.
You should be able to do this using curl::form_file or its alias httr::upload_file. See also the curl vignette. Following the example from the Google API doc:
library(httr)
media <- tempfile()
png(media, width = 800, height = 600)
plot(cars)
dev.off()
metadata <- tempfile()
writeLines(jsonlite::toJSON(list(title = jsonlite::unbox("My file"))), metadata)
#post
req <- POST("https://httpbin.org/post",
body = list(
metadata = upload_file(metadata, type = "application/json; charset=UTF-8"),
media = upload_file(media, type = "image/png")
),
add_headers("Content-Type" = "multipart/related"),
verbose()
)
unlink(media)
unlink(metadata)
The only difference here is that curl will automatically add a Content-Disposition header for each file, which is required for multipart/form-data but not for multipart/related. The server will probably just ignore this redundant header in this case.
For now there is no way to accomplish this without writing the content to a file. Perhaps we could add something like that in a future version of httr/curl, although this has not come up before.
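As a different technique from the upload_file approach in this answer, the multipart/related body can also be assembled by hand as a raw vector, which avoids temporary files and the extra Content-Disposition headers. This is only a sketch of the body layout: build_multipart_related and the boundary string are made up for illustration, and whether a given server accepts the result is an assumption that needs testing.

```r
# Hypothetical helper: a two-part multipart/related body (JSON metadata
# followed by the media bytes), returned as a raw vector.
build_multipart_related <- function(metadata_json, media, media_type,
                                    boundary = "example_boundary_123") {
  crlf <- "\r\n"
  head <- paste0("--", boundary, crlf,
                 "Content-Type: application/json; charset=UTF-8", crlf, crlf,
                 metadata_json, crlf,
                 "--", boundary, crlf,
                 "Content-Type: ", media_type, crlf, crlf)
  tail <- paste0(crlf, "--", boundary, "--", crlf)
  c(charToRaw(enc2utf8(head)), media, charToRaw(tail))
}

body <- build_multipart_related('{"title":"My file"}',
                                charToRaw("a,b\n1,2\n"), "text/csv")
cat(rawToChar(body))
```

The raw vector could then be passed as the body of a POST, with a Content-Type header of multipart/related that names the same boundary.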
