Authentication to GitHub private repositories with httr - R

I am trying to access a private repository on GitHub using httr. I am able to do so with no problem if I add my GitHub token (stored as an environment variable in GITHUB_TOKEN):
httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN"))))
However, if I try to specify another header, I get an error. In this case, I want to download the binary file associated with a release (the "asset", in GitHub terminology):
httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"),
httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN"))),
httr::add_headers(Accept = "application/octet-stream"))
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message>
That's only part of the message (the rest includes my token).
Apparently my authorization is being sent twice! How can I prevent this? Is it related to httr::handle_pool()?
EDIT -- connection info
It appears that the original request receives a reply, which contains a signature. This signature, along with my token, is then sent back, causing an error. A similar thing happened to these people.
-> GET /repos/aammd/miniature-meme/releases/assets/2859674 HTTP/1.1
-> Host: api.github.com
-> User-Agent: libcurl/7.43.0 r-curl/2.3 httr/1.2.1.9000
-> Accept-Encoding: gzip, deflate
-> Authorization: token tttttttt
-> Accept: application/octet-stream
->
<- HTTP/1.1 302 Found
<- Server: GitHub.com
<- Date: Tue, 17 Jan 2017 13:28:12 GMT
<- Content-Type: text/html;charset=utf-8
<- Content-Length: 0
<- Status: 302 Found
<- X-RateLimit-Limit: 5000
<- X-RateLimit-Remaining: 4984
<- X-RateLimit-Reset: 1484662101
<- location: https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170117%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170117T132812Z&X-Amz-Expires=300&X-Amz-Signature=ssssssssss&X-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream
<- Access-Control-Expose-Headers: ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval
<- Access-Control-Allow-Origin: *
<- Content-Security-Policy: default-src 'none'
<- Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
<- X-Content-Type-Options: nosniff
<- X-Frame-Options: deny
<- X-XSS-Protection: 1; mode=block
<- Vary: Accept-Encoding
<- X-Served-By: 3e3b9690823fb031da84658eb58aa83b
<- X-GitHub-Request-Id: 82782802:6E1B:E9F0BE:587E1BEC
<-
-> GET /releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170117%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170117T132812Z&X-Amz-Expires=300&X-Amz-Signature=sssssssssssssss&X-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream HTTP/1.1
-> Host: github-cloud.s3.amazonaws.com
-> User-Agent: libcurl/7.43.0 r-curl/2.3 httr/1.2.1.9000
-> Accept-Encoding: gzip, deflate
-> Authorization: token ttttttttttttt
-> Accept: application/octet-stream
->
<- HTTP/1.1 400 Bad Request
<- x-amz-request-id: FA56B3D23B468704
<- x-amz-id-2: 49X1mT5j5BrZ4HApeR/+wb7iVOWA8yn1obrgMoeOy44RH414bo/Ov8AAWSx2baEXO0H/WHX5jK0=
<- Content-Type: application/xml
<- Transfer-Encoding: chunked
<- Date: Tue, 17 Jan 2017 13:28:12 GMT
<- Connection: close
<- Server: AmazonS3
<-
gh doesn't work either
I created a public repo to test this idea out. The JSON can be returned from the API, but not the binary file:
# this works fine
gh::gh("https://api.github.com/repos/aammd/test_idea/releases/assets/2998763")
# this does not
gh::gh("https://api.github.com/repos/aammd/test_idea/releases/assets/2998763", .send_headers = c("Accept" = "application/octet-stream"))
wget might work, however
I've found a gist that shows how to do this with wget. The key component seems to be:
wget -q --auth-no-challenge --header='Accept:application/octet-stream' \
  https://$TOKEN:@api.github.com/repos/$REPO/releases/assets/$asset_id \
  -O $2
However, if I try to replicate that in httr::GET, I am not successful:
auth_url <- sprintf("https://%s:@api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
                    Sys.getenv("GITHUB_TOKEN"))
httr::GET(auth_url,
          httr::write_disk("test.rds", overwrite = TRUE),
          httr::progress("down"),
          httr::add_headers(Accept = "application/octet-stream"))
Calling wget from R DOES work, but this solution is not totally satisfying because I can't guarantee that all my users have wget installed (unless there is a way to do that?).
system(sprintf("wget --auth-no-challenge --header='Accept:application/octet-stream' %s -O testwget.rds", auth_url))
Output of wget (note the absence of -q above) is included here (again, tokens and signatures redacted, hopefully):
--2017-01-18 13:21:55-- https://ttttt:*password*@api.github.com/repos/aammd/miniature-meme/releases/assets/2859674
Resolving api.github.com... 192.30.253.117, 192.30.253.116
Connecting to api.github.com|192.30.253.117|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170118T122156Z&X-Amz-Expires=300&X-Amz-Signature=SSSSSSSS-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream [following]
--2017-01-18 13:21:55-- https://github-cloud.s3.amazonaws.com/releases/76993567/aee5d0d6-c70a-11e6-9078-b5bee39f9fbc.RDS?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20170118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170118T122156Z&X-Amz-Expires=300&X-Amz-Signature=SSSSSSSSSSSS-Amz-SignedHeaders=host&actor_id=1198242&response-content-disposition=attachment%3B%20filename%3Dff.RDS&response-content-type=application%2Foctet-stream
Resolving github-cloud.s3.amazonaws.com... 52.216.226.120
Connecting to github-cloud.s3.amazonaws.com|52.216.226.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 682 [application/octet-stream]
Saving to: ‘testwget.rds’
0K 100% 15.5M=0s
2017-01-18 13:21:56 (15.5 MB/s) - ‘testwget.rds’ saved [682/682]

It turns out that there are two possible solutions to this problem!
Solution the first: token as parameter
As suggested by @user7433058, we can indeed pass the token through as a query parameter. Note, however, that we have to use paste0. This is the approach suggested by GitHub themselves in their API documentation.
## pass oauth in the url
httr::GET(paste0("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674?access_token=",
                 Sys.getenv("GITHUB_TOKEN")),
          httr::write_disk("test.rds", overwrite = TRUE),
          httr::progress("down"),
          httr::add_headers(Accept = "application/octet-stream"))
tt <- readRDS("test.rds")
Solution the second: ask again
Another solution is to make the request the first time, then extract the URL and use it to make a second request. Since the problem is caused by sending Authorization information twice -- once in the URL, once in the header -- we can avoid the problem by only using the URL.
## alternatively, get the signed query url from the (failed) request made the first time
firsttry <- httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
                      httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN")),
                                        Accept = "application/octet-stream"))
httr::GET(firsttry$url,
          httr::write_disk("test2.rds", overwrite = TRUE),
          httr::progress("down"),
          httr::add_headers(Accept = "application/octet-stream"))
tt2 <- readRDS("test2.rds")
This is, I suppose, a bit less efficient (making 3 requests total instead of 2). However, since only the first request goes to the actual GitHub API, it only counts as 1 against your rate limit.
A small refinement: no redirect from httr
We can make only 2 HTTP requests, not 3, by telling httr not to follow redirects. To do this, use httr::config(followlocation = FALSE) in the first of the two requests (i.e. the one that gets firsttry).
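For concreteness, here is a minimal sketch of that two-request flow (my own assembly of the pieces above, not re-verified against the API; it assumes the same asset URL and GITHUB_TOKEN, and reads the pre-signed S3 URL from the Location header of the 302 response):
firsttry <- httr::GET("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674",
                      httr::config(followlocation = FALSE),
                      httr::add_headers(Authorization = paste("token", Sys.getenv("GITHUB_TOKEN")),
                                        Accept = "application/octet-stream"))
## with the redirect not followed, the 302 reply carries the signed S3 URL in its Location header
signed_url <- httr::headers(firsttry)$location
httr::GET(signed_url,
          httr::write_disk("test2.rds", overwrite = TRUE),
          httr::progress("down"),
          httr::add_headers(Accept = "application/octet-stream"))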

Try sending the auth token as a query param instead of an auth header. That way, when GitHub's OAuth redirects you, it'll strip the original token and the X-Amz-Algorithm param will be left to do its job.
httr::GET(paste("https://api.github.com/repos/aammd/miniature-meme/releases/assets/2859674?access_token=", Sys.getenv("GITHUB_TOKEN")),
httr::write_disk("test.rds", overwrite = TRUE),
httr::progress("down"))

Related

R RestRserve Add Etag to static path

Let's say I have a REST API using RestRserve like this. Is there a way to add an ETag to enable caching on cloud services?
writeLines("Hello World", "myfile.txt")
app <- Application$new(content_type = "application/json")
app$add_static("/", ".")
backend <- BackendRserve$new()
# backend$start(app, http_port = 8080)
req <- Request$new(path = "/myfile.txt", method = "GET")
app$process_request(req)$headers
#> $Server
#> [1] "RestRserve/0.4.1001"
As we see, there is no ETag header.
Example using Go fiber
Using Go Fiber, I would use it like this:
package main

import (
    "flag"
    "log"

    "github.com/gofiber/fiber/v2"
    "github.com/gofiber/fiber/v2/middleware/etag"
)

var (
    port = flag.String("port", ":3000", "Port to listen on")
)

func main() {
    app := fiber.New()
    app.Use(etag.New())
    app.Static("/", ".")
    log.Fatal(app.Listen(*port))
}
and then, querying localhost:3000/myfile.txt, I would see headers like this:
HTTP/1.1 200 OK
Date: Fri, 18 Mar 2022 13:13:44 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 12
Last-Modified: Fri, 21 Jan 2022 16:24:47 GMT
Etag: "12-823400506"
Connection: close
Hello World
Is there a way to add ETag headers to static files using RestRserve?
As of RestRserve version 1.1.1 (on CRAN), there is an ETagMiddleware class.
Use it like so:
# ... code like before
etag <- ETagMiddleware$new()
app$append_middleware(etag)
# ...
See also https://restrserve.org/reference/ETagMiddleware.html
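Putting it together with the setup from the question (a sketch only; it simply combines the question's code with the middleware above and is not re-verified against a running server):
library(RestRserve)
writeLines("Hello World", "myfile.txt")
app <- Application$new(content_type = "application/json")
app$append_middleware(ETagMiddleware$new())
app$add_static("/", ".")
req <- Request$new(path = "/myfile.txt", method = "GET")
app$process_request(req)$headers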

Creating a rvest::html_session to a servr::httd http server

I'm working on a project that requires accessing webpages, and I do this via rvest::html_session(). For documentation and training I would like to set up a reproducible example, and have considered the following:
1. Use servr::httd(system.file("egwebsite", package = "<pkgname>"), daemon = TRUE, browser = FALSE) to set up a simple HTTP server.
2. Use rvest::html_session("http://127.0.0.1:4321") to set up the html session.
However, the following simple example behaves differently on Linux (Debian 9) and Windows 10. (I do not have easy access to macOS and have not tested on that OS.)
# On Windows
servr::httd(daemon = TRUE, browser = FALSE, port = 4321)
## Serving the directory /home/dewittpe/so/my-servr-question at http://127.0.0.1:4321
## To stop the server, run servr::daemon_stop("94019719908480") or restart your R session
R.utils::withTimeout(
  {
    s <- rvest::html_session("http://127.0.0.1:4321")
  },
  timeout = 3,
  onTimeout = "error")
s
## <session> http://127.0.0.1:4321/
## Status: 200
## Type: text/html
## Size: 2352
servr::daemon_stop()
However, on my Linux box (Debian 9) I get the following
servr::httd(daemon = TRUE, browser = FALSE, port = 4321)
## Serving the directory /home/dewittpe/so/my-servr-question at http://127.0.0.1:4321
## To stop the server, run servr::daemon_stop("94019719908480") or restart your R session
R.utils::withTimeout(
  {
    s <- rvest::html_session("http://127.0.0.1:4321")
  },
  timeout = 3,
  onTimeout = "error")
## Error: reached elapsed time limit
## Error in curl::curl_fetch_memory(url, handle = handle) :
## Operation was aborted by an application callback
That is, I am unable to create an html_session in the same interactive R session that spawned the HTTP server. If, however, I start a second R session while leaving the initial session running, I am able to create the html_session without error.
What can I do so that I can create an html_session based on a servr::httd HTTP server within the same R session on Linux?
Edit 1
If I add httr::verbose() to the html_session call, I get the following when the session is created successfully. When the process hangs and fails to create the session, the output stops at the last -> and none of the lines with <- are shown.
> s <- html_session("http://127.0.0.1:4321", httr::verbose())
-> GET / HTTP/1.1
-> Host: 127.0.0.1:4321
-> User-Agent: libcurl/7.52.1 r-curl/3.1 httr/1.3.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
->
<- HTTP/1.1 200 OK
<- Content-Type: text/html
<- Content-Length: 61303
<-
I have found a solution to my problem: run servr::httd in a subprocess. This solution requires the subprocess package.
First, a helper function R_binary returns the file path of the R binary on Windows or a unix-based OS.
R_binary <- function() {
  R_exe <- ifelse(tolower(.Platform$OS.type) == "windows", "R.exe", "R")
  return(file.path(R.home("bin"), R_exe))
}
Next, start R vanilla as a subprocess.
subR <- subprocess::spawn_process(R_binary(), c("--vanilla"))
Then start the HTTP server in the subprocess
subprocess::process_write(subR, 'servr::httd(".", browser = FALSE, port = 4321)\n')
## [1] 47
subprocess::process_read(subR)$stderr
## [1] "Serving the directory /home/dewittpe/so/my-servr-question at http://127.0.0.1:4321"
A quick test to show that there is communication between the active R session and the HTTP server:
session <- rvest::html_session("http://127.0.0.1:4321")
session
## <session> http://127.0.0.1:4321/
## Status: 200
## Type: text/html
## Size: 1054
And finally, kill the subprocess
subprocess::process_kill(subR)
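As a convenience, the start/query/stop cycle above can be wrapped so the subprocess is always cleaned up. This wrapper is my own addition and only repackages the calls shown in this answer:
with_httd <- function(expr, port = 4321) {
  subR <- subprocess::spawn_process(R_binary(), c("--vanilla"))
  on.exit(subprocess::process_kill(subR), add = TRUE)
  subprocess::process_write(subR,
                            sprintf('servr::httd(".", browser = FALSE, port = %d)\n', port))
  Sys.sleep(1)  # give the server a moment to start
  force(expr)   # expr is evaluated only now, after the server is up
}
session <- with_httd(rvest::html_session("http://127.0.0.1:4321"))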

Connecting to Azure Table Storage in R

I've been trying to connect to Azure Table Storage in R. Searching has turned up nothing on people using R to connect to the REST APIs for Table Storage. The documentation is here. I've tried taking an existing question about Blob Storage (I couldn't connect to even a blob using it) and reworking it for Table Storage queries. Below:
library(httr)
url <- "https://rpoc.table.core.windows.net:443/dummytable(PartitionKey='0dfe725b-bd43-4d9d-b58a-90654d1d8741',RowKey='00b7595d-97c3-4f29-93de-c1146bcd3d33')?$select=<comma-separated-property-names>"
sak <- "u4RzASEJ3qbxSpf5VL1nY08MwRz4VKJXsyYKV2wSFlhf/1ZYV6eGkKD3UALSblXsloCs8k4lvCS6sDE9wfVIDg=="
requestdate <- http_date(Sys.time())
signaturestring <- paste0("GET", paste(rep("\n", 12), collapse = ""),
                          "x-ms-date:", requestdate, "\n",
                          "x-ms-version:2015-12-11")
headerstuff <- add_headers(
  Authorization = paste0("SharedKey rpoc:",
                         RCurl::base64(digest::hmac(key = RCurl::base64Decode(sak, mode = "raw"),
                                                    object = enc2utf8(signaturestring),
                                                    algo = "sha256", raw = TRUE))),
  `x-ms-date` = requestdate,
  `x-ms-version` = "2015-12-11",
  `DataServiceVersion` = "3.0;NetFx",
  `MaxDataServiceVersion` = "3.0;NetFx")
content(GET(url, config = headerstuff, verbose()))
Console output:
-> GET /dummytable(PartitionKey='0dfe725b-bd43-4d9d-b58a-90654d1d8741',RowKey='00b7595d-97c3-4f29-93de-c1146bcd3d33')?$select=<comma-separated-property-names> HTTP/1.1
-> Host: rpoc.table.core.windows.net
-> User-Agent: libcurl/7.53.1 r-curl/2.6 httr/1.2.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Authorization: SharedKey rpoc:nQWNoPc1l/kXydUw4rNq8MBIf/arJXkI3jZv+NttqMs=
-> x-ms-date: Mon, 24 Jul 2017 18:49:52 GMT
-> x-ms-version: 2015-12-11
-> DataServiceVersion: 3.0;NetFx
-> MaxDataServiceVersion: 3.0;NetFx
->
<- HTTP/1.1 403 Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
<- Content-Length: 299
<- Content-Type: application/json
<- Server: Microsoft-HTTPAPI/2.0
<- x-ms-request-id: 2c74433e-0002-00b3-5aad-04d4db000000
<- Date: Mon, 24 Jul 2017 18:49:53 GMT
<-
$odata.error
$odata.error$code
[1] "AuthenticationFailed"
$odata.error$message
$odata.error$message$lang
[1] "en-US"
$odata.error$message$value
[1] "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\nRequestId:2c74433e-0002-00b3-5aad-04d4db000000\nTime:2017-07-24T18:49:54.3878127Z"
The issue looks to be the authentication headers. Any help on how I could resolve this would be appreciated. I'm really surprised more people don't use Azure Table Storage with R, since it's so versatile.
I based my solution on the PUT Blob question (Azure PUT Blob authentication fails in R), then adapted it to use GET instead of PUT and Table instead of Blob.
library(httr)
account <- "account"
container <- "container"
key <- "u4RzASEJ..9wfVIDg=="
url <- paste0("https://", account, ".table.core.windows.net/", container)
requestdate <- format(Sys.time(), "%a, %d %b %Y %H:%M:%S %Z", tz = "GMT")
content_length <- 0
signature_string <- paste0("GET", "\n",        # HTTP Verb
                           "\n",               # Content-MD5
                           "text/plain", "\n", # Content-Type
                           requestdate, "\n",  # Date
                           # Here comes the Canonicalized Resource
                           "/", account, "/", container)
headerstuff <- add_headers(
  Authorization = paste0("SharedKey ", account, ":",
                         RCurl::base64(digest::hmac(key = RCurl::base64Decode(key, mode = "raw"),
                                                    object = enc2utf8(signature_string),
                                                    algo = "sha256", raw = TRUE))),
  `x-ms-date` = requestdate,
  `x-ms-version` = "2015-02-21",
  `Content-Type` = "text/plain")
xml_body <- content(GET(url, config = headerstuff, verbose()))
According to the REST reference for Azure Storage authentication, and based on your error information and code, the AuthenticationFailed issue is caused by using the wrong signature string for the Table service: unlike the Blob, Queue, and File services, its string-to-sign does not contain the twelve repeated \n symbols. Please see the reference Authentication for the Azure Storage Services carefully to see how the format differs for the Table service, as below.
Table Service (Shared Key Authentication)
StringToSign = VERB + "\n" +
               Content-MD5 + "\n" +
               Content-Type + "\n" +
               Date + "\n" +
               CanonicalizedResource;
Table Service (Shared Key Lite Authentication)
StringToSign = Date + "\n" +
               CanonicalizedResource;
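The Shared Key variant is exactly what the R answer above builds. For the Shared Key Lite variant, the equivalent string in R would be the sketch below (note the Authorization scheme then becomes "SharedKeyLite" rather than "SharedKey"):
## Shared Key Lite string-to-sign for the Table service
string_to_sign_lite <- paste0(requestdate, "\n",
                              "/", account, "/", container)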
Hope it helps.
Somewhat late to the party, but: there is now an AzureTableStor package, which is also on CRAN.
library(AzureTableStor)
# storage account endpoint
endp <- table_endpoint("https://mystorageacct.table.core.windows.net", key="mykey")
# Cosmos DB w/table API endpoint
endp <- table_endpoint("https://mycosmosdb.table.cosmos.azure.com:443", key="mykey")
list_storage_tables(endp)
tab <- storage_table(endp, "mytable")
insert_table_entity(tab, list(
  RowKey = "row1",
  PartitionKey = "partition1",
  firstname = "Bill",
  lastname = "Gates"
))
get_table_entity(tab, "row1", "partition1")
Disclaimer: I'm the developer of this package.

Azure PUT Blob authentication fails in R

I would like to use R and Azure Storage's Put Blob API to put files into my Blob Storage account, but it fails to authenticate my request. Unfortunately, I couldn't find any documentation or sample code for R. General documentation of the Put Blob API:
https://learn.microsoft.com/en-us/rest/api/storageservices/put-blob
Here is the code that I tried to use:
library(httr)
account <- "myAccount"
container <- "myContainer"
filename <- "test.txt"
key <- "primaryKey"
object <- "Hello World"
url <- paste0("https://", account, ".blob.core.windows.net/", container, "/", filename)
requestdate <- format(Sys.time(),"%a, %d %b %Y %H:%M:%S %Z", tz="GMT")
content_length <- nchar(object, type = "bytes")
signature_string <- paste0("PUT", "\n", "\n", "\n",
content_length, "\n",
"\n",
"x-ms-date:",requestdate, "\n",
"x-ms-version:2015-02-21", "\n",
"x-ms-blob-type:BlockBlob", "\n",
"Content-Type:text/plain", "\n",
"\n",
"x-ms-blob-content-dis filename=", filename, "\n",
"\n",
"/",account, "/",container,"/", filename)
headerstuff <- add_headers(
  Authorization = paste0("SharedKey ", account, ":",
                         RCurl::base64(digest::hmac(key = RCurl::base64Decode(key, mode = "raw"),
                                                    object = enc2utf8(signature_string),
                                                    algo = "sha256", raw = TRUE))),
  `Content-Length` = content_length,
  `x-ms-date` = requestdate,
  `x-ms-version` = "2015-02-21",
  `x-ms-blob-type` = "BlockBlob",
  `Content-Type` = "text/plain")
content(PUT(url, config = headerstuff, body = object, verbose()), as = "text")
Request it sends:
-> PUT /myContainer/test.txt HTTP/1.1
-> Host: myAccount.blob.core.windows.net
-> User-Agent: libcurl/7.49.1 r-curl/2.3 httr/1.2.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Authorization: SharedKey myAccount:hashedSignatureString
-> Content-Length: 11
-> x-ms-date: Tue, 13 Jun 2017 08:50:38 GMT
-> x-ms-version: 2015-02-21
-> x-ms-blob-type: BlockBlob
-> Content-Type: text/plain
->
>> Hello World
Response:
<- HTTP/1.1 403 Server failed to authenticate the request. Make sure the
value of Authorization header is formed correctly including the signature.
<- Content-Length: 693
<- Content-Type: application/xml
<- Server: Microsoft-HTTPAPI/2.0
<- x-ms-request-id: efc2c8de-0001-00a9-3d21-e41b06000000
<- Date: Tue, 13 Jun 2017 08:48:56 GMT
I tried the same with the List Blobs API (with some minor changes in the formatting of the headers) and it works well, but I can't make it work with Put Blob.
List Blob solution from here: https://stackoverflow.com/a/29286040/8085694
Could you please provide some sample R code for Authentication header creation at Put Blob or help me resolve this issue?
Also, if I go further, is it possible somehow to upload R objects as blobs to the storage?
Thanks in advance,
Gábor
I managed to resolve this issue by putting the "\n" characters and everything in the right place.
Based on Gaurav Mantri's help, I used:
https://learn.microsoft.com/en-us/rest/api/storageservices/authentication-for-the-azure-storage-services
The following changes in the 'signature_string' worked:
signature_string <- paste0("PUT", "\n",          # HTTP Verb
                           "\n",                 # Content-Encoding
                           "\n",                 # Content-Language
                           content_length, "\n", # Content-Length
                           "\n",                 # Content-MD5
                           "text/plain", "\n",   # Content-Type
                           "\n",                 # Date
                           "\n",                 # If-Modified-Since
                           "\n",                 # If-Match
                           "\n",                 # If-None-Match
                           "\n",                 # If-Unmodified-Since
                           "\n",                 # Range
                           # Here comes the Canonicalized Headers
                           "x-ms-blob-type:BlockBlob", "\n",
                           "x-ms-date:", requestdate, "\n",
                           "x-ms-version:2015-02-21", "\n",
                           # Here comes the Canonicalized Resource
                           "/", account, "/", container, "/", filename)
There is an official Azure R package, Microsoft/AzureSMR, on GitHub, which can make working with R and Azure Blob Storage easier; you can refer to its tutorial for more details.
If you just want to use a few Azure services like Blob Storage and nothing else, I think some of this project's source code is very valuable for rebuilding your code, such as the createAzureStorageSignature method, which can directly help build the signature and resolve your issue.

How to POST multipart/related content with httr (for Google Drive API)

I got simple file uploads to Google Drive working using httr. The problem is that every document is uploaded as "untitled", and I have to PATCH the metadata to set the title. The PATCH request occasionally fails.
According to the API, I ought to be able to do a multipart upload, allowing me to specify the title as part of the same POST request that uploads the file.
res <- POST(
  "https://www.googleapis.com/upload/drive/v2/files?convert=true",
  config(token = google_token),
  body = list(y = upload_file(file))
)
id <- fromJSON(rawToChar(res$content))$id
if (is.null(id)) stop("Upload failed")
url <- paste(
  "https://www.googleapis.com/drive/v2/files/",
  id,
  sep = ""
)
title <- strsplit(basename(file), "\\.")[[1]][1]
Sys.sleep(2)
res <- PATCH(url,
             config(token = google_token),
             body = paste('{"title": "', title, '"}', sep = ""),
             add_headers("Content-Type" = "application/json; charset=UTF-8")
)
stopifnot(res$status_code == 200)
cat(id)
What I'd like to do is something like this:
res<-POST(
"https://www.googleapis.com/upload/drive/v2/files?uploadType=multipart&convert=true",
config(token=google_token),
body=list(y=upload_file(file),
#add_headers("Content-Disposition" = "text/json"),
json=toJSON(data.frame(title))
),
encode="multipart",
add_headers("Content-Type" = "multipart/related"),
verbose()
)
The output I get shows that the content encoding of the individual parts is wrong, and it results in a 400 error:
-> POST /upload/drive/v2/files?uploadType=multipart&convert=true HTTP/1.1
-> User-Agent: curl/7.19.7 Rcurl/1.96.0 httr/0.6.1
-> Host: www.googleapis.com
-> Accept-Encoding: gzip
-> Accept: application/json, text/xml, application/xml, */*
-> Authorization: Bearer ya29.ngGLGA9iiOrEFt0ycMkPw7CZq23e6Dgx3Syjt3SXwJaQuH4B6dkDdFXyIC6roij2se7Fs-Ue_A9lfw
-> Content-Length: 371
-> Expect: 100-continue
-> Content-Type: multipart/related; boundary=----------------------------938934c053c6
->
<- HTTP/1.1 100 Continue
>> ------------------------------938934c053c6
>> Content-Disposition: form-data; name="y"; filename="db_biggest_tables.csv"
>> Content-Type: application/octet-stream
>>
>> table rows DATA idx total_size idxfrac
>>
>> ------------------------------938934c053c6
>> Content-Disposition: form-data; name="json"
>>
>> {"title":"db_biggest_tables"}
>> ------------------------------938934c053c6--
<- HTTP/1.1 400 Bad Request
<- Vary: Origin
<- Vary: X-Origin
<- Content-Type: application/json; charset=UTF-8
<- Content-Length: 259
<- Date: Fri, 26 Jun 2015 18:50:38 GMT
<- Server: UploadServer
<- Alternate-Protocol: 443:quic,p=1
<-
Is there any way to set the content encoding properly for individual parts? The second part should be "text/json", for example.
I have been through the R documentation, Hadley's httr project pages on GitHub, this site, and some general googling. I can't find any examples of how to do a multipart upload and set the content encoding.
You should be able to do this using curl::form_file or its alias httr::upload_file. See also the curl vignette. Following the example from the Google API doc:
library(httr)
media <- tempfile()
png(media, width = 800, height = 600)
plot(cars)
dev.off()
metadata <- tempfile()
writeLines(jsonlite::toJSON(list(title = jsonlite::unbox("My file"))), metadata)
# post
req <- POST("https://httpbin.org/post",
            body = list(
              metadata = upload_file(metadata, type = "application/json; charset=UTF-8"),
              media = upload_file(media, type = "image/png")
            ),
            add_headers("Content-Type" = "multipart/related"),
            verbose()
)
unlink(media)
unlink(metadata)
The only difference here is that curl will automatically add a Content-Disposition header for each file, which is required for multipart/form-data but not for multipart/related. The server will probably just ignore this redundant header in this case.
For now there is no way to accomplish this without writing the content to a file. Perhaps we could add something like that in a future version of httr/curl, although this has not come up before.
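Applied to the question's Drive endpoint rather than httpbin, the same pattern would look roughly like the sketch below. It assumes the google_token, file, and title objects from the question, keeps the convert=true parameter, and has not been re-verified against the live API:
## write the title metadata to a temp file, as in the example above
metadata_file <- tempfile(fileext = ".json")
writeLines(jsonlite::toJSON(list(title = jsonlite::unbox(title))), metadata_file)
## multipart/related upload: metadata part first, then the media part
req <- POST("https://www.googleapis.com/upload/drive/v2/files?uploadType=multipart&convert=true",
            config(token = google_token),
            body = list(
              metadata = upload_file(metadata_file, type = "application/json; charset=UTF-8"),
              media = upload_file(file)
            ),
            add_headers("Content-Type" = "multipart/related"),
            verbose())
id <- jsonlite::fromJSON(rawToChar(req$content))$id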
